<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <generator uri="https://jekyllrb.com/" version="4.3.4">Jekyll</generator>
  <link href="https://trino.io/feed.xml" rel="self" type="application/atom+xml" />
  <link href="https://trino.io/" rel="alternate" type="text/html" />
  <updated>2026-04-11T18:56:10+00:00</updated>
  <id>https://trino.io/feed.xml</id>

  <title>Trino RSS Feed</title>
  <description>This feed combines blog posts and Trino Community Broadcast episodes in one chronological feed.</description>

  <subtitle>Trino is a high-performance, distributed SQL query engine for big data.</subtitle>
    <entry>
      <title>Introducing the NUMBER data type</title>
      <link href="https://trino.io/blog/2026/03/25/number-data-type.html" rel="alternate" type="text/html" title="Introducing the NUMBER data type" />
      <published>2026-03-25T00:00:00+00:00</published>
      <updated>2026-03-25T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2026/03/25/number-data-type</id>
      <content type="html" xml:base="https://trino.io/blog/2026/03/25/number-data-type.html">&lt;p&gt;One of Trino’s core strengths is breaking down data silos—enabling data
engineers to query diverse data sources through a single SQL interface. However,
when those sources use high-precision numeric types beyond Trino’s 38-digit
DECIMAL limit, that promise breaks down. Users faced an impossible choice: skip
the columns entirely and lose access to critical data, or accept lossy rounding
that compromises data integrity.&lt;/p&gt;

&lt;p&gt;This challenge required a new approach: a dedicated data type for high-precision,
variable-scale decimals.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Adding a new built-in data type to Trino is exceptionally rare. The last time we
introduced a new type was the UUID type in May 2019—nearly seven years ago.
Types are fundamental building blocks that touch many parts of the system, from
the type registry, through coercion rules to connectors, functions, and the protocol.
They require careful design and long-term commitment.&lt;/p&gt;

&lt;p&gt;With Trino 480, we’re excited to introduce the NUMBER type—a high-precision
decimal type that breaks down these data silos and enables seamless access to
numeric data across diverse database systems. This addition is particularly
powerful for data engineers working with Oracle, PostgreSQL, MySQL, MariaDB, and
SingleStore, which support numeric precision beyond the traditional 38-digit
DECIMAL limit.&lt;/p&gt;

&lt;p&gt;Let’s explore why NUMBER matters, how it works, and how it will simplify your
data integration workflows.&lt;/p&gt;

&lt;h2 id=&quot;the-challenge-precision-beyond-38-digits&quot;&gt;The challenge: precision beyond 38 digits&lt;/h2&gt;

&lt;p&gt;Trino’s DECIMAL type has long supported exact numeric values with precision up
to 38 decimal digits, which covers the vast majority of use cases. However,
many database systems support higher precision:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Oracle NUMBER&lt;/strong&gt;: when declared as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMBER(p, s)&lt;/code&gt;, precision must be in [1, 38] and
scale in [-84, 127]. When declared as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMBER&lt;/code&gt; without precision/scale, each value
can have different scale, and actual precision can reach 40 decimal digits. Oracle can
store values from 10^-130 to (but not including) 10^126.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;PostgreSQL NUMERIC&lt;/strong&gt;: when declared as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMERIC(p, s)&lt;/code&gt;, precision can be
declared up to 1000 and scale can range from -1000 to 1000. When declared without
precision/scale constraints, each value can have a different scale, and values can have
up to 131,072 digits before the decimal point.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;MySQL, MariaDB, SingleStore DECIMAL&lt;/strong&gt;: up to 65 digits of precision (scale 0-30)&lt;/li&gt;
&lt;/ul&gt;
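
&lt;p&gt;To see why this matters, recall that Trino rejects any DECIMAL beyond the
38-digit cap outright, so a 39-digit value cannot even be named as a cast target.
This illustrative query fails before touching any data:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Fails: Trino caps DECIMAL precision at 38 digits
SELECT CAST(&apos;123456789012345678901234567890123456789&apos; AS DECIMAL(39, 0));
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;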

&lt;p&gt;Before Trino 480, accessing these high-precision numeric columns required
choosing between two unsatisfying options:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;Skip the columns entirely&lt;/strong&gt; and lose access to potentially critical data.
This was the default behavior.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Accept lossy conversions&lt;/strong&gt;: use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal-mapping=ALLOW_OVERFLOW&lt;/code&gt; with
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal-default-scale=S&lt;/code&gt; to force values into &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DECIMAL(38, S)&lt;/code&gt;, losing precision
through rounding and failing for numbers greater than or equal to 10^(38-S).
For example, with scale 10, values ≥ 10^28 would fail.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Neither option is ideal for data federation and warehousing scenarios where
preserving data fidelity is essential.&lt;/p&gt;

&lt;h2 id=&quot;enter-number-arbitrary-precision-decimals-in-trino&quot;&gt;Enter NUMBER: arbitrary-precision decimals in Trino&lt;/h2&gt;

&lt;p&gt;The NUMBER type solves this problem by supporting floating-point decimal numbers
of high precision and flexible scale. In practice, NUMBER supports values with
up to 200 digits of precision – far exceeding what most database workloads require.
Each value can have a different scale, allowing for values as small as 10^-16000
(or even smaller) and as large as 10^16000 (or even larger) within the same column.&lt;/p&gt;

&lt;p&gt;Here’s what NUMBER looks like in action:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;-- High-precision literal (50+ digits)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;3.1415926535897932384626433832795028841971693993751&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; 3.1415926535897932384626433832795028841971693993751
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;-- Scientific notation with extreme precision&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;12345678901234567890123456789012345678901234567890e30&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; 1.234567890123456789012345678901234567890123456789E+79
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;-- Verify the type&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;typeof&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;123.456&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; number
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;special-values&quot;&gt;Special values&lt;/h3&gt;

&lt;p&gt;NUMBER also supports special values similar to IEEE 754 floating-point types:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;Infinity&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;positive_infinity&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;-Infinity&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;negative_infinity&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;NaN&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;not_a_number&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; positive_infinity | negative_infinity | not_a_number
-------------------+-------------------+--------------
 +Infinity         | -Infinity         | NaN
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;These special values follow comparison and ordering semantics consistent with
DOUBLE behavior. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NaN&lt;/code&gt; compares as unequal to all values, including
itself: any comparison with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NaN&lt;/code&gt; returns false. When sorting, values are
ordered as follows: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-Infinity&lt;/code&gt;, all finite values, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;+Infinity&lt;/code&gt;, followed by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NaN&lt;/code&gt;.&lt;/p&gt;
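
&lt;p&gt;These rules are easy to confirm directly. The queries below are illustrative,
with results following the comparison and ordering semantics described above:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- NaN is unequal to everything, including itself
SELECT NUMBER &apos;NaN&apos; = NUMBER &apos;NaN&apos;;      -- false
SELECT NUMBER &apos;NaN&apos; &lt; NUMBER &apos;Infinity&apos;; -- false

-- When sorting, NaN comes after +Infinity
SELECT x
FROM (VALUES NUMBER &apos;NaN&apos;, NUMBER &apos;1&apos;, NUMBER &apos;-Infinity&apos;, NUMBER &apos;Infinity&apos;) t(x)
ORDER BY x;
-- yields -Infinity, 1, +Infinity, NaN
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;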

&lt;p&gt;The special values are particularly useful for handling edge cases in source data.
In particular, PostgreSQL’s NUMERIC type can represent &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NaN&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Infinity&lt;/code&gt;, and
these values are now seamlessly mapped to NUMBER when queried through the PostgreSQL
connector.&lt;/p&gt;

&lt;h2 id=&quot;seamless-connector-integration&quot;&gt;Seamless connector integration&lt;/h2&gt;

&lt;p&gt;The real power of NUMBER becomes apparent when querying external databases. Five
connectors now automatically map high-precision numeric types to NUMBER,
requiring &lt;strong&gt;no configuration changes&lt;/strong&gt;:&lt;/p&gt;

&lt;h3 id=&quot;oracle-connector&quot;&gt;Oracle connector&lt;/h3&gt;

&lt;p&gt;Oracle’s NUMBER type supports variable precision and scale. The Oracle connector
now maps:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMBER(p, s)&lt;/code&gt; where p &amp;gt; 38 → Trino &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMBER&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMBER&lt;/code&gt; without precision/scale → Trino &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMBER&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMBER&lt;/code&gt; with extreme scale values → Trino &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMBER&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;-- Query an Oracle table with high-precision columns&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;order_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;unit_price&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;extended_price&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;oracle&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sales&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;extended_price&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;1000000000000000000000000&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;postgresql-connector&quot;&gt;PostgreSQL connector&lt;/h3&gt;

&lt;p&gt;PostgreSQL’s NUMERIC type supports very high precision and even “unconstrained”
precision. The connector automatically handles:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMERIC(p, s)&lt;/code&gt; where p &amp;gt; 38 → Trino &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMBER&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMERIC&lt;/code&gt; without precision/scale → Trino &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMBER&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;-- Access PostgreSQL scientific data without precision loss&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;measurement_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;precise_value&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;-- a NUMERIC column&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;postgresql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lab&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;measurements&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;mysql-mariadb-and-singlestore-connectors&quot;&gt;MySQL, MariaDB, and SingleStore connectors&lt;/h3&gt;

&lt;p&gt;These MySQL-compatible databases support DECIMAL precision up to 65 digits. The
connectors now map:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DECIMAL(p, s)&lt;/code&gt; where p &amp;gt; 38 → Trino &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMBER&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;-- Join across different databases with high precision&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;m&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;account_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;m&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;balance&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mysql_balance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;balance&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;oracle_balance&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mysql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;banking&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;accounts&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;m&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;oracle&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;banking&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;accounts&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;m&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;account_id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;account_id&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;abs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;m&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;balance&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;balance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;0.01&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;backwards-compatibility-and-migration&quot;&gt;Backwards compatibility and migration&lt;/h2&gt;

&lt;p&gt;The NUMBER type integration is designed to be seamless and backward compatible:&lt;/p&gt;

&lt;h3 id=&quot;automatic-mapping&quot;&gt;Automatic mapping&lt;/h3&gt;

&lt;p&gt;If you previously relied on the default behavior (no &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal-mapping&lt;/code&gt;
configuration), your queries now automatically use NUMBER for high-precision
columns. No configuration changes needed.&lt;/p&gt;

&lt;h3 id=&quot;legacy-configurations-still-work&quot;&gt;Legacy configurations still work&lt;/h3&gt;

&lt;p&gt;If you explicitly configured &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal-mapping=ALLOW_OVERFLOW&lt;/code&gt; or
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal-mapping=STRICT&lt;/code&gt;, your existing configuration continues to work. The
NUMBER mapping is disabled when these options are set, ensuring no surprises.&lt;/p&gt;

&lt;p&gt;However, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal-mapping&lt;/code&gt; configuration and related session properties
(&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal_mapping&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal_default_scale&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal_rounding_mode&lt;/code&gt;) are now
&lt;strong&gt;deprecated&lt;/strong&gt; and will be removed in a future Trino release. We recommend
migrating to NUMBER-based workflows:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before (with lossy conversion):&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&quot;language-properties highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# catalog/postgresql.properties
&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;connection-url&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;jdbc:postgresql://host:5432/database&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;connection-user&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;user&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;connection-password&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;password&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;decimal-mapping&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;ALLOW_OVERFLOW&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;decimal-default-scale&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;10&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;decimal-rounding-mode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;HALF_UP&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;After (lossless with NUMBER):&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&quot;language-properties highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# catalog/postgresql.properties
&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;connection-url&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;jdbc:postgresql://host:5432/database&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;connection-user&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;user&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;connection-password&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;password&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# No decimal-mapping needed - NUMBER is used automatically!
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For Oracle, if you previously used &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;oracle.number.rounding-mode&lt;/code&gt; to handle
high-precision NUMBER columns, you can now remove this configuration to enable
native NUMBER mapping.&lt;/p&gt;
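
&lt;p&gt;In practice, the Oracle migration is a one-line removal from the catalog file.
The snippet below uses hypothetical host and service names:&lt;/p&gt;

&lt;div class=&quot;language-properties highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# catalog/oracle.properties
connection-url=jdbc:oracle:thin:@example-host:1521/orcl
connection-user=user
connection-password=password
# Remove the deprecated rounding mode to enable native NUMBER mapping:
# oracle.number.rounding-mode=HALF_UP
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;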

&lt;h2 id=&quot;working-with-number&quot;&gt;Working with NUMBER&lt;/h2&gt;

&lt;h3 id=&quot;type-conversions&quot;&gt;Type conversions&lt;/h3&gt;

&lt;p&gt;NUMBER integrates naturally with Trino’s type system:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;-- Convert from other numeric types&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;CAST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;DECIMAL&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;123.45&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;from_decimal&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;CAST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;12345&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;from_integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;CAST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;123&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;45&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;e0&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;from_double&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; from_decimal | from_integer | from_double
--------------+--------------+-------------
 123.45       | 12345        | 123.45
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;-- Convert NUMBER to other types&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;CAST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;123.456&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;to_bigint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;CAST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;123.456&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;DOUBLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;to_double&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;CAST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;123.456&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;DECIMAL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;to_decimal&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; to_bigint | to_double | to_decimal
-----------+-----------+------------
 123       | 123.456   | 123.46
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
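
&lt;p&gt;Casts from NUMBER back to fixed-precision types are bounded by the target type’s
limits. Consistent with the overflow behavior described earlier, a value that cannot
fit is rejected rather than silently truncated (illustrative example):&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Fails: 51 integer digits cannot fit into 38 digits of precision
SELECT CAST(NUMBER &apos;1e50&apos; AS DECIMAL(38, 0));
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;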

&lt;h3 id=&quot;aggregate-functions&quot;&gt;Aggregate functions&lt;/h3&gt;

&lt;p&gt;Common aggregate functions work naturally with NUMBER:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;-- Aggregate high-precision values&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;department&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;revenue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;total_revenue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;avg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;revenue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;average_revenue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;min&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;revenue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;min_revenue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;max&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;revenue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;max_revenue&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;oracle&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sales&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;transactions&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;GROUP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;department&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;creating-tables-with-number-columns&quot;&gt;Creating tables with NUMBER columns&lt;/h3&gt;

&lt;p&gt;The Oracle and PostgreSQL connectors support creating tables with NUMBER columns:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;-- Create a PostgreSQL table with NUMBER column&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;postgresql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;schema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;measurements&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;precise_value&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- Create an Oracle table with NUMBER column&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;oracle&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;schema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;scientific_data&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;experiment_id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;VARCHAR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;measurement&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;technical-characteristics-and-limitations&quot;&gt;Technical characteristics and limitations&lt;/h2&gt;

&lt;p&gt;While NUMBER provides high precision, it’s important to understand its
characteristics:&lt;/p&gt;

&lt;h3 id=&quot;precision-and-scale&quot;&gt;Precision and scale&lt;/h3&gt;

&lt;p&gt;Trino’s NUMBER type characteristics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Supported precision&lt;/strong&gt;: currently 200 decimal digits.
The maximum precision is an implementation detail that may change in future
releases, though it is unlikely to decrease.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Scale range&lt;/strong&gt;: -16,384 to 16,383&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Variable scale&lt;/strong&gt;: each value can have a different scale, similar to
PostgreSQL NUMERIC and Oracle NUMBER&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Special values&lt;/strong&gt;: supports &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NaN&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Infinity&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-Infinity&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
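&lt;p&gt;To build intuition for variable scale and special values, the following sketch
uses Python’s decimal module, which also implements arbitrary-precision,
variable-scale decimals with NaN and Infinity. This is only an analogy for the
semantics, not Trino’s implementation:&lt;/p&gt;

```python
from decimal import Decimal, getcontext

# Analogy only: Python's decimal module models variable-scale decimal
# semantics similar to Trino's NUMBER; it is not Trino's implementation.
getcontext().prec = 200  # mirror NUMBER's current 200-digit precision

# Each value carries its own scale, unlike a fixed DECIMAL(p, s) column.
a = Decimal("1.5")    # scale 1
b = Decimal("2.25")   # scale 2
print(a + b)          # 3.75

# Special values are first-class, as with NUMBER.
print(Decimal("NaN").is_nan())  # True
print(Decimal("-Infinity"))     # -Infinity
```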

&lt;p&gt;Comparison of decimal numeric types across database systems:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Database&lt;/th&gt;
      &lt;th&gt;Max Precision&lt;/th&gt;
      &lt;th&gt;Scale Range&lt;/th&gt;
      &lt;th&gt;Variable Scale&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Oracle NUMBER(p, s)&lt;/td&gt;
      &lt;td&gt;38&lt;/td&gt;
      &lt;td&gt;-84 to 127&lt;/td&gt;
      &lt;td&gt;No&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Oracle NUMBER&lt;/td&gt;
      &lt;td&gt;40&lt;/td&gt;
      &lt;td&gt;Approximately -130 to 126&lt;/td&gt;
      &lt;td&gt;Yes&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;PostgreSQL NUMERIC(p, s)&lt;/td&gt;
      &lt;td&gt;1,000&lt;/td&gt;
      &lt;td&gt;-1000 to 1000&lt;/td&gt;
      &lt;td&gt;No&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;PostgreSQL NUMERIC&lt;/td&gt;
      &lt;td&gt;131,072&lt;/td&gt;
      &lt;td&gt;0 to 16,383&lt;/td&gt;
      &lt;td&gt;Yes&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;MySQL/MariaDB/SingleStore DECIMAL&lt;/td&gt;
      &lt;td&gt;65&lt;/td&gt;
      &lt;td&gt;0 to 30&lt;/td&gt;
      &lt;td&gt;No&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Trino DECIMAL&lt;/td&gt;
      &lt;td&gt;38&lt;/td&gt;
      &lt;td&gt;0 to 38&lt;/td&gt;
      &lt;td&gt;No&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Trino NUMBER&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;200&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;-16,384 to 16,383&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h3 id=&quot;storage-and-representation&quot;&gt;Storage and representation&lt;/h3&gt;

&lt;p&gt;NUMBER uses a variable-width binary format optimized for flexibility:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;2-byte header encoding sign and scale&lt;/li&gt;
  &lt;li&gt;Variable-length magnitude in big-endian format&lt;/li&gt;
  &lt;li&gt;The binary format is considered unstable and may evolve in future releases to
enable optimizations and performance improvements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This flexibility allows Trino to improve NUMBER’s internal representation over
time without breaking connector compatibility.
The Trino SPI provides a stable API for connectors to read and write NUMBER values,
abstracting away the internal format.&lt;/p&gt;
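&lt;p&gt;As a rough illustration of this layout, a sign-plus-scale header followed by a
big-endian magnitude could be sketched as below. The real format is internal and
unstable, so this is a hypothetical sketch of the described idea, not the actual
encoding:&lt;/p&gt;

```python
# Hypothetical sketch of the described layout: a 2-byte header carrying
# sign and scale, followed by a variable-length big-endian magnitude.
# Not Trino's actual binary format, which is internal and unstable.

def encode_number(unscaled: int, scale: int) -> bytes:
    """Encode an unscaled integer and scale, e.g. 123.45 as (12345, 2)."""
    sign = 1 if unscaled < 0 else 0
    # Top bit: sign. Remaining 15 bits: scale as two's complement,
    # which covers the stated range of -16,384 to 16,383.
    header = (sign << 15) | (scale & 0x7FFF)
    magnitude = abs(unscaled)
    length = max(1, (magnitude.bit_length() + 7) // 8)
    return header.to_bytes(2, "big") + magnitude.to_bytes(length, "big")

def decode_number(data: bytes) -> tuple[int, int]:
    """Reverse of encode_number: returns (unscaled value, scale)."""
    header = int.from_bytes(data[:2], "big")
    sign = -1 if header >> 15 else 1
    raw = header & 0x7FFF
    scale = raw - 0x8000 if raw >= 0x4000 else raw
    magnitude = int.from_bytes(data[2:], "big")
    return sign * magnitude, scale

# 123.45 represented as unscaled value 12345 with scale 2
assert decode_number(encode_number(12345, 2)) == (12345, 2)
# Negative values and negative scales round-trip as well
assert decode_number(encode_number(-7, -3)) == (-7, -3)
```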

&lt;h3 id=&quot;performance-considerations&quot;&gt;Performance considerations&lt;/h3&gt;

&lt;p&gt;NUMBER uses Java’s BigDecimal for arithmetic operations, which provides exact
precision at the cost of being slower than fixed-width types like BIGINT,
DOUBLE, or DECIMAL. For this reason, NUMBER is designed for scenarios where
precision matters more than computational speed:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Best for&lt;/strong&gt;: reading and storing high-precision data from source systems,
data federation, reporting, data warehousing&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Not optimal for&lt;/strong&gt;: computational heavy-lifting, complex mathematical
operations, high-performance analytics on numeric columns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your workload involves extensive numeric computation, consider whether DECIMAL
(for up to 38 digits), DOUBLE (for approximate arithmetic), or BIGINT (for
integer arithmetic) might be more appropriate.&lt;/p&gt;
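&lt;p&gt;The tradeoff is the classic exact-versus-approximate one. Using Python’s
decimal module as a stand-in for exact decimal arithmetic (an analogy, not Trino
code), compare it with binary floating point, which is what DOUBLE uses:&lt;/p&gt;

```python
from decimal import Decimal

# Exact decimal arithmetic (analogous to NUMBER/DECIMAL semantics):
# slower, but results are exact.
exact = Decimal("0.1") + Decimal("0.2")
print(exact)   # 0.3

# Approximate binary floating point (analogous to DOUBLE):
# fast, but 0.1 and 0.2 have no exact binary representation.
approx = 0.1 + 0.2
print(approx)  # 0.30000000000000004
```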

&lt;h3 id=&quot;function-support&quot;&gt;Function support&lt;/h3&gt;

&lt;p&gt;NUMBER supports essential operations:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Arithmetic: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;+&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;*&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Aggregations: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sum()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;avg()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;min()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;max()&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Rounding and sign functions: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;abs()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sign()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ceiling()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;floor()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;truncate()&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;round()&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Special value checks: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;is_nan()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;is_finite()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;is_infinite()&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Many advanced mathematical functions (trigonometric, logarithmic, etc.)
do not work with NUMBER directly and require explicit type conversions to DOUBLE or DECIMAL.&lt;/p&gt;

&lt;h2 id=&quot;whats-next&quot;&gt;What’s next&lt;/h2&gt;

&lt;p&gt;Support for the NUMBER type will continue to evolve. Additional connectors are
planned for future releases:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;ClickHouse&lt;/strong&gt;: for Decimal256 type mapping&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Apache Ignite&lt;/strong&gt;: for high-precision numeric support&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We’re also exploring performance optimizations and expanding function support
based on community feedback.&lt;/p&gt;

&lt;h2 id=&quot;getting-started&quot;&gt;Getting started&lt;/h2&gt;

&lt;p&gt;NUMBER support is available now in Trino 480. To start using it:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;Upgrade to Trino 480&lt;/strong&gt; - NUMBER is available out of the box&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Remove deprecated configs&lt;/strong&gt; - If you used &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal-mapping&lt;/code&gt; configurations,
consider removing them to enable automatic NUMBER mapping&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Query your data&lt;/strong&gt; - High-precision columns are now accessible without
configuration&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For detailed documentation, refer to:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/language/types.html&quot;&gt;NUMBER type reference&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/oracle.html&quot;&gt;Oracle connector documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/postgresql.html&quot;&gt;PostgreSQL connector documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/mysql.html&quot;&gt;MySQL connector documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/mariadb.html&quot;&gt;MariaDB connector documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/singlestore.html&quot;&gt;SingleStore connector documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Have questions or feedback? Join the discussion on the &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;Trino community
Slack&lt;/a&gt; in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;#dev&lt;/code&gt; channel, or open an issue on
&lt;a href=&quot;https://github.com/trinodb/trino/issues&quot;&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The NUMBER type represents a significant milestone in Trino’s evolution,
eliminating precision loss barriers and making high-precision numeric data from
diverse sources readily accessible for analytics and reporting. We’re excited to
see how the community uses this powerful new capability!&lt;/p&gt;

</content>

      
        <author>
          <name>Piotr Findeisen, Starburst Data</name>
        </author>
      

      <summary>One of Trino’s core strengths is breaking down data silos—enabling data engineers to query diverse data sources through a single SQL interface. However, when those sources use high-precision numeric types beyond Trino’s 38-digit DECIMAL limit, that promise breaks down. Users faced an impossible choice: skip the columns entirely and lose access to critical data, or accept lossy rounding that compromises data integrity. This challenge required a new approach: a dedicated data type for high-precision, variable-scale decimals.</summary>

      
      
    </entry>
  
    <entry>
      <title>78: A view with a view with a view</title>
      <link href="https://trino.io/episodes/78.html" rel="alternate" type="text/html" title="78: A view with a view with a view" />
      <published>2026-01-16T00:00:00+00:00</published>
      <updated>2026-01-16T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/78</id>
      <content type="html" xml:base="https://trino.io/episodes/78.html">&lt;h2 id=&quot;host&quot;&gt;Host&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Sr. Principal
DevRel Engineer at &lt;a href=&quot;https://chainguard.dev&quot;&gt;Chainguard&lt;/a&gt;, open source hacker at
&lt;a href=&quot;https://github.com/simpligility&quot;&gt;simpligility&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Senior Developer
Advocate at &lt;a href=&quot;https://www.influxdata.com/&quot;&gt;InfluxData&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest&quot;&gt;Guest&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/robfromboulder/&quot;&gt;Rob Dickinson&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-and-news&quot;&gt;Releases and news&lt;/h2&gt;

&lt;h3 id=&quot;trino-478&quot;&gt;&lt;a href=&quot;/docs/current/release/release-478.html&quot;&gt;Trino 478&lt;/a&gt;&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for multiple plugin directories.&lt;/li&gt;
  &lt;li&gt;Propagate queryId to the Open Policy Agent authorizer.&lt;/li&gt;
  &lt;li&gt;Add support for reading encrypted Parquet files with the Hive connector.&lt;/li&gt;
  &lt;li&gt;Add numerous performance improvements and bug fixes for the Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Update Docker container to use Java 25.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;trino-479&quot;&gt;&lt;a href=&quot;/docs/current/release/release-479.html&quot;&gt;Trino 479&lt;/a&gt;&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Require Java 25 to build and run Trino.&lt;/li&gt;
  &lt;li&gt;Publish processing time for a query in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FINISHING&lt;/code&gt; state to event
listeners.&lt;/li&gt;
  &lt;li&gt;Deprecate &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXPLAIN&lt;/code&gt; type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LOGICAL&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DISTRIBUTED&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Add an extraHeaders option to the JDBC driver and the CLI to support
sending arbitrary HTTP headers.&lt;/li&gt;
  &lt;li&gt;Add &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;APPLICATION_DEFAULT&lt;/code&gt; authentication type for GCS.&lt;/li&gt;
  &lt;li&gt;Remove support for unauthenticated access when GCS authentication type is set
to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SERVICE_ACCOUNT&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Add support for setting and dropping column defaults via &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER TABLE ...
ALTER COLUMN&lt;/code&gt; to the memory connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;View &lt;a href=&quot;https://www.youtube.com/watch?v=7clvlAxGFOI&amp;amp;t=6s&amp;amp;pp=ygUSbWFuZnJlZCBtZW50b3JzIDEw&quot;&gt;Manfred mentors 10&lt;/a&gt; for a more detailed discussion.&lt;/p&gt;

&lt;p&gt;As always, numerous performance improvements, bug fixes, and other features were
added as well.&lt;/p&gt;

&lt;h3 id=&quot;other-releases-and-news&quot;&gt;Other releases and news&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Trino Contributor Call minutes are available:
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-22-oct-2025&quot;&gt;October 2025&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-26-nov-2025&quot;&gt;November 2025&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino-query-ui&quot;&gt;Trino query UI&lt;/a&gt;
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.npmjs.com/package/trino-query-ui&quot;&gt;v0.1.1 successfully released&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;Now blocked by an npm process change and the work needed to adapt to it&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;OpenText and Vertica connector
    &lt;ul&gt;
      &lt;li&gt;OpenText is looking for expressions of interest from users - contact Manfred
or comment on the &lt;a href=&quot;https://github.com/trinodb/trino/pull/26904&quot;&gt;PR for potential
removal&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;Working on collaboration to set up test environment with Trino project&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;PowerBI connector for Trino
    &lt;ul&gt;
      &lt;li&gt;Manfred working with Microsoft and others to figure out future plans&lt;/li&gt;
      &lt;li&gt;Microsoft is looking for &lt;a href=&quot;https://community.fabric.microsoft.com/t5/Fabric-Ideas/Trino-connector/idi-p/4849124&quot;&gt;your votes for a Trino Fabric
connector&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Trino 480 and Trino Gateway 17 are hopefully coming soon&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/playlist?list=PLHdo8mJLIMWALFrGgA6-wWcWgyZmjAex-&quot;&gt;Manfred
mentors&lt;/a&gt;
videos up to episode 10 now about various Trino topics&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;introducing-rob&quot;&gt;Introducing Rob&lt;/h2&gt;

&lt;p&gt;Rob tells us about his history with Trino, software engineering, and management.&lt;/p&gt;

&lt;h2 id=&quot;a-view-with-a-view-with-a-view&quot;&gt;A view with a view with a view&lt;/h2&gt;

&lt;p&gt;We recap Rob’s past presentation and concepts from Trino Summit 2024 about views
and hierarchies of views. Then we move on to discuss his recent development
work. This includes the
&lt;a href=&quot;https://github.com/robfromboulder/virtual-view-manifesto&quot;&gt;virtual-view-manifesto&lt;/a&gt;
and the &lt;a href=&quot;https://github.com/robfromboulder/viewmapper&quot;&gt;viewmapper&lt;/a&gt; and
&lt;a href=&quot;https://github.com/robfromboulder/viewzoo&quot;&gt;viewzoo&lt;/a&gt; projects.&lt;/p&gt;

&lt;p&gt;We also chat about Rob’s journey with AI tooling.&lt;/p&gt;

&lt;p&gt;A comparison of how application code accesses database storage using three
approaches: an ORM layer, a microservice and API layer, and a query engine and
view layer:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/episode/tcb78_virtual_view_comparison.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;A detailed topology of an application taking advantage of virtual view
hierarchies:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/episode/tcb78_virtual_view_topology.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;A concrete example of a view hierarchy for events – two swappable layers, one
for mapping to physical databases, and one for calculating event priority:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/episode/tcb78_virtual_view_example.png&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/robfromboulder/virtual-view-manifesto&quot;&gt;virtual-view-manifesto&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/robfromboulder/viewmapper&quot;&gt;viewmapper&lt;/a&gt; for view storage&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/robfromboulder/viewzoo&quot;&gt;viewzoo&lt;/a&gt; for view visualization&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;28 Jan 2026 - &lt;a href=&quot;/community.html#events&quot;&gt;Trino Contributor Call&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;7 Feb 2026 - &lt;a href=&quot;https://www.meetup.com/trino-apac/events/312457635/&quot;&gt;Trino meetup in Bangalore&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Looking for guests and topics for Trino Community Broadcast 79 and beyond&lt;/li&gt;
&lt;/ul&gt;</content>

      

      <summary>Host</summary>

      
      
    </entry>
  
    <entry>
      <title>77: One tool to proxy them all</title>
      <link href="https://trino.io/episodes/77.html" rel="alternate" type="text/html" title="77: One tool to proxy them all" />
      <published>2025-10-29T00:00:00+00:00</published>
      <updated>2025-10-29T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/77</id>
      <content type="html" xml:base="https://trino.io/episodes/77.html">&lt;h2 id=&quot;host&quot;&gt;Host&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Sr. Principal
DevRel Engineer at &lt;a href=&quot;https://chainguard.dev&quot;&gt;Chainguard&lt;/a&gt;, open source hacker at
&lt;a href=&quot;https://github.com/simpligility&quot;&gt;simpligility&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest&quot;&gt;Guest&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/jordanzimmerman/&quot;&gt;Jordan Zimmerman&lt;/a&gt;, Senior
Staff Engineer at &lt;a href=&quot;https://www.starburst.io/&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/pablo-arteaga-20b547101/&quot;&gt;Pablo Arteaga&lt;/a&gt;,
Software Engineer at
&lt;a href=&quot;https://www.bloomberg.com/company/values/tech-at-bloomberg/&quot;&gt;Bloomberg&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-and-news&quot;&gt;Releases and news&lt;/h2&gt;

&lt;p&gt;Trino 478 is in the final stages of release. We will talk about the
details in the next episode.&lt;/p&gt;

&lt;h3 id=&quot;other-releases-and-news&quot;&gt;Other releases and news&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-22-oct-2025&quot;&gt;October contributor call recap and
recording&lt;/a&gt;
is available.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/playlist?list=PLHdo8mJLIMWALFrGgA6-wWcWgyZmjAex-&quot;&gt;Manfred
mentors&lt;/a&gt;, a new video tutorial series about working on Trino and other open
source projects, is live now and looking for
&lt;a href=&quot;https://github.com/sponsors/mosabua&quot;&gt;sponsors&lt;/a&gt;.
Details about the tasks are available in the &lt;a href=&quot;https://github.com/simpligility/contributions&quot;&gt;contribution tracker
project&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;introducing-jordan-and-pablo&quot;&gt;Introducing Jordan and Pablo&lt;/h2&gt;

&lt;p&gt;Manfred chats with Pablo and Jordan about their involvement in the Trino
community. We end up chatting a bunch about the Airlift framework that is a
foundation for Trino since Jordan has been involved in that project for a long
time. Pablo has been involved in Trino itself and worked on the OPA plugin and
the Trino Gateway, among other things.&lt;/p&gt;

&lt;h2 id=&quot;aws-proxy&quot;&gt;aws-proxy&lt;/h2&gt;

&lt;p&gt;The AWS Proxy is an open-source Java toolkit and library, not a standalone
application, designed to act as a transparent proxy for AWS Simple Storage
Service (S3) compatible object storage protocols.&lt;/p&gt;

&lt;p&gt;It was created by developers from Starburst, Bloomberg and other organizations
in the Trino community to address the need for enhanced governance and security
with tools like Apache Spark that lack security controls. It also supports
direct data access to S3 or S3-compatible systems, like MinIO or Dell ECS.&lt;/p&gt;

&lt;h3 id=&quot;key-functionality-and-use-cases&quot;&gt;Key functionality and use cases&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Security and governance layer&lt;/strong&gt;: The primary goal is to prevent client
applications from bypassing governance systems by accessing S3 directly. It
ensures all data access is channeled through the proxy, where custom business
logic can be applied.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Signature handling:&lt;/strong&gt; It handles the complex AWS Signature Version 4 (SIGv4)
protocol used for authenticating requests, which was the most challenging part
of its development.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Emulated credentials&lt;/strong&gt;: Clients are configured to use fake, worthless
credentials that are only recognized by the proxy. The proxy then validates
the user’s identity and request against security policies (like OPA), signs
the request with the real, secure AWS keys (kept safe behind the firewall),
and forwards it to the real S3 store.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Extensibility&lt;/strong&gt;: It’s built on the Airlift framework and uses a simple
Service Provider Interface (SPI) plugin mechanism. This allows users to add
custom authorization logic, object storage abstraction from buckets to tables,
redirection, and other use cases.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In essence, it takes standard S3 requests from data tools and mediates them,
applying security, control, and abstraction before forwarding them to the actual
data lake storage.&lt;/p&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/assets/episode/tcb77-aws-proxy.pdf&quot;&gt;Presentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/aws-proxy&quot;&gt;aws-proxy&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://github.com/Randgalt/record-builder&quot;&gt;Jordan’s record-builder open source project&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Looking for guests and topics for Trino Community Broadcast 78&lt;/li&gt;
  &lt;li&gt;26 November 2025 - &lt;a href=&quot;/community.html#events&quot;&gt;Trino Contributor Call&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content>

      

      <summary>Host</summary>

      
      
    </entry>
  
    <entry>
      <title>76: Triple platform treat</title>
      <link href="https://trino.io/episodes/76.html" rel="alternate" type="text/html" title="76: Triple platform treat" />
      <published>2025-09-26T00:00:00+00:00</published>
      <updated>2025-09-26T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/76</id>
      <content type="html" xml:base="https://trino.io/episodes/76.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Sr. Principal
DevRel Engineer at &lt;a href=&quot;https://chainguard.dev&quot;&gt;Chainguard&lt;/a&gt;, open source hacker at
&lt;a href=&quot;https://github.com/simpligility&quot;&gt;simpligility&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest&quot;&gt;Guest&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/jo-perez-data/&quot;&gt;Jo Perez&lt;/a&gt;, Founding Solutions
Engineer at &lt;a href=&quot;https://www.getcollate.io/&quot;&gt;Collate&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/shawn-gordon-37b9916/&quot;&gt;Shawn Gordon&lt;/a&gt;, Sr.
Developer Advocate at &lt;a href=&quot;https://www.getcollate.io/&quot;&gt;Collate&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-and-news&quot;&gt;Releases and news&lt;/h2&gt;

&lt;p&gt;Finally shipped a huge new release:&lt;/p&gt;

&lt;h3 id=&quot;trino-477&quot;&gt;&lt;a href=&quot;/docs/current/release/release-477.html&quot;&gt;Trino 477&lt;/a&gt;&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Add Lakehouse connector.&lt;/li&gt;
  &lt;li&gt;Add SQL language features including &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER MATERIALIZED VIEW ... SET
AUTHORIZATION&lt;/code&gt;, default column values, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER VIEW ... REFRESH&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Add new SQL functions like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cosine_distance()&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;to_geojson_geometry()&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Add lots of new features to the preview UI.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are too many connector improvements to list them all. Check out the
release notes. Also inspect the changes to the SPI, since there are quite a few.&lt;/p&gt;

&lt;p&gt;Importantly, this release also includes some breaking changes.&lt;/p&gt;

&lt;p&gt;As always, numerous performance improvements, bug fixes, and other features were
added as well.&lt;/p&gt;

&lt;p&gt;And before Trino 477 we also shipped Trino Gateway:&lt;/p&gt;

&lt;h3 id=&quot;trino-gateway-16&quot;&gt;&lt;a href=&quot;https://trinodb.github.io/trino-gateway/release-notes/#16&quot;&gt;Trino Gateway 16&lt;/a&gt;&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Add numerous UI improvements and fixes.&lt;/li&gt;
  &lt;li&gt;Require Java 24 and PostgreSQL 17 or higher.&lt;/li&gt;
  &lt;li&gt;Allow default routing group configuration.&lt;/li&gt;
  &lt;li&gt;Improve error propagation with external routing service.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;other-releases-and-news&quot;&gt;Other releases and news&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/charts/tags&quot;&gt;trino-1.41.0 and trino-gateway-1.16.0 Helm charts&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino-python-client/releases/tag/0.336.0&quot;&gt;trino-python-client 0.336.0&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-27-jun-2025&quot;&gt;June contributor call recap and recording&lt;/a&gt; is available.&lt;/li&gt;
  &lt;li&gt;The August contributor call recap and recording from Wednesday is in the works.&lt;/li&gt;
  &lt;li&gt;Java 25 shipped and adoption in Trino is on the way.&lt;/li&gt;
  &lt;li&gt;The new &lt;a href=&quot;https://github.com/trinodb/trino-odbc&quot;&gt;trino-odbc&lt;/a&gt; project was
contributed by &lt;a href=&quot;https://github.com/rileymcdowell&quot;&gt;Riley McDowell&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/dprophet&quot;&gt;Erik Anderson&lt;/a&gt; is stepping up as subproject
maintainer for the ODBC driver.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/vagaerg&quot;&gt;Pablo Arteaga&lt;/a&gt; will lead the new efforts for
better OPA tooling and support in the &lt;a href=&quot;https://github.com/trinodb/trino-opa-tools&quot;&gt;trino-opa-tools
repository&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;We send our thanks to &lt;a href=&quot;https://github.com/mosiac1&quot;&gt;Cristian Osiac&lt;/a&gt; for his
contributions as subproject maintainer for
&lt;a href=&quot;https://github.com/trinodb/aws-proxy&quot;&gt;aws-proxy&lt;/a&gt;. He is unfortunately
stepping down from this work.&lt;/li&gt;
  &lt;li&gt;Trino recently overtook the old Presto in the &lt;a href=&quot;https://db-engines.com/en/ranking&quot;&gt;DB-Engines
ranking&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;introducing-jo-and-shawn&quot;&gt;Introducing Jo and Shawn&lt;/h2&gt;

&lt;p&gt;We chat with Jo and Shawn about their background in the big data and data lake
community and beyond.&lt;/p&gt;

&lt;h2 id=&quot;collate&quot;&gt;Collate&lt;/h2&gt;

&lt;p&gt;We talk about the &lt;a href=&quot;https://open-metadata.org/&quot;&gt;OpenMetadata open source project&lt;/a&gt;
as a unified platform for data discovery, observability, and governance, with
80+ data connectors and a collaborative interface.&lt;/p&gt;

&lt;p&gt;Jo and Shawn teach us how OpenMetadata can help build and manage high-quality
data assets at scale, with case studies, documentation, and community
resources, and we dive into how Collate offers a platform around OpenMetadata
and more.&lt;/p&gt;

&lt;h2 id=&quot;triple-platform-treat&quot;&gt;Triple platform treat&lt;/h2&gt;

&lt;p&gt;Building a modern data platform isn’t just about picking tools—it’s about
creating a unified ecosystem where performance, governance, and trust work
seamlessly together. See how the power trio of Trino, Collate, and Apache Ranger
transforms your data operations:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trino: Lightning-fast analytics at scale. Query across any data source, any
format, anywhere—without the complexity of data movement or vendor lock-in.&lt;/li&gt;
  &lt;li&gt;Collate: Intelligent data trust and discovery. AI-powered profiling, automated
quality testing, and smart alerting that keeps your data reliable and
discoverable.&lt;/li&gt;
  &lt;li&gt;Apache Ranger: Enterprise-grade security and governance. Fine-grained access
controls, policy management, and audit trails that keep your data secure and
compliant.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The integration advantage: Watch these three platforms work together to deliver
what every data team needs—fast queries, trusted data, and bulletproof
security—all in one cohesive stack.&lt;/p&gt;

&lt;p&gt;Jo and Shawn tell us more about “Trino + Collate + Apache Ranger = Data Platform
Excellence”, talk about the components and value provided by each of them, and
dive in with a demo, while Manfred and Cole ask more questions to dive deeper.&lt;/p&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/assets/episode/tcb76-collate.pdf&quot;&gt;Presentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://open-metadata.org/&quot;&gt;OpenMetadata&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.getcollate.io/&quot;&gt;Collate&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://docs.getcollate.io/connectors/database/trino&quot;&gt;Collate Trino connector&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://youtu.be/x4BvgSMitL0&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Apache Ranger sink for
reverse metadata with Collate&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Trino Community Broadcast 77: One tool to proxy them all (aws-proxy) planned
for October&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let us know if you want to be a guest in a future broadcast.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>75: Your app sees clearly into Trino</title>
      <link href="https://trino.io/episodes/75.html" rel="alternate" type="text/html" title="75: Your app sees clearly into Trino" />
      <published>2025-07-05T00:00:00+00:00</published>
      <updated>2025-07-05T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/75</id>
      <content type="html" xml:base="https://trino.io/episodes/75.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Dev Rel Engineer
at &lt;a href=&quot;https://chainguard.dev&quot;&gt;Chainguard&lt;/a&gt;, open source hacker at 
&lt;a href=&quot;https://github.com/simpligility&quot;&gt;simpligility&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://www.firebolt.io/&quot;&gt;Firebolt&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest&quot;&gt;Guest&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/trevor-denning/&quot;&gt;Trevor Denning&lt;/a&gt;, Solutions
Engineer at &lt;a href=&quot;https://insightsoftware.com/&quot;&gt;insightsoftware&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases&quot;&gt;Releases&lt;/h2&gt;

&lt;p&gt;What’s going on with our releases?&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Summer slump&lt;/li&gt;
  &lt;li&gt;Reduced maintainer work&lt;/li&gt;
  &lt;li&gt;Necessary migration for Maven Central as release blocker&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Other announcements:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-27-jun-2025&quot;&gt;June contributor call recap and recording&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/foundation.html&quot;&gt;Trino Software Foundation&lt;/a&gt; and
&lt;a href=&quot;/sponsor.html&quot;&gt;documentation for supporting the project&lt;/a&gt; on
the website.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;introducing-trevor&quot;&gt;Introducing Trevor&lt;/h2&gt;

&lt;p&gt;Trevor has been developing software for over 20 years and has deep knowledge of
ODBC and JDBC drivers for databases. He tells us more about his experience and
how he came to learn about Trino.&lt;/p&gt;

&lt;h2 id=&quot;more-about-insightsoftware&quot;&gt;More about insightsoftware&lt;/h2&gt;

&lt;p&gt;We untangle the long history of Simba, Logi Symphony, and insightsoftware with
the Trino project up to the current status, before we dive into the technical
details.&lt;/p&gt;

&lt;h2 id=&quot;odbc-and-jdbc&quot;&gt;ODBC and JDBC&lt;/h2&gt;

&lt;p&gt;After talking a bit about Trino, Iceberg, data lakes and related topics, we get
into the details about Simba Trino data connectivity with the ODBC and JDBC
drivers.&lt;/p&gt;

&lt;h2 id=&quot;demo&quot;&gt;Demo&lt;/h2&gt;

&lt;p&gt;Trevor shows us how you can use the ODBC driver to query Trino catalogs from
Microsoft Excel, which is arguably the most widely used reporting and analytics
tool, despite really being a spreadsheet application. After that demo he moves
on to some business intelligence analytics with PowerBI.&lt;/p&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/assets/episode/tcb75-insightsoftware.pdf&quot;&gt;Presentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://insightsoftware.com/drivers/trino-odbc-jdbc/&quot;&gt;Simba Trino ODBC &amp;amp; JDBC Drivers&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://documentation.insightsoftware.com/simba-home-olh/content/homepage/trino.htm&quot;&gt;Simba Trino Driver Documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/ecosystem/client-application.html#logi-symphony&quot;&gt;Logi Symphony&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://insightsoftware.com/resources/scaling-bi-with-trino-and-apache-iceberg/&quot;&gt;Video: Scaling BI with Trino and Apache Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://insightsoftware.com/blog/unlocking-trinos-full-potential-with-simba-drivers-for-bi-etl/&quot;&gt;Blog post: Unlocking Trino’s Full Potential With Simba Drivers for BI &amp;amp; ETL&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://insightsoftware.com/blog/enhance-trino-performance-with-simbas-powerful-connectivity/&quot;&gt;Blog post: Enhance Trino Performance With Simba’s Powerful Connectivity&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;We give a quick update on where to see Cole or Manfred next, and talk about
upcoming Trino events:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Meet Manfred at the &lt;a href=&quot;https://www.chainguard.dev/&quot;&gt;Chainguard&lt;/a&gt; booth at the Black Hat conference in Las Vegas&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings&quot;&gt;Trino Contributor
Call&lt;/a&gt; planned for
the 23rd of July&lt;/li&gt;
  &lt;li&gt;Trino Community Broadcast: One tool to proxy them all (aws-proxy)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let us know if you want to be a guest in a future broadcast.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>74: Insights from a Norse god</title>
      <link href="https://trino.io/episodes/74.html" rel="alternate" type="text/html" title="74: Insights from a Norse god" />
      <published>2025-06-06T00:00:00+00:00</published>
      <updated>2025-06-06T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/74</id>
      <content type="html" xml:base="https://trino.io/episodes/74.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Dev Rel Engineer
at &lt;a href=&quot;https://chainguard.dev&quot;&gt;Chainguard&lt;/a&gt;
&lt;a href=&quot;https://x.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://www.firebolt.io/&quot;&gt;Firebolt&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest&quot;&gt;Guest&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/jeschkies/&quot;&gt;Karsten Jeschkies&lt;/a&gt; from &lt;a href=&quot;https://grafana.com/&quot;&gt;Grafana
Labs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases&quot;&gt;Releases&lt;/h2&gt;

&lt;p&gt;Following are some highlights of the recent releases:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/docs/current/release/release-475.html&quot;&gt;Trino 475&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CORRESPONDING&lt;/code&gt; clause in set operations.&lt;/li&gt;
  &lt;li&gt;Add support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AUTO&lt;/code&gt; grouping set that includes all non-aggregated
columns in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; clause.&lt;/li&gt;
  &lt;li&gt;Allow cross-region data retrieval when using the S3 native filesystem.&lt;/li&gt;
  &lt;li&gt;Add support for all storage classes when using the S3 native filesystem for
writes.&lt;/li&gt;
  &lt;li&gt;Numerous improvements on Iceberg, Hive, and Delta Lake connectors.&lt;/li&gt;
  &lt;li&gt;SPI - Remove the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LazyBlock&lt;/code&gt; class.&lt;/li&gt;
&lt;/ul&gt;
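
The two new SQL features in Trino 475 can be sketched briefly. Note that the catalog, schema, and table names below are made up for illustration only:

```sql
-- CORRESPONDING matches the columns of set-operation branches by name,
-- so the two SELECT lists may appear in different order.
SELECT name, price FROM web.store.products
UNION CORRESPONDING
SELECT price, name FROM legacy.shop.products;

-- The AUTO grouping set groups by every non-aggregated column in the
-- SELECT clause, so the GROUP BY list need not be repeated by hand.
SELECT region, category, sum(price) AS total
FROM web.store.products
GROUP BY AUTO;
```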

&lt;p&gt;&lt;a href=&quot;/docs/current/release/release-476.html&quot;&gt;Trino 476&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Another big release with lots of changes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Require JDK 24 as runtime.&lt;/li&gt;
  &lt;li&gt;Add support for comparing values of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;geometry&lt;/code&gt; type.&lt;/li&gt;
  &lt;li&gt;Remove Example HTTP connector from binaries.&lt;/li&gt;
  &lt;li&gt;New required JVM config for BigQuery and Snowflake connectors.&lt;/li&gt;
  &lt;li&gt;Fix regression with graceful shutdown from Trino 474.&lt;/li&gt;
  &lt;li&gt;Improve performance of selective joins for federated queries for nearly all
connectors.&lt;/li&gt;
  &lt;li&gt;Add columns to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$all_manifests&lt;/code&gt; metadata tables for Iceberg tables.&lt;/li&gt;
  &lt;li&gt;Add support for user-assigned managed identity authentication for AzureFS for
object storage connectors.&lt;/li&gt;
  &lt;li&gt;Add support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FOR TIMESTAMP AS OF&lt;/code&gt; clause in Delta Lake connector.&lt;/li&gt;
&lt;/ul&gt;
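
The `FOR TIMESTAMP AS OF` clause enables time travel on Delta Lake tables. A minimal sketch, with illustrative catalog and table names:

```sql
-- Read the table as it existed at a given point in time.
SELECT *
FROM delta.sales.orders
FOR TIMESTAMP AS OF TIMESTAMP '2025-06-01 00:00:00 UTC';
```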

&lt;p&gt;As always, numerous performance improvements, bug fixes, and other features were
added as well.&lt;/p&gt;

&lt;p&gt;Other releases and announcements:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trino Gateway 16 still delayed, but Trino Gateway Helm chart 1.15.2&lt;/li&gt;
  &lt;li&gt;Trino Helm chart with 475 -&amp;gt; 1.39.1&lt;/li&gt;
  &lt;li&gt;Trino Python client &lt;a href=&quot;https://github.com/trinodb/trino-python-client/releases/tag/0.334.0&quot;&gt;0.334.0&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;introducing-karsten-and-grafana-labs&quot;&gt;Introducing Karsten and Grafana Labs&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/jeschkies/&quot;&gt;Karsten Jeschkies&lt;/a&gt; is an experienced software
engineer:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;2013 - 2016 Engineer at the Core Machine Learning team at Amazon&lt;/li&gt;
  &lt;li&gt;2016 - 2020 Mesosphere and D2IQ, maintainer of Marathon, a container
orchestrator for Mesos&lt;/li&gt;
  &lt;li&gt;2020 - now Maintainer of Loki for two years and now Cloud Provider
observability engineer at Grafana Labs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://grafana.com/&quot;&gt;Grafana Labs&lt;/a&gt; is the home of the well-known Grafana for
visualizations and dashboards, and of other powerful products such as Grafana Tempo, Grafana Mimir,
and Grafana Loki. Grafana Labs is also involved in well-known projects such as
Prometheus and OpenTelemetry.&lt;/p&gt;

&lt;h2 id=&quot;log-management-with-loki&quot;&gt;Log management with Loki&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://grafana.com/oss/loki/&quot;&gt;Loki&lt;/a&gt; is a horizontally-scalable,
highly-available, multi-tenant log aggregation system inspired by Prometheus. It
helps you to drill into petabytes of logging data.&lt;/p&gt;

&lt;h2 id=&quot;analytics-with-trino&quot;&gt;Analytics with Trino&lt;/h2&gt;

&lt;p&gt;Karsten tells us about the motivation to create a Trino connector, how the two
tools work together, what features are available, and what his plans are for the
future.&lt;/p&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/assets/episode/tcb74-loki-connector.pdf&quot;&gt;Presentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/jeschkies/loki-trino-demo&quot;&gt;Demo source code&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://grafana.com/oss/loki/&quot;&gt;Grafana Loki website&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/grafana/loki&quot;&gt;Loki source code repo&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/loki.html&quot;&gt;Loki connector documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Quick update on where to see Cole or Manfred next, and then join us for the
upcoming Trino events:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trino Contributor Call - May skipped, June edition to be determined&lt;/li&gt;
  &lt;li&gt;Trino Community Broadcast: Visualizing with Logi Symphony and ODBC&lt;/li&gt;
  &lt;li&gt;Trino Community Broadcast: One tool to proxy them all (aws-proxy)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let us know if you want to be a guest in a future broadcast.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>73: Wrapping Trino packages with a bow</title>
      <link href="https://trino.io/episodes/73.html" rel="alternate" type="text/html" title="73: Wrapping Trino packages with a bow" />
      <published>2025-04-09T00:00:00+00:00</published>
      <updated>2025-04-09T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/73</id>
      <content type="html" xml:base="https://trino.io/episodes/73.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Dev Rel Engineer
at &lt;a href=&quot;https://chainguard.dev&quot;&gt;Chainguard&lt;/a&gt;
&lt;a href=&quot;https://x.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://www.firebolt.io/&quot;&gt;Firebolt&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases&quot;&gt;Releases&lt;/h2&gt;

&lt;p&gt;Following are some highlights of the recent releases:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/docs/current/release/release-473.html&quot;&gt;Trino 473&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for array literals.&lt;/li&gt;
  &lt;li&gt;Add LDAP-based group provider.&lt;/li&gt;
  &lt;li&gt;Remove the deprecated glue-v1 metastore type.&lt;/li&gt;
  &lt;li&gt;Remove the deprecated Databricks Unity catalog integration.&lt;/li&gt;
  &lt;li&gt;Remove the Kudu connector.&lt;/li&gt;
  &lt;li&gt;Remove the Phoenix connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But don’t use 473 since there were &lt;a href=&quot;https://github.com/trinodb/trino/issues/25381&quot;&gt;some breaking changes&lt;/a&gt;, fixed in…&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/docs/current/release/release-474.html&quot;&gt;Trino 474&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Fix a correctness bug in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DISTINCT&lt;/code&gt; queries with a large number
of unique groups.&lt;/li&gt;
  &lt;li&gt;Add &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;originalUser&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;authenticatedUser&lt;/code&gt; as resource group selectors.&lt;/li&gt;
  &lt;li&gt;Use JDK 24 as the runtime in the Docker container.&lt;/li&gt;
&lt;/ul&gt;
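
The new selectors plug into the resource groups configuration file. A minimal sketch, assuming the standard JSON layout with `rootGroups` and `selectors`; the group name, regular expression, and limits here are purely illustrative:

```json
{
  "rootGroups": [
    {
      "name": "etl",
      "softMemoryLimit": "80%",
      "maxQueued": 100,
      "hardConcurrencyLimit": 10
    }
  ],
  "selectors": [
    {
      "originalUser": "airflow-.*",
      "group": "etl"
    }
  ]
}
```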

&lt;p&gt;As always, numerous performance improvements, bug fixes, and other features were
added as well. Java 24 is coming as a requirement soon - test the container!&lt;/p&gt;

&lt;p&gt;Releases continue to be slower. Trino needs your help.&lt;/p&gt;

&lt;p&gt;Other releases and announcements:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trino Gateway 16 delayed, but Trino Gateway Helm chart 1.15.1&lt;/li&gt;
  &lt;li&gt;Trino Helm chart with 474 -&amp;gt; 1.38.0&lt;/li&gt;
  &lt;li&gt;New book: &lt;a href=&quot;/blog/2025/03/27/olap-principles-book.html&quot;&gt;Core Principles and Design Practices of OLAP Engines from Yiteng Xu
and Gary Gao&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Massive new contribution looking for helpers - &lt;a href=&quot;https://github.com/trinodb/trino-query-ui&quot;&gt;trino-query-ui&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s explore the query ui repo a bit more…&lt;/p&gt;

&lt;h2 id=&quot;application-packaging-and-trino&quot;&gt;Application packaging and Trino&lt;/h2&gt;

&lt;p&gt;Manfred and Cole muse about the package artifacts from Trino, their history,
scope and pain points:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;RPM&lt;/li&gt;
  &lt;li&gt;tarball&lt;/li&gt;
  &lt;li&gt;Docker container&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of them have and had issues, and everyone knew about them. Manfred
documented a lot of the usage in &lt;a href=&quot;/trino-the-definitive-guide&quot;&gt;Trino: The Definitive
Guide&lt;/a&gt;. Finally, some time in 2024
Manfred put some ideas down, and in the last months he implemented a lot of it.&lt;/p&gt;

&lt;p&gt;We discuss a few aspects such as the following:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Plugin architecture of Trino&lt;/li&gt;
  &lt;li&gt;What plugins are core or optional?&lt;/li&gt;
  &lt;li&gt;Are artifacts ready to use or not?&lt;/li&gt;
  &lt;li&gt;How painful is configuration?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;demo-time&quot;&gt;Demo time&lt;/h2&gt;

&lt;p&gt;In our demo session we look at some of the changes and the new trino-packages
repository:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;RPM removal from Trino, and replacement module&lt;/li&gt;
  &lt;li&gt;trino-server-core tarball in Trino and plugin selection&lt;/li&gt;
  &lt;li&gt;trino-server-custom module&lt;/li&gt;
  &lt;li&gt;trinodb/trino-core:latest Docker container in Trino&lt;/li&gt;
  &lt;li&gt;custom-docker module&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred runs a build, shows the results, and walks through the packages
repository structure and instructions. To finish off, we talk about next steps
such as removing plugins from the default binaries and therefore making them
optional.&lt;/p&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino-packages&quot;&gt;trino-packages repository&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/22597&quot;&gt;Packaging improvement issue&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/installation/plugins.html&quot;&gt;Trino plugin documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Quick update on where to see Cole or Manfred next, and then join us for the
upcoming Trino events:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trino Contributor Call - 23rd of April&lt;/li&gt;
  &lt;li&gt;Trino Community Broadcast 74: One tool to proxy them all (aws-proxy)&lt;/li&gt;
  &lt;li&gt;Trino Community Broadcast 75: Insights from a Norse god (Loki connector)&lt;/li&gt;
  &lt;li&gt;Trino Community Broadcast 76: Visualizing with Logi Symphony and ODBC&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let us know if you want to be a guest in a future broadcast.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>Core Principles and Design Practices of OLAP Engines</title>
      <link href="https://trino.io/blog/2025/03/27/olap-principles-book.html" rel="alternate" type="text/html" title="Core Principles and Design Practices of OLAP Engines" />
      <published>2025-03-27T00:00:00+00:00</published>
      <updated>2025-03-27T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2025/03/27/olap-principles-book</id>
      <content type="html" xml:base="https://trino.io/blog/2025/03/27/olap-principles-book.html">&lt;p&gt;Yiteng Xu and Yingju Gao are proudly announcing the new book “Core Principles and
Design Practices of OLAP Engines” from China Machine Press. This is great news
for the Trino community, since the book is based on the open source project
Trino, specifically Trino 350. It took more than four years for the two authors
to finish writing. All concepts and details are explained with a Trino flavor
and generalized to all OLAP engines. Let us walk through the chapters, and you
will find that the two authors dive deep into the source code layer and bring
you many treasures.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;author-introduction&quot;&gt;Author introduction&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/medsmeds&quot;&gt;Yiteng (Ivan) Xu&lt;/a&gt; is a data security engineer and
is currently utilizing Trino, Spark, and Calcite for SQL analysis. His work
encompasses various scenarios, including data warehouse metrics, SQL
auto-rewriting, SQL purpose detection, and the development of a SQL-based
purpose-aware access control system.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/garyelephant&quot;&gt;Yingju (Gary) Gao&lt;/a&gt; is an Apache SeaTunnel PMC
member and the lead of the time series database team. He currently serves as the
technical lead for the observability-engine team, and is responsible for
building the ecosystem for observability data, including metrics, trace, log,
and event data, providing a high-performance, high-throughput data pipeline from
ingestion to consumption, storage, querying, and data warehousing. Additionally,
he oversees metrics stability, multi-tenant access, and user requirement
integration.&lt;/p&gt;

&lt;p&gt;Both authors are passionate about sharing their technical knowledge. They have
delved deep into source code and excel in technical writing, breaking down
complex underlying principles into a linear and comprehensible format for
readers. They firmly believe that sharing is a virtue and are committed to
continuing their technical contributions.&lt;/p&gt;

&lt;p&gt;So now it is time to get the book, or read on for a walk through of the content:&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; target=&quot;_blank&quot; href=&quot;https://product.dangdang.com/11974653727.html&quot;&gt;
        Get the book from dangdang.com
    &lt;/a&gt;
    &lt;a class=&quot;btn btn-pink&quot; target=&quot;_blank&quot; href=&quot;https://item.m.jd.com/product/10136949561522.html&quot;&gt;
        Get the book from jd.com
    &lt;/a&gt;
&lt;/div&gt;

&lt;h2 id=&quot;walk-through&quot;&gt;Walk through&lt;/h2&gt;

&lt;p&gt;Let’s have a look at the different chapters in a high-level walk through.&lt;/p&gt;

&lt;h3 id=&quot;part-1-background-knowledge&quot;&gt;Part 1: Background knowledge&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Chapter 1&lt;/strong&gt;: Introduces the concept of OLAP (Online Analytical Processing)
and provides a comparison among different engines like Trino, Impala, Doris, and others.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chapter 2&lt;/strong&gt;: Provides a comprehensive introduction to the Trino engine,
covering its principles, architecture, enterprise use cases, compilation, and
execution. It also compares Trino with the Presto project and introduces the
SQL statements that are referenced throughout the book.&lt;/p&gt;

&lt;h3 id=&quot;part-2-core-principles&quot;&gt;Part 2: Core principles&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Chapter 3&lt;/strong&gt;: Offers an overview of the distributed SQL query process, serving
as a high-level introduction to the subsequent chapters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chapter 4&lt;/strong&gt;: Begins with the generation of query execution plans, including
the transformation of SQL into abstract syntax trees, semantic analysis, and the
creation of initial logical plans. It then delves into the theoretical knowledge
of optimizers and the overall framework of the Trino optimizer.&lt;/p&gt;

&lt;h3 id=&quot;part-3-classic-sql&quot;&gt;Part 3: Classic SQL&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Chapter 5&lt;/strong&gt;: Explains the generation and optimization of execution plans for
SQL statements involving only &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TableScan&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Filter&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Project&lt;/code&gt; operations,
along with their scheduling and execution processes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chapter 6&lt;/strong&gt;: Focuses on SQL statements with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Limit&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Sort&lt;/code&gt; operations,
detailing the generation and optimization of execution plans, as well as their
scheduling and execution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chapter 7&lt;/strong&gt;: Introduces the basic principles of aggregate queries. It then
covers the generation and optimization of execution plans for grouped and
non-grouped aggregate SQL statements, along with their scheduling and execution
processes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chapter 8&lt;/strong&gt;: Discusses SQL statements with count distinct and multiple
aggregate operations, explaining the generation and optimization of execution
plans, as well as their scheduling and execution. This includes the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Scatter-Gather&lt;/code&gt; model and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MarkDistinct&lt;/code&gt; optimization. Finally, a complex SQL
statement is used to tie together the concepts from Chapters 5 to 8.&lt;/p&gt;

&lt;h3 id=&quot;part-4-data-exchange-mechanism&quot;&gt;Part 4: Data exchange mechanism&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Chapter 9&lt;/strong&gt;: Introduces the overall concept of data exchange mechanisms and
how data exchange is incorporated during the query optimization phase via the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AddExchanges&lt;/code&gt; optimizer, along with the design principles for scheduling and
execution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chapter 10&lt;/strong&gt;: Explains how tasks establish connections during the query
scheduling phase and the mechanisms for upstream and downstream data flow during
execution. It also covers the principles of intra-task data exchange, RPC
interaction mechanisms, and analyzes backpressure, Limit semantics, and
out-of-order request handling.&lt;/p&gt;

&lt;h3 id=&quot;part-5-plugin-mechanisms-and-connectors&quot;&gt;Part 5: Plugin mechanisms and connectors&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Chapter 11&lt;/strong&gt;: Begins with an introduction to Trino’s plugin system and SPI
mechanism, including plugin loading and JVM’s class loading principles. It then
dissects connectors, covering metadata modules, read modules, pushdown
optimization, and providing in-depth insights into connector design.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chapter 12&lt;/strong&gt;: Uses the example-http connector to help readers understand
connector design and implements a simple data source using Python’s Flask
framework.&lt;/p&gt;

&lt;h3 id=&quot;part-6-function-principles-and-development&quot;&gt;Part 6: Function principles and development&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Chapter 13&lt;/strong&gt;: Provides an overview of Trino’s function system, including
function types, lifecycle, and several function development methods. It delves
into the data structures and annotations related to functions and explains the
function registration and parsing process during semantic analysis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chapter 14&lt;/strong&gt;: Focuses on how to write a UDF in practice. It covers
annotation-based development methods for scalar functions, as well as low-level
development methods using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;codeGen&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;methodHandle&lt;/code&gt; APIs. For aggregate
functions, it introduces annotation-based development methods and low-level
methods where developers handle serialization and state on their own.&lt;/p&gt;

&lt;h3 id=&quot;why-trino&quot;&gt;Why Trino?&lt;/h3&gt;

&lt;p&gt;In 2020, one of the authors, Yiteng Xu, encountered a scenario at work where
data needed to be read from two Hive instances, each modified by different
internal teams. The company’s infrastructure team attempted a simple solution by
registering virtual tables and using MapReduce for federated queries. However,
this approach proved inadequate for the agile analysis needs of data analysts,
with complex queries taking nearly 12 hours to complete. A single mistake in a
SQL statement meant an entire day was wasted.&lt;/p&gt;

&lt;p&gt;Later, another team researched and adopted Presto (before Trino became
independent). By adapting the Hive engine at the connector level, they enabled
federated queries across the two Hive instances without data migration or
extensive code changes. Users only needed to be aware of a catalog prefix,
making the process incredibly convenient. The author later had the opportunity
to participate in the project and developed a strong interest in its source
code. The elegance of the open-source project, its plugin design, and the inner
workings of connectors and Airlift framework sparked a deep curiosity, leading
the author on a journey of source code exploration. As the PrestoSQL project was
more active and receptive to developer feedback, the author chose to continue
following the Trino project when it emerged in late 2020.&lt;/p&gt;

&lt;h2 id=&quot;get-your-copy&quot;&gt;Get your copy&lt;/h2&gt;

&lt;p&gt;Now it is time for you to get your copy of &lt;strong&gt;Core Principles and Design Practices of OLAP Engines&lt;/strong&gt;:&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; target=&quot;_blank&quot; href=&quot;https://product.dangdang.com/11974653727.html&quot;&gt;
        Get the book from dangdang.com
    &lt;/a&gt;
    &lt;a class=&quot;btn btn-pink&quot; target=&quot;_blank&quot; href=&quot;https://item.m.jd.com/product/10136949561522.html&quot;&gt;
        Get the book from jd.com
    &lt;/a&gt;
&lt;/div&gt;</content>

      
        <author>
          <name>Yiteng Xu, Yingju Gao, Manfred Moser</name>
        </author>
      

      <summary>Yiteng Xu and Yingju Gao are proudly announcing the new book “Core Principles and Design Practices of OLAP Engines” from China Machine Press. This is great news for the Trino community, since the book is based on the open source project Trino, specifically Trino 350. It took more than four years for the two authors to finish writing. All concepts and details are explained with a Trino flavor and generalized to all OLAP engines. Let us walk through the chapters, and you will find that the two authors dive deep into the source code layer and bring you many treasures.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/core-principles-olap-book.jpg" />
      
    </entry>
  
    <entry>
      <title>72: Keeping the lake clean</title>
      <link href="https://trino.io/episodes/72.html" rel="alternate" type="text/html" title="72: Keeping the lake clean" />
      <published>2025-03-17T00:00:00+00:00</published>
      <updated>2025-03-17T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/72</id>
      <content type="html" xml:base="https://trino.io/episodes/72.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Dev Rel Engineer
at &lt;a href=&quot;https://chainguard.dev&quot;&gt;Chainguard&lt;/a&gt;
&lt;a href=&quot;https://x.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://www.firebolt.io/&quot;&gt;Firebolt&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/viktor-kessler&quot;&gt;Viktor Kessler&lt;/a&gt;, Co-founder at Vakamo&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/thielc&quot;&gt;Christian Thiel&lt;/a&gt;, Co-founder at Vakamo&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases&quot;&gt;Releases&lt;/h2&gt;

&lt;p&gt;Following are some highlights of the recent releases:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/docs/current/release/release-472.html&quot;&gt;Trino 472&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Color the server console output for improved readability.&lt;/li&gt;
  &lt;li&gt;Fix initialization failure for the DuckDB connector on Docker container.&lt;/li&gt;
  &lt;li&gt;Add support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;row&lt;/code&gt; type and generate empty values for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;array&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;map&lt;/code&gt;,
and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;json&lt;/code&gt; types in the Faker connector.&lt;/li&gt;
  &lt;li&gt;Add the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$partition&lt;/code&gt; hidden column in the Iceberg connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As always, numerous performance improvements, bug fixes, and other features were
added as well.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trinodb.github.io/trino-gateway/release-notes/#15&quot;&gt;Trino Gateway 15&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Pop up messages in UI&lt;/li&gt;
  &lt;li&gt;Consistent use of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;config.yaml&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Use of OpenMetrics data from Trino clusters&lt;/li&gt;
  &lt;li&gt;Fix query errors when adhoc routing group has no healthy backends.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;introducing-viktor-and-christian&quot;&gt;Introducing Viktor and Christian&lt;/h2&gt;

&lt;p&gt;We talk with Viktor and Christian about their experience in software engineering
and the world of big data, and what led them to start Vakamo together.&lt;/p&gt;

&lt;h2 id=&quot;metastores-and-catalogs&quot;&gt;Metastores and catalogs&lt;/h2&gt;

&lt;p&gt;We talk about data lakes, data lakehouses, object storage and the role of
metadata. Details we cover include the Hive Metastore Service, the Thrift
protocol, Amazon Glue, and the new wave of catalogs. Specifically we also talk
about Apache Iceberg and the Iceberg REST catalog standard as a basis for
Lakekeeper, and then learn all the details about Lakekeeper.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/logos/lakekeeper-small.png&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;demo-time&quot;&gt;Demo time&lt;/h2&gt;

&lt;p&gt;In their demo Viktor and Christian show a multi-user Trino cluster secured by
OAuth 2, Open Policy Agent, and Lakekeeper.&lt;/p&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://lakekeeper.io/&quot;&gt;Lakekeeper&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://docs.lakekeeper.io/&quot;&gt;Lakekeeper documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/lakekeeper/lakekeeper&quot;&gt;Lakekeeper source&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/lakekeeper/lakekeeper/tree/main/examples/trino-opa&quot;&gt;Example project with Trino and OPA&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/iceberg.html&quot;&gt;Iceberg connector documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/object-storage/metastores.html#rest-catalog&quot;&gt;Iceberg REST catalog documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Join us for upcoming events and let us know if you want to be a guest:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trino Community Broadcast 73: Wrapping Trino packages with a bow&lt;/li&gt;
&lt;/ul&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>Twenty four</title>
      <link href="https://trino.io/blog/2025/03/03/java-24.html" rel="alternate" type="text/html" title="Twenty four" />
      <published>2025-03-03T00:00:00+00:00</published>
      <updated>2025-03-03T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2025/03/03/java-24</id>
      <content type="html" xml:base="https://trino.io/blog/2025/03/03/java-24.html">&lt;p&gt;Six months ago &lt;a href=&quot;/blog/2024/09/17/java-23.html&quot;&gt;we adopted Java 23 as a requirement&lt;/a&gt;, following our standard procedure to upgrade to each Java version as soon
as it becomes available. This allows us to take advantage of all the great
improvements each release brings. The upgrade to 23 was pretty easy since the
changes from 22 to 23 were not that big. The story turns out to be a bit
different now with our upgrade to Java 24.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;java-24-features&quot;&gt;Java 24 features&lt;/h2&gt;

&lt;p&gt;We have been &lt;a href=&quot;https://github.com/trinodb/trino/issues/23498&quot;&gt;planning and working towards the
upgrade&lt;/a&gt; consistently since the
23 bump in September. Java 24 is set to be released in March 2025 and the list
of changes is quite significant:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;JEP 450 Compact Object Headers (Experimental)&lt;/li&gt;
  &lt;li&gt;JEP 472 Prepare to Restrict the Use of JNI&lt;/li&gt;
  &lt;li&gt;JEP 475 Late Barrier Expansion for G1&lt;/li&gt;
  &lt;li&gt;JEP 478 Key Derivation Function API (Preview)&lt;/li&gt;
  &lt;li&gt;JEP 483 Ahead-of-Time Class Loading &amp;amp; Linking&lt;/li&gt;
  &lt;li&gt;JEP 484 Class-File API&lt;/li&gt;
  &lt;li&gt;JEP 485 Stream Gatherers&lt;/li&gt;
  &lt;li&gt;JEP 486 Permanently Disable the Security Manager&lt;/li&gt;
  &lt;li&gt;JEP 487 Scoped Values (Fourth Preview)&lt;/li&gt;
  &lt;li&gt;JEP 488 Primitive Types in Patterns, instanceof, and switch (Second Preview)&lt;/li&gt;
  &lt;li&gt;JEP 489 Vector API (Ninth Incubator)&lt;/li&gt;
  &lt;li&gt;JEP 490 ZGC: Remove the Non-Generational Mode&lt;/li&gt;
  &lt;li&gt;JEP 491 Synchronize Virtual Threads without Pinning&lt;/li&gt;
  &lt;li&gt;JEP 492 Flexible Constructor Bodies (Third Preview)&lt;/li&gt;
  &lt;li&gt;JEP 494 Module Import Declarations (Second Preview)&lt;/li&gt;
  &lt;li&gt;JEP 495 Simple Source Files and Instance Main Methods (Fourth Preview)&lt;/li&gt;
  &lt;li&gt;JEP 496 Quantum-Resistant Module-Lattice-Based Key Encapsulation Mechanism&lt;/li&gt;
  &lt;li&gt;JEP 497 Quantum-Resistant Module-Lattice-Based Digital Signature Algorithm&lt;/li&gt;
  &lt;li&gt;JEP 498 Warn upon Use of Memory-Access Methods in sun.misc.Unsafe&lt;/li&gt;
  &lt;li&gt;JEP 499 Structured Concurrency (Fourth Preview)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can find more details in the
&lt;a href=&quot;https://jdk.java.net/24/release-notes&quot;&gt;release notes&lt;/a&gt; and each
individual JEP.&lt;/p&gt;

&lt;h2 id=&quot;trino-perspective&quot;&gt;Trino perspective&lt;/h2&gt;

&lt;p&gt;From a Trino perspective we want to specifically take advantage of performance
improvements to MemorySegment (mismatch, copy, fill), “JEP 491 Synchronize
Virtual Threads without Pinning” and “JEP 475 Late Barrier Expansion for G1”. On
the other hand &lt;a href=&quot;https://openjdk.org/jeps/486&quot;&gt;JEP 486 Permanently Disable the Security
Manager&lt;/a&gt; turned out to be the most impactful.&lt;/p&gt;

&lt;p&gt;Since Trino and its connectors have a large footprint of dependencies, there was
a high chance that some projects were not keeping up with the security manager
removal, although it was first deprecated with Java 17 in 2021.&lt;/p&gt;

&lt;p&gt;At this stage the Kafka, Kudu, and Phoenix connectors are affected. The Kafka
project is planning to make a new compatible release available in time and we
will adopt that version.&lt;/p&gt;

&lt;p&gt;The Kudu and Phoenix connectors however will be removed, since it is not
possible to use them with Java 24 as a requirement. Neither connector is heavily
used in our community, as we learned from our communication with numerous
users, integrators, and the results from our &lt;a href=&quot;/blog/2025/01/07/2024-and-beyond.html&quot;&gt;user survey&lt;/a&gt;. We are tracking progress for each removal in the
issues &lt;a href=&quot;https://github.com/trinodb/trino/issues/24419&quot;&gt;#24419 Phoenix connector&lt;/a&gt;
and &lt;a href=&quot;https://github.com/trinodb/trino/issues/24417&quot;&gt;#24417 Kudu connector&lt;/a&gt;. If
either of these communities ends up supporting Java 24, or a newer version as
required by Trino, in the future, we can potentially add the connectors back in
if community members contribute updated versions.&lt;/p&gt;

&lt;h2 id=&quot;release-plans&quot;&gt;Release plans&lt;/h2&gt;

&lt;p&gt;In terms of shipping the changes we follow our established pattern:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Clean up the codebase and get it ready, specifically this includes the removal
of the Kudu and Phoenix connectors.&lt;/li&gt;
  &lt;li&gt;Cut a release that is completely ready to be used with Java 24, but does not
yet make it a hard requirement.&lt;/li&gt;
  &lt;li&gt;Allow for community testing and feedback using Java 24.&lt;/li&gt;
  &lt;li&gt;Introduce Java 24 as hard requirement in another release.&lt;/li&gt;
  &lt;li&gt;Adopt Java 24 features and bring the benefits to our users with the
following releases.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As you see, there is a bunch of work waiting, so we better get back to it. As usual,
if you have questions or comments, chime in on the relevant issue or chat with
us on &lt;a href=&quot;/slack.html&quot;&gt;Trino Slack&lt;/a&gt; in the &lt;a href=&quot;https://trinodb.slack.com/messages/C07ABNN828M&quot;&gt;core-dev
channel&lt;/a&gt;.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Mateusz Gajewski</name>
        </author>
      

      <summary>Six months ago we adopted Java 23 as a requirement, following our standard procedure to upgrade to each Java version as soon as it becomes available. This allows us to take advantage of all the great improvements each release brings. The upgrade to 23 was pretty easy since the changes from 22 to 23 were not that big. The story turns out to be a bit different now with our upgrade to Java 24.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/coffee-24.png" />
      
    </entry>
  
    <entry>
      <title>71: Fake it real good</title>
      <link href="https://trino.io/episodes/71.html" rel="alternate" type="text/html" title="71: Fake it real good" />
      <published>2025-02-27T00:00:00+00:00</published>
      <updated>2025-02-27T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/71</id>
      <content type="html" xml:base="https://trino.io/episodes/71.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director/Open
Source Engineering at &lt;a href=&quot;https://trino.io/users.html#starburst&quot;&gt;Starburst&lt;/a&gt; -
&lt;a href=&quot;https://x.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://www.firebolt.io/&quot;&gt;Firebolt&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest&quot;&gt;Guest&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/janwas/&quot;&gt;Jan Waś&lt;/a&gt;, 
Software Engineer at &lt;a href=&quot;https://trino.io/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases&quot;&gt;Releases&lt;/h2&gt;

&lt;p&gt;Following are some highlights of the recent releases:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/docs/current/release/release-471.html&quot;&gt;Trino 471&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add &lt;a href=&quot;https://trino.io/docs/current/functions/ai.html&quot;&gt;AI functions&lt;/a&gt; for textual
tasks on data using OpenAI, Anthropic, or other LLMs using Ollama as backend.&lt;/li&gt;
  &lt;li&gt;Add support for logging output to the console in JSON format (useful in containers).&lt;/li&gt;
  &lt;li&gt;Support additional Python libraries for use with Python user-defined functions.&lt;/li&gt;
  &lt;li&gt;Remove the RPM package.&lt;/li&gt;
  &lt;li&gt;Add &lt;a href=&quot;https://trino.io/docs/current/object-storage/file-system-local.html&quot;&gt;local file system support&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Add support for S3 Tables in Iceberg connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As always, numerous performance improvements, bug fixes, and other features were
added as well.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trinodb.github.io/trino-gateway/release-notes/#14&quot;&gt;Trino Gateway 14&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Our first Trino Gateway release of 2025 shipped, and it is packed with great new
features and fixes. Some examples are the following:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Rules editor in the web interface&lt;/li&gt;
  &lt;li&gt;Automatic database schema update and support for Oracle&lt;/li&gt;
  &lt;li&gt;Trino cluster monitoring with JMX and OpenMetrics&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;introducing-jan-waś&quot;&gt;Introducing Jan Waś&lt;/h2&gt;

&lt;p&gt;Jan, also known as &lt;a href=&quot;https://github.com/nineinchnick/&quot;&gt;nineinchnick on GitHub&lt;/a&gt;,
is a very active Trino contributor with a wide range of his own plugins and
projects. He is subproject maintainer for the Helm charts and the Grafana
plugin, and is heavily involved in GitHub actions setup and numerous other
efforts. Jan resides in Poland. When he is not working on Trino, you can find
him at metal, electronics, and even opera concerts across Europe or at home
playing video games.&lt;/p&gt;

&lt;h2 id=&quot;datafaker-faker-connector-and-trino&quot;&gt;Datafaker, Faker connector, and Trino&lt;/h2&gt;

&lt;p&gt;We talk about using simulated data from the TPC-H and TPC-DS connectors to learn
SQL and use it for other scenarios such as benchmarking, testing for SQL
support, and validating other connectors and data sources. This leads us to the
limitations of these connectors and how the Faker connector is the next step.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/logos/datafaker-small.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Jan tells us about the Datafaker library and his motivation to create a
connector, and how it eventually landed in Trino itself.&lt;/p&gt;

&lt;h2 id=&quot;demo-time&quot;&gt;Demo time&lt;/h2&gt;

&lt;p&gt;Jan shows us how to configure the connector and then demos a number of use
cases from learning SQL to populating and testing other data sources.&lt;/p&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/faker.html&quot;&gt;Faker connector documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/ecosystem/data-source.html#datafaker&quot;&gt;Datafaker project&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/reports&quot;&gt;Trino reports repository&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/nineinchnick/&quot;&gt;Other project repositories from Jan&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/06/28/trino-fest-2023-starburst-recap.html&quot;&gt;Zero-cost reporting, presented at Trino Fest 2023&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Watch the &lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings&quot;&gt;recording of the Trino contributor call or read the
minutes&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Join us for upcoming events and let us know if you want to be a guest:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trino Community Broadcast 72: Keeping the lake clean, all about
&lt;a href=&quot;https://lakekeeper.io/&quot;&gt;Lakekeeper&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Trino Community Broadcast 73: Wrapping Trino packages with a bow&lt;/li&gt;
&lt;/ul&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>70: Previewing a new UI</title>
      <link href="https://trino.io/episodes/70.html" rel="alternate" type="text/html" title="70: Previewing a new UI" />
      <published>2025-02-13T00:00:00+00:00</published>
      <updated>2025-02-13T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/70</id>
      <content type="html" xml:base="https://trino.io/episodes/70.html">&lt;h2 id=&quot;host&quot;&gt;Host&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director/Open
Source Engineering at &lt;a href=&quot;https://trino.io/users.html#starburst&quot;&gt;Starburst&lt;/a&gt; -
&lt;a href=&quot;https://x.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/peter-kosztolanyi-5617938/&quot;&gt;Peter Kosztolanyi&lt;/a&gt;, 
Analytics Platform Lead at &lt;a href=&quot;https://wise.com/&quot;&gt;Wise&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases&quot;&gt;Releases&lt;/h2&gt;

&lt;p&gt;Following are some highlights of the Trino releases since episode 69:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/docs/current/release/release-470.html&quot;&gt;Trino 470&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New DuckDB connector&lt;/li&gt;
  &lt;li&gt;New Grafana Loki connector&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WITH SESSION&lt;/code&gt; for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; queries&lt;/li&gt;
  &lt;li&gt;Raise minimum runtime requirement to Java 11 for JDBC driver and CLI&lt;/li&gt;
  &lt;li&gt;Remove Kinesis connector&lt;/li&gt;
  &lt;li&gt;Deprecate use of the legacy file system support for Azure Storage, Google
Cloud Storage, IBM Cloud Object Storage, S3 and S3-compatible object storage
systems - &lt;a href=&quot;/blog/2025/02/10/old-file-system.html&quot;&gt;check out the blog post&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As always, numerous performance improvements, bug fixes, and other features were
added as well.&lt;/p&gt;

&lt;h2 id=&quot;introducing-peter-kosztolanyi&quot;&gt;Introducing Peter Kosztolanyi&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/koszti&quot;&gt;Peter Kosztolanyi&lt;/a&gt; is the Analytics Platform Lead at
&lt;a href=&quot;https://wise.com/&quot;&gt;Wise&lt;/a&gt; and he &lt;a href=&quot;https://youtu.be/K5RmYtbeXAc&quot;&gt;presented about their data
lake&lt;/a&gt; with Abdullah Alkhawatrah at &lt;a href=&quot;/blog/2024/12/18/trino-summit-2024-quick-recap.html&quot;&gt;Trino Summit
2024&lt;/a&gt;. Peter has a lot
of experience in the data and business intelligence fields.&lt;/p&gt;

&lt;p&gt;He also contributes to the Trino Python client, and worked on his own phone and
messaging app for iOS and Android in the past.&lt;/p&gt;

&lt;h2 id=&quot;trino-legacy-web-ui&quot;&gt;Trino legacy web UI&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;/docs/current/admin/web-interface.html&quot;&gt;existing main web UI for
Trino&lt;/a&gt; has been around
for a long time, and sees very limited development and maintenance. It lacks
documentation, a modern look, a clean codebase, and is inconsistent across
screens. It is also very technical and developer focused, and lacks features
like a SQL console to run queries.&lt;/p&gt;

&lt;h2 id=&quot;efforts-for-a-new-ui&quot;&gt;Efforts for a new UI&lt;/h2&gt;

&lt;p&gt;While we all knew about the problems of the old UI, nobody with enough UI coding
knowledge or time and motivation ever took up the banner to change the
situation. We did however get a great new UI contributed in Trino Gateway, and
that motivated some people in the community, especially Peter.&lt;/p&gt;

&lt;p&gt;Peter started with the same stack, pulled in maintainers like Mateusz Gajewski
and Manfred Moser, and kept working on improvements. We talk more about the
following aspects:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Problems with the old UI and its technology stack&lt;/li&gt;
  &lt;li&gt;Trino Gateway UI&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/22697&quot;&gt;Roadmap issue&lt;/a&gt; and discussion around the new UI&lt;/li&gt;
  &lt;li&gt;What is the stack now?&lt;/li&gt;
  &lt;li&gt;Look at the
&lt;a href=&quot;https://github.com/trinodb/trino/tree/master/core/trino-web-ui&quot;&gt;codebase&lt;/a&gt;,
tools, development, and
&lt;a href=&quot;/docs/current/admin/preview-web-interface.html&quot;&gt;documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Current status and next steps&lt;/li&gt;
  &lt;li&gt;What do we need from others?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;demo-time&quot;&gt;Demo time&lt;/h2&gt;

&lt;p&gt;Peter shows us the new UI from his development setup - the latest and greatest
set of features.&lt;/p&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/admin/preview-web-interface.html&quot;&gt;Preview Web UI documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/tree/master/core/trino-web-ui&quot;&gt;Preview Web UI codebase&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/22697&quot;&gt;Roadmap issue&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/admin/web-interface.html&quot;&gt;Legacy Web UI documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Join us for upcoming events and let us know if you want to be the next guest.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trino contributor call, 27th of February&lt;/li&gt;
  &lt;li&gt;Trino Community Broadcast 71 with Jan Waś about the new &lt;a href=&quot;/docs/current/connector/faker.html&quot;&gt;Faker
connector&lt;/a&gt;, 27th of
February&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Host</summary>

      
      
    </entry>
  
    <entry>
      <title>Out with the old file system</title>
      <link href="https://trino.io/blog/2025/02/10/old-file-system.html" rel="alternate" type="text/html" title="Out with the old file system" />
      <published>2025-02-10T00:00:00+00:00</published>
      <updated>2025-02-10T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2025/02/10/old-file-system</id>
      <content type="html" xml:base="https://trino.io/blog/2025/02/10/old-file-system.html">&lt;p&gt;What a long journey it has been! From the start Trino supported querying Hive
data and used libraries from the Hive and Hadoop ecosystem. With the release of
&lt;a href=&quot;/docs/current/release/release-470.html&quot;&gt;Trino 470&lt;/a&gt; we mark
another milestone to more features and better performance for data lake and
lakehouse querying with Trino. We deprecated the legacy file system support, and
will permanently remove it in an upcoming release.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;background&quot;&gt;Background&lt;/h2&gt;

&lt;p&gt;Trino always had a focus on performance and security. As a result we implemented
custom readers for file formats like Apache ORC and Apache Parquet many years
ago. We have also improved libraries for compression and decompression of files
from object storage, and implemented our own support for other table formats
with the Apache Iceberg, Delta Lake and Apache Hudi connectors.&lt;/p&gt;

&lt;p&gt;For the underlying object storage solutions and file systems, we originally
extended the libraries around the Hive system and added implementations for
Amazon S3, Azure Storage, Google Cloud Storage and others. Over time the
mismatch of the HDFS libraries and the cloud-centric usage with modern file
systems became more and more of a maintenance headache. It also represented an
unnecessary complexity overhead, resulted in performance problems, and forced us
to carry the Hadoop dependencies with all their baggage of old Java code and
security issues.&lt;/p&gt;

&lt;p&gt;In the end David Phillips, as our file system lead, decided in 2022 that it was
time to write our own file system support as needed for Trino. By summer of 2023
and with Trino 419 a &lt;a href=&quot;https://github.com/trinodb/trino/pull/17498&quot;&gt;first support for
S3&lt;/a&gt; became available for the
Iceberg and Delta Lake connectors. Over a year later in September 2024 and with
&lt;a href=&quot;/docs/current/release/release-458.html&quot;&gt;Trino 458&lt;/a&gt;, we declared
the old file system support on top of the Hadoop libraries legacy and advised
users to migrate.&lt;/p&gt;

&lt;p&gt;Since then you are required to declare what file system you want to enable in
each catalog with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fs.native-azure.enabled=true&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fs.native-gcs.enabled=true&lt;/code&gt;, or
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fs.native-s3.enabled=true&lt;/code&gt;. If you are truly using HDFS, or if you insist on
using the old legacy support you can also use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fs.hadoop.enabled=true&lt;/code&gt;.&lt;/p&gt;
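
&lt;p&gt;As a sketch, a minimal catalog properties file with the native S3 file system
enabled could look like the following. The catalog file name, the connector, and
the region value are placeholders for your own setup:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# etc/catalog/example.properties
connector.name=iceberg
fs.native-s3.enabled=true
s3.region=us-east-1
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Only one file system may be enabled per catalog, so pick the property that
matches your storage.&lt;/p&gt;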

&lt;h2 id=&quot;trino-470&quot;&gt;Trino 470&lt;/h2&gt;

&lt;p&gt;With the recent &lt;a href=&quot;/docs/current/release/release-470.html&quot;&gt;Trino 470
release&lt;/a&gt; from February
2025, we took the next step. All catalog configuration properties for using the
old, legacy support for accessing Azure Storage, Google Cloud Storage, S3, and
S3-compatible file systems are now &lt;strong&gt;deprecated&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;These properties include all names starting with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive.azure&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive.cos&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive.gcs&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive.s3&lt;/code&gt;. The result of this deprecation is that Trino emits
warnings during the startup for each of these properties in the server log.&lt;/p&gt;

&lt;p&gt;We also removed all documentation for the old properties, leaving only relevant
migration guides in place.&lt;/p&gt;

&lt;h2 id=&quot;next-steps&quot;&gt;Next steps&lt;/h2&gt;

&lt;p&gt;Within the next weeks or months we will completely remove all these properties
and the underlying code. We therefore renew our call out from numerous
contributor calls, Trino Community Broadcast episodes, and our Trino Fest and
Trino Summit events:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Stop using the old legacy file systems today.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you need help, have a look at the documentation for your connector, the file
system you use, and the migration guide for each file system:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/delta-lake.html&quot;&gt;Delta Lake connector&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/hive.html&quot;&gt;Hive connector&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/hudi.html&quot;&gt;Hudi connector&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/iceberg.html&quot;&gt;Iceberg connector&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/object-storage/file-system-azure.html&quot;&gt;Azure Storage file system support&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/object-storage/file-system-gcs.html&quot;&gt;Google Cloud Storage file system support&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/object-storage/file-system-s3.html&quot;&gt;S3 file system support&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The new systems are more stable and performant, and save you time and money.
Migrate today, and if you encounter any issues, or find that there are features
missing, ping us on &lt;a href=&quot;/slack.html&quot;&gt;Slack&lt;/a&gt; and chime in on the
&lt;a href=&quot;https://github.com/trinodb/trino/issues/24878&quot;&gt;roadmap issue for the removal of the legacy file system
support&lt;/a&gt;.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, David Phillips, Mateusz Gajewski</name>
        </author>
      

      <summary>What a long journey it has been! From the start Trino supported querying Hive data and used libraries from the Hive and Hadoop ecosystem. With the release of Trino 470 we mark another milestone to more features and better performance for data lake and lakehouse querying with Trino. We deprecated the legacy file system support, and will permanently remove it in an upcoming release.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/hadoop-trashcan.png" />
      
    </entry>
  
    <entry>
      <title>69: Client protocol improvements</title>
      <link href="https://trino.io/episodes/69.html" rel="alternate" type="text/html" title="69: Client protocol improvements" />
      <published>2025-01-30T00:00:00+00:00</published>
      <updated>2025-01-30T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/69</id>
      <content type="html" xml:base="https://trino.io/episodes/69.html">&lt;h2 id=&quot;host&quot;&gt;Host&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director/Open
Source Engineering at &lt;a href=&quot;https://trino.io/users.html#starburst&quot;&gt;Starburst&lt;/a&gt; -
&lt;a href=&quot;https://x.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://www.firebolt.io/&quot;&gt;Firebolt&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/wendigo&quot;&gt;Mateusz Gajewski&lt;/a&gt;, Sr. Staff Software Engineer at 
&lt;a href=&quot;https://trino.io/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases&quot;&gt;Releases&lt;/h2&gt;

&lt;p&gt;Following are some highlights of the first release of 2025. It took us a bit longer to work through release blockers this time:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/docs/current/release/release-469.html&quot;&gt;Trino 469&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FIRST&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AFTER&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LAST&lt;/code&gt; clauses to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER TABLE ...
ADD COLUMN&lt;/code&gt; for Iceberg, MySQL, and MariaDB.&lt;/li&gt;
  &lt;li&gt;SSE-C in S3 security mapping for Delta Lake, Hive, Hudi, and Iceberg&lt;/li&gt;
  &lt;li&gt;Allow configuration for Google Cloud Storage endpoint with object storage
connectors.&lt;/li&gt;
  &lt;li&gt;Allow connection validation and add more statistics for the JDBC driver.&lt;/li&gt;
  &lt;li&gt;Remove support for connector-level event listeners.&lt;/li&gt;
  &lt;li&gt;Misc improvements for the Faker connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As always, numerous performance improvements, bug fixes, and other features were
added as well.&lt;/p&gt;

&lt;h2 id=&quot;other-news&quot;&gt;Other news&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Trino Python client 0.332.0 with spooling support&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-23-jan-2025&quot;&gt;Trino contributor call&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;introducing-wendigo&quot;&gt;Introducing wendigo&lt;/h2&gt;

&lt;p&gt;What can we say? Top contributor and maintainer, and all-around hacker on Trino,
numerous Trino subprojects, Airlift, and beyond.&lt;/p&gt;

&lt;h2 id=&quot;main-topic&quot;&gt;Main topic&lt;/h2&gt;

&lt;p&gt;Let’s talk about the Trino client protocol. Following are some topics we cover:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;What is the client protocol for?&lt;/li&gt;
  &lt;li&gt;History of the client protocol&lt;/li&gt;
  &lt;li&gt;Available client drivers and client applications&lt;/li&gt;
  &lt;li&gt;Architecture and flow&lt;/li&gt;
  &lt;li&gt;Motivation to improve the protocol&lt;/li&gt;
  &lt;li&gt;Direct and spooling modes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Mateusz walks through the presentation and Cole and Manfred ask a lot of
questions:&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; target=&quot;_blank&quot; href=&quot;/assets/episode/tcb69-client-protocol.pdf&quot;&gt;
        Presentation
    &lt;/a&gt;
&lt;/div&gt;

&lt;h2 id=&quot;demo-time&quot;&gt;Demo time&lt;/h2&gt;

&lt;p&gt;Mateusz shows us his example and testing setup with Starburst Galaxy clusters
configured for spooling protocol use, and shares some of the performance gains he
observes.&lt;/p&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/assets/episode/tcb69-client-protocol.pdf&quot;&gt;Presentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/client/client-protocol.html&quot;&gt;Client protocol documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/ecosystem/client-driver.html&quot;&gt;Available client drivers&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/ecosystem/client-application.html&quot;&gt;Available client applications&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Join us for upcoming events and let us know if you want to be the next guest.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Host</summary>

      
      
    </entry>
  
    <entry>
      <title>68: Year of the Snake - Python UDFs</title>
      <link href="https://trino.io/episodes/68.html" rel="alternate" type="text/html" title="68: Year of the Snake - Python UDFs" />
      <published>2025-01-16T00:00:00+00:00</published>
      <updated>2025-01-16T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/68</id>
      <content type="html" xml:base="https://trino.io/episodes/68.html">&lt;h2 id=&quot;host&quot;&gt;Host&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director/Open
Source Engineering and Trino maintainer at
&lt;a href=&quot;https://trino.io/users.html#starburst&quot;&gt;Starburst&lt;/a&gt; -
&lt;a href=&quot;https://x.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://www.firebolt.io/&quot;&gt;Firebolt&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/electrum&quot;&gt;David Phillips&lt;/a&gt;, Trino co-creator and maintainer&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases&quot;&gt;Releases&lt;/h2&gt;

&lt;p&gt;Following are some highlights of the Trino releases since episode 67:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/docs/current/release/release-465.html&quot;&gt;Trino 465&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for a customer-provided SSE key in the S3 file system, relevant for
the Hive, Iceberg, Delta Lake, and Hudi connectors.&lt;/li&gt;
  &lt;li&gt;Add deterministic data, locale support, and a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;random_string&lt;/code&gt; function for the Faker
connector.&lt;/li&gt;
  &lt;li&gt;Add support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;extra_properties&lt;/code&gt; in the Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Add support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;geometry&lt;/code&gt; type in the PostgreSQL connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;/docs/current/release/release-466.html&quot;&gt;Trino 466&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Remove Python requirement for Trino by replacing the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;launcher&lt;/code&gt; script.&lt;/li&gt;
  &lt;li&gt;Improve client protocol throughput by introducing the spooling protocol and
ship it with documentation, including implementation in the JDBC driver and
the CLI.&lt;/li&gt;
  &lt;li&gt;Add support for data access control with Apache Ranger, including support for
column masking, row filtering, and audit logging.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;/docs/current/release/release-467.html&quot;&gt;Trino 467&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Change default for internal communication to HTTP/1.1.&lt;/li&gt;
  &lt;li&gt;Add support for OpenTelemetry tracing to the HTTP, Kafka, and MySQL event
listeners.&lt;/li&gt;
  &lt;li&gt;Remove the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;microdnf&lt;/code&gt; package manager from the Docker image.&lt;/li&gt;
  &lt;li&gt;Add the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$all_manifests&lt;/code&gt; metadata tables in the Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Add the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$transactions&lt;/code&gt; metadata table in the Delta Lake connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;/docs/current/release/release-468.html&quot;&gt;Trino 468&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add &lt;a href=&quot;/docs/current/udf/python.html&quot;&gt;Python user-defined functions&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Rename SQL routines to SQL user-defined functions.&lt;/li&gt;
  &lt;li&gt;Add cluster overview to the Preview Web UI.&lt;/li&gt;
  &lt;li&gt;Improve bucket execution for Hive and Iceberg.&lt;/li&gt;
  &lt;li&gt;Add support for non-transactional &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; statements for PostgreSQL.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As always, numerous performance improvements, bug fixes, and other features were
added as well.&lt;/p&gt;

&lt;h2 id=&quot;other-news&quot;&gt;Other news&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trinodb.github.io/trino-gateway/release-notes/#13&quot;&gt;Trino Gateway 13&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2024/12/18/trino-summit-2024-quick-recap.html&quot;&gt;Trino Summit recap&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2025/01/07/2024-and-beyond.html&quot;&gt;Trino in 2024 and beyond&lt;/a&gt;, answer
our survey!&lt;/li&gt;
  &lt;li&gt;December 2024 Trino maintainer and contributor calls took place virtually.&lt;/li&gt;
  &lt;li&gt;Trino Python client 0.332.0 includes support for the spooling mode of the
client protocol.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;user-defined-functions-in-trino&quot;&gt;User-defined functions in Trino&lt;/h2&gt;

&lt;p&gt;First there were &lt;a href=&quot;/docs/current/develop/functions.html&quot;&gt;custom plugins with user-defined
functions&lt;/a&gt;, and for a long
time, that was all there was.&lt;/p&gt;

&lt;p&gt;In 2023, David contributed SQL user-defined functions, also known as SQL
routines, and we ran a &lt;a href=&quot;/blog/2023/11/09/routines.html&quot;&gt;competition for examples&lt;/a&gt;. Manfred wrote the docs and did a &lt;a href=&quot;/blog/2023/11/29/sql-training-4.html&quot;&gt;training session with
Dain and Martin&lt;/a&gt;. And even back then,
David had plans to add other languages, and started working on Python.&lt;/p&gt;

&lt;p&gt;At &lt;a href=&quot;/blog/2024/12/18/trino-summit-2024-quick-recap.html&quot;&gt;Trino Summit 2024&lt;/a&gt;, Martin Traverso announced the upcoming feature in his keynote, and with
&lt;a href=&quot;/docs/current/release/release-468.html&quot;&gt;Trino 468&lt;/a&gt; we shipped
support for &lt;a href=&quot;/docs/current/udf/python.html&quot;&gt;Python user-defined functions&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;motivation&quot;&gt;Motivation&lt;/h2&gt;

&lt;p&gt;Why support Python for user-defined functions, as compared to just SQL? Simply
put, more is better, and Python is everywhere. We chat with David about the
details.&lt;/p&gt;

&lt;h2 id=&quot;development-history-and-collaboration&quot;&gt;Development history and collaboration&lt;/h2&gt;

&lt;p&gt;David tells us more about figuring out how to make it all work. He touches
on topics such as security, performance, deployment, monitoring, and
collaboration with other projects. We also talk about why other approaches, like
using local CPython, were discarded.&lt;/p&gt;

&lt;h2 id=&quot;architecture-and-consequences&quot;&gt;Architecture and consequences&lt;/h2&gt;

&lt;p&gt;In this discussion we try to cover the following topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;How does it all work?&lt;/li&gt;
  &lt;li&gt;What are some restrictions?&lt;/li&gt;
  &lt;li&gt;What performance can users expect?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s chat about this nesting:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/episode/tcb68-python-udf-architecture.png&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;examples-and-demo&quot;&gt;Examples and demo&lt;/h2&gt;

&lt;p&gt;A simple example from the documentation:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;FUNCTION&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;python_udf_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input_parameter&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;data_type&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;RETURNS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result_data_type&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;LANGUAGE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PYTHON&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;handler&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;python_function&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;$$&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;python_function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
  &lt;span class=&quot;err&quot;&gt;$$&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
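
&lt;p&gt;To make the template concrete, here is a minimal sketch of an inline Python UDF
that doubles an integer. The function name
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;doubled&lt;/code&gt; and the handler
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;twice&lt;/code&gt; are illustrative names,
not part of any shipped example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;WITH
  FUNCTION doubled(x integer)
    RETURNS integer
    LANGUAGE PYTHON
    WITH (handler = 'twice')
    AS $$
    def twice(a):
        return a * 2
    $$
SELECT doubled(21);
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The Python code between the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$$&lt;/code&gt;
markers runs in the sandboxed environment, and the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;handler&lt;/code&gt; property names the
Python function Trino invokes for each row.&lt;/p&gt;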

&lt;p&gt;David shows us more, and we talk about the details.&lt;/p&gt;

&lt;h2 id=&quot;feedback-and-future-work&quot;&gt;Feedback and future work&lt;/h2&gt;

&lt;p&gt;We are looking for feedback:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;More examples for the documentation for our users&lt;/li&gt;
  &lt;li&gt;Use cases and experience testing the feature&lt;/li&gt;
  &lt;li&gt;Production deployment experiences&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Future work depends on the feedback but definitely includes the following:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Performance improvements&lt;/li&gt;
  &lt;li&gt;Fine-tuning of available Python packages&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.python.org/&quot;&gt;Python&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://webassembly.org/&quot;&gt;WebAssembly (Wasm)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://chicory.dev/&quot;&gt;Chicory&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/udf.html&quot;&gt;Trino user-defined functions overview&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/udf/python.html&quot;&gt;Python user-defined functions&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino-wasm-python&quot;&gt;trino-wasm-python&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;You are all invited to chat with us about development at the Trino contributor
call on the 23rd of January.&lt;/li&gt;
  &lt;li&gt;Join us on the 30th of January with Mateusz Gajewski to learn about client
protocol improvements.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Host</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino in 2024 and beyond</title>
      <link href="https://trino.io/blog/2025/01/07/2024-and-beyond.html" rel="alternate" type="text/html" title="Trino in 2024 and beyond" />
      <published>2025-01-07T00:00:00+00:00</published>
      <updated>2025-01-07T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2025/01/07/2024-and-beyond</id>
      <content type="html" xml:base="https://trino.io/blog/2025/01/07/2024-and-beyond.html">&lt;p&gt;Wow, what an amazing year 2024 was for Trino! Martin Traverso presented about
the achievements and progress of the project at the &lt;a href=&quot;/blog/2024/12/18/trino-summit-2024-quick-recap.html&quot;&gt;recent Trino Summit
2024&lt;/a&gt;. Let me dive
deeper into the content of his keynote and elaborate some more about our amazing
plans for the future.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;statistics&quot;&gt;Statistics&lt;/h2&gt;

&lt;p&gt;In the first slide of his presentation &lt;strong&gt;Enduring with persistence to reach the
summit&lt;/strong&gt;, Martin presented some of the amazing statistics of the year:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Over 30 releases packed with features and improvements - &lt;a href=&quot;/docs/current/release.html#releases-2024&quot;&gt;Trino releases 436-467&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;5,000+ additional commits to the 40,000+ total commits since project start&lt;/li&gt;
  &lt;li&gt;225+ unique contributors in 2024, 925+ total&lt;/li&gt;
  &lt;li&gt;10.5k+ stars on GitHub&lt;/li&gt;
  &lt;li&gt;13,500+ Slack members&lt;/li&gt;
  &lt;li&gt;Trino Community Broadcast episodes 54-67&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;improvements&quot;&gt;Improvements&lt;/h2&gt;

&lt;p&gt;Some of the major improvements in Trino are:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Access controls with
&lt;a href=&quot;/docs/current/security/opa-access-control.html&quot;&gt;Open Policy Agent&lt;/a&gt; and
&lt;a href=&quot;/docs/current/security/ranger-access-control.html&quot;&gt;Apache Ranger&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Improved observability with &lt;a href=&quot;/docs/current/admin/event-listeners-openlineage.html&quot;&gt;OpenLineage&lt;/a&gt;, 
&lt;a href=&quot;/docs/current/admin/opentelemetry.html&quot;&gt;OpenTelemetry&lt;/a&gt;, OpenMetrics, and 
&lt;a href=&quot;/docs/current/admin/event-listeners-kafka.html&quot;&gt;Kafka&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Significant &lt;a href=&quot;/docs/current/client/client-protocol.html&quot;&gt;client protocol&lt;/a&gt; improvements&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/udf/python.html&quot;&gt;Python user-defined functions&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;New connectors such as &lt;a href=&quot;/docs/current/connector/faker.html&quot;&gt;Faker&lt;/a&gt;,
&lt;a href=&quot;/docs/current/connector/snowflake.html&quot;&gt;Snowflake&lt;/a&gt;, or
&lt;a href=&quot;/docs/current/connector/vertica.html&quot;&gt;Vertica&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Numerous improvements on object storage connectors and integrations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Of course we also paid a lot of attention to bug fixes and shipped tremendous
performance improvements.&lt;/p&gt;

&lt;h2 id=&quot;slides-and-video&quot;&gt;Slides and video&lt;/h2&gt;

&lt;p&gt;If you want to find out all the details, have a look at the
&lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-keynote.pdf&quot;&gt;&lt;strong&gt;slides&lt;/strong&gt;&lt;/a&gt;
and the video recording:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=wmR6kzOCo-I&quot;&gt;&lt;img src=&quot;https://img.youtube.com/vi/wmR6kzOCo-I/0.jpg&quot; alt=&quot;YouTube&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;other-projects&quot;&gt;Other projects&lt;/h2&gt;

&lt;p&gt;Martin also talked about the many improvements in other Trino projects such as
&lt;a href=&quot;https://trinodb.github.io/trino-gateway/&quot;&gt;Trino Gateway&lt;/a&gt;,
&lt;a href=&quot;https://github.com/trinodb/trino-python-client&quot;&gt;trino-python-client&lt;/a&gt;, the new
&lt;a href=&quot;https://github.com/trinodb/trino-js-client&quot;&gt;trino-js-client&lt;/a&gt;, and the new
&lt;a href=&quot;https://github.com/trinodb/trino-csharp-client&quot;&gt;trino-csharp-client&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;plans-for-2025&quot;&gt;Plans for 2025&lt;/h2&gt;

&lt;p&gt;For 2025, we have some pretty big plans in addition to our continued attention to
the software supply chain, performance improvements, and bug fixes.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Secrets management and dynamic catalogs&lt;/li&gt;
  &lt;li&gt;Client protocol improvements for all client drivers&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/22597&quot;&gt;Packaging improvements&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;More connectors such as DuckDB, LanceDB, HSQLDB, Loki, …&lt;/li&gt;
  &lt;li&gt;Continued and even increased work on performance improvements&lt;/li&gt;
  &lt;li&gt;Research and prototype towards a next generation optimizer&lt;/li&gt;
  &lt;li&gt;SQL language improvements such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PIVOT&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ASOF&lt;/code&gt; joins, …&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Of course, what really happens with Trino in 2025 depends on you all. The project
lives and breathes only thanks to the efforts of all our contributors and
maintainers, and we look forward to working with you all.&lt;/p&gt;

&lt;h2 id=&quot;trino-survey&quot;&gt;Trino survey&lt;/h2&gt;

&lt;p&gt;Besides filing issues, sending pull requests, and discussing topics on Slack and
GitHub, we also have some specific questions and would really appreciate your
feedback. Answering should take less than a minute.&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; target=&quot;_blank&quot; href=&quot;https://docs.google.com/forms/d/e/1FAIpQLSfrEIZ_5iyj17_hMJMdFhCIx9bQyHm6G-x6-CIq2VajURm6cQ/viewform?usp=sharing&quot;&gt;
        Help by answering the Trino survey
    &lt;/a&gt;
&lt;/div&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;With Trino as a huge collaborative effort, only one thing is certain:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;2025 will be an exciting year for Commander Bun Bun, Trino, and the Trino project.&lt;/p&gt;
&lt;/blockquote&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>Wow, what an amazing year 2024 was for Trino! Martin Traverso presented about the achievements and progress of the project at the recent Trino Summit 2024. Let me dive deeper into the content of his keynote and elaborate some more about our amazing plans for the future.</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino Summit 2024 resources</title>
      <link href="https://trino.io/blog/2024/12/18/trino-summit-2024-quick-recap.html" rel="alternate" type="text/html" title="Trino Summit 2024 resources" />
      <published>2024-12-18T00:00:00+00:00</published>
      <updated>2024-12-18T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/12/18/trino-summit-2024-quick-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2024/12/18/trino-summit-2024-quick-recap.html">&lt;p&gt;What a view we had at the summit! Over 700 live attendees enjoyed the sessions
and learned more about Trino-related use cases and projects. Now it is time for
the additional 1,000 registrants, our 13,000+ Trino users on
&lt;a href=&quot;/slack.html&quot;&gt;Slack&lt;/a&gt;, and everyone else in the Trino community
and beyond to enjoy the presentations and recordings at their leisure.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;day-1-sessions&quot;&gt;Day 1 sessions&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Enduring with persistence to reach the summit&lt;/strong&gt;
&lt;br /&gt;   Presented by Martin Traverso, co-creator of Trino and CTO at &lt;a href=&quot;/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/wmR6kzOCo-I&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-keynote.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Running Trino as exabyte-scale data warehouse&lt;/strong&gt;
&lt;br /&gt;   Presented by Alagappan Maruthappan from &lt;a href=&quot;/users.html#netflix&quot;&gt;Netflix&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/WuUS73QPuZE&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-netflix.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Data lake at Wise powered by Trino and Iceberg&lt;/strong&gt;
&lt;br /&gt;   Presented by Peter Kosztolanyi and Abdullah Alkhawatrah from &lt;a href=&quot;https://wise.com&quot;&gt;Wise&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/K5RmYtbeXAc&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Using Trino as a strangler fig&lt;/strong&gt;
&lt;br /&gt;   Presented by Trevor Kennedy from &lt;a href=&quot;https://www.fanduel.com/&quot;&gt;Fanduel&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/cVA5IPWdHRs&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-fanduel.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;A lakehouse that simply works&lt;/strong&gt;
&lt;br /&gt;   Presented by Vincenzo Cassaro from &lt;a href=&quot;https://prezi.com/&quot;&gt;Prezi&lt;/a&gt; 
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/6xdPRqpA8FA&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-prezi.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Empowering self-serve data analytics with a text-to-SQL assistant at LinkedIn&lt;/strong&gt; 
&lt;br /&gt;   Presented by Gaurav Ahlawat, Albert Chen, and Manas Bundele from
&lt;a href=&quot;/users.html#linkedin&quot;&gt;LinkedIn&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/rl4GLNEVkjo&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-linkedin-ai.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;How Trino and dbt unleashed many-to-many interoperability at Bazaar&lt;/strong&gt;
&lt;br /&gt;   Presented by Shahzad Siddiqi, Siddique Ahmad, and Usman Ghani from
  &lt;a href=&quot;/users.html#bazaar_technologies&quot;&gt;Bazaar&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/G9jafHdH8FY&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-bazaar.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Maximizing cost efficiency in data analytics with Trino and Iceberg&lt;/strong&gt;
&lt;br /&gt;   Presented by Gopi Bhagavathula from &lt;a href=&quot;https://www.branch.io/&quot;&gt;Branch&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/Yaz7fwvOPdY&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-branch.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Lessons and news from the AI world for Trino&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Manfred Moser, panel moderator and Trino maintainer at &lt;a href=&quot;/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Gunther Hagleitner, CEO and Co-founder at &lt;a href=&quot;https://waii.ai/&quot;&gt;Waii&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Rong Rong, Software Engineer at &lt;a href=&quot;https://character.ai/&quot;&gt;CharacterAI&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;William Chang, Co-founder and CTO of &lt;a href=&quot;/users.html#canner&quot;&gt;Canner&lt;/a&gt; and
&lt;a href=&quot;/ecosystem/client-application.html#wren-ai&quot;&gt;WrenAI&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Mustafa Sakalsiz, Founder and CEO at &lt;a href=&quot;/users.html#peaka&quot;&gt;Peaka&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dain Sundstrom, Trino co-creator and CTO at &lt;a href=&quot;/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/gobl6PhIWeE&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;day-2-sessions&quot;&gt;Day 2 sessions&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Trino for observability at Intuit&lt;/strong&gt; 
&lt;br /&gt;   Presented by Ujjwal Sharma and Riya John from &lt;a href=&quot;https://www.intuit.com/&quot;&gt;Intuit&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/47dMrURt7us&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-intuit.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Hassle-free dynamic policy enforcement in Trino&lt;/strong&gt;
&lt;br /&gt;   Presented by Ramanathan Ramu and Pratham Desai from &lt;a href=&quot;/users.html#linkedin&quot;&gt;LinkedIn&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/GAudNEmbvsc&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-linkedin-policy.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Empowering HugoBank’s digital services through Trino&lt;/strong&gt;
&lt;br /&gt;   Presented by Mustafa Mirza and Razi Moosa from &lt;a href=&quot;https://www.hugobank.com.pk&quot;&gt;HugoBank&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/51JVd25behQ&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-hugobank.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Optimizing Trino on Kubernetes: Helm chart enhancements for resilience and security&lt;/strong&gt; 
&lt;br /&gt;   Presented by Sebastian Daberdaku from &lt;a href=&quot;https://cardoai.com&quot;&gt;CardoAI&lt;/a&gt; and
Jan Waś from &lt;a href=&quot;/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/MGuOf45cGwA&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-cardoai.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Virtual view hierarchies with Trino&lt;/strong&gt;
&lt;br /&gt;   Presented by Rob Dickinson from &lt;a href=&quot;https://graylog.org/&quot;&gt;Graylog&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/z8eh_3vBpvg&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-graylog.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Opening up the Trino Gateway&lt;/strong&gt;
&lt;br /&gt;   Presented by Manfred Moser and Will Morrison from &lt;a href=&quot;/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;, 
&lt;br /&gt;   Vishal Jadhav from &lt;a href=&quot;https://www.bloomberg.com/company/values/tech-at-bloomberg/&quot;&gt;Bloomberg&lt;/a&gt;, and Jaehoo Yoo from &lt;a href=&quot;/users.html#naver&quot;&gt;Naver&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/MiQEngRJk8g&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-trino-gateway.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Wvlet: A new flow-style query language for functional data modeling and interactive analysis&lt;/strong&gt;
&lt;br /&gt;   Presented by Taro L. Saito from &lt;a href=&quot;/users.html#treasuredata&quot;&gt;Treasure Data&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/ot7z7J6h9rM&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-wvlet.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;
&lt;p&gt;&lt;strong&gt;Securing data pipelines at the storage layer&lt;/strong&gt;
&lt;br /&gt;   Presented by Andrew MacKay from &lt;a href=&quot;https://superna.io/&quot;&gt;Superna&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/Lxr4Rzn27cw&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-superna.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;
&lt;p&gt;&lt;strong&gt;Empowering pharmaceutical drug launches with Trino-powered sales data analytics&lt;/strong&gt;
&lt;br /&gt;   Presented by Harpreet Singh from &lt;a href=&quot;https://www.gilead.com/&quot;&gt;Gilead&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/ELsBGx1Sv3o&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;
&lt;p&gt;&lt;strong&gt;Connecting to Trino with C# and ADO.net&lt;/strong&gt; 
&lt;br /&gt;   Presented by George Fischer from &lt;a href=&quot;https://www.microsoft.com&quot;&gt;Microsoft&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/x2rF6IEjFK0&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-csharp-client.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Our thanks go out to all our speakers as well as our event sponsor:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/users.html#starburst&quot;&gt;
&lt;img src=&quot;/assets/images/logos/starburst.png&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;See you at Trino Fest 2025, one of our &lt;a href=&quot;/community.html#events&quot;&gt;other events and
meetings&lt;/a&gt;, and on &lt;a href=&quot;/slack.html&quot;&gt;Trino
Slack&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred, Monica, and Anna&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Monica Miller, Anna Schibli</name>
        </author>
      

      <summary>What a view we had at the summit! Over 700 live attendees enjoyed the sessions and learned more about Trino-related use cases and projects. Now it is time for the additional 1000 registrants, our 13000+ Trino users on Slack, and everyone else in the Trino community and beyond to enjoy the presentations and recordings at their leisure.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2024/recap-blog-banner.png" />
      
    </entry>
  
    <entry>
      <title>The long journey to Apache Ranger</title>
      <link href="https://trino.io/blog/2024/12/02/ranger.html" rel="alternate" type="text/html" title="The long journey to Apache Ranger" />
      <published>2024-12-02T00:00:00+00:00</published>
      <updated>2024-12-02T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/12/02/ranger</id>
      <content type="html" xml:base="https://trino.io/blog/2024/12/02/ranger.html">&lt;p&gt;&lt;a href=&quot;/ecosystem/add-on.html#apache-ranger&quot;&gt;Apache Ranger&lt;/a&gt; has
arrived! With the new &lt;a href=&quot;/docs/current/release/release-466.html&quot;&gt;Trino
466&lt;/a&gt; you all get another
jam-packed release of Trino awesomeness. One of the goodies is a new plugin for
access control for your data with Apache Ranger, and it has gone through a long
story to get here.&lt;/p&gt;

&lt;p&gt;Apache Ranger has a long history and wide adoption as an access control system
for data lakes using Hadoop and Hive. Since Trino brings fast analytics to this
space, and also supports modern data lakehouses and other data sources, Apache
Ranger is a natural fit for access control on a Trino-powered data platform.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;the-beginnings&quot;&gt;The beginnings&lt;/h2&gt;

&lt;p&gt;Apache Ranger has been in use with Trino for a long time - in fact there are
&lt;a href=&quot;https://github.com/trinodb/trino/pull/244&quot;&gt;early&lt;/a&gt;,
&lt;a href=&quot;https://github.com/trinodb/trino/pull/1069&quot;&gt;rudimentary&lt;/a&gt; pull requests from
2019 that implemented some support. And even before then, various hacks existed.
In 2020, a plugin for PrestoSQL was added to Apache Ranger. Aakash Nand blogged
about &lt;a href=&quot;https://towardsdatascience.com/integrating-trino-and-apache-ranger-b808f6b96ad8&quot;&gt;Integrating Trino and Apache
Ranger&lt;/a&gt;
in 2021 to adjust for the changes to Trino. Jeff Xu followed up with
&lt;a href=&quot;https://medium.com/@jeff.xu.z/integrating-trino-and-apache-ranger-in-a-kerberos-secured-enterprise-environment-997c95cd10e9&quot;&gt;Integrating Trino and Apache Ranger in a Kerberos-secured enterprise
environment&lt;/a&gt;
in 2022, followed quickly by the addition of Trino support to the Apache
Ranger repository.&lt;/p&gt;

&lt;h2 id=&quot;testing-and-container-images&quot;&gt;Testing and container images&lt;/h2&gt;

&lt;p&gt;However, that was only half of the needed support. The Trino project moves
very fast, with nearly weekly releases, so the best approach is to have the
supporting plugin in Trino directly, ensuring every release includes the
relevant updates. &lt;a href=&quot;https://github.com/dprophet&quot;&gt;Erik
Anderson&lt;/a&gt; created a more mature plugin that was in
production use with Trino for quite a while. His &lt;a href=&quot;https://github.com/trinodb/trino/pull/13297&quot;&gt;pull request from July
2022&lt;/a&gt; included great background
reasoning for having the plugin in Trino. One of the issues that Erik solved
for the Trino project is testing: Trino plugins require a container image to
test the integration. Apache Ranger still did not ship a container image in
2022, but thanks to Erik’s lobbying efforts this changed, and an image became
available over the following months.&lt;/p&gt;

&lt;h2 id=&quot;a-long-sprint&quot;&gt;A long sprint&lt;/h2&gt;

&lt;p&gt;Unfortunately, focus changed, and while the PR from Erik existed and was
useful, it never made it to merge due to waning priorities. That changed when
&lt;a href=&quot;https://github.com/mneethiraj&quot;&gt;Madhan Neethiraj&lt;/a&gt; from the Apache
Ranger project stepped up and created a &lt;a href=&quot;https://github.com/trinodb/trino/pull/22675&quot;&gt;new PR&lt;/a&gt; in July 2024.&lt;/p&gt;

&lt;p&gt;We knew this was another shot at getting it in, and that it would require a
lot of work, since we put a high focus on quality so that we can maintain the
Trino codebase for the long run. While monitoring all PRs regularly, &lt;a href=&quot;https://github.com/mosabua&quot;&gt;I (Manfred
Moser)&lt;/a&gt; noticed it and jumped in to help.&lt;/p&gt;

&lt;p&gt;Erik and other interested users chimed in.
&lt;a href=&quot;https://github.com/lozbrown&quot;&gt;lozbrown&lt;/a&gt; and Manfred helped with documentation
and getting other developers interested. The heavy technical reviews and lots of
guidance came from &lt;a href=&quot;https://github.com/ksobolew&quot;&gt;Krzysztof Sobolewski&lt;/a&gt; and
&lt;a href=&quot;https://github.com/kokosing&quot;&gt;Grzegorz Kokosiński&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;During the whole process, Madhan had to react to comments, update the code, and
also regularly rebase his PR to adjust for the constantly changing Trino
codebase in the master branch. Starburst recognized Madhan’s effort and
&lt;a href=&quot;https://www.starburst.io/community/trino-champions/&quot;&gt;featured him as Starburst Trino
Champion&lt;/a&gt;. Interestingly,
the container image ended up not being used for testing; however, it remains
crucially important for the many users deploying Apache Ranger on Kubernetes.
Nearly 400 comments and over four months later, we all got to celebrate. The
Trino maintainer Grzegorz took on the responsibility and merged the PR. &lt;a href=&quot;https://github.com/ebyhr&quot;&gt;Yuya
Ebihara&lt;/a&gt; and &lt;a href=&quot;https://github.com/martint&quot;&gt;Martin
Traverso&lt;/a&gt; followed up with
&lt;a href=&quot;https://github.com/trinodb/trino/pull/24238&quot;&gt;minor&lt;/a&gt;
&lt;a href=&quot;https://github.com/trinodb/trino/pull/24252&quot;&gt;cleanups&lt;/a&gt;, and we finally shipped
the plugin as part of &lt;a href=&quot;/docs/current/release/release-466.html&quot;&gt;Trino
466&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;A huge congratulations and thank you goes out to everyone involved.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now it is your turn to have a look at the
&lt;a href=&quot;/docs/current/security/apache-ranger-access-control.html&quot;&gt;documentation&lt;/a&gt;,
learn more about Trino and Apache Ranger, and maybe even proceed to help us
improve the integration.&lt;/p&gt;

&lt;h2 id=&quot;next-steps&quot;&gt;Next steps&lt;/h2&gt;

&lt;p&gt;Beyond our celebration, more tasks are waiting for all of us:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Test it out in your environment and migrate from any old or custom versions.&lt;/li&gt;
  &lt;li&gt;Help us improve the
&lt;a href=&quot;/docs/current/security/apache-ranger-access-control.html&quot;&gt;documentation&lt;/a&gt;
significantly to allow easier adoption.&lt;/li&gt;
  &lt;li&gt;Work with lozbrown on adding support to the &lt;a href=&quot;https://github.com/trinodb/charts&quot;&gt;Helm chart&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Check out the codebase and help us fix bugs and add features.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And last but not least, join us all to celebrate Trino at the upcoming &lt;a href=&quot;/blog/2024/11/22/trino-summit-2024-lineup.html&quot;&gt;Trino
Summit 2024 for two days of amazing sessions and interaction with your peers
from the Trino community&lt;/a&gt;
and the &lt;a href=&quot;/community.html#events&quot;&gt;Trino Contributor Call&lt;/a&gt; for
more open community chat and discussion.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>Apache Ranger has arrived! With the new Trino 466 you all get another jam-packed release of Trino awesomeness. One of the goodies is a new plugin for access control for your data with Apache Ranger, and it has gone through a long story to get here. Apache Ranger has a long history and wide adoption as an access control system for data lakes using Hadoop and Hive. Since Trino brings fast analytics to this space, and also supports modern data lakehouses and other data sources, Apache Ranger is a natural fit for access control on a Trino-powered data platform.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/images/logos/apache-ranger.png" />
      
    </entry>
  
    <entry>
      <title>The glorious lineup for Trino Summit 2024</title>
      <link href="https://trino.io/blog/2024/11/22/trino-summit-2024-lineup.html" rel="alternate" type="text/html" title="The glorious lineup for Trino Summit 2024" />
      <published>2024-11-22T00:00:00+00:00</published>
      <updated>2024-11-22T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/11/22/trino-summit-2024-lineup</id>
      <content type="html" xml:base="https://trino.io/blog/2024/11/22/trino-summit-2024-lineup.html">&lt;p&gt;We just wrapped up our mini training series &lt;a href=&quot;/blog/2024/11/21/sql-basecamps-view.html&quot;&gt;SQL basecamps before Trino
Summit&lt;/a&gt;, and now Trino Summit 2024
is less than three busy weeks away. It’s a good thing that we have also been
working hard on all the preparations for the summit. Everything is coming
together, and we are excited to share the full lineup for the free, virtual,
two-day event today.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;In &lt;a href=&quot;/blog/2024/10/17/trino-summit-2024-tease.html&quot;&gt;our first glimpse at the summit&lt;/a&gt; we shared a few sessions in
more detail. Now have a look at the whole lineup, with speakers from these
and many other companies:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2024/summit-wall.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Make sure you register to get up-to-date information and more details for all
the sessions. Registration allows you to join us live and chat with the
speakers during the event. You will also get important follow-up information,
including when recordings and slide decks become available, so you can review
sessions, watch anything you missed, and share them with your peers.&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; target=&quot;_blank&quot; href=&quot;https://www.starburst.io/info/trino-summit-2024/?utm_medium=trino&amp;amp;utm_source=website&amp;amp;[…]mpaign=NORAM-FY25-Q4-CM-Trino-Summit-2024&amp;amp;utm_content=blog-3&quot;&gt;
        Register now
    &lt;/a&gt;
&lt;/div&gt;

&lt;h2 id=&quot;keynote&quot;&gt;Keynote&lt;/h2&gt;

&lt;p&gt;In the keynote &lt;strong&gt;Enduring with persistence to reach the summit&lt;/strong&gt; Martin
Traverso, co-creator of Trino and CTO at
&lt;a href=&quot;/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;, covers the developments from
2024 in the Trino projects and the Trino community. Martin also reveals details
about new features, new projects, and plans for 2025.&lt;/p&gt;

&lt;h2 id=&quot;panel-discussion&quot;&gt;Panel discussion&lt;/h2&gt;

&lt;p&gt;The hype and reality of AI have swept through the industry. In the panel
discussion &lt;strong&gt;Lessons and news from the AI world for Trino&lt;/strong&gt;, Manfred Moser is
moderating experts from the community:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Gunther Hagleitner, CEO and Co-founder at &lt;a href=&quot;https://waii.ai/&quot;&gt;Waii&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Rong Rong, Software Engineer at &lt;a href=&quot;https://character.ai/&quot;&gt;CharacterAI&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;William Chang, Co-founder and CTO of &lt;a href=&quot;/users.html#canner&quot;&gt;Canner&lt;/a&gt; and
&lt;a href=&quot;/ecosystem/client#wren-ai&quot;&gt;WrenAI&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Mustafa Sakalsiz, Founder and CEO at &lt;a href=&quot;/users.html#peaka&quot;&gt;Peaka&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dain Sundstrom, Trino co-creator and CTO at &lt;a href=&quot;/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All panelists have extensive experience with AI and Trino, and will share
their knowledge and different perspectives.&lt;/p&gt;

&lt;h2 id=&quot;sessions&quot;&gt;Sessions&lt;/h2&gt;

&lt;p&gt;The following sessions allow our speakers to really dig into the details of
their topic:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Optimizing Trino on Kubernetes: Helm chart enhancements for resilience and
security&lt;/strong&gt; presented by Sebastian Daberdaku from
&lt;a href=&quot;https://cardoai.com/&quot;&gt;CardoAI&lt;/a&gt; and Jan Waś from
&lt;a href=&quot;/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Trino for Observability at Intuit&lt;/strong&gt; presented by Ujjwal Sharma and Riya John
from &lt;a href=&quot;https://www.intuit.com/&quot;&gt;Intuit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Opening up the Trino Gateway&lt;/strong&gt; presented by the Trino Gateway maintainers&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Data Lake at Wise powered by Trino and Iceberg&lt;/strong&gt; presented by Peter
Kosztolanyi and Abdallah Alkhawatrah from &lt;a href=&quot;https://wise.com&quot;&gt;Wise&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Hassle-free dynamic policy enforcement in Trino&lt;/strong&gt; presented by Ramanathan
Ramu and Pratham Desai from &lt;a href=&quot;/users.html#linkedin&quot;&gt;LinkedIn&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Empowering self-serve data analytics with a text-to-SQL assistant at
LinkedIn&lt;/strong&gt; presented by Gaurav Ahlawat, Albert Chen, and Manas Bundele from
&lt;a href=&quot;/users.html#linkedin&quot;&gt;LinkedIn&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;A Lakehouse that simply works&lt;/strong&gt; presented by Vincenzo Cassaro from
  &lt;a href=&quot;https://prezi.com/&quot;&gt;Prezi&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Securing data pipelines at the storage layer&lt;/strong&gt; presented by Andrew MacKay
from &lt;a href=&quot;https://superna.io/&quot;&gt;Superna&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Maximizing cost efficiency in data analytics with Trino and Iceberg&lt;/strong&gt;
presented by Gopi Bhagavathula from &lt;a href=&quot;https://www.branch.io/&quot;&gt;Branch&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Wvlet: A new flow-style query language for functional data modeling and
interactive analysis&lt;/strong&gt; presented by Taro L. Saito from &lt;a href=&quot;/users.html#treasuredata&quot;&gt;Treasure
Data&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Running Trino as exabyte-scale data warehouse&lt;/strong&gt; presented by Alagappan
Maruthappan from &lt;a href=&quot;/users.html#netflix&quot;&gt;Netflix&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;lightning-talks&quot;&gt;Lightning talks&lt;/h2&gt;

&lt;p&gt;Our lightning talks provide inspiration with some great examples of Trino
adoption and usage:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Using Trino as a strangler fig&lt;/strong&gt; presented by Trevor Kennedy from
&lt;a href=&quot;https://www.fanduel.com/&quot;&gt;Fanduel&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Virtual view hierarchies with Trino&lt;/strong&gt; presented by Rob Dickinson from
&lt;a href=&quot;https://graylog.org/&quot;&gt;Graylog&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Empowering HugoBank’s digital services through Trino&lt;/strong&gt; presented by Mustafa
Mirza and Razi Moosa from &lt;a href=&quot;https://www.hugobank.com.pk&quot;&gt;HugoBank&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;How Trino and dbt unleashed many-to-many interoperability at Bazaar&lt;/strong&gt;
presented by Shahzad Siddiqi, Siddique Ahmad, and Usman Ghani from
&lt;a href=&quot;/users.html#bazaar_technologies&quot;&gt;Bazaar&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Connecting to Trino with C# and ADO.net&lt;/strong&gt; presented by George Fischer from
&lt;a href=&quot;https://www.microsoft.com&quot;&gt;Microsoft&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Our special thanks go out to all our speakers as well as our event sponsor:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/users.html#starburst&quot;&gt;
&lt;img src=&quot;/assets/images/logos/starburst.png&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;See you on the summit.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred, Monica, and Anna&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Monica Miller, Anna Schibli</name>
        </author>
      

      <summary>We just wrapped up our mini training series SQL basecamps before Trino Summit, and now Trino Summit 2024 is less than three busy weeks away. It’s a good thing that we have also been working hard on all the preparations for the summit. Everything is coming together, and we are excited to share the full lineup for the free, virtual, two-day event today.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2024/lineup-blog-banner.png" />
      
    </entry>
  
    <entry>
      <title>View the SQL basecamps before Trino Summit</title>
      <link href="https://trino.io/blog/2024/11/21/sql-basecamps-view.html" rel="alternate" type="text/html" title="View the SQL basecamps before Trino Summit" />
      <published>2024-11-21T00:00:00+00:00</published>
      <updated>2024-11-21T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/11/21/sql-basecamps-view</id>
      <content type="html" xml:base="https://trino.io/blog/2024/11/21/sql-basecamps-view.html">&lt;p&gt;Trino Summit is inching closer fast, and we are busy with all the preparation.
Nevertheless, we thought we would bring you some more SQL and Trino-related training.
The two live classes from our &lt;a href=&quot;/blog/2024/10/07/sql-basecamps.html&quot;&gt;SQL basecamps before Trino Summit&lt;/a&gt; are now available for you all to enjoy, just in
case you missed them.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;For the two classes I teamed up with Dain Sundstrom and Martin Traverso to
create interview-style training sessions. Hopefully you learned something from
their insights, and my guidance and questions.&lt;/p&gt;

&lt;p&gt;Check out the two session recordings and the supporting material:&lt;/p&gt;

&lt;h2 id=&quot;moving-supplies&quot;&gt;Moving supplies&lt;/h2&gt;

&lt;p&gt;In the first episode &lt;strong&gt;SQL basecamp 1 – Moving supplies&lt;/strong&gt; Dain and I discussed
the core concepts of a Trino-powered lakehouse, getting data in and maintaining
the lakehouse.&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
  &lt;a class=&quot;btn btn-pink&quot; target=&quot;_blank&quot; href=&quot;https://trinodb.github.io/presentations/presentations/moving-supplies/index.html&quot;&gt;
    Look at the slides
  &lt;/a&gt;
&lt;/div&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/LyBSHiCd2A8&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h2 id=&quot;getting-ready-to-summit&quot;&gt;Getting ready to summit&lt;/h2&gt;

&lt;p&gt;The second episode &lt;strong&gt;SQL Basecamp 2 – Getting ready to summit&lt;/strong&gt; builds on the
foundation established in episode 1. Martin and I discussed some further details
for lakehouse usage and then looked at structural data types and views.&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
  &lt;a class=&quot;btn btn-pink&quot; target=&quot;_blank&quot; href=&quot;https://trinodb.github.io/presentations/presentations/getting-ready-to-summit/index.html&quot;&gt;
    Look at the slides
  &lt;/a&gt;
&lt;/div&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/32uGABdBCTQ&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h2 id=&quot;next-up-trino-summit&quot;&gt;Next up, Trino Summit&lt;/h2&gt;

&lt;p&gt;If you think those two sessions were great, how about two days’ worth of great
presentations at Trino Summit?&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; href=&quot;https://www.starburst.io/info/trino-summit-2024/?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=NORAM-FY25-Q4-CM-Trino-Summit-2024&amp;amp;utm_content=sql-series-recap-blog&quot;&gt;
        Register now
    &lt;/a&gt;
&lt;/div&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>Trino Summit is inching closer fast, and we are busy with all the preparation. Nevertheless, we thought we would bring you some more SQL and Trino-related training. The two live classes from our SQL basecamps before Trino Summit are now available for you all to enjoy, just in case you missed them.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2024/sql-basecamps-2024.png" />
      
    </entry>
  
    <entry>
      <title>Trino and Javascript?! YES!</title>
      <link href="https://trino.io/blog/2024/11/18/javascript.html" rel="alternate" type="text/html" title="Trino and Javascript?! YES!" />
      <published>2024-11-18T00:00:00+00:00</published>
      <updated>2024-11-18T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/11/18/javascript</id>
      <content type="html" xml:base="https://trino.io/blog/2024/11/18/javascript.html">&lt;p&gt;Trino is written in Java. Trino contributors and maintainers are often veterans
in the Java ecosystem and community, and Trino is very modern when it comes to
Java. For example, Trino now requires the latest Java version and actively uses
new features.&lt;/p&gt;

&lt;p&gt;When it comes to JavaScript however, the story is a bit more complicated. Of
course, JavaScript is commonly used in the Trino ecosystem and codebase. Let’s
look at some of the specifics.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;client-driver-and-applications&quot;&gt;Client driver and applications&lt;/h2&gt;

&lt;p&gt;Client applications that allow users to submit queries to Trino and then
receive the results are written in numerous languages. Trino has good support
for &lt;a href=&quot;/ecosystem/index.html#clients&quot;&gt;many of them&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Thanks to the collaboration with &lt;a href=&quot;https://github.com/regadas&quot;&gt;Filipe Regadas&lt;/a&gt;
and the contribution of his JavaScript client driver to the Trino community, we
now have an official
&lt;a href=&quot;https://github.com/trinodb/trino-js-client&quot;&gt;trino-js-client&lt;/a&gt; project. After his
initial donation we have applied numerous improvements and recently cut our
first release.&lt;/p&gt;

&lt;p&gt;The client is already used in the &lt;a href=&quot;/ecosystem/client#vscode&quot;&gt;Visual Studio Code
support&lt;/a&gt;, the &lt;a href=&quot;/ecosystem/client#emacs&quot;&gt;Emacs
support&lt;/a&gt;, the example project discussed
in &lt;a href=&quot;/episodes/63.html&quot;&gt;Trino Community Broadcast episode 63&lt;/a&gt;,
and numerous other applications.&lt;/p&gt;

&lt;p&gt;And we have big plans as well:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for more authentication methods supported in Trino&lt;/li&gt;
  &lt;li&gt;Improve documentation and example projects&lt;/li&gt;
  &lt;li&gt;Add support for the new spooling client protocol from Trino&lt;/li&gt;
  &lt;li&gt;Test with Trino Gateway and adjust as needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While this project is a great addition for many users of Trino and their custom
web applications, there are numerous other usages of JavaScript in the project.&lt;/p&gt;

&lt;h2 id=&quot;user-interfaces&quot;&gt;User interfaces&lt;/h2&gt;

&lt;p&gt;Web-based user interfaces are one important use of JavaScript. Trino includes
the &lt;a href=&quot;/docs/current/admin/web-interface.html&quot;&gt;Trino Web UI&lt;/a&gt; and
the ongoing effort to replace it with a more modern and feature-rich UI -
currently called the &lt;a href=&quot;/docs/current/admin/preview-web-interface.html&quot;&gt;Preview
UI&lt;/a&gt;. It was
inspired by the replacement of the legacy UI for &lt;a href=&quot;https://trinodb.github.io/trino-gateway/&quot;&gt;Trino
Gateway&lt;/a&gt; with a new UI based on
current tools and libraries.&lt;/p&gt;

&lt;p&gt;All three user interfaces require constant work: keeping libraries up to
date, fixing bugs, and adding new features.&lt;/p&gt;

&lt;h2 id=&quot;other-projects&quot;&gt;Other projects&lt;/h2&gt;

&lt;p&gt;Beyond the user interfaces we also provide a &lt;a href=&quot;https://github.com/trinodb/grafana-trino&quot;&gt;plugin for
Grafana&lt;/a&gt; that is mostly written in
JavaScript, and there might be more projects on the way.&lt;/p&gt;

&lt;h2 id=&quot;whats-next&quot;&gt;What’s next?&lt;/h2&gt;

&lt;p&gt;The skills and experience needed for these JavaScript-based efforts are
different enough from Trino internals that developers can help with them
without knowing much about Trino and Java.&lt;/p&gt;

&lt;p&gt;If that is you, we want to hear from you. And if you are knowledgeable in
Trino, Java, and many other things, and also interested in helping with the
JavaScript work, we want to hear from you too. There is always more we want
to get done, and we need your help.&lt;/p&gt;

&lt;p&gt;So have a look at the codebase that interests you the most, chat with us on
&lt;a href=&quot;/slack.html&quot;&gt;Trino Slack&lt;/a&gt;, join an &lt;a href=&quot;/community.html#events&quot;&gt;upcoming Trino contributor
call&lt;/a&gt; and &lt;a href=&quot;/blog/2024/10/17/trino-summit-2024-tease.html&quot;&gt;Trino Summit&lt;/a&gt;, and let me know if you would be
interested in a regular Trino JavaScript call - for example monthly?&lt;/p&gt;

&lt;p&gt;And if you don’t want to code in Java or JavaScript? Well, you can help us write
&lt;a href=&quot;https://github.com/trinodb/trino/tree/master/docs&quot;&gt;documentation in Markdown&lt;/a&gt;,
work on the &lt;a href=&quot;https://github.com/trinodb/trino-python-client&quot;&gt;Python client&lt;/a&gt;, the
&lt;a href=&quot;https://github.com/trinodb/trino-go-client&quot;&gt;Go client&lt;/a&gt;, or maybe even
contribute a client we don’t even have yet.&lt;/p&gt;

&lt;p&gt;In all cases, we look forward to your help.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>Trino is written in Java. Trino contributors and maintainers are often veterans in the Java ecosystem and community, and Trino is very modern when it comes to Java. For example, Trino now requires the latest Java version and actively uses new features. When it comes to JavaScript however, the story is a bit more complicated. Of course, JavaScript is commonly used in the Trino ecosystem and codebase. Let’s look at some of the specifics.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/images/logos/javascript-small.png" />
      
    </entry>
  
    <entry>
      <title>67: Extra speed with Exasol and Trino</title>
      <link href="https://trino.io/episodes/67.html" rel="alternate" type="text/html" title="67: Extra speed with Exasol and Trino" />
      <published>2024-10-30T00:00:00+00:00</published>
      <updated>2024-10-30T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/67</id>
      <content type="html" xml:base="https://trino.io/episodes/67.html">&lt;h2 id=&quot;host&quot;&gt;Host&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director of Trino
Community Leadership at &lt;a href=&quot;https://trino.io/users.html#starburst&quot;&gt;Starburst&lt;/a&gt; - 
&lt;a href=&quot;https://x.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://www.firebolt.io/&quot;&gt;Firebolt&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/thomas-bestfleisch/&quot;&gt;Thomas Bestfleisch&lt;/a&gt;, 
Senior Product Manager at &lt;a href=&quot;https://www.exasol.com/&quot;&gt;Exasol&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-and-news&quot;&gt;Releases and news&lt;/h2&gt;

&lt;p&gt;Following are some highlights of the recent Trino releases:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-461.html&quot;&gt;Trino 461&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;add_files&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;add_files_from_table&lt;/code&gt; procedures in the
Iceberg connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-462.html&quot;&gt;Trino 462&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for read operations when using the Unity catalog as Iceberg REST
catalog in the Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Improve performance and memory usage when decoding data in the CLI.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-463.html&quot;&gt;Trino 463&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Enable HTTP/2 for internal communication by default.&lt;/li&gt;
  &lt;li&gt;Add &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;timezone()&lt;/code&gt; functions.&lt;/li&gt;
  &lt;li&gt;Include table functions with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW FUNCTIONS&lt;/code&gt; output.&lt;/li&gt;
  &lt;li&gt;Add support for writing change data feed when deletion vector is enabled to
the Delta Lake connector.&lt;/li&gt;
&lt;/ul&gt;
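&lt;p&gt;As a quick, hedged example of the new functions from Trino 463 (the exact
result of the first query depends on your session time zone):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Returns the name of the current session time zone, for example 'UTC'
SELECT timezone();

-- Returns the time zone identifier of a timestamp with time zone
SELECT timezone(TIMESTAMP '2024-10-30 12:00:00 Europe/Berlin');
&lt;/code&gt;&lt;/pre&gt;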

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-464.html&quot;&gt;Trino 464&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Require JDK 23 to run Trino.&lt;/li&gt;
  &lt;li&gt;Add the Faker connector.&lt;/li&gt;
  &lt;li&gt;Add the Vertica connector.&lt;/li&gt;
  &lt;li&gt;Remove the Accumulo connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As always, numerous performance improvements, bug fixes, and other features were
added as well.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trino maintainer call - great sync with some exciting news coming to the community soon.&lt;/li&gt;
  &lt;li&gt;Trino contributor call - &lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-24-oct-2024&quot;&gt;recording and minutes available now&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Trino Kubernetes operator meeting - minutes coming soon.&lt;/li&gt;
  &lt;li&gt;Trino Summit call for speakers closed - stay tuned for announcements and
&lt;a href=&quot;/blog/2024/10/17/trino-summit-2024-tease.html&quot;&gt;don’t forget to register&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;introducing-thomas-and-exasol&quot;&gt;Introducing Thomas and Exasol&lt;/h2&gt;

&lt;p&gt;Exasol is a lightning-fast, in-memory database for analytics. And this is not
just a marketing slogan: Exasol has been at the top of the TPC-H benchmarks for
a long time now. Thomas tells us more about the database and his role.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/logos/exasol-small.png&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;exasol-and-trino&quot;&gt;Exasol and Trino&lt;/h2&gt;

&lt;p&gt;Trino and Exasol bridge the gap between extreme performance with in-memory usage
from Exasol, and massive scale from a lakehouse with Trino.&lt;/p&gt;

&lt;p&gt;We learn more about Exasol as Thomas guides us through his &lt;a href=&quot;/assets/episode/tcb67-exasol.pdf&quot;&gt;presentation about
Exasol and Trino&lt;/a&gt;, and take
the opportunity to question him for more details.&lt;/p&gt;

&lt;p&gt;The pull request for the Exasol connector has been a long time in the works and
was finally merged for Trino 452. We talk about the motivation, the process,
the results, and the future for the connector.&lt;/p&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.exasol.com/&quot;&gt;Exasol&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/connector/exasol.html&quot;&gt;Trino’s Exasol connector&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.exasol.com/exasol-saas/&quot;&gt;Exasol SaaS&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/exasol/ai-lab&quot;&gt;Exasol AI lab&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://hub.docker.com/r/exasol/docker-db&quot;&gt;Exasol container&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2024/10/07/sql-basecamps.html&quot;&gt;SQL basecamps before Trino Summit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2024/10/17/trino-summit-2024-tease.html&quot;&gt;Trino Summit 2024&lt;/a&gt;:
Information about first sessions and more available. Call for speakers closed.
Announcements coming soon.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Host</summary>

      
      
    </entry>
  
    <entry>
      <title>A glimpse at the summit</title>
      <link href="https://trino.io/blog/2024/10/17/trino-summit-2024-tease.html" rel="alternate" type="text/html" title="A glimpse at the summit" />
      <published>2024-10-17T00:00:00+00:00</published>
      <updated>2024-10-17T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/10/17/trino-summit-2024-tease</id>
      <content type="html" xml:base="https://trino.io/blog/2024/10/17/trino-summit-2024-tease.html">&lt;p&gt;Our efforts around &lt;a href=&quot;/blog/2024/07/11/trino-summit-2024-call-for-speakers.html&quot;&gt;Trino Summit 2024&lt;/a&gt; are ramping up and the event
is creeping closer and closer. We are really looking forward to the two-day,
free, virtual event in December about all things Trino.&lt;/p&gt;

&lt;p&gt;While we are working hard to put together the &lt;a href=&quot;/blog/2024/10/07/sql-basecamps.html&quot;&gt;SQL basecamps before Trino Summit
training sessions&lt;/a&gt; and &lt;a href=&quot;/community.html#events&quot;&gt;other community
events&lt;/a&gt;, a number of your awesome peers
from the Trino community submitted session proposals, and we are excited to
share this glimpse of the agenda for Trino Summit 2024.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;first-batch-of-sessions&quot;&gt;First batch of sessions&lt;/h2&gt;

&lt;p&gt;Let’s see what has already settled on the agenda.&lt;/p&gt;

&lt;h3 id=&quot;running-trino-as-exabyte-scale-data-warehouse&quot;&gt;Running Trino as exabyte-scale data warehouse&lt;/h3&gt;

&lt;p&gt;Presented by Alagappan Maruthappan from &lt;a href=&quot;https://netflix.com&quot;&gt;Netflix&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Netflix operates over 15 Trino clusters, efficiently handling more than 10
million queries each month. As the initial creator of Apache Iceberg,
Netflix has over 1 million Iceberg tables and makes extensive use of the Trino
Iceberg connector. In this session we talk about the operational challenges faced,
internal efficiency improvements, and our experience with upgrading to the
latest Trino version.&lt;/p&gt;

&lt;h3 id=&quot;a-lakehouse-that-simply-works&quot;&gt;A Lakehouse that simply works&lt;/h3&gt;

&lt;p&gt;Presented by Vincenzo Cassaro from &lt;a href=&quot;https://prezi.com/&quot;&gt;Prezi&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With billions of tech and vendor proposals, it’s easy to lose track of what
truly matters. Vincenzo shows how a simple combination of established,
maintained, open source technologies can make a lakehouse that truly works for a
company with 150 million users.&lt;/p&gt;

&lt;h3 id=&quot;how-trino-and-dbt-unleashed-many-to-many-interoperability-at-bazaar&quot;&gt;How Trino and dbt unleashed many-to-many interoperability at Bazaar&lt;/h3&gt;

&lt;p&gt;Presented by Shahzad Siddiqi, Siddique Ahmad, and Usman Ghani from
&lt;a href=&quot;/users.html#bazaar_technologies&quot;&gt;Bazaar&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Learn how Bazaar leveraged the combined power of Trino and dbt to scale their
data platform effectively. This talk delves into the strategies and technologies
used to enable many-to-many integration, fueling data-driven decision-making
across the organization.&lt;/p&gt;

&lt;h3 id=&quot;maximizing-cost-efficiency-in-data-analytics-with-trino-and-iceberg&quot;&gt;Maximizing cost efficiency in data analytics with Trino and Iceberg&lt;/h3&gt;

&lt;p&gt;Presented by Gopi Bhagavathula from &lt;a href=&quot;https://www.branch.io/&quot;&gt;Branch&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At Branch, we realized that our existing architecture was not only expensive
but also becoming unsustainable as data volumes grew for one of our business
units, so we decided to adopt Trino and Apache Iceberg. Our journey of migrating
from Apache Druid to Trino and Iceberg taught us that the right combination of
tools can transform data analytics for one of our internal business units,
offering the perfect balance between cost savings, performance, and scalability.
Learn how we achieved 7-figure savings with a few “compromises”.&lt;/p&gt;

&lt;h3 id=&quot;using-trino-as-a-strangler-fig&quot;&gt;Using Trino as a strangler fig&lt;/h3&gt;

&lt;p&gt;Presented by Trevor Kennedy from &lt;a href=&quot;https://www.fanduel.com/&quot;&gt;Fanduel&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This talk discusses how FanDuel uses Trino to migrate analysts from Redshift to
Delta Lake using Martin Fowler’s Strangler Fig pattern. Trino slowly took root
after initial trials, started replacing parts of the legacy system, and
eventually will be a complete replacement with a shadow of the original system.&lt;/p&gt;

&lt;h3 id=&quot;enduring-with-persistence-to-reach-the-summit&quot;&gt;Enduring with persistence to reach the summit&lt;/h3&gt;

&lt;p&gt;Presented by Martin Traverso from &lt;a href=&quot;/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the keynote Martin presents the latest and greatest news from the Trino
project and the Trino community. With more contributors, more maintainers, and a
larger community we got a lot done since Trino Fest in June. Find out the
details from the co-creator of Trino.&lt;/p&gt;

&lt;p&gt;Surely, you don’t need any more convincing and you are ready to proceed to&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; href=&quot;https://www.starburst.io/info/trino-summit-2024/?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=NORAM-FY25-Q4-CM-Trino-Summit-2024-IMC-Upgrade&amp;amp;utm_content=blog-2&quot;&gt;
        Register to attend!
    &lt;/a&gt;
&lt;/div&gt;

&lt;h2 id=&quot;continued-call-for-speakers&quot;&gt;Continued call for speakers&lt;/h2&gt;

&lt;p&gt;Now that you have registered and seen what others submitted and got accepted,
we are sure you are thinking:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Well, that’s interesting, but I can submit a talk like that, and even better!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We agree and know you are up to it, so go ahead and submit a proposal:&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; href=&quot;https://sessionize.com/trino-summit-2024&quot;&gt;
        Submit a talk!
    &lt;/a&gt;
&lt;/div&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;And if necessary, check the &lt;a href=&quot;/blog/2024/07/11/trino-summit-2024-call-for-speakers.html&quot;&gt;original announcement&lt;/a&gt; for more tips and ideas.&lt;/p&gt;

&lt;h2 id=&quot;sponsor-trino-summit&quot;&gt;Sponsor Trino Summit&lt;/h2&gt;

&lt;p&gt;To make the event a smashing hit, we are also looking for more sponsors.
Starburst, as the organizing sponsor of the event, is excited to collaborate
with other organizations from the Trino community. If you are
interested in sponsoring, email
&lt;a href=&quot;mailto:events@starburstdata.com?subject=Sponsor%20Trino%20Summit&quot;&gt;events@starburstdata.com&lt;/a&gt;
for information.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Monica Miller, Anna Schibli</name>
        </author>
      

      <summary>Our efforts around Trino Summit 2024 are ramping up and the event is creeping closer and closer. We are really looking forward to the two-day, free, virtual event in December about all things Trino. While we are working hard to put together the SQL basecamps before Trino Summit training sessions and other community events, a number of your awesome peers from the Trino community submitted session proposals, and we are excited to share this glimpse of the agenda for Trino Summit 2024.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2024/lineup-blog-banner.png" />
      
    </entry>
  
    <entry>
      <title>A Kubernetes operator for Trino?</title>
      <link href="https://trino.io/blog/2024/10/10/operator.html" rel="alternate" type="text/html" title="A Kubernetes operator for Trino?" />
      <published>2024-10-10T00:00:00+00:00</published>
      <updated>2024-10-10T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/10/10/operator</id>
      <content type="html" xml:base="https://trino.io/blog/2024/10/10/operator.html">&lt;p&gt;Trino is deployed everywhere – on-premise, in private data centers, in the cloud
with hosting providers, on bare metal servers, on virtual machines, and with
containers. With all these options for deployments, a Kubernetes-based platform
with a container emerged as the most widely used approach.&lt;/p&gt;

&lt;p&gt;The Trino project caters for this usage with our &lt;a href=&quot;/docs/current/installation/containers.html&quot;&gt;container
images&lt;/a&gt; for every
release and our &lt;a href=&quot;https://github.com/trinodb/charts&quot;&gt;Helm chart&lt;/a&gt;. However, we keep
hearing from people who want to use a Kubernetes operator…&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;existing-operators&quot;&gt;Existing operators&lt;/h2&gt;

&lt;p&gt;We know that various companies have Kubernetes operators developed internally,
and we also know that open source ones exist, for example:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/stackabletech/trino-operator&quot;&gt;trino-operator&lt;/a&gt; from
Stackable with integration in
&lt;a href=&quot;https://github.com/stackabletech/trino-lb&quot;&gt;trino-lb&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://charmhub.io/trino-k8s&quot;&gt;Charmed Trino K8s Operator&lt;/a&gt; from Canonical&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ideally these separate efforts can combine their work and create a great
operator in the Trino project that is closely aligned with Trino itself, and
also suitable for future integration with Trino Gateway. In fact, Trino
Gateway is a good example of different parties coming together and innovating
considerably. Hopefully we can achieve the same with the operator. It can
still be extensible and modular to suit specific needs on different
platforms and for different users.&lt;/p&gt;

&lt;p&gt;We also know that this is a long-standing community wish, documented in &lt;a href=&quot;https://github.com/trinodb/trino/issues/396&quot;&gt;the
issue&lt;/a&gt; and various discussions with
users.&lt;/p&gt;

&lt;h2 id=&quot;discussing-next-steps&quot;&gt;Discussing next steps&lt;/h2&gt;

&lt;p&gt;However, there are some complications, such as the choice of programming
language or the commitment to help within the Trino project as subproject
maintainers. We kicked off some of these discussions in the past at Trino
contributor meetings, and hope that now is a good time to continue.&lt;/p&gt;

&lt;p&gt;To that end we are arranging a community meeting:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Virtual video call&lt;/li&gt;
  &lt;li&gt;30th of October 2024&lt;/li&gt;
  &lt;li&gt;8:00 PDT / 11:00 EDT / 15:00 GMT / 16:00 CET&lt;/li&gt;
  &lt;li&gt;Invite available from Manfred on Trino Slack or via email:&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;mailto:manfred@starburst.io?subject=trino-k8s-operator&quot;&gt;
        Tell Manfred you want to join
    &lt;/a&gt;
&lt;/div&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;We will also post connection details on the #kubernetes channel and we are
collecting related discussion points on
&lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-kubernetes-operator-discussion-30-oct-2024&quot;&gt;our contributor meeting page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Looking forward to a great discussion.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Martin Traverso</name>
        </author>
      

      <summary>Trino is deployed everywhere – on-premise, in private data centers, in the cloud with hosting providers, on bare metal servers, on virtual machines, and with containers. With all these options for deployments, a Kubernetes-based platform with a container emerged as the most widely used approach. The Trino project caters for this usage with our container images for every release and our Helm chart. However we keep hearing from people who want to use a Kubernetes operator…</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/images/logos/kubernetes.png" />
      
    </entry>
  
    <entry>
      <title>SQL basecamps before Trino Summit</title>
      <link href="https://trino.io/blog/2024/10/07/sql-basecamps.html" rel="alternate" type="text/html" title="SQL basecamps before Trino Summit" />
      <published>2024-10-07T00:00:00+00:00</published>
      <updated>2024-10-07T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/10/07/sql-basecamps</id>
      <content type="html" xml:base="https://trino.io/blog/2024/10/07/sql-basecamps.html">&lt;p&gt;Later in December your knowledge of our Trino SQL query engine will certainly
peak again at &lt;a href=&quot;/blog/2024/07/11/trino-summit-2024-call-for-speakers.html&quot;&gt;Trino Summit 2024&lt;/a&gt;. To reach those heights and
absorb all there is to learn at Trino Summit, you need to get ready.&lt;/p&gt;

&lt;p&gt;That is why I teamed up with our &lt;a href=&quot;/development/roles#benevolent-dictators-for-life-&quot;&gt;Trino creators and
BDFLs&lt;/a&gt; –
Martin Traverso, Dain Sundstrom, and David Phillips. We aim to be your coaches
and trainers to get you ready and get to the summit without the need for oxygen
masks and sherpas. Join us for the &lt;strong&gt;“SQL basecamps before Trino Summit”&lt;/strong&gt;,
where we expand on our &lt;a href=&quot;https://www.youtube.com/watch?v=SnvSBYhRZLg&amp;amp;list=PLFnr63che7wYzZoo5yyEF5R1QrOH6VRq3&quot;&gt;past SQL training
series&lt;/a&gt;
with two new episodes.&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; href=&quot;https://www.starburst.io/info/sql-basecamps-before-trino-summit/?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=NORAM-FY25-Q4-SQL-Basecamps-Before-Trino-Summit&amp;amp;utm_content=blog-1&quot;&gt;
        Register now
    &lt;/a&gt;
&lt;/div&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Both planned sessions provide a high-level overview and some practical tips and
tricks over the course of an hour. Each session concludes with an open
question-and-answer section with the speakers.&lt;/p&gt;

&lt;h2 id=&quot;moving-supplies&quot;&gt;Moving supplies&lt;/h2&gt;

&lt;p&gt;In the first episode &lt;strong&gt;SQL basecamp 1 – Moving supplies&lt;/strong&gt; David and Dain will
help me provide an overview of the wide range of possibilities when it comes to
moving data to Trino and moving data with Trino.&lt;/p&gt;

&lt;p&gt;We specifically look at the strengths of Trino for running your data lakehouse
and migrating to it from legacy data lakes or other systems. SQL skills
discussed include tips for creating schemas and tables, adding and updating
data, and inspecting metadata. We talk about table procedures for data
management and also cover some operational aspects. For example, we talk about
the right configuration in your catalogs for your object storage, specifically
the new file system support in Trino.&lt;/p&gt;

&lt;h2 id=&quot;getting-ready-to-summit&quot;&gt;Getting ready to summit&lt;/h2&gt;

&lt;p&gt;The second episode &lt;strong&gt;SQL Basecamp 2 – Getting ready to summit&lt;/strong&gt; builds on the
foundation established in episode 1. Data has moved into the lakehouse, powered
by Trino, and more data is added and changed as part of normal operation. In
this episode Martin and I look at maintaining the data in a healthy state
and explore some tips and tricks for querying data. For example, we look at data
management with procedures, analyzing data with window functions, and examine
more complex structural data.&lt;/p&gt;

&lt;h2 id=&quot;what-do-want-to-learn&quot;&gt;What do you want to learn?&lt;/h2&gt;

&lt;p&gt;So there you have it - enough reason to register. Well, if not, we can do better:
Both sessions are aimed at all of you out there using Trino and we are ready to
discuss your questions during class. More importantly though, I would also love
to hear your suggestions for these and other topics about SQL and Trino. We can
adjust this series, figure out a session for Trino Summit, or bring another SQL
training series to you next year.&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;mailto:manfred@starburst.io?subject=SQL%20basecamp%20idea&quot;&gt;
        Submit an idea to Manfred
    &lt;/a&gt;
&lt;/div&gt;

&lt;h2 id=&quot;trino-summit-needs-you&quot;&gt;Trino Summit needs you!&lt;/h2&gt;

&lt;p&gt;Now with all that in mind, what are you waiting for? Get ready to learn more
about SQL with Trino in the series and at Trino Summit.&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; href=&quot;https://www.starburst.io/info/sql-basecamps-before-trino-summit/?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=NORAM-FY25-Q4-SQL-Basecamps-Before-Trino-Summit&amp;amp;utm_content=blog-1&quot;&gt;
        I am convinced - register now
    &lt;/a&gt;
&lt;/div&gt;

&lt;p&gt;And of course, we are also interested in your 
&lt;a href=&quot;https://sessionize.com/trino-summit-2024&quot;&gt;speaker proposals&lt;/a&gt; and 
&lt;a href=&quot;mailto:events@starburstdata.com?subject=Sponsor%20Trino%20Summit%202024&quot;&gt;sponsorships&lt;/a&gt;
for Trino Summit to make it an awesome event for everyone again.&lt;/p&gt;

&lt;p&gt;See you soon,&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>Later in December your knowledge of our Trino SQL query engine will certainly peak again at Trino Summit 2024. To reach those heights and absorb all there is to learn at Trino Summit, you need to get ready. That is why I teamed up with our Trino creators and BDFLs – Martin Traverso, Dain Sundstrom, and David Phillips. We aim to be your coaches and trainers to get you ready and get to the summit without the need for oxygen masks and sherpas. Join us for the “SQL basecamps before Trino Summit”, where we expand on our past SQL training series with two new episodes. Register now</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2024/sql-basecamps-2024.png" />
      
    </entry>
  
    <entry>
      <title>23 is a go, keeping pace with Java</title>
      <link href="https://trino.io/blog/2024/09/17/java-23.html" rel="alternate" type="text/html" title="23 is a go, keeping pace with Java" />
      <published>2024-09-17T00:00:00+00:00</published>
      <updated>2024-09-17T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/09/17/java-23</id>
      <content type="html" xml:base="https://trino.io/blog/2024/09/17/java-23.html">&lt;p&gt;Only about ten Trino releases or six months ago, we released &lt;a href=&quot;https://trino.io/docs/current/release/release-447.html&quot;&gt;Trino
447&lt;/a&gt; with the requirement to
use Java 22. In recent releases we started to take more and more advantage of
features that are only available with that upgrade. We made some big steps in
terms of performance and talked about some of those performance
enhancements around aircompressor in the recent &lt;a href=&quot;https://trino.io/episodes/65.html&quot;&gt;Trino Community Broadcast
65&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The Java community runs its release processes on a very predictable schedule -
March and September mean new Java releases. This time it’s Java 23, and
Trino will not be left behind. We are upgrading to &lt;a href=&quot;https://github.com/trinodb/trino/issues/21316&quot;&gt;use and require Java
23&lt;/a&gt; soon!&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;background-and-motivation&quot;&gt;Background and motivation&lt;/h2&gt;

&lt;p&gt;While the new features and improvements in Java 23 are not as impactful as in
Java 22, we still need to keep pace to take advantage of the improvements and
avoid any problems in the future. Here are the Java Enhancement Proposals that
are &lt;a href=&quot;https://openjdk.org/projects/jdk/23/&quot;&gt;included with Java 23&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.org/jeps/455&quot;&gt;JEP 455:	Primitive Types in Patterns, instanceof, and switch (Preview)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.org/jeps/466&quot;&gt;JEP 466:	Class-File API (Second Preview)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.org/jeps/467&quot;&gt;JEP 467:	Markdown Documentation Comments&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.org/jeps/469&quot;&gt;JEP 469:	Vector API (Eighth Incubator)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.org/jeps/473&quot;&gt;JEP 473:	Stream Gatherers (Second Preview)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.org/jeps/471&quot;&gt;JEP 471:	Deprecate the Memory-Access Methods in sun.misc.Unsafe for Removal&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.org/jeps/474&quot;&gt;JEP 474:	ZGC: Generational Mode by Default&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.org/jeps/476&quot;&gt;JEP 476:	Module Import Declarations (Preview)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.org/jeps/477&quot;&gt;JEP 477:	Implicitly Declared Classes and Instance Main Methods (Third Preview)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.org/jeps/480&quot;&gt;JEP 480:	Structured Concurrency (Third Preview)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.org/jeps/481&quot;&gt;JEP 481:	Scoped Values (Third Preview)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.org/jeps/482&quot;&gt;JEP 482:	Flexible Constructor Bodies (Second Preview)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more you can check out the &lt;a href=&quot;https://www.youtube.com/watch?v=ymuv5aUzWu0&quot;&gt;short summary
video&lt;/a&gt; or the &lt;a href=&quot;https://www.youtube.com/watch?v=QG9xKpgwOI4&quot;&gt;three hour long
launch stream&lt;/a&gt;. The &lt;a href=&quot;https://www.oracle.com/news/announcement/oracle-releases-java-23-2024-09-17/&quot;&gt;Oracle press
release&lt;/a&gt;
as well as the &lt;a href=&quot;https://blogs.oracle.com/java/post/the-arrival-of-java-23&quot;&gt;community
announcement&lt;/a&gt; also
bring you a wealth of further information.&lt;/p&gt;

&lt;p&gt;Overall our reasoning is unchanged from the &lt;a href=&quot;/blog/2023/11/03/java-21.html&quot;&gt;upgrade to 21&lt;/a&gt; and the &lt;a href=&quot;/blog/2024/03/13/java-22.html&quot;&gt;upgrade to 22&lt;/a&gt;.
So what are we specifically doing now?&lt;/p&gt;

&lt;h2 id=&quot;current-status-and-plans&quot;&gt;Current status and plans&lt;/h2&gt;

&lt;p&gt;Early access binaries have been in use in our continuous integration builds for
months. Java 23 launched today and the various JDK distribution binary packages
will become available shortly. We are executing on the same blueprint as last
time:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Wait for &lt;a href=&quot;https://adoptium.net/temurin/releases/&quot;&gt;Eclipse Temurin&lt;/a&gt; binaries.&lt;/li&gt;
  &lt;li&gt;Ensure everything works with Java 23.&lt;/li&gt;
  &lt;li&gt;Change the container image to use Java 23.&lt;/li&gt;
  &lt;li&gt;Cut a release and get community feedback from testing with the container.&lt;/li&gt;
  &lt;li&gt;Adjust to any feedback and available improvements for a few releases.&lt;/li&gt;
  &lt;li&gt;Switch the requirement for build and runtime to Java 23.&lt;/li&gt;
  &lt;li&gt;Cut another release and celebrate.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Timing on all the work depends on obstacles we find on the way and how we
progress with removing them. We use the &lt;a href=&quot;https://github.com/trinodb/trino/issues/21316&quot;&gt;Java 23 tracking
issue&lt;/a&gt; and the linked issues and
pull requests to manage progress, discuss next steps, and work with the
community.&lt;/p&gt;

&lt;p&gt;Feel free to chime in there, find us on the &lt;a href=&quot;https://trinodb.slack.com/messages/C07ABNN828M&quot;&gt;#core-dev
channel&lt;/a&gt; on the &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;Trino
community Slack&lt;/a&gt; or join us for a &lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings&quot;&gt;contributor
call&lt;/a&gt;.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Mateusz Gajewski</name>
        </author>
      

      <summary>Only about ten Trino releases or six months ago, we released Trino 447 with the requirement to use Java 22. In recent releases we started to take more and more advantage of features that are only available with that upgrade. We made some big steps in terms of performance and talked about some of those performance enhancements around aircompressor in the recent Trino Community Broadcast 65. The Java community runs its release processes on a very predictable schedule - March and September mean new Java releases. This time it’s Java 23, and Trino will not be left behind. We are upgrading to use and require Java 23 soon!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/images/logos/java-duke-23.png" />
      
    </entry>
  
    <entry>
      <title>66: Chat with Trino and Wren AI</title>
      <link href="https://trino.io/episodes/66.html" rel="alternate" type="text/html" title="66: Chat with Trino and Wren AI" />
      <published>2024-09-12T00:00:00+00:00</published>
      <updated>2024-09-12T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/66</id>
      <content type="html" xml:base="https://trino.io/episodes/66.html">&lt;h2 id=&quot;host&quot;&gt;Host&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director of Trino
Community Leadership at &lt;a href=&quot;https://trino.io/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;,
(&lt;a href=&quot;https://x.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/himanshu-mendapara-a732051aa/&quot;&gt;Himanshu Mendapra&lt;/a&gt;, 
Software Engineer at &lt;a href=&quot;https://begenuin.com/&quot;&gt;Genuin&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/wwwy3y3/&quot;&gt;William Chang&lt;/a&gt;, 
CTO and Co-Founder at &lt;a href=&quot;https://cannerdata.com/&quot;&gt;Canner&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/yadiacolindres/&quot;&gt;Yadia Colindres&lt;/a&gt;, 
Product Management Advisor at &lt;a href=&quot;https://cannerdata.com/&quot;&gt;Canner&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-and-news&quot;&gt;Releases and news&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-458.html&quot;&gt;Trino 458&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Deactivate legacy file system support for all catalogs. You must activate the
desired file system support with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fs.native-azure.enabled&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fs.native-gcs.enabled&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fs.native-s3.enabled&lt;/code&gt;, or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fs.hadoop.enabled&lt;/code&gt; in
each catalog using the Delta Lake, Hive, Hudi, or Iceberg connectors.&lt;/li&gt;
  &lt;li&gt;Add support for tracing with OpenTelemetry to the JDBC driver.&lt;/li&gt;
  &lt;li&gt;Reduce data transfer from remote systems for queries with large &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IN&lt;/code&gt; lists in
numerous connectors.&lt;/li&gt;
&lt;/ul&gt;
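As a rough illustration of the activation described above, a catalog file might now contain an explicit file system property. The sketch below assumes a hypothetical Iceberg catalog on S3; the catalog name, metastore URI, and region are illustrative, and only the `fs.native-s3.enabled` property name comes from the release notes.

```properties
# etc/catalog/example.properties -- illustrative catalog configuration
connector.name=iceberg
iceberg.catalog.type=hive_metastore
hive.metastore.uri=thrift://metastore.example.com:9083
# As of Trino 458, file system support must be activated explicitly
fs.native-s3.enabled=true
s3.region=us-east-1
```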

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-459.html&quot;&gt;Trino 459&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Docker container now uses Java 23. Please test this and let us know of any
problems since Java 23 is going to be a requirement soon.&lt;/li&gt;
  &lt;li&gt;Add support for KiB and similar data size units for the Trino CLI output.&lt;/li&gt;
  &lt;li&gt;Allow configuring maximum concurrent HTTP requests to Azure on every node.&lt;/li&gt;
  &lt;li&gt;Add support for WASB to Azure Storage file system support.&lt;/li&gt;
  &lt;li&gt;Improve cache hit ratio for the file system cache.&lt;/li&gt;
  &lt;li&gt;Remove the local file connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-460.html&quot;&gt;Trino 460&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for using an Alluxio cluster as file system cache.&lt;/li&gt;
  &lt;li&gt;Add support for WASBS to Azure Storage file system support.&lt;/li&gt;
  &lt;li&gt;Remove the atop connector.&lt;/li&gt;
  &lt;li&gt;Remove the Raptor connector.&lt;/li&gt;
  &lt;li&gt;Numerous performance improvements for the ClickHouse connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As usual, numerous performance improvements, bug fixes, and other features
have been added as well.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Updated and improved documentation for contributors for Trino, Trino Gateway,
and other Trino projects.&lt;/li&gt;
  &lt;li&gt;Jan Was steps up as subproject maintainer for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino-js-client&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Cristian Osiac, Jordan Zimmermann, and Pablo Arteaga are working on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;aws-proxy&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;introducing-himanshu&quot;&gt;Introducing Himanshu&lt;/h2&gt;

&lt;p&gt;Working at Genuin as a software engineer, learning about new technologies, and
occasionally &lt;a href=&quot;https://github.com/himanshu634&quot;&gt;contributing to open source
projects&lt;/a&gt; like Wren AI.&lt;/p&gt;

&lt;h2 id=&quot;introducing-william-and-yadia&quot;&gt;Introducing William and Yadia&lt;/h2&gt;

&lt;p&gt;William is co-founder at Canner and drives everything about Canner Enterprise
and Wren AI as CTO. Yadia works with William at Canner and is product manager
for Wren AI.&lt;/p&gt;

&lt;p&gt;We talk about the history of Canner and their usage of Trino in Canner
Enterprise.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/logos/canner-small.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Pivoting to talk about Wren AI, we learn about its architecture, use cases and
features, and continue along with an extensive demo of Wren AI.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/logos/wren-ai-small.png&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://cannerdata.com/&quot;&gt;Canner&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.getwren.ai/&quot;&gt;Wren AI&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/Canner/WrenAI/pull/535&quot;&gt;Pull request for Trino integration&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://docs.getwren.ai/oss/guide/connect/trino&quot;&gt;Trino as Wren AI data source documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.producthunt.com/posts/wren-ai-cloud&quot;&gt;Wren AI launch at producthunt&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;A call out to help us &lt;a href=&quot;https://github.com/trinodb/trino/issues/23121&quot;&gt;clean up and close old
issues&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2024/07/11/trino-summit-2024-call-for-speakers.html&quot;&gt;Trino Summit 2024&lt;/a&gt;
is coming on the 11th and 12th of December, and registration, call for
speakers, and sponsorship opportunities are open.&lt;/li&gt;
  &lt;li&gt;Join us for the next &lt;a href=&quot;https://trino.io/broadcast/index.html&quot;&gt;Trino Community Broadcast
67&lt;/a&gt; about the Exasol database and Trino connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Host</summary>

      
      
    </entry>
  
    <entry>
      <title>65: Performance boosts</title>
      <link href="https://trino.io/episodes/65.html" rel="alternate" type="text/html" title="65: Performance boosts" />
      <published>2024-09-12T00:00:00+00:00</published>
      <updated>2024-09-12T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/65</id>
      <content type="html" xml:base="https://trino.io/episodes/65.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director of Trino
Community Leadership at &lt;a href=&quot;https://trino.io/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;,
(&lt;a href=&quot;https://x.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://trino.io/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-and-news&quot;&gt;Releases and news&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-455.html&quot;&gt;Trino 455&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add query starting time in QueryStatistics in all event listeners, including
the new Kafka event listener.&lt;/li&gt;
  &lt;li&gt;Allow configuring endpoint for the native Azure filesystem.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-456.html&quot;&gt;Trino 456&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Invalid - release process errors resulted in invalid artifacts.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-457.html&quot;&gt;Trino 457&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improve performance of queries involving joins when fault-tolerant execution
is enabled.&lt;/li&gt;
  &lt;li&gt;Improve performance for LZ4, Snappy, and ZSTD compression and decompression.&lt;/li&gt;
  &lt;li&gt;Publish a JDBC driver JAR without bundled, third-party dependencies.&lt;/li&gt;
  &lt;li&gt;Improve performance for concurrent write operations on S3 by using lock-less
Delta Lake write reconciliation, made possible with the release of the AWS SDK
with S3 conditional write support.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As usual, numerous performance improvements, bug fixes, and other features
have been added as well.&lt;/p&gt;

&lt;h2 id=&quot;performance-boosters&quot;&gt;Performance boosters&lt;/h2&gt;

&lt;p&gt;We chat about some of the following aspects and projects and their impact on Trino:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Role and history of Aircompressor.&lt;/li&gt;
  &lt;li&gt;Foundation from Airlift.&lt;/li&gt;
  &lt;li&gt;Relation to Java 22, and soon 23.&lt;/li&gt;
  &lt;li&gt;Status and next steps for improved and modernized file system support.&lt;/li&gt;
  &lt;li&gt;A quick glance at client protocol improvements.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/airlift/aircompressor&quot;&gt;Aircompressor&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/airlift/airlift&quot;&gt;Airlift&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/object-storage.html&quot;&gt;Object storage and file system documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/14237&quot;&gt;Project Hummingbird&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/22271&quot;&gt;Project Swift&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;We chat about the &lt;a href=&quot;https://github.com/trinodb/trino/issues/23122&quot;&gt;recent cleanup of unused Slack
channels&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;A call out to help us &lt;a href=&quot;https://github.com/trinodb/trino/issues/23121&quot;&gt;clean up and close old
issues&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Check out our new &lt;a href=&quot;https://github.com/trinodb/presentations/tree/main/assets/backgrounds&quot;&gt;video call background
images&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2024/07/11/trino-summit-2024-call-for-speakers.html&quot;&gt;Trino Summit 2024&lt;/a&gt;
is coming on the 11th and 12th of December, and registration, call for
speakers, and sponsorship opportunities are open.&lt;/li&gt;
  &lt;li&gt;Join us for the next &lt;a href=&quot;https://trino.io/broadcast/index.html&quot;&gt;Trino Community Broadcast
66&lt;/a&gt; about Wren AI and Trino.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>64: Control with Open Policy Agent OPA</title>
      <link href="https://trino.io/episodes/64.html" rel="alternate" type="text/html" title="64: Control with Open Policy Agent OPA" />
      <published>2024-08-22T00:00:00+00:00</published>
      <updated>2024-08-22T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/64</id>
      <content type="html" xml:base="https://trino.io/episodes/64.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director of Trino
Community Leadership at &lt;a href=&quot;https://trino.io/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;,
(&lt;a href=&quot;https://x.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://trino.io/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/sebastian-bernauer-622b95167&quot;&gt;Sebastian Bernauer&lt;/a&gt;, Software Developer at &lt;a href=&quot;https://trino.io/users.html#stackable&quot;&gt;Stackable&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/soenkeliebau/&quot;&gt;Sönke Liebau&lt;/a&gt;, Co-Founder and CPO
at &lt;a href=&quot;https://trino.io/users.html#stackable&quot;&gt;Stackable&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-and-news&quot;&gt;Releases and news&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-454.html&quot;&gt;Trino 454&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improve performance for queries that contain multiple aggregate functions,
including &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DISTINCT&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Add Kafka event listener plugin (yet to be documented).&lt;/li&gt;
  &lt;li&gt;Add configuration for fetch size with JDBC-based connectors (yet to be documented).&lt;/li&gt;
  &lt;li&gt;Add support for writing Deletion Vectors with the Delta Lake connector.&lt;/li&gt;
  &lt;li&gt;Add new &lt;strong&gt;Resources&lt;/strong&gt; tab in the web interface with data from the new
lightweight query endpoint &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/v1/query?pruned=true&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Add new Preview Web UI (help us test and develop!).&lt;/li&gt;
  &lt;li&gt;Add S3 security mapping for the native S3 filesystem.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As usual, numerous performance improvements, bug fixes, and other features
have been added as well.&lt;/p&gt;

&lt;h2 id=&quot;stackable-opa-and-more&quot;&gt;Stackable, OPA, and more&lt;/h2&gt;

&lt;p&gt;We chat with Sönke and Sebastian about the following agenda topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;What is Stackable?&lt;/li&gt;
  &lt;li&gt;Open Policy Agent (OPA) authorization plugin
    &lt;ul&gt;
      &lt;li&gt;History&lt;/li&gt;
      &lt;li&gt;Recent development&lt;/li&gt;
      &lt;li&gt;Compatibility layer to Trino’s file-based access control&lt;/li&gt;
      &lt;li&gt;Quick demo on row filtering and column masking&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Auto-scaling Trino clusters using trino-lb
    &lt;ul&gt;
      &lt;li&gt;Differences between &lt;a href=&quot;https://trino.io/ecosystem/add-on.html#trino-gateway&quot;&gt;Trino
Gateway&lt;/a&gt; and
&lt;a href=&quot;https://trino.io/ecosystem/add-on.html#trino-lb&quot;&gt;trino-lb&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Other aspects we discuss include the following:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Performance considerations&lt;/li&gt;
  &lt;li&gt;Aspects of Trino on Kubernetes such as graceful shutdown,
PodDisruptionBudgets, and anti-affinity&lt;/li&gt;
  &lt;li&gt;Plans for next steps&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;other-resources&quot;&gt;Other resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/assets/episode/tcb64-stackable-opa-trino-lb.pdf&quot;&gt;Presentation slide deck&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;i class=&quot;fab fa-youtube watch-listen-icon&quot; title=&quot;Youtube&quot;&gt;&lt;/i&gt; Video for
&lt;a href=&quot;https://www.youtube.com/watch?v=fbqqapQbAv0&quot;&gt;Trino OPA Authorizer - Stackable and Bloomberg at Trino Summit
2023&lt;/a&gt; presented by Sönke from
Stackable and Pablo Arteaga from Bloomberg&lt;/li&gt;
  &lt;li&gt;&lt;i class=&quot;fab fa-github&quot; title=&quot;GitHub&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://github.com/stackabletech/trino-operator/tree/main/tests/templates/kuttl/opa-authorization/trino_rules&quot;&gt;Source code repo for
compatibility layer between Trino classic file-based access control JSON and
OPA/Trino&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;i class=&quot;fab fa-youtube watch-listen-icon&quot; title=&quot;Youtube&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=ATlq_l3WNiA&quot;&gt;Longer demo
video for row filtering and column
masking&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2024/07/11/trino-summit-2024-call-for-speakers.html&quot;&gt;Trino Summit 2024&lt;/a&gt;
is coming on the 11th and 12th of December, and registration, call for
speakers, and sponsorship opportunities are open.&lt;/li&gt;
  &lt;li&gt;Next &lt;a href=&quot;https://trino.io/broadcast/index.html&quot;&gt;Trino Community Broadcast 65&lt;/a&gt; about
the new Exasol connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>63: Querying with JS</title>
      <link href="https://trino.io/episodes/63.html" rel="alternate" type="text/html" title="63: Querying with JS" />
      <published>2024-08-01T00:00:00+00:00</published>
      <updated>2024-08-01T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/63</id>
      <content type="html" xml:base="https://trino.io/episodes/63.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director of Trino
Community Leadership at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;,
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest&quot;&gt;Guest&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://www.linkedin.com/in/emilyasunaryo&quot;&gt;Emily Sunaryo&lt;/a&gt;, DevRel Intern at
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;releases-and-news&quot;&gt;Releases and news&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-452.html&quot;&gt;Trino 452&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add Exasol connector.&lt;/li&gt;
  &lt;li&gt;Add support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;euclidean_distance()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dot_product()&lt;/code&gt;, and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cosine_distance()&lt;/code&gt; functions.&lt;/li&gt;
  &lt;li&gt;Add support for using the BigQuery Storage Read API when using the query table
function with the BigQuery connector.&lt;/li&gt;
  &lt;li&gt;Add &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query&lt;/code&gt; table function for full query pass-through to the ClickHouse
connector.&lt;/li&gt;
  &lt;li&gt;Numerous improvements on the Delta Lake, Hive, Hudi, and Iceberg connectors
and the related file system support in Trino.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-453.html&quot;&gt;Trino 453&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improved performance for non-equality joins.&lt;/li&gt;
  &lt;li&gt;Support for setting the SQL path for JDBC driver and CLI.&lt;/li&gt;
  &lt;li&gt;New &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;execute&lt;/code&gt; procedure to run arbitrary statements in the underlying data source.&lt;/li&gt;
  &lt;li&gt;Support for reading &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pgvector&lt;/code&gt; vector types in PostgreSQL connector.&lt;/li&gt;
  &lt;li&gt;Support for views when using the Iceberg JDBC catalog.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As usual, numerous performance improvements, bug fixes, and other features
have been added as well.&lt;/p&gt;

&lt;p&gt;Other noteworthy topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The &lt;a href=&quot;https://trinodb.github.io/trino-gateway/release-notes/&quot;&gt;Trino Gateway 10&lt;/a&gt;
release is out, and includes some major refactoring and new features.&lt;/li&gt;
  &lt;li&gt;The &lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-25-jul-2024&quot;&gt;Trino Contributor Call&lt;/a&gt;
recap is available. Note that the file system support will soon switch to the
new Trino-native implementations as default.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest-emily-sunaryo&quot;&gt;Guest Emily Sunaryo&lt;/h2&gt;

&lt;p&gt;Emily Sunaryo is a recent UC Berkeley graduate working in the Developer
Relations team at Starburst. She has a passion for both technical development
and also enablement of developer communities. With her degree in Data Science,
she is also interested in learning more about modern approaches to data
analytics and how emerging technologies can drive innovation in this space.&lt;/p&gt;

&lt;h2 id=&quot;trino-clients&quot;&gt;Trino clients&lt;/h2&gt;

&lt;p&gt;Trino clients come in many shapes and forms, but all of them allow users to run
SQL queries in Trino and access the results. They all use the Trino client REST
API. To make it easier for developers of these applications, as well as any
custom application, we provide a number of drivers as language-specific
wrappers. These include the JDBC driver, the Python client, the Go client, and
others.&lt;/p&gt;
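As a rough illustration of that shared REST protocol, the loop below sketches how a client submits a statement and then follows nextUri links until the results are exhausted. The HTTP layer is replaced by fake functions so the sketch runs standalone; the URIs and payload shapes are simplified stand-ins, not responses captured from a real coordinator.

```python
# Sketch of the polling loop at the core of every Trino client:
# POST the statement, then GET each nextUri until none remains,
# accumulating any "data" rows returned along the way.

def run_query(post, get, sql):
    """Submit sql via post(), follow nextUri links via get(), return all rows."""
    response = post("/v1/statement", sql)
    rows = []
    while True:
        rows.extend(response.get("data", []))
        next_uri = response.get("nextUri")
        if next_uri is None:
            # No further pages: the query is finished.
            return rows
        response = get(next_uri)

# Fake transport standing in for a coordinator (illustrative URIs/payloads).
_pages = {
    "/v1/statement/queued/q1/1": {"nextUri": "/v1/statement/executing/q1/2"},
    "/v1/statement/executing/q1/2": {"data": [[1, "a"], [2, "b"]]},
}

def fake_post(path, sql):
    return {"id": "q1", "nextUri": "/v1/statement/queued/q1/1"}

def fake_get(uri):
    return _pages[uri]

print(run_query(fake_post, fake_get, "SELECT * FROM t"))  # [[1, 'a'], [2, 'b']]
```

The language-specific drivers mentioned above wrap exactly this kind of loop, adding authentication, typed result mapping, and error handling.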

&lt;h2 id=&quot;javascript&quot;&gt;JavaScript&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;https://trino.io/assets/images/logos/javascript.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/regadas&quot;&gt;Filipe Regadas&lt;/a&gt; agreed to transfer his
&lt;a href=&quot;https://github.com/trinodb/trino-js-client&quot;&gt;trino-js-client&lt;/a&gt; project to
trinodb and is now a subproject maintainer. We are in the process of getting a
first release ready to ship. We would love for you to help us!&lt;/p&gt;

&lt;h2 id=&quot;learning-about-trino&quot;&gt;Learning about Trino&lt;/h2&gt;

&lt;p&gt;Emily’s journey and bringing it all together. From university and Starburst
internship to the Trino Community Broadcast, and a working demo web application.&lt;/p&gt;

&lt;h2 id=&quot;demo-time&quot;&gt;Demo time&lt;/h2&gt;

&lt;p&gt;Emily talks about her demo web application using React, npm, and various other
libraries and tools to build a data application. The data resides in Trino,
specifically in &lt;a href=&quot;https://www.starburst.io/platform/starburst-galaxy/&quot;&gt;Starburst
Galaxy&lt;/a&gt; to make the
management easier, and she uses the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino-js-client&lt;/code&gt; in her application to run
some pretty complex SQL queries against the NYC rideshare data set.&lt;/p&gt;

&lt;p&gt;Find more details in the
&lt;a href=&quot;https://github.com/emilysunaryo/trino-js-demo&quot;&gt;source code repository&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2024/07/11/trino-summit-2024-call-for-speakers.html&quot;&gt;Trino Summit 2024&lt;/a&gt;
is coming on the 11th and 12th of December, and registration, call for
speakers, and sponsorship opportunities are open.&lt;/li&gt;
  &lt;li&gt;Next &lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-22-aug-2024&quot;&gt;Trino Contributor Call&lt;/a&gt;
on the 22nd of August.&lt;/li&gt;
  &lt;li&gt;Next &lt;a href=&quot;https://trino.io/broadcast/index.html&quot;&gt;Trino Community Broadcast 64&lt;/a&gt; with
the &lt;a href=&quot;https://trino.io/users.html#stackable&quot;&gt;Stackable&lt;/a&gt; team about OPA on the 22nd
of August.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>62: A lakehouse that simply works at Prezi</title>
      <link href="https://trino.io/episodes/62.html" rel="alternate" type="text/html" title="62: A lakehouse that simply works at Prezi" />
      <published>2024-07-11T00:00:00+00:00</published>
      <updated>2024-07-11T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/62</id>
      <content type="html" xml:base="https://trino.io/episodes/62.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director of Trino
Community Leadership at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;,
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest&quot;&gt;Guest&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://www.linkedin.com/in/vincenzo-cassaro/&quot;&gt;Vincenzo Cassaro&lt;/a&gt; -
&lt;a href=&quot;https://twitter.com/viciocassaro&quot;&gt;@viciocassaro&lt;/a&gt;, Data Engineer at
&lt;a href=&quot;https://prezi.com/&quot;&gt;Prezi&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;releases-and-news&quot;&gt;Releases and news&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-451.html&quot;&gt;Trino 451&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for configuring a proxy for the S3 native file system.&lt;/li&gt;
  &lt;li&gt;Add &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t_pdf&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t_cdf&lt;/code&gt; functions.&lt;/li&gt;
  &lt;li&gt;Improve performance of certain queries involving window functions.&lt;/li&gt;
  &lt;li&gt;Lots of Iceberg connector improvements including support for incremental
refresh for basic materialized views.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Other noteworthy topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/oneonestar&quot;&gt;Star Poon (oneonestar)&lt;/a&gt; approved as new
subproject maintainer for &lt;a href=&quot;https://trinodb.github.io/trino-gateway/&quot;&gt;Trino Gateway&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2024/06/24/trino-fest-recap.html&quot;&gt;Recap blog post&lt;/a&gt; from Trino Fest
with video recordings and slides is now available.&lt;/li&gt;
  &lt;li&gt;Trino Contributor Congregation &lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-congregation-14-june-2024&quot;&gt;recap notes&lt;/a&gt; are also available.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techplay.jp/event/944074&quot;&gt;Trino Japan meetup&lt;/a&gt; happened on the 10th of July.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest-vincenzo-cassaro&quot;&gt;Guest Vincenzo Cassaro&lt;/h2&gt;

&lt;p&gt;Vincenzo has been working with data in all its forms, from data modeling to
analytics and ML, since he completed his masters in computer engineering in
Italy. He is joining us from there, more specifically from Sicily, to chat with
us about how he got into computers, learned about Trino, and ended up at Prezi
now.&lt;/p&gt;

&lt;h2 id=&quot;about-prezi&quot;&gt;About Prezi&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://prezi.com/&quot;&gt;Prezi&lt;/a&gt; probably doesn’t need any introduction, but just in
case: Prezi is a popular and powerful platform to create and show engaging
presentations, videos, and infographics.&lt;/p&gt;

&lt;h2 id=&quot;a-lakehouse-that-simply-works&quot;&gt;A Lakehouse that simply works&lt;/h2&gt;

&lt;p&gt;With so many different technologies and vendors making proposals, it’s easy to
lose track of what truly matters. We chat with Vincenzo Cassaro from Prezi about
how a simple combination of established, maintained, open source technologies
can make a lakehouse that truly works at the scale of a company with 150 million
users.&lt;/p&gt;

&lt;p&gt;Check out the &lt;a href=&quot;https://prezi.com/view/P4HYav74ficPkkTAHjXJ/&quot;&gt;Prezi slide deck for Vincenzo’s talk&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2024/07/11/trino-summit-2024-call-for-speakers.html&quot;&gt;Trino Summit 2024&lt;/a&gt; is coming on the 11th and 12th of December, and registration, call for
speakers, and sponsorship opportunities are open.&lt;/li&gt;
  &lt;li&gt;Next &lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-25-jul-2024&quot;&gt;Trino Contributor Call&lt;/a&gt; on the 25th of July.&lt;/li&gt;
  &lt;li&gt;Next Trino Community Broadcast on 1st of August.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>Announcing Trino Summit 2024</title>
      <link href="https://trino.io/blog/2024/07/11/trino-summit-2024-call-for-speakers.html" rel="alternate" type="text/html" title="Announcing Trino Summit 2024" />
      <published>2024-07-11T00:00:00+00:00</published>
      <updated>2024-07-11T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/07/11/trino-summit-2024-call-for-speakers</id>
      <content type="html" xml:base="https://trino.io/blog/2024/07/11/trino-summit-2024-call-for-speakers.html">&lt;p&gt;Fresh off the heels of &lt;a href=&quot;/blog/2024/06/24/trino-fest-recap.html&quot;&gt;Trino Fest 2024&lt;/a&gt;, where Commander Bun Bun was busy meeting the Trino community in-person,
we’re already looking forward to another, bigger event to round out the year in
Trino. For those who’ve been here a while, you know that can only mean one
thing: Trino Summit 2024. Much like last year, it will be a two-day, fully
virtual event, hosting a wide range of talks covering all things Trino on the
11th and 12th of December. Read on for more info, or if you’re already
convinced…&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; href=&quot;https://www.starburst.io/info/trino-summit-2024/?utm_medium=trino&amp;amp;utm_source=website&amp;amp;[…]Y25-Q4-CM-Trino-Summit-2024-IMC-Upgrade&amp;amp;utm_content=CFS-Blog&quot;&gt;
        Register to attend!
    &lt;/a&gt;
&lt;/div&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;join-us-online&quot;&gt;Join us online&lt;/h2&gt;

&lt;p&gt;Trino Summit is an event that brings together engineers, analysts, data
scientists, and anyone else interested in using or contributing to Trino. As the
biggest Trino event of the year, we’re excited to bring together professionals
from the big data and analytics community, so they can share experiences and
insights, make connections, and learn from each other.&lt;/p&gt;

&lt;p&gt;The event will be broadcast live, and speakers will be addressing questions
asked in chat, so if you want the full experience, make sure to register and
attend while the talks are happening. Even if you can’t make it, registering
means you’ll be notified when we post videos of all talks to the Trino YouTube
channel after the event, &lt;a href=&quot;https://www.starburst.io/info/trino-summit-2024/?utm_medium=trino&amp;amp;utm_source=website&amp;amp;[…]Y25-Q4-CM-Trino-Summit-2024-IMC-Upgrade&amp;amp;utm_content=CFS-Blog&quot;&gt;so don’t fret - sign up!&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;call-for-speakers&quot;&gt;Call for speakers&lt;/h2&gt;

&lt;p&gt;Interested in speaking? We want to hear from everyone in the Trino community
who has something to share. We are looking for full sessions (about 30 minutes)
and lightning talks (15 minutes). We welcome beginner to highly advanced
submissions for talks that are connected to Trino.&lt;/p&gt;

&lt;p&gt;A two-day event means we’ve got room for everything, so if you’re unsure about
whether to submit a talk, go ahead and do it! We’ll review all submissions, and
we’ll do our best to work with you to turn your talk into a smash hit. Some
possible topics include:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Best practices and use cases&lt;/li&gt;
  &lt;li&gt;Data lake, lakehouse, and data federation architectures&lt;/li&gt;
  &lt;li&gt;Query federation and data migrations&lt;/li&gt;
  &lt;li&gt;Table formats, file formats, and metadata catalogs&lt;/li&gt;
  &lt;li&gt;Optimizations and performance improvements&lt;/li&gt;
  &lt;li&gt;Data engineering, including data cleaning, batch and streaming architectures,
and maintenance&lt;/li&gt;
  &lt;li&gt;Streaming and other data ingestion and pipelines&lt;/li&gt;
  &lt;li&gt;Data science workflows and analytics&lt;/li&gt;
  &lt;li&gt;SQL analytics, business intelligence, dashboarding and other visualizations&lt;/li&gt;
  &lt;li&gt;Data governance and security&lt;/li&gt;
  &lt;li&gt;Writing advanced SQL queries and pipelines&lt;/li&gt;
  &lt;li&gt;Help for Trino deployment on-premise and in the cloud&lt;/li&gt;
  &lt;li&gt;Developing custom connectors and other plugins&lt;/li&gt;
  &lt;li&gt;Contributing to Trino&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Want to speak?&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; href=&quot;https://sessionize.com/trino-summit-2024&quot;&gt;
        Submit a talk!
    &lt;/a&gt;
&lt;/div&gt;

&lt;h2 id=&quot;sponsor-trino-summit&quot;&gt;Sponsor Trino Summit&lt;/h2&gt;

&lt;p&gt;Starburst is the organizing sponsor of the event, but to make Trino Summit a
smashing success, they’re excited and interested in collaborating with other
organizations within the community. If you are interested in sponsoring, email
&lt;a href=&quot;mailto:events@starburstdata.com&quot;&gt;events@starburstdata.com&lt;/a&gt; for information.&lt;/p&gt;

&lt;p&gt;And regardless of whether you’re planning on attending, speaking, or sponsoring,
we look forward to seeing you soon!&lt;/p&gt;</content>

      
        <author>
          <name>Cole Bowden, Manfred Moser, and Monica Miller</name>
        </author>
      

      <summary>Fresh off the heels of Trino Fest 2024, where Commander Bun Bun was busy meeting the Trino community in-person, we’re already looking forward to another, bigger event to round out the year in Trino. For those who’ve been here a while, you know that can only mean one thing: Trino Summit 2024. Much like last year, it will be a two-day, fully virtual event, hosting a wide range of talks covering all things Trino on the 11th and 12th of December. Read on for more info, or if you’re already convinced… Register to attend!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2024/summit-logo.png" />
      
    </entry>
  
    <entry>
      <title>Trino Fest 2024 recap</title>
      <link href="https://trino.io/blog/2024/06/24/trino-fest-recap.html" rel="alternate" type="text/html" title="Trino Fest 2024 recap" />
      <published>2024-06-24T00:00:00+00:00</published>
      <updated>2024-06-24T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/06/24/trino-fest-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2024/06/24/trino-fest-recap.html">&lt;p&gt;Trino Fest 2024 is successfully in the books! While over 100 enthusiastic
members of the community gathered in Boston, over 650 virtual attendees joined
us worldwide to learn from our expert speakers as they discussed topics such as
table formats, enhancements and optimizations, and use cases with Trino both
large and small. And now it is your chance to revisit the presentations or catch
up on everything you missed.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;impressions&quot;&gt;Impressions&lt;/h2&gt;

&lt;p&gt;Judging from early attendee and speaker feedback, everyone enjoyed
the event. Asked which sessions they liked best, the audience gave answers like&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;They were all very insightful.&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;All of it, but especially the realtime demos to see speed difference on query
optimization.&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;and &lt;em&gt;All of them, nothing was missed!&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Just like some attendees, our speakers travelled from Europe, Asia, and other
places, and enjoyed the event.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Thanks for organizing the awesome event and inviting me for the talk!&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Was great to finally meet you and we had a great time at Trino Fest!&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Thanks for a great event last week. It was a pleasure to meet you all.&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Many of us also &lt;a href=&quot;https://www.linkedin.com/posts/k-shreya-s_trinofest2024-bigdata-analytics-activity-7209236269774585857-p8-e?utm_source=share&amp;amp;utm_medium=member_desktop&quot;&gt;met Commander Bun Bun&lt;/a&gt;,
and &lt;a href=&quot;https://www.youtube.com/watch?v=4jPYpU9Jrrw&quot;&gt;we sent greetings to the remote audience as
well&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://trino.io/assets/blog/trino-fest-2024/cbb-manfred.jpg&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The keynote, the sessions, and all the talk in the hallways confirmed that Trino
continues to thrive and expand in usage. Large companies like &lt;a href=&quot;https://trino.io/users.html&quot;&gt;Apple, Microsoft,
LinkedIn, Amazon, and many other users&lt;/a&gt; openly talk
about shipping Trino as part of their products and using it internally as
well. Smaller companies either run Trino themselves or take advantage of
Trino-based products for all their data platform needs. Our sessions for Trino
Fest offered something to learn for everyone.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://trino.io/assets/blog/trino-fest-2024/hallway-chat.png&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;sponsors&quot;&gt;Sponsors&lt;/h2&gt;

&lt;p&gt;Bringing together the event was only possible thanks to the great Trino events
team around &lt;a href=&quot;https://www.linkedin.com/in/anna-schibli-418692172/&quot;&gt;Anna Schibli&lt;/a&gt;
at our main sponsor Starburst, and the assistance from all our other sponsors. A
heartfelt thank you from Commander Bun Bun and all of us goes out to you!&lt;/p&gt;

&lt;div class=&quot;container&quot;&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.starburst.io/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/starburst-small.png&quot; title=&quot;Starburst, event host and organizer&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.onehouse.ai/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/onehouse-small.png&quot; title=&quot;Onehouse, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.startree.ai/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/startree-small.png&quot; title=&quot;Startree, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.alluxio.io/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/alluxio-small.png&quot; title=&quot;Alluxio, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://cloudinary.com/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/cloudinary-small.png&quot; title=&quot;Cloudinary, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.upsolver.com/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/upsolver-small.png&quot; title=&quot;Upsolver, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;h2 id=&quot;sessions&quot;&gt;Sessions&lt;/h2&gt;

&lt;p&gt;Now, on to what you are really looking for: all the talks, speakers,
short recaps, slide decks, video recordings, and the Q&amp;amp;A sessions that followed,
ready for you. Enjoy!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What’s new in Trino this summer&lt;/strong&gt;
&lt;br /&gt;Presented by Martin Traverso from
&lt;a href=&quot;https://www.starburst.io&quot; target=&quot;_blank&quot;&gt;Starburst&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Martin recapped everything that’s happened in Trino over the last six months,
taking a look at the biggest new features and how Trino development is going
better than ever. He also gave a sneak peek at what we can expect soon in Trino.
&lt;br /&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=mk3n0_tAdZY&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-fest-2024/keynote.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Reducing query cost and query runtimes of Trino powered analytics platforms&lt;/strong&gt;
&lt;br /&gt;Presented by Jonas Irgens Kylling from
&lt;a href=&quot;https://dune.com/&quot; target=&quot;_blank&quot;&gt;Dune&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Jonas gave a detailed talk about how Dune has improved their performance of
Trino with a few key tweaks. That includes leveraging caching with Alluxio,
advanced cluster management, and storing, sampling, and filtering query results.
&lt;br /&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=11yhPXIXiBY&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-fest-2024/dune.pdf&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Enhancing Trino’s query performance and data management with Hudi: innovations and future&lt;/strong&gt;
&lt;br /&gt;Presented by Ethan Guo from
&lt;a href=&quot;https://www.onehouse.ai/&quot; target=&quot;_blank&quot;&gt;Onehouse&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Ethan gave a look into development on Hudi and Trino’s Hudi connector,
explaining multi-modal indexing and how it can improve query performance. He
also gave an overview of the roadmap and future of the connector.
&lt;br /&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=JMzS2BbeK0E&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-fest-2024/onehouse.pdf&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Trino Engineering @ Microsoft&lt;/strong&gt;
&lt;br /&gt;Presented by George Fisher and Ishan Patwa from
&lt;a href=&quot;https://www.microsoft.com/&quot; target=&quot;_blank&quot;&gt;Microsoft&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;George and Ishan gave a deep dive into what’s been going on with Microsoft’s
deployment and management of Trino. This included clients and integrations,
result caching, a sharded SQL connector, deep debugging and monitoring, and
seamless security integration with Azure.
&lt;br /&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=t7ndqYUhKSA&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Enhancing data governance in Trino with the OpenLineage integration&lt;/strong&gt;
&lt;br /&gt;Presented by Alok Kumar Prusty from
&lt;a href=&quot;https://www.apple.com/&quot; target=&quot;_blank&quot;&gt;Apple&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Alok’s lightning talk is all about how Apple deployed OpenLineage, an open
framework for data lineage collection and analysis, and built a Trino plugin to
publish OpenLineage-compliant events that can be viewed and monitored.
&lt;br /&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=A7hj1M7IYj8&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Best practices and insights when migrating to Apache Iceberg for data engineers&lt;/strong&gt;
&lt;br /&gt;Presented by Amit Gilad from
&lt;a href=&quot;https://cloudinary.com/&quot; target=&quot;_blank&quot;&gt;Cloudinary&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Amit shared how Cloudinary expanded their data lake to use Apache Iceberg. He
demonstrated how moving from Snowflake to an open table format allowed them to
reduce storage costs and leverage different query and processing engines to run
more powerful analytics at scale.
&lt;br /&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=dKQ2zShNlyQ&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-fest-2024/cloudinary.pdf&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Trino query intelligence: insights, recommendations, and predictions&lt;/strong&gt;
&lt;br /&gt;Presented by Marton Bod from &lt;a href=&quot;https://www.apple.com/&quot; target=&quot;_blank&quot;&gt;Apple&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Marton’s lightning talk explored how Apple has monitored and stored metadata for
every Trino query execution, then used that data for real-time cluster
dashboarding, self-service troubleshooting, and automatic generation of
recommendations for users.
&lt;br /&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=K3iSXOJNaSQ&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;The open source journey of the Trino Delta Lake Connector&lt;/strong&gt;
&lt;br /&gt;Presented by Marius Grama from
&lt;a href=&quot;https://www.starburst.io&quot; target=&quot;_blank&quot;&gt;Starburst&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Marius went into a deep dive on all the work and collaboration that’s gone into
making the Delta Lake connector in Trino a robust, first-class connector. Casual
discussions, engineers working together, GitHub issues filed by the community,
and innovative contributions have all come together, and Marius’ talk shows why
an open source community is so powerful.
&lt;br /&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=mPfRYdvDcMo&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-fest-2024/delta-lake.pdf&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Tiny Trino; new perspectives in small data&lt;/strong&gt;
&lt;br /&gt;Presented by Ben Jeter and Thomas Zugibe from
&lt;a href=&quot;https://www.executivehomes.com/&quot; target=&quot;_blank&quot;&gt;Executive Homes&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Ben and Tommy explore how Executive Homes uses Trino’s robust suite of
integrations to handle data at a small scale. Instead of petabytes, how about a
handful of gigabytes in several different systems? It’s something that Trino is
well-equipped to handle thanks to how well-supported it is in the data
ecosystem, and they explain why.
&lt;br /&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=ZcY9LJDdB6Y&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-fest-2024/executive-homes.pdf&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Bridging the divide: running Trino SQL on a vector data lake powered by Lance&lt;/strong&gt;
&lt;br /&gt;Presented by Lei Xu from &lt;a href=&quot;https://lancedb.com/&quot; target=&quot;_blank&quot;&gt;LanceDB&lt;/a&gt;
and Noah Shpak from &lt;a href=&quot;https://character.ai/&quot; target=&quot;_blank&quot;&gt;Character.ai&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Lei and Noah give an overview of LanceDB, how it works, and what makes it a
great database for multimodal AI. Then they dive into a Trino connector for
Lance, and explore how Trino slots into Character.AI’s workload to blend
analytics with training and generating new models.
&lt;br /&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=jmOsVbGfon0&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-fest-2024/lance-characterai.pdf&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;How FourKites runs a scalable and cost-effective log analytics solution to
handle petabytes of logs&lt;/strong&gt;
&lt;br /&gt;Presented by Arpit Garg from
&lt;a href=&quot;https://www.fourkites.com/&quot; target=&quot;_blank&quot;&gt;FourKites&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;With nearly a petabyte of logs being managed at FourKites, it shouldn’t be a
huge surprise that they’ve turned to Trino to understand and analyze
them. Arpit discusses how they’ve scaled log ingestion, strategically used S3
with Parquet to minimize storage costs, transformed and extracted those logs at
scale, and leveraged Trino to search and explore the datasets with Superset as a
frontend for visualization.
&lt;br /&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=xdCZBQJt-0g&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-fest-2024/fourkites.pdf&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Observing Trino&lt;/strong&gt;
&lt;br /&gt;Presented by Matt Stephenson from
&lt;a href=&quot;https://www.starburst.io&quot; target=&quot;_blank&quot;&gt;Starburst&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Starburst has built a comprehensive observability platform around Trino to
better serve its users and customers. Matt explored all the components of it,
including how to integrate with Jaeger, Prometheus, and ELK.
&lt;br /&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=v7p72Ggcc5I&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-fest-2024/observing-trino.pdf&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Accelerate Performance at Scale: Best Practices for Trino with Amazon S3&lt;/strong&gt;
&lt;br /&gt;Presented by Dai Ozaki from &lt;a href=&quot;https://aws.amazon.com/&quot; target=&quot;_blank&quot;&gt;AWS&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Dai’s talk explores best practices to get the most out of using Trino in
conjunction with Amazon S3. He discusses partitioning, scaling workloads,
reducing latency, and resolving common bottlenecks, providing valuable insights
for anyone trying to manage and deploy Trino with S3.
&lt;br /&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=cjUUcHlUKxQ&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-fest-2024/aws-s3.pdf&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;whats-next&quot;&gt;What’s next&lt;/h2&gt;

&lt;p&gt;While you are busy catching up, we are still working hard on a recap of the
Trino Contributor Congregation. We also had a lot of great conversations that
led to follow-up action items such as more pull requests to review, new
contributors to onboard, and more projects to work on.&lt;/p&gt;

&lt;p&gt;Make sure to &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;join the community on Slack&lt;/a&gt; to learn
more in the next little while.&lt;/p&gt;

&lt;p&gt;Oh, and one last thing…&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; href=&quot;https://www.starburst.io/info/trino-summit-2024/?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=NORAM-FY25-Q4-CM-Trino-Summit-2024-IMC-Upgrade&amp;amp;utm_content=Trino-Fest-Blog-Recap&quot;&gt;
        Trino Summit 2024 registration is open
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;See you soon,&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred, Cole, and Monica&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Cole Bowden, Monica Miller</name>
        </author>
      

      <summary>Trino Fest 2024 is successfully in the books! While over 100 enthusiastic members of the community gathered in Boston, over 650 virtual attendees joined us worldwide to learn from our expert speakers as they discussed topics such as table formats, enhancements and optimizations, and use cases with Trino both large and small. And now it is your chance to revisit the presentations or catch up on everything you missed.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2024/trino-fest-talk.jpg" />
      
    </entry>
  
    <entry>
      <title>61: Trino powers business intelligence</title>
      <link href="https://trino.io/episodes/61.html" rel="alternate" type="text/html" title="61: Trino powers business intelligence" />
      <published>2024-06-20T00:00:00+00:00</published>
      <updated>2024-06-20T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/61</id>
      <content type="html" xml:base="https://trino.io/episodes/61.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director of Trino
Community Leadership at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;,
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest&quot;&gt;Guest&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/patrick-pichler/&quot;&gt;Patrick Pichler&lt;/a&gt;, Owner and
co-founder at &lt;a href=&quot;https://www.creativedata.io/&quot;&gt;Creative Data&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-and-news&quot;&gt;Releases and news&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-449.html&quot;&gt;Trino 449&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add OpenLineage event listener.&lt;/li&gt;
  &lt;li&gt;Add support for views when using the Iceberg REST catalog.&lt;/li&gt;
  &lt;li&gt;Improve write performance for Parquet files in Hive, Iceberg, and Delta Lake
connector.&lt;/li&gt;
  &lt;li&gt;Improve equality delete performance in Iceberg connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-450.html&quot;&gt;Trino 450&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improve performance for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;first_value()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;last_value()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date_trunc()&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date_add()&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date_diff()&lt;/code&gt; functions.&lt;/li&gt;
  &lt;li&gt;Add support for concurrent &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt; queries in Delta
Lake connector.&lt;/li&gt;
  &lt;li&gt;Add support for reading UniForm tables in Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Add support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TRUNCATE&lt;/code&gt; in Iceberg and Memory connector.&lt;/li&gt;
  &lt;li&gt;Automatically configure BigQuery scan parallelism.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;first-recap-from-trino-fest-2024&quot;&gt;First recap from Trino Fest 2024&lt;/h2&gt;

&lt;p&gt;Cole and Manfred chat a bit about Trino Fest last week, mentioning that &lt;a href=&quot;https://www.youtube.com/playlist?list=PLFnr63che7waExsD4lWarA3ML4R2HH58A&quot;&gt;all
videos are now available&lt;/a&gt;,
and a blog post with slides and more material is coming as well.&lt;/p&gt;

&lt;h2 id=&quot;impression-from-trino-contributor-congregation&quot;&gt;Impression from Trino Contributor Congregation&lt;/h2&gt;

&lt;p&gt;Manfred and Dain led the discussions in the congregation. We are excited about
the many follow-ups for the project and the increased collaboration and
innovation.&lt;/p&gt;

&lt;h2 id=&quot;guest-patrick-pichler&quot;&gt;Guest Patrick Pichler&lt;/h2&gt;

&lt;p&gt;Patrick specializes in providing guidance, designing, and implementing
sustainable data, analytics and AI solutions utilizing open architectures at
Creative Data. He has a long history of working in the data and data platform
space as user, developer, administrator, manager, consultant, and educator.&lt;/p&gt;

&lt;h2 id=&quot;powerbi-overview&quot;&gt;PowerBI overview&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://powerbi.microsoft.com/&quot;&gt;Power BI&lt;/a&gt; is an interactive data visualization
software product suite developed by Microsoft with a primary focus on business
intelligence. We talk about the different available products and features, and
their usage in the community.&lt;/p&gt;

&lt;h2 id=&quot;trino-client-support-options-for-power-bi&quot;&gt;Trino client support options for Power BI&lt;/h2&gt;

&lt;p&gt;Typically, Power BI relies on ODBC drivers for connecting to specific data
sources. However, since there is no open source Trino ODBC driver, Patrick and
other clever developers have created a &lt;a href=&quot;https://github.com/CreativeDataEU/PowerBITrinoConnector&quot;&gt;Power BI
client&lt;/a&gt; that connects
to Trino directly via the client REST API - the
&lt;a href=&quot;https://github.com/CreativeDataEU/PowerBITrinoConnector&quot;&gt;PowerBITrinoConnector&lt;/a&gt;.
We discuss the details and limitations of both approaches, look at the source
code, and learn about import and direct query modes.&lt;/p&gt;

&lt;h2 id=&quot;demo&quot;&gt;Demo&lt;/h2&gt;

&lt;p&gt;Patrick showcases how to install and use the connector in his demo of Trino and
Power BI.&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://www.starburst.io/info/trino-summit-2024/?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=NORAM-FY25-Q4-CM-Trino-Summit-2024-IMC-Upgrade&amp;amp;utm_content=Trino-Fest-Blog-Recap&quot;&gt;Trino Summit 2024&lt;/a&gt;
is coming on the 11th and 12th of December, and registration is open now.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>One busy week to go before Trino Fest 2024</title>
      <link href="https://trino.io/blog/2024/06/06/trino-fest-last-call.html" rel="alternate" type="text/html" title="One busy week to go before Trino Fest 2024" />
      <published>2024-06-06T00:00:00+00:00</published>
      <updated>2024-06-06T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/06/06/trino-fest-last-call</id>
      <content type="html" xml:base="https://trino.io/blog/2024/06/06/trino-fest-last-call.html">&lt;p&gt;This week has surely started off with a big bang and another boom in the data
platform world. Snowflake &lt;a href=&quot;https://www.snowflake.com/blog/introducing-polaris-catalog/&quot;&gt;introduced the open source Polaris
catalog&lt;/a&gt; as
implementation of the Iceberg REST catalog specification. And Databricks, the
main driver of the Delta Lake table format, &lt;a href=&quot;https://www.databricks.com/blog/databricks-tabular&quot;&gt;announced their acquisition of
Tabular&lt;/a&gt;, a main driver in
the Apache Iceberg community.&lt;/p&gt;

&lt;p&gt;Interestingly enough, Trino is in the middle of all this with great support for
Delta Lake, Hudi, Iceberg, and also the Iceberg REST catalog. And if all that
interoperability with Trino is not enough reason to join us next week at Trino
Fest 2024, I have some more ideas for you to consider.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;reasons-to-attend-trino-fest&quot;&gt;Reasons to attend Trino Fest&lt;/h2&gt;

&lt;p&gt;Trino Fest is happening next week on the 13th of June, and here are all the
reasons I can think of why you should tune in.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The event is free for all attendees. It is available as an in-person event in
Boston and for virtual attendance across the rest of the world.&lt;/li&gt;
  &lt;li&gt;You can learn about real world experience with Trino, Delta Lake, Iceberg,
Hudi, and many &lt;a href=&quot;https://trino.io/ecosystem/index.html&quot;&gt;other data sources, clients, and add-ons&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Many Trino friends, users, and contributors from around the world and
companies like Amazon, Apple, Bloomberg, character.ai, Dune, LanceDB,
Microsoft, Onehouse, and Starburst are going to attend and present.&lt;/li&gt;
  &lt;li&gt;Monica Miller and Manfred Moser will guide you through the event with the help
of the awesome Starburst Trino events team.&lt;/li&gt;
  &lt;li&gt;In-person attendees might just meet our mascot, Commander Bun Bun.&lt;/li&gt;
  &lt;li&gt;On the following day, the &lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-congregation-14-june-2024&quot;&gt;Trino Contributor
Congregation&lt;/a&gt;
will dive super deep into technical details and collaborative efforts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Convinced yet, or still wondering? In either case, go and &lt;a href=&quot;http://www.starburst.io/info/trino-fest-2024?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=Global-FY25-Q2-EV-Trino-Fest-2024&amp;amp;utm_content=Blog-3&quot;&gt;have a look at the
detailed agenda and then register to attend&lt;/a&gt;.&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; href=&quot;http://www.starburst.io/info/trino-fest-2024?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=Global-FY25-Q2-EV-Trino-Fest-2024&amp;amp;utm_content=Blog-3&quot;&gt;
        Register now!
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;And last, but not least, thank you to our sponsors for making this event happen…&lt;/p&gt;

&lt;div class=&quot;container&quot;&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.starburst.io/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/starburst-small.png&quot; title=&quot;Starburst, event host and organizer&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.onehouse.ai/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/onehouse-small.png&quot; title=&quot;Onehouse, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.startree.ai/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/startree-small.png&quot; title=&quot;Startree, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.alluxio.io/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/alluxio-small.png&quot; title=&quot;Alluxio, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://cloudinary.com/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/cloudinary-small.png&quot; title=&quot;Cloudinary, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.upsolver.com/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/upsolver-small.png&quot; title=&quot;Upsolver, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>This week has surely started off with a big bang and another boom in the data platform world. Snowflake introduced the open source Polaris catalog as an implementation of the Iceberg REST catalog specification. And Databricks, the main driver of the Delta Lake table format, announced their acquisition of Tabular, a main driver in the Apache Iceberg community. Interestingly enough, Trino is in the middle of all this with great support for Delta Lake, Hudi, Iceberg, and also the Iceberg REST catalog. And if all that interoperability with Trino is not enough reason to join us next week at Trino Fest 2024, I have some more ideas for you to consider.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2024/announcement-banner.png" />
      
    </entry>
  
    <entry>
      <title>60: Trino calling AI</title>
      <link href="https://trino.io/episodes/60.html" rel="alternate" type="text/html" title="60: Trino calling AI" />
      <published>2024-05-22T00:00:00+00:00</published>
      <updated>2024-05-22T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/60</id>
      <content type="html" xml:base="https://trino.io/episodes/60.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director of Trino
Community Leadership at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;,
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest&quot;&gt;Guest&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/isainalcik/&quot;&gt;Isa Inalcik&lt;/a&gt;, Principal Data
Engineer at &lt;a href=&quot;https://bestsecret.com/&quot;&gt;BestSecret Group&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-and-news&quot;&gt;Releases and news&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-446.html&quot;&gt;Trino 446&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for the Snowflake catalog in the Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Add support for reading S3 objects restored from Glacier storage in the Hive
connector.&lt;/li&gt;
  &lt;li&gt;Add support for unsupported type handling configuration in the Snowflake
connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-447.html&quot;&gt;Trino 447&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW CREATE FUNCTION&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Require Java 22.&lt;/li&gt;
  &lt;li&gt;Add support for concurrent &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TRUNCATE&lt;/code&gt; in the Delta Lake
connector.&lt;/li&gt;
  &lt;li&gt;Remove support for Phoenix 5.1.x and earlier.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-448.html&quot;&gt;Trino 448&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improve performance of reading from Parquet files.&lt;/li&gt;
  &lt;li&gt;Add support for caching Glue metadata with the update to use the V2 REST
interface.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trinodb.github.io/trino-gateway/release-notes/&quot;&gt;Trino Gateway 8 and 9&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for configurable router policies with two new policies available.&lt;/li&gt;
  &lt;li&gt;Add a Helm chart for deployment.&lt;/li&gt;
  &lt;li&gt;Add new website.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We also had a new Trino Helm chart release 0.20.0.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/nineinchnick&quot;&gt;Jan Waś&lt;/a&gt; is now also
&lt;a href=&quot;https://trino.io/development/roles#subproject-maintainers&quot;&gt;subproject maintainer&lt;/a&gt; of the
&lt;a href=&quot;https://github.com/trinodb/trino-go-client&quot;&gt;go client&lt;/a&gt; and the
&lt;a href=&quot;https://github.com/trinodb/charts&quot;&gt;Helm charts&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;impressions-from-the-iceberg-summit&quot;&gt;Impressions from the Iceberg Summit&lt;/h2&gt;

&lt;p&gt;Last week, Cole attended the &lt;a href=&quot;https://iceberg-summit.org/&quot;&gt;Iceberg Summit&lt;/a&gt; with
a special Trino perspective, and we chat about his impressions and major
takeaways.&lt;/p&gt;

&lt;h2 id=&quot;guest-isa-inalcik-from-bestsecret&quot;&gt;Guest Isa Inalcik from BestSecret&lt;/h2&gt;

&lt;p&gt;Isa is a highly skilled data expert with over a decade of hands-on experience in
the software development lifecycle. He is well versed in many data tools including
Trino/Starburst Enterprise Platform, Snowflake, Airflow, Apache Spark, Hive,
Apache Iceberg, dbt, and others.&lt;/p&gt;

&lt;h2 id=&quot;trino-at-bestsecret&quot;&gt;Trino at BestSecret&lt;/h2&gt;

&lt;p&gt;At BestSecret, a leading online retailer for fashion and lifestyle in Europe,
Isa spearheads the development of efficient and resilient ELT/ETL pipelines and
the implementation of data and AI-driven solutions. We chat in more details
about their setup and use cases, his solutions, and challenges he is facing.&lt;/p&gt;

&lt;h2 id=&quot;generative-ai-interest-and-use-cases&quot;&gt;Generative AI interest and use cases&lt;/h2&gt;

&lt;p&gt;Isa has been following the waves of interest in AI and sees the following use
cases related to data and Trino:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Media (audio, video, image): Extract information out of images.&lt;/li&gt;
  &lt;li&gt;Object categorization: Categorize objects in images and videos.&lt;/li&gt;
  &lt;li&gt;Data masking: For anonymizing sensitive data from unstructured text.&lt;/li&gt;
  &lt;li&gt;Data extraction: To pull structured information from unstructured text.&lt;/li&gt;
  &lt;li&gt;Sentiment analysis: For gauging the sentiment of textual data.&lt;/li&gt;
  &lt;li&gt;Language detection and translation: For detecting the language of text or translating it.&lt;/li&gt;
  &lt;li&gt;Summarization: To generate concise summaries from lengthy texts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This inspired him to try an integration of the new emerging LLMs with Trino.&lt;/p&gt;

&lt;h2 id=&quot;trino-spi&quot;&gt;Trino SPI&lt;/h2&gt;

&lt;p&gt;Trino uses a service provider interface (SPI) to allow developers to create
plugins for features such as connectors, security integrations and custom
functions. This is crucial for businesses to implement required functionality and
enabled Isa to work on a plugin to support custom functions that call LLMs.&lt;/p&gt;

&lt;p&gt;The OpenAI API specification also allowed him to create one function that can be
used with different LLM backends.&lt;/p&gt;

&lt;h2 id=&quot;proof-of-concept-and-demo&quot;&gt;Proof of concept and demo&lt;/h2&gt;

&lt;p&gt;We look at the concept and implementation that Isa developed with the following
architecture:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/episode/60/trino-ai-architecture.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Isa’s &lt;a href=&quot;https://github.com/alaturqua/trino-ai&quot;&gt;trino-ai repository&lt;/a&gt; contains
source code and more details as mentioned in his post on
&lt;a href=&quot;https://www.linkedin.com/posts/isainalcik_trino-trino-llama3-activity-7187411736587587584-e2WW/&quot;&gt;LinkedIn&lt;/a&gt;
and used in the demo.&lt;/p&gt;

&lt;h2 id=&quot;other-resources&quot;&gt;Other resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Post from Isa: &lt;a href=&quot;https://www.linkedin.com/pulse/maximize-performance-secret-scaling-trino-clusters-isa-inalcik-ffo5e/&quot;&gt;Maximize Performance: The Secret to Scaling Trino Clusters with KEDA&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Post from Isa: &lt;a href=&quot;https://www.linkedin.com/pulse/enhancing-security-observability-trino-open-policy-agent-isa-inalcik-zhl9e&quot;&gt;Enhancing Security and Observability in Trino with Open Policy Agent and OpenTelemetry&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://ollama.com/&quot;&gt;Ollama&lt;/a&gt; system used to run LLMs&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/develop.html&quot;&gt;Trino SPI documentation&lt;/a&gt;, including
&lt;a href=&quot;https://trino.io/docs/current/develop/functions.html&quot;&gt;custom function creation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Trino Fest news:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2024/05/08/trino-fest-lineup-finalized.html&quot;&gt;Finalized speaker lineup announced&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.starburst.io/info/trino-fest-2024/?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=Global-FY25-Q2-EV-Trino-Fest-2024&amp;amp;utm_content=banner&quot;&gt;Register for event and hotel now&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Special thanks to our Trino Fest sponsors - Starburst as event host and
Alluxio, Cloudinary, Onehouse, Startree, and Upsolver as event sponsors.&lt;/li&gt;
  &lt;li&gt;Contact us to join the Trino Contributor Congregation the next day.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Other topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trino Contributor Call on 23rd of May.&lt;/li&gt;
  &lt;li&gt;Check out upcoming &lt;a href=&quot;https://trino.io/community.html#events&quot;&gt;Trino Community Broadcast episodes and other events&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>Big names round out the Trino Fest 2024 lineup</title>
      <link href="https://trino.io/blog/2024/05/08/trino-fest-lineup-finalized.html" rel="alternate" type="text/html" title="Big names round out the Trino Fest 2024 lineup" />
      <published>2024-05-08T00:00:00+00:00</published>
      <updated>2024-05-08T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/05/08/trino-fest-lineup-finalized</id>
      <content type="html" xml:base="https://trino.io/blog/2024/05/08/trino-fest-lineup-finalized.html">&lt;p&gt;We gave
&lt;a href=&quot;/blog/2024/04/15/trino-fest-2024-approaches.html&quot;&gt;a sneak peek of the Trino Fest lineup a month ago&lt;/a&gt;,
and we’re excited to now bring you the full lineup for the event. We’ve got some
major names being added, including Amazon, Microsoft, and another talk from
Apple. With FourKites and a joint talk with LanceDB and character.ai also added
to the schedule, we’re excited to present the
&lt;a href=&quot;https://www.starburst.io/info/trino-fest-2024/#agenda&quot;&gt;full lineup for Trino Fest 2024&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Trino Fest is barely a month away on the 13th of June, and whether you want to
attend live in Boston or tune in virtually, this is a reminder that you
should &lt;a href=&quot;http://www.starburst.io/info/trino-fest-2024?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=Global-FY25-Q2-EV-Trino-Fest-2024&amp;amp;utm_content=Blog-3&quot;&gt;register to attend!&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;trino-fest-the-contributor-congregation-and-logistics&quot;&gt;Trino Fest, the contributor congregation, and logistics&lt;/h2&gt;

&lt;p&gt;In case you missed
&lt;a href=&quot;/blog/2024/02/20/announcing-trino-fest-2024.html&quot;&gt;our announcement of Trino Fest&lt;/a&gt;,
it’s a hybrid event taking place from 9am-5pm Eastern Time on June 13th. It’ll
feature talks from a wide range of Trino users and contributors, with topics
ranging from use cases, migrations, cluster management and administration,
to lakehouse integrations and more. If you want to join us in-person, we’ll be at
the Hyatt Regency Boston. There will also be a meeting for Trino contributors
the day after the event at the Starburst office in Boston from 9am-1pm, and if
you’d be interested in attending that, please reach out to myself (Cole Bowden)
or Manfred Moser on the &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;Trino Slack&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you still haven’t booked a hotel, we also have a discounted rate at the Hyatt
for the event to make life easy - whether that’s waking up and heading
downstairs for the start of the event, or being able to quickly duck back to
your room for a 30-minute meeting without missing too much. One link will take
you to a booking for just the night before the event, while the other allows
you to optionally book an extra night prior or include the night after Trino
Fest so you can stick around for the contributor congregation or explore Boston.&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.hyatt.com/en-US/group-booking/BOSTO/G-STA4&quot;&gt;
        Book your hotel for June 12-13
    &lt;/a&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.hyatt.com/en-US/group-booking/BOSTO/G-STA3&quot;&gt;
        Book your hotel for June 11-14
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;h2 id=&quot;and-dont-forget-those-additional-speakers&quot;&gt;And don’t forget those additional speakers&lt;/h2&gt;

&lt;p&gt;George Fisher, Ishan Patwa, and Oleg Savin will be diving deep into how Trino is
leveraged at Microsoft. While we’ve previously had LinkedIn at Trino events,
this is the first time the Trino community is getting to hear about the scale of
Trino within Microsoft proper, and with their plans to cover clients,
integrations, result caching, a sharded connector, visualization for monitoring,
and AKS deployment with Azure, there will be a lot to learn.&lt;/p&gt;

&lt;p&gt;Alok Kumar Prusty and Amogh Margoor from Apple will be joining the lineup to
discuss Trino query intelligence. With the mountain of query metadata, the team
at Apple has been able to better understand Trino usage and use that knowledge
to create impactful improvements for their Trino users. With dashboarding,
self-service troubleshooting, and automatic recommendations for query
optimization, Alok and Amogh will detail how a world-class engineering team can
take an awesome tool like Trino and make it even better for the end users.&lt;/p&gt;

&lt;p&gt;Also relatively new to the Trino community is discussing AI workloads. Lei Xu
from &lt;a href=&quot;https://lancedb.com/&quot;&gt;LanceDB&lt;/a&gt; and Noah Shpak from
&lt;a href=&quot;https://character.ai/&quot;&gt;character.ai&lt;/a&gt; will be highlighting exactly that,
using Trino as an analytics engine on top of a LanceDB-powered vector data lake.
With AI data so often being in a silo, analyzing it with a traditional SQL
workload is often expensive or complicated… but Lei and Noah will be
demonstrating how character.ai’s LanceDB/Trino pairing maintains the power of
both systems while making it easy.&lt;/p&gt;

&lt;p&gt;Dai Ozaki from Amazon will be diving into how to optimize Trino with S3. Given
how many people are using Trino with S3 already, hearing directly from Dai, an
engineer at Amazon, regarding best practices and optimizations should prove
beneficial for a massive chunk of the Trino community. Dai plans on talking
about how Trino and S3 interact, and how that knowledge can be used to get the
most out of your stack and avoid common bottlenecks.&lt;/p&gt;

&lt;p&gt;And last but not least, Arpit Garg from &lt;a href=&quot;https://www.fourkites.com/&quot;&gt;FourKites&lt;/a&gt;
will be discussing utilizing Trino to handle nearly a petabyte of logs.
FourKites is able to ingest massive amounts of logs, use S3 and
Parquet to keep storage costs low, transform and extract logs at scale, and then
use Trino as the engine to query those logs and reference them in context with
other data sets and data stores. Arpit will also touch on using Superset as a
frontend for Trino.&lt;/p&gt;

&lt;p&gt;And keep in mind - all of that is in addition to the talks we’ve already
announced!
&lt;a href=&quot;http://www.starburst.io/info/trino-fest-2024?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=Global-FY25-Q2-EV-Trino-Fest-2024&amp;amp;utm_content=Blog-3&quot;&gt;Register to attend&lt;/a&gt;,
&lt;a href=&quot;https://www.hyatt.com/en-US/group-booking/BOSTO/G-STA3&quot;&gt;book your hotel&lt;/a&gt;, and
the Trino community is looking forward to seeing you there!&lt;/p&gt;

&lt;p&gt;Thank you to our sponsors for making this event happen…&lt;/p&gt;

&lt;div class=&quot;container&quot;&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.starburst.io/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/starburst-small.png&quot; title=&quot;Starburst, event host and organizer&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.onehouse.ai/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/onehouse-small.png&quot; title=&quot;Onehouse, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.startree.ai/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/startree-small.png&quot; title=&quot;Startree, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.alluxio.io/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/alluxio-small.png&quot; title=&quot;Alluxio, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://cloudinary.com/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/cloudinary-small.png&quot; title=&quot;Cloudinary, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.upsolver.com/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/upsolver-small.png&quot; title=&quot;Upsolver, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;</content>

      
        <author>
          <name>Cole Bowden</name>
        </author>
      

      <summary>We gave a sneak peek of the Trino Fest lineup a month ago, and we’re excited to now bring you the full lineup for the event. We’ve got some major names being added, including Amazon, Microsoft, and another talk from Apple. With FourKites and a joint talk with LanceDB and character.ai also added to the schedule, we’re excited to present the full lineup for Trino Fest 2024. Trino Fest is barely a month away on the 13th of June, and whether you want to attend live in Boston or tune in virtually, this is a reminder that you should register to attend!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2024/announcement-banner.png" />
      
    </entry>
  
    <entry>
      <title>59: Querying Trino with Java and jOOQ</title>
      <link href="https://trino.io/episodes/59.html" rel="alternate" type="text/html" title="59: Querying Trino with Java and jOOQ" />
      <published>2024-04-24T00:00:00+00:00</published>
      <updated>2024-04-24T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/59</id>
      <content type="html" xml:base="https://trino.io/episodes/59.html">&lt;h2 id=&quot;host&quot;&gt;Host&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director of Trino
Community Leadership at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;,
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest&quot;&gt;Guest&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Lukas Eder, Creator of &lt;a href=&quot;https://jooq.org&quot;&gt;jOOQ&lt;/a&gt;,
(&lt;a href=&quot;https://twitter.com/lukaseder&quot;&gt;@lukaseder&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;trino-releases&quot;&gt;Trino releases&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-445.html&quot;&gt;Trino 445&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for time travel queries with the Delta Lake connector.&lt;/li&gt;
  &lt;li&gt;Add support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;REPLACE&lt;/code&gt; modifier as part of a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE TABLE&lt;/code&gt; statement
with the Delta Lake connector.&lt;/li&gt;
  &lt;li&gt;Add support for writing Bloom filters in Parquet files with the Hive connector.&lt;/li&gt;
  &lt;li&gt;Add support for dynamic filtering to the MongoDB connector.&lt;/li&gt;
  &lt;li&gt;Expand support for function pushdown in the Snowflake connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;lukas-eder-and-data-geekery&quot;&gt;Lukas Eder and data geekery&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://twitter.com/lukaseder&quot;&gt;Lukas&lt;/a&gt; is recognized as a Java Champion and
well-known as a very active member of the Java community. We chat about his
history and involvement in the community of Java and related open source
projects, and how it led to &lt;a href=&quot;https://www.jooq.org/&quot;&gt;jOOQ and his company Data
Geekery&lt;/a&gt;. Lukas also briefly talks about other products.&lt;/p&gt;

&lt;h2 id=&quot;jooq&quot;&gt;jOOQ&lt;/h2&gt;

&lt;p&gt;jOOQ stands for jOOQ Object Oriented Querying. It generates Java code
from your database, and lets you build type safe SQL queries through its
fluent API.&lt;/p&gt;

&lt;p&gt;All editions of jOOQ since the 3.19 release include support for Trino. The
level of support depends on the catalog and connector in use, and further
Trino-specific enhancements are in progress.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/ecosystem/add-on.html#jooq&quot;&gt;
  &lt;img src=&quot;https://trino.io/assets/images/logos/jooq.png&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In our conversation and demo session with Lukas, we cover all the following
aspects and a few other topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;What is jOOQ?&lt;/li&gt;
  &lt;li&gt;What motivated the creation of jOOQ?&lt;/li&gt;
  &lt;li&gt;Discuss the great reasons for using jOOQ:
    &lt;ul&gt;
      &lt;li&gt;Database first&lt;/li&gt;
      &lt;li&gt;Typesafe SQL&lt;/li&gt;
      &lt;li&gt;Code generation&lt;/li&gt;
      &lt;li&gt;Active records&lt;/li&gt;
      &lt;li&gt;Multi-tenancy&lt;/li&gt;
      &lt;li&gt;Standardization&lt;/li&gt;
      &lt;li&gt;Query lifecycle&lt;/li&gt;
      &lt;li&gt;Procedures&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;How does it compare to ORM systems like &lt;a href=&quot;https://hibernate.org/&quot;&gt;Hibernate&lt;/a&gt; or
others like the older &lt;a href=&quot;https://blog.mybatis.org/&quot;&gt;MyBatis&lt;/a&gt;?&lt;/li&gt;
  &lt;li&gt;What databases are supported by jOOQ and commonly used?&lt;/li&gt;
  &lt;li&gt;Chat about some customer use cases.&lt;/li&gt;
  &lt;li&gt;Supported and required Java versions, fun with upgrades, and experience from customers.&lt;/li&gt;
  &lt;li&gt;How Lukas discovered Trino and decided to add support for it.&lt;/li&gt;
  &lt;li&gt;Challenges and interesting aspects of supporting different databases.&lt;/li&gt;
  &lt;li&gt;What is next for jOOQ in general, and Trino support specifically?&lt;/li&gt;
  &lt;li&gt;Cool SQL features in Trino that might be suitable for standardization:
    &lt;ul&gt;
      &lt;li&gt;Higher order functions, partially &lt;a href=&quot;https://www.jooq.org/doc/dev/manual/sql-building/column-expressions/array-functions/&quot;&gt;already supported in jOOQ&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;Integration of object-relational database features, such as nested
collections with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ARRAY&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIST&lt;/code&gt;.&lt;/li&gt;
      &lt;li&gt;Potential introduction of new concepts to SQL, such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MAP&lt;/code&gt;.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Complexities from Trino having different catalogs and connectors, and the
catalog, schema, table hierarchy.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;jOOQ resources and further information:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.jooq.org/&quot;&gt;Website&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://groups.google.com/g/jooq-user&quot;&gt;User group mailing list&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.jooq.org/learn/&quot;&gt;Documentation and other learning resources&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/jOOQ/jOOQ&quot;&gt;Source code&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/jOOQ/jOOQ/tree/main/jOOQ-examples&quot;&gt;Example projects&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/JavaOOQ&quot;&gt;jOOQ on X&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Trino Fest news:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2024/04/15/trino-fest-2024-approaches.html&quot;&gt;Great speaker lineup&lt;/a&gt; announced&lt;/li&gt;
  &lt;li&gt;More to come&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.starburst.io/info/trino-fest-2024/?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=Global-FY25-Q2-EV-Trino-Fest-2024&amp;amp;utm_content=banner&quot;&gt;Register for event and hotel now&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Contact us to join the Trino Contributor Congregation the next day&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Other news and events:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Manfred’s recap of Open Source Summit NA and Data Engineer Things meeting in Seattle.&lt;/li&gt;
  &lt;li&gt;Trino Contributor Call right after the episode.&lt;/li&gt;
  &lt;li&gt;Contact us to be a guest in upcoming &lt;a href=&quot;https://trino.io/broadcast/index.html&quot;&gt;Trino Community
Broadcast&lt;/a&gt; episodes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Host</summary>

      
      
    </entry>
  
    <entry>
      <title>A sneak peek of Trino Fest 2024</title>
      <link href="https://trino.io/blog/2024/04/15/trino-fest-2024-approaches.html" rel="alternate" type="text/html" title="A sneak peek of Trino Fest 2024" />
      <published>2024-04-15T00:00:00+00:00</published>
      <updated>2024-04-15T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/04/15/trino-fest-2024-approaches</id>
      <content type="html" xml:base="https://trino.io/blog/2024/04/15/trino-fest-2024-approaches.html">&lt;p&gt;Trino Fest is drawing ever closer. Commander Bun Bun has been hard at work
behind the scenes arranging the schedule and making sure that Trino’s trip to
Boston is going to be a great one. In case you missed it,
&lt;a href=&quot;/blog/2024/02/20/announcing-trino-fest-2024.html&quot;&gt;we announced Trino Fest&lt;/a&gt;
a couple months ago, and if you &lt;em&gt;have&lt;/em&gt; missed it, make sure to go register to
attend! All our speakers will be in person in downtown Boston on the 13th of
June, with plenty of opportunities for networking and a happy hour event at the
end of the day. But if you can’t make the trip to enjoy the lovely New England
summer, we’ll also be live-streaming the event, and you can register to join us
virtually.&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; href=&quot;http://www.starburst.io/info/trino-fest-2024?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=Global-FY25-Q2-EV-Trino-Fest-2024&amp;amp;utm_content=Blog-2&quot;&gt;
        Register to attend!
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;Still on the fence, though? Read on for a preview of our speaker lineup and
brief summaries of their talks. Keep in mind this also isn’t the full lineup,
and we’ll follow up soon with the last few talks that round out the schedule.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;a-brief-word-from-our-sponsors&quot;&gt;A brief word from our sponsors…&lt;/h2&gt;

&lt;p&gt;Thank you to our sponsors for making this event happen…&lt;/p&gt;

&lt;div class=&quot;container&quot;&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.starburst.io/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/starburst-small.png&quot; title=&quot;Starburst, event host and organizer&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.onehouse.ai/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/onehouse-small.png&quot; title=&quot;Onehouse, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.startree.ai/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/startree-small.png&quot; title=&quot;StarTree, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.alluxio.io/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/alluxio-small.png&quot; title=&quot;Alluxio, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://cloudinary.com/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/cloudinary-small.png&quot; title=&quot;Cloudinary, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.upsolver.com/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/upsolver-small.png&quot; title=&quot;Upsolver, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;And now on to what you’re waiting for: a preview of most of the talks coming to
Trino Fest this year!&lt;/p&gt;

&lt;h2 id=&quot;lakehouses&quot;&gt;Lakehouses&lt;/h2&gt;

&lt;p&gt;It’s no secret that using Trino as part of your lakehouse has become one of its
major use cases in the past few years. We’re excited to say that at Trino Fest,
we’ll have representation for each of the modern big three table formats:
Iceberg, Delta Lake, and Hudi.&lt;/p&gt;

&lt;h3 id=&quot;iceberg&quot;&gt;Iceberg&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://iceberg.apache.org/&quot;&gt;Apache Iceberg&lt;/a&gt; will be covered twice: Amogh
Jahagirdar from &lt;a href=&quot;https://tabular.io/&quot;&gt;Tabular&lt;/a&gt; will be diving into the world of
Iceberg views and how they can be leveraged to coordinate across different query
languages and dialects. Amit Gilad from &lt;a href=&quot;https://cloudinary.com/&quot;&gt;Cloudinary&lt;/a&gt;
will be covering the story of migrating out of Snowflake to the wonderful world
of open table formats and Iceberg.&lt;/p&gt;

&lt;h3 id=&quot;delta-lake&quot;&gt;Delta Lake&lt;/h3&gt;

&lt;p&gt;Marius Grama, a Trino contributor at &lt;a href=&quot;https://www.starburst.io/&quot;&gt;Starburst&lt;/a&gt;,
will be going into detail on the history, development, and improvements to the
&lt;a href=&quot;https://delta.io/&quot;&gt;Delta Lake&lt;/a&gt; connector. With
&lt;a href=&quot;/blog/2024/04/11/time-travel-delta-lake.html&quot;&gt;time travel for the Delta Lake connector&lt;/a&gt;
landing in Trino 445, it’s one of the most exciting areas for development in
open source Trino, and there are some interesting stories that Marius is excited
to share with the community.&lt;/p&gt;

&lt;h3 id=&quot;hudi&quot;&gt;Hudi&lt;/h3&gt;

&lt;p&gt;Rounding out data lakes, Ethan Guo from &lt;a href=&quot;https://www.onehouse.ai/&quot;&gt;Onehouse&lt;/a&gt;
will be diving into Trino’s &lt;a href=&quot;https://hudi.apache.org/&quot;&gt;Hudi&lt;/a&gt; connector, giving
an update on what’s landed lately to improve performance and functionality.
He’ll also give a preview of what’s coming soon. The features are flying in, and
if you’re a current or prospective user of Hudi with Trino, you won’t want to
miss out.&lt;/p&gt;

&lt;h2 id=&quot;data-takes&quot;&gt;Data takes&lt;/h2&gt;

&lt;p&gt;Of course, there’s more to Trino than querying data lakes, and there’s a wide
variety of talks to discuss the other activities going on within the Trino
community.&lt;/p&gt;

&lt;h3 id=&quot;small-scale&quot;&gt;Small scale&lt;/h3&gt;

&lt;p&gt;Ben Jeter at &lt;a href=&quot;https://www.executivehomes.com/&quot;&gt;Executive Homes&lt;/a&gt;, who gave
&lt;a href=&quot;/blog/2023/07/25/trino-fest-2023-datto.html&quot;&gt;a talk at Trino Fest last year&lt;/a&gt;
while at &lt;a href=&quot;https://www.datto.com/&quot;&gt;Datto&lt;/a&gt;, is back to discuss running Trino at a
more moderate scale than we’re used to hearing about in the Trino space.
Forget petabytes and exabytes, and welcome a tiny cluster querying thousands,
not millions, of records that still derives huge value from Trino. It’s a great
playbook for smaller startups and enterprises who still need robust, flexible,
performant analytics.&lt;/p&gt;

&lt;h3 id=&quot;maximizing-performance&quot;&gt;Maximizing performance&lt;/h3&gt;

&lt;p&gt;Jonas Kylling from &lt;a href=&quot;https://dune.com/about&quot;&gt;Dune&lt;/a&gt; will be detailing how they’ve
managed to optimize Trino and squeeze out every ounce of performance to reduce
query costs and runtimes. That includes leveraging the new Alluxio-based file
system caching, emulating various cluster sizes to avoid expensive idle cluster
time, and storing, sampling, and filtering query results to avoid re-executing
queries.&lt;/p&gt;

&lt;h3 id=&quot;query-intelligence&quot;&gt;Query intelligence&lt;/h3&gt;

&lt;p&gt;Marton Bod and Vinitha Gankidi from Apple will share their insights on query
intelligence. They’ll demonstrate how Apple came to understand when their
clusters are most utilized and who’s using them, enabling slicing and dicing
along different dimensions. A query intelligence dataset can be used for real-time
cluster dashboarding, self-service troubleshooting, and automatic generation of
recommendations for users, all of which can empower Trino to be better than
ever.&lt;/p&gt;

&lt;h2 id=&quot;and-more&quot;&gt;And more!&lt;/h2&gt;

&lt;p&gt;Of course, Trino’s own Martin Traverso will be giving a keynote on the latest
and greatest in the project, covering everything big that’s landed since Trino
Summit, as well as a glimpse at the roadmap for the project in the coming few
months. Several other big talks are falling into place that we can’t announce
just yet, so stay tuned for more info as the event draws nearer.&lt;/p&gt;

&lt;h2 id=&quot;trino-contributor-congregation&quot;&gt;Trino contributor congregation&lt;/h2&gt;

&lt;p&gt;The day after Trino Fest, we’ll also be hosting an in-person meetup for
Trino contributors and engineers to catch up, discuss the Trino roadmap, and
engage directly with the maintainers in person. It’s a great opportunity to put
faces and voices to those GitHub handles, align on the big ideas or tricky PRs
that have been moving slowly, and find more ways to get involved in Trino
development. If you’re interested in attending, message Manfred Moser or Cole
Bowden on the &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;Trino Slack&lt;/a&gt;, and we’ll get you added to
the attendee list and share more details.&lt;/p&gt;</content>

      
        <author>
          <name>Cole Bowden</name>
        </author>
      

      <summary>Trino Fest is drawing ever closer. Commander Bun Bun has been hard at work behind the scenes arranging the schedule and making sure that Trino’s trip to Boston is going to be a great one. In case you missed it, we announced Trino Fest a couple months ago, and if you have missed it, make sure to go register to attend! All our speakers will be in person in downtown Boston on the 13th of June, with plenty of opportunities for networking and a happy hour event at the end of the day. But if you can’t make the trip to enjoy the lovely New England summer, we’ll also be live-streaming the event, and you can register to join us virtually. Register to attend! Still on the fence, though? Read on for a preview of our speaker lineup and brief summaries of their talks. Keep in mind this also isn’t the full lineup, and we’ll follow up soon with the last few talks that round out the schedule.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2024/announcement-banner.png" />
      
    </entry>
  
    <entry>
      <title>Time travel in Delta Lake connector</title>
      <link href="https://trino.io/blog/2024/04/11/time-travel-delta-lake.html" rel="alternate" type="text/html" title="Time travel in Delta Lake connector" />
      <published>2024-04-11T00:00:00+00:00</published>
      <updated>2024-04-11T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/04/11/time-travel-delta-lake</id>
      <content type="html" xml:base="https://trino.io/blog/2024/04/11/time-travel-delta-lake.html">&lt;p&gt;Exciting news - time travel capability has finally arrived in the Delta Lake
connector! After introducing support for time travel in the Iceberg connector
back in 2022, we’re thrilled to announce that the Delta Lake connector now joins
the ranks as the second connector offering this feature.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;background-and-motivation&quot;&gt;Background and motivation&lt;/h2&gt;

&lt;p&gt;Time travel as a feature has a number of practical use cases:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Data recovery and rollback&lt;/strong&gt;: In the event of data corruption or erroneous
 updates, time travel allows users to roll back to a previous version of the
 data, restoring it to a known good state.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Auditing and compliance&lt;/strong&gt;: Time travel enables auditors and compliance
 teams to analyze data changes over time, ensuring regulatory compliance and
 providing transparency into data operations.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Historical analysis&lt;/strong&gt;: Data analysts and data scientists can perform
 historical analysis by querying data at different points in time, uncovering
 trends, patterns, and anomalies that may not be apparent in current data.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;time-travel-sql-example&quot;&gt;Time travel SQL example&lt;/h2&gt;

&lt;p&gt;Start by creating a catalog &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;example&lt;/code&gt; with the &lt;a href=&quot;https://trino.io/docs/current/connector/delta-lake.html&quot;&gt;Delta Lake
connector&lt;/a&gt;, create a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;demo&lt;/code&gt;
schema, and make them the current catalog and schema with the
&lt;a href=&quot;https://trino.io/docs/current/sql/use.html&quot;&gt;USE&lt;/a&gt; statement.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;USE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;example&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;demo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Let’s create a Delta Lake table, add some data, modify the table, and add some
more data using the following SQL statements:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;users&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;column_mapping_mode&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;name&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;users&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;Alice&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;Bob&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;Mallory&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ALTER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;users&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DROP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;COLUMN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;users&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Use the following statement to look at all data in the table:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;users&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; id
----
  1
  2
  3
  4
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$history&lt;/code&gt; metadata table offers a record of past operations:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;version&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;timestamp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;operation&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;&quot;users$history&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; version |             timestamp              |  operation
---------+------------------------------------+--------------
       0 | 2024-04-10 17:49:18.528 Asia/Tokyo | CREATE TABLE
       1 | 2024-04-10 17:49:18.755 Asia/Tokyo | WRITE
       2 | 2024-04-10 17:49:18.929 Asia/Tokyo | DROP COLUMNS
       3 | 2024-04-10 17:49:19.137 Asia/Tokyo | WRITE
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You can specify the version using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FOR VERSION AS OF&lt;/code&gt;. For example, to time
travel to version 1, which includes a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WRITE&lt;/code&gt; operation, the query would look
like this:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;users&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FOR&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;VERSION&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;OF&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As you can see, time travel rolls back not only the data but also the table definition:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;----+---------&lt;/span&gt;
  &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Alice&lt;/span&gt;
  &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Bob&lt;/span&gt;
  &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Mallory&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;technical-details&quot;&gt;Technical details&lt;/h2&gt;

&lt;p&gt;Delta Lake manages transaction logs in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_delta_log&lt;/code&gt; directory located under
the table’s specified location.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Last checkpoint&lt;/strong&gt;: The optional file that manages the last checkpoint
version is named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_last_checkpoint&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Delta log entries&lt;/strong&gt;: The JSON file contains an atomic set of actions, for
example &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;00000000000000000000.json&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Checkpoints&lt;/strong&gt;: The Parquet file contains the complete replay of all actions,
up to and including the checkpointed table version, for example
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;00000000000000000010.checkpoint.parquet&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More details are available in the &lt;a href=&quot;https://github.com/delta-io/delta/blob/master/PROTOCOL.md&quot;&gt;Delta Lake protocol
documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Following is an example of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_delta_log&lt;/code&gt; directory:&lt;/p&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;00000000000000000000.json
00000000000000000001.json
00000000000000000002.json
00000000000000000003.json
00000000000000000003.checkpoint.parquet
00000000000000000004.json
00000000000000000005.json
...
_last_checkpoint
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;When the specified version is older than the last checkpoint, such as version 2,
the connector reads the transaction log files starting from the first log
entry (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;00000000000000000000.json&lt;/code&gt;) up to the specified version
(&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;00000000000000000002.json&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;When the specified version is equal to the last checkpoint, in our example
version 3, the connector reads only the checkpoint file for that version
(&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;00000000000000000003.checkpoint.parquet&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;When the specified version is newer than the last checkpoint, such as version 4, the
connector reads the checkpoint file for the last checkpoint version
(&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;00000000000000000003.checkpoint.parquet&lt;/code&gt;) and the transaction log file for the
specified version (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;00000000000000000004.json&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;When the optional &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_last_checkpoint&lt;/code&gt; file is missing, the actual logic is
more complex, because the connector cannot determine the checkpoint versions
without listing the file names in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_delta_log&lt;/code&gt; directory.&lt;/p&gt;
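&lt;p&gt;The version resolution described above can be sketched in a few lines. The
following Python outline is illustrative only, not the connector’s actual
implementation, and it assumes the last checkpoint version is already known
from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_last_checkpoint&lt;/code&gt;:&lt;/p&gt;

```python
# Illustrative sketch of picking which _delta_log files to replay for a
# requested table version, given a known last checkpoint version.
def log_files_to_read(requested_version, checkpoint_version):
    name = lambda v: f"{v:020d}"  # Delta log file names use 20 digits
    if requested_version >= checkpoint_version:
        # At or after the checkpoint: start from the checkpoint file,
        # then apply any newer JSON log entries.
        files = [f"{name(checkpoint_version)}.checkpoint.parquet"]
        files += [f"{name(v)}.json"
                  for v in range(checkpoint_version + 1, requested_version + 1)]
        return files
    # Older than the checkpoint: replay JSON entries from version 0 onward.
    return [f"{name(v)}.json" for v in range(requested_version + 1)]
```

With the example directory above (checkpoint at version 3), requesting version 2
replays the first three JSON files, version 3 reads only the checkpoint file,
and version 4 reads the checkpoint file plus one newer JSON entry.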

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Time travel in the Trino &lt;a href=&quot;https://trino.io/docs/current/connector/delta-lake.html&quot;&gt;Delta Lake
connector&lt;/a&gt; opens up new
possibilities for data exploration and analysis, empowering users to delve into
the past and derive insights from historical data. By seamlessly integrating
with Delta Lake’s versioning and transaction logs, Trino provides a powerful
tool for querying data as it appeared at different points in time. Whether it’s
auditing, historical analysis, or data recovery, time travel adds a valuable
dimension to data-driven decision-making, making it an indispensable feature for
modern data platforms.&lt;/p&gt;

&lt;h2 id=&quot;bonus&quot;&gt;Bonus&lt;/h2&gt;

&lt;p&gt;Join us for &lt;a href=&quot;/blog/2024/02/20/announcing-trino-fest-2024.html&quot;&gt;Trino Fest 2024&lt;/a&gt; where &lt;a href=&quot;https://github.com/findinpath&quot;&gt;Marius Grama&lt;/a&gt; presents &lt;em&gt;“The open
source journey of the Trino Delta Lake connector”&lt;/em&gt; and shares more tips and
tricks.&lt;/p&gt;</content>

      
        <author>
          <name>Yuya Ebihara</name>
        </author>
      

      <summary>Exciting news - time travel capability has finally arrived in the Delta Lake connector! After introducing support for time travel in the Iceberg connector back in 2022, we’re thrilled to announce that the Delta Lake connector now joins the ranks as the second connector offering this feature.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/images/logos/trino-delta.png" />
      
    </entry>
  
    <entry>
      <title>58: Understanding your users with Trino and Mitzu</title>
      <link href="https://trino.io/episodes/58.html" rel="alternate" type="text/html" title="58: Understanding your users with Trino and Mitzu" />
      <published>2024-04-04T00:00:00+00:00</published>
      <updated>2024-04-04T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/58</id>
      <content type="html" xml:base="https://trino.io/episodes/58.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director of Trino
Community Leadership at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;,
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/imeszaros/&quot;&gt;István Mészáros&lt;/a&gt;, Founder and CEO of
&lt;a href=&quot;https://www.mitzu.io/&quot;&gt;Mitzu&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;trino-releases&quot;&gt;Trino releases&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-442.html&quot;&gt;Trino 442&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for configuring AWS deployment type in OpenSearch connector.&lt;/li&gt;
  &lt;li&gt;Fix a regression from 440 in Iceberg connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-443.html&quot;&gt;Trino 443&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Ensure all files are deleted when native S3 file system support is enabled,
along with other object storage connector improvements.&lt;/li&gt;
  &lt;li&gt;Add support for a custom authorization header name in Prometheus connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-444.html&quot;&gt;Trino 444&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Update Docker image to use Java 22 for runtime.&lt;/li&gt;
  &lt;li&gt;Numerous performance improvements for the Snowflake connector.&lt;/li&gt;
  &lt;li&gt;Add support for reading &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BYTE_STREAM_SPLIT&lt;/code&gt; encoding in Parquet files.&lt;/li&gt;
  &lt;li&gt;Add support for canned access control lists with the native S3 file system.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;other-trino-news&quot;&gt;Other Trino news&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino-gateway/blob/main/docs/release-notes.md#trino-gateway-7-21--mar-2024&quot;&gt;Trino Gateway
7&lt;/a&gt;
shipped with a new user interface thanks to a contribution from our new
&lt;a href=&quot;https://www.starburst.io/community/trino-champions/#peng-wei&quot;&gt;Starburst Trino champion Peng
Wei&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;The continuous integration and build setup with Apache Maven has
improved a lot thanks to our collaboration with the new &lt;a href=&quot;https://www.starburst.io/community/trino-champions/#tamas-cservenak&quot;&gt;Starburst Trino
champion Tamas Cservenak&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Trino Contributor Call &lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-21-mar-2024&quot;&gt;recap is now
available&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;mitzu&quot;&gt;Mitzu&lt;/h2&gt;

&lt;p&gt;Mitzu is a warehouse-native product analytics platform that revolutionizes how
companies leverage their product usage data in the data lake.&lt;/p&gt;

&lt;p&gt;By directly connecting to Trino, Mitzu eliminates the need for traditional
reverse ETL processes to third-party applications such as Amplitude or Mixpanel.
Mitzu enables real-time self-service product analytics on top of the existing
data infrastructure with generated SQL queries.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/ecosystem/client.html#mitzu&quot;&gt;
  &lt;img src=&quot;https://trino.io/assets/images/logos/mitzu.png&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In our conversation and demo session with István we cover all the following
aspects and a few other topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;What is product analytics?&lt;/li&gt;
  &lt;li&gt;Key terms, such as segmentation, funnels, and retention, and what
insights and benefits become available.&lt;/li&gt;
  &lt;li&gt;What are some example use cases?&lt;/li&gt;
  &lt;li&gt;What kind of products can be analyzed?&lt;/li&gt;
  &lt;li&gt;Use of Mitzu for marketing.&lt;/li&gt;
  &lt;li&gt;What other product analytics tools exist, and what sets Mitzu apart?&lt;/li&gt;
  &lt;li&gt;How is Trino involved in making Mitzu warehouse-native?&lt;/li&gt;
  &lt;li&gt;What are the advantages of being warehouse-native? What does that mean?&lt;/li&gt;
  &lt;li&gt;Compare with Mitzu on other data platforms.&lt;/li&gt;
  &lt;li&gt;Implementation details of the Mitzu and Trino integration, such as connectors,
security, and client libraries&lt;/li&gt;
  &lt;li&gt;How to use Mitzu in terms of deployment and configuration.&lt;/li&gt;
  &lt;li&gt;Cool features of Mitzu.&lt;/li&gt;
  &lt;li&gt;Practical experience and customers.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Trino Fest news:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Speakers have been selected; contact and announcements are coming soon.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2024/02/20/announcing-trino-fest-2024.html&quot;&gt;Register now&lt;/a&gt;, and book
travel and hotel.&lt;/li&gt;
  &lt;li&gt;Contact us to join the Trino Contributor Congregation the next day&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Other news and events:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Manfred will attend &lt;a href=&quot;https://events.linuxfoundation.org/open-source-summit-north-america/&quot;&gt;Open Source Summit
NA&lt;/a&gt;, and
present a Big Data Whirlwind Tour at the &lt;a href=&quot;https://www.meetup.com/data-engineer-things-seattle-meetup/events/300067664/&quot;&gt;inaugural Data Engineer Things
meeting&lt;/a&gt;
in Seattle.&lt;/li&gt;
  &lt;li&gt;Trino Contributor Call is now planned as a monthly event with video recordings.&lt;/li&gt;
  &lt;li&gt;Check out the upcoming &lt;a href=&quot;https://trino.io/broadcast/index.html&quot;&gt;Trino Community
Broadcast&lt;/a&gt; episode about jOOQ.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from O’Reilly.
You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>57: Seeing clearly with OpenTelemetry</title>
      <link href="https://trino.io/episodes/57.html" rel="alternate" type="text/html" title="57: Seeing clearly with OpenTelemetry" />
      <published>2024-03-14T00:00:00+00:00</published>
      <updated>2024-03-14T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/57</id>
      <content type="html" xml:base="https://trino.io/episodes/57.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director of Trino
Community Leadership at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;,
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/electrum/&quot;&gt;David Phillips&lt;/a&gt;, co-creator of Trino
and CTO at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/jmstephenson/&quot;&gt;Matt Stephenson&lt;/a&gt;, Senior Principal
Software Engineer at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;trino-releases&quot;&gt;Trino releases&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-440.html&quot;&gt;Trino 440&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New Snowflake connector&lt;/li&gt;
  &lt;li&gt;Support for sub-queries inside &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNNEST&lt;/code&gt; clauses&lt;/li&gt;
  &lt;li&gt;Support for row filtering and column masking with Open Policy Agent&lt;/li&gt;
  &lt;li&gt;Improved latency when filesystem caching is enabled in Delta and Iceberg connectors&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-441.html&quot;&gt;Trino 441&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Remove the default &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;legacy&lt;/code&gt; mode for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive.security&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And there is a regression for Iceberg, so potentially wait for 442. (Update:
&lt;a href=&quot;https://trino.io/docs/current/release/release-442.html&quot;&gt;Trino 442&lt;/a&gt; is released.)&lt;/p&gt;

&lt;h2 id=&quot;other-trino-news&quot;&gt;Other Trino news&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/20980&quot;&gt;Java 22 is coming to Trino&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;David Phillips appointed dedicated &lt;a href=&quot;https://trino.io/development/roles.html#file-system-lead&quot;&gt;file system lead&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-21-mar-2024&quot;&gt;Trino Contributor Call&lt;/a&gt; on the 21st of March&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2024/02/27/the-definitive-guide-2-jp.html&quot;&gt;Japanese edition of Trino: The Definitive Guide is out&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;opentelemetry&quot;&gt;OpenTelemetry&lt;/h2&gt;

&lt;p&gt;OpenTelemetry is a widely-used collection of APIs, SDKs, and tools that
instrument, generate, collect, and export telemetry data such as metrics, logs,
and traces to help you analyze application performance and behavior.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/ecosystem/add-on.html#opentelemetry&quot;&gt;
  &lt;img src=&quot;https://trino.io/assets/images/logos/opentelemetry.png&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In our conversation with Matt and David we cover all the following aspects, and
a few other topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;What is &lt;a href=&quot;https://trino.io/ecosystem/add-on#opentelemetry&quot;&gt;OpenTelemetry&lt;/a&gt;?&lt;/li&gt;
  &lt;li&gt;Some basic concepts like &lt;a href=&quot;https://opentelemetry.io/docs/concepts/observability-primer/&quot;&gt;logs, spans, traces&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;How this relates to JMX, system data, and other monitoring&lt;/li&gt;
  &lt;li&gt;What is &lt;a href=&quot;https://openmetrics.io/&quot;&gt;OpenMetrics&lt;/a&gt;? How is it related to
&lt;a href=&quot;https://trino.io/ecosystem/data-source.html#prometheus&quot;&gt;Prometheus&lt;/a&gt;?&lt;/li&gt;
  &lt;li&gt;What tools can you use with OpenTelemetry? Jaeger, Datadog, …&lt;/li&gt;
  &lt;li&gt;Reasoning to add OpenTelemetry to Trino&lt;/li&gt;
  &lt;li&gt;Implementation details&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/admin/opentelemetry.html&quot;&gt;Trino documentation&lt;/a&gt; with
local example usage with Docker containers for Trino and Jaeger&lt;/li&gt;
  &lt;li&gt;Practical experience&lt;/li&gt;
  &lt;li&gt;Demo of real world usage with Starburst Galaxy and Datadog&lt;/li&gt;
  &lt;li&gt;Bonus topic - JSON-format logging via TCP socket&lt;/li&gt;
&lt;/ul&gt;
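
&lt;p&gt;The Trino documentation linked above describes the setup in detail. As a
minimal sketch, assuming an OTLP-compatible collector such as Jaeger listening
on the default gRPC port on the same host, tracing is enabled with two
properties in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;config.properties&lt;/code&gt; on all cluster nodes:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;tracing.enabled=true
tracing.exporter.endpoint=http://localhost:4317
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;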

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;/blog/2024/02/20/announcing-trino-fest-2024.html&quot;&gt;Trino Fest 2024 and Trino Contributor Congregation&lt;/a&gt; are happening in June in Boston.
Submit your speaker proposals now, and register for the free event as soon as
you can, especially for live attendance.&lt;/p&gt;

&lt;p&gt;Check out the upcoming &lt;a href=&quot;https://trino.io/broadcast/index.html&quot;&gt;Trino Community
Broadcast&lt;/a&gt; episodes about Mitzu and jOOQ.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from O’Reilly.
You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt; online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>Blazing ahead with 22</title>
      <link href="https://trino.io/blog/2024/03/13/java-22.html" rel="alternate" type="text/html" title="Blazing ahead with 22" />
      <published>2024-03-13T00:00:00+00:00</published>
      <updated>2024-03-13T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/03/13/java-22</id>
      <content type="html" xml:base="https://trino.io/blog/2024/03/13/java-22.html">&lt;p&gt;It was not that long ago that we &lt;a href=&quot;/blog/2023/11/03/java-21.html&quot;&gt;first announced support for Java 21&lt;/a&gt;, and subsequently made it a build and runtime
requirement with &lt;a href=&quot;https://trino.io/docs/current/release/release-436.html&quot;&gt;Trino 436&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Since then, the codebase has received some significant improvements in
readability, and we have also seen better performance. However, innovation in
Trino and Java is not standing still; on the contrary, it’s accelerating. On
the Java community side, Java 22 is about to be released, and we think it is
time to drive innovation in Trino even further. Trino is going to use and
require Java 22 soon!&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;background-and-motivation&quot;&gt;Background and motivation&lt;/h2&gt;

&lt;p&gt;The planned move to use and require Java 22 for building and running Trino
is driven by several factors:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Take advantage of performance and runtime improvements of the new JVM version.&lt;/li&gt;
  &lt;li&gt;Use the newly available language features to further improve readability and
maintenance aspects of the codebase.&lt;/li&gt;
  &lt;li&gt;Enable the use of further performance improvements for Trino under the umbrella
of &lt;a href=&quot;https://github.com/trinodb/trino/issues/14237&quot;&gt;Project Hummingbird&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Attract and motivate more contributors by offering the opportunity to work
with a modern Java stack and the relevant language features and APIs on a
cutting-edge, complex application.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Speaking of APIs and new features, let’s look at the list of JDK Enhancement
Proposals (JEPs) that we are actively looking at. Specifically, we plan to
experiment with them and adopt any non-preview JEPs where we see benefits. We
also plan to report any issues and problems we encounter back upstream to the
Java community:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Region Pinning for G1 (&lt;a href=&quot;https://openjdk.org/jeps/423&quot;&gt;JEP 423&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Foreign Function &amp;amp; Memory API (&lt;a href=&quot;https://openjdk.org/jeps/454&quot;&gt;JEP 454&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Unnamed Variables and Patterns (&lt;a href=&quot;https://openjdk.org/jeps/456&quot;&gt;JEP 456&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Class File API in preview (&lt;a href=&quot;https://openjdk.org/jeps/457&quot;&gt;JEP 457&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;String Templates in second preview (&lt;a href=&quot;https://openjdk.org/jeps/459&quot;&gt;JEP 459&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Vector API in 7th incubator (&lt;a href=&quot;https://openjdk.org/jeps/460&quot;&gt;JEP 460&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Structured Concurrency in second preview (&lt;a href=&quot;https://openjdk.org/jeps/462&quot;&gt;JEP 462&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Scoped Values in second preview (&lt;a href=&quot;https://openjdk.org/jeps/464&quot;&gt;JEP 464&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
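
&lt;p&gt;To give a flavor of these features, unnamed variables and patterns from JEP
456 let code declare values that must exist but are never read. The following
snippet is a hypothetical illustration, not code from the Trino codebase:&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;// JEP 456: an underscore marks a variable that is declared but never read
static int countRows(Iterable&amp;lt;Row&amp;gt; rows)
{
    int count = 0;
    for (Row _ : rows) {
        count++;
    }
    return count;
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;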

&lt;p&gt;Many of these APIs allow us to further modernize the feature set of Trino and
adapt it to current hardware and compute power realities. Specifically, we can
continue our commitment to the Java ecosystem and avoid many of the
complexities and pitfalls of JNI, the traditional, now legacy way of
integrating with native code and specific hardware features.&lt;/p&gt;

&lt;p&gt;Another aspect some of you might wonder about is the move from a Java LTS
version to a Java STS release, from “long term support” to “short term
support”. So far, Trino has required Java 8, Java 11, Java 17, and then Java 21.
Since all of them are LTS releases, some of you might have concluded that we
have a policy of only using Java LTS versions. That is not the case; it is only
a coincidence.&lt;/p&gt;

&lt;p&gt;We have always strived to use up-to-date source code, dependencies, runtime
environments, and so forth. The benefits, including better performance,
included bug fixes, a reduced need for backports, fewer security issues, and
support for modern language features, development environments, and tooling,
have always far outweighed the effort of staying up to date.&lt;/p&gt;

&lt;p&gt;We have finally reached the long-planned state where we can move quickly
enough as a project to use the latest tools, dependencies, and Java releases
while keeping up our frequent release cadence. And that is exactly what we are
doing for the benefit of everyone contributing to and using Trino. Java 22 now.
Then later this year we can move to Java 23, and next year to Java 24 and 25.&lt;/p&gt;

&lt;p&gt;So what are we specifically doing now?&lt;/p&gt;

&lt;h2 id=&quot;current-status-and-plans&quot;&gt;Current status and plans&lt;/h2&gt;

&lt;p&gt;Java 22 is scheduled to ship in March 2024. The various JDK distribution
binary packages will become available shortly after the official release.&lt;/p&gt;

&lt;p&gt;Early access (EA) sources and binaries are already available, and our
continuous integration builds already use an EA build successfully.&lt;/p&gt;

&lt;p&gt;Overall the transition is going well. Our plan is to follow the same approach as
our switch to Java 21:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Ensure everything works with Java 22.&lt;/li&gt;
  &lt;li&gt;Change the container image to use Java 22.&lt;/li&gt;
  &lt;li&gt;Cut a release and get community feedback from testing with the container.&lt;/li&gt;
  &lt;li&gt;Adjust to any feedback and available improvements for a few releases.&lt;/li&gt;
  &lt;li&gt;Switch the requirement for build and runtime to Java 22.&lt;/li&gt;
  &lt;li&gt;Cut another release and celebrate.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And then the real fun starts all over. We can update code and libraries, and
start working with the new APIs. The timing of all this work depends on the
obstacles we find along the way and how quickly we can remove them.&lt;/p&gt;

&lt;p&gt;We use the &lt;a href=&quot;https://github.com/trinodb/trino/issues/20980&quot;&gt;Java 22 tracking
issue&lt;/a&gt; and the linked issues and
pull requests to manage progress, discuss next steps, and work with the
community.&lt;/p&gt;

&lt;p&gt;Feel free to chime in there or find us on the &lt;a href=&quot;https://trinodb.slack.com/archives/CP1MUNEUX&quot;&gt;#dev
channel&lt;/a&gt; on the &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;Trino community
Slack&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Join us in this exciting next step for Trino.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Update from 8 May 2024:&lt;/strong&gt;
The release of &lt;a href=&quot;https://trino.io/docs/current/release/release-447.html&quot;&gt;Trino 447&lt;/a&gt;
includes the switch to Java 22 as a requirement for running Trino.&lt;/p&gt;
&lt;/blockquote&gt;</content>

      
        <author>
          <name>Manfred Moser, Martin Traverso, Dain Sundstrom, David Phillips</name>
        </author>
      

      <summary>It was not that long ago that we first announced support for Java 21, and subsequently made it a build and runtime requirement with Trino 436. Since then, the codebase has received some significant improvements in readability, and we have also seen better performance. However, innovation in Trino and Java is not standing still; on the contrary, it’s accelerating. On the Java community side, Java 22 is about to be released, and we think it is time to drive innovation in Trino even further. Trino is going to use and require Java 22 soon!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/images/logos/java-duke-22.png" />
      
    </entry>
  
    <entry>
      <title>A cache refresh for Trino</title>
      <link href="https://trino.io/blog/2024/03/08/cache-refresh.html" rel="alternate" type="text/html" title="A cache refresh for Trino" />
      <published>2024-03-08T00:00:00+00:00</published>
      <updated>2024-03-08T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/03/08/cache-refresh</id>
      <content type="html" xml:base="https://trino.io/blog/2024/03/08/cache-refresh.html">&lt;p&gt;Thinking about our recent work on caching in Trino reminds me of the famous
saying, &lt;a href=&quot;https://www.karlton.org/2017/12/naming-things-hard/&quot;&gt;“There are only two hard things in computer science: cache invalidation
and naming things&lt;/a&gt;.” Well,
in the Trino community we know all about caching and naming. With the recent
&lt;a href=&quot;https://trino.io/docs/current/release/release-439.html&quot;&gt;Trino 439 release&lt;/a&gt;, caching
from object storage file systems got a refresh. Catalogs using the Delta Lake,
Hive, Iceberg, and soon Hudi connectors now get to access performance benefits
from the new Alluxio-powered file system caching.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;in-the-past&quot;&gt;In the past&lt;/h2&gt;

&lt;p&gt;So how did we get here? A long, long time ago, Qubole open-sourced a &lt;a href=&quot;https://github.com/qubole/rubix&quot;&gt;light
light-weight data caching framework called
RubiX&lt;/a&gt;. The library was integrated into the
Trino Hive connector, and it enabled &lt;a href=&quot;https://trino.io/docs/438/connector/hive-caching.html&quot;&gt;Hive connector storage
caching&lt;/a&gt;. But over time, any
open source project without active maintenance becomes stale. And like a stale
cache, a stale open source project can cause issues, or become outdated and
unsuitable for modern use. Though RubiX had once served Trino well, it was time
to remove the dust, and RubiX had to go.&lt;/p&gt;

&lt;h2 id=&quot;making-progress&quot;&gt;Making progress&lt;/h2&gt;

&lt;p&gt;Catching back up to 2024, Trino now includes powerful connectors for the modern
lakehouse formats Delta Lake, Hudi, and Iceberg:&lt;/p&gt;

&lt;div class=&quot;container&quot;&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://trino.io/docs/current/connector/delta-lake.html&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/delta-lake.png&quot; title=&quot;Delta Lake connector&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://trino.io/docs/current/connector/hudi.html&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/apache-hudi.png&quot; title=&quot;Hudi connector&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://trino.io/docs/current/connector/iceberg.html&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/apache-iceberg.png&quot; title=&quot;Iceberg connector&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;Hive is still around, just like HDFS, but we consider them both close to legacy
status. Yet all four connectors could benefit from caching. Good news came at
Trino Summit 2022 when Hope Wang and Beinan Wang from
&lt;a href=&quot;https://trino.io/ecosystem/add-on.html#alluxio&quot;&gt;Alluxio&lt;/a&gt; presented about their
integration with Trino and the Hive connector - &lt;a href=&quot;/blog/2023/07/21/trino-fest-2023-alluxio-recap.html&quot;&gt;Trino optimization with
distributed caching on data lake&lt;/a&gt;. They mentioned plans to open
source their implementation and an initial pull request (PR) was created.&lt;/p&gt;

&lt;div class=&quot;container&quot;&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-sm&quot;&gt;&lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;img src=&quot;https://trino.io/assets/images/logos/alluxio.png&quot; title=&quot;Alluxio&quot; /&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;&lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;h2 id=&quot;collaboration&quot;&gt;Collaboration&lt;/h2&gt;

&lt;p&gt;The initial presentation and PR planted a seed in the community. The Trino
project had been moving fast in deprecating the old dependencies from the
Hadoop and Hive ecosystem, so the initial Alluxio PR was no longer up to date
or compatible with the latest Trino version. Discussions with &lt;a href=&quot;https://github.com/electrum&quot;&gt;David
Phillips&lt;/a&gt; laid out the path to adjust to the new
file system support and get ready for reviews towards a merge.&lt;/p&gt;

&lt;p&gt;In the end it was &lt;a href=&quot;https://github.com/pluies&quot;&gt;Florent Delannoy&lt;/a&gt; who started
another &lt;a href=&quot;https://github.com/trinodb/trino/pull/18719&quot;&gt;PR for file system caching support, specifically for the Delta Lake
connector&lt;/a&gt;. His teammate &lt;a href=&quot;https://github.com/jkylling&quot;&gt;Jonas
Irgens Kylling&lt;/a&gt;, also a &lt;a href=&quot;/blog/2023/07/14/trino-fest-2023-dune.html&quot;&gt;presenter from Trino Fest
2023&lt;/a&gt;, took over the work on the
PR. The collaboration on it was an &lt;strong&gt;epic effort&lt;/strong&gt;. After many months,
over 300 comments directly on GitHub, and numerous hours of coding, reviewing,
testing, and discussion on Slack and elsewhere, the work finally resulted in a
successful merge, and therefore inclusion in the next release.&lt;/p&gt;

&lt;p&gt;Special props for helping Florent and Jonas must go out to &lt;a href=&quot;https://github.com/electrum&quot;&gt;David
Phillips&lt;/a&gt;, &lt;a href=&quot;https://github.com/raunaqmorarka&quot;&gt;Raunaq
Morarka&lt;/a&gt;, &lt;a href=&quot;https://github.com/findepi&quot;&gt;Piotr
Findeisen&lt;/a&gt;, &lt;a href=&quot;https://github.com/wendigo&quot;&gt;Mateusz
Gajewski&lt;/a&gt;, &lt;a href=&quot;https://github.com/beinan&quot;&gt;Beinan Wang&lt;/a&gt;,
&lt;a href=&quot;https://github.com/amoghmargoor&quot;&gt;Amogh Margoor&lt;/a&gt;, &lt;a href=&quot;https://github.com/osscm&quot;&gt;Manish
Malhorta&lt;/a&gt;, and &lt;a href=&quot;https://github.com/marton-bod&quot;&gt;Marton
Bod&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;finishing&quot;&gt;Finishing&lt;/h2&gt;

&lt;p&gt;In parallel to the work on the initial PR for Delta Lake, yours truly ended up
working on the documentation, and pulled together an &lt;a href=&quot;https://github.com/trinodb/trino/issues/20550&quot;&gt;issue and conversations to
streamline the rollout&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/wendigo&quot;&gt;Mateusz Gajewski&lt;/a&gt; had already put together a PR to
remove the old RubiX integration. With the merge of the initial PR we
were off to the races. We merged the removal of RubiX and the addition of the
docs. Mateusz also added support for OpenTelemetry.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/osscm&quot;&gt;Manish Malhorta&lt;/a&gt; and &lt;a href=&quot;https://github.com/amoghmargoor&quot;&gt;Amogh
Margoor&lt;/a&gt; sent a PR for Iceberg support. They
were also about to add Hive support, when &lt;a href=&quot;https://github.com/raunaqmorarka&quot;&gt;Raunaq
Morarka&lt;/a&gt; beat them to it and submitted that PR.&lt;/p&gt;

&lt;p&gt;After some final clean up, &lt;a href=&quot;https://github.com/colebow&quot;&gt;Cole Bowden&lt;/a&gt; and &lt;a href=&quot;https://github.com/martint&quot;&gt;Martin
Traverso&lt;/a&gt; got the release notes together and shipped
&lt;a href=&quot;https://trino.io/docs/current/release/release-439.html&quot;&gt;Trino 439&lt;/a&gt;! Now you can use
it, too.&lt;/p&gt;

&lt;h2 id=&quot;using-file-system-caching&quot;&gt;Using file system caching&lt;/h2&gt;

&lt;p&gt;There are only a few relatively simple steps to add file system caching to your
catalogs that use Delta Lake, Hive, or Iceberg connectors:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Provision fast local file system storage on all your Trino cluster nodes. How
you do that depends on your cluster provisioning.&lt;/li&gt;
  &lt;li&gt;Enable file system caching and configure the cache location, for example at
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/tmp/trino-cache&lt;/code&gt; on the nodes, in your catalog properties files.&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;fs.cache.enabled=true
fs.cache.directories=/tmp/trino-cache
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
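
&lt;p&gt;Put together, a complete catalog properties file with caching enabled might
look like the following sketch. The connector and metastore settings are
placeholders for your environment:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;connector.name=delta_lake
hive.metastore.uri=thrift://metastore.example.net:9083
fs.cache.enabled=true
fs.cache.directories=/tmp/trino-cache
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;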

&lt;p&gt;After a cluster restart, file system caching is active for the configured
catalogs, and you can tweak it with &lt;a href=&quot;https://trino.io/docs/current/object-storage/file-system-cache.html&quot;&gt;further, optional configuration
properties&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;whats-next&quot;&gt;What’s next&lt;/h2&gt;

&lt;p&gt;What a success! It took many members from the global Trino village to get this
feature added. Now our users across the globe can enjoy even more benefits of
using Trino, and also participate in our next steps:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Further improvements to the current implementation, maybe adding
worker-to-worker connections for exchanging cached files.&lt;/li&gt;
  &lt;li&gt;Preparation to add file system caching to the Hudi connector is in progress
with &lt;a href=&quot;https://github.com/codope&quot;&gt;Sagar Sumit&lt;/a&gt; and &lt;a href=&quot;https://github.com/yihua&quot;&gt;Y Ethan
Guo&lt;/a&gt;, with implementation to follow.&lt;/li&gt;
  &lt;li&gt;Adjust to any learnings from production usage.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Our thanks, and those from all current and future users, go out to everyone
involved in this effort. What are we going to do next?&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;PS: If you want to share your use of Trino or connect with other Trino users,
&lt;a href=&quot;/blog/2024/02/20/announcing-trino-fest-2024.html&quot;&gt;join us for the free Trino Fest 2024&lt;/a&gt; as speaker or attendee live in Boston,
or virtually from your home.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>Thinking about our recent work on caching in Trino reminds me of the famous saying, “There are only two hard things in computer science: cache invalidation and naming things.” Well, in the Trino community we know all about caching and naming. With the recent Trino 439 release, caching from object storage file systems got a refresh. Catalogs using the Delta Lake, Hive, Iceberg, and soon Hudi connectors now get to access performance benefits from the new Alluxio-powered file system caching.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-cache-refresh.png" />
      
    </entry>
  
    <entry>
      <title>Japanese edition of Trino: The Definitive Guide</title>
      <link href="https://trino.io/blog/2024/02/27/the-definitive-guide-2-jp.html" rel="alternate" type="text/html" title="Japanese edition of Trino: The Definitive Guide" />
      <published>2024-02-27T00:00:00+00:00</published>
      <updated>2024-02-27T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/02/27/the-definitive-guide-2-jp</id>
      <content type="html" xml:base="https://trino.io/blog/2024/02/27/the-definitive-guide-2-jp.html">&lt;p&gt;Do you know where the name ‘Trino’ comes from? It’s actually a shortened form of
‘neutrino’. These fast and lightweight subatomic particles have recently made
their way to Japan. You can now reserve your copy of the Japanese edition of
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;!&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Today, we are happy to announce that the Japanese translation of the book
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt; is
available for the communities all across Japan and far beyond. Preorder today
and get your copy from the first batch in the middle of March. Hopefully it can
lower the barrier to Trino for native speakers. We invite you all to get your
own copy:&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.hanmoto.com/bd/isbn/9784798071671&quot;&gt;
        分散SQLクエリエンジンTrino徹底ガイド 秀和システム
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;Our thanks go out to Masanori Nishida and his teams at Shuwa System. I would also
like to thank my great team of translators and collaborators, &lt;a href=&quot;https://github.com/Lewuathe&quot;&gt;Kai
Sasaki&lt;/a&gt;, &lt;a href=&quot;https://github.com/aajisaka&quot;&gt;Akira
Ajisaka&lt;/a&gt;, &lt;a href=&quot;https://github.com/eurekaeru&quot;&gt;Kaname
Nishizuka&lt;/a&gt;, and &lt;a href=&quot;https://github.com/mikiT&quot;&gt;Miki
Takata&lt;/a&gt; for their help in making the book a reality.
We hope many readers can benefit from the translated edition.&lt;/p&gt;

&lt;p&gt;We look forward to chatting with many of our new readers and Trino users on the
&lt;a href=&quot;https://trinodb.slack.com/app_redirect?channel=general-jp&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;general-jp&lt;/code&gt;&lt;/a&gt;
channel in &lt;a href=&quot;/slack.html&quot;&gt;the Trino community Slack&lt;/a&gt;, other
channels, and direct messaging.&lt;/p&gt;

&lt;p&gt;Also, don’t forget to tell us about your usage of &lt;a href=&quot;/blog/2024/02/20/announcing-trino-fest-2024.html&quot;&gt;Trino in the upcoming Trino
Fest 2024 as a speaker. Or just register to attend the free event&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Yuya Ebihara&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Yuya Ebihara</name>
        </author>
      

      <summary>Do you know where the name ‘Trino’ comes from? It’s actually a shortened form of ‘neutrino’. These fast and lightweight subatomic particles have recently made their way to Japan. You can now reserve your copy of the Japanese edition of Trino: The Definitive Guide!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/ttdg2-jp-cover.jpg" />
      
    </entry>
  
    <entry>
      <title>56: The vast possibilities of VAST and Trino</title>
      <link href="https://trino.io/episodes/56.html" rel="alternate" type="text/html" title="56: The vast possibilities of VAST and Trino" />
      <published>2024-02-22T00:00:00+00:00</published>
      <updated>2024-02-22T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/56</id>
      <content type="html" xml:base="https://trino.io/episodes/56.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Manfred Moser, Director of Trino Community Leadership at
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://linkedin.com/in/colleen-tartow-phd&quot;&gt;Colleen Tartow&lt;/a&gt;, Field CTO and
Head of Strategy at &lt;a href=&quot;https://vastdata.com/&quot;&gt;VAST Data&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/roman-zeyde/&quot;&gt;Roman Zeyde&lt;/a&gt;, Senior Software
Engineer at &lt;a href=&quot;https://vastdata.com/&quot;&gt;VAST Data&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;release-439&quot;&gt;Release 439&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-439.html&quot;&gt;Trino 439&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New caching layer for Delta Lake, Hive, and Iceberg!&lt;/li&gt;
  &lt;li&gt;Documentation for new native file system support.&lt;/li&gt;
  &lt;li&gt;Fix for setting session properties on catalogs with a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.&lt;/code&gt; in the name.&lt;/li&gt;
  &lt;li&gt;Fix for reading Snappy data.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;trino-gateway-6&quot;&gt;Trino Gateway 6&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Docker container setup!&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-episode-the-vast-database-and-data-platform&quot;&gt;Concept of the episode: The VAST database and data platform&lt;/h2&gt;

&lt;p&gt;Part database, part data warehouse, part data lake: describing
&lt;a href=&quot;https://vastdata.com/&quot;&gt;VAST&lt;/a&gt; in one sentence is not the easiest undertaking.
You can talk about features like deep write buffers with underlying flash
columnar storage, the automatic contextual layer added on top of the data, or
the similarity-based global compression that more than makes up for the smaller
columnar chunks and makes it so much faster to find exactly the data you’re
looking for.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/ecosystem/data-source.html#vast&quot;&gt;
  &lt;img src=&quot;https://trino.io/assets/images/logos/vast.png&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So what is VAST? It’s a state-of-the-art data platform. Why are we talking about
it on the Trino Community Broadcast? A world-class data storage solution still
needs a world-class query engine, and its speed paired with Trino’s makes for a
brilliant combination. We’re diving into how it works, why it is designed the
way it is, and maybe talking about the really cool &lt;a href=&quot;https://vastdata.com/database#performance-comparison&quot;&gt;performance
comparison&lt;/a&gt; they have on
their website showcasing Trino as their favorite query engine.&lt;/p&gt;

&lt;p&gt;Check out our conversation about the VAST database, VAST data platform, the
Trino connector, internal workings of the system, use cases, customers, and much
more in the interview.&lt;/p&gt;

&lt;p&gt;Also have a look at the &lt;a href=&quot;https://www.youtube.com/watch?v=RutbCY8i22Q&quot;&gt;presentation from Jason Russler about VAST at Trino
Summit 2023&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;/blog/2024/02/20/announcing-trino-fest-2024.html&quot;&gt;Trino Fest 2024 has been announced&lt;/a&gt; for this summer in Boston! Make sure
to check out the announcement blog post and register to attend, submit your
talks, or contact Starburst for information on sponsoring!&lt;/p&gt;

&lt;p&gt;Check out the upcoming Trino Community Broadcast episodes about OpenTelemetry
and Mitzu.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino Fest goes to Boston in 2024</title>
      <link href="https://trino.io/blog/2024/02/20/announcing-trino-fest-2024.html" rel="alternate" type="text/html" title="Trino Fest goes to Boston in 2024" />
      <published>2024-02-20T00:00:00+00:00</published>
      <updated>2024-02-20T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/02/20/announcing-trino-fest-2024</id>
      <content type="html" xml:base="https://trino.io/blog/2024/02/20/announcing-trino-fest-2024.html">&lt;p&gt;After the resounding success of Trino Fest and Trino Summit in 2023, Commander
Bun Bun has exciting news to share: we’re taking our biggest events of the year
back to being in-person. They’ll be hybrid, to be more specific, so if you can’t
travel, don’t fret, you’ll still be able to watch and ask questions in chat.
But if you can travel, you won’t want to miss out! Everything you already know
and love about Trino Fest is moving to the East Coast for the lovely Boston
summer. The event is on the 13th of June in the Hyatt Regency Boston, where
we’ll have a full day of talks, time to network, and a happy hour at the end of
the day. You may even get to meet Commander Bun Bun, who’s ditching the hiking
gear in favor of training for the Olympics. Sound exciting?&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; href=&quot;http://www.starburst.io/info/trino-fest-2024?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=Global-FY25-Q2-EV-Trino-Fest-2024&amp;amp;utm_content=Blog-1&quot;&gt;
        Register to attend!
    &lt;/a&gt;
&lt;/div&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;join-us-in-person&quot;&gt;Join us in person&lt;/h2&gt;

&lt;p&gt;Our event will be hosted at the Hyatt Regency in Boston, where we are planning a
full day of festivities followed by a happy hour on the Hyatt Regency deck.
There is a
&lt;a href=&quot;https://www.hyatt.com/en-US/group-booking/BOSTO/G-STA4&quot;&gt;discounted room block&lt;/a&gt;
set aside for those interested in attending live and staying with us in Boston.
If you are looking to book hotel dates in addition to what is provided on the
room block, email &lt;a href=&quot;mailto:events@starburstdata.com&quot;&gt;events@starburstdata.com&lt;/a&gt;,
and they will help you coordinate your reservation.&lt;/p&gt;

&lt;p&gt;Regardless of whether you plan on attending in person or online, you do need to
register, so make sure to click the button above!&lt;/p&gt;

&lt;h2 id=&quot;call-for-speakers&quot;&gt;Call for speakers&lt;/h2&gt;

&lt;p&gt;Interested in speaking? We want to hear from everyone in the Trino community
who has something to share. If you aren’t sure whether it’s worth it to submit,
submit anyway! We’ll review all submissions, and we’ll do our best to work with
you to turn your talk into a smash hit. We are looking for both full sessions
(about 30 minutes) and lightning talks (10-15 minutes). We welcome intermediate
to advanced submissions for talks that are connected to Trino on any of the
following topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Best practices and use cases&lt;/li&gt;
  &lt;li&gt;Data migrations&lt;/li&gt;
  &lt;li&gt;Optimizations and performance improvements&lt;/li&gt;
  &lt;li&gt;Data governance&lt;/li&gt;
  &lt;li&gt;Data engineering, including batch and streaming architectures&lt;/li&gt;
  &lt;li&gt;Data science&lt;/li&gt;
  &lt;li&gt;SQL analytics and BI&lt;/li&gt;
  &lt;li&gt;Cloud data lake use cases&lt;/li&gt;
  &lt;li&gt;Data lake architecture&lt;/li&gt;
  &lt;li&gt;Query federation&lt;/li&gt;
  &lt;li&gt;Table formats&lt;/li&gt;
  &lt;li&gt;Data ingestion&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Want to speak?&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; href=&quot;https://sessionize.com/trino-fest-2024&quot;&gt;
        Submit a talk!
    &lt;/a&gt;
&lt;/div&gt;

&lt;h2 id=&quot;-trino-contributor-congregation&quot;&gt;&lt;a name=&quot;tcc&quot;&gt;&lt;/a&gt; Trino contributor congregation&lt;/h2&gt;

&lt;p&gt;The day after Trino Fest, we’ll also be hosting an in-person meetup for
Trino contributors and engineers to catch up, discuss the Trino roadmap, and
engage directly with the maintainers. It’s a great opportunity to put
faces and voices to those GitHub handles, align on the big ideas or tricky PRs
that have been moving slowly, and find more ways to get involved in Trino
development. If you’re interested in attending, message Manfred Moser or Cole
Bowden on the &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;Trino Slack&lt;/a&gt;, and we’ll get you added to
the attendee list and share more details.&lt;/p&gt;

&lt;h2 id=&quot;sponsor-trino-fest&quot;&gt;Sponsor Trino Fest&lt;/h2&gt;

&lt;p&gt;Starburst is the organizing sponsor of the event, but to make Trino Fest a
smashing success, they’re excited and interested in collaborating with other
organizations within the community. If you are interested in sponsoring, email
&lt;a href=&quot;mailto:events@starburstdata.com&quot;&gt;events@starburstdata.com&lt;/a&gt; for information.&lt;/p&gt;

&lt;p&gt;And regardless of whether you’re planning on attending, speaking, or sponsoring,
we look forward to seeing you soon!&lt;/p&gt;</content>

      
        <author>
          <name>Cole Bowden</name>
        </author>
      

      <summary>After the resounding success of Trino Fest and Trino Summit in 2023, Commander Bun Bun has exciting news to share: we’re taking our biggest events of the year back to being in-person. They’ll be hybrid, to be more specific, so if you can’t travel, don’t fret, you’ll still be able to watch and ask questions in chat. But if you can travel, you won’t want to miss out! Everything you already know and love about Trino Fest is moving to the East Coast for the lovely Boston summer. The event is on the 13th of June in the Hyatt Regency Boston, where we’ll have a full day of talks, time to network, and a happy hour at the end of the day. You may even get to meet Commander Bun Bun, who’s ditching the hiking gear in favor of training for the Olympics. Sound exciting? Register to attend!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2024/announcement-banner.png" />
      
    </entry>
  
    <entry>
      <title>Open Policy Agent for Trino arrived</title>
      <link href="https://trino.io/blog/2024/02/06/opa-arrived.html" rel="alternate" type="text/html" title="Open Policy Agent for Trino arrived" />
      <published>2024-02-06T00:00:00+00:00</published>
      <updated>2024-02-06T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/02/06/opa-arrived</id>
      <content type="html" xml:base="https://trino.io/blog/2024/02/06/opa-arrived.html">&lt;p&gt;Trino now ships with an access control integration using the popular and widely
used &lt;a href=&quot;https://www.openpolicyagent.org/&quot;&gt;Open Policy Agent (OPA)&lt;/a&gt; from the Cloud Native
Computing Foundation. The release of &lt;a href=&quot;https://trino.io/docs/current/release/release-438.html&quot;&gt;Trino
438&lt;/a&gt; marks an important
milestone of the effort towards this integration.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;collaboration-and-history&quot;&gt;Collaboration and history&lt;/h2&gt;

&lt;p&gt;Open Policy Agent was first released in 2016 and has gained more and more
popularity in the ecosystem of cloud native applications and beyond.&lt;/p&gt;

&lt;p&gt;Initial efforts for an integration with Trino started separately at Bloomberg,
Stackable, Raft, and other places, sometimes in parallel and with only partial
collaboration. You might have first heard about it in August 2022 in the &lt;a href=&quot;https://trino.io/episodes/39.html&quot;&gt;Trino
Community Broadcast episode 39&lt;/a&gt; with a team from
Raft as guests.&lt;/p&gt;

&lt;p&gt;Usage and experience with OPA grew. In the end, Pablo Arteaga from
&lt;a href=&quot;https://www.techatbloomberg.com/&quot;&gt;Bloomberg&lt;/a&gt; and Sebastian Bernauer and Sönke
Liebau from &lt;a href=&quot;https://stackable.tech/&quot;&gt;Stackable&lt;/a&gt; took the initiative to open a
pull request to Trino. Their persistence and collaboration carried them through many
review comments, update commits, and even a second PR, and led them to submit a talk
and eventually present at Trino Summit 2023 about the Open Policy Agent access
control with Trino and their motivation to move from Apache Ranger to OPA.&lt;/p&gt;

&lt;h2 id=&quot;opa-at-trino-summit-2023&quot;&gt;OPA at Trino Summit 2023&lt;/h2&gt;

&lt;p&gt;The presentation from Pablo and Sönke titled “Trino OPA authorizer - An open
source love story” received a lot of interest from the audience at the event and
on YouTube since then. They walked through the architectural differences between
using Ranger and OPA. Sönke detailed the usage of OPA in the Stackable platform
and how it enables a single access control platform to apply across many systems.
They discussed their collaboration on the pull request, and Pablo showed a
migration path from Ranger and a full demo of OPA with Trino.&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/fbqqapQbAv0&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;They also made the &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2023/opa-trino.pdf&quot;&gt;slide deck available for your
reference&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Edward Morgan and Bhaarat Sharma from &lt;a href=&quot;https://teamraft.com/&quot;&gt;Raft&lt;/a&gt; also
presented &lt;a href=&quot;https://www.youtube.com/watch?v=6KspMwCbOfI&quot;&gt;Avoiding pitfalls with query federation in data
lakehouses&lt;/a&gt; at Trino Summit, and
detailed their OPA usage in their Data Fabric platform. It combines Delta Lake,
Trino, Apache Kafka, and Open Policy Agent (OPA) into a robust lakehouse data
platform. They talked about access control in Trino overall and how important it
is for their customers, including the US Department of Defense. Their
presentation also included a demo of OPA with Trino.&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/6KspMwCbOfI&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h2 id=&quot;opa-on-the-way-to-trino&quot;&gt;OPA on the way to Trino&lt;/h2&gt;

&lt;p&gt;Pablo and Sebastian continued their efforts on the &lt;a href=&quot;https://github.com/trinodb/trino/pull/19532&quot;&gt;pull
request&lt;/a&gt; after Trino Summit. They
worked successfully with Dain on the code review and necessary changes, and
helped Manfred with the documentation.&lt;/p&gt;

&lt;p&gt;Finally, with the release of Trino 438, the &lt;a href=&quot;https://trino.io/docs/current/security/opa-access-control.html&quot;&gt;Open Policy Agent access
control&lt;/a&gt; is available
to all Trino users.&lt;/p&gt;

&lt;p&gt;The community is already taking notice with follow-up pull requests for further
improvements and blog posts such as &lt;a href=&quot;https://www.linkedin.com/pulse/enhancing-security-observability-trino-open-policy-agent-isa-inalcik-zhl9e/&quot;&gt;Enhancing Security and Observability in
Trino with Open Policy Agent and
OpenTelemetry&lt;/a&gt;
from Isa Inalcik.&lt;/p&gt;

&lt;h2 id=&quot;benefits-of-opa&quot;&gt;Benefits of OPA&lt;/h2&gt;

&lt;p&gt;The arrival of OPA support for Trino marks an important step. OPA is a mature
and widely used access control system. Its
&lt;a href=&quot;https://www.openpolicyagent.org/ecosystem/&quot;&gt;ecosystem&lt;/a&gt; includes many
integrations, user interfaces, development tools, and other resources.&lt;/p&gt;

&lt;p&gt;OPA is a very flexible authorization system, making it an ideal match for Trino.
Trino deployments are often part of a diverse data platform, spanning a variety
of interconnected data sources, pipelines, client tools, and applications.&lt;/p&gt;

&lt;p&gt;Trino users now have an alternative to the file-based access
control from the Trino project itself, maintaining their own Ranger
integration, or using commercial offerings for access control.&lt;/p&gt;

&lt;h2 id=&quot;whats-next&quot;&gt;What’s next&lt;/h2&gt;

&lt;p&gt;We reached another milestone, but we are not done yet. Specifically for OPA, we
are looking at the following next tasks:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Get more features from various older, private forks converted into pull
requests to Trino so everyone can benefit.&lt;/li&gt;
  &lt;li&gt;Update the documentation with more practical advice and tips.&lt;/li&gt;
  &lt;li&gt;Provide further resources for running OPA with Trino, writing Rego policies,
and helping the community.&lt;/li&gt;
  &lt;li&gt;Implement row-level filtering and column masking, based on the
&lt;a href=&quot;https://github.com/bloomberg/trino/pull/16&quot;&gt;draft&lt;/a&gt; from Pablo.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Special thanks go to everyone participating so far. Consider this an open
invitation to join the effort.&lt;/p&gt;

&lt;p&gt;Ping me on Slack directly or find us in #opa-dev.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>Trino now ships with an access control integration using the popular and widely used Open Policy Agent (OPA) from the Cloud Native Computing Foundation. The release of Trino 438 marks an important milestone of the effort towards this integration.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/images/logos/opa-small.png" />
      
    </entry>
  
    <entry>
      <title>Trino 2023 wrapped</title>
      <link href="https://trino.io/blog/2024/01/19/trino-2023-wrapped.html" rel="alternate" type="text/html" title="Trino 2023 wrapped" />
      <published>2024-01-19T00:00:00+00:00</published>
      <updated>2024-01-19T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/01/19/trino-2023-wrapped</id>
      <content type="html" xml:base="https://trino.io/blog/2024/01/19/trino-2023-wrapped.html">&lt;p&gt;If &lt;a href=&quot;https://www.newsroom.spotify.com/2023-wrapped/&quot;&gt;“Wrapped” is good enough for Spotify&lt;/a&gt;, 
it’s good enough for Trino, right? As we look forward to a bright 2024, we can
also take a moment to get sentimental, look back at everything we’ve
accomplished, and reflect on the progress we’ve made. Commander Bun Bun has been
hard at work, so if you haven’t been paying close attention to Trino or want an
idea of all that went down in 2023, we’re happy to present you with an end of
year recap. We’ll explore what’s gone on in the community and in development,
revisit the events we’ve hosted, and discuss the cool new features and
technologies you can use when you’re running Trino.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/IRq3ZNR9Dgs&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h2 id=&quot;2023-by-the-numbers&quot;&gt;2023 by the numbers&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;64,288 views 👀 on YouTube&lt;/li&gt;
  &lt;li&gt;5,872 hours watched ⌚on YouTube&lt;/li&gt;
  &lt;li&gt;5,018 new commits 💻 in GitHub&lt;/li&gt;
  &lt;li&gt;2,985 new stargazers ⭐ in GitHub&lt;/li&gt;
  &lt;li&gt;2,494 pull requests merged ✅ in GitHub&lt;/li&gt;
  &lt;li&gt;1,227 issues 📝 created in GitHub&lt;/li&gt;
  &lt;li&gt;704 new subscribers 📺 in YouTube&lt;/li&gt;
  &lt;li&gt;45 videos 🎥 uploaded to YouTube&lt;/li&gt;
  &lt;li&gt;39 blog ✍️ posts&lt;/li&gt;
  &lt;li&gt;30 Trino 🚀 releases&lt;/li&gt;
  &lt;li&gt;10 Trino Community Broadcast ▶️ episodes&lt;/li&gt;
  &lt;li&gt;2 Trino ⛰️ Summits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We’re excited to say that Trino continued to grow in 2023:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;GitHub stars increased by nearly 50% total and by 8% more than last year&lt;/li&gt;
  &lt;li&gt;Commits increased by 7%&lt;/li&gt;
  &lt;li&gt;Slack usage picked up dramatically&lt;/li&gt;
  &lt;li&gt;YouTube viewership was up 7% despite a lack of Pokémon-themed musical content compared to 2022 (our bad)&lt;/li&gt;
  &lt;li&gt;30 releases kept new versions of Trino coming out more often than every other week.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Thanks in part to all that growth, it’s more important than ever to be on
&lt;a href=&quot;/slack.html&quot;&gt;our Slack&lt;/a&gt;. If you’re a Trino user or community member and aren’t
already on there, you’re missing out! Make sure to join up for community
announcements, release statuses, the shared expertise of the entire Trino
community, and event-specific channels for discussion when we’re hosting things 
like Trino Fest and Trino Summit. Speaking of those…&lt;/p&gt;

&lt;h2 id=&quot;trino-events&quot;&gt;Trino events&lt;/h2&gt;

&lt;p&gt;One of the best parts of being an open source community is that it’s easy to be
excited and connect with others about using such a cool piece of technology.
Whether that’s bringing Trino to new users who can take advantage of it, or
sharing our learnings with other Trino users to help them make the most of it,
events are one of the best ways to distribute that knowledge. So what were we up to this year?&lt;/p&gt;

&lt;h3 id=&quot;trino-fest-and-trino-summit&quot;&gt;Trino Fest and Trino Summit&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/blog/2023/06/20/trino-fest-2023-recap.html&quot;&gt;Trino Fest&lt;/a&gt; and
&lt;a href=&quot;https://trino.io/blog/2023/12/18/trino-summit-recap.html&quot;&gt;Trino Summit&lt;/a&gt; are
becoming mainstays on the Trino calendar each year, and 2023 was no different.
Formerly “Cinco de Trino,” we ditched the Cinco de Mayo theme and went with the
simpler “Trino Fest” in June, opting to theme it around Commander Bun Bun’s Lake
House Summer Camp, with a focus on integrating Trino with lakehouse and data
lake architectures. Trino Summit only wrapped up a little over a month ago,
rounding out the year and highlighting some amazing developments that we’ll be
talking about later in this blog post.&lt;/p&gt;

&lt;p&gt;Trino Fest has historically been the smaller event, but it did some catching up
in 2023, as both Trino Fest and Trino Summit were made virtual and expanded to two
days this year. With the events easier to attend than ever before, we reached a
combined total of about 1,200 live attendees, with thousands more views on demand.&lt;/p&gt;

&lt;p&gt;The lineups were packed with 34 talks across both events, featuring speakers
from huge Trino users like Salesforce, Stripe, Apple, and Lyft, as well as from
major Trino contributors like Starburst, Tabular, and Bloomberg. You can
view &lt;a href=&quot;https://www.youtube.com/playlist?list=PLFnr63che7wbBu_czq-SS9iVdQ4CIv2z1&quot;&gt;recordings of every Trino Fest talk&lt;/a&gt;
and &lt;a href=&quot;https://www.youtube.com/playlist?list=PLFnr63che7wYeJLUjUaEftCFfjymhgLcq&quot;&gt;every Trino Summit talk&lt;/a&gt;
on the Trino YouTube channel if you missed out.&lt;/p&gt;

&lt;h3 id=&quot;meetups-and-international-events&quot;&gt;Meetups and international events&lt;/h3&gt;

&lt;p&gt;One of the more exciting developments was a major event in Japan -
&lt;a href=&quot;https://trino.io/blog/2023/10/11/a-report-about-trino-conference-tokyo-2023.html&quot;&gt;Trino Conference Tokyo&lt;/a&gt;. 
A virtual event with four sessions, it brought Trino to a Japanese-speaking
audience and further pushed our favorite query engine across language borders.
On top of that,
&lt;a href=&quot;https://www.starburst.io/info/india-trino-meetup-miq/?utm_source=trino&amp;amp;utm_medium=slack&amp;amp;utm_campaign=APAC-FY24-Q4-CM-india-Meetup-at-MiQ-Digital&quot;&gt;Starburst co-hosted a Trino meetup in Bengaluru&lt;/a&gt;, 
and the community organized the first-ever Korean Trino meetup (pictured below).&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/2023-review/trino-kr-meetup.png&quot; float=&quot;center&quot; /&gt;&lt;/p&gt;

&lt;p&gt;And last but not least,
&lt;a href=&quot;/trino-the-definitive-guide.html&quot;&gt;Trino, the Definitive Guide, 2nd Edition&lt;/a&gt;
was translated into Mandarin and Polish.&lt;/p&gt;

&lt;h2 id=&quot;the-trino-gateway&quot;&gt;The Trino Gateway&lt;/h2&gt;

&lt;p&gt;One of the biggest announcements in the Trino community this year was
the &lt;a href=&quot;https://trino.io/blog/2023/09/28/trino-gateway.html&quot;&gt;launch of the Trino Gateway&lt;/a&gt;. A proxy and
load-balancer, it’s a crucial piece of Trino infrastructure for organizations
that need more than one Trino cluster to suit their needs.&lt;/p&gt;

&lt;p&gt;Why would you want more than one Trino cluster? Maybe you want one cluster with
fault-tolerant execution enabled for ETL workloads and another cluster for
speedy ad-hoc analytics. Perhaps you have analysts performing wildly
differently-sized queries, and high-volume compute-intensive queries are proving
to be bad neighbors for lightweight and low-latency queries that shouldn’t take
more than milliseconds. Historically, users would have to manually manage
swapping between clusters, establish a new connection, and try not to get a
headache in the process.&lt;/p&gt;

&lt;p&gt;Enter the Trino Gateway! By routing all of your Trino traffic automatically,
it’s never been easier to manage, maintain, and query multiple Trino clusters at
once. Load balancing ensures that no one cluster gets overworked, and it’s the
perfect way to stop large queries from getting in the way of the little guys.
Add in the fact that you can seamlessly shut down an individual cluster for
updates or maintenance while the Trino Gateway routes traffic elsewhere, and
it’s easy to see why this is such a game-changer. We’re super excited for it to
be out there in the world, and we hope it makes running Trino at the largest
scales simpler and faster than ever before.&lt;/p&gt;

&lt;p&gt;For more information on the Trino Gateway, check out:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2023/09/28/trino-gateway.html&quot;&gt;The announcement blog post&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino-gateway/blob/main/docs/quickstart.md&quot;&gt;The quickstart guide&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino-gateway/tree/main&quot;&gt;The main Trino Gateway repo&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;new-features&quot;&gt;New features&lt;/h2&gt;

&lt;p&gt;With more development on Trino than ever before, there were obviously a ton of
new things being added to it. Let’s go over some of the biggest adds in 2023.&lt;/p&gt;

&lt;h3 id=&quot;sql-routines&quot;&gt;SQL routines&lt;/h3&gt;

&lt;p&gt;Whether you want to refer to them as SQL routines or as user-defined functions,
they’re a big deal. Fresh off the presses and only a few months old, they do
exactly what you’d expect them to do: you, a user, can define and re-use your
own functions! Define and use them inline as part of a query to make that query
cleaner, easier, and simpler to understand. Or, if you’re really cooking, you
can run a query that defines the routine in the schema of the catalog. This
allows other Trino users to access the same routine time and time again as part
of their other queries. It’s a level of customization that we’ve never had
before in Trino, and no longer do you need to write your own Java plugins to
create and re-use functions that do exactly what you need them to do.&lt;/p&gt;
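
&lt;p&gt;As a small sketch (the function name and unit conversion here are made up for
illustration), an inline routine defined and used within a single query looks
roughly like this, while the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE FUNCTION&lt;/code&gt;
form stores the same routine in a catalog schema for reuse:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;WITH
  FUNCTION meters_to_feet(m double)
    RETURNS double
    RETURN m * 3.28084
SELECT meters_to_feet(100);

-- Store the routine in a catalog schema so other users can call it later
CREATE FUNCTION example.default.meters_to_feet(m double)
  RETURNS double
  RETURN m * 3.28084;
&lt;/code&gt;&lt;/pre&gt;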

&lt;p&gt;If you want to learn more about SQL routines, you can check
out &lt;a href=&quot;/docs/current/routines/introduction.html&quot;&gt;the introduction to SQL routines&lt;/a&gt;
in our documentation, as well as
&lt;a href=&quot;https://www.youtube.com/watch?v=1siAYR6BzzY&amp;amp;list=PLFnr63che7wYzZoo5yyEF5R1QrOH6VRq3&amp;amp;index=4&quot;&gt;a video from our SQL training series&lt;/a&gt;
and a few &lt;a href=&quot;/docs/current/routines/examples.html&quot;&gt;example routines&lt;/a&gt; which give a
good look at how they can be used.&lt;/p&gt;

&lt;h3 id=&quot;schema-evolution-and-dynamic-catalogs&quot;&gt;Schema evolution and dynamic catalogs&lt;/h3&gt;

&lt;p&gt;While we’re providing more power, customization, and flexibility to Trino users,
it’s also important to highlight just how much has been added this year to make
it easier to adjust things on the fly.&lt;/p&gt;

&lt;p&gt;Schema evolution in Hive was a big addition, allowing you to alter columns’ data
types, rename columns, and handle nested fields when dropping columns. Instead
of requiring you to modify the underlying database some other way and restart
Trino, Trino can handle the adjustments on the fly.&lt;/p&gt;

&lt;p&gt;But if you don’t use Hive and are feeling left out, we’ve experimentally taken
things one step further in 2023, adding dynamic catalogs to Trino. Rather than
adjusting your schema one column at a time, what about adding or dropping an
entire catalog in one go? You can do that now. Though it’s currently still
bleeding-edge and not ready for widespread use on your important production
data sources, we’re looking forward to improving it and making it resilient and
stable in 2024.&lt;/p&gt;
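
&lt;p&gt;As a rough sketch (the connector choice and connection values below are
hypothetical), a dynamic catalog can be added and removed at runtime with SQL
alone:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CREATE CATALOG example USING postgresql
WITH (
  &quot;connection-url&quot; = 'jdbc:postgresql://example.net:5432/database',
  &quot;connection-user&quot; = 'admin'
);

DROP CATALOG example;
&lt;/code&gt;&lt;/pre&gt;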

&lt;h3 id=&quot;project-hummingbird&quot;&gt;Project Hummingbird&lt;/h3&gt;

&lt;p&gt;Trino has always been about squeezing out every ounce of performance that you
can get. Check out our &lt;a href=&quot;/docs/current/release.html&quot;&gt;release notes&lt;/a&gt; and
you’ll see that every version includes at least a couple performance
improvements. Over time, these performance improvements add up to a substantial
gain, meaning that version-over-version, year-over-year, Trino is always getting
faster. Project Hummingbird was a concerted effort this year to take a look at
the core engine and make a number of architectural changes paired with small
improvements that would add up to something very substantial.
&lt;a href=&quot;https://github.com/trinodb/trino/issues/14237&quot;&gt;The GitHub issue tracking it&lt;/a&gt;
lists a ton of work that’s been accomplished already, with a lot of that work
done in 2023. Though stay tuned for more, because that’s only scratching the
surface…&lt;/p&gt;

&lt;h3 id=&quot;lakehouse-improvements&quot;&gt;Lakehouse improvements&lt;/h3&gt;

&lt;p&gt;Want to leverage the historical log of all actions taken on a table in Hudi? The
new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$timeline&lt;/code&gt; system table has you covered. How about in Delta Lake? We’ve got
the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;table_changes&lt;/code&gt; function for that, and views were added there, too. Too many
metadata tables to list were added to Iceberg, along with the REST, JDBC, and
Nessie catalogs for metadata.&lt;/p&gt;
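
&lt;p&gt;For instance (schema and table names here are hypothetical), the Hudi timeline
and the Delta Lake change feed can be queried roughly like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Hudi: inspect the commit timeline of a table
SELECT * FROM &quot;example_table$timeline&quot;;

-- Delta Lake: read the changes made to a table since version 0
SELECT * FROM TABLE(system.table_changes('example_schema', 'example_table', 0));
&lt;/code&gt;&lt;/pre&gt;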

&lt;h3 id=&quot;java-21&quot;&gt;Java 21!&lt;/h3&gt;

&lt;p&gt;Java 21. It’s required to run Trino versions 436 and later. With
&lt;a href=&quot;https://trino.io/blog/2023/11/03/java-21.html&quot;&gt;the upgrade from Java 17 to 21&lt;/a&gt;
comes a ton of improvements that will make development on Trino easier and
better than ever, which will in turn make it faster and smoother than ever.
Though not as big a deal as our upgrade to Java 17 last year, expect to see
the benefits coming down the pipeline as the engineers working on Trino are able
to take advantage of the latest and greatest features in Java.&lt;/p&gt;

&lt;h2 id=&quot;trino-ecosystem-updates&quot;&gt;Trino ecosystem updates&lt;/h2&gt;

&lt;p&gt;There’s more to Trino than Trino itself! With community updates and other
technologies integrating with Trino, the number of ways you can access and use
Trino is always growing. And the number of people taking care of Trino is
growing, too.&lt;/p&gt;

&lt;h3 id=&quot;python-clients&quot;&gt;Python clients&lt;/h3&gt;

&lt;p&gt;Trino’s own &lt;a href=&quot;https://github.com/trinodb/trino-python-client&quot;&gt;Python client&lt;/a&gt; saw
heavy development in 2023. It was updated to support SQLAlchemy 2.0 and had type
support fully fleshed out, making it a robust, free, and open-source tool for
running your Trino queries.&lt;/p&gt;

&lt;p&gt;Elsewhere in the Python ecosystem, we heard from
both &lt;a href=&quot;https://youtu.be/aKhI1Phfn-o&quot;&gt;Fugue&lt;/a&gt;
and &lt;a href=&quot;https://youtu.be/JMUtPl-cMRc&quot;&gt;Ibis&lt;/a&gt; at Trino Fest, two different Python
clients that integrate Trino with Python in new ways. Fugue is a wrapper that
helps integrate with other Python tools and clients, and Ibis can help convert
your Python code into SQL queries, making it feasible to be a 100% Python-based
organization that still leverages the speed and power of a SQL query engine like
Trino. We had Phillip Cloud from Voltron Data on
for &lt;a href=&quot;/episodes/49&quot;&gt;an episode of the Trino Community Broadcast&lt;/a&gt; to talk about
Ibis in even more detail.&lt;/p&gt;

&lt;h3 id=&quot;and-other-clients-too&quot;&gt;And other clients, too!&lt;/h3&gt;

&lt;p&gt;Also on the Trino Community Broadcast repping new client support for Trino in
2023 were &lt;a href=&quot;/episodes/45&quot;&gt;Dolphin Scheduler&lt;/a&gt;, &lt;a href=&quot;/episodes/51&quot;&gt;PopSQL&lt;/a&gt;,
and &lt;a href=&quot;/episodes/53&quot;&gt;Coginiti&lt;/a&gt;. Dolphin Scheduler is a workflow orchestrator - and
scheduler! - that can be used to routinely run and coordinate Trino queries.
PopSQL is like Google Drive for SQL, providing a suite of collaborative tools
for editing and working on queries as a team, including synchronous query
editing, storing query history, and a robust commenting and feedback system.
Coginiti is a high-powered data workspace that connects to Trino among many
other things, supporting a host of powerful features that make it easier to
reuse code and snippets of queries, as well as featuring embedded variables to
minimize redundancy. If you want to learn more about any of these clients, click
on the links above to check out the Trino Community Broadcast episodes where we went
in-depth with them!&lt;/p&gt;

&lt;p&gt;Oh, and don’t forget
the &lt;a href=&quot;https://regadas.dev/trino-js-client/&quot;&gt;Trino Typescript client&lt;/a&gt;, for when
you want to work at the beautiful intersection of web development and accessing
tons of data.&lt;/p&gt;

&lt;h3 id=&quot;new-maintainers&quot;&gt;New maintainers&lt;/h3&gt;

&lt;p&gt;Trino saw three new maintainers added to its ranks this year:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/mosabua&quot;&gt;Manfred Moser&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/pettyjamesm&quot;&gt;James Petty&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/wendigo&quot;&gt;Mateusz Gajewski&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred even took the liberty of updating the website’s
&lt;a href=&quot;/development/roles&quot;&gt;roles page&lt;/a&gt; to list out all our maintainers. Thank you to
them for their dedication to making Trino the best it can be, and
congratulations to them on their shiny maintainer titles!&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/blog/2023/01/10/trino-2022-the-rabbit-reflects.html&quot;&gt;2022 had been the busiest year in Trino’s history&lt;/a&gt;,
but 2023 has managed to surpass it. If you’re interested in contributing to
Trino, make sure to check it out on &lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;GitHub&lt;/a&gt;.
Even if you’re not interested in contributing, give us a
&lt;a href=&quot;https://trino.io/star&quot;&gt;star&lt;/a&gt; on GitHub, anyway! It’s been a great year for
Commander Bun Bun, and we can’t wait to show you what 2024 has in store for
everyone’s favorite data rabbit.&lt;/p&gt;</content>

      
        <author>
          <name>Cole Bowden</name>
        </author>
      

      <summary>If “Wrapped” is good enough for Spotify, it’s good enough for Trino, right? As we look forward to a bright 2024, we can also take a moment to get sentimental, look back at everything we’ve accomplished, and reflect on the progress we’ve made. Commander Bun Bun has been hard at work, so if you haven’t been paying close attention to Trino or want an idea of all that went down in 2023, we’re happy to present you with an end of year recap. We’ll be exploring what’s gone on in the community, on development, the events we’ve hosted, and discuss the cool new features and technologies you can use when you’re running Trino.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/2023-review/wrapped.png" />
      
    </entry>
  
    <entry>
      <title>55: Commander Bun Bun peeks at Peaka</title>
      <link href="https://trino.io/episodes/55.html" rel="alternate" type="text/html" title="55: Commander Bun Bun peeks at Peaka" />
      <published>2024-01-18T00:00:00+00:00</published>
      <updated>2024-01-18T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/55</id>
      <content type="html" xml:base="https://trino.io/episodes/55.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Manfred Moser, Director of Technical Content at
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://linkedin.com/in/sakalsiz&quot;&gt;Mustafa Sakalsiz&lt;/a&gt;, CEO at
&lt;a href=&quot;https://www.peaka.com/&quot;&gt;Peaka&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/alitekin/&quot;&gt;Ali Tekin&lt;/a&gt;, Principal Software
Architect at &lt;a href=&quot;https://www.peaka.com/&quot;&gt;Peaka&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-437-438&quot;&gt;Releases 437-438&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-437.html&quot;&gt;Trino 437&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for configuring compression codecs&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;char&lt;/code&gt; values in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;to_utf8()&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lpad()&lt;/code&gt; functions&lt;/li&gt;
  &lt;li&gt;Improved performance for Delta Lake queries without table statistics&lt;/li&gt;
  &lt;li&gt;Improved performance for Iceberg queries with filters on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROW&lt;/code&gt; columns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-438.html&quot;&gt;Trino 438&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for access control with &lt;a href=&quot;https://trino.io/blog/2024/02/06/opa-arrived&quot;&gt;Open Policy Agent&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER COLUMN ... DROP NOT NULL&lt;/code&gt; in Iceberg and PostgreSQL&lt;/li&gt;
  &lt;li&gt;Support for configuring page sizes in Delta Lake, Hive, and Iceberg&lt;/li&gt;
  &lt;li&gt;Better type support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;reduce_agg()&lt;/code&gt; function&lt;/li&gt;
&lt;/ul&gt;
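
&lt;p&gt;As a minimal sketch of the new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER COLUMN ... DROP NOT NULL&lt;/code&gt; support, assuming a hypothetical Iceberg catalog, schema, and table:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- allow NULL values in a previously NOT NULL column
ALTER TABLE iceberg.example_schema.orders
ALTER COLUMN comment DROP NOT NULL;
&lt;/code&gt;&lt;/pre&gt;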

&lt;p&gt;And over in the land of the Trino Gateway…&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino-gateway/blob/main/docs/release-notes.md#trino-gateway-5-24-jan-2024&quot;&gt;Trino Gateway version 5&lt;/a&gt;
released!&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-episode-peaka&quot;&gt;Concept of the episode: Peaka&lt;/h2&gt;

&lt;p&gt;Another Trino Community Broadcast episode means another cool piece of technology
that uses Trino for us to show off to the community. This time it’s Peaka,
a no-code approach to data warehousing that makes it easier than ever to set up
your data stack without needing a ton of complex engineering.&lt;/p&gt;

&lt;p&gt;In &lt;a href=&quot;https://www.peaka.com/docs/getting-started/what-is-peaka/&quot;&gt;their own words&lt;/a&gt;,
Peaka is a platform that merges disparate data sources into a single data layer,
letting you join and blend them, query them using SQL or natural language, and 
expose your data to outside users through APIs. Sounds a bit like Trino, right?
That’s because underneath the hood, Trino is a key part of how they’re making it
happen. In this episode, we talk to the team at Peaka about where they got
started, how they’re making it easier than ever to leverage the federation that
Trino is capable of, and the work they’ve done on top to integrate their
platform with every SaaS data source under the sun.&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-episode-using-peaka&quot;&gt;Demo of the episode: Using Peaka!&lt;/h2&gt;

&lt;p&gt;If you want to see what the platform is like, then look no further. We’ll be
exploring:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Connecting to data sources&lt;/li&gt;
  &lt;li&gt;Filtering and combining data&lt;/li&gt;
  &lt;li&gt;Editing and running queries, including their visual query editor&lt;/li&gt;
  &lt;li&gt;Natural language queries&lt;/li&gt;
  &lt;li&gt;Visualizing data&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;pr-of-the-episode-18719-filesystem-caching-with-alluxio&quot;&gt;PR of the episode: #18719: Filesystem caching with Alluxio&lt;/h2&gt;

&lt;p&gt;Perhaps it’s a little easier to link to the issue for tracking
&lt;a href=&quot;https://github.com/trinodb/trino/issues/20550&quot;&gt;the rollout&lt;/a&gt;, but however you
want to present it, file system caching is back in Trino! Caching is a huge performance win
for a wide variety of use cases, allowing the engine to run faster, better, and
pump out query results at an unparalleled pace. This is going to lead to 
performance improvements for Trino queries using the supported object storage 
connectors, and you’ll hear more from us about it once it’s officially launched.
The best part is that there’s even more coming down the line as support for it
is expanded.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>54: Trino 2023 wrapped</title>
      <link href="https://trino.io/episodes/54.html" rel="alternate" type="text/html" title="54: Trino 2023 wrapped" />
      <published>2024-01-18T00:00:00+00:00</published>
      <updated>2024-01-18T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/54</id>
      <content type="html" xml:base="https://trino.io/episodes/54.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director of
Technical Content at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;,
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/mtraverso&quot;&gt;Martin Traverso&lt;/a&gt;, Trino co-creator and CTO at
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-434-436&quot;&gt;Releases 434-436&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-434.html&quot;&gt;Trino 434&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FILTER&lt;/code&gt; clause to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LISTAGG&lt;/code&gt; function&lt;/li&gt;
  &lt;li&gt;Support reading &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;json&lt;/code&gt; columns and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt; statements in BigQuery connector&lt;/li&gt;
&lt;/ul&gt;
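
&lt;p&gt;As a minimal sketch of the new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FILTER&lt;/code&gt; clause on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LISTAGG&lt;/code&gt;, assuming the TPC-H &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny&lt;/code&gt; schema is available:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- concatenate region names, skipping any that start with 'A'
SELECT listagg(name, ', ') WITHIN GROUP (ORDER BY name)
         FILTER (WHERE name NOT LIKE 'A%')
FROM tpch.tiny.region;
&lt;/code&gt;&lt;/pre&gt;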

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-435.html&quot;&gt;Trino 435&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JSON_TABLE&lt;/code&gt; function&lt;/li&gt;
  &lt;li&gt;Improve reliability when reading from GCS&lt;/li&gt;
  &lt;li&gt;Improve query planning performance on Delta Lake tables&lt;/li&gt;
  &lt;li&gt;Improve reliability and memory usage for inserts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-436.html&quot;&gt;Trino 436&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for Elasticsearch 8&lt;/li&gt;
  &lt;li&gt;New OpenSearch connector&lt;/li&gt;
  &lt;li&gt;Faster selective joins on partition columns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Additional comments:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Disallow invalid configuration options with Delta Lake and Iceberg connector in 434&lt;/li&gt;
  &lt;li&gt;Separate metadata caching in numerous connectors&lt;/li&gt;
  &lt;li&gt;Various improvements for schema evolution in Hive connector&lt;/li&gt;
  &lt;li&gt;Require JDK 21.0.1 to run Trino with 436&lt;/li&gt;
  &lt;li&gt;Remove support for Elasticsearch 6 in 436&lt;/li&gt;
  &lt;li&gt;Fix minor issues for SQL routine and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JSON_TABLE&lt;/code&gt; function users&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;recap-of-trino-in-2023&quot;&gt;Recap of Trino in 2023&lt;/h2&gt;

&lt;p&gt;We chat about all the developments in the Trino project and the Trino community
from 2023, including the following topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Various statistics about the project&lt;/li&gt;
  &lt;li&gt;Features and releases&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/06/20/trino-fest-2023-recap.html&quot;&gt;Trino Fest&lt;/a&gt;, &lt;a href=&quot;/blog/2023/12/18/trino-summit-recap.html&quot;&gt;Trino
Summit&lt;/a&gt;, and other events&lt;/li&gt;
  &lt;li&gt;New Trino maintainers&lt;/li&gt;
  &lt;li&gt;Polish and Chinese editions of definitive guide published&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Find more details and other topics in our &lt;a href=&quot;/blog/2024/01/19/trino-2023-wrapped.html&quot;&gt;blog post &lt;strong&gt;Trino 2023 wrapped&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Upcoming events in NYC and Vienna, details available in the &lt;a href=&quot;https://trino.io/community.html#events&quot;&gt;events
calendar&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Trino Contributor Congregation coming soon&lt;/li&gt;
  &lt;li&gt;Trino Gateway developer sync every two weeks; ping Manfred for an invite&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from O’Reilly.
You can download &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or &lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;buy the book
online&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino Summit 2023 recap</title>
      <link href="https://trino.io/blog/2023/12/18/trino-summit-recap.html" rel="alternate" type="text/html" title="Trino Summit 2023 recap" />
      <published>2023-12-18T00:00:00+00:00</published>
      <updated>2023-12-18T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/12/18/trino-summit-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2023/12/18/trino-summit-recap.html">&lt;p&gt;Two days of non-stop Trino action are done! Last week, Trino Summit 2023
took place virtually as another great community event. Presentations from Trino
experts across the globe showcased different use cases and experiences with Trino.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;During the event, our lively audience of over 600 attendees asked questions of
the speakers and each other in the chat, and we had fun with Trino trivia questions.&lt;/p&gt;

&lt;p&gt;We talked about the &lt;a href=&quot;/blog/2023/11/09/routines.html&quot;&gt;SQL routine competition&lt;/a&gt; and announced Kevin Liu from Stripe and Jan Was from Starburst as the
winners. You can find their submissions in &lt;a href=&quot;https://trino.io/docs/current/routines/examples.html&quot; target=&quot;_blank&quot;&gt;the examples page for SQL
routines&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Starburst announced their &lt;a href=&quot;https://www.starburst.io/community/trino-champions/&quot; target=&quot;_blank&quot;&gt;Trino Champions
program&lt;/a&gt;.
Kevin and Jan are the first recipients of the award and will receive their swag
packs soon. Going forward, new champions will be crowned regularly, and
Starburst is &lt;a href=&quot;https://www.starburst.io/community/trino-champions/&quot; target=&quot;_blank&quot;&gt;looking for
nominations&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;sessions&quot;&gt;Sessions&lt;/h2&gt;

&lt;p&gt;If you missed out on the event, the following list of all the sessions provides
links to the recordings. Over time, we will follow up with blog posts about each
session with the presentation and further details.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=pXdZqpwgdxA&quot; target=&quot;_blank&quot;&gt;The mountains Trino climbed in 2023&lt;/a&gt;
presented by Martin Traverso from
&lt;a href=&quot;https://www.starburst.io&quot; target=&quot;_blank&quot;&gt;Starburst&lt;/a&gt;.
&lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2023/mountains-trino-climbed.pdf&quot;&gt;(Slides)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=qZejzyxT2fo&quot; target=&quot;_blank&quot;&gt;Trino workload management&lt;/a&gt;
presented by Jinyang Li and Tingting Ma from
&lt;a href=&quot;https://www.airbnb.com&quot; target=&quot;_blank&quot;&gt;Airbnb&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=FaytoXxKXOQ&quot; target=&quot;_blank&quot;&gt;Secure exchange SQL: Building a privacy-preserving data clean room service over Trino&lt;/a&gt;
presented by Taro Saito from
&lt;a href=&quot;https://www.treasuredata.com/&quot; target=&quot;_blank&quot;&gt;Treasure Data&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=MYLepz-hIys&quot; target=&quot;_blank&quot;&gt;Powering Bazaar’s business operation using Trino&lt;/a&gt;
presented by Umair Abro from
&lt;a href=&quot;https://www.youtube.com/watch?v=MYLepz-hIys&quot; target=&quot;_blank&quot;&gt;Bazaar&lt;/a&gt;.
&lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2023/powering-bazaar-business-operations.pdf&quot;&gt;(Slides)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=qUT-uaEE-Fk&quot; target=&quot;_blank&quot;&gt;Efficient Kappa architecture with Trino&lt;/a&gt;
presented by Sanghyun Lee from
&lt;a href=&quot;https://www.sktelecom.com&quot; target=&quot;_blank&quot;&gt;SK Telecom&lt;/a&gt;.
&lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2023/efficient-kappa-architecture-sk-telecom.pdf&quot;&gt;(Slides)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=2qwBcKmQSn0&quot; target=&quot;_blank&quot;&gt;Many clusters and only one gateway&lt;/a&gt;
presented by Will Morrison (&lt;a href=&quot;https://www.starburst.io/&quot; target=&quot;_blank&quot;&gt;Starburst&lt;/a&gt;),
Andy Su (&lt;a href=&quot;https://www.techatbloomberg.com/&quot; target=&quot;_blank&quot;&gt;Bloomberg&lt;/a&gt;), and
Jaeho Yoo (&lt;a href=&quot;https://www.naver.com&quot; target=&quot;_blank&quot;&gt;Naver&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=dg16M6bFN2w&quot; target=&quot;_blank&quot;&gt;Trino upgrade at exabytes scale&lt;/a&gt;
presented by Ramanathan Ramu from
&lt;a href=&quot;https://www.linkedin.com/&quot; target=&quot;_blank&quot;&gt;LinkedIn&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=ooUGJ6BYt90&quot; target=&quot;_blank&quot;&gt;Powering data marts through Trino Iceberg connector at Zomato&lt;/a&gt;
presented by Shubham Gupta and Bhanu Mittal from
&lt;a href=&quot;https://www.zomato.com/&quot; target=&quot;_blank&quot;&gt;Zomato&lt;/a&gt;.
&lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2023/powering-data-marts-at-zomato.pdf&quot;&gt;(Slides)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=RC8K6pIvAtI&quot; target=&quot;_blank&quot;&gt;Pinterest journey to achieving 2x efficiency improvement on Trino&lt;/a&gt;
presented by Carlos Benavides from
&lt;a href=&quot;https://www.pinterest.com/&quot; target=&quot;_blank&quot;&gt;Pinterest&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=6KspMwCbOfI&quot; target=&quot;_blank&quot;&gt;Avoiding pitfalls with query federation in data lakehouses&lt;/a&gt;
presented by Edward Morgan and
Bhaarat Sharma from &lt;a href=&quot;https://teamraft.com/&quot; target=&quot;_blank&quot;&gt;Raft&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=rmotnvBWXv4&quot; target=&quot;_blank&quot;&gt;Adopting Trino’s fault-tolerant execution mode at Quora&lt;/a&gt;
presented by Gabriel Fernandes de Oliveira and Yifan Pan from
&lt;a href=&quot;https://www.quora.com/&quot; target=&quot;_blank&quot;&gt;Quora&lt;/a&gt;.
&lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2023/fte-mode-at-quora.pdf&quot;&gt;(Slides)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=fYCoI8kkdRQ&quot; target=&quot;_blank&quot;&gt;Inherent race condition in Guava Cache invalidation and how to escape it&lt;/a&gt;
presented by Piotr Findeisen from
&lt;a href=&quot;https://www.starburst.io/&quot; target=&quot;_blank&quot;&gt;Starburst&lt;/a&gt;.
&lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2023/inherent-race-in-cache-invalidation.pdf&quot;&gt;(Slides)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=LynEiteEtPk&quot; target=&quot;_blank&quot;&gt;Unstructured data analysis using polymorphic table function in Trino&lt;/a&gt;
presented by YongHwan Lee from
&lt;a href=&quot;https://www.sktelecom.com&quot; target=&quot;_blank&quot;&gt;SK Telecom&lt;/a&gt;.
&lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2023/polymorphic-table-function-sk-telecom.pdf&quot;&gt;(Slides)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=_wocf0NK6Kc&quot; target=&quot;_blank&quot;&gt;Transitioning to Trino: Evaluating Lyft’s query engine capabilities&lt;/a&gt;
presented by Charles Song from
&lt;a href=&quot;https://www.lyft.com/&quot; target=&quot;_blank&quot;&gt;Lyft&lt;/a&gt;.
&lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2023/transition-to-trino-at-lyft.pdf&quot;&gt;(Slides)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=idk0GMxs8vE&quot; target=&quot;_blank&quot;&gt;Visualizing Trino with Apache Superset&lt;/a&gt;
presented by Evan Rusackas from
&lt;a href=&quot;https://preset.io/&quot; target=&quot;_blank&quot;&gt;Preset&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=fbqqapQbAv0&quot; target=&quot;_blank&quot;&gt;Trino OPA authorizer - An open source love story&lt;/a&gt;
presented by Sönke Liebau (&lt;a href=&quot;https://stackable.tech/&quot; target=&quot;_blank&quot;&gt;Stackable&lt;/a&gt;)
and Pablo Arteaga (&lt;a href=&quot;https://www.techatbloomberg.com/&quot; target=&quot;_blank&quot;&gt;Bloomberg&lt;/a&gt;).
&lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2023/opa-trino.pdf&quot;&gt;(Slides)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=RutbCY8i22Q&quot; target=&quot;_blank&quot;&gt;VAST database catalog&lt;/a&gt;
presented by Jason Russler from
&lt;a href=&quot;https://vastdata.com/&quot; target=&quot;_blank&quot;&gt;VAST&lt;/a&gt;.
&lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2023/vast-connector.pdf&quot;&gt;(Slides)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=ZJExdGeC4eA&quot; target=&quot;_blank&quot;&gt;Support for Parquet decryption and aggregate pushdown In Trino&lt;/a&gt;
presented by Amogh Margoor and Manish Malhotra from
&lt;a href=&quot;https://www.apple.com/&quot; target=&quot;_blank&quot;&gt;Apple&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;shout-outs&quot;&gt;Shout outs&lt;/h2&gt;

&lt;p&gt;Shout outs go to Anna Schibli, Mandy Darnell, and Monica Miller from the
Trino Summit event team for all their work with the speakers and organizing the
event, and to everyone else at Starburst who helped make this event a success.&lt;/p&gt;

&lt;p&gt;Special thanks for making this Trino Software Foundation event a reality go out
to our hosting sponsor &lt;a href=&quot;https://starburst.io&quot; target=&quot;_blank&quot;&gt;Starburst&lt;/a&gt;, and
our other sponsors &lt;a href=&quot;https://www.alluxio.io/&quot; target=&quot;_blank&quot;&gt;Alluxio&lt;/a&gt;,
&lt;a href=&quot;https://www.coginiti.co&quot; target=&quot;_blank&quot;&gt;Coginiti&lt;/a&gt; and &lt;a href=&quot;https://www.montecarlodata.com/&quot; target=&quot;_blank&quot;&gt;Monte
Carlo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We will see you all at future Trino Contributor Congregations, Trino Fest 2024,
Trino Summit 2024, and &lt;a href=&quot;https://trino.io/community.html#events&quot;&gt;other events related to Trino&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;sponsors&quot;&gt;Sponsors&lt;/h2&gt;

&lt;div class=&quot;container&quot;&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.starburst.io/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/starburst.png&quot; title=&quot;Starburst, event host and organizer&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.alluxio.io/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/alluxio-small.png&quot; title=&quot;Alluxio, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.coginiti.co&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/coginiti-small.png&quot; title=&quot;Coginiti, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.montecarlodata.com/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/monte-carlo-small.png&quot; title=&quot;Monte Carlo, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;</content>

      
        <author>
          <name>Manfred Moser, Cole Bowden</name>
        </author>
      

      <summary>Two days of non-stop Trino action are done! Last week, Trino Summit 2023 took place virtually as another great community event. Presentations from Trino experts across the globe showcased different use cases and experiences with Trino.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2023/summit-logo.png" />
      
    </entry>
  
    <entry>
      <title>Final reminder for Trino Summit 2023</title>
      <link href="https://trino.io/blog/2023/12/11/trino-summit-reminder.html" rel="alternate" type="text/html" title="Final reminder for Trino Summit 2023" />
      <published>2023-12-11T00:00:00+00:00</published>
      <updated>2023-12-11T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/12/11/trino-summit-reminder</id>
      <content type="html" xml:base="https://trino.io/blog/2023/12/11/trino-summit-reminder.html">&lt;p&gt;Are you ready? &lt;a href=&quot;https://www.starburst.io/info/trinosummit2023/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=NORAM-FY24-Q4-EV-Trino-Summit-2023&amp;amp;utm_content=final-reg-blog&quot;&gt;Trino Summit
2023&lt;/a&gt;
is just two days away, and our lineup of speakers, sponsors, and activities is
truly amazing. Make sure to register and join us live.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Over the two days of the event we will enjoy sessions with our speakers from
numerous well-known and respected companies, including Airbnb, Apple, Bloomberg,
LinkedIn, Pinterest, SK Telecom, and others. Look at the &lt;a href=&quot;https://www.starburst.io/info/trinosummit2023/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=NORAM-FY24-Q4-EV-Trino-Summit-2023&amp;amp;utm_content=final-reg-blog&quot;&gt;full lineup for
details&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Just like &lt;a href=&quot;/blog/2023/06/20/trino-fest-2023-recap.html&quot;&gt;last time at Trino Fest 2023&lt;/a&gt;, we will have some fun Trino quiz
questions for you all to puzzle over, and are ready to reward your fast and
correct answers.&lt;/p&gt;

&lt;p&gt;Cole Bowden and I will guide you through the two days of the event as hosts. The
chat on the event platform as well as the Trino slack channel for the event will
allow you to talk to other community members and the presenters, ask questions,
and follow up for more answers and discussions.&lt;/p&gt;

&lt;p&gt;We will announce the winning entries for our SQL routine competition and look a
bit at the implementations. And if you are keen to write one, there is still
time to share your best SQL routine. You might be among the winners.&lt;/p&gt;

&lt;p&gt;So you see - Trino Summit 2023 will be great. The event is virtual and free, so
there really is no excuse for missing out:&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.starburst.io/info/trinosummit2023/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=NORAM-FY24-Q4-EV-Trino-Summit-2023&amp;amp;utm_content=final-reg-blog&quot;&gt;
        Register for Trino Summit 2023
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;Special thanks for their help with making this Trino Software Foundation event a
reality go out to our hosting sponsor &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;, and our
other sponsors &lt;a href=&quot;https://www.alluxio.io/&quot;&gt;Alluxio&lt;/a&gt;,
&lt;a href=&quot;https://www.coginiti.co&quot;&gt;Coginiti&lt;/a&gt; and &lt;a href=&quot;https://www.montecarlodata.com/&quot;&gt;Monte
Carlo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We all look forward to seeing you in just two days. So exciting!&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>Are you ready? Trino Summit 2023 is just two days away, and our lineup of speakers, sponsors, and activities is truly amazing. Make sure to register and join us live.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2023/summit-logo.png" />
      
    </entry>
  
    <entry>
      <title>Functions with SQL and Trino</title>
      <link href="https://trino.io/blog/2023/11/29/sql-training-4.html" rel="alternate" type="text/html" title="Functions with SQL and Trino" />
      <published>2023-11-29T00:00:00+00:00</published>
      <updated>2023-11-29T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/11/29/sql-training-4</id>
      <content type="html" xml:base="https://trino.io/blog/2023/11/29/sql-training-4.html">&lt;p&gt;In the fourth part of our training series &lt;a href=&quot;/blog/2023/09/27/training-series.html&quot;&gt;Learning SQL with Trino from the
experts&lt;/a&gt; Martin Traverso, Dain
Sundstrom and I took on the big topic of aggregation functions, and covered the
two new and exciting features of table functions and SQL routines.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;The recording of the event allows you to watch it all as if you attended live,
jump to specific sections as desired, or pause while you follow along with the
demos:&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/1siAYR6BzzY&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;Following are a couple of specific timestamps for interesting topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=1siAYR6BzzY&amp;amp;t=582&quot;&gt;First simple aggregation example&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=1siAYR6BzzY&amp;amp;t=2384&quot;&gt;Table functions introduction&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=1siAYR6BzzY&amp;amp;t=3093&quot;&gt;Query pass through table function&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=1siAYR6BzzY&amp;amp;t=3442&quot;&gt;SQL routine use cases&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=1siAYR6BzzY&amp;amp;t=4355&quot;&gt;Human readable days example&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More timestamps for every part of the talk are in the description on
YouTube. Also make sure you take advantage of these additional resources:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/assets/blog/sql-training-series-starburst-2023.pdf&quot;&gt;General overview slide deck for the
series&lt;/a&gt;,
with links to resources like our &lt;a href=&quot;/slack.html&quot;&gt;community
chat&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Slide deck for &lt;a href=&quot;https://trinodb.github.io/presentations/presentations/sql-functions/index.html&quot;&gt;Functions with SQL and
Trino&lt;/a&gt;,
including files with all SQL statements, configurations and more ready to go&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/trino-the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With this last episode of the series for 2023 we are ready to showcase Trino
with an &lt;a href=&quot;/blog/2023/11/22/trino-summit-2023-nears-lineup.html&quot;&gt;amazing lineup of speakers and sessions&lt;/a&gt; at the upcoming Trino Summit 2023.
Register now and catch all the presenters live for questions in the chat:&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.starburst.io/info/trinosummit2023/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=NORAM-FY24-Q4-EV-Trino-Summit-2023&amp;amp;utm_content=blog-lineup-announcement&quot;&gt;
        Register for Trino Summit 2023
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;See you at Trino Summit 2023, upcoming &lt;a href=&quot;/broadcast/index.html&quot;&gt;Trino Community Broadcast
episodes&lt;/a&gt;, and maybe even more SQL
training in 2024.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>In the fourth part of our training series Learning SQL with Trino from the experts Martin Traverso, Dain Sundstrom and I took on the big topic of aggregation functions, and covered the two new and exciting features of table functions and SQL routines.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/trino-sql.png" />
      
    </entry>
  
    <entry>
      <title>Trino Summit 2023 nears with an awesome lineup</title>
      <link href="https://trino.io/blog/2023/11/22/trino-summit-2023-nears-lineup.html" rel="alternate" type="text/html" title="Trino Summit 2023 nears with an awesome lineup" />
      <published>2023-11-22T00:00:00+00:00</published>
      <updated>2023-11-22T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/11/22/trino-summit-2023-nears-lineup</id>
      <content type="html" xml:base="https://trino.io/blog/2023/11/22/trino-summit-2023-nears-lineup.html">&lt;p&gt;As winter nears, the days may be getting shorter, but so is the wait until
Trino Summit 2023! It’ll be here before you know it on December 13th and 14th.
We’ve got a packed speaker lineup full of exciting talks, and we’re ready to
share some details with the Trino community today. Read on for a preview of some
talks, and if you’re interested in attending, make sure to…&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.starburst.io/info/trinosummit2023/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=NORAM-FY24-Q4-EV-Trino-Summit-2023&amp;amp;utm_content=blog-lineup-announcement&quot;&gt;
        Register!
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;!--more--&gt;

&lt;p&gt;So, who’s going to be talking at Trino Summit? Here’s a quick rundown of the
talks coming in from various companies.&lt;/p&gt;

&lt;h2 id=&quot;starburst-the-mountains-trino-climbed-in-2023&quot;&gt;Starburst: The mountains Trino climbed in 2023&lt;/h2&gt;

&lt;p&gt;As always, our keynote will come from Martin Traverso, Trino co-founder and
co-CTO at Starburst. He’ll be giving a project update on everything exciting
that’s happened in Trino since
&lt;a href=&quot;/blog/2023/06/20/trino-fest-2023-recap.html&quot;&gt;Trino Fest&lt;/a&gt;, as well as a
sneak peek at the roadmap for features coming to Trino in 2024. It’s one of the
best ways to keep up with the ongoing developments in the Trino community, and
you won’t want to miss it.&lt;/p&gt;

&lt;h2 id=&quot;starburst-bloomberg-and-naver-many-clusters-and-only-one-gateway&quot;&gt;Starburst, Bloomberg, and Naver: Many clusters and only one gateway&lt;/h2&gt;

&lt;p&gt;A second talk, which is a collaboration among Starburst, Bloomberg, and Naver,
will be exploring the new &lt;a href=&quot;https://github.com/trinodb/trino-gateway&quot;&gt;Trino Gateway&lt;/a&gt;,
a proxy and load-balancer that has been in the works for a long while in the
Trino community. There’s no more need to worry about noisy neighbors or huge
queries bullying out the quick and small workloads - with multiple clusters and
the Trino Gateway on top, users interact with Trino like normal, but under the
hood, queries get routed to available clusters to ensure that the time it takes
to get your insights is shorter than ever before.&lt;/p&gt;

&lt;h2 id=&quot;airbnb-trino-workload-management&quot;&gt;Airbnb: Trino workload management&lt;/h2&gt;

&lt;p&gt;Trino is the main interactive compute engine for offline ad-hoc analytics at
Airbnb. Recently, they’ve redesigned their query workload processing on Trino
clusters, introducing query cost forecasting and workload-aware scheduling
systems. This helps them deliver a more stable and consistent analytics query
service to offline data users at Airbnb, with improved performance and speed.
And they’ll be explaining how they did it!&lt;/p&gt;

&lt;h2 id=&quot;pinterest-journey-to-achieving-2x-efficiency-improvement-on-trino&quot;&gt;Pinterest: Journey to achieving 2x efficiency improvement on Trino&lt;/h2&gt;

&lt;p&gt;Trino usage has been growing at Pinterest each year, which comes with growing
costs and increased demand on the existing Trino clusters. To help reduce costs
and serve their Trino users, the engineering team there has migrated to AWS
Graviton, taken advantage of Trino improvements, consolidated traffic, improved
job scheduling, and worked to optimize their data and metadata formats. The end
result has been a reduction in cost &lt;em&gt;and&lt;/em&gt; an increase in query throughput.
They’ll be sharing the details on the effort it took to make Trino faster and
cheaper at the same time.&lt;/p&gt;

&lt;h2 id=&quot;quora-adopting-trinos-fault-tolerant-execution-mode&quot;&gt;Quora: Adopting Trino’s fault-tolerant execution mode&lt;/h2&gt;

&lt;p&gt;Quora will be covering how they adopted Trino’s fault-tolerant execution mode
to run some of their heaviest ETL jobs. They separate Trino queries
from their main data pipelines in two clusters, one running the FTE mode for
memory-intensive and longer jobs and another without it for lighter, general
pipelines. This separation helped achieve better query failure rates, improved
the execution time of long queries due to the more flexible autoscaling in
FTE, and provided an alternative to run queries that would otherwise run out of
memory without scaling up the cluster.&lt;/p&gt;

&lt;h2 id=&quot;linkedin-trino-upgrades-at-exabyte-scale&quot;&gt;LinkedIn: Trino upgrades at exabyte scale&lt;/h2&gt;

&lt;p&gt;LinkedIn has been keeping up with Trino releases at an impressive rate, but
getting to that point has required a lot of time, effort, and work on
streamlining the update process. They’ll be discussing the challenges of
breaking changes, applying internal patches, and ensuring that there are no
meaningful performance regressions. They’ve automated much of this, including
implementing a post-commit integration test suite that ensures nothing has
broken, and creating an automated test framework that can validate the
performance of each new Trino release before it deploys to users.&lt;/p&gt;

&lt;h2 id=&quot;ea-migrating-120-million-hms-metadata-records-without-customer-impact&quot;&gt;EA: Migrating 120 million HMS metadata records without customer impact&lt;/h2&gt;

&lt;p&gt;Migrating production databases is a scary task no matter who you are. It’s
scarier when you’re talking about 600+ databases, 35,000+ tables, and over 120
million partitions, all of which you need to migrate while avoiding any customer
impact. EA managed to pull it off with the help of Trino, and they’ll be at
Trino Summit to share how they made it work and what they learned along the way.&lt;/p&gt;

&lt;h2 id=&quot;sk-telecom-efficient-kappa-architecture-with-trino&quot;&gt;SK Telecom: Efficient Kappa architecture with Trino&lt;/h2&gt;

&lt;p&gt;SK Telecom is bringing us two talks this year, as they’ve got a lot going on and
some unique Trino stories to share!&lt;/p&gt;

&lt;p&gt;The first talk will dive into Kappa architecture and the challenges
involved in getting it to run in real-time at the massive scale SK Telecom
needs. They started with Trino’s Kafka connector, but the limitations of that
architecture steered them towards a solution with Flink and Trino’s Iceberg
connector, which they’ll explain. They’ll also be sharing some tips and tricks
for tuning Flink and Iceberg to get the most out of your Trino deployments.&lt;/p&gt;

&lt;h2 id=&quot;sk-telecom-unstructured-data-analysis-using-polymorphic-table-functions-in-trino&quot;&gt;SK Telecom: Unstructured data analysis using polymorphic table functions in Trino&lt;/h2&gt;

&lt;p&gt;The second talk will discuss the challenges of dealing with unstructured data.
Pre-processing is essential for analyzing unstructured data, and it’s difficult
for ordinary users and analysts to distribute that pre-processing over large
amounts of unstructured data. With the power of a custom-built polymorphic table function,
they were able to invoke Python code within Trino to help structure that data
for analysis, solving the problem in a powerful and fascinating way. We’ll get
to hear about polymorphic table functions, how they work in Trino, and how
anyone else may be able to leverage them to solve problems.&lt;/p&gt;

&lt;h2 id=&quot;raft-avoiding-pitfalls-with-query-federation-in-data-lakehouses&quot;&gt;Raft: Avoiding pitfalls with query federation in data lakehouses&lt;/h2&gt;

&lt;p&gt;Raft has partnered with the US Department of Defense to build a data fabric that
is built on top of Delta Lake, Trino, Apache Kafka, and Open Policy Agent (OPA).
This talk will discuss the challenges involved, provide solutions and
considerations for each, and end with a demo of Raft’s data fabric. The talk
will focus on a plugin for Trino, developed by Raft, that uses OPA as a policy
engine to provide fine-grained access control at query time based on a user’s
JWT passed along with the query.&lt;/p&gt;

&lt;h2 id=&quot;treasure-data-secure-exchange-sql&quot;&gt;Treasure Data: Secure exchange SQL&lt;/h2&gt;

&lt;p&gt;Secure Exchange SQL is a production data clean room service deployed at Treasure
Data, which leverages Trino and differential privacy technology to enable
cross-company data analysis while mitigating the risk of privacy breaches.
In their session, they’ll introduce the concept of differential privacy and
discuss the privacy protection methods that need to be implemented during SQL
processing. To minimize changes to Trino’s codebase, they employed approaches of
SQL rewriting and validation at the logical plan level. They’ll explain these
methods and provide some practical use cases of their data clean room.&lt;/p&gt;

&lt;h2 id=&quot;zomato-powering-data-marts-through-the-trino-iceberg-connector&quot;&gt;Zomato: Powering data marts through the Trino Iceberg connector&lt;/h2&gt;

&lt;p&gt;It’s a common theme in the Trino community - Zomato recently migrated from a
traditional data warehouse to a Trino-powered data lakehouse in conjunction with
Iceberg. They’ll be discussing how this has enabled their analytics to run
better than ever, including periodic updates to their data marts and tackling
the challenges involved in maintaining Iceberg tables.&lt;/p&gt;

&lt;h2 id=&quot;bazaar-powering-bazaars-business-operations-using-trino&quot;&gt;Bazaar: Powering Bazaar’s business operations using Trino&lt;/h2&gt;

&lt;p&gt;Bazaar’s talk will discuss how they leverage Trino’s capabilities to optimize
data analysis and support data-driven decision-making. The talk specifically
explores real-time data querying across multiple sources and
performance optimization, illustrating Trino’s role in Bazaar’s data-centric
strategies. This presentation provides in-depth insights for individuals
well-versed in Trino, shedding light on the platform’s transformative impact on
enhancing e-commerce operations.&lt;/p&gt;

&lt;h2 id=&quot;preset-visualizing-trino-with-superset&quot;&gt;Preset: Visualizing Trino with Superset&lt;/h2&gt;

&lt;p&gt;Preset will be diving into the “last mile” of the modern data stack and
show you how to query and visualize data pulled from Trino with Apache Superset
and/or Preset. Specifically, they’ll discuss things like Trino’s federated query
support (a common wish for Superset users) and how Superset can support
near-real-time analytics for Trino users. They’ll also give a demo of connecting
to Trino, building SQL queries, designing charts and dashboards, and other ways
to gain insight and stay on top of your data.&lt;/p&gt;

&lt;h2 id=&quot;vast-the-vast-database-catalog&quot;&gt;VAST: The VAST database catalog&lt;/h2&gt;

&lt;p&gt;The VAST Database connector for Trino was open-sourced this year! They’ll be
discussing the architecture of VAST and the connector, the purpose and major use
cases for it, and demonstrating the workflows surrounding the VAST Database in the
Trino ecosystem.&lt;/p&gt;

&lt;h2 id=&quot;and-still-more-to-come&quot;&gt;And still more to come!&lt;/h2&gt;

&lt;p&gt;Believe it or not, the great lineup we’ve gone over here still isn’t every talk.
Stay tuned here or on the &lt;a href=&quot;https://trino.io/slack&quot;&gt;Trino Slack&lt;/a&gt; to hear about the
other speakers as they’re announced. And of course, if you want to catch all
these talks live, engage in chat, and have an opportunity to ask questions, make
sure to &lt;a href=&quot;https://www.starburst.io/info/trinosummit2023/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=NORAM-FY24-Q4-EV-Trino-Summit-2023&amp;amp;utm_content=blog-lineup-announcement&quot;&gt;register to attend&lt;/a&gt;.&lt;/p&gt;</content>

      
        <author>
          <name>Cole Bowden</name>
        </author>
      

      <summary>As winter nears, the days may be getting shorter, but so is the wait until Trino Summit 2023! It’ll be here before you know it on December 13th and 14th. We’ve got a packed speaker lineup full of exciting talks, and we’re ready to share some details with the Trino community today. Read on for a preview of some talks, and if you’re interested in attending, make sure to… Register!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2023/lineup-blog-banner.png" />
      
    </entry>
  
    <entry>
      <title>53: Understanding your data with Coginiti and Trino</title>
      <link href="https://trino.io/episodes/53.html" rel="alternate" type="text/html" title="53: Understanding your data with Coginiti and Trino" />
      <published>2023-11-16T00:00:00+00:00</published>
      <updated>2023-11-16T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/53</id>
      <content type="html" xml:base="https://trino.io/episodes/53.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director of
Technical Content at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;,
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/msmullins/&quot;&gt;Matthew Mullins&lt;/a&gt;, CTO at
&lt;a href=&quot;https://www.coginiti.co&quot;&gt;Coginiti&lt;/a&gt;,
(&lt;a href=&quot;https://twitter.com/mullinsms&quot;&gt;@mullinsms&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/rnestertsov/&quot;&gt;Roman Nestertsov&lt;/a&gt;, Principal
Engineer at &lt;a href=&quot;https://www.coginiti.co&quot;&gt;Coginiti&lt;/a&gt;,
(&lt;a href=&quot;https://twitter.com/nestertsov&quot;&gt;@nestertsov&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-431-433&quot;&gt;Releases 431-433&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-431.html&quot;&gt;Trino 431&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for &lt;a href=&quot;https://trino.io/docs/current/routines.html&quot;&gt;SQL routines&lt;/a&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE/DROP FUNCTION&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;REPLACE&lt;/code&gt; modifier in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE TABLE&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Improved latency for prepared statements in JDBC driver&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-432.html&quot;&gt;Trino 432&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Faster filtering on columns containing long strings in Parquet data.&lt;/li&gt;
  &lt;li&gt;Predicate pushdown for real and double columns in MongoDB.&lt;/li&gt;
  &lt;li&gt;Support for Iceberg REST catalog in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;register_table&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unregister_table&lt;/code&gt; procedures.&lt;/li&gt;
  &lt;li&gt;Support for BEARER authentication for Nessie catalog.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-433.html&quot;&gt;Trino 433&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improved support for Hive schema evolution.&lt;/li&gt;
  &lt;li&gt;Add support for altering table comments in the Glue catalog.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note that Trino 433 also includes documentation for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE/DROP CATALOG&lt;/code&gt;.
Check out the third SQL training session for a demo.&lt;/p&gt;

&lt;h2 id=&quot;sql-routine-competition&quot;&gt;SQL routine competition&lt;/h2&gt;

&lt;p&gt;Trino 431 finally delivered the long-awaited support for SQL routines. To
celebrate and see what you all come up with, we are running a competition.
&lt;a href=&quot;/blog/2023/11/09/routines.html&quot;&gt;Share your best SQL routine&lt;/a&gt;, and win a
reward sponsored by &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;call-for-java-21-testing&quot;&gt;Call for Java 21 testing&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/logos/java-duke-21.png&quot; width=&quot;100px&quot; align=&quot;right&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Java 21, the latest LTS release of Java, arrived in September 2023, and we want
to take advantage of the performance improvements, language features, and new
libraries. But to do so, &lt;a href=&quot;/blog/2023/11/03/java-21.html&quot;&gt;we need your input and confirmation that everything
works as expected&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-episode-jdbc-driver&quot;&gt;Concept of the episode: JDBC driver&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/logos/jdbc-small.png&quot; width=&quot;100px&quot; align=&quot;right&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Java_Database_Connectivity&quot;&gt;Java Database Connectivity
(JDBC)&lt;/a&gt; is an
important standard for any JVM-based application that wants to access a
relational database. Trino ships a JDBC driver that abstracts all the low-level
details of our conversational REST API for client tools and supports various
authentication mechanisms, TLS, and other features. This allows tools like
Coginiti to ignore those details, and work with the community on any
improvements for the benefit of all users.&lt;/p&gt;
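&lt;p&gt;As a concrete sketch, this is roughly what any JVM-based tool does with the
driver under the hood. The host, catalog, and user below are made-up
placeholders, not values from the episode:&lt;/p&gt;

```java
import java.util.Properties;

// A minimal sketch of using the Trino JDBC driver. Host, port, catalog,
// and user are hypothetical placeholders for your own deployment.
public class TrinoJdbcExample {

    // The driver accepts URLs of the form jdbc:trino://host:port/catalog/schema
    static String buildUrl(String host, int port, String catalog, String schema) {
        return String.format("jdbc:trino://%s:%d/%s/%s", host, port, catalog, schema);
    }

    public static void main(String[] args) {
        String url = buildUrl("trino.example.com", 443, "hive", "default");

        // Authentication, TLS, and other driver features are configured
        // through connection properties.
        Properties properties = new Properties();
        properties.setProperty("user", "analyst");
        properties.setProperty("SSL", "true");

        // With the trino-jdbc jar on the classpath, connecting is a single
        // call, and the driver handles the REST protocol details:
        // Connection connection = DriverManager.getConnection(url, properties);

        System.out.println(url);
    }
}
```

&lt;p&gt;Tools like Coginiti build on exactly this interface instead of talking to
the REST API directly.&lt;/p&gt;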

&lt;h2 id=&quot;client-tool-focus-on-coginiti&quot;&gt;Client tool focus on Coginiti&lt;/h2&gt;

&lt;p&gt;Matthew and Roman are joining us from &lt;a href=&quot;https://www.coginiti.co&quot;&gt;Coginiti&lt;/a&gt;.
Coginiti delivers higher-quality analytics faster. Coginiti provides an
AI-enabled enterprise data workspace that integrates modular development,
version control, and data quality testing throughout the analytic development
lifecycle.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.coginiti.co&quot;&gt;
  &lt;img src=&quot;/assets/images/logos/coginiti-small.png&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With support for Trino, Coginiti as a client tool provides access to all the
configured catalogs in Trino. It enables data engineers and analysts to work
together in a shared platform, reducing duplication in their work, and bringing
“Don’t repeat yourself (DRY)” to analysts.&lt;/p&gt;

&lt;p&gt;We talk about why Coginiti added &lt;a href=&quot;https://www.coginiti.co/databases/trino/&quot;&gt;support for
Trino&lt;/a&gt;. Coginiti is not a compute
platform itself, but access to many platforms enables a “data blender thinking”.
So as a user you start caring less about the location and source of the
database, and more about the data itself and how you can mix it together to gain
better insights. Every enterprise has more than one data platform, with
different data warehouses, RDBMSes, and data lakes. Matthew talks about reasons
for this situation, and how Trino as a partner platform enables users to
federate across all of these platforms when needed.&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-episode-coginiti-and-trino&quot;&gt;Demo of the episode: Coginiti and Trino&lt;/h2&gt;

&lt;p&gt;In the demo of Coginiti, Roman and Matthew show some of the features of the tool
that enable code reuse and managing transformations on Trino. A tour through
the major aspects of the application gives a good impression of the benefits and
supported use cases.&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Our lineup of speakers and sessions for Trino Summit is nearly finalized. Join
us on the 13th and 14th of December for the free, virtual event. Stay tuned for
details about all the sessions soon, and in the meantime - &lt;a href=&quot;https://www.starburst.io/info/trinosummit2023/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=NORAM-FY24-Q4-EV-Trino-Summit-2023&amp;amp;utm_content=tcb&quot;&gt;don’t forget to
register&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Our &lt;a href=&quot;/blog/2023/09/27/training-series.html&quot;&gt;Trino SQL training series&lt;/a&gt; just
had a successful third session yesterday, and you can check out all the material
in our follow up blog posts:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/10/18/sql-training-1.html&quot;&gt;Getting started with Trino and SQL&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/11/01/sql-training-2.html&quot;&gt;Advanced analytics with SQL and Trino&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There is still a chance for you &lt;a href=&quot;https://www.starburst.io/info/trino-training-series/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=Global-FY24-Trino-Training-Series&amp;amp;utm_content=1&quot;&gt;to register and attend the fourth session
live&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>Data management with SQL and Trino</title>
      <link href="https://trino.io/blog/2023/11/15/sql-training-3.html" rel="alternate" type="text/html" title="Data management with SQL and Trino" />
      <published>2023-11-15T00:00:00+00:00</published>
      <updated>2023-11-15T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/11/15/sql-training-3</id>
      <content type="html" xml:base="https://trino.io/blog/2023/11/15/sql-training-3.html">&lt;p&gt;In the third part of our training series &lt;a href=&quot;/blog/2023/09/27/training-series.html&quot;&gt;Learning SQL with Trino from the
experts&lt;/a&gt; David Phillips and I changed
gears from reading data and performing analytics with Trino. We looked at the
topic of write operations. We covered creating catalogs, schemas, tables, and
then inserting and updating data, and talked about related topics such as data
source and connector support.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;The recording of the event allows you to watch it all as if you attended live,
jump to specific sections as desired, or pause while you follow along with the
demos:&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/q2uyV7mBKVc&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;The full timestamps for every part of the talk are in the description on
YouTube.&lt;/p&gt;

&lt;p&gt;Also make sure you take advantage of these additional resources:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/assets/blog/sql-training-series-starburst-2023.pdf&quot;&gt;General overview slide deck for the
series&lt;/a&gt;,
with links to resources like our &lt;a href=&quot;/slack.html&quot;&gt;community
chat&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Slide deck for &lt;a href=&quot;https://trinodb.github.io/presentations/presentations/sql-data-mgt/index.html&quot;&gt;Data management with SQL and
Trino&lt;/a&gt;,
including a file with all SQL statements ready to go&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/trino-the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One more episode to go this year, and then we are going to celebrate our users
at Trino Summit 2023. Register now and catch us live for both events:&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.starburst.io/info/trino-training-series/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=Global-FY24-Trino-Training-Series&amp;amp;utm_content=1&quot;&gt;
        Register now
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;See you next time. I am excited to show you more about &lt;a href=&quot;/blog/2023/11/09/routines.html&quot;&gt;SQL routines&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>In the third part of our training series Learning SQL with Trino from the experts David Phillips and I changed gears from reading data and performing analytics with Trino. We looked at the topic of write operations. We covered creating catalogs, schemas, tables, and then inserting and updating data, and talked about related topics such as data source and connector support.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/trino-sql.png" />
      
    </entry>
  
    <entry>
      <title>Share your best Trino SQL routine</title>
      <link href="https://trino.io/blog/2023/11/09/routines.html" rel="alternate" type="text/html" title="Share your best Trino SQL routine" />
      <published>2023-11-09T00:00:00+00:00</published>
      <updated>2023-11-09T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/11/09/routines</id>
      <content type="html" xml:base="https://trino.io/blog/2023/11/09/routines.html">&lt;p&gt;We want to see the best &lt;a href=&quot;/docs/current/routines.html&quot;&gt;SQL routines&lt;/a&gt;
you can write, feature them as &lt;a href=&quot;/docs/current/routines/examples.html&quot;&gt;examples in the
documentation&lt;/a&gt;, and send you
some goodies as a reward!&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;With the recent &lt;a href=&quot;/docs/current/release/release-431.html&quot;&gt;Trino 431
release&lt;/a&gt; we shipped a
feature that has been awaited by many Trino users for a long, long time. &lt;a href=&quot;/docs/current/routines.html&quot;&gt;SQL
routines&lt;/a&gt; are an easy way to define your
own custom, procedural functions. All users on your Trino instance can then use
these functions to simplify their queries.&lt;/p&gt;

&lt;p&gt;The new process of writing a routine in SQL in your client tool is an
alternative to the old way of having to create a custom plugin in Java, compile
it, and deploy the binary in your cluster. The time it takes to get a new
function into use has gone from hours to minutes and a few commands!&lt;/p&gt;

&lt;p&gt;Our documentation includes details for all the supported statements:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BEGIN&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CASE&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DECLARE&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FUNCTION&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IF&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ITERATE&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LEAVE&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LOOP&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;REPEAT&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RETURN&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SET&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WHILE&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With the memory connector and the Hive connector supporting routine storage, you
can use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE FUNCTION&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DROP FUNCTION&lt;/code&gt;, so that everyone using the
cluster has access to your routines.&lt;/p&gt;
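&lt;p&gt;As a minimal sketch in the style of the documentation examples, the following
creates and invokes a trivial routine. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;example.default&lt;/code&gt; catalog and
schema prefix is a placeholder for a catalog in your own cluster that supports
routine storage:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CREATE FUNCTION example.default.meaning_of_life()
  RETURNS bigint
  BEGIN
    RETURN 42;
  END;

SELECT example.default.meaning_of_life();
&lt;/code&gt;&lt;/pre&gt;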

&lt;p&gt;The unit tests and our &lt;a href=&quot;/docs/current/routines/examples.html&quot;&gt;examples
documentation&lt;/a&gt; contain a
number of routines that only scratch the surface of what is possible. Now we are
looking for your help to improve the documentation and maybe even find some
bugs. Here is what we are asking from you:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Upgrade your Trino cluster, CLI, and other clients to 431 or newer. Support in
client tools may vary.&lt;/li&gt;
  &lt;li&gt;Learn from the documentation and write your own routines.&lt;/li&gt;
  &lt;li&gt;Send us your best SQL routine.
    &lt;ul&gt;
      &lt;li&gt;Create a pull request to add to the &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/docs/src/main/sphinx/routines/examples.md&quot;&gt;examples in the
documentation&lt;/a&gt;
with a new section, and request a review from &lt;a href=&quot;https://github.com/mosabua&quot;&gt;Manfred
(mosabua)&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;Alternatively, &lt;a href=&quot;mailto:manfred@starburst.io&quot;&gt;email the details&lt;/a&gt; and submit a
&lt;a href=&quot;https://github.com/trinodb/cla&quot;&gt;CLA&lt;/a&gt; separately.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Explain the use case, what the routine does, and maybe also how it works.&lt;/li&gt;
  &lt;li&gt;Include the full statement for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE FUNCTION&lt;/code&gt; definition and an example
invocation.&lt;/li&gt;
  &lt;li&gt;Add any necessary tables or data so we can test the function.&lt;/li&gt;
  &lt;li&gt;Reach out to us on the &lt;a href=&quot;/slack.html&quot;&gt;Trino community Slack&lt;/a&gt;,
if you need any help.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We plan to present submissions at &lt;a href=&quot;/blog/2023/09/14/trino-summit-2023-announcement.html&quot;&gt;Trino Summit 2023&lt;/a&gt;, write a blog post, add them to
the documentation, and &lt;a href=&quot;https://www.starburst.io/&quot;&gt;Starburst&lt;/a&gt; will send a cool
reward for the ten best entries.&lt;/p&gt;

&lt;p&gt;Also, if you have more great Trino usage to talk about and share, we would love
to see your &lt;a href=&quot;https://sessionize.com/trino-summit-2023/&quot;&gt;speaker proposal for Trino
Summit&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We look forward to seeing many great submissions from you all.&lt;/p&gt;

&lt;p&gt;See you at Trino Summit 2023, and don’t forget to
&lt;a href=&quot;https://www.starburst.io/info/trinosummit2023/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=NORAM-FY24-Q4-EV-Trino-Summit-2023&amp;amp;utm_content=blog-1&quot;&gt;register&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Martin, Dain, David, and Manfred&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Martin Traverso, Dain Sundstrom, David Phillips, Manfred Moser</name>
        </author>
      

      <summary>We want to see the best SQL routines you can write, feature them as examples in the documentation, and send you some goodies as a reward!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/trino-sql-routine.png" />
      
    </entry>
  
    <entry>
      <title>Trino is moving to Java 21</title>
      <link href="https://trino.io/blog/2023/11/03/java-21.html" rel="alternate" type="text/html" title="Trino is moving to Java 21" />
      <published>2023-11-03T00:00:00+00:00</published>
      <updated>2023-11-03T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/11/03/java-21</id>
      <content type="html" xml:base="https://trino.io/blog/2023/11/03/java-21.html">&lt;p&gt;We’re excited to announce that as of version 432, Trino can run with Java 21. In
fact, the Trino Docker image uses Java 21 now. We have done upgrades to newer
Java LTS versions successfully before when we upgraded to Java 11 and then &lt;a href=&quot;/blog/2022/07/14/trino-updates-to-java-17.html&quot;&gt;Java
17 with Trino 390&lt;/a&gt;. Each
time the improvements to the JVM runtime, the garbage collectors, the involved
libraries, and the dependencies resulted in performance gains that came nearly
for free.&lt;/p&gt;

&lt;p&gt;And each time we were able to take advantage of new language constructs and
standard libraries to improve the codebase for all contributors and maintainers
of the project.&lt;/p&gt;

&lt;p&gt;Now it is time to do it again.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;In September, &lt;a href=&quot;https://blogs.oracle.com/java/post/the-arrival-of-java-21&quot;&gt;Java 21 was
released&lt;/a&gt; as the
newest long-term support version. The &lt;a href=&quot;https://www.oracle.com/java/technologies/javase/21all-relnotes.html&quot;&gt;consolidated release
notes&lt;/a&gt; are
truly impressive when it comes to breadth and depth of improvements throughout
the runtime, the standard libraries, the included tools, and the overall system.&lt;/p&gt;

&lt;p&gt;Java 21 provides numerous great opportunities to improve Trino. Even without
many code changes, the performance benefits can have a significant impact on the
cost of running a Trino cluster.&lt;/p&gt;

&lt;p&gt;Taking it one step further, into the codebase and the libraries we use, we
are able to move our performance work to the next level. &lt;a href=&quot;https://github.com/trinodb/trino/issues/14237&quot;&gt;Project
Hummingbird&lt;/a&gt;, our performance
fine-tuning initiative, is already buzzing. &lt;a href=&quot;https://github.com/dain&quot;&gt;Dain Sundstrom&lt;/a&gt; has again shipped some great improvements recently. Just
like with our Java 17 upgrade, &lt;a href=&quot;https://github.com/wendigo&quot;&gt;Mateusz Gajewski&lt;/a&gt;
has been critical in pulling all the necessary changes together.&lt;/p&gt;

&lt;p&gt;With the &lt;a href=&quot;https://trino.io/docs/current/release/release-432.html&quot;&gt;Trino 432
release&lt;/a&gt; we have now
made the next big step. The Trino Docker image was changed to use the &lt;a href=&quot;https://adoptium.net/temurin/releases/&quot;&gt;Eclipse
Temurin&lt;/a&gt; distribution of Java 21. We
have been running our test suites with Java 21 for quite some time and all looks
good. With this release, you can now easily test Trino with Java 21. Just use
the Docker container in your deployment, in your own testing pipeline, or with
the &lt;a href=&quot;https://github.com/trinodb/charts&quot;&gt;Trino Helm charts&lt;/a&gt;. The
new version 0.14.0 of the chart already uses the right JVM configuration and
Trino 432 by default.&lt;/p&gt;

&lt;p&gt;Our plan is to make Java 21 the required runtime and move towards adopting the
new language features and libraries. However, before we do that, we want your
input. Are you ready to move to Java 21 for Trino? Did you do some testing with
it already? Are there any issues you encountered? We want to know all about your
experience. Find us on the Trino community chat and ping us in the &lt;a href=&quot;https://trinodb.slack.com/archives/CP1MUNEUX&quot;&gt;#dev
channel&lt;/a&gt;. Or leave comments in our
&lt;a href=&quot;https://github.com/trinodb/trino/issues/17017&quot;&gt;Java 21 tracking issue&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We want to hear from you. Any input and feedback is welcome.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Update from 11 Jan 2024:&lt;/strong&gt;
The release of &lt;a href=&quot;https://trino.io/docs/current/release/release-436.html&quot;&gt;Trino 436&lt;/a&gt;
includes the switch to Java 21 as a requirement for running Trino.&lt;/p&gt;
&lt;/blockquote&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>We’re excited to announce that as of version 432, Trino can run with Java 21. In fact, the Trino Docker image uses Java 21 now. We have done upgrades to newer Java LTS versions successfully before when we upgraded to Java 11 and then Java 17 with Trino 390. Each time the improvements to the JVM runtime, the garbage collectors, the involved libraries, and the dependencies resulted in performance gains that came nearly for free. And each time we were able to take advantage of new language constructs and standard libraries to improve the codebase for all contributors and maintainers of the project. Now it is time to do it again.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/images/logos/java-duke-21.png" />
      
    </entry>
  
    <entry>
      <title>Advanced analytics with SQL and Trino</title>
      <link href="https://trino.io/blog/2023/11/01/sql-training-2.html" rel="alternate" type="text/html" title="Advanced analytics with SQL and Trino" />
      <published>2023-11-01T00:00:00+00:00</published>
      <updated>2023-11-01T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/11/01/sql-training-2</id>
      <content type="html" xml:base="https://trino.io/blog/2023/11/01/sql-training-2.html">&lt;p&gt;In the second part of our training series &lt;a href=&quot;/blog/2023/09/27/training-series.html&quot;&gt;Learning SQL with Trino from the
experts&lt;/a&gt; Martin Traverso and I built
on top of the foundational knowledge from the &lt;a href=&quot;/blog/2023/10/18/sql-training-1.html&quot;&gt;first training session&lt;/a&gt;. We continued to learn more about data
types and working with them, including the important string, numeric, temporal,
and JSON types.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;The recording of the event allows you to watch it all as if you attended live,
jump to specific sections as desired, or pause while you follow along with the
demos:&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/S-mfueDmXds&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;Following are a couple of specific timestamps for interesting
topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=S-mfueDmXds&amp;amp;t=601s&quot;&gt;Temporal data types&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=S-mfueDmXds&amp;amp;t=1920s&quot;&gt;Strings&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=S-mfueDmXds&amp;amp;t=2442s&quot;&gt;Numeric types&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=S-mfueDmXds&amp;amp;t=2705s&quot;&gt;URL parsing and more&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=S-mfueDmXds&amp;amp;t=2850s&quot;&gt;JSON&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The full timestamps for every part of the talk are in the description on
YouTube.&lt;/p&gt;

&lt;p&gt;Also make sure you take advantage of these additional resources:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/assets/blog/sql-training-series-starburst-2023.pdf&quot;&gt;General overview slide deck for the
series&lt;/a&gt;,
with links to resources like our &lt;a href=&quot;/slack.html&quot;&gt;community
chat&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Slide deck for &lt;a href=&quot;https://trinodb.github.io/presentations/presentations/sql-adv-analytics/index.html&quot;&gt;Advanced analytics with SQL and
Trino&lt;/a&gt;,
including a file with all SQL statements ready to go&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/trino-the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We are halfway through the series, and there is lots more to cover. Don’t forget
to register for the next session, join us to ask specific questions, and learn
much more about SQL and Trino:&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.starburst.io/info/trino-training-series/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=Global-FY24-Trino-Training-Series&amp;amp;utm_content=1&quot;&gt;
        Register now
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;See you next time,&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>In the second part of our training series Learning SQL with Trino from the experts Martin Traverso and I built on top of the foundational knowledge from the first training session. We continued to learn more about data types and working with them, including the important string, numeric, temporal, and JSON types.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/trino-sql.png" />
      
    </entry>
  
    <entry>
      <title>52: Commander Bun Bun takes a bite out of Yugabyte</title>
      <link href="https://trino.io/episodes/52.html" rel="alternate" type="text/html" title="52: Commander Bun Bun takes a bite out of Yugabyte" />
      <published>2023-10-26T00:00:00+00:00</published>
      <updated>2023-10-26T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/52</id>
      <content type="html" xml:base="https://trino.io/episodes/52.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Manfred Moser, Director of Technical Content at
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/dmagda/&quot;&gt;Denis Magda&lt;/a&gt;, Director of Developer
Relations at &lt;a href=&quot;https://www.yugabyte.com/&quot;&gt;Yugabyte&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-428-430&quot;&gt;Releases 428-430&lt;/h2&gt;

&lt;p&gt;Unofficial highlights from Cole:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-428.html&quot;&gt;Trino 428&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Reduced memory usage for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Simplified configuration for managing writer counts&lt;/li&gt;
  &lt;li&gt;Faster reads for small Parquet files on data lakes&lt;/li&gt;
  &lt;li&gt;Support for &lt;a href=&quot;https://docs.pinot.apache.org/users/user-guide-query/query-options&quot;&gt;query options&lt;/a&gt;
on dynamic tables in Pinot&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-429.html&quot;&gt;Trino 429&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Faster reading of ORC files in Hive&lt;/li&gt;
  &lt;li&gt;More types supported for schema evolution in Hive&lt;/li&gt;
  &lt;li&gt;Security improvements, including logging out of a session with the Web UI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-430.html&quot;&gt;Trino 430&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improved performance of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Support for setting a timezone on the session level&lt;/li&gt;
  &lt;li&gt;Table statistics in MariaDB&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-episode-jdbc-based-connectors&quot;&gt;Concept of the episode: JDBC-based connectors&lt;/h2&gt;

&lt;p&gt;In Trino, we have a lot of connectors that are based on top of JDBC. JDBC could
stand for “just da best connectors,” but it’s really Java database connectivity,
and it’s one of the core APIs by which many of the most prominent connectors in
the Trino ecosystem function. It’s so common, in fact, that we have
&lt;a href=&quot;/docs/current/develop/example-jdbc.html&quot;&gt;an example JDBC connector in Trino&lt;/a&gt; to
make it easier to implement your own JDBC-based connector if you need one.&lt;/p&gt;
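&lt;p&gt;All JDBC-based connectors share the same basic catalog configuration shape.
As an illustrative sketch, a PostgreSQL catalog properties file, with
placeholder host, database, and credentials, might look like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;connector.name=postgresql
connection-url=jdbc:postgresql://example.net:5432/database
connection-user=admin
connection-password=secret
&lt;/code&gt;&lt;/pre&gt;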

&lt;h2 id=&quot;concept-of-the-episode-yugabytedb&quot;&gt;Concept of the episode: YugabyteDB&lt;/h2&gt;

&lt;p&gt;But if the topic of today’s episode is YugabyteDB, why are we talking about
PostgreSQL? Well, if you’re unfamiliar with Yugabyte, lifting from
&lt;a href=&quot;https://docs.yugabyte.com/&quot;&gt;their docs&lt;/a&gt;: “YugabyteDB is distributed PostgreSQL
that delivers on-demand scale, built-in resilience, and a multi-API interface.”
Distributed architecture should be a familiar concept to a community involved
with a distributed query engine, and if you understand how Trino is able to
leverage it, you should also understand why it makes sense to pair with
Yugabyte. We’ll be discussing why Yugabyte got started, what it does differently
from other databases, what it does better than other databases, and how you
might want to use it with Trino.&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-episode-trino-on-yugabytedb&quot;&gt;Demo of the episode: Trino on YugabyteDB&lt;/h2&gt;

&lt;p&gt;As part of the episode, we’ll also show how you can use YugabyteDB with
Trino. We start with the PostgreSQL connector, and Denis then shows how to use
it to run Trino queries against Yugabyte. It’s always hard to explain demos in
show notes, so tune into the YouTube video and take a look for yourself if
you’re curious!&lt;/p&gt;
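&lt;p&gt;Because YugabyteDB’s YSQL API speaks the PostgreSQL wire protocol, the setup
in the demo boils down to pointing the PostgreSQL connector at Yugabyte. A
rough sketch of a catalog properties file, assuming YSQL on its default port
5433 and placeholder host and credentials:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;connector.name=postgresql
connection-url=jdbc:postgresql://yugabyte.example.net:5433/yugabyte
connection-user=yugabyte
connection-password=secret
&lt;/code&gt;&lt;/pre&gt;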

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Trino Summit, the biggest Trino event of the year, is coming up on the 13th and
14th of December, and like Trino Fest, it’ll be fully virtual. If you’d like to
give a talk about anything related to Trino, we’re looking for speakers now.
&lt;a href=&quot;https://sessionize.com/trino-summit-2023/&quot;&gt;Submit your talk here!&lt;/a&gt; If you’d
rather attend, you can also
&lt;a href=&quot;https://www.starburst.io/info/trinosummit2023/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=NORAM-FY24-Q4-EV-Trino-Summit-2023&amp;amp;utm_content=tcb&quot;&gt;go register to attend now&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Prior to Trino Summit, if you’d like to learn about SQL from the absolute
experts, we’ve also gotten started with the
&lt;a href=&quot;/blog/2023/09/27/training-series&quot;&gt;Trino Training Series&lt;/a&gt;
that we’ll be running as a buildup to the summit. The
&lt;a href=&quot;/blog/2023/10/18/sql-training-1&quot;&gt;recap for the first session&lt;/a&gt;
is live, but there are three more to come! Register now and look forward
to those great sessions starting from the ground up and ending with some key
tricks and Trino specifics that even a seasoned SQL veteran may not know about.&lt;/p&gt;

&lt;p&gt;We also have a talk about Trino on Ice and data meshes coming up in Redwood City
with Slalom and Starburst. If you’re local, consider
&lt;a href=&quot;https://go.slalom.com/starburstnorcal&quot;&gt;signing up and checking it out!&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>Getting started with Trino and SQL</title>
      <link href="https://trino.io/blog/2023/10/18/sql-training-1.html" rel="alternate" type="text/html" title="Getting started with Trino and SQL" />
      <published>2023-10-18T00:00:00+00:00</published>
      <updated>2023-10-18T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/10/18/sql-training-1</id>
      <content type="html" xml:base="https://trino.io/blog/2023/10/18/sql-training-1.html">&lt;p&gt;In our training series &lt;a href=&quot;/blog/2023/09/27/training-series.html&quot;&gt;Learning SQL with Trino from the experts&lt;/a&gt; Martin Traverso, Dain Sundstrom, David Phillips,
and I will run through the wide range of SQL support and features of Trino with
our audience. In the first episode, we covered the concepts of Trino and SQL, and
then started to learn some basic SQL. Now you can take advantage of the
recording and available resources to learn at your own pace.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;The recording of the event allows you to watch it all as if you attended live,
jump to specific sections as desired, or pause while you follow along with the
demos:&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/SnvSBYhRZLg&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;Following are a couple of specific timestamps for interesting
topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=SnvSBYhRZLg&amp;amp;t=380&quot;&gt;What is Trino?&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=SnvSBYhRZLg&amp;amp;t=1163&quot;&gt;Catalogs and connectors&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=SnvSBYhRZLg&amp;amp;t=1658&quot;&gt;Clients&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=SnvSBYhRZLg&amp;amp;t=3224&quot;&gt;SQL WHERE statement&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The full timestamps for every part of the talk are in the description on
YouTube.&lt;/p&gt;

&lt;p&gt;Also make sure you take advantage of these additional resources:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/assets/blog/sql-training-series-starburst-2023.pdf&quot;&gt;General overview slide deck for the series&lt;/a&gt;, with links to resources like our &lt;a href=&quot;/slack.html&quot;&gt;community chat&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Slide deck for &lt;a href=&quot;https://trinodb.github.io/presentations/presentations/sql-trino/index.html&quot;&gt;SQL and Trino concepts&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Slide deck for &lt;a href=&quot;https://trinodb.github.io/presentations/presentations/sql-basics/index.html&quot;&gt;SQL basics with Trino&lt;/a&gt;, including a file with all SQL statements ready to go&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/trino-the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now that you know of the series and saw the first part of it, make sure you
register for the next ones, so you can ask specific questions and learn much
more about SQL and Trino:&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.starburst.io/info/trino-training-series/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=Global-FY24-Trino-Training-Series&amp;amp;utm_content=1&quot;&gt;
        Register now
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;See you then,&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>In our training series Learning SQL with Trino from the experts Martin Traverso, Dain Sundstrom, David Phillips, and I will run through the wide range of SQL support and features of Trino with our audience. In the first episode, we covered the concepts of Trino and SQL, and then started to learn some basic SQL. Now you can take advantage of the recording and available resources to learn at your own pace.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/trino-sql.png" />
      
    </entry>
  
    <entry>
      <title>A report from the Trino Conference Tokyo 2023</title>
      <link href="https://trino.io/blog/2023/10/11/a-report-about-trino-conference-tokyo-2023.html" rel="alternate" type="text/html" title="A report from the Trino Conference Tokyo 2023" />
      <published>2023-10-11T00:00:00+00:00</published>
      <updated>2023-10-11T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/10/11/a-report-about-trino-conference-tokyo-2023</id>
      <content type="html" xml:base="https://trino.io/blog/2023/10/11/a-report-about-trino-conference-tokyo-2023.html">&lt;p&gt;The Trino community in Japan held an online event on October 5th, 2023. This
article summarizes the conference, sharing the presentations and providing an
overview.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Watch a replay of the whole event, or jump to specific time stamps and topic of
interest:&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/CTwk2rkatx8&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;This year, there were four sessions:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Trino, Starburst Galaxy, and Enterprise&lt;/li&gt;
  &lt;li&gt;Log infrastructure using Trino and Iceberg&lt;/li&gt;
  &lt;li&gt;Data infrastructure using Spark and Trino on bare metal k8s&lt;/li&gt;
  &lt;li&gt;Getting started with Trino and a transactional data lake with serverless Athena&lt;/li&gt;
&lt;/ol&gt;

&lt;h1 id=&quot;trino-starburst-galaxy-and-enterprise&quot;&gt;Trino, Starburst Galaxy, and Enterprise&lt;/h1&gt;

&lt;p&gt;The first session was presented by Yuya Ebihara (me) from Starburst. I explained
the Trino changes from 2022 and 2023, as well as features of Starburst Galaxy
and Starburst Enterprise. The session introduced &lt;a href=&quot;https://prtimes.jp/main/html/rd/p/000000226.000025237.html&quot;&gt;a press release of the
partnership of Starburst and Dell Technologies in
Japan&lt;/a&gt;.&lt;/p&gt;

&lt;iframe src=&quot;https://docs.google.com/presentation/d/e/2PACX-1vRubtZB9peROzcGgaTQQYkLs-9jZEbWuRszNInKviuj1RdPwp5CrElssLwLYSUuVeGUfj58wv428UFw/embed&quot; frameborder=&quot;0&quot; width=&quot;595&quot; height=&quot;485&quot; allowfullscreen=&quot;true&quot; mozallowfullscreen=&quot;true&quot; webkitallowfullscreen=&quot;true&quot;&gt;&lt;/iframe&gt;

&lt;h1 id=&quot;log-infrastructure-using-trino-and-iceberg&quot;&gt;Log infrastructure using Trino and Iceberg&lt;/h1&gt;

&lt;p&gt;The second session was presented by Tadahisa Kamijo from Sakura Internet. He
explained some requirements for new analytics environments, such as concurrent
read/write, schema evolution, record-level modification, restoring past
snapshots, and addressing performance issues with the Hive metastore. They
decided to use Trino and Iceberg to handle these requirements. Kamijo-san also
introduced the file layout in Iceberg and demonstrated how to debug Iceberg
files using their Java client.&lt;/p&gt;

&lt;iframe class=&quot;speakerdeck-iframe&quot; frameborder=&quot;0&quot; src=&quot;https://speakerdeck.com/player/4c9229c81e36494ca0c722b20bfdf20e&quot; title=&quot;TrinoとIcebergで ログ基盤の構築 / 2023-10-05 Trino Presto Meetup&quot; allowfullscreen=&quot;true&quot; style=&quot;border: 0px; background: padding-box padding-box rgba(0, 0, 0, 0.1); margin: 0px; padding: 0px; border-radius: 6px; box-shadow: rgba(0, 0, 0, 0.2) 0px 5px 40px; width: 100%; height: auto; aspect-ratio: 560 / 315;&quot; data-ratio=&quot;1.7777777777777777&quot;&gt;&lt;/iframe&gt;

&lt;h1 id=&quot;data-infrastructure-using-spark-an-trino-on-bare-metal-k8s&quot;&gt;Data infrastructure using Spark and Trino on bare metal k8s&lt;/h1&gt;

&lt;p&gt;The third session was presented by Yasukazu Nagatomi from MicroAd. They started
a migration from Impala to Trino to resolve the following issues: separating
compute and storage, refreshing and utilizing table and column statistics even
with large tables, and supporting schema evolution. Nagatomi-san shared a use
case of the Trino features fault-tolerant execution and spill-to-disk, the
first public use case of these features in Japan.&lt;/p&gt;

&lt;iframe src=&quot;//www.slideshare.net/slideshow/embed_code/key/NTzgv4IUvAPIvp&quot; width=&quot;595&quot; height=&quot;485&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; &lt;/iframe&gt;
&lt;div style=&quot;margin-bottom:5px&quot;&gt; &lt;strong&gt; &lt;a href=&quot;//www.slideshare.net/microad_engineer/trino-conference-tokyo-2023&quot; title=&quot;ベアメタルで実現するSpark＆Trino on K8sなデータ基盤&quot; target=&quot;_blank&quot;&gt;ベアメタルで実現するSpark＆Trino on K8sなデータ基盤&lt;/a&gt; &lt;/strong&gt; from &lt;strong&gt;&lt;a href=&quot;//www.slideshare.net/microad_engineer&quot; target=&quot;_blank&quot;&gt;MicroAd, Inc.(Engineer)&lt;/a&gt;&lt;/strong&gt; &lt;/div&gt;

&lt;h1 id=&quot;getting-started-trino-and-a-transactional-data-lake-with-serverless-athena&quot;&gt;Getting started with Trino and a transactional data lake with serverless Athena&lt;/h1&gt;

&lt;p&gt;The last session was presented by Sotaro Hikita from AWS. Athena is a serverless
service for ad hoc analytics built on Trino and Presto. It supports not only S3
data but also various data sources via Federated Query. In Athena, Iceberg
supports both read and write operations, while Hudi and Delta Lake only support
read operations.&lt;/p&gt;

&lt;iframe class=&quot;speakerdeck-iframe&quot; frameborder=&quot;0&quot; src=&quot;https://speakerdeck.com/player/e1f3188001ca4919b227177f3934b626&quot; title=&quot;サーバレスなAmazon Athenaで始めるTrinoとTransactional Data Lake&quot; allowfullscreen=&quot;true&quot; style=&quot;border: 0px; background: padding-box padding-box rgba(0, 0, 0, 0.1); margin: 0px; padding: 0px; border-radius: 6px; box-shadow: rgba(0, 0, 0, 0.2) 0px 5px 40px; width: 100%; height: auto; aspect-ratio: 560 / 315;&quot; data-ratio=&quot;1.7777777777777777&quot;&gt;&lt;/iframe&gt;

&lt;h1 id=&quot;wrap-up&quot;&gt;Wrap up&lt;/h1&gt;

&lt;p&gt;We sincerely appreciate the participation of community members in Japan. Thank
you so much for watching the live event. We are planning to hold an in-person
event next year. See you next time!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Yuya&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Yuya Ebihara</name>
        </author>
      

      <summary>The Trino community in Japan held an online event on October 5th, 2023. This article summarizes the conference, sharing the presentations and providing an overview.</summary>

      
      
    </entry>
  
    <entry>
      <title>51: Trino cools off with PopSQL</title>
      <link href="https://trino.io/episodes/51.html" rel="alternate" type="text/html" title="51: Trino cools off with PopSQL" />
      <published>2023-10-05T00:00:00+00:00</published>
      <updated>2023-10-05T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/51</id>
      <content type="html" xml:base="https://trino.io/episodes/51.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Manfred Moser, Director of Technical Content at
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/jakeptrsn/&quot;&gt;Jake Peterson&lt;/a&gt;, Head of Customer
Success at &lt;a href=&quot;https://popsql.com/&quot;&gt;PopSQL&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Matthew Peveler, Software Engineer at &lt;a href=&quot;https://popsql.com/&quot;&gt;PopSQL&lt;/a&gt;,
&lt;a href=&quot;https://github.com/MasterOdin&quot;&gt;MasterOdin&lt;/a&gt; on GitHub&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-423-427&quot;&gt;Releases 423-427&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-423.html&quot;&gt;Trino 423&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Schema evolution for nested fields&lt;/li&gt;
  &lt;li&gt;Support for comments on materialized view columns&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CASCADE&lt;/code&gt; option in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DROP SCHEMA&lt;/code&gt; for Clickhouse, MariaDB, MySQL,
Oracle and SingleStore&lt;/li&gt;
  &lt;li&gt;Various performance improvements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-424.html&quot;&gt;Trino 424&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improved performance for JSON, CSV, text and related formats in Hive&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CASCADE&lt;/code&gt; in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DROP SCHEMA&lt;/code&gt; for PostgreSQL and Iceberg&lt;/li&gt;
  &lt;li&gt;Improved coordinator CPU utilization for large clusters&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-425.html&quot;&gt;Trino 425&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improved performance of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Support for check constraints in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; in the Delta Lake connector.&lt;/li&gt;
  &lt;li&gt;Support for the Decimal128 type in the MongoDB connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-426.html&quot;&gt;Trino 426&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SET/RESET SESSION AUTHORIZATION&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Improved performance of aggregations over decimal values.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TRUNCATE TABLE&lt;/code&gt; in Delta Lake connector.&lt;/li&gt;
  &lt;li&gt;Support for Databricks 13.3 LTS.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-427.html&quot;&gt;Trino 427&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improved performance for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DISTINCT&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Support for pushing down &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt; statements into connectors.&lt;/li&gt;
  &lt;li&gt;Support for reading Delta Lake tables with Deletion Vectors.&lt;/li&gt;
  &lt;li&gt;Faster writing to Parquet files in Delta Lake and Iceberg.&lt;/li&gt;
  &lt;li&gt;Support for querying tags in Iceberg.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-episode-popsql&quot;&gt;Concept of the episode: PopSQL&lt;/h2&gt;

&lt;p&gt;Some of our viewers may be familiar with an environment where
key queries and dashboards are buried in someone’s personal workspace, and you
have to go ask them directly every time you want to check on your metrics.
When you’re running a world-class, highly performant query engine like Trino and
investing time and resources into maintaining it, shouldn’t you treat your
queries like a first-class, collaborative, versioned system, too?&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://popsql.com/&quot;&gt;PopSQL&lt;/a&gt;, a playful spin on the word popsicle, solves the
sadness that is disorganized and siloed insights by centralizing queries into a
platform that has versioning, security, and a suite of collaborative tools
comparable to Google Drive. Want to work with your teammate on a query? You can
open up the same editor and see the same thing. Want to see the query
someone ran last week to check how the new feature is doing? It’s there. Have
a suggestion to improve something? Leave a comment. Realize your suggestion was
wrong and need to undo the change? You can view past versions of the query.&lt;/p&gt;

&lt;p&gt;PopSQL and Trino make sense together. PopSQL provides a best-in-class interface
for organizing and collaborating on all of your SQL queries
across the business, and Trino handles running those queries at unparalleled
speeds. They go hand-in-hand for treating your data and SQL analytics as
first-class citizens. In today’s episode, we’ll be exploring what PopSQL is, how it
integrates with Trino, and how the engineers at PopSQL have done some cool
things with Trino to make the integration better than ever before. We’ll start
with that last one, actually.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-episode-a-new-nodejs-adapter-for-trino&quot;&gt;Concept of the episode: A new Node.js adapter for Trino&lt;/h2&gt;

&lt;p&gt;Trino in the frontend is… a tricky thing. We can go ahead and admit that the
&lt;a href=&quot;/docs/current/admin/web-interface.html&quot;&gt;Trino web UI&lt;/a&gt; isn’t going to win any
awards for design or functionality. A couple of Node-based libraries
exist out there, including &lt;a href=&quot;https://www.npmjs.com/package/presto-client&quot;&gt;presto-client-node&lt;/a&gt;
and &lt;a href=&quot;https://github.com/vweevers/lento&quot;&gt;lento&lt;/a&gt;. But presto-client-node lacked
support for streaming and had some issues handling 500 errors, while lento doesn’t
quite support Trino out of the box and only supports single streams, which
wasn’t ideal for PopSQL’s distributed architecture. So when PopSQL’s engineers
went to build their frontend and integrate with Trino, what did they do? They built
their own adapter.&lt;/p&gt;

&lt;p&gt;We’ll talk about how it was implemented, what key features it unlocks, and why
it makes using PopSQL with Trino an even better experience.&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-episode-using-popsql-with-trino&quot;&gt;Demo of the episode: Using PopSQL with Trino&lt;/h2&gt;

&lt;p&gt;It’s hard to write show notes for a demo, because you can’t really experience
the demo by reading about what’s happening. But as a surface-level overview,
we’ll be going over:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Setting up a connection&lt;/li&gt;
  &lt;li&gt;The schema explorer&lt;/li&gt;
  &lt;li&gt;The SQL editor&lt;/li&gt;
  &lt;li&gt;Query scheduling&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;pr-of-the-episode-57-on-trino-gateway-release-version-3&quot;&gt;PR of the episode: #57 on trino-gateway: Release version 3&lt;/h2&gt;

&lt;p&gt;Last week, the community officially released the
&lt;a href=&quot;https://github.com/trinodb/trino-gateway&quot;&gt;trino-gateway&lt;/a&gt;, a proxy and load
balancer that enables large operations to run multiple Trino clusters in
harmony with each other to serve big queries and small queries alike. If you or
your organization have a need for more than one Trino cluster and want the
seamless experience of being able to connect to any of them through a single
interface, then check it out! It’s the product of many months of effort and
should be a fantastic solution for running Trino at the absolute largest scales.&lt;/p&gt;

&lt;p&gt;To learn more about it, you should check out
&lt;a href=&quot;/blog/2023/09/28/trino-gateway&quot;&gt;the blog post announcing its first release.&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Trino Summit, the biggest Trino event of the year, is coming up on the 13th and
14th of December, and like Trino Fest, it’ll be fully virtual. If you’d like to
give a talk about anything related to Trino, we’re looking for speakers now.
&lt;a href=&quot;https://sessionize.com/trino-summit-2023/&quot;&gt;Submit your talk here!&lt;/a&gt; If you’d
rather attend, you can also
&lt;a href=&quot;https://www.starburst.io/info/trinosummit2023/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=NORAM-FY24-Q4-EV-Trino-Summit-2023&amp;amp;utm_content=tcb&quot;&gt;go register to attend now&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Prior to Trino Summit, if you’d like to learn about SQL from the absolute
experts, we’ve also announced the &lt;a href=&quot;/blog/2023/09/27/training-series&quot;&gt;Trino Training Series&lt;/a&gt;
that we’ll be running as a buildup to the summit. Register now and look forward
to four great sessions starting from the ground up and ending with some key
tricks and Trino specifics that even a seasoned SQL veteran may not know about.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino Gateway has arrived</title>
      <link href="https://trino.io/blog/2023/09/28/trino-gateway.html" rel="alternate" type="text/html" title="Trino Gateway has arrived" />
      <published>2023-09-28T00:00:00+00:00</published>
      <updated>2023-09-28T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/09/28/trino-gateway</id>
      <content type="html" xml:base="https://trino.io/blog/2023/09/28/trino-gateway.html">&lt;p&gt;You started with one Trino cluster, and your users like the power of SQL and
&lt;a href=&quot;/ecosystem/index.html#data-sources&quot;&gt;querying all sorts of data sources&lt;/a&gt;.
Then you needed to upgrade and got a cluster for testing going. That was a while
ago, and now you run a separate cluster configured for ETL workloads with
fault-tolerant execution, and some others with different configurations.&lt;/p&gt;

&lt;p&gt;With Trino Gateway, we now have an answer to your users’ request to provide one URL
for all the clusters. Trino Gateway has arrived!&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Today, we are happy to announce our &lt;a href=&quot;https://github.com/trinodb/trino-gateway/blob/main/docs/release-notes.md#trino-gateway-3-26-sep-2023&quot;&gt;first release of Trino
Gateway&lt;/a&gt;.
The release is the result of many, many months of effort to move the legacy
Presto Gateway to Trino, start a refactor of the project, and add numerous new
features.&lt;/p&gt;

&lt;p&gt;Many larger deployments across the Trino community rely on the gateway as a load
balancer, proxy server, and configurable routing gateway for multiple Trino
clusters. Users don’t need to worry about what catalog and data source is
available in what Trino cluster. Trino Gateway exposes one URL for them all.
Administrators can ensure routing is correct and use the REST API to configure
the necessary rules. This also allows seamless upgrades of clusters behind Trino
Gateway in a blue/green deployment mode.&lt;/p&gt;

&lt;p&gt;Up to now, many users had to maintain separate forks of the legacy Presto
Gateway. Some of these users created numerous improvements in isolation from each
other, sometimes even implementing the same feature multiple times. This first
release of Trino Gateway starts a strong collaboration of some of these users.
Bloomberg contributed the main bulk of the new features, including the
much-requested support for authentication and authorization on Trino Gateway
itself. Maintainers and contributors from Starburst pulled together the
stakeholders and managed the project, and collaborators from Naver, LinkedIn,
Dune, and others are already helping out and ready to move the project forward.&lt;/p&gt;

&lt;p&gt;There are exciting times ahead for the project, and we have big plans for
documentation, installation, and general modernization of the app, so go and
have a look at the project, read the documentation and release notes, file an
issue, or submit a pull request:&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://github.com/trinodb/trino-gateway&quot;&gt;
        Trino Gateway
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;Interested in finding out more? Find us and other users and contributors on the
&lt;a href=&quot;https://trinodb.slack.com/app_redirect?channel=trino-gateway&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino-gateway&lt;/code&gt;&lt;/a&gt;
and
&lt;a href=&quot;https://trinodb.slack.com/app_redirect?channel=trino-gateway-dev&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino-gateway-dev&lt;/code&gt;&lt;/a&gt;
channels in &lt;a href=&quot;/slack.html&quot;&gt;the Trino community Slack&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Also, don’t forget to tell us about your usage of Trino Gateway or Trino and
&lt;a href=&quot;https://sessionize.com/trino-summit-2023/&quot;&gt;submit a talk for Trino Summit
2023&lt;/a&gt;. And if you just want to learn
and listen to others, &lt;a href=&quot;https://www.starburst.io/info/trinosummit2023/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=NORAM-FY24-Q4-EV-Trino-Summit-2023&amp;amp;utm_content=blog-1&quot;&gt;register as an
attendee&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred, Martin, and all the other Trino Gateway contributors&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Martin Traverso</name>
        </author>
      

      <summary>You started with one Trino cluster, and your users like the power for SQL and querying all sorts of data sources. Then you needed to upgrade and got a cluster for testing going. That was a while ago, and now you run a separate cluster configured for ETL workloads with fault-tolerant execution, and some others with different configurations. With Trino Gateway we now have an answer to your users request to provide one URL for all the clusters. Trino Gateway has arrived!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/images/logos/trino-gateway-small.png" />
      
    </entry>
  
    <entry>
      <title>Learning SQL with Trino from the experts</title>
      <link href="https://trino.io/blog/2023/09/27/training-series.html" rel="alternate" type="text/html" title="Learning SQL with Trino from the experts" />
      <published>2023-09-27T00:00:00+00:00</published>
      <updated>2023-09-27T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/09/27/training-series</id>
      <content type="html" xml:base="https://trino.io/blog/2023/09/27/training-series.html">&lt;p&gt;Do you have a rough idea of what SQL is? Do you need to get data out of object
storage in the cloud and some relational database at the same time? You should
look at Trino and learn about SQL.&lt;/p&gt;

&lt;p&gt;Or do you know the ins and outs of joins and window functions, and are your SQL
queries counted in pages rather than lines? You may even be the SQL expert on
your team. You should &lt;em&gt;also&lt;/em&gt; look at Trino and SQL.&lt;/p&gt;

&lt;p&gt;Luckily for you all, we have the right SQL training for everyone in our upcoming
series with the founders of the Trino project and SQL experts Martin Traverso,
Dain Sundstrom, and David Phillips, and myself as host and co-trainer.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;In the SQL training series, we start with the basics of Trino. You will learn
that despite the fact that there is a leopard frog on the cover of &lt;a href=&quot;/trino-the-definitive-guide.html&quot;&gt;Trino: The
Definitive Guide&lt;/a&gt;, SQL does
not stand for Silly Quacking Leopardfrogs. Instead, SQL stands for Structured
Query Language, and you will learn about the benefits of connecting &lt;a href=&quot;/ecosystem/index.html#data-sources&quot;&gt;many
data sources&lt;/a&gt; to Trino, and using
&lt;a href=&quot;/ecosystem/index.html#clients&quot;&gt;different clients&lt;/a&gt;. And you can always use
the same powerful SQL. And for the SQL pros, you learn about catalogs and
queries that go across data sources.&lt;/p&gt;

&lt;p&gt;Then we’ll glance at the basic SQL foundations, since there are literally
hundreds of books, videos, and training courses around. All of them teach you
things like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; statements and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WHERE&lt;/code&gt; clauses, and unravel the confusion
around &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LEFT OUTER JOIN&lt;/code&gt; and the like.&lt;/p&gt;

&lt;p&gt;And after that, we get to the interesting stuff. Here is a list of
some of the topics we will cover:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trino concepts like cluster, data source, client, catalog, and more&lt;/li&gt;
  &lt;li&gt;Overview of all the SQL support with statements, data types, functions, and
connector support&lt;/li&gt;
  &lt;li&gt;Working with data types, including numerical and text values, dates and times,
JSON, …&lt;/li&gt;
  &lt;li&gt;Lots of scalar, aggregation, window functions&lt;/li&gt;
  &lt;li&gt;Object storage and other data sources&lt;/li&gt;
  &lt;li&gt;Creating schemas, tables, and views&lt;/li&gt;
  &lt;li&gt;Inserting, merging, moving and deleting data&lt;/li&gt;
  &lt;li&gt;Metadata in general and in hidden tables like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$properties&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Table procedures&lt;/li&gt;
  &lt;li&gt;Trino views, Trino materialized views and other views&lt;/li&gt;
  &lt;li&gt;Global and connector level table functions, including query pass-through&lt;/li&gt;
  &lt;li&gt;Support for SQL routines, also known as user-defined functions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Interested now? No matter how great your SQL knowledge or Trino expertise is,
you will learn something new in this series. So what are you waiting for?&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.starburst.io/info/trino-training-series/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=Global-FY24-Trino-Training-Series&amp;amp;utm_content=1&quot;&gt;
        Register now
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;Join us in one or all of the sessions on the following dates:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;18th of October 2023: &lt;a href=&quot;/blog/2023/10/18/sql-training-1.html&quot;&gt;Getting started with Trino and SQL&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;1st of November 2023: &lt;a href=&quot;/blog/2023/11/01/sql-training-2.html&quot;&gt;Advanced analytics with SQL and Trino&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;15th of November 2023: &lt;a href=&quot;/blog/2023/11/15/sql-training-3.html&quot;&gt;Data management with SQL and Trino&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;29th of November 2023: &lt;a href=&quot;/blog/2023/11/29/sql-training-4.html&quot;&gt;Functions with SQL and Trino&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We look forward to seeing you in class.&lt;/p&gt;

&lt;p&gt;Martin, Dain, David, and Manfred&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Videos, slide decks, and other resources for all classes are now available:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Getting started with Trino and SQL: &lt;a href=&quot;/blog/2023/10/18/sql-training-1.html&quot;&gt;Blog post with resources and video&lt;/a&gt;, &lt;a href=&quot;https://www.youtube.com/watch?v=SnvSBYhRZLg&quot;&gt;Video on YouTube&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Advanced analytics with SQL and Trino: &lt;a href=&quot;/blog/2023/11/01/sql-training-2.html&quot;&gt;Blog post with resources and video&lt;/a&gt;, &lt;a href=&quot;https://www.youtube.com/watch?v=S-mfueDmXds&quot;&gt;Video on YouTube&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Data management with SQL and Trino: &lt;a href=&quot;/blog/2023/11/15/sql-training-3.html&quot;&gt;Blog post with resources and video&lt;/a&gt;, &lt;a href=&quot;https://www.youtube.com/watch?v=q2uyV7mBKVc&quot;&gt;Video on YouTube&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Functions with SQL and Trino: &lt;a href=&quot;/blog/2023/11/29/sql-training-4.html&quot;&gt;Blog post with resources and video&lt;/a&gt;, &lt;a href=&quot;https://www.youtube.com/watch?v=1siAYR6BzzY&quot;&gt;Video on YouTube&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>Do you have a rough idea of what SQL is? Do you need to get data out of object storage in the cloud and some relational database at the same time? You should look at Trino and learn about SQL. Or do you know the ins and outs of joins, window functions, and your SQL queries are counted by the pages and not lines? You may even be the expert on SQL on your team. You should also look at Trino and SQL. Luckily for you all, we have the right SQL training for everyone in our upcoming series with the founders of the Trino project and SQL experts Martin Traverso, Dain Sundstrom, and David Phillips, and myself as host and co-trainer.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/trino-sql.png" />
      
    </entry>
  
    <entry>
      <title>Chinese edition of Trino: The Definitive Guide</title>
      <link href="https://trino.io/blog/2023/09/21/the-definitive-guide-2-cn.html" rel="alternate" type="text/html" title="Chinese edition of Trino: The Definitive Guide" />
      <published>2023-09-21T00:00:00+00:00</published>
      <updated>2023-09-21T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/09/21/the-definitive-guide-2-cn</id>
      <content type="html" xml:base="https://trino.io/blog/2023/09/21/the-definitive-guide-2-cn.html">&lt;p&gt;Trino, Trino, Trino everywhere. Just looking at our website stats and the users
in our community chat, we know that Trino is going places. We also know that one
of these places with a large user community is China. And now we have good news
for you. A translation of the second edition of the book to Chinese is now
available.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Today, we are happy to announce that a Chinese translation of the book &lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;Trino:
The Definitive Guide&lt;/a&gt; is now
available for the communities all across China and far beyond, and hopefully
lowers the barrier to Trino for native speakers. We invite you all to get your
own copy:&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://product.dangdang.com/11487789827.html&quot;&gt;
        Trino权威指南(原书第2版) 机械工业出版社
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;Our thanks go out to the teams at O’Reilly and dangdang for making this happen.
We hope many readers will benefit from the translated edition.&lt;/p&gt;

&lt;p&gt;We look forward to chatting with many of our new readers and Trino users on the
&lt;a href=&quot;https://trinodb.slack.com/app_redirect?channel=general-cn&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;general-cn&lt;/code&gt;&lt;/a&gt; channel in &lt;a href=&quot;/slack.html&quot;&gt;the Trino community Slack&lt;/a&gt;,
other channels, and direct messaging.&lt;/p&gt;

&lt;p&gt;Also, don’t forget to tell us about your usage of Trino. You can contact us on
Slack to be a guest in &lt;a href=&quot;/broadcast/index.html&quot;&gt;Trino Community
Broadcast&lt;/a&gt; or &lt;a href=&quot;https://sessionize.com/trino-summit-2023/&quot;&gt;submit a talk for Trino
Summit 2023&lt;/a&gt;. And if you just want
to learn and listen to others, &lt;a href=&quot;https://www.starburst.io/info/trinosummit2023/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=NORAM-FY24-Q4-EV-Trino-Summit-2023&amp;amp;utm_content=blog-1&quot;&gt;register as an
attendee&lt;/a&gt; for Trino Summit 2023.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred, Martin, and Matt&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Martin Traverso, Matt Fuller</name>
        </author>
      

      <summary>Trino, Trino, Trino everywhere. Just looking at our website stats and the users in our community chat, we know that Trino is going places. We also know that one of these places with a large user community is China. And now we have good news for you. A translation of the second edition of the book to Chinese is now available.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/ttdg2-cn-cover.png" />
      
    </entry>
  
    <entry>
      <title>Join us for Trino Summit 2023</title>
      <link href="https://trino.io/blog/2023/09/14/trino-summit-2023-announcement.html" rel="alternate" type="text/html" title="Join us for Trino Summit 2023" />
      <published>2023-09-14T00:00:00+00:00</published>
      <updated>2023-09-14T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/09/14/trino-summit-2023-announcement</id>
      <content type="html" xml:base="https://trino.io/blog/2023/09/14/trino-summit-2023-announcement.html">&lt;p&gt;The Trino community is buzzing. Commander Bun Bun is ready to invite you all to
join us for Trino Summit 2023. And “all” really means everyone in the community.
The event is free to attend, virtual, and full of news and shared knowledge from
your peers using Trino. Don’t hesitate to submit your talk and register to
attend now.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;We are pleased to announce the upcoming Trino Summit 2023. The summit is
scheduled as a virtual event on the &lt;strong&gt;13th and 14th of December 2023&lt;/strong&gt;, and
attendance is free!&lt;/p&gt;

&lt;p&gt;If you’d like to share your knowledge and information about Trino usage and give
a talk at this year’s Trino Summit, we’re putting out a call for speakers. We
are accepting submissions from now until the &lt;strong&gt;12th of November&lt;/strong&gt;, but we
recommend submitting as soon as possible, because we expect slots to fill up
fast.&lt;/p&gt;

&lt;p&gt;We’re looking for intermediate to advanced-level talks on a variety of themes.
If you have an interesting story about how you leverage Trino in your data
platform for analytics and other workloads, found a neat way to extend it with a
custom plugin or add-on, or swapped to Trino for a performance win, we’d love to
hear about it. We’re excited to expand our speaker lineup with talks from the
broader Trino community. Find more information about duration, technical
details, and more suggestions when you submit your talk.&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.starburst.io/info/trinosummit2023/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=NORAM-FY24-Q4-EV-Trino-Summit-2023&amp;amp;utm_content=blog-1&quot;&gt;
        Register to attend
    &lt;/a&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://sessionize.com/trino-summit-2023/&quot;&gt;
        Submit a talk
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;This event of the Trino Software Foundation is organized and sponsored by
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;, and we invite other sponsors to help make
this a successful event for the Trino community.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://starburst.io&quot;&gt;
  &lt;img src=&quot;/assets/images/logos/starburst-small.png&quot; title=&quot;Starburst&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If that interests you or your employer, &lt;a href=&quot;mailto:events@starburst.io&quot;&gt;contact the Trino events team for more
information&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;And of course, we’re looking forward to reading your proposals and seeing you
then.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>The Trino community is buzzing. Commander Bun Bun is ready to invite you all to join us for Trino Summit 2023. And “all” really means everyone in the community. The event is free to attend, virtual, and full of news and shared knowledge from your peers using Trino. Don’t hesitate to submit your talk and register to attend now.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2023/summit-logo.png" />
      
    </entry>
  
    <entry>
      <title>50: Celebrating 50 episodes of Trino Community Broadcast</title>
      <link href="https://trino.io/episodes/50.html" rel="alternate" type="text/html" title="50: Celebrating 50 episodes of Trino Community Broadcast" />
      <published>2023-07-27T00:00:00+00:00</published>
      <updated>2023-07-27T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/50</id>
      <content type="html" xml:base="https://trino.io/episodes/50.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Manfred Moser, Director of Technical Content at
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Brian Olsen, Head of Developer Relations at &lt;a href=&quot;https://tabular.io/&quot;&gt;Tabular&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/bitsondatadev&quot;&gt;@bitsondatadev&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Dain Sundstrom, Trino co-creator and CTO at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/daindumb&quot;&gt;@daindumb&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-421-422&quot;&gt;Releases 421-422&lt;/h2&gt;

&lt;p&gt;Unofficial highlights from Cole:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-421.html&quot;&gt;Trino 421&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CHECK&lt;/code&gt; constraints in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt; statements.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; on Google Sheets.&lt;/li&gt;
  &lt;li&gt;Faster queries on MongoDB tables with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;row&lt;/code&gt; columns.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-422.html&quot;&gt;Trino 422&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Faster &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE TABLE AS ... SELECT&lt;/code&gt; queries.&lt;/li&gt;
  &lt;li&gt;Support for nested fields in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ADD COLUMN&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Faster Avro reader for Hive.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;register_table&lt;/code&gt; procedure to register Hadoop tables in Iceberg.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-episode-50&quot;&gt;Concept of the episode: 50!&lt;/h2&gt;

&lt;p&gt;No, that’s not a factorial, we’re just excited to have made it to 50 Trino
Community Broadcast episodes. We’ve brought back some familiar faces to talk
about what we’ve done, how we got here, and what it takes to keep an open source
project ticking for over a decade, and to celebrate the steps we’ve taken along
the way. It’s unscripted, and the discussion wanders wherever it likes.&lt;/p&gt;

&lt;p&gt;Tune in to hear about the history of the Trino Community Broadcast, the upcoming
Snowflake connector, and a few of the core philosophies that have kept Trino
running. Manfred also shows off updates to the Trino website, highlighting all
the tools, data sources, and add-ons that you can use with Trino.&lt;/p&gt;

&lt;h2 id=&quot;trino-events&quot;&gt;Trino events&lt;/h2&gt;

&lt;p&gt;Trino Fest was a little over a month ago, and we’re publishing the last recap of
all the talks to the Trino blog today! Check out our YouTube channel and the
Trino website to catch up on everything you missed.&lt;/p&gt;

&lt;p&gt;If you have an event that is related to Trino, let us know so we can add it to
the &lt;a href=&quot;https://trino.io/community.html#events&quot;&gt;Trino events calendar&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>FugueSQL: Interoperable Python and Trino for interactive workloads</title>
      <link href="https://trino.io/blog/2023/07/27/trino-fest-2023-fugue-recap.html" rel="alternate" type="text/html" title="FugueSQL: Interoperable Python and Trino for interactive workloads" />
      <published>2023-07-27T00:00:00+00:00</published>
      <updated>2023-07-27T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/07/27/trino-fest-2023-fugue-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2023/07/27/trino-fest-2023-fugue-recap.html">&lt;p&gt;Fugue may be an unfamiliar name to those in the Trino ecosystem. It’s another
Python tool, a programming model built to enhance interoperability between
Python and SQL. On the Python side of things, it’s a wrapper around common tools
like pandas and Polars that converts code into SQL for high-performance,
large-scale query execution. So why are we talking about it at Trino Fest?
Because Fugue recently launched an integration with Trino, enabling you to write
Python code that can be converted to SQL to run on a high-powered Trino backend.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/aKhI1Phfn-o&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;Though Trino users are quite familiar with SQL, it does present some challenges.
Iterating on a SQL query and improving it can be difficult, and finding ways to
optimize or speed things up can be a challenge that requires sophisticated
external tools or working on hunches. Testing queries, especially incrementally,
has never been super easy, either. Compare that to Python, which does not have
those problems, but has issues of its own. Python, especially at scale, is not
very performant. So it’s natural to try to take the advantages of both, which is
what Fugue is aiming to do.&lt;/p&gt;

&lt;p&gt;After that brief intro to Fugue, the rest of the talk consists of technical
demos of the many things that you can do with Fugue. This includes
setting a query up, breaking it up into smaller parts, bringing it to pandas,
and demonstrating extensions that are built into Fugue. With all of these
intermediate steps, it becomes easier to unit test queries before sending them
into production, making sure that everything works as expected.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Kevin Kho and Cole Bowden</name>
        </author>
      

      <summary>Fugue may be an unfamiliar name to those in the Trino ecosystem. It’s another Python tool, a programming model built to enhance interoperability between Python and SQL. On the Python side of things, it’s a wrapper around common tools like pandas and Polars that converts code into SQL for high-performance, large-scale query execution. So why are we talking about it at Trino Fest? Because Fugue recently launched an integration with Trino, enabling you to write Python code that can be converted to SQL to run on a high-powered Trino backend.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/Fugue.png" />
      
    </entry>
  
    <entry>
      <title>Starburst Galaxy: A romance of many architectures</title>
      <link href="https://trino.io/blog/2023/07/25/trino-fest-2023-datto.html" rel="alternate" type="text/html" title="Starburst Galaxy: A romance of many architectures" />
      <published>2023-07-25T00:00:00+00:00</published>
      <updated>2023-07-25T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/07/25/trino-fest-2023-datto</id>
      <content type="html" xml:base="https://trino.io/blog/2023/07/25/trino-fest-2023-datto.html">&lt;p&gt;Let’s cut straight to the chase with this lightning talk from Benjamin Jeter, a
data architect, platform manager, and data engineer at Datto. For those who are
not familiar with Datto, they are an American cybersecurity and data backup
company. They’re the leading global provider of security and cloud-based
software solutions purpose-built for Managed Service Providers (MSPs). In
Benjamin’s talk, he goes through some of the considerations and design goals of
a reference architecture pattern that they use and why they chose to use Trino
with Starburst Galaxy.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/K3AlAWB-Gmg&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-fest-2023/TrinoFest2023Datto.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;But you might be wondering: what does Ben mean when he says “reference
architecture”? A reference architecture pattern is a pattern for making
arbitrary data available to end users in a reproducible and modular way. It’s an
opinionated representation of what best practices look like for a given class of
use cases. You can almost think of it as a conceptual tool for thinking
critically about specific patterns through a pragmatic balance of simplicity and
effectiveness. However, it will not work for every use case, and it is not
necessarily the best solution for any given one.&lt;/p&gt;

&lt;p&gt;The main design goal that Benjamin had was to facilitate near real-time data
access while using only Trino. In addition, he wanted it to be simple, easy to
understand, flexible, and adaptable. Accomplishing this design goal requires
several steps, starting with a daily batch transform that converts JSON
into Iceberg and serves as &lt;a href=&quot;https://www.investopedia.com/terms/t/tplus1.asp&quot;&gt;T-1
data&lt;/a&gt;. Then he created an
unpartitioned external table that is rebuilt every day as part of the daily
batch transform. Using the &lt;a href=&quot;https://docs.starburst.io/starburst-galaxy/sql/great-lakes.html&quot;&gt;Great Lakes
connectivity&lt;/a&gt;
with this table allows Datto to have scan-on-query semantics, which enables data
access about as close to real-time as you can get without a streaming solution like
Kafka or Kinesis. Benjamin shows how easy it is to design a use case with just a
couple lines of code using Trino with Starburst Galaxy.&lt;/p&gt;

&lt;p&gt;Interested? Check out the video where Benjamin shows the code and explains how
it works!&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Benjamin Jeter, Ryan Duan</name>
        </author>
      

      <summary>Let’s cut straight to the chase with this lightning talk from Benjamin Jeter, a data architect, platform manager, and data engineer at Datto. For those that are not familiar with Datto, they are an American cybersecurity and data backup company. They’re the leading global provider of security and cloud-based software solutions purpose-built for Managed Service Providers (MSPs). In Benjamin’s talk, he goes through some of the considerations and design goals of a reference architecture pattern that they use and why they chose to use Trino with Starburst Galaxy.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/Datto.png" />
      
    </entry>
  
    <entry>
      <title>Trino optimization with distributed caching on data lakes</title>
      <link href="https://trino.io/blog/2023/07/21/trino-fest-2023-alluxio-recap.html" rel="alternate" type="text/html" title="Trino optimization with distributed caching on data lakes" />
      <published>2023-07-21T00:00:00+00:00</published>
      <updated>2023-07-21T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/07/21/trino-fest-2023-alluxio-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2023/07/21/trino-fest-2023-alluxio-recap.html">&lt;p&gt;By 2025, there will be 100 zettabytes stored in the cloud. That’s
100,000,000,000,000,000,000,000 bytes - a huge, eye-popping number. But only
about 10% of that data is actually used on a regular basis. At Uber, for
example, only 1% of their disk space is used for 50% of the data they access on
any given day. With so much data but such a small percentage being used, it
raises the question: how can we identify frequently-used data and make it more
accessible, efficient, and lower-cost to access?&lt;/p&gt;

&lt;p&gt;Once we have identified that “hot data,” the answer is data caching. By caching
that data in storage, you can reap a ton of benefits: performance gains, lower
costs, less network congestion, and reduced throttling on the storage layer.
Data caching sounds great, but why are we talking about it at a Trino event?
Because &lt;a href=&quot;https://github.com/trinodb/trino/pull/16375&quot;&gt;data caching with Alluxio is coming to Trino&lt;/a&gt;!&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/oK1A5U1WzFc&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-fest-2023/TrinoFest2023Alluxio.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;So what are the key features of data caching? First and foremost,
frequently accessed data gets stored on local SSDs. In the case of Trino, this
means that the Trino worker nodes will store data to reduce latency and decrease
the number of loads from object storage. Even if a worker restarts, the cached
data is preserved. Caching will work on all the data lake connectors, so whether
you’re using Iceberg, Hive, Hudi, or Delta Lake, it’ll be speeding your queries
up. The best part is that once it’s in Trino, all you need to do is enable it,
set three configuration properties, and let the performance improvement speak
for itself. There’s no other change to how queries run or execute, so there’s no
headache or migration needed.&lt;/p&gt;
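
&lt;p&gt;As a rough sketch of what that configuration might look like, a catalog
properties file could enable caching along these lines. The property names are
illustrative only, since the feature had not yet landed in a Trino release at
the time of this episode; check the release notes and documentation for the
final names and values:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hypothetical catalog properties for file system caching on workers
fs.cache.enabled=true
fs.cache.directories=/mnt/trino-cache
fs.cache.max-sizes=100GB
&lt;/code&gt;&lt;/pre&gt;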

&lt;p&gt;Hope then gives deeper technical detail on exactly how data caching works. She
highlights a few existing examples of how large-scale companies, Uber and
Shopee, have utilized data caching to reap massive performance gains. Then the
talk is passed off to Beinan, who gives further technical detail,
exploring cache invalidation, how to maximize cache hit rate, cluster
elasticity, cache storage efficiency, and data consistency. He also explores
ongoing work on semantic caching, native/off-heap caching, and distributed
caching, all of which have interesting upsides and benefits.&lt;/p&gt;

&lt;p&gt;Give the full talk a listen if you’re interested, as both Hope and Beinan go
into a lot of great, technical detail that you won’t want to miss out on. And
don’t forget to keep an eye on Trino release notes to see when it’s live!&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Hope Wang, Beinan Wang, and Cole Bowden</name>
        </author>
      

      <summary>By 2025, there will be 100 zettabytes stored in the cloud. That’s 100,000,000,000,000,000,000,000 bytes - a huge, eye-popping number. But only about 10% of that data is actually used on a regular basis. At Uber, for example, only 1% of their disk space is used for 50% of the data they access on any given day. With so much data but such a small percentage being used, it raises the question: how can we identify frequently-used data and make it more accessible, efficient, and lower-cost to access? Once we have identified that “hot data,” the answer is data caching. By caching that data in storage, you can reap a ton of benefits: performance gains, lower costs, less network congestion, and reduced throttling on the storage layer. Data caching sounds great, but why are we talking about it at a Trino event? Because data caching with Alluxio is coming to Trino!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/Alluxio.png" />
      
    </entry>
  
    <entry>
      <title>Inspecting Trino on ice</title>
      <link href="https://trino.io/blog/2023/07/19/trino-fest-2023-stripe.html" rel="alternate" type="text/html" title="Inspecting Trino on ice" />
      <published>2023-07-19T00:00:00+00:00</published>
      <updated>2023-07-19T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/07/19/trino-fest-2023-stripe</id>
      <content type="html" xml:base="https://trino.io/blog/2023/07/19/trino-fest-2023-stripe.html">&lt;p&gt;For those unfamiliar, Stripe is an online payment processor that facilitates
online payments for digital-native merchants. They use Trino to facilitate ad
hoc analytics, enable dashboarding, and provide an API for internal services and
data apps. In Kevin Liu’s session at &lt;a href=&quot;/blog/2023/06/20/trino-fest-2023-recap.html&quot;&gt;Trino Fest 2023&lt;/a&gt;, he showcases the Trino Iceberg
connector and how it can replace more complex usage to access Iceberg metadata.
He also discusses how Trino is a core part of operations at Stripe.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/PSGuAMVc6-w&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-fest-2023/TrinoFest2023Stripe.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;Trino is the foundational infrastructure on which other data apps and services
are built. In Kevin’s words, “I call Trino the Swiss army knife in the data
ecosystem.”&lt;/p&gt;

&lt;p&gt;At Stripe, they use Iceberg tables extensively, replacing legacy Hive tables.
But Iceberg isn’t perfect: one problem is reading its metadata from
S3. To work with Iceberg metadata, Stripe developed an internal CLI tool. The
tool requires a privileged internal machine that is only accessible to
developers, and it outputs results in JSON format, which is difficult to
process, read, and use for further analysis. However, Kevin found that the Trino
Iceberg connector can replace most of the functionality of the Iceberg CLI. The
connector brings Iceberg metadata information to Trino’s powerful analytical
engine and facilitates lightning fast debugging and analysis.&lt;/p&gt;
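
&lt;p&gt;As a small illustration (catalog, schema, and table names are made up for
this example), the Trino Iceberg connector exposes metadata tables such as
&lt;code&gt;$snapshots&lt;/code&gt;, &lt;code&gt;$files&lt;/code&gt;, and &lt;code&gt;$properties&lt;/code&gt;
that can be queried like any other table:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Inspect the snapshot history of an Iceberg table
SELECT snapshot_id, committed_at, operation
FROM iceberg.analytics.&quot;orders$snapshots&quot;;

-- List the data files backing the table, with sizes and row counts
SELECT file_path, record_count, file_size_in_bytes
FROM iceberg.analytics.&quot;orders$files&quot;;
&lt;/code&gt;&lt;/pre&gt;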

&lt;p&gt;Unfortunately, there was no way to grab all desired table property information
from the Trino Iceberg connector, because they were using an older version.
Thus, they use the Trino PostgreSQL connector to connect directly to the backend
database of the Hive Metastore, allowing them to inspect table metadata
directly. With the two connectors, they have all the information about the data
warehouse, powering their analysis and meta-analysis of the data and how it’s
used.&lt;/p&gt;

&lt;p&gt;They also use Trino to inspect Iceberg usage patterns. They log every Trino
query using the Trino event listener and store that in another PostgreSQL
database. This gives the full information of every query that has ever run
through Trino, and allows them to perform analysis using historical queries.
Combined with Trino’s built-in query metadata enrichment, this method enables a
multitude of auditing, debugging, and optimization use cases.&lt;/p&gt;

&lt;p&gt;In the future, they plan to use Trino to improve data quality by leveraging it
as a validation framework, to perform Iceberg table maintenance, and to optimize
tables based on historical read patterns.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Kevin Liu, Ryan Duan</name>
        </author>
      

      <summary>For those unfamiliar, Stripe is an online payment processor that facilitates online payments for digital-native merchants. They use Trino to facilitate ad hoc analytics, enable dashboarding, and provide an API for internal services and data apps to utilize Trino. In Kevin Liu’s session at Trino Fest 2023, he showcases the Trino Iceberg connector and how it can replace more complex usage to access Iceberg metadata. He also discusses how Trino is a core part of operations at Stripe.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/Stripe.png" />
      
    </entry>
  
    <entry>
      <title>Data mesh implementation using Hive views</title>
      <link href="https://trino.io/blog/2023/07/17/trino-fest-2023-comcast-recap.html" rel="alternate" type="text/html" title="Data mesh implementation using Hive views" />
      <published>2023-07-17T00:00:00+00:00</published>
      <updated>2023-07-17T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/07/17/trino-fest-2023-comcast-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2023/07/17/trino-fest-2023-comcast-recap.html">&lt;p&gt;At Comcast, data is used in a data mesh ecosystem, with a vision where users can
discover data and request data through a self-service platform. With federation,
various tools, and the ability to create, read, and write data with different
platforms, it’s a full-blown data mesh. So how do you build that? With Trino, of
course, and with the power of Hive views. Tune in to the 10-minute lightning talk
that Alejandro gave at Trino Fest to learn more about how Comcast pulled it off.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/ZgcVtPFkKHM&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;With various storage systems, like S3 and MinIO, and users who
want to be able to use a variety of data platforms, including Trino, but also
Databricks and Spark, Comcast needed something to sit between the data and those
platforms. The solution was the Hive CLI and Hive views, which could read from 
all their various forms of storage, and which could be read from all the
user-facing query engines and data platforms with no issues.&lt;/p&gt;

&lt;p&gt;By centralizing data, there was also the upside of easily integrating with
Privacera, which allowed for privacy policies to be implemented without much
issue. Users could request access to the data within the Hive views, and data
owners could approve or reject access as appropriate. Because of the
centralization, it was easy to go very fine-grained with data access rules,
allowing for access control as specific as column-level.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Alejandro Rojas, Cole Bowden</name>
        </author>
      

      <summary>At Comcast, data is used in a data mesh ecosystem, with a vision where users can discover data and request data through a self-service platform. With federation, various tools, and the ability to create, read, and write data with different platforms, it’s a full-blown data mesh. So how do you build that? With Trino, of course, and with the power of Hive views. Tune into the 10-minute lightning talk that Alejandro gave at Trino Fest to learn more about how Comcast pulled it off.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/Comcast.png" />
      
    </entry>
  
    <entry>
      <title>DuneSQL - A query engine for blockchain data</title>
      <link href="https://trino.io/blog/2023/07/14/trino-fest-2023-dune.html" rel="alternate" type="text/html" title="DuneSQL - A query engine for blockchain data" />
      <published>2023-07-14T00:00:00+00:00</published>
      <updated>2023-07-14T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/07/14/trino-fest-2023-dune</id>
      <content type="html" xml:base="https://trino.io/blog/2023/07/14/trino-fest-2023-dune.html">&lt;p&gt;The need to make blockchain data easily accessible has risen in recent
years due to the popularity of cryptocurrencies, NFTs, and other uses of
blockchains. Dune has made it their mission to make blockchain data more
accessible. Dune is a community data platform for querying public blockchain
data and building beautiful dashboards. They use their own query engine called
DuneSQL, built as an extension of Trino, to query blockchain data. In the session,
Miguel and Jonas from Dune talk about the challenges of querying blockchain
data, their transition to Trino, and how DuneSQL is operated. Watch the
recording of the session or keep reading for a recap.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/sCJncarnGdU&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-fest-2023/TrinoFest2023Dune.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;The Dune community data platform is a serverless, open access, community-wide
collaboration portal. Dune experienced some difficulties with blockchain data,
such as processing and ingesting raw data, deserializing and decoding function
calls and arguments, and allowing the community to build abstractions. Their
engine, DuneSQL, is Trino with custom extensions that they created. It runs tens
of thousands of queries each day, which are executed, saved, and re-used.&lt;/p&gt;

&lt;p&gt;At first, Dune used PostgreSQL, where they sharded per blockchain and used
vertical scaling. However, they quickly ran into bottleneck issues on storage
size and IOPS (I/O operations per second). Thus, they switched to Apache Spark
with Databricks to allow horizontal scaling, process more blockchains,
and handle their vast query volume. Unfortunately,
the result was not performant and not interactive enough. In the end, Miguel
says that, “Trino was our choice for performance reasons, for the good
environment and ecosystem, and to fully support our scheme and our datasets.”
Using Trino addressed the performance issues.&lt;/p&gt;

&lt;p&gt;Operating DuneSQL requires modifications and extensions of Trino to suit the
needs of the users and platform as a whole. DuneSQL needs to manage the whole
fleet and the capacity they have, because they use over 4000 CPUs per hour, make
more than 100 billion S3 requests per month, and operate over 10 clusters. To
handle the scheduling and load balancing of these massive operations, DuneSQL
uses query execution services and a
&lt;a href=&quot;https://github.com/lyft/presto-gateway&quot;&gt;gateway&lt;/a&gt;. Clusters have a fixed size
for predictable capacity and performance. The gateway fronts the clusters to
reduce the blast radius, so failures in one cluster do not affect the others. Even with all
these adjustments, they still have work to do, as they plan to optimize the
billions of S3 requests they make, improve data layout, and implement
sandboxed user-defined functions.&lt;/p&gt;

&lt;p&gt;Interested in DuneSQL? Check out the video where Jonas goes over the
specificities and unique characteristics of DuneSQL.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Miguel Filipe, Jonas Irgens Kylling, Ryan Duan</name>
        </author>
      

      <summary>The need to make blockchain data easily accessible has risen in recent years due to the popularity of cryptocurrencies, NFTs, and other uses of blockchains. Dune has made it their mission to make blockchain data more accessible. Dune is a community data platform for querying public blockchain data and building beautiful dashboards. They use their own query engine called DuneSQL, built as an extension of Trino, to query blockchain data. In the session, Miguel and Jonas from Dune talk about the challenges of querying blockchain data, their transition to Trino, and how DuneSQL is operated. Watch the recording of the session or keep reading for a recap.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/Dune.png" />
      
    </entry>
  
    <entry>
      <title>Let it snow for Trino</title>
      <link href="https://trino.io/blog/2023/07/12/trino-fest-2023-let-it-snow-recap.html" rel="alternate" type="text/html" title="Let it snow for Trino" />
      <published>2023-07-12T00:00:00+00:00</published>
      <updated>2023-07-12T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/07/12/trino-fest-2023-let-it-snow-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2023/07/12/trino-fest-2023-let-it-snow-recap.html">&lt;p&gt;In this recap, we can skip right to the exciting part: through the joint efforts
of engineers at ForePaaS and Bloomberg, there is a Snowflake connector coming
to Trino! Though it hasn’t landed yet, it has been tested and run in production
at both companies, and a pull request is open and working its way towards
completion as this blog post goes up. In the talk, Yu and Erik discuss
difficulties in developing the connector, the motivations to make it happen, and
the new features that come as part of it for Trino users to take advantage of.
Sound interesting? Give the talk a listen, or read on for more details.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/kmpO_yM8OAs&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-fest-2023/TrinoFest2023LetItSnow.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For those unfamiliar, Snowflake is a cloud-based data warehousing and analytics
platform. It offers a great combination of scale, flexibility, and performance,
with the downside of being proprietary, vendor-locked software: to use
Snowflake, you must go through Snowflake, Inc. ForePaaS and its
customers store data in Snowflake, but they also store data in many other 
formats and systems, and they rely on Trino to run their analytics. With no
Snowflake connector in Trino, this meant that while they could run analytics and
queries on most data, Trino had a blind spot. They needed to develop a Snowflake
connector in order to see and query 100% of their data. Bloomberg was in a
similar boat, having data in Snowflake, using Trino for analytics, and needing a
way to join those two together. With a shared need, ForePaaS and Bloomberg
joined forces and made the connector happen.&lt;/p&gt;

&lt;p&gt;The connector has been in use at both companies for some time, and it comes with
the full feature set one would expect from a Trino connector. With the connector,
you can query Snowflake directly from Trino, taking advantage of Trino’s
lightning-fast speeds and the underlying features of Snowflake with no issue.&lt;/p&gt;

&lt;p&gt;Curious to see more? For the rest of the talk, Erik Anderson at Bloomberg gives
a demo of the connector in action. Give the talk a watch, and you can check out
progress on how adding the connector to Trino is coming along on
&lt;a href=&quot;https://github.com/trinodb/trino/pull/17909&quot;&gt;the pull request contributing it&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Yu Teng, Erik Anderson, Cole Bowden</name>
        </author>
      

      <summary>In this recap, we can skip right to the exciting part: through the joint efforts of engineers at ForePaaS and Bloomberg, there is a Snowflake connector coming to Trino! Though it hasn’t landed yet, it has been tested and run in production at both companies, and a pull request is open and working its way towards completion as this blog post goes up. In the talk, Yu and Erik discuss the difficulties of developing the connector, their motivations for making it happen, and the new features it brings for Trino users to take advantage of. Sound interesting? Give the talk a listen, or read on for more details.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/ForePaaS%20and%20Bloomberg.png" />
      
    </entry>
  
    <entry>
      <title>Redis &amp; Trino - Real-time indexed SQL queries (new connector)</title>
      <link href="https://trino.io/blog/2023/07/10/trino-fest-2023-redis.html" rel="alternate" type="text/html" title="Redis &amp; Trino - Real-time indexed SQL queries (new connector)" />
      <published>2023-07-10T00:00:00+00:00</published>
      <updated>2023-07-10T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/07/10/trino-fest-2023-redis</id>
      <content type="html" xml:base="https://trino.io/blog/2023/07/10/trino-fest-2023-redis.html">&lt;p&gt;Ever since the pandemic, it has become clear that a digital-first
economy is becoming more and more necessary. As Redis’ Field CTO Allen Terleto
said during their talk from &lt;a href=&quot;/blog/2023/06/20/trino-fest-2023-recap.html&quot;&gt;Trino Fest 2023&lt;/a&gt;, “In a digital first economy, data is the
lifeblood of the organization, which makes the databases the heart of enterprise
architectures”. Redis, a popular open source project, is a distributed in-memory
key–value database. It includes a cache, message broker, and optional
durability. In his talk, Allen demonstrates Redis’ new connector for Trino. It
can push down advanced queries and aggregations while leveraging Redis’ unique
in-memory secondary indexing. As a result, performance with the new connector is
much higher.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/JjBtZ26IHYk&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;Redis is an open source, in-memory, NoSQL database that natively supports a
variety of data structures. Redis is designed for utmost performance and high
throughput use cases across different types of workloads. Redis is widely known
for being the fastest data store on the market with sub-millisecond performance,
its ease of use, and being a multi-model database. Redis is able to map
relational tables to a key-value database by adding a key-value pair as a hash
attribute for each column. However, how can you search for a certain key in a
way that scales well in high throughput databases? Redis has a unique way to
deal with this problem: secondary indexing and Redis Search.&lt;/p&gt;

&lt;p&gt;Redis Search enables secondary indexing and full-text search, which allows Redis
to support many features such as multi-field queries, aggregations, exact phrase
matching, numeric filtering, geo-filtering, and vector similarity semantic
search on top of text queries. As Allen says, “Redis Search will be at the heart
of our new integration with Trino and game-changing better performance at scale
to the existing Redis Trino connector”. In addition, Redis supports a native
data model for JSON documents, allowing you to store, update, and retrieve JSON
values in a Redis database like other Redis data types. It also works with Redis
Search to let you index and query JSON documents.&lt;/p&gt;

&lt;p&gt;The syntax for Redis Search is a bit different from traditional SQL syntax, so
Redis is introducing a quicker and more reliable Redis-Trino connector that lets
you easily integrate with visualization frameworks and platforms that support
Trino. The connector is open source and publicly available on GitHub. In
addition, it will be contributed directly to the Trino project.&lt;/p&gt;

&lt;p&gt;Want to see Redis in action? Check out the video, where Julien demos how
you can load data from a file system, relational database, or data warehouse
and query it without writing a single line of code.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Allen Terleto, Julien Ruaux, Ryan Duan</name>
        </author>
      

      <summary>Ever since the pandemic, it has become clear that a digital-first economy is becoming more and more necessary. As Redis’ Field CTO Allen Terleto said during their talk from Trino Fest 2023, “In a digital first economy, data is the lifeblood of the organization, which makes the databases the heart of enterprise architectures”. Redis, a popular open source project, is a distributed in-memory key–value database. It includes a cache, message broker, and optional durability. In his talk, Allen demonstrates Redis’ new connector for Trino. It can push down advanced queries and aggregations while leveraging Redis’ unique in-memory secondary indexing. As a result, performance with the new connector is much higher.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/Redis.png" />
      
    </entry>
  
    <entry>
      <title>Skip rocks and files: Turbocharge Trino queries with Hudi’s multi-modal indexing subsystem</title>
      <link href="https://trino.io/blog/2023/07/07/trino-fest-2023-onehouse-recap.html" rel="alternate" type="text/html" title="Skip rocks and files: Turbocharge Trino queries with Hudi’s multi-modal indexing subsystem" />
      <published>2023-07-07T00:00:00+00:00</published>
      <updated>2023-07-07T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/07/07/trino-fest-2023-onehouse-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2023/07/07/trino-fest-2023-onehouse-recap.html">&lt;p&gt;Optimizing data access and query performance is crucial to building low-latency
applications and running analytics. Even with the modern data lakehouse designed
to be as efficient and performant as possible, there are a number of bottlenecks
that can slow things down and plenty of challenges to overcome. Nadine and Sagar
explored this at Trino Fest, introducing us to multi-modal indexing and the
metadata table in Hudi, how they work, and how leveraging them with Trino can
unlock queries faster than ever before.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/IiDOmAEOXUM&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-fest-2023/TrinoFest2023Onehouse.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;When you’re building large-scale data-based applications, bottlenecks are
inevitable. Addressing those bottlenecks and optimizing your platform to avoid
them can be a huge cost, so it pays to know your requirements. In the same
vein, if you know the types of services and features you need to scale
effectively, you can build with them in mind from the ground up. Hudi has a
couple of key features you might be interested in that aren’t
present in all lakehouses:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Write indexing, speeding up and optimizing inserts and upserts&lt;/li&gt;
  &lt;li&gt;Automated table services, which handle clustering, cleaning, compacting,
and metadata indexing without any need for manual orchestration or overhead&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Nadine also goes on a deep dive into exactly how the Hudi table format works,
but emphasizes that these extra features elevate it to being an entire platform,
not just a table format.&lt;/p&gt;

&lt;p&gt;From there, Nadine passes things off to Sagar, who explains the
multi-modal indexing sub-system in Hudi, which features a scalable metadata
table, different types of indexes, and an async indexer. All of these features
minimize tradeoffs while maximizing performance, helping you read and write data
faster than ever. And with Trino’s Hudi connector, the Trino coordinator is able
to read the feature-rich Hudi metadata to more effectively delegate work to workers,
leveraging that speed as the best-in-class query engine for running analytics on
your data stored in Hudi.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Nadine Farah, Sagar Sumit, Cole Bowden</name>
        </author>
      

      <summary>Optimizing data access and query performance is crucial to building low-latency applications and running analytics. Even with the modern data lakehouse designed to be as efficient and performant as possible, there are a number of bottlenecks that can slow things down and plenty of challenges to overcome. Nadine and Sagar explored this at Trino Fest, introducing us to multi-modal indexing and the metadata table in Hudi, how they work, and how leveraging them with Trino can unlock queries faster than ever before.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/Onehouse.png" />
      
    </entry>
  
    <entry>
      <title>49: Trino, Ibis, and wrangling Python in the SQL ecosystem</title>
      <link href="https://trino.io/episodes/49.html" rel="alternate" type="text/html" title="49: Trino, Ibis, and wrangling Python in the SQL ecosystem" />
      <published>2023-07-06T00:00:00+00:00</published>
      <updated>2023-07-06T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/49</id>
      <content type="html" xml:base="https://trino.io/episodes/49.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Manfred Moser, Director of Technical Content at
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest&quot;&gt;Guest&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/cpcloud&quot;&gt;Phillip Cloud&lt;/a&gt;, Principal Engineer at Voltron
Data. &lt;a href=&quot;https://www.youtube.com/@cpcloud&quot;&gt;Check out his YouTube channel!&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-419-420&quot;&gt;Releases 419-420&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-419.html&quot;&gt;Trino 419&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;array_histogram&lt;/code&gt; function.&lt;/li&gt;
  &lt;li&gt;Faster reading and writing of Parquet data.&lt;/li&gt;
  &lt;li&gt;Support for Nessie catalog in Iceberg connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-420.html&quot;&gt;Trino 420&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Underscores in numeric literals (e.g., &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1_000_000&lt;/code&gt;).&lt;/li&gt;
  &lt;li&gt;Hexadecimal, binary, and octal numeric literals (e.g., &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0x1a&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0b1010&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0o12&lt;/code&gt;).&lt;/li&gt;
  &lt;li&gt;Support for comments on view columns in Delta Lake connector.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RENAME COLUMN&lt;/code&gt; in MongoDB connector.&lt;/li&gt;
  &lt;li&gt;Support for mixed case table names in Druid connector.&lt;/li&gt;
  &lt;li&gt;Faster queries when statistics are unavailable.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;question-of-the-episode-what-is-ibis&quot;&gt;Question of the episode: What is Ibis?&lt;/h2&gt;

&lt;p&gt;Taken straight from &lt;a href=&quot;https://ibis-project.org/concept/why_ibis/&quot;&gt;the Ibis website&lt;/a&gt;,
Ibis is a dataframe interface to execution engines with support for 15+
backends (including Trino!). Ibis doesn’t replace your existing execution
engine, it extends it with powerful abstractions and intuitive syntax.&lt;/p&gt;

&lt;p&gt;For those who love doing all their data-related work in Python, this allows you
to write Python code that leverages the speed and power of Trino without needing
to become a SQL master. For the die-hard SQL users out there,
&lt;a href=&quot;https://ibis-project.org/tutorial/ibis-for-sql-users/&quot;&gt;they have a guide on Ibis for SQL users&lt;/a&gt;
that explains how it fully replaces SQL with Python code that is:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Type-checked and validated as you go.&lt;/li&gt;
  &lt;li&gt;Easier to write. Pythonic function calls with tab completion in IPython.&lt;/li&gt;
  &lt;li&gt;More composable. Break complex queries down into easier-to-digest pieces.&lt;/li&gt;
  &lt;li&gt;Easier to reuse. Mix and match Ibis snippets to create expressions tailored
for your analysis.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even if you’ve been writing SQL queries since day 1 and swear by it, opening the
door to using Python for analytics creates many new possibilities, widens the
possible talent pool you can work with, and gives you an entire second ecosystem
to integrate with.&lt;/p&gt;

&lt;p&gt;And ultimately, at the end of the day, the idea is that you get the ease of
writing Python code with the power and performance of a blazing fast SQL engine.
&lt;a href=&quot;https://youtu.be/pAWseFS4eAk&quot;&gt;You get the best of both worlds&lt;/a&gt;, and using Ibis
doesn’t lock you out of rolling up your sleeves and writing some SQL when a
situation calls for it.&lt;/p&gt;
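
&lt;p&gt;As a quick sketch of that escape hatch (hypothetical code, assuming a Trino
connection object &lt;code&gt;con&lt;/code&gt; as in the Ibis docs), hand-written SQL and
Ibis expressions compose freely:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; # A raw SQL string becomes a regular Ibis table expression...
&amp;gt;&amp;gt;&amp;gt; t = con.sql('SELECT year, avg_rating FROM movies')
&amp;gt;&amp;gt;&amp;gt; # ...which you can keep refining in Python.
&amp;gt;&amp;gt;&amp;gt; t.group_by('year').aggregate(avg=t.avg_rating.mean())
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;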

&lt;h3 id=&quot;and-you-dont-need-to-learn-different-sql-dialects&quot;&gt;And you don’t need to learn different SQL dialects&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;../assets/episode/49/standards_2x.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Trino more or less adheres to ANSI SQL, but it implements some ANSI features
that are rarely seen in other query engines, and other query engines choose to
deviate in a variety of ways. This can be a headache if you’re migrating to
Trino, as queries need to be re-written, re-structured, and tested to make sure
they return the same results. If you set up Ibis first, it does that thinking
for you, and a Python query can be converted to whatever dialect of SQL you
need without any issue. It can save you time, effort, headaches, and the
sense of being locked into a specific SQL dialect, freeing you up to move
between query engines without any pain points… because of course, you want to
move to Trino, which is the best query engine.&lt;/p&gt;

&lt;p&gt;It also needs pointing out that this allows you to federate your queries while
you federate your queries.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-episode-converting-python-to-sql&quot;&gt;Concept of the episode: Converting Python to SQL&lt;/h2&gt;

&lt;p&gt;Take some Python like so:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ibis&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;movies&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ibis&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;examples&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ml_latest_small_movies&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;fetch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rating_by_year&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;movies&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;group_by&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;year&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;avg_rating&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;mean&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rating_by_year&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;order_by&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rating_by_year&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;year&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;desc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And Ibis can automatically turn it into SQL that executes on Trino:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;con&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;compile&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;year&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;avg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;avg_rating&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;movies&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;GROUP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;year&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;year&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Obviously, this example is lightweight, but as queries grow more complex and
sophisticated, the conversion becomes more and more worthwhile. And we mentioned
that the Python code is easier to re-use, but it really is - if you want to run
a similar query in conjunction with the query above, those &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;movies&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rating_by_year&lt;/code&gt; variables still exist, and writing some code to leverage them
is a lot easier and more intuitive than setting up SQL sub-queries and aliases.&lt;/p&gt;
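
&lt;p&gt;For instance, building on the expressions above (a hypothetical sketch, not
from the episode), reuse is just another method call:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; # Reuse the existing aggregation instead of writing a SQL sub-query:
&amp;gt;&amp;gt;&amp;gt; recent = rating_by_year.filter(rating_by_year.year &amp;gt;= 2000)
&amp;gt;&amp;gt;&amp;gt; con.compile(recent)  # Ibis emits the nested SQL for you
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;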

&lt;h3 id=&quot;questions-for-phillip&quot;&gt;Questions for Phillip&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Why is it called Ibis?&lt;/li&gt;
  &lt;li&gt;How much of a normal SQL workload do you think could be handled and run by
Ibis?&lt;/li&gt;
  &lt;li&gt;How much can Ibis optimize SQL queries for performance?&lt;/li&gt;
  &lt;li&gt;Which SQL dialect has been the worst to deal with?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;pr-of-the-episode-15026-support-insert-in-google-sheets-connector&quot;&gt;PR of the episode: #15026: Support INSERT in Google Sheets connector&lt;/h2&gt;

&lt;p&gt;Google Sheets is one of our not-as-talked-about connectors in Trino, but it
still sees use and community updates, and we want to give that a shoutout in
today’s Trino Community Broadcast. &lt;a href=&quot;https://github.com/trinodb/trino/pull/15026&quot;&gt;#15026&lt;/a&gt;
from &lt;a href=&quot;https://github.com/sbernauer&quot;&gt;Sebastian Bernauer&lt;/a&gt; adds &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; support to
the connector, so now you can read &lt;em&gt;and&lt;/em&gt; write from Google Sheets in Trino,
empowering the world of SQL-on-spreadsheets.&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-episode-477-on-trinoio-add-mateusz-gajewski-to-maintainer-list&quot;&gt;PR of the episode: #477 on trino.io: Add Mateusz Gajewski to maintainer list&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/trinodb/trino.io/pull/477&quot;&gt;We’ve added another maintainer to Trino!&lt;/a&gt;
We just spent an episode introducing Manfred and James Petty as maintainers, and
Mateusz is right behind them after years of effort helping Trino as a
contributor and reviewer.&lt;/p&gt;

&lt;h2 id=&quot;trino-events&quot;&gt;Trino events&lt;/h2&gt;

&lt;p&gt;Trino Fest wrapped up a few weeks ago, and we’re publishing recaps of all the
talks to the Trino blog! Keep an eye on our YouTube channel and the Trino
website to catch up on everything you missed.&lt;/p&gt;

&lt;p&gt;If you have an event that is related to Trino, let us know so we can add it to
the &lt;a href=&quot;https://trino.io/community.html#events&quot;&gt;Trino events calendar&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>AWS Athena (Trino) in the cybersecurity space</title>
      <link href="https://trino.io/blog/2023/07/05/trino-fest-2023-arcticwolf.html" rel="alternate" type="text/html" title="AWS Athena (Trino) in the cybersecurity space" />
      <published>2023-07-05T00:00:00+00:00</published>
      <updated>2023-07-05T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/07/05/trino-fest-2023-arcticwolf</id>
      <content type="html" xml:base="https://trino.io/blog/2023/07/05/trino-fest-2023-arcticwolf.html">&lt;p&gt;Arctic Wolf Networks, a cybersecurity company that provides security monitoring
against cyber threats, is one of the companies that have recently switched to using
AWS Athena as a new and efficient service to query their data using Trino. AWS
Athena is a serverless, interactive analytics service built on open-source
frameworks that runs on Trino, supporting open table and file formats and
providing a simplified, flexible way to analyze petabytes of data where it
lives. Senior software developer Anas Shakra from Arctic Wolf Networks gave a
talk at &lt;a href=&quot;/blog/2023/06/20/trino-fest-2023-recap.html&quot;&gt;Trino Fest 2023&lt;/a&gt;
detailing their switch to AWS Athena and how “queries that took hours with old
solution now take around a minute today”. Tune in to the talk, or read
the recap!&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/WCuJaW7zC8k&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;At Arctic Wolf, data access use cases fall into three categories: investigations,
compliance, and the customer self-serve platform. Preparing the data
follows an established pattern: start with a datastore, perform an
operation to filter or transform the data, and then output the data in a
format like CSV or JSON, depending on client needs. Arctic Wolf’s custom
legacy service was unable to match the growing service demand and had four main
problems:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Optimized for breadth over depth&lt;/li&gt;
  &lt;li&gt;Struggles to handle growing service demand&lt;/li&gt;
  &lt;li&gt;Proprietary query language&lt;/li&gt;
  &lt;li&gt;Complicated design&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This compelled Anas’ team to find a different and improved service: Trino as
provided by AWS Athena.&lt;/p&gt;

&lt;p&gt;They had four main objectives for the new service: defined access patterns,
performant at scale, user-friendly, and deterministic pricing. AWS Athena
satisfied these objectives, while also providing numerous benefits such as using
a powerful query engine, being purposefully built for large datasets, using SQL
syntax, and having a clear pricing structure. However, with these benefits come
some drawbacks for Athena. These include being subject to quota limits, having
suboptimal file sizes for their system, and being unable to control access
sufficiently. Anas addresses this by using log queries that resolve these three
main impediments. As a next step, Anas is considering switching to a self-managed
Trino deployment for more control with the same performance gains.&lt;/p&gt;

&lt;p&gt;Want to learn more about log queries that they use? Check out Anas’ explanation
in the video!&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Anas Shakra, Ryan Duan</name>
        </author>
      

      <summary>Arctic Wolf Networks, a cybersecurity company that provides security monitoring against cyber threats, is one of the companies that have recently switched to using AWS Athena as a new and efficient service to query their data using Trino. AWS Athena is a serverless, interactive analytics service built on open-source frameworks that runs on Trino, supporting open table and file formats and providing a simplified, flexible way to analyze petabytes of data where it lives. Senior software developer Anas Shakra from Arctic Wolf Networks gave a talk at Trino Fest 2023 detailing their switch to AWS Athena and how “queries that took hours with old solution now take around a minute today”. Tune in to the talk, or read the recap!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/ArcticWolf.png" />
      
    </entry>
  
    <entry>
      <title>Ibis: Because SQL is everywhere and so is Python</title>
      <link href="https://trino.io/blog/2023/07/03/trino-fest-2023-ibis.html" rel="alternate" type="text/html" title="Ibis: Because SQL is everywhere and so is Python" />
      <published>2023-07-03T00:00:00+00:00</published>
      <updated>2023-07-03T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/07/03/trino-fest-2023-ibis</id>
      <content type="html" xml:base="https://trino.io/blog/2023/07/03/trino-fest-2023-ibis.html">&lt;p&gt;The PyData stack has been described as “unreasonably effective,” empowering its
users to glean insights and analyze moderate amounts of data with a high level
of flexibility and excellent visualization. The large-scale, production data
stack using a query engine like Trino sits on the other side of the world,
capable of handling petabytes and exabytes, but perhaps not integrating as
seamlessly with the Python ecosystem as one would hope. SQL has been a means of
bridging this gap, but we’ve now got an exciting solution to bridge it even
better: Ibis.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/JMUtPl-cMRc&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-fest-2023/TrinoFest2023Ibis.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A major problem with bridging the gap between Python and SQL engines has been
the lack of standardization in SQL. Though Trino prides itself on being
ANSI-compliant and many other SQL dialects strive to be similar, the reality is
that every SQL engine is different, and a complicated SQL query may error out
or return different results depending on which engine you’re using. So if you want to
convert some Python code to SQL, the question is… which SQL? If you’re doing
your data analysis in Python because you prefer to use it, spending time
scratching your head and trying to work out a SQL conversion can be frustrating,
time-consuming, and painful. But SQL is everywhere, and for large, performant,
efficient queries, you may need a SQL engine like Trino.&lt;/p&gt;

&lt;p&gt;Enter Ibis, a lightweight Python library for “data wrangling.” It can easily
convert your Python code into SQL queries for 16 different engines, including
Trino. With Ibis, you can leverage the ease of writing Python code with the
power and performance of running queries in Trino, getting the best of both
worlds in both the Python and SQL ecosystems. Want to learn more? Check out
&lt;a href=&quot;https://ibis-project.org/&quot;&gt;the Ibis project website&lt;/a&gt;, give the talk a listen,
and tune into the Trino Community Broadcast on July 6th, where we’ll be going
into even more detail about Ibis.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Phillip Cloud, Cole Bowden</name>
        </author>
      

      <summary>The PyData stack has been described as “unreasonably effective,” empowering its users to glean insights and analyze moderate amounts of data with a high level of flexibility and excellent visualization. The large-scale, production data stack using a query engine like Trino sits on the other side of the world, capable of handling petabytes and exabytes, but perhaps not integrating as seamlessly with the Python ecosystem as one would hope. SQL has been a means of bridging this gap, but we’ve now got an exciting solution to bridge it even better: Ibis.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/Ibis.png" />
      
    </entry>
  
    <entry>
      <title>CDC patterns in Apache Iceberg</title>
      <link href="https://trino.io/blog/2023/06/30/trino-fest-2023-apacheiceberg.html" rel="alternate" type="text/html" title="CDC patterns in Apache Iceberg" />
      <published>2023-06-30T00:00:00+00:00</published>
      <updated>2023-06-30T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/06/30/trino-fest-2023-apacheiceberg</id>
      <content type="html" xml:base="https://trino.io/blog/2023/06/30/trino-fest-2023-apacheiceberg.html">&lt;p&gt;Have you ever wanted to keep your data in a table and have an efficient way to
interact with it? Iceberg, an open standard table format, is
exactly what you need. One of the great and unique features of the Iceberg
table format is its support for change data capture (CDC). Co-creator of
Apache Iceberg, Ryan Blue, presented at &lt;a href=&quot;/blog/2023/06/20/trino-fest-2023-recap.html&quot;&gt;Trino Fest 2023&lt;/a&gt; this past week detailing the CDC support
and the trade-offs between different patterns that can be used for writing
CDC streams into Iceberg tables.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/GM7EvRc7_is&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-fest-2023/TrinoFest2023Iceberg.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;To begin, what is CDC and why should you use it? CDC is the idea that when
relational or transactional tables are modified, you emit an update stream.
This enables you to keep copies in sync by capturing changes to tables as
they happen. As Ryan states, “[CDC] is very lightweight on the source
database … rather than being super careful with what we run on the database,
what we want to do is just make a copy of it very easily and maintain that
copy.” Ryan then walks through an example of a bank using a transactional table
in Iceberg to illustrate what this looks like in practice.&lt;/p&gt;

&lt;p&gt;Although CDC has many advantages, there are also some problems that make it
difficult:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Lower latency means more work&lt;/li&gt;
  &lt;li&gt;Write amplification - the work necessary to balance the trade-offs between
efficiency at write time and efficiency at read time&lt;/li&gt;
  &lt;li&gt;Batch writes with double update and possible inconsistency&lt;/li&gt;
  &lt;li&gt;Read requirements with the different types of deletes in a table&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With problems like these, the trade-offs between the different patterns
become critical, because efficiency matters at every step. The first trade-off
Ryan covers is the storage trade-off between direct
writes and a change log table, which he considers the most important and most often
overlooked decision. The next concerns the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; pattern’s
choice between lazy merge (merge-on-read) and eager merge (copy-on-write). In
addition, commit frequency is a trade-off of its own, with different benefits depending on
whether you commit faster or slower. The change log pattern and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; pattern each
have benefits you may want, so Ryan suggests a hybrid of the two that
can deliver the best of both. With Iceberg, the choice is yours: all of these
CDC patterns are supported, so you can adjust your usage to your
specific needs. Check out the video and review the slides for more details!&lt;/p&gt;

&lt;p&gt;Want to read more about CDC? Check out some of Ryan Blue’s blog posts:
&lt;a href=&quot;https://tabular.io/blog/hello-world-of-cdc/&quot;&gt;Hello, World of CDC!&lt;/a&gt; and &lt;a href=&quot;https://tabular.io/blog/cdc-data-gremlins/&quot;&gt;CDC
Data Gremlins&lt;/a&gt;!&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Ryan Blue, Ryan Duan</name>
        </author>
      

      <summary>Have you ever wanted to keep your data in a table and have an efficient way to interact with it? Iceberg, an open standard table format, is exactly what you need. One of the great and unique features of the Iceberg table format is its support for change data capture (CDC). Co-creator of Apache Iceberg, Ryan Blue, presented at Trino Fest 2023 this past week detailing the CDC support and the trade-offs between different patterns that can be used for writing CDC streams into Iceberg tables.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/ApacheIceberg.png" />
      
    </entry>
  
    <entry>
      <title>Zero-cost reporting</title>
      <link href="https://trino.io/blog/2023/06/28/trino-fest-2023-starburst-recap.html" rel="alternate" type="text/html" title="Zero-cost reporting" />
      <published>2023-06-28T00:00:00+00:00</published>
      <updated>2023-06-28T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/06/28/trino-fest-2023-starburst-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2023/06/28/trino-fest-2023-starburst-recap.html">&lt;p&gt;Let’s say you have some data. Maybe it’s in a spreadsheet, a CSV file, a
relational database, or multiple terabytes of data in an S3 bucket. You need
to run SQL queries on this data, and you’d like to share those results with your
teammates, coworkers, and partner teams, but you want to do it in a way that
allows everyone to view those results on-demand, on the web, and with the latest
results without the need for any manual effort on your part.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/586qvEyuO_U&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;There are a lot of tools that might be able to do this for you, but whatever you
choose, you’ll need to spend time or money to set it up, and you don’t want to
spend a lot. With so many options, there’s the possibility of getting stuck in
analysis paralysis, and trying to find the best way forward may leave you
stymied. Jan Waś from Starburst has a suggestion: keep it simple with Trino,
plaintext files, Git, and GitHub Actions, and you can set it all up for free.&lt;/p&gt;

&lt;p&gt;To start, why put results into plaintext files? With markdown, files are both
human-readable and machine-readable. By saving queries in normal files, it’s easy
to see and edit those queries. You can commit your queries and results to Git,
and then you can push them to a service like GitHub, where those files will be
even more readable thanks to the web UI. Then, once on GitHub, you can use the
power of actions to re-run the queries, update your results on a schedule, and
keep things up to date for teammates to view via GitHub Pages. Sound neat? Check
out the talk to see how Jan does it!&lt;/p&gt;
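
&lt;p&gt;As a rough sketch of the workflow (the server address, file paths, and
schedule here are illustrative assumptions, not Jan’s exact setup), a scheduled
GitHub Actions job could re-run a saved query with the Trino CLI and commit the
refreshed markdown results:&lt;/p&gt;

```shell
# Illustrative sketch: re-run a saved query and publish the results as markdown.
# Assumes the Trino CLI is on PATH and the query is stored in queries/report.sql.
# MARKDOWN is one of the CLI output formats in recent Trino versions.
trino --server https://trino.example.com:8443 \
      --file queries/report.sql \
      --output-format MARKDOWN > results/report.md

# Commit the refreshed results so GitHub Pages serves the latest numbers.
git add results/report.md
git commit -m "Refresh report results" || true   # no-op when nothing changed
git push
```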

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Jan Waś, Cole Bowden</name>
        </author>
      

      <summary>Let’s say you have some data. Maybe it’s in a spreadsheet, a CSV file, a relational database, or multiple terabytes of data in an S3 bucket. You need to run SQL queries on this data, and you’d like to share those results with your teammates, coworkers, and partner teams, but you want to do it in a way that allows everyone to view those results on-demand, on the web, and with the latest results without the need for any manual effort on your part.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/Starburst.png" />
      
    </entry>
  
    <entry>
      <title>Anomaly detection for Salesforce’s production data using Trino</title>
      <link href="https://trino.io/blog/2023/06/26/trino-fest-2023-salesforce.html" rel="alternate" type="text/html" title="Anomaly detection for Salesforce’s production data using Trino" />
      <published>2023-06-26T00:00:00+00:00</published>
      <updated>2023-06-26T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/06/26/trino-fest-2023-salesforce</id>
      <content type="html" xml:base="https://trino.io/blog/2023/06/26/trino-fest-2023-salesforce.html">&lt;p&gt;Rolling into our next presentation from &lt;a href=&quot;/blog/2023/06/20/trino-fest-2023-recap.html&quot;&gt;Trino Fest 2023&lt;/a&gt;, we’re excited to bring you
Tuli Nivas and Geeta Shankar’s talk from the Performance Engineering Team at
Salesforce. They provide numerous reasons why they need Trino and
further explain how it is essential for anomaly detection in
their data. It’s an insightful talk about using a query engine to ensure data
quality and how switching to Trino has massively improved their performance.
You definitely don’t want to miss it.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/nFuqpb2GjVI&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-fest-2023/TrinoFest2023Salesforce.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;Salesforce provides customer relationship management software and applications
focused on sales, customer service, marketing automation, e-commerce, analytics,
and application development. They host hundreds of thousands of customers that
generate millions of transactions per day. For a company of this size, they
need a query engine that is fast and efficient. During the talk, Tuli made it
clear how much Salesforce relies on Trino, stating, “Trino has been a one-stop
shop for analytics.” Trino is the perfect solution for them, as Tuli mentions,
“Because of how well Trino scales and how efficiently it has been able to
process even the most gnarly looking queries.” It allows them to do everything
they need.&lt;/p&gt;

&lt;p&gt;In addition, Trino has helped Salesforce get more value from their production
logging data by accelerating their access to it, speeding up their decision
making. For years, they used Splunk for all their production data, but after
switching to Trino, they have had numerous improvements:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Reducing their team’s analytics cost&lt;/li&gt;
  &lt;li&gt;Improving their cost-to-serve&lt;/li&gt;
  &lt;li&gt;Improving the time it takes to run the same query by 194%&lt;/li&gt;
  &lt;li&gt;Providing an SLA of 20-minute latency on all production logs&lt;/li&gt;
  &lt;li&gt;Retaining and accessing data for up to 2 years, compared to Splunk’s 30 days&lt;/li&gt;
  &lt;li&gt;Reducing the number of queries needed, which creates a smaller footprint&lt;/li&gt;
  &lt;li&gt;Creating tables and views for temporary data storage and analytics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With this, they use specific heuristics to build an anomaly detection framework
with very quick response times that they can monitor continuously. This
also lets them observe customer behavior efficiently and respond quickly to any
urgent changes. In the future, they plan to expand their usage of Trino
across more of their teams.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Tuli Nivas, Geeta Shankar, Ryan Duan</name>
        </author>
      

      <summary>Rolling into our next presentation from Trino Fest 2023, we’re excited to bring you Tuli Nivas and Geeta Shankar’s talk from the Performance Engineering Team at Salesforce. They provide numerous reasons why they need Trino and further explain how it is essential for anomaly detection in their data. It’s an insightful talk about using a query engine to ensure data quality and how switching to Trino has massively improved their performance. You definitely don’t want to miss it.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/Salesforce.png" />
      
    </entry>
  
    <entry>
      <title>Trino for lakehouses, data oceans, and beyond</title>
      <link href="https://trino.io/blog/2023/06/22/trino-fest-2023-keynote-recap.html" rel="alternate" type="text/html" title="Trino for lakehouses, data oceans, and beyond" />
      <published>2023-06-22T00:00:00+00:00</published>
      <updated>2023-06-22T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/06/22/trino-fest-2023-keynote-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2023/06/22/trino-fest-2023-keynote-recap.html">&lt;p&gt;&lt;a href=&quot;/blog/2023/06/20/trino-fest-2023-recap.html&quot;&gt;Trino Fest 2023&lt;/a&gt; kicked off with a
bang, as Trino co-creator and maintainer Martin Traverso gave an update on all
the amazing things that have happened to Trino since
&lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;Trino Summit last year&lt;/a&gt;. He
also provided some insight into what’s coming down the pipeline for Trino, with
a brief look at the project’s roadmap. You can watch the recording of the talk
if you want to see for yourself, or you can read on for the highlights.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/SJ1h-I7HoII&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-fest-2023/TrinoFest2023Keynote.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;It’s only been about 7 months since Trino Summit in 2022, but Trino moves
quickly. In the words of Martin, “the project is on fire” and “is as active as
it’s ever been,” leaving us a lot to catch up on since then:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;16 releases and 2,250 commits&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/episodes/47.html&quot;&gt;Two new maintainers&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Several new table functions&lt;/li&gt;
  &lt;li&gt;Simplified configuration and improved performance for fault-tolerant execution&lt;/li&gt;
  &lt;li&gt;Better support for schema evolution and lakehouse migration&lt;/li&gt;
  &lt;li&gt;45 bullet points worth of performance improvements&lt;/li&gt;
  &lt;li&gt;Tracing with OpenTelemetry&lt;/li&gt;
  &lt;li&gt;An improved Python client and dbt Cloud support&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And keep in mind that these are the highlights of the highlights! In the talk,
Martin goes into depth on all of the above, making it a worthwhile watch or
listen. There’s also a lot to look forward to, which you’ll hear more about as
they roll out in the coming months:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;SQL 2023, including enhancements to JSON functions and numeric literals&lt;/li&gt;
  &lt;li&gt;A new Snowflake connector and an improved Redis connector&lt;/li&gt;
  &lt;li&gt;Java 21&lt;/li&gt;
  &lt;li&gt;Project Hummingbird, the ongoing effort to incrementally make Trino faster
than ever before&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Martin Traverso, Cole Bowden</name>
        </author>
      

      <summary>Trino Fest 2023 kicked off with a bang, as Trino co-creator and maintainer Martin Traverso gave an update on all the amazing things that have happened to Trino since Trino Summit last year. He also provided some insight into what’s coming down the pipeline for Trino, with a brief look at the project’s roadmap. You can watch the recording of the talk if you want to see for yourself, or you can read on for the highlights.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/Keynote.png" />
      
    </entry>
  
    <entry>
      <title>Trino Fest 2023 recap</title>
      <link href="https://trino.io/blog/2023/06/20/trino-fest-2023-recap.html" rel="alternate" type="text/html" title="Trino Fest 2023 recap" />
      <published>2023-06-20T00:00:00+00:00</published>
      <updated>2023-06-20T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/06/20/trino-fest-2023-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2023/06/20/trino-fest-2023-recap.html">&lt;p&gt;Last week we held Trino Fest, and it kept us all so busy, we forgot to spend
time chilling by the lakehouse! Great demos, amazing announcements, new plugins,
and use cases reached our active audience. Thanks go to our event host and
organizer &lt;a href=&quot;https://www.starburst.io/&quot;&gt;Starburst&lt;/a&gt;, to our sponsors
&lt;a href=&quot;https://aws.amazon.com/&quot;&gt;AWS&lt;/a&gt; and &lt;a href=&quot;https://www.alluxio.io/&quot;&gt;Alluxio&lt;/a&gt;, to our
many well-prepared speakers, and to our great live audience. Now you get a
chance to catch up on anything you missed.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;container&quot;&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.starburst.io/&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/starburst-small.png&quot; title=&quot; Starburst, event host and organizer &quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.alluxio.io/&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/alluxio-small.png&quot; title=&quot;Alluxio, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://aws.amazon.com/&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/aws-small.png&quot; title=&quot;AWS, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;In the weeks leading up to the event, we published numerous blog posts, and
racked up great interest in the Trino community and beyond. Over 1100
registrations blew away our numbers from last year. More importantly, during the
two half-days of the event, we had over 560 attendees watching live and
participating in the busy chat.&lt;/p&gt;

&lt;h2 id=&quot;sessions&quot;&gt;Sessions&lt;/h2&gt;

&lt;p&gt;If you could not attend every session, or if you missed out on attending
completely, then we’ve got great news for you! You still have a chance to learn
from the presentations and the experience and knowledge of our speakers.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/06/22/trino-fest-2023-keynote-recap.html&quot;&gt;Trino for lakehouses, data oceans, and beyond&lt;/a&gt;
presented by Martin Traverso, co-creator of Trino and CTO at
&lt;a href=&quot;https://www.starburst.io/&quot;&gt;Starburst&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/06/26/trino-fest-2023-salesforce.html&quot;&gt;Anomaly detection for Salesforce’s production data using
Trino&lt;/a&gt; presented by Geeta Shankar and Tuli Nivas
from &lt;a href=&quot;https://www.salesforce.com/&quot;&gt;Salesforce&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/06/28/trino-fest-2023-starburst-recap.html&quot;&gt;Zero-cost reporting&lt;/a&gt; presented by Jan Waś from
&lt;a href=&quot;https://www.starburst.io/&quot;&gt;Starburst&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/06/30/trino-fest-2023-apacheiceberg.html&quot;&gt;CDC patterns in Apache Iceberg&lt;/a&gt; presented by Ryan
Blue from &lt;a href=&quot;https://tabular.io/&quot;&gt;Tabular&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/07/03/trino-fest-2023-ibis.html&quot;&gt;Ibis: Because SQL is everywhere and so is Python&lt;/a&gt;
presented by Phillip Cloud from &lt;a href=&quot;https://voltrondata.com/&quot;&gt;Voltron Data&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/07/05/trino-fest-2023-arcticwolf.html&quot;&gt;AWS Athena (Trino) in the cybersecurity space&lt;/a&gt;
presented by Anas Shakra from &lt;a href=&quot;https://arcticwolf.com/&quot;&gt;Arctic Wolf&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/07/07/trino-fest-2023-onehouse-recap.html&quot;&gt;Skip rocks and files: Turbocharge Trino queries with Hudi’s multi-modal
indexing subsystem&lt;/a&gt;
presented by Nadine Farah and Sagar Sumit from &lt;a href=&quot;https://www.onehouse.ai/&quot;&gt;OneHouse&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/07/10/trino-fest-2023-redis.html&quot;&gt;Redis &amp;amp; Trino - Real-time indexed SQL queries (new
connector)&lt;/a&gt; presented by Allen Terleto and
Julien Ruaux from &lt;a href=&quot;https://redis.com/&quot;&gt;Redis&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/07/12/trino-fest-2023-let-it-snow-recap.html&quot;&gt;Let it SNOW for Trino&lt;/a&gt;
presented by Erik Anderson from &lt;a href=&quot;https://www.bloomberg.com/company/values/tech-at-bloomberg/open-source/projects/&quot;&gt;Bloomberg&lt;/a&gt;
and Yu Teng from &lt;a href=&quot;https://www.ovhcloud.com/en-ie/public-cloud/data-platform/&quot;&gt;ForePaaS&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/07/14/trino-fest-2023-dune.html&quot;&gt;DuneSQL, a query engine for blockchain data&lt;/a&gt; presented by Miguel Filipe and Jonas
Irgens Kylling from &lt;a href=&quot;https://dune.com/&quot;&gt;Dune&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/07/17/trino-fest-2023-comcast-recap.html&quot;&gt;Data Mesh implementation using Hive views&lt;/a&gt;
presented by Alejandro Rojas from &lt;a href=&quot;https://comcast.github.io/&quot;&gt;Comcast&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/07/19/trino-fest-2023-stripe.html&quot;&gt;Inspecting Trino on ice&lt;/a&gt; presented by Kevin Liu
from &lt;a href=&quot;https://stripe.com/&quot;&gt;Stripe&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/07/21/trino-fest-2023-alluxio-recap.html&quot;&gt;Trino optimization with distributed caching on Data Lake&lt;/a&gt;
presented by Hope Wang and Beinan Wang from &lt;a href=&quot;https://www.alluxio.io/&quot;&gt;Alluxio&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/07/25/trino-fest-2023-datto.html&quot;&gt;Starburst Galaxy: A romance of many architectures&lt;/a&gt; presented by Benjamin Jeter from
&lt;a href=&quot;https://www.datto.com/&quot;&gt;Datto&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/07/27/trino-fest-2023-fugue-recap.html&quot;&gt;FugueSQL, Interoperable Python and Trino for interactive workloads&lt;/a&gt;
presented by &lt;a href=&quot;https://www.linkedin.com/in/kvnkho/&quot;&gt;Kevin Kho&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;next-up&quot;&gt;Next up&lt;/h2&gt;

&lt;p&gt;This first recap shares all the video recordings with you, in case you can’t
wait. But stay tuned, because we’ll also be publishing individual recap blog
posts for each session, and they’ll include additional useful info:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Summary of the main lessons and takeaways from the session&lt;/li&gt;
  &lt;li&gt;Slide decks for you to browse on your own&lt;/li&gt;
  &lt;li&gt;Interesting and fun quotes from the speakers and audience&lt;/li&gt;
  &lt;li&gt;Notes and impressions from the audience and event hosts&lt;/li&gt;
  &lt;li&gt;Questions and answers from the event&lt;/li&gt;
  &lt;li&gt;Links to further documentation, tutorials, and other resources&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We’ll be rolling out recap posts for a few talks each week, so keep an eye out
on our &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;community chat&lt;/a&gt; or the website for updates.&lt;/p&gt;

&lt;p&gt;At the same time, we are already marching ahead and planning towards our next
major event in autumn. Trino Summit 2023 - here we come!&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Cole Bowden</name>
        </author>
      

      <summary>Last week we held Trino Fest, and it kept us all so busy, we forgot to spend time chilling by the lakehouse! Great demos, amazing announcements, new plugins, and use cases reached our active audience. Thanks go to our event host and organizer Starburst, to our sponsors AWS and Alluxio, to our many well-prepared speakers, and to our great live audience. Now you get a chance to catch up on anything you missed.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/trino-fest.png" />
      
    </entry>
  
    <entry>
      <title>Trino Fest nears with an all-star lineup</title>
      <link href="https://trino.io/blog/2023/06/01/trino-fest-hype-speaker-lineup.html" rel="alternate" type="text/html" title="Trino Fest nears with an all-star lineup" />
      <published>2023-06-01T00:00:00+00:00</published>
      <updated>2023-06-01T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/06/01/trino-fest-hype-speaker-lineup</id>
      <content type="html" xml:base="https://trino.io/blog/2023/06/01/trino-fest-hype-speaker-lineup.html">&lt;p&gt;Trino Fest is just around the corner! We’re only two weeks away, and we’re
excited to share that we’ve got an incredible speaker lineup with a wide variety
of talks about all things Trino. If you’re out of the loop,
&lt;a href=&quot;/2023-04-05-announcing-trino-fest-2023.html&quot;&gt;we announced Trino Fest&lt;/a&gt; back in
April as a two-day, free, virtual event. If you want to attend, see talks live,
and engage with our speakers in Q&amp;amp;As at the end of each session, you’ll need to
register, so don’t delay, and…&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; href=&quot;https://www.starburst.io/info/trinofest/&quot;&gt;
        Register to attend!
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;With that said, we’re also excited to bring you a preview of our exciting
speaker lineup. Read on if you’d like to learn more.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;new-connectors&quot;&gt;New connectors&lt;/h2&gt;

&lt;p&gt;We’ve got two talks, one from Bloomberg and ForePaaS and another from Redis,
about ongoing efforts to extend Trino’s functionality to query even more data
sources. Erik Anderson from Bloomberg and Yu Teng from ForePaaS will talk about
their shared need for a Snowflake connector and the collaboration to merge their
two implementations into one and contribute it to Trino. Allen Terleto and Julien Ruaux
at Redis will be talking about a new, custom, and improved Redis connector for
Trino, showing how you can leverage the speed of both Redis and Trino to run
queries faster than ever while seamlessly integrating with data visualization
frameworks.&lt;/p&gt;

&lt;h2 id=&quot;the-python-ecosystem&quot;&gt;The Python ecosystem&lt;/h2&gt;

&lt;p&gt;We’ve got talks from &lt;a href=&quot;https://github.com/fugue-project/fugue&quot;&gt;Fugue&lt;/a&gt; and
&lt;a href=&quot;https://ibis-project.org/&quot;&gt;Ibis&lt;/a&gt;, two different tools that integrate Python
with SQL, and then run that SQL on underlying data sources. Both have recently
added Trino support, and they’re excited to share their use cases and introduce
the Trino community to the new, powerful ways you can leverage it. Trino has
always been a SQL query engine, but with Fugue and Ibis, writing Python code to
run queries with Trino is suddenly a reality, and analysts and data scientists
may not even need to know much SQL to get the insights they’re looking for.&lt;/p&gt;

&lt;h2 id=&quot;data-lakes&quot;&gt;Data lakes&lt;/h2&gt;

&lt;p&gt;Ryan Blue, the co-founder of Iceberg and founder of Tabular, will be exploring
how to best write CDC (change data capture) streams into Iceberg tables. A talk
from Kevin Liu at Stripe will explore how a data engineer can monitor queries
being run on Iceberg to catch performance outliers and understand usage rates. A
talk from Alluxio highlights caching optimizations with Trino and data lakes.
OneHouse is giving a talk about using Trino with Hudi, exploring how to get
query latency down, how multi-modal indexing works in Hudi, and how Trino can
utilize that indexing to execute queries at astonishing speeds. A lightning talk
from Comcast will explore Hive views, and DuneSQL will be discussing its use of
Trino with Delta Lake, rounding out coverage on all four of Trino’s lakehouse
connectors.&lt;/p&gt;

&lt;h2 id=&quot;and-more&quot;&gt;And more!&lt;/h2&gt;

&lt;p&gt;We’ll hear from customers of Trino’s main commercial vendors - Datto will be
discussing their use of Starburst Galaxy, and Arctic Wolf will give an overview
of how AWS Athena helps them provide data to customers. Jan Waś from Starburst
has a lightning talk on avoiding the costs of BI tools or expensive
visualization software by setting things up for free with GitHub Actions. And
Walmart has a talk on finding ways to cut costs with cloud storage, rounding out
our expansive lineup.&lt;/p&gt;

&lt;p&gt;Does any of that sound exciting?
&lt;a href=&quot;https://www.starburst.io/info/trinofest/&quot;&gt;Go sign up to attend Trino Fest 2023&lt;/a&gt;,
and we look forward to seeing you there!&lt;/p&gt;</content>

      
        <author>
          <name>Cole Bowden</name>
        </author>
      

      <summary>Trino Fest is just around the corner! We’re only two weeks away, and we’re excited to share that we’ve got an incredible speaker lineup with a wide variety of talks about all things Trino. If you’re out of the loop, we announced Trino Fest back in April as a two-day, free, virtual event. If you want to attend, see talks live, and engage with our speakers in Q&amp;amp;As at the end of each session, you’ll need to register, so don’t delay, and… Register to attend! With that said, we’re also excited to bring you a preview of our exciting speaker lineup. Read on if you’d like to learn more.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/trino-fest-featured-talks.png" />
      
    </entry>
  
    <entry>
      <title>48: What is Trino?</title>
      <link href="https://trino.io/episodes/48.html" rel="alternate" type="text/html" title="48: What is Trino?" />
      <published>2023-05-31T00:00:00+00:00</published>
      <updated>2023-05-31T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/48</id>
      <content type="html" xml:base="https://trino.io/episodes/48.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Manfred Moser, Director of Technical Content at
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-417-418&quot;&gt;Releases 417-418&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-417.html&quot;&gt;Trino 417&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Faster &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNION ALL&lt;/code&gt; queries.&lt;/li&gt;
  &lt;li&gt;Faster processing of Parquet data in Hudi, Iceberg, Hive, and Delta Lake
connectors.&lt;/li&gt;
  &lt;li&gt;Faster reads of nested row fields in Delta Lake connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-418.html&quot;&gt;Trino 418&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXECUTE IMMEDIATE&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Add the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;table_changes&lt;/code&gt; function in Delta Lake connector.&lt;/li&gt;
  &lt;li&gt;Faster joins on partition columns in Delta Lake, Hive, Hudi, and Iceberg
connectors.&lt;/li&gt;
  &lt;li&gt;Support for fault-tolerant execution in the Oracle connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;question-of-the-episode-what-is-trino&quot;&gt;Question of the episode: What is Trino?&lt;/h2&gt;

&lt;p&gt;We’ve put out nearly 50 Trino Community Broadcast episodes, but we haven’t yet
done the simplest, most obvious topic of them all - an exploration of what Trino
is, how Trino works, and how you can run it. This week, we’re taking a step back
and doing a broader overview of those things, because the world needs to know…
what is Trino?&lt;/p&gt;

&lt;p&gt;If you check the Trino documentation, it starts with a definition of what Trino
isn’t. But we’ll start with what Trino is: a distributed SQL query engine
written in Java. If you have a SQL query, Trino can process and run it on an
extremely wide variety of data sources and return a result to you that you’d
expect from that SQL query. It can run queries on traditional relational
databases like Oracle, MySQL, and PostgreSQL; it works on data lakes like Hive,
Iceberg, Delta Lake, and Hudi; and it runs on NoSQL databases like Cassandra
and MongoDB. You give Trino a query, Trino gives you results. And the best part
is that it doesn’t just work, it works blazing fast.&lt;/p&gt;

&lt;p&gt;The key thing to point out is that Trino does not store data, and it is not a
database on its own. It is a query engine, designed to sit on top of databases
and provide an ANSI-standard SQL interface to query whatever you’re storing your
data in. In order to use Trino, you need to start by having data stored
somewhere else. Of course, Trino can write data to those underlying
sources with the same SQL syntax, so for the end user, it can be an all-in-one
interface to those underlying data sources, an abstraction that saves users from
needing to understand the differences between data being stored in Iceberg and
data being stored in Oracle.&lt;/p&gt;

&lt;h3 id=&quot;how-does-it-work&quot;&gt;How does it work?&lt;/h3&gt;

&lt;p&gt;Trino uses a distributed architecture, with a single coordinator node that
schedules and orchestrates the workload, as well as many worker nodes that
carry out tasks and process data.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-episode-how-do-you-run-trino&quot;&gt;Concept of the episode: How do you run Trino?&lt;/h2&gt;

&lt;p&gt;The better question might be “how can’t you run Trino?” As the project has
matured, it’s been added to various third-party tools and integrated into
different apps that help make it easier to run than ever before. We have some
exciting news to share on that front soon, but for now, the biggest ways to run
Trino include:&lt;/p&gt;

&lt;h3 id=&quot;tarball&quot;&gt;Tarball&lt;/h3&gt;

&lt;p&gt;You can directly download the Trino server, manually configure it, and start it
up like any other program. Clients can connect to the server from there,
utilizing the web interface or the CLI to run queries. This is the most manual
way to set up Trino, but it works, and it doesn’t depend on anything else.
&lt;a href=&quot;https://trino.io/docs/current/installation/deployment.html&quot;&gt;Our docs go into a ton of detail on this process.&lt;/a&gt;&lt;/p&gt;
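For anyone curious what that manual process looks like, here’s a rough sketch; the version number and the single-node configuration values below are illustrative, and the docs describe the other required files (node.properties and jvm.config) that this skips:

```shell
# Download and unpack the server tarball (version is illustrative).
curl -fsSLO https://repo1.maven.org/maven2/io/trino/trino-server/418/trino-server-418.tar.gz
tar -xzf trino-server-418.tar.gz
cd trino-server-418

# Minimal single-node config.properties; the deployment docs also cover
# etc/node.properties and etc/jvm.config, which are required as well.
mkdir -p etc
cat > etc/config.properties <<'EOF'
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
discovery.uri=http://localhost:8080
EOF

# Run the server in the foreground; use `bin/launcher start` to daemonize.
bin/launcher run
```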

&lt;h3 id=&quot;docker&quot;&gt;Docker&lt;/h3&gt;

&lt;p&gt;Trino provides a Docker image that can be run through the Docker software. You
start by downloading and installing Docker, pull the Trino image, and then run a
container from it to immediately get Trino up
and running. No manual configuration needed, no messing around with creating
directories or files, it just works. It’s perhaps the simplest way to get Trino
off the ground, and recommended for anyone trying to run it independently just
to fiddle around with it.
&lt;a href=&quot;https://trino.io/docs/current/installation/containers.html&quot;&gt;As always, you can refer to the docs for more information.&lt;/a&gt;&lt;/p&gt;
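As a sketch of just how little is involved (the image comes from Docker Hub, and the sample query assumes the tpch catalog that ships in the default configuration):

```shell
# Start a Trino container with the default configuration.
docker run -d --name trino -p 8080:8080 trinodb/trino

# Once the container reports healthy, run a query with the bundled CLI.
docker exec -it trino trino --execute "SELECT count(*) FROM tpch.tiny.nation"
```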

&lt;h3 id=&quot;kubernetes-and-helm&quot;&gt;Kubernetes and Helm&lt;/h3&gt;

&lt;p&gt;Trino provides a Helm chart for use with Kubernetes, so after setting up
Kubernetes, kubectl, and Helm, you can install Trino on your Kubernetes cluster
with Helm. It comes with the same pre-configured image as Docker, so there’s no
need to manually set that up, but in order to run queries, you’ll also need to
set up a tunnel between the coordinator pod within Kubernetes and whatever
machine you want to run those queries on. If this is the right setup for you,
you probably already know that, and you don’t need us to go into more detail.
&lt;a href=&quot;https://trino.io/docs/current/installation/kubernetes.html&quot;&gt;More info is in the Trino docs.&lt;/a&gt;&lt;/p&gt;
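A rough sketch of that flow, assuming the community Helm chart repository and its usual release-name-trino service naming convention:

```shell
# Add the Trino chart repository and install with default values.
helm repo add trino https://trinodb.github.io/charts
helm install my-trino trino/trino

# Tunnel the coordinator port to your machine so clients can connect.
kubectl port-forward svc/my-trino-trino 8080:8080
```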

&lt;h3 id=&quot;trino-clients&quot;&gt;Trino clients&lt;/h3&gt;

&lt;p&gt;On the most basic side of things, Trino provides a command-line interface and a
web UI. If you want something more robust, a couple of open source clients have
been created in the community -
&lt;a href=&quot;https://github.com/trinodb/trino-python-client&quot;&gt;one written for Python&lt;/a&gt; and
&lt;a href=&quot;https://github.com/trinodb/trino-go-client&quot;&gt;one written in Go&lt;/a&gt;. There are a
couple of other Python clients that will be even easier to run coming soon, and
we’ll be hearing from them at Trino Fest in just two weeks.&lt;/p&gt;
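To give a taste of the Python client, here is a minimal sketch using trino-python-client (installed with pip install trino); the host, user, and catalog values are placeholders, and it assumes a Trino server is already running locally:

```python
# Minimal sketch with the trino-python-client; values are placeholders
# and a Trino server must already be listening on localhost:8080.
from trino.dbapi import connect

conn = connect(host="localhost", port=8080, user="demo")
cur = conn.cursor()
cur.execute("SELECT nationkey, name FROM tpch.tiny.nation LIMIT 3")
for row in cur.fetchall():
    print(row)
```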

&lt;h3 id=&quot;or&quot;&gt;Or…&lt;/h3&gt;

&lt;p&gt;On the not-so-free side of things, Starburst Galaxy and AWS Athena offer Trino
as a cloud service, which can make life even easier.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-episode-how-can-you-contribute-to-trino&quot;&gt;Concept of the episode: How can you contribute to Trino?&lt;/h2&gt;

&lt;p&gt;We’ve got a page on the website dedicated to
&lt;a href=&quot;https://trino.io/development/process.html&quot;&gt;the contribution process&lt;/a&gt;, though we’d
like to welcome anyone and everyone listening to take a crack at contributing to
Trino if it’s something you’re interested in. Open source projects can always
use more help, and we’d like to see community contributions whenever possible. From that
process page, the steps are:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Sign the CLA.&lt;/li&gt;
  &lt;li&gt;Make sure your contribution is something that Trino wants/needs.&lt;/li&gt;
  &lt;li&gt;Implement your change.&lt;/li&gt;
  &lt;li&gt;Open a pull request.&lt;/li&gt;
  &lt;li&gt;Request and wait for a review.&lt;/li&gt;
  &lt;li&gt;Address review comments.&lt;/li&gt;
  &lt;li&gt;Wait for it to be merged.&lt;/li&gt;
  &lt;li&gt;Wait for the next release, and then… your code change is in Trino!&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id=&quot;pr-of-the-episode-11701-support-nessie-catalog-in-iceberg-connector&quot;&gt;PR of the episode: #11701: Support Nessie Catalog in Iceberg connector&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://projectnessie.org/&quot;&gt;Nessie&lt;/a&gt; is a transactional catalog designed for use
with data lakes like Iceberg and Delta Lake. Its key selling point is git-like
version control, making it easy to view history, roll back, and see who made
what adjustments when. &lt;a href=&quot;https://github.com/trinodb/trino/pull/11701&quot;&gt;PR #11701&lt;/a&gt;
allows Trino’s Iceberg connector to query Nessie, adding yet another tool and
opportunity for query federation to Trino’s belt.&lt;/p&gt;

&lt;p&gt;And though we hate to say it, Nessie might just be the only other project in the
world with a mascot that can compete with Commander Bun Bun.&lt;/p&gt;

&lt;h2 id=&quot;trino-events&quot;&gt;Trino events&lt;/h2&gt;

&lt;p&gt;Coming up in just two weeks, Trino Fest is a two-day event that will feature
talks from a wide range of speakers surrounding the Trino ecosystem. As already
hinted at, we’ll be hearing from a couple new Python clients, from Trino users
sharing tips and tricks to maximize the utility of the software, and from
community contributors adding exciting new features and extensions to Trino.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.starburst.io/info/trinofest/&quot;&gt;Register to attend&lt;/a&gt; if you’re
interested and want to tune in to an awesome speaker lineup! It’s virtual and
completely free to attend, so all you’ve got to do is sign up.&lt;/p&gt;

&lt;p&gt;If you have an event that is related to Trino, let us know so we can add it to
the &lt;a href=&quot;https://trino.io/community.html#events&quot;&gt;Trino events calendar&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino at Open Source Summit North America 2023</title>
      <link href="https://trino.io/blog/2023/05/15/oss-na.html" rel="alternate" type="text/html" title="Trino at Open Source Summit North America 2023" />
      <published>2023-05-15T00:00:00+00:00</published>
      <updated>2023-05-15T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/05/15/oss-na</id>
      <content type="html" xml:base="https://trino.io/blog/2023/05/15/oss-na.html">&lt;p&gt;Last week, I had the pleasure to attend &lt;a href=&quot;https://events.linuxfoundation.org/open-source-summit-north-america/&quot;&gt;Open Source Summit North America
2023&lt;/a&gt; in
Vancouver. A quick hop across the &lt;a href=&quot;https://en.wikipedia.org/wiki/Strait_of_Georgia&quot;&gt;Strait of
Georgia&lt;/a&gt; got me right into the
event and into the midst of my peers of open source developers, advocates, and
enthusiasts.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;A highlight of the event for me was catching up with many existing and new
friends from the open source communities. It was inspiring to learn details
about the success of open source projects, including
&lt;a href=&quot;https://opensearch.org/&quot;&gt;Opensearch&lt;/a&gt;, &lt;a href=&quot;https://riscv.org/about/&quot;&gt;RISC-V&lt;/a&gt;, the
British Columbia government &lt;a href=&quot;https://developer.gov.bc.ca/&quot;&gt;DevHub project&lt;/a&gt;, NASA
&lt;a href=&quot;https://code.nasa.gov/&quot;&gt;open source&lt;/a&gt; and &lt;a href=&quot;https://data.nasa.gov/&quot;&gt;open data
projects&lt;/a&gt;, and many others.&lt;/p&gt;

&lt;p&gt;In my interview with John Furrier and Rob Strechay for &lt;a href=&quot;https://www.thecube.net/&quot;&gt;SiliconANGLE
theCUBE&lt;/a&gt;, I was able to share more information about
Trino, query engines, lakehouses, and &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;. We also
talked about the benefits of using Trino for different use cases, how data
continues to be crucial, and how it is even more important thanks to the new
wave of large language models.&lt;/p&gt;

&lt;div style=&quot;padding-bottom: 1rem&quot;&gt;
  &lt;a class=&quot;btn btn-orange&quot; style=&quot;display: inline-grid;&quot; href=&quot;https://siliconangle.com/2023/05/11/making-data-accessibility-faster-and-friendly-using-distributed-query-insights-ossummit/&quot; target=&quot;_blank&quot;&gt;Read more about the interview and watch the video&lt;/a&gt;
&lt;/div&gt;

&lt;p&gt;SiliconANGLE theCUBE features &lt;a href=&quot;https://www.thecube.net/events/linux-foundation/open-source-summit-na-2023&quot;&gt;more interview coverage from the
summit&lt;/a&gt;,
and The Linux Foundation &lt;a href=&quot;https://events.linuxfoundation.org/open-source-summit-north-america/&quot;&gt;makes keynote and session videos as well as
presentation decks available&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;My special thanks goes to Starburst for sending me to represent the Trino
community at the summit. I also really appreciate the help with organizing Trino
Fest. The speaker proposals are all in, and the free, virtual event is promising
to be a great showcase of Trino, modern lakehouse platforms and tools from the
community of users, contributors and vendors, and our increased adoption for a
wide range of use cases.&lt;/p&gt;

&lt;div style=&quot;padding-bottom: 1rem&quot;&gt;
  &lt;a class=&quot;btn btn-pink&quot; style=&quot;display: inline-grid;&quot; href=&quot;https://www.starburst.io/info/trinofest/&quot; target=&quot;_blank&quot;&gt;Register for Trino Fest 2023&lt;/a&gt;
&lt;/div&gt;

&lt;p&gt;Join us in June for the event; you don’t want to miss some of the announcements
and demos.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>Last week, I had the pleasure to attend Open Source Summit North America 2023 in Vancouver. A quick hop across the Strait of Georgia got me right into the event and into the midst of my peers of open source developers, advocates, and enthusiasts.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/manfred-open-source-summit.jpg" />
      
    </entry>
  
    <entry>
      <title>47: Meet the new Trino maintainers</title>
      <link href="https://trino.io/episodes/47.html" rel="alternate" type="text/html" title="47: Meet the new Trino maintainers" />
      <published>2023-05-05T00:00:00+00:00</published>
      <updated>2023-05-05T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/47</id>
      <content type="html" xml:base="https://trino.io/episodes/47.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Manfred Moser, Director of Technical Content at
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/pettyjamesm&quot;&gt;James Petty&lt;/a&gt;, Senior Software Engineer at AWS&lt;/li&gt;
  &lt;li&gt;Also Manfred. Kind of.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-411-416&quot;&gt;Releases 411-416&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-411.html&quot;&gt;Trino 411&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;migrate&lt;/code&gt; procedure to convert a Hive table to Iceberg.&lt;/li&gt;
  &lt;li&gt;Join and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIKE&lt;/code&gt; pushdown in Ignite.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt; in Ignite.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;procedure&lt;/code&gt; table function for executing stored procedures in SQL Server.&lt;/li&gt;
  &lt;li&gt;Faster join queries over Hive bucketed tables.&lt;/li&gt;
  &lt;li&gt;Faster planning for tables with many columns in Hive.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-412.html&quot;&gt;Trino 412&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;exclude_columns&lt;/code&gt; table function.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ADD COLUMN&lt;/code&gt; in Ignite.&lt;/li&gt;
  &lt;li&gt;Support for table comments in PostgreSQL connector.&lt;/li&gt;
  &lt;li&gt;Faster &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sum(DISTINCT ...)&lt;/code&gt; queries for various connectors.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-413.html&quot;&gt;Trino 413&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; in the Phoenix connector.&lt;/li&gt;
  &lt;li&gt;Support for table comments in the Oracle connector.&lt;/li&gt;
  &lt;li&gt;Improved performance of queries involving window functions or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-414.html&quot;&gt;Trino 414&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Experimental support for tracing using OpenTelemetry.&lt;/li&gt;
  &lt;li&gt;Support for Databricks 12.2 LTS in Delta Lake connector.&lt;/li&gt;
  &lt;li&gt;Support for fault-tolerant execution in Redshift connector.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sequence&lt;/code&gt; table function.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-415.html&quot;&gt;Trino 415&lt;/a&gt; and
&lt;a href=&quot;https://trino.io/docs/current/release/release-416.html&quot;&gt;Trino 416&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;A whole lot of minor performance improvements.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;introducing-the-two-new-trino-maintainers&quot;&gt;Introducing the two new Trino maintainers&lt;/h2&gt;

&lt;p&gt;Manfred should hardly need an introduction to Trino Community Broadcast viewers,
as he’s been around and hosting episodes from the beginning, and authored
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;.
In the background, he’s also been quietly working on docs, the website, and
a wide variety of other initiatives in the Trino community.&lt;/p&gt;

&lt;p&gt;James should also be familiar to anyone who has contributed to Trino. Iconically
rocking a GitHub avatar of the face of
&lt;a href=&quot;https://en.wikipedia.org/wiki/Bob_Ross&quot;&gt;Bob Ross&lt;/a&gt;, it’s hard to miss when he
shows up on a pull request. And working on Trino as part of
&lt;a href=&quot;https://aws.amazon.com/athena/&quot;&gt;AWS Athena&lt;/a&gt;, he’s been a major engineering
contributor for the last several years, with 262 commits under his belt and more
on the way.&lt;/p&gt;

&lt;h2 id=&quot;what-is-a-maintainer&quot;&gt;What is a maintainer?&lt;/h2&gt;

&lt;p&gt;If you don’t go clicking around on the Trino website fanatically trying to find
everything you can possibly read about the project, there’s a chance you’ve
never bumped into our &lt;a href=&quot;https://trino.io/development/roles.html&quot;&gt;roles&lt;/a&gt; page,
which highlights how Trino is governed. To quote that page:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;In Trino, maintainer is an active role. A maintainer is responsible for
merging code only after ensuring it has been reviewed thoroughly and aligns with
the Trino vision and guidelines. In addition to merging code, a maintainer
actively participates in discussions and reviews. Being a maintainer does not
grant additional rights in the project to make changes, set direction, or
anything else that does not align with the direction of the project. Instead, a
maintainer is expected to bring these to the project participants as needed to
gain consensus. The maintainer role is for an individual, so if a maintainer
changes employers, the role is retained. However, if a maintainer is no longer
actively involved in the project, their maintainer status will be reviewed.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Or, in normal speech, a maintainer is a trusted individual with merge rights.
But with great power comes great responsibility, higher standards, and an
expectation to be an active steward of the Trino project. It’s not easy to
become a maintainer - prior to Manfred and James, it had been over a year since
the most recent maintainer was appointed. The high bar of activity, quality, and
attitude is not trivial by any stretch, and so we’re excited to talk to them
about the role, how they got here, and what they’re looking forward to for the
future of Trino.&lt;/p&gt;

&lt;h2 id=&quot;the-path-to-becoming-a-maintainer&quot;&gt;The path to becoming a maintainer&lt;/h2&gt;

&lt;h3 id=&quot;manfred&quot;&gt;Manfred&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;When did you first start working on Trino?&lt;/li&gt;
  &lt;li&gt;What’s your proudest contribution to the project?&lt;/li&gt;
  &lt;li&gt;Have a funny story you’ve wanted to share with the world?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;james&quot;&gt;James&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;When did you first start working on Trino?&lt;/li&gt;
  &lt;li&gt;What’s your proudest contribution to the project?&lt;/li&gt;
  &lt;li&gt;Why the Bob Ross avatar?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;pr-of-the-episode-16753-improve-topn-row-number--rank-performance&quot;&gt;PR of the episode: &lt;a href=&quot;https://github.com/trinodb/trino/pull/16753&quot;&gt;16753: Improve TopN row number / rank performance&lt;/a&gt;&lt;/h2&gt;

&lt;p&gt;We normally focus on flashy and user-facing PRs for the PR of the episode, but
this week, courtesy of our guest James, we’re going to highlight something that
better represents the more routine work that’s going on in Trino all the time:
a performance improvement.&lt;/p&gt;

&lt;h2 id=&quot;trino-events&quot;&gt;Trino events&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://www.starburst.io/info/trinofest/&quot;&gt;Trino Fest&lt;/a&gt; is coming up in just a
couple months. Register to attend or
&lt;a href=&quot;https://sessionize.com/trino-fest-2023&quot;&gt;sign up to submit a talk&lt;/a&gt; if you have
something to share!&lt;/p&gt;

&lt;p&gt;If you have an event that is related to Trino, let us know so we can add it to
the &lt;a href=&quot;https://trino.io/community.html#events&quot;&gt;Trino events calendar&lt;/a&gt;. Kevin Haley’s
&lt;a href=&quot;https://www.meetup.com/boston-data-engineering/events/291662797/&quot;&gt;Getting to Know Trino&lt;/a&gt;
in Boston was a great success, and we’d love to hear from other Trino community 
members who’d be interested in hosting other events!&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>Refreshing at the lakehouse summer camp</title>
      <link href="https://trino.io/blog/2023/05/03/refresh-at-trino-fest.html" rel="alternate" type="text/html" title="Refreshing at the lakehouse summer camp" />
      <published>2023-05-03T00:00:00+00:00</published>
      <updated>2023-05-03T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/05/03/refresh-at-trino-fest</id>
      <content type="html" xml:base="https://trino.io/blog/2023/05/03/refresh-at-trino-fest.html">&lt;p&gt;Summer is just around the corner, and we are busy getting ready for &lt;a href=&quot;/blog/2023/04/05/announcing-trino-fest-2023.html&quot;&gt;Trino Fest
2023&lt;/a&gt;. Everything is
ramping up. Early birds are starting to register, and &lt;a href=&quot;https://www.starburst.io/info/trinofest&quot;&gt;so should
you&lt;/a&gt;. Our Trino Fest theme song is
available for your listening pleasure, and we are reviewing speaker submissions.
The festival is promising to be another great event to learn about lakehouse use
cases with Trino, but we are also featuring some great presentations for
querying data with Trino. And of course, we are still looking for more
presenters, so don’t hesitate and &lt;a href=&quot;https://sessionize.com/trino-fest-2023&quot;&gt;submit your
proposal&lt;/a&gt;.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Before you dive into the technical details of our upcoming conference, lean back
and listen to our theme song. Hopefully you are feeling the summer vibe coming
your way already.&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/6oN-70jSbF8&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;Our event host &lt;a href=&quot;https://www.starburst.io/&quot;&gt;Starburst&lt;/a&gt; is again helping us ensure
that Trino Fest is a venue for Trino beginners and experts to meet, exchange
ideas, and learn from each other. One of the Starburst engineers, &lt;a href=&quot;https://github.com/nineinchnick&quot;&gt;Jan
Waś&lt;/a&gt;, is scheduled to present about his
amazingly low-effort setup to use Trino for data analysis and report generation.&lt;/p&gt;

&lt;p&gt;Getting closer to the theme of the event “Lakehouse summer camp”, we are
planning to have sessions about Iceberg, Delta Lake, and Hudi usage with Trino.
Learn about the latest developments from these projects and practical tips and
tricks from the user community.&lt;/p&gt;

&lt;p&gt;In the keynote, Martin Traverso will speak about the many new features that
arrived in Trino since &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;Trino Summit last year&lt;/a&gt;. This includes the new Apache Ignite
connector we talked about in the &lt;a href=&quot;https://trino.io/episodes/46.html&quot;&gt;Trino Community Broadcast episode
46&lt;/a&gt;. At Trino Fest we are going to share some
more exciting news about new connectors and integrations for Trino. Specifically
on the client tooling side you can expect some great demos and news from the
Python community.&lt;/p&gt;

&lt;p&gt;So what are you waiting for? It’s time to register for the event. And if you
think you also want to share your knowledge and usage of Trino, submit a speaker
proposal.&lt;/p&gt;

&lt;div style=&quot;padding-bottom: 1rem&quot;&gt;
  &lt;a class=&quot;btn btn-orange&quot; style=&quot;display: inline-grid;&quot; href=&quot;https://www.starburst.io/info/trinofest/&quot; target=&quot;_blank&quot;&gt;Register&lt;/a&gt;
  &lt;a class=&quot;btn btn-pink&quot; style=&quot;display: inline-grid;&quot; href=&quot;https://sessionize.com/trino-fest-2023&quot; target=&quot;_blank&quot;&gt;Submit a talk&lt;/a&gt;
&lt;/div&gt;

&lt;p&gt;In either case, as your hosts and guides through the two half days, we look
forward to having you at the event.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred and Cole&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Cole Bowden</name>
        </author>
      

      <summary>Summer is just around the corner, and we are busy getting ready for Trino Fest 2023. Everything is ramping up. Early birds are starting to register, and so should you. Our Trino Fest theme song is available for your listening pleasure, and we are reviewing speaker submissions. The festival is promising to be another great event to learn about lakehouse use cases with Trino, but we are also featuring some great presentations for querying data with Trino. And of course, we are still looking for more presenters, so don’t hesitate and submit your proposal.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/trino-fest.png" />
      
    </entry>
  
    <entry>
      <title>Just the right time date predicates with Iceberg</title>
      <link href="https://trino.io/blog/2023/04/11/date-predicates.html" rel="alternate" type="text/html" title="Just the right time date predicates with Iceberg" />
      <published>2023-04-11T00:00:00+00:00</published>
      <updated>2023-04-11T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/04/11/date-predicates</id>
      <content type="html" xml:base="https://trino.io/blog/2023/04/11/date-predicates.html">&lt;p&gt;In the data lake world, data partitioning is a technique that is critical to the
performance of read operations. In order to avoid scanning large amounts of data
accidentally, and also to limit the number of partitions that are being
processed by a query, a query engine must push down constant expressions when
filtering partitions.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Partitions in an Iceberg table tend to be fairly large, containing up to tens or
even hundreds of data files. It is therefore crucial to be able to skip
irrelevant partitions while scanning a table in order to ensure high performance
query processing speed. When a table is created in a data lake, its partitioning
scheme constitutes a de-facto index, speeding up queries against it by pruning
out irrelevant partitions from the scan operation.&lt;/p&gt;

&lt;p&gt;Date and time are natural and universal partitioning candidates. Common
partition patterns revolve around month, day, or hour. One exciting feature of the
Iceberg table format is its &lt;a href=&quot;https://trino.io/blog/2021/07/12/in-place-table-evolution-and-cloud-compatibility-with-iceberg.html#partition-specification-evolution&quot;&gt;hidden
partitioning&lt;/a&gt;.
Iceberg uses handy
&lt;a href=&quot;https://trino.io/docs/current/connector/iceberg.html#partitioned-tables&quot;&gt;transforms&lt;/a&gt;
such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;year&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;month&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;day&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hour&lt;/code&gt; to deal with the complexities of mapping
a raw timestamp value to an actual partition value in a manner that is
transparent to the user.&lt;/p&gt;

&lt;p&gt;Let’s look at a typical example of an Iceberg table containing log events which
are partitioned by day:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;logs&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;event_time&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;timestamp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;time&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;zone&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;level&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;message&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;partitioning&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ARRAY&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;day(event_time)&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;When dealing with logs, it often happens that we want to know what happened
today or within the last few days:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;logs&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;event_time&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CURRENT_DATE&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;logs&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;event_time&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CURRENT_DATE&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;INTERVAL&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;7&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DAY&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;constant-folding&quot;&gt;Constant folding&lt;/h2&gt;

&lt;p&gt;Trino handles these types of queries with the &lt;em&gt;constant folding&lt;/em&gt; optimization
technique: it internally rewrites the filter expression as a comparison predicate
against a constant that is evaluated once, before the query executes, so that the
same expression is not recalculated for each row scanned:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/date-predicates/constant_folding.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
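
&lt;p&gt;For illustration (assuming, hypothetically, that the query runs on
2023-04-11), the seven-day filter shown above would be folded into a comparison
against a single precomputed constant, roughly:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;event_time &amp;gt;= TIMESTAMP &apos;2023-04-04 00:00:00.000000 UTC&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;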

&lt;h2 id=&quot;predicate-pushdown&quot;&gt;Predicate pushdown&lt;/h2&gt;

&lt;p&gt;Another common query scenario for log data is to query for a specific date in
the past. A seasoned SQL user, being aware of the underlying data type of the
partitioning column, would likely specify the date to be queried explicitly as
two timestamp constant filter expressions:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;logs&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;event_time&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;TIMESTAMP&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2022-01-20 00:00:00.000000 UTC&apos;&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;event_time&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;TIMESTAMP&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2022-01-21 00:00:00.000000 UTC&apos;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;A different flavor of the above-mentioned query would be to use
the &lt;a href=&quot;/docs/current/functions/comparison.html#range-operator-between&quot;&gt;BETWEEN&lt;/a&gt;
range operator:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;logs&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;event_time&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BETWEEN&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;TIMESTAMP&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2022-01-20 00:00:00.000000 UTC&apos;&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;TIMESTAMP&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2022-01-20 23:59:59.999999 UTC&apos;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Users can focus on writing queries that are concise and readable, and leave
the optimization grunt work to the query engine.&lt;/p&gt;

&lt;p&gt;A succinct way of querying the logs for a specific day would be to cast the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;timestamp&lt;/code&gt; field value to its corresponding &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date&lt;/code&gt; value and compare it with
the day containing the relevant logs:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;logs&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;CAST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;event_time&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;DATE&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2022-01-20&apos;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In this case, Trino &lt;a href=&quot;https://github.com/trinodb/trino/commit/49be4c2a&quot;&gt;unwraps the initial temporal
filter&lt;/a&gt; into a filter that tests
whether the column &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time&lt;/code&gt; is within the constant timestamp range
corresponding to the date used in the initial filter, which is equivalent to the
most efficient of the explicit filters mentioned above.&lt;/p&gt;

&lt;p&gt;A different approach of querying the log data for a specific date is to use the
&lt;a href=&quot;/docs/current/functions/datetime.html#truncation-function&quot;&gt;date_trunc&lt;/a&gt;
function:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;logs&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;date_trunc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;day&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;event_time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;DATE&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2022-01-20&apos;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Trino again &lt;a href=&quot;https://github.com/trinodb/trino/commit/80c079f9&quot;&gt;replaces the initial temporal
filter&lt;/a&gt; with a filter testing
whether the column &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time&lt;/code&gt; is within the constant timestamp range
corresponding to the date used in the initial filter.&lt;/p&gt;

&lt;p&gt;A slightly different use case is querying the log data to see whether an exotic
error type is recorded in the logs during previous months of the current year by
making use of the
&lt;a href=&quot;/docs/current/functions/datetime.html#year&quot;&gt;year()&lt;/a&gt; function:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;logs&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt;
  &lt;span class=&quot;nb&quot;&gt;year&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;event_time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2023&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This time, Trino &lt;a href=&quot;https://github.com/trinodb/trino/commit/b8967a3c1550b6e64ad8d3e7979ea46fbfc51550&quot;&gt;rewrites the temporal
filter&lt;/a&gt;
applied to the column &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time&lt;/code&gt; with a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BETWEEN&lt;/code&gt; filter for the unfolded date
range corresponding to the entire span of the specified year:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;event_time&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BETWEEN&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;TIMESTAMP&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2023-01-01 00:00:00.000000 UTC&apos;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;TIMESTAMP&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2023-12-31 23:59:59.999999 UTC&apos;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Without predicate pushdown, the filtering is done by Trino on each tuple, after
scanning the entire content of the table:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/date-predicates/filter_basic_data_flow.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The optimization techniques employed by Trino to speed up the above mentioned
types of queries all involve replacing the provided filter with an equivalent
filter expression. Constant replacement optimizations compare the table column
against a constant or a constant range, so that the filter can be pushed down
to &lt;a href=&quot;https://iceberg.apache.org/&quot;&gt;Iceberg&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;As a consequence, the partition pruning happens on the metadata layer of the
table instead of filtering on top of the data itself, dramatically reducing the
amount of actual data files scanned:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/date-predicates/filter_push_down_data_flow.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;As described in the &lt;a href=&quot;https://iceberg.apache.org/spec/&quot;&gt;Iceberg Table Spec&lt;/a&gt;, for
any snapshot of the table, Iceberg tracks each individual data file and the
partition to which it belongs. Iceberg uses a hierarchical index in its metadata
layer by storing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lower_bounds&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;upper_bounds&lt;/code&gt; for:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;each partition in the manifest list files&lt;/li&gt;
  &lt;li&gt;each data file in the manifest files&lt;/li&gt;
&lt;/ul&gt;
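
&lt;p&gt;This metadata can be inspected directly from Trino through the Iceberg
connector metadata tables. As a sketch, assuming the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;logs&lt;/code&gt; table from the
earlier examples, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$files&lt;/code&gt; table exposes the per-file bounds:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT file_path, lower_bounds, upper_bounds
FROM &quot;logs$files&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;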

&lt;p&gt;Desugaring seemingly variable filter expressions to comparison predicates
involving only columns and constants or constant ranges pays off. Not only does
it prune out partitions, but it can also skip portions of a data file (for
example, an Apache Parquet row group) or even the entire data file. For
instance, for a filter on a non-partition column, pruning and skipping can
occur if the queried value range does not overlap with the range of values
recorded for the file in the Iceberg metadata.&lt;/p&gt;

&lt;p&gt;To put things in perspective, the optimization techniques presented in this
article, which are already integrated in Trino, can allow queries containing
selective temporal filters to complete in seconds rather than hours, depending
on the size of the table scanned.&lt;/p&gt;

&lt;p&gt;A reader keen to experiment and discover whether the previously mentioned
optimization techniques are actually effective can use
&lt;a href=&quot;/docs/current/sql/explain.html&quot;&gt;EXPLAIN&lt;/a&gt; to examine the output
of the query planning stage. If the temporal predicate employed in the query is
pushed down, the scan operation should report fewer rows than the total number
of rows in the table.&lt;/p&gt;
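
&lt;p&gt;For example, prefixing one of the earlier queries with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXPLAIN&lt;/code&gt; surfaces the
plan, including the predicate that is passed down to the connector:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;EXPLAIN
SELECT *
FROM logs
WHERE CAST(event_time AS date) = DATE &apos;2022-01-20&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;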

&lt;p&gt;The queries in this post showcase just a small fraction of the many techniques
that can be employed to query date and time columns.
Trino continuously strives to streamline its users’ workflows by providing the
results of queries as fast as possible.&lt;/p&gt;</content>

      
        <author>
          <name>Marius Grama</name>
        </author>
      

      <summary>In the data lake world, data partitioning is a technique that is critical to the performance of read operations. In order to avoid scanning large amounts of data accidentally, and also to limit the number of partitions that are being processed by a query, a query engine must push down constant expressions when filtering partitions.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/date-predicates/christian-pfeifer-l6OraG-v0d8-unsplash.jpg" />
      
    </entry>
  
    <entry>
      <title>Polish edition of Trino: The Definitive Guide</title>
      <link href="https://trino.io/blog/2023/04/06/the-definitive-guide-2-pl.html" rel="alternate" type="text/html" title="Polish edition of Trino: The Definitive Guide" />
      <published>2023-04-06T00:00:00+00:00</published>
      <updated>2023-04-06T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/04/06/the-definitive-guide-2-pl</id>
      <content type="html" xml:base="https://trino.io/blog/2023/04/06/the-definitive-guide-2-pl.html">&lt;p&gt;At this stage Trino is used all around the globe as we know from the &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;community
chat&lt;/a&gt; and &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;our speakers at Trino Summit 2022&lt;/a&gt;. One large community of Trino
contributors and maintainers, many employed by &lt;a href=&quot;http://starburst.io&quot;&gt;Starburst&lt;/a&gt;,
is located in Poland. Developers and users in Poland also participate very
actively in the Java and Big Data communities.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Today, we are happy to announce that a translation of the book &lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;Trino: The
Definitive Guide&lt;/a&gt; to Polish is
now available for the communities in Poland and beyond. We invite you all to get
your own copy:&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://ksiazki.promise.pl/produkt/trino-profesjonalny-przewodnik-sql-w-dowolnej-skali-w-dowolnym-magazynie-i-w-dowolnym-srodowisku/&quot;&gt;
        Trino Profesjonalny Przewodnik
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;Our thanks for making this happen go out to the teams at O’Reilly and
&lt;a href=&quot;https://ksiazki.promise.pl/&quot;&gt;Promise&lt;/a&gt;. We hope many readers will benefit from
the translated edition.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred, Martin, and Matt&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Martin Traverso, Matt Fuller</name>
        </author>
      

<summary>At this stage Trino is used all around the globe as we know from the community chat and our speakers at Trino Summit 2022. One large community of Trino contributors and maintainers, many employed by Starburst, is located in Poland. Developers and users in Poland also participate very actively in the Java and Big Data communities.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/ttdg2-pl-cover.png" />
      
    </entry>
  
    <entry>
      <title>Trino and the BDFL model: a renewed focus</title>
      <link href="https://trino.io/blog/2023/04/06/trino-bdfl-focus.html" rel="alternate" type="text/html" title="Trino and the BDFL model: a renewed focus" />
      <published>2023-04-06T00:00:00+00:00</published>
      <updated>2023-04-06T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/04/06/trino-bdfl-focus</id>
      <content type="html" xml:base="https://trino.io/blog/2023/04/06/trino-bdfl-focus.html">&lt;p&gt;For those who are paying close attention, you may notice updates to a few pages
across the Trino website with a renewed focus on leadership roles in Trino. This
is part of an effort to re-focus and make the operating model more transparent
both for contributors and for end users. While this is not a functional change,
this does involve clarifying our roles following the
&lt;a href=&quot;https://en.wikipedia.org/wiki/Benevolent_dictator_for_life&quot;&gt;BDFL (benevolent dictator for life)&lt;/a&gt;
model.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Trino has been a popular open source project used by many companies and
organizations since its inception in 2012. As a founder-led project, it has
consistently operated under a BDFL model, though not necessarily by name. The
model is used to describe the persons who can make the final decisions for the
direction and development of the project. Many successful open-source projects,
including Linux, Python, Scala, Ruby, and Rust, operate using a BDFL model.&lt;/p&gt;

&lt;h2 id=&quot;why-the-bdfl-model&quot;&gt;Why the BDFL model?&lt;/h2&gt;

&lt;p&gt;One of the key benefits of the BDFL model is that it allows for a clear
decision-making process. When a project has a large number of contributors, it
can be difficult to reach consensus on certain issues. The BDFL can step in and
make the final decision, which can be particularly helpful in situations where
time is of the essence. Additionally, having a BDFL can provide a sense of
stability and direction for the project.&lt;/p&gt;

&lt;p&gt;It’s important to emphasize that the use of the BDFL model is not a new
development in Trino’s history. We (Dain, David and Martin) have acted in 
this role since the beginning.&lt;/p&gt;

&lt;h2 id=&quot;why-now&quot;&gt;Why now?&lt;/h2&gt;

&lt;p&gt;Why is there a renewed focus on the BDFL model now? Trino has reached a level
of maturity and community size that makes it increasingly important to have
clear leadership and decision-making processes. By making the BDFL model more
explicit, we can ensure that the project remains focused and continues to deliver
value to its users.&lt;/p&gt;

&lt;h2 id=&quot;more-info&quot;&gt;More info&lt;/h2&gt;

&lt;p&gt;You can check out the following pages for additional information:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/development/roles.html&quot;&gt;Roles&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/development/process.html&quot;&gt;Development process&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/individual-code-of-conduct.html&quot;&gt;Individual code of conduct&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content>

      
        <author>
          <name>Martin Traverso, Dain Sundstrom, David Phillips</name>
        </author>
      

      <summary>For those who are paying close attention, you may notice updates to a few pages across the Trino website with a renewed focus on leadership roles in Trino. This is part of an effort to re-focus and make the operating model more transparent both for contributors and for end users. While this is not a functional change, this does involve clarifying our roles following the BDFL (benevolent dictator for life) model.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/bdfl-blog/trino-logo.png" />
      
    </entry>
  
    <entry>
      <title>Lakehouse summer camp at Trino Fest 2023</title>
      <link href="https://trino.io/blog/2023/04/05/announcing-trino-fest-2023.html" rel="alternate" type="text/html" title="Lakehouse summer camp at Trino Fest 2023" />
      <published>2023-04-05T00:00:00+00:00</published>
      <updated>2023-04-05T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/04/05/announcing-trino-fest-2023</id>
      <content type="html" xml:base="https://trino.io/blog/2023/04/05/announcing-trino-fest-2023.html">&lt;p&gt;Get ready to kick off your summer with Commander Bun Bun at Trino Fest 2023!
This year’s event is going virtual and will take place over two days, &lt;strong&gt;the 14th
and 15th of June&lt;/strong&gt;. The focus of the event will be on Trino as a data lakehouse
query engine, with discussions on how new features and the ecosystem around
Trino can support better data lakehouse management.&lt;/p&gt;

&lt;p&gt;Trino Fest 2023 is the new annual summer event dedicated to all things Trino.
Building on the success of last year’s &lt;a href=&quot;/blog/2022/05/17/cinco-de-trino-recap.html&quot;&gt;Cinco de
Trino&lt;/a&gt;, we’re excited to bring
the community together once again to explore the latest trends and innovations
in Trino and data lakehouse management. With a focus on education, community
collaboration, and inspiration, Trino Fest 2023 will be a valuable experience
for anyone interested in improving their data and analytics platform. We hope to
see you there as attendee, speaker, or sponsor! Read below to find out how to
sign up.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;call-for-speakers&quot;&gt;Call for speakers&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://sessionize.com/trino-fest-2023&quot;&gt;Call for speakers&lt;/a&gt; is now open, and we
invite you to submit a talk if you have an interesting perspective on Trino.
We’re particularly interested in talks related to:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Data lake and lakehouse use cases, architectures and experiences&lt;/li&gt;
  &lt;li&gt;Apache Iceberg&lt;/li&gt;
  &lt;li&gt;Delta Lake&lt;/li&gt;
  &lt;li&gt;Hudi&lt;/li&gt;
  &lt;li&gt;Industry use cases for Trino&lt;/li&gt;
  &lt;li&gt;Query federation&lt;/li&gt;
  &lt;li&gt;Data governance with Trino&lt;/li&gt;
  &lt;li&gt;SQL with Trino&lt;/li&gt;
  &lt;li&gt;ETL/ELT/batch query processing&lt;/li&gt;
  &lt;li&gt;Other tools and integrations in the Trino ecosystem&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The call for speakers closes on May 19th, so be sure to submit your talk soon!&lt;/p&gt;

&lt;h2 id=&quot;whats-new-this-year&quot;&gt;What’s new this year?&lt;/h2&gt;

&lt;p&gt;Aside from the new title, this year’s Trino Fest will differ from last year’s
short conference in a few ways. We’re featuring more talks from Trino
practitioners, the event will run over two shorter days to avoid the death march
of talks, and there will be more summer, lakehouse, and camping puns. Of course,
there will be continued use of the &lt;a href=&quot;https://www.youtube.com/watch?v=kfJ63DNbAuI&amp;amp;list=PLFnr63che7wYFsknFAqisURvfm96rW0Dr&amp;amp;index=4&quot;&gt;Trinoritaville song
&lt;/a&gt;.
Whether you’re just getting started with Trino or you’re a seasoned pro, there
will be something for everyone at Trino Fest.&lt;/p&gt;

&lt;h2 id=&quot;what-is-trino-fest-versus-trino-summit&quot;&gt;What is Trino Fest versus Trino Summit&lt;/h2&gt;

&lt;p&gt;Trino was &lt;a href=&quot;/blog/2020/10/20/intro-to-hive-connector.html&quot;&gt;built from the beginning to query Hive data&lt;/a&gt;, so its support for the
data lakehouse is simply an evolution of its flagship use case. Trino Fest covers
the latest features and improvements to Trino that make it an even better choice
for data lakehouse management. You’ll hear from speakers who are using Trino in
innovative ways, and who can provide valuable insights and tips for managing
your own data lakehouse. Going with the chill summer theme, there will be plenty
of time to have fun and relax too!&lt;/p&gt;

&lt;h2 id=&quot;sponsor-trino-fest&quot;&gt;Sponsor Trino Fest&lt;/h2&gt;

&lt;p&gt;If you’re interested in sponsoring Trino Fest 2023, we’d love to hear from you!
Sponsoring the event is a great way to get your brand in front of a highly
engaged audience of Trino enthusiasts and data professionals. Your support will
help make the event a success, and in return, we’ll offer a range of benefits,
such as logo placement on our website, social media shoutouts, and more. To
learn more about sponsoring Trino Fest 2023, reach out to
&lt;a href=&quot;mailto:events@starburst.io&quot;&gt;events@starburst.io&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;see-you-there&quot;&gt;See you there&lt;/h2&gt;

&lt;p&gt;Mark your calendar to save &lt;strong&gt;the 14th
and 15th of June&lt;/strong&gt; for Trino Fest 2023: Lakehouse Summer Camp. Get ready
for a two-day event that will get you diving into the deep end of the data lake.
&lt;a href=&quot;https://www.starburst.io/info/trinofest&quot;&gt;Registration is open now&lt;/a&gt;, and &lt;a href=&quot;https://sessionize.com/trino-fest-2023&quot;&gt;the
call for speakers&lt;/a&gt; closes on April 28th,
so be sure to sign up and submit your talk soon!&lt;/p&gt;

&lt;p&gt;Happy querying!&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>Get ready to kick off your summer with Commander Bun Bun at Trino Fest 2023! This year’s event is going virtual and will take place over two days, the 14th and 15th of June. The focus of the event will be on Trino as a data lakehouse query engine, with discussions on how new features and the ecosystem around Trino can support better data lakehouse management. Trino Fest 2023 is the new annual summer event dedicated to all things Trino. Building on the success of last year’s Cinco de Trino, we’re excited to bring the community together once again to explore the latest trends and innovations in Trino and data lakehouse management. With a focus on education, community collaboration, and inspiration, Trino Fest 2023 will be a valuable experience for anyone interested in improving their data and analytics platform. We hope to see you there as attendee, speaker, or sponsor! Read below to find out how to sign up.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/trino-fest.png" />
      
    </entry>
  
    <entry>
      <title>46: Trino heats up with Ignite</title>
      <link href="https://trino.io/episodes/46.html" rel="alternate" type="text/html" title="46: Trino heats up with Ignite" />
      <published>2023-03-15T00:00:00+00:00</published>
      <updated>2023-03-15T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/46</id>
      <content type="html" xml:base="https://trino.io/episodes/46.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Manfred Moser, Director of Information Engineering at
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/jian-chen-7aa3a2225/&quot;&gt;Jason&lt;/a&gt;, Senior Data
Engineer at Shopee.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-408-410&quot;&gt;Releases 408-410&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-408.html&quot;&gt;Trino 408&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New Apache Ignite connector!&lt;/li&gt;
  &lt;li&gt;Add support for writing decimal types to BigQuery.&lt;/li&gt;
  &lt;li&gt;Improve performance when reading structural types from Parquet files in Delta Lake.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-409.html&quot;&gt;Trino 409&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for nested fields in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DROP COLUMN&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Support for sorted tables in Iceberg.&lt;/li&gt;
  &lt;li&gt;Support for time type in Cassandra.&lt;/li&gt;
  &lt;li&gt;Faster aggregations containing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DISTINCT&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Faster &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIKE&lt;/code&gt; with dynamic patterns.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-410.html&quot;&gt;Trino 410&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sheet&lt;/code&gt; table function in Google Sheets.&lt;/li&gt;
  &lt;li&gt;Better file pruning in Iceberg.&lt;/li&gt;
&lt;/ul&gt;
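&lt;p&gt;As a quick sketch of the new table function - assuming a Google Sheets catalog
named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;gsheets&lt;/code&gt; and a
placeholder spreadsheet ID - a query can read a sheet directly without
registering a table first:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Read a sheet by its spreadsheet ID
SELECT *
FROM
  TABLE(gsheets.system.sheet(id =&gt; 'your-spreadsheet-id'));
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;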

&lt;h2 id=&quot;introducing-the-ignite-connector-to-trino&quot;&gt;Introducing the Ignite connector to Trino&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;https://trino.io/docs/current/connector/ignite.html&quot;&gt;Trino Ignite connector&lt;/a&gt;
was added a couple of releases ago, in Trino 408. It’s not every day that we add a
new connector to Trino, and so the topic of today’s episode is exploring the
connector, what it does, and what its use cases are. After that, we are going
to talk about the process of coming in as an outside engineer and contributing 
an entirely new connector to Trino.&lt;/p&gt;

&lt;h2 id=&quot;what-is-ignite&quot;&gt;What is Ignite?&lt;/h2&gt;

&lt;p&gt;Apache Ignite is an in-memory distributed database, comparable to others you may
be familiar with like Redis and SingleStore. If you’re not familiar with them or 
with in-memory computing, the gist is that by focusing on using RAM instead of
disk storage, you can create a database system which is &lt;em&gt;much&lt;/em&gt; faster - the
Ignite website advertises 10-1000x improvements. Of course, this is more
expensive, too, so it thrives in settings where performance is critical.&lt;/p&gt;

&lt;p&gt;With an initial release seven years ago, Ignite is still a relative newcomer
among in-memory databases, and it comes with modern bells and whistles that have
it positioned to become a successor to the other, comparable databases mentioned
above. It also has some key functionality that sets it apart, including a
fully distributed architecture that can also use disk storage, allowing it to
scale horizontally.&lt;/p&gt;
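&lt;p&gt;To give you a feel for querying Ignite from Trino, a minimal catalog
properties file might look like the following sketch - the host, port, and
credentials are placeholders, and the full set of options is in the connector
documentation:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# etc/catalog/ignite.properties
connector.name=ignite
connection-url=jdbc:ignite:thin://ignite.example.com:10800/
connection-user=ignite
connection-password=secret
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;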

&lt;h2 id=&quot;contributing-the-ignite-connector&quot;&gt;Contributing the Ignite connector&lt;/h2&gt;

&lt;p&gt;The Trino community and developers try their best to be active reviewers,
collaborators, and participants on pull requests coming in from outside
contributors. Massive contributions like the Ignite connector can take a lot of
round trips, back-and-forth discussion, and work from both the contributor and
the project’s maintainers to get it into a state where it is ready to merge and
go live for users to try out.&lt;/p&gt;

&lt;p&gt;To give you an idea,
&lt;a href=&quot;https://github.com/trinodb/trino/pull/8323&quot;&gt;the pull request (PR) to contribute Ignite&lt;/a&gt;
was opened in mid-June 2021. It received immediate feedback from a couple of
maintainers and went through a few round trips of amendments, re-reviews, more
edits, and further reviews. But in an open source environment, each round
trip tends to take longer and longer. Progress stalled in November 2021, and
neither Jason nor the maintainers poked the Ignite PR for nearly a year. In
October 2022, as part of Trino DevRel’s roundup of stale and out-of-date pull
requests, we bumped back into the work that Jason had done. The wheels began to
turn again, starting slow but picking up the pace, until it returned to full and
active development, with several maintainers checking in frequently until the
connector was ready to go. But that’s the story from an observer, and we’ve got
Jason here to go into more detail.&lt;/p&gt;

&lt;h3 id=&quot;questions-for-jason&quot;&gt;Questions for Jason&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;How was the Trino review process?&lt;/li&gt;
  &lt;li&gt;Were there any major lessons you picked up along the way?&lt;/li&gt;
  &lt;li&gt;What tips would you give to someone else looking to add something into Trino?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;pr-of-the-episode-13493-add-support-for-migrate-procedure-in-iceberg&quot;&gt;PR of the episode: #13493: Add support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;migrate&lt;/code&gt; procedure in Iceberg&lt;/h2&gt;

&lt;p&gt;If you’ve been in the data space for a while, you may know that there’s a bit of
a prevailing current in migrating from Hive to Iceberg. Out with the old, in
with the new, and in with the performance gains. &lt;a href=&quot;https://github.com/ebyhr&quot;&gt;Yuya Ebihara&lt;/a&gt;,
one of the Trino maintainers,
&lt;a href=&quot;https://github.com/trinodb/trino/pull/13493&quot;&gt;has added a table procedure to Trino’s Iceberg connector&lt;/a&gt;
to make that process much, much simpler. Rather than a slow, manual, and arduous
process, if you have a Hive table stored in a file format supported by Iceberg,
it’s now as simple as calling the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;migrate&lt;/code&gt; table procedure and letting it run.
The procedure copies the schema, partitioning, properties, and location of the
source table, then streams in all the data files from the source table to
re-build it all in the Iceberg format. Neat, right?&lt;/p&gt;
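&lt;p&gt;As a minimal sketch - assuming an Iceberg catalog named
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iceberg&lt;/code&gt; and a Hive table
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;example_schema.example_table&lt;/code&gt;
as placeholders - the migration boils down to a single call:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Convert the Hive table to the Iceberg format in place
CALL iceberg.system.migrate(
  schema_name =&gt; 'example_schema',
  table_name =&gt; 'example_table');
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;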

&lt;h2 id=&quot;more-about-ignite&quot;&gt;More about Ignite&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://ignite.apache.org/&quot;&gt;Check out the Ignite website!&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/ApacheIgnite&quot;&gt;Ignite on Twitter&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/showcase/apache-ignite/&quot;&gt;Ignite on LinkedIn&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;trino-events&quot;&gt;Trino events&lt;/h2&gt;

&lt;p&gt;If you have an event that is related to Trino, let us know so we can add it to
the &lt;a href=&quot;https://trino.io/community.html#events&quot;&gt;Trino events calendar&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Kevin Haley will be hosting an in-person event,
&lt;a href=&quot;https://www.meetup.com/boston-data-engineering/events/291662797/&quot;&gt;Getting to Know Trino&lt;/a&gt;,
in Boston, Massachusetts on Wednesday, April 5. You need to register in advance,
so if you’re in the Boston area and interested in attending, go sign up!&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Check out the in-person and virtual
&lt;a href=&quot;https://www.meetup.com/pro/trino-community/&quot;&gt;Trino Meetup groups&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>45: Trino swimming with the DolphinScheduler</title>
      <link href="https://trino.io/episodes/45.html" rel="alternate" type="text/html" title="45: Trino swimming with the DolphinScheduler" />
      <published>2023-02-23T00:00:00+00:00</published>
      <updated>2023-02-23T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/45</id>
      <content type="html" xml:base="https://trino.io/episodes/45.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Brian Olsen, Developer Advocate at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/bitsondatadev&quot;&gt;@bitsondatadev&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate at
  &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/davidzollo/&quot;&gt;David Zollo&lt;/a&gt;, Apache
DolphinScheduler PMC Chair&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/zhongjiajie/&quot;&gt;Jay Chung&lt;/a&gt;,  Apache
DolphinScheduler PMC Member&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/niko-zeng/&quot;&gt;Niko Zeng&lt;/a&gt;,  Apache
DolphinScheduler Community Manager&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/williamk2000/&quot;&gt;William Guo&lt;/a&gt;, Apache Software 
Foundation Member&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;recap-of-trino-in-2022&quot;&gt;Recap of Trino in 2022&lt;/h2&gt;

&lt;p&gt;Highlights from the blog post &lt;a href=&quot;/blog/2023/01/10/trino-2022-the-rabbit-reflects.html&quot;&gt;The rabbit reflects on Trino in 2022&lt;/a&gt; cover community growth, events, releases, and many new features.&lt;/p&gt;

&lt;h2 id=&quot;release-407&quot;&gt;Release 407&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-407.html&quot;&gt;Trino 407&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improved performance for highly selective queries.&lt;/li&gt;
  &lt;li&gt;Improved performance when reading numeric, string and timestamp
values from Parquet files.&lt;/li&gt;
  &lt;li&gt;New &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query&lt;/code&gt; table function for full query pass-through in Cassandra.&lt;/li&gt;
  &lt;li&gt;New &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unregister_table&lt;/code&gt; procedure in Delta Lake and Iceberg.&lt;/li&gt;
  &lt;li&gt;Support for writing to the change data feed in Delta Lake.&lt;/li&gt;
&lt;/ul&gt;
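&lt;p&gt;The new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query&lt;/code&gt; table
function can be sketched as follows, assuming a Cassandra catalog named
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cassandra&lt;/code&gt; and placeholder
table names - the quoted query is passed through to Cassandra unchanged:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Hand the inner query straight to Cassandra for processing
SELECT *
FROM
  TABLE(cassandra.system.query(
    query =&gt; 'SELECT id, name FROM example_keyspace.example_table'));
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;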

&lt;p&gt;Cole’s comments:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;For our contributors, we added a new action to track and ping the developer
relations team on stale pull requests to further prompt maintainers to take a
look. This doesn’t have any immediate impact on end users, but it’ll improve
the development and contribution process.&lt;/li&gt;
  &lt;li&gt;A Kerberos fix for the Kudu connector should make using it much
less of a headache on long-running Trino instances.&lt;/li&gt;
  &lt;li&gt;There were some really sophisticated performance improvements
that came from shifting default config values and adding some new
ones, all of which took a whole lot of testing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the release notes for
&lt;a href=&quot;https://trino.io/docs/current/release/release-407.html&quot;&gt;Trino 407&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;what-is-workflow-orchestration&quot;&gt;What is workflow orchestration?&lt;/h2&gt;

&lt;p&gt;Workflow orchestration refers to the process of coordinating and automating
complex sequences of operations, known as workflows, that consist of multiple
interdependent tasks. This involves designing and defining the workflow,
scheduling and executing the tasks, monitoring the progress and outcomes, and
handling any errors or exceptions that may arise. In the context of Trino, the
tasks are typically the processing of SQL queries on one or more Trino clusters
and other related systems to create a data pipeline or similar automation.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/episode/45/data-pipelines.png&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;why-do-we-need-a-workflow-orchestration-tool-for-building-a-data-lake&quot;&gt;Why do we need a workflow orchestration tool for building a data lake?&lt;/h2&gt;

&lt;p&gt;Building a data lake can involve many complex and interdependent data processing
tasks, which can be challenging to manage and scale without a workflow
orchestration tool. It is tempting to consider a tool like Trino the center of
the universe, and to assume that scheduling SQL queries with a much simpler tool
would suffice. Most companies, however, require a larger variety of tasks to
build a data lake than just running SQL on Trino. Even if you primarily run
Trino SQL scripts for these jobs, it is better to have an orchestration tool
instead of managing all processes manually.&lt;/p&gt;

&lt;h2 id=&quot;what-is-apache-dolphinscheduler&quot;&gt;What is Apache DolphinScheduler?&lt;/h2&gt;

&lt;p&gt;&lt;img width=&quot;75%&quot; src=&quot;/assets/episode/45/dolphin-scheduler.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Apache DolphinScheduler is an open source, distributed workflow scheduling
platform designed to manage and execute batch jobs, data pipelines, and ETL
processes. DolphinScheduler enables users to easily create and manage sequences
of jobs, with support for different types of tasks, such as SQL
statements, shell scripts, Spark jobs, Kubernetes deployments, and many others.
In short, it’s a powerful and user-friendly workflow orchestration platform that
enables users to automate and manage their complex data processing tasks.&lt;/p&gt;

&lt;p&gt;Read &lt;a href=&quot;https://blog.devgenius.io/dolphinscheduler-helps-trino-quickly-realize-the-integrated-data-construction-of-lake-and-warehouse-cde095b6573b&quot;&gt;this blog on Trino and Apache DolphinScheduler&lt;/a&gt;
to find out more.&lt;/p&gt;

&lt;h3 id=&quot;does-dolphinscheduler-have-any-computing-engine-or-storage-layer&quot;&gt;Does DolphinScheduler have any computing engine or storage layer?&lt;/h3&gt;

&lt;p&gt;DolphinScheduler is a powerful tool for managing and orchestrating data
processing workflows across a range of computing engines and storage systems,
but it does not provide its own computing or storage capabilities.&lt;/p&gt;

&lt;h2 id=&quot;what-are-the-differences-to-other-workflow-orchestration-systems&quot;&gt;What are the differences to other workflow orchestration systems?&lt;/h2&gt;

&lt;p&gt;Airflow is the incumbent, de facto workflow orchestrator. Many data engineers
currently rely on Airflow to handle their workflow orchestration, so it
helps to understand DolphinScheduler’s benefits in relation to Airflow. Both
DolphinScheduler and Airflow are designed to be scalable and highly available
to support large-scale distributed environments.&lt;/p&gt;

&lt;p&gt;Airflow supports a wide range of third-party integrations, including popular
data processing frameworks such as Trino, Spark, and Flink, as well as
cloud services such as AWS and Google Cloud. DolphinScheduler supports a
similar range of data processing frameworks and tools. This makes both platforms
suitable for managing diverse data processing tasks.&lt;/p&gt;

&lt;p&gt;The DolphinScheduler project believes that future data governance belongs to
data engineers and consumers alike and should not be centralized in a single
team. Product-focused engineering teams should have access to data and be able
to orchestrate workflows without the need for extensive coding skills.
DolphinScheduler uses a drag-and-drop web UI to create and manage workflows,
while also providing programmatic access through tools like a Python SDK and an
open API.&lt;/p&gt;

&lt;p&gt;Because DolphinScheduler supports users outside the data team through its UI,
it also offers robust security features. These include authentication,
authorization, and data encryption, to ensure that users’ data and workflows are
protected.&lt;/p&gt;

&lt;p&gt;DolphinScheduler has relatively limited documentation and community support
since it is a newer project, but the community is working hard to improve the
developer experience and documentation.&lt;/p&gt;

&lt;h2 id=&quot;how-does-dolphinscheduler-deal-with-failures&quot;&gt;How does DolphinScheduler deal with failures?&lt;/h2&gt;

&lt;p&gt;Failure is an inevitable aspect of data workflow orchestration. The merits of
many of these orchestration tools come from how well they aid users in
responding to failures by monitoring health and notifying users when things go
wrong.&lt;/p&gt;

&lt;h3 id=&quot;does-dolphinscheduler-have-an-alarm-mechanism-itself&quot;&gt;Does DolphinScheduler have an alarm mechanism itself?&lt;/h3&gt;

&lt;p&gt;Apache DolphinScheduler supports user notifications as part of a workflow. This
mechanism is designed to help users monitor and manage their workflows more
effectively and respond quickly to any issues.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/episode/45/alerts.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;These alerts can be configured to notify users via email, SMS, or other
communication channels, and can include details such as the name of the
workflow, the name of the failed task, and the error message or stack trace
associated with the failure.&lt;/p&gt;

&lt;p&gt;In addition to these configurable alerts, DolphinScheduler provides a dashboard
for monitoring the status and progress of workflows and tasks. It includes
real-time updates and visualizations of workflow performance and status. The
dashboard helps users quickly identify any issues or bottlenecks in their
workflows and take corrective action as needed.&lt;/p&gt;

&lt;p&gt;&lt;img width=&quot;80%&quot; src=&quot;/assets/episode/45/monitoring.png&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-episode-creating-a-simple-trino-workflow-in-dolphinscheduler&quot;&gt;Demo of the episode: Creating a simple Trino workflow in DolphinScheduler&lt;/h2&gt;

&lt;p&gt;For this episode’s demo, we look at creating a workflow in DolphinScheduler
that manages the execution of a Trino query.&lt;/p&gt;

&lt;p&gt;Run the demo by following 
&lt;a href=&quot;https://github.com/bitsondatadev/trino-getting-started/tree/main/community_tutorials/dolphinscheduler&quot;&gt;the steps listed&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-episode-improve-performance-of-parquet-files&quot;&gt;PR of the episode: Improve performance of Parquet files&lt;/h2&gt;

&lt;p&gt;While we’re on the topic of data lakes, release 407 included several
performance improvements for Parquet files from contributor and maintainer
&lt;a href=&quot;https://github.com/raunaqmorarka&quot;&gt;@raunaqmorarka&lt;/a&gt;. These changes
improve the performance of reading Parquet files for
&lt;a href=&quot;https://github.com/trinodb/trino/issues/15713&quot;&gt;decimal types&lt;/a&gt;,
&lt;a href=&quot;https://github.com/trinodb/trino/issues/15850&quot;&gt;numeric types&lt;/a&gt;,
&lt;a href=&quot;https://github.com/trinodb/trino/issues/15923&quot;&gt;string types&lt;/a&gt;, and
&lt;a href=&quot;https://github.com/trinodb/trino/issues/15954&quot;&gt;timestamp and boolean types&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;While Trino has historically had better performance with the ORC format, the
Parquet format has grown drastically in popularity, so this is one of many
examples of improving support for Parquet files in data lakes.&lt;/p&gt;

&lt;h2 id=&quot;find-out-more-about-dolphinscheduler&quot;&gt;Find out more about DolphinScheduler&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://dolphinscheduler.apache.org/&quot;&gt;https://dolphinscheduler.apache.org/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/apache/dolphinscheduler&quot;&gt;https://github.com/apache/dolphinscheduler&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/dolphinschedule&quot;&gt;https://twitter.com/dolphinschedule&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;trino-events&quot;&gt;Trino events&lt;/h2&gt;

&lt;p&gt;If you have an event that is related to Trino, let us know so we can add it to
the &lt;a href=&quot;https://trino.io/community.html#events&quot;&gt;Trino events calendar&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Check out the in-person and virtual
&lt;a href=&quot;https://www.meetup.com/pro/trino-community/&quot;&gt;Trino Meetup groups&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>44: Seeing clearly with Metabase</title>
      <link href="https://trino.io/episodes/44.html" rel="alternate" type="text/html" title="44: Seeing clearly with Metabase" />
      <published>2023-01-26T00:00:00+00:00</published>
      <updated>2023-01-26T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/44</id>
      <content type="html" xml:base="https://trino.io/episodes/44.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Manfred Moser, Director of Information Engineering at
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/luispaolini/&quot;&gt;Luis Paolini&lt;/a&gt;, Success Engineer at
&lt;a href=&quot;https://www.metabase.com/&quot;&gt;Metabase&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/andrewdibiasio/&quot;&gt;Andrew DiBiasio&lt;/a&gt;, Software
Engineer at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/piotrleniartek&quot;&gt;Piotr Leniartek&lt;/a&gt;, Product Manager
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;recap-of-trino-in-2022&quot;&gt;Recap of Trino in 2022&lt;/h2&gt;

&lt;p&gt;Highlights from the blog post &lt;a href=&quot;/blog/2023/01/10/trino-2022-the-rabbit-reflects.html&quot;&gt;The rabbit reflects on Trino in 2022&lt;/a&gt; include the following:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Lots of growth for the community celebrating 10 years of Trino&lt;/li&gt;
  &lt;li&gt;Trino Summit, Cinco de Trino, Trino Community Broadcast, and more content&lt;/li&gt;
  &lt;li&gt;Trino: The Definitive Guide second edition&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Lots of Trino releases and new features:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; support&lt;/li&gt;
  &lt;li&gt;JSON functions&lt;/li&gt;
  &lt;li&gt;Table functions&lt;/li&gt;
  &lt;li&gt;Fault-tolerant execution&lt;/li&gt;
  &lt;li&gt;Upgrade to Java 17&lt;/li&gt;
  &lt;li&gt;New Delta Lake, Hudi, and MariaDB connectors&lt;/li&gt;
  &lt;li&gt;Tons and tons of performance improvements&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-404-to-406&quot;&gt;Releases 404 to 406&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-404.html&quot;&gt;Trino 404&lt;/a&gt; not found&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-405.html&quot;&gt;Trino 405&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER COLUMN ... SET DATA TYPE&lt;/code&gt; statement.&lt;/li&gt;
  &lt;li&gt;Support for Apache Arrow when reading from BigQuery.&lt;/li&gt;
  &lt;li&gt;Support for views in the Delta Lake connector.&lt;/li&gt;
  &lt;li&gt;Support for the Iceberg REST catalog.&lt;/li&gt;
  &lt;li&gt;Support for Protobuf encoding in the Kafka connector.&lt;/li&gt;
  &lt;li&gt;Support for fault-tolerant execution in the MongoDB connector.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt; and query pushdown in the Redshift connector.&lt;/li&gt;
  &lt;li&gt;Performance improvements when reading Parquet data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-406.html&quot;&gt;Trino 406&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for JDBC catalog in the Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Support for fault-tolerant execution in the BigQuery connector.&lt;/li&gt;
  &lt;li&gt;Support for exchange spooling on HDFS.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CHECK&lt;/code&gt; constraints with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; statements.&lt;/li&gt;
  &lt;li&gt;Improved performance for Parquet files with the Delta Lake, Hive, Hudi and
Iceberg connectors.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the release notes for
&lt;a href=&quot;https://trino.io/docs/current/release/release-405.html&quot;&gt;Trino 405&lt;/a&gt;,
and
&lt;a href=&quot;https://trino.io/docs/current/release/release-406.html&quot;&gt;Trino 406&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We also shipped trino-python-client 0.321.0 with the following improvements:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for SQLAlchemy 2.0.&lt;/li&gt;
  &lt;li&gt;Add support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;varbinary&lt;/code&gt; query parameters.&lt;/li&gt;
  &lt;li&gt;Add support for variable precision &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;datetime&lt;/code&gt; types.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;what-is-metabase&quot;&gt;What is Metabase&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;../assets/images/logos/metabase-small.png&quot; align=&quot;right&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.metabase.com/&quot;&gt;Metabase&lt;/a&gt; is an easy, open source BI tool with a
friendly UX and integrated tooling that lets your company explore data on its
own. Everyone in your company can ask questions and learn from your data.&lt;/p&gt;

&lt;p&gt;Running Metabase locally is easy. Try it with a container runtime and the
300 MB image:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;docker run -it -p 3000:3000 metabase/metabase
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Or use a JVM and the 260 MB single JAR file:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;wget https://downloads.metabase.com/latest/metabase.jar
java -jar metabase.jar
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You can go from zero to dashboard in under six minutes - &lt;a href=&quot;https://www.metabase.com/demo&quot;&gt;learn more from the
demo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;../assets/episode/44/metabase-screenshot.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Core features and advantages of Metabase include the following:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Visual query builder&lt;/li&gt;
  &lt;li&gt;Dashboards&lt;/li&gt;
  &lt;li&gt;Models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Metabase is a web-based application that you run on a server. You can make it
available to multiple users. It uses SQL to create queries, reports,
visualizations, dashboards, and more.&lt;/p&gt;

&lt;p&gt;You can host it yourself locally, run it in your own datacenter, or use the
cloud:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;../assets/episode/44/metabase-self-hosted.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/episode/44/metabase-cloud-hosted.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Metabase is an open source project licensed under the GNU Affero General
Public License (AGPL). It is written in Clojure and therefore runs on the Java
virtual machine.&lt;/p&gt;

&lt;p&gt;The following is a high-level architecture diagram:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;../assets/episode/44/metabase-architecture.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Metabase is also the name of the company, founded in 2014. It provides an
expanded version under a commercial license, a SaaS version of the application,
support and other services, and manages the open source project.&lt;/p&gt;

&lt;p&gt;Metabase runs in more than 50K instances around the world, including over
2K using the SaaS version.&lt;/p&gt;

&lt;h2 id=&quot;history-of-metabase-and-trino&quot;&gt;History of Metabase and Trino&lt;/h2&gt;

&lt;p&gt;Metabase was first released in 2015 as version 0.9. Since the initial release,
it has grown to be a well-known and widely used BI application.&lt;/p&gt;

&lt;p&gt;A Presto driver was created in 2018. It directly integrated with the client REST
API. With the rename of Presto to Trino, Manfred &lt;a href=&quot;https://github.com/metabase/metabase/pull/15160&quot;&gt;created a
PR&lt;/a&gt; that replicated this for
Trino to ensure continued support for the community. In the discussion it was
decided that it would be better to use the Trino JDBC driver, similar to how
other drivers for Metabase work.&lt;/p&gt;

&lt;p&gt;After more demand from the user and customer community, Starburst and
Metabase established a collaboration and started implementing the current
driver. Piotr led the charge, Andrew buckled down and learned Clojure, and
together they created and tested a first release. The driver is now provided as
an open source project managed by Starburst.&lt;/p&gt;

&lt;h2 id=&quot;core-advantages-of-using-metabase-with-trino&quot;&gt;Core advantages of using Metabase with Trino&lt;/h2&gt;

&lt;p&gt;With Metabase and the driver for Trino, Trino users have access to a well
established and proven open source BI tool. It is suitable for internal usage in
any organization, and users can upgrade to the commercial version for more demanding
deployments and use cases.&lt;/p&gt;

&lt;p&gt;The combination of Trino and Metabase also provides a number of unique benefits
for Metabase users that are not available with typical drivers. Those drivers
each connect to a single SQL database, and are limited to that specific
database.&lt;/p&gt;

&lt;p&gt;With Trino and the driver, you have access to the following unique features:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Metabase users can connect to databases that do not yet have a Metabase driver,
but are supported by Trino&lt;/li&gt;
  &lt;li&gt;Trino also enables using SQL for systems that don’t support SQL, such as MongoDB
or Elasticsearch, and therefore allows Metabase usage with these systems.&lt;/li&gt;
  &lt;li&gt;With Trino you can join data from different catalogs in the same SQL query.
This also applies to Metabase reports or visualizations.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
  &lt;p&gt;Can I join multiple engines? Yes &lt;br /&gt;
Can I join SQL and no-SQL engines? YES!&lt;/p&gt;
&lt;/blockquote&gt;
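
&lt;p&gt;As a minimal sketch of such a cross-catalog join, the following query combines
a relational and a non-relational source in one statement. The catalog, schema,
and table names are hypothetical placeholders:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SELECT o.order_id, c.name
FROM mysql.shop.orders AS o
JOIN mongodb.crm.customers AS c
  ON o.customer_id = c.customer_id;
&lt;/code&gt;&lt;/pre&gt;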

&lt;p&gt;Elasticsearch, Google Spreadsheets, Cassandra, Redis, and others are all
accessible with Trino. Specifically this also opens up querying object storage
data lakes on S3 and other systems with the Hive, Delta Lake, Iceberg, and Hudi
connectors - all from Metabase.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;../assets/episode/44/metabase-trino-datasources.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Metabase also includes support for access control for any connected datasource,
all the way to row-level security. This includes Trino, and it can be used to
secure access through Metabase for a large group of your Trino users, such as
all BI users. It can even be used to add row-level security for NoSQL databases.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;../assets/episode/44/metabase-no-sql-security.png&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-episode-metabase-and-trino&quot;&gt;Demo of the episode: Metabase and Trino&lt;/h2&gt;

&lt;p&gt;Luis shows us the demo from his repository at
&lt;a href=&quot;https://github.com/paoliniluis/metabase-trino&quot;&gt;https://github.com/paoliniluis/metabase-trino&lt;/a&gt;.
Watch our video to see it in action, and check out the instructions in the
repository to try it yourself.&lt;/p&gt;

&lt;h2 id=&quot;real-world-use-cases-at-meesho&quot;&gt;Real world use cases at Meesho&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;../assets/images/logos/meesho-small.png&quot; align=&quot;right&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.meesho.com/&quot;&gt;Meesho&lt;/a&gt; is India’s fastest growing internet commerce
company. They provide a large retail website and support small business
entrepreneurs with their platform.&lt;/p&gt;

&lt;p&gt;Meesho relies on Trino, Metabase, and the Trino Metabase driver from
Starburst for their data platform.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;../assets/episode/44/meesho-architecture.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Piotr and Luis share more details:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Meesho needs the ability to query the lake, with high speed, concurrency and
scale. This was not possible before Trino, in the form of Starburst Enterprise,
and Metabase were introduced.&lt;/li&gt;
  &lt;li&gt;Meesho has observed more than 13 million queries from Metabase in 10 months.&lt;/li&gt;
  &lt;li&gt;Meesho uses Metabase to add security and governance for the data assets.&lt;/li&gt;
  &lt;li&gt;A next planned step is to integrate with &lt;a href=&quot;https://www.metabase.com/docs/latest/data-modeling/models#enable-model-caching-in-metabase&quot;&gt;Metabase Model
Caching&lt;/a&gt;
to improve user experience even more.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;pr-of-the-episode&quot;&gt;PR of the episode&lt;/h2&gt;

&lt;p&gt;Let’s explore the code a bit, instead of focusing on a specific PR. The whole
driver codebase is open source at
&lt;a href=&quot;https://github.com/starburstdata/metabase-driver&quot;&gt;https://github.com/starburstdata/metabase-driver&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;As mentioned earlier the whole driver is written in Clojure, and Andrew tells us
more about his experience writing the driver and working with the two systems.&lt;/p&gt;

&lt;p&gt;We also talk about a recent community &lt;a href=&quot;https://github.com/starburstdata/metabase-driver/pull/59&quot;&gt;PR for datetime
functions&lt;/a&gt; and the
ongoing work to support model caching.&lt;/p&gt;

&lt;h2 id=&quot;datanova-and-other-trino-events&quot;&gt;Datanova and other Trino events&lt;/h2&gt;

&lt;p&gt;We invite you all to join us for the &lt;a href=&quot;http://bit.ly/3j2N9Q9&quot;&gt;free, virtual conference
Datanova&lt;/a&gt; from Starburst. Trino and related tools and
approaches are touched upon in many presentations and discussions.&lt;/p&gt;

&lt;p&gt;If you have an event that is related to Trino, let us know so we can add it to
the &lt;a href=&quot;../community.html#events&quot;&gt;Trino events calendar&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Metabase and Trino are a great combination of tools. Together they unlock use
cases that are difficult or impossible to implement with other tools. Give it a
try!&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Check out the in-person and virtual
&lt;a href=&quot;https://www.meetup.com/pro/trino-community/&quot;&gt;Trino Meetup groups&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>The rabbit reflects on Trino in 2022</title>
      <link href="https://trino.io/blog/2023/01/10/trino-2022-the-rabbit-reflects.html" rel="alternate" type="text/html" title="The rabbit reflects on Trino in 2022" />
      <published>2023-01-10T00:00:00+00:00</published>
      <updated>2023-01-10T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/01/10/trino-2022-the-rabbit-reflects</id>
      <content type="html" xml:base="https://trino.io/blog/2023/01/10/trino-2022-the-rabbit-reflects.html">&lt;p&gt;It’s that time of the year when everyone gives excessively broad or niche
predictions about the finance market, venture capital, or even the data
industry. And we are now bombarded with &lt;a href=&quot;https://www.githubunwrapped.com/&quot;&gt;“year-in-review” 
summaries&lt;/a&gt; where we find out just how much
data is being collected to generate those summaries. End-of-year reflections are
always useful because you can find patterns of what’s going well and what’s
going poorly. It’s also good to pause and take stock of the things that did go
well, because without that, you’ll only be looking at the list of things that
you still have to do, and that isn’t healthy for anybody. In that spirit, let’s
reflect on what we’ve been able to accomplish as a community this year, as well
as what to look forward to in the next year!&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;2022-by-the-numbers&quot;&gt;2022 by the numbers&lt;/h2&gt;

&lt;p&gt;Let’s take a look at the Trino project’s growth and what happened specifically
in the past year:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;1,031,842 unique visits 🙋 to the Trino site&lt;/li&gt;
  &lt;li&gt;116,231 unique blog post views 👩‍💻 on the Trino site&lt;/li&gt;
  &lt;li&gt;60,296 views 👀 on YouTube&lt;/li&gt;
  &lt;li&gt;5,982 hours watched ⌚ on YouTube&lt;/li&gt;
  &lt;li&gt;4,696 new commits 💻 in GitHub&lt;/li&gt;
  &lt;li&gt;2,775 new members 👋 in Slack&lt;/li&gt;
  &lt;li&gt;2,769 new stargazers ⭐ in GitHub&lt;/li&gt;
  &lt;li&gt;2,550 pull requests merged ✅ in GitHub&lt;/li&gt;
  &lt;li&gt;1,465 issues 📝 created in GitHub&lt;/li&gt;
  &lt;li&gt;1,322 new followers 🐦 on Twitter&lt;/li&gt;
  &lt;li&gt;1,068 pull requests closed ❌ in GitHub&lt;/li&gt;
  &lt;li&gt;702 new subscribers 📺 in YouTube&lt;/li&gt;
  &lt;li&gt;658 average weekly members 💬 in Slack&lt;/li&gt;
  &lt;li&gt;56 videos 🎥 uploaded to YouTube&lt;/li&gt;
  &lt;li&gt;37 Trino 🚀 releases&lt;/li&gt;
  &lt;li&gt;36 blog ✍️ posts&lt;/li&gt;
  &lt;li&gt;12 Trino Community Broadcast ▶️ episodes&lt;/li&gt;
  &lt;li&gt;12 Trino 🍕 meetups&lt;/li&gt;
  &lt;li&gt;2 Trino ⛰️ Summits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Trino website got an impressive number of unique visits, also referred to as
entrances. This metric filters out refreshes and through traffic to count the
number of times a visitor started a unique session. Blog posts saw a 47 percent
increase from last year. Slack membership grew 13 percent and average weekly
active members grew an exciting 25 percent. YouTube views have increased by 218
percent. We’ve more than doubled the number of hours watched, which makes sense,
as we’ve nearly doubled the number of subscribers since last year.&lt;/p&gt;

&lt;p&gt;The project’s velocity hasn’t slowed down either. The number of commits grew 
27.6 percent this year and the number of created issues grew by 20 percent. This
increase in demand for features also pushed up merged pull request numbers by
nearly 29 percent!&lt;/p&gt;

&lt;p&gt;Why are we pointing out the number of closed pull requests that weren’t merged?
We are improving communication with contributors regarding when and why we
explicitly decide not to move forward with a pull request. Part of this has
included a new initiative to close out old and inactive pull requests. There
have been a good number of pull requests that have fallen through the cracks and
are missing communication from the pull request creator or reviewer. The DevRel
team, Brian Olsen, Cole Bowden, and Manfred Moser, are actively working on
improving the workflow around pull requests and issues. Cole recently posted a 
&lt;a href=&quot;/blog/2023/01/09/cleaning-up-the-trino-backlog.html&quot;&gt;blog that dives deeper&lt;/a&gt;
into what this team is actively working on to improve the experience of 
contributing to the project.&lt;/p&gt;

&lt;h3 id=&quot;trino-is-trending&quot;&gt;Trino is trending&lt;/h3&gt;

&lt;p&gt;A lot of these metrics indicate the growing popularity of Trino, but they also
help drive further awareness of the project to others. One metric we pay close
attention to is the number of visitors we get through blog posts, as they grow
Trino’s visibility. This increases the number of contributors and users that
shape Trino to be the best analytics SQL query engine on the planet. One of our
most successful blog posts was &lt;a href=&quot;/blog/2022/08/02/leaving-facebook-meta-best-for-trino.html&quot;&gt;Why leaving Facebook/Meta was the best thing we
could do for the Trino Community&lt;/a&gt;.
The day this blog post was released, it doubled the website traffic we received
and set the record for blog post views or website views in a single day. For
reference, our previous record was the post we had when the project was 
rebranded.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/2022-review/web-views.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This post gained a lot of traction for two reasons. Posts related to Meta and
the inner workings of open source communities naturally perform well, as many
developers are interested in these topics - drama is exciting! But you can have
an interesting topic that doesn’t go viral if nobody sees it. The catalyst to
this success was actually when &lt;a href=&quot;https://news.ycombinator.com/item?id=32323746&quot;&gt;David Phillips posted this to Hacker
News&lt;/a&gt;. We hit the top ten of 
Hacker News and occupied the front page for about two days.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/2022-review/hacker-news.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;So what is the takeaway here? We need your help! While it made sense for David
to do this post once, &lt;a href=&quot;https://news.ycombinator.com/newsguidelines.html&quot;&gt;Hacker News generally looks down upon repeated
self-promotion&lt;/a&gt;. Clearly 
&lt;a href=&quot;http://redd.it/zbe333&quot;&gt;there’s a lot of people interested&lt;/a&gt; in Trino, and Hacker
News and many other social media outlets are how we get the word out. If you
don’t think that sharing has much effect, we hope sharing this impact motivates
you to help us. We don’t want to keep Trino the hidden secret of Silicon Valley
much longer. We need your help to really get people continuously reading and
hearing about all things Trino. So share any time you see something cool going
on in our community!&lt;/p&gt;

&lt;h3 id=&quot;trino-touches-the-world&quot;&gt;Trino touches the world&lt;/h3&gt;

&lt;p&gt;Let’s take a look at the number of users who have initiated at least one session
on the Trino site in 2022, broken down by the top 10 countries. This goes to show the true global
reach this project has attained in 10 years.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;123,326 USA 🇺🇸 users&lt;/li&gt;
  &lt;li&gt;33,540 Indian 🇮🇳 users&lt;/li&gt;
  &lt;li&gt;30,955 Chinese 🇨🇳 users&lt;/li&gt;
  &lt;li&gt;12,282 British 🇬🇧 users&lt;/li&gt;
  &lt;li&gt;11,638 German 🇩🇪 users&lt;/li&gt;
  &lt;li&gt;10,760 Canadian 🇨🇦 users&lt;/li&gt;
  &lt;li&gt;9,980 Brazilian 🇧🇷 users&lt;/li&gt;
  &lt;li&gt;9,098 Singaporean 🇸🇬 users&lt;/li&gt;
  &lt;li&gt;8,649 South Korean 🇰🇷 users&lt;/li&gt;
  &lt;li&gt;8,636 Japanese 🇯🇵 users&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/2022-review/world.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Our reach currently favors the USA, but our aim is to grow Trino in all
countries that are starting to show interest. The new edition of “Trino: The
Definitive Guide” is being translated into Chinese, &lt;a href=&quot;https://simpligility.ca/2022/12/trino-guide-for-everyone-in-2023/&quot;&gt;Polish, and
Japanese&lt;/a&gt;. If
you want to translate the book to your local language, please reach out to
Manfred Moser.&lt;/p&gt;

&lt;h2 id=&quot;trino-celebrates-its-tenth-birthday&quot;&gt;Trino celebrates its tenth birthday&lt;/h2&gt;

&lt;p&gt;Of all the incredible things that happened, one that gave us cause to reflect
was Trino’s tenth birthday. Martin, Dain, and David &lt;a href=&quot;https://trino.io/development/vision.html&quot;&gt;cite
longevity&lt;/a&gt; of the project as one of the
core philosophies that govern decisions around Trino. We expect that Trino will
be used for at least the next 20 years. We build for the long term. This first
decade &lt;a href=&quot;/blog/2020/12/27/announcing-trino.html&quot;&gt;has been an adventurous
ride&lt;/a&gt;, and wow has it &lt;a href=&quot;/blog/2022/08/08/trino-tenth-birthday.html&quot;&gt;produced an
incredible system&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-tenth-birthday/how-it-started-going.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;We wanted to do something special with the community to celebrate this
milestone, so Brian put together a birthday video to timeline the evolution of
Presto and now Trino. We had a premiere watch party on the day of the tenth
anniversary and got some folks’ reactions. Take a look at the video if you
haven’t yet, you don’t want to miss it.&lt;/p&gt;

&lt;div class=&quot;youtube-video-container&quot; style=&quot;text-align: center;&quot;&gt;
 
&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/hPD95_-bZZw&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;
      
&lt;/div&gt;

&lt;h2 id=&quot;trino-summit&quot;&gt;Trino Summit&lt;/h2&gt;

&lt;p&gt;The next event in 2022 was the Trino Summit, which was the first in-person
summit we’ve had as Trino, with well over 750 attendees. We had a stellar lineup
of speakers from companies like Apple, Astronomer, Bloomberg, Comcast,
Goldman Sachs, Lyft, Quora, Shopify, Upsolver, and Zillow.&lt;/p&gt;

&lt;p&gt;This summit had a Pokémon theme, making the analogy that data sources are much
like Pokémon and Trino is much like a Pokémon trainer trying to access and
federate all the data, train it, and level the data up. Check out the video for
a small summary, and if you missed this event, we have all 
&lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;the recordings and slides available&lt;/a&gt;.&lt;/p&gt;

&lt;div class=&quot;youtube-video-container&quot; style=&quot;text-align: center;&quot;&gt;
 
&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/R1Z0VnKrQ9w&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;
      
&lt;/div&gt;

&lt;p&gt;We want to thank &lt;a href=&quot;https://starburst.io/&quot;&gt;Starburst&lt;/a&gt; for hosting this event and
all the sponsors for making this year’s summit possible. As usual, a huge thanks
to the community for showing up, engaging with each other, and bringing your
stories and curiosity.&lt;/p&gt;

&lt;h3 id=&quot;cinco-de-trino&quot;&gt;Cinco de Trino&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;/blog/2022/05/17/cinco-de-trino-recap.html&quot;&gt;Cinco de Trino&lt;/a&gt; was
our mini Trino Summit held in the first half of the year. It dove into using
Trino with complementary tools to build a data lakehouse. The virtual event was
held on Cinco de Mayo (5th of May), which gave it a Margaritaville, on-the-lake
vibe. We used this conference as a platform to &lt;a href=&quot;/blog/2022/05/05/tardigrade-launch.html&quot;&gt;launch the long-awaited Project
Tardigrade features&lt;/a&gt;
around the fault-tolerance mode for Trino.&lt;/p&gt;

&lt;h4 id=&quot;trino-contributor-congregation&quot;&gt;Trino Contributor Congregation&lt;/h4&gt;

&lt;p&gt;This year, we began what we are calling the Trino Contributor Congregation
(TCC), which brings together Trino contributors, maintainers, and developer
relations under the same roof. This congregation was to counter the siloed
nature of Trino development that occurred during the pandemic. Many community
members felt like their work wasn’t being seen and much of this was due to lack
of communication, and especially face-to-face communication, which builds
empathy and demands attention. The TCCs aim to increase connections and
collaboration between maintainers and contributors, create opportunities for
highly technical exchange of ideas and plans for Trino, and learn about usage
scenarios and issues from each other. This is different from the Trino Summit
since it focuses on gathering those who contribute code to keep the
conversations focused on developing features and removing blockers for
contributors.&lt;/p&gt;

&lt;p&gt;The first TCC happened just following Trino Summit in Palo Alto. This was
convenient for many, as a lot of folks were already in San Francisco to attend
Trino Summit. Moving forward we will continue having in-person TCCs around Trino
Summit to minimize the travel expected for anyone wanting to attend in-person
TCCs.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/2022-review/tcc.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Along with the in-person TCC, we also had the first virtual TCC in December.
This included many people in Eurasia who weren’t able to travel to
San Francisco in November. We covered mostly similar topics but with a larger
amount of interaction from those new voices.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/2022-review/virtual-tcc.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;During these discussions the biggest topics covered timelines of existing
roadmap items and suggestions for other items that should get more attention.
We talked about upcoming connectors and plugins, and all the required
infrastructure needed to support that. A recurring theme was the need for better
testing infrastructure. The more information we can gather as a community, the
quicker we can remove any issues as new releases come out and increase adoption
of newer versions of Trino. We also discussed desired features around
resource-intensive and batch workloads, and the new polymorphic table function
features.&lt;/p&gt;

&lt;p&gt;The biggest takeaway from these meetings was that everyone now had a better
basis to engage with each other. As we move forward, we will continue the
cadence of having these virtual TCCs to keep everyone on the same page, and have
in-person meetings when there is a larger conference. With that, let’s cover
some of the features we gained this year.&lt;/p&gt;

&lt;h2 id=&quot;features&quot;&gt;Features&lt;/h2&gt;

&lt;p&gt;Of course, one of the main deliverables of our project is Trino releases. In
2022, we improved our release process and cadence, shipping 37 releases that
were packed with features, and we’re about to dive into a high-level list of the
most exciting ones that made their way to you. For details and to keep up you
can check out the &lt;a href=&quot;https://trino.io/docs/current/release.html&quot;&gt;release notes&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;fault-tolerant-execution-mode&quot;&gt;Fault-tolerant execution mode&lt;/h3&gt;

&lt;p&gt;2022 was the year of resiliency for Trino. Users have long requested adding a 
&lt;a href=&quot;https://trino.io/docs/current/admin/fault-tolerant-execution.html&quot;&gt;fault-tolerant mechanism to 
Trino&lt;/a&gt; akin to
query engines like Apache Spark. Users wanted the ability to take the queries
they were running in Trino and scale them to larger data volumes and more
resource-intensive workloads. Experimental features were implemented in late 2021
for &lt;a href=&quot;https://github.com/trinodb/trino/pull/9361&quot;&gt;automatic query retries&lt;/a&gt; and
earlier this year &lt;a href=&quot;https://github.com/trinodb/trino/pull/9818&quot;&gt;task-level
retries&lt;/a&gt;. The efforts for these
features were codenamed &lt;a href=&quot;https://trino.io/episodes/32.html&quot;&gt;Project Tardigrade&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Fault-tolerant execution relies on storing intermediate data between task
shuffles to have data persist in an exchange spool. The first iteration of this
was AWS S3, but eventually Azure Blob Storage and Google Cloud Storage were
included. The Project Tardigrade engineers started &lt;a href=&quot;/blog/2022/02/16/tardigrade-project-update.html&quot;&gt;improving performance and
fixing bugs&lt;/a&gt; in
fault-tolerant execution as users tested the early implementation. Later, memory
efficiency for aggregations, faster data transfers, and dynamic filtering with
fault-tolerant query execution were added. The &lt;a href=&quot;/blog/2022/05/05/tardigrade-launch.html&quot;&gt;launch of fault-tolerant
execution&lt;/a&gt; happened at Cinco de
Trino. The first iterations only applied to queries being run on object-storage
connectors such as Hive, Iceberg, and Delta Lake. Recently, support for MySQL,
PostgreSQL, and SQL Server were added. These contributions added a foundation
for other JDBC connectors. A few companies, &lt;a href=&quot;https://trino.io/blog/2022/12/12/trino-summit-2022-lyft-recap.html&quot;&gt;most notably
Lyft&lt;/a&gt;, have
adopted this feature and are scaling it in production.&lt;/p&gt;

&lt;h3 id=&quot;sql-language-improvements&quot;&gt;SQL language improvements&lt;/h3&gt;

&lt;p&gt;Here are all the notable SQL features that made it to Trino this year:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/sql/merge.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; statement support&lt;/a&gt; is
 the most impactful SQL feature released this year. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; allows users to
 implement &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt; functionality in one statement.
 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; is not simply syntactic sugar; the implementation delivers profound performance
 improvements. A lot of your operations can be merged (pun intended) from 
 multiple tasks into a single scan over data. This functionality is absolutely
 critical for positioning Trino as a data lakehouse query engine. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; is 
 currently available in the Hive, Iceberg, Delta Lake, Kudu, and Raptor 
 connectors. We discussed this and did a demo with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; on the recent &lt;a href=&quot;https://trino.io/episodes/40.html&quot;&gt;Trino
 Community Broadcast with Iceberg&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Another massive update was the introduction of &lt;a href=&quot;/blog/2022/07/22/polymorphic-table-functions.html&quot;&gt;polymorphic table
 functions&lt;/a&gt; (
 &lt;a href=&quot;https://trino.io/docs/current/functions/table.html&quot;&gt;PTFs&lt;/a&gt;). Table functions
 initially released with some initial passthrough query functionality that we
 see in connectors like Pinot, Elasticsearch, MySQL, PostgreSQL,
 &lt;a href=&quot;https://github.com/trinodb/trino/pull/12325&quot;&gt;and other JDBC connectors&lt;/a&gt;.
 However, this is only one small instance of what can be achieved with PTFs and
 the &lt;a href=&quot;https://www.youtube.com/clip/UgkxQcokpdgPjiuMKMC5-3HwHvlbmZjxAvxe&quot;&gt;true power comes from the generalization of this
 feature&lt;/a&gt;. 
 Dain and David gave &lt;a href=&quot;https://www.youtube.com/clip/Ugkx62IKgPd_v9eGBaPUHP2hyaRkWSXh8w8h&quot;&gt;a simpler explanation of
 PTFs&lt;/a&gt;. To
 dive in deeper, watch &lt;a href=&quot;https://trino.io/episodes/38.html&quot;&gt;this episode of
 the Trino Community Broadcast&lt;/a&gt; where Kasia
 Findeisen and Martin discuss PTFs in greater detail.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/8&quot;&gt;Dynamic function resolution&lt;/a&gt; has
 been discussed for many years and finally arrived. This provides the ability
 for &lt;a href=&quot;https://youtu.be/mUq_h3oArp4?t=680&quot;&gt;connectors to provide functions at
 runtime&lt;/a&gt;. Unlike before, where you needed
 to statically register your functions ahead of time, you can now provide a
 plugin that contains these functions that are resolved at runtime. This enables
 features like supporting function calls to dynamically registered user-defined
 functions in different languages like JavaScript or Python. Martin and Dain go
 into great detail about how this works when &lt;a href=&quot;https://youtu.be/mUq_h3oArp4?t=1596&quot;&gt;answering this question at Trino
 Summit&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Trino gained support for JSON processing functions, which is a part of the
 &lt;a href=&quot;https://en.wikipedia.org/wiki/SQL:2016&quot;&gt;ANSI SQL 2016&lt;/a&gt; specification. This
 resolves a large number of issues reported by the community over the years.
 This includes the
 &lt;a href=&quot;https://trino.io/docs/current/functions/json.html#json-array&quot;&gt;json_array&lt;/a&gt;,
 &lt;a href=&quot;https://trino.io/docs/current/functions/json.html#json-object&quot;&gt;json_object&lt;/a&gt;,
 &lt;a href=&quot;https://trino.io/docs/current/functions/json.html#json-exists&quot;&gt;json_exists&lt;/a&gt;,
 &lt;a href=&quot;https://trino.io/docs/current/functions/json.html#json-query&quot;&gt;json_query&lt;/a&gt;, and
 &lt;a href=&quot;https://trino.io/docs/current/functions/json.html#json-value&quot;&gt;json_value&lt;/a&gt;
 functions that were added to Trino this year.&lt;/li&gt;
  &lt;li&gt;The JSON format was added to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXPLAIN&lt;/code&gt; statement to provide an anonymized
 query plan output to enable offline analysis.&lt;/li&gt;
  &lt;li&gt;It became possible to comment on tables, columns of tables, and even views for
 various connectors. Support for setting comments on views was introduced very
 recently and includes support for Hive and Iceberg.&lt;/li&gt;
  &lt;li&gt;A ton of new functions were added, including &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;to_base32&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;from_base32&lt;/code&gt;,
 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trim_array&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trim&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
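
&lt;p&gt;To illustrate the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; statement described above, here is a
minimal sketch that deletes, updates, and inserts rows in a single statement.
The catalog, table, and column names are hypothetical placeholders:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;MERGE INTO iceberg.sales.accounts AS t
USING iceberg.sales.account_updates AS s
  ON t.account_id = s.account_id
WHEN MATCHED AND s.deleted THEN DELETE
WHEN MATCHED THEN UPDATE SET balance = s.balance
WHEN NOT MATCHED THEN
  INSERT (account_id, balance) VALUES (s.account_id, s.balance);
&lt;/code&gt;&lt;/pre&gt;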

&lt;h3 id=&quot;performance-improvements&quot;&gt;Performance improvements&lt;/h3&gt;

&lt;p&gt;Despite all the hype about vectorization being a silver bullet to make databases
go fast, the real speed comes from &lt;a href=&quot;https://www.youtube.com/clip/UgkxQwDYDS6evVJelNVjWAgrIhzg_Q-cAEyq&quot;&gt;better algorithms and better data structures
that lead to lower resource consumption&lt;/a&gt;.
Following is a list of some improvements that made their way into Trino this
year:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trino now offers improved performance for a variety of operations, including
 complex join criteria pushdown to connectors, faster aggregations, faster
 joins, and better performance for large clusters. We have also implemented
 improvements specifically for aggregations with filters and for the Glue
 metastore. In addition, we now support dynamic filtering for various connectors
 and have faster query planning for the Hive, Delta Lake, Iceberg, MySQL,
 PostgreSQL, and SQL Server connectors.&lt;/li&gt;
  &lt;li&gt;Along with general performance optimizations, there have been a great deal of
 query planning optimizations that lead to better performance for specific SQL
 operators. These include faster &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; queries, improved performance for
 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIKE&lt;/code&gt; expressions and highly selective &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt; queries, and enhanced
 performance and reliability for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; operations. We also made
 performance improvements for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JOIN&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNION&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt; queries, as well
 as faster planning of queries with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IN&lt;/code&gt; predicates.&lt;/li&gt;
  &lt;li&gt;There are also performance optimizations for specific SQL types, such as
 string, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DECIMAL&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MAP&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROW&lt;/code&gt;. We also made aggregations over
 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DECIMAL&lt;/code&gt; columns faster and improved the performance of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROW&lt;/code&gt; type and of
 aggregations.&lt;/li&gt;
  &lt;li&gt;A last set of improvements comes from reading open file formats like ORC and
 Parquet efficiently. We improved the speed of reading and writing all
 data types from and to Parquet in general. There were also general performance
 improvements for ORC types, and Trino can now write Bloom filters in ORC files.
 We have also improved performance and efficiency for a wide range of ORC and
 Parquet-related operations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These improvements in aggregate are at the core of what makes Trino fast. There
is no silver bullet you can plug in to speed things up. It takes time, effort,
and smart changes to improve the speed of various systems.&lt;/p&gt;

&lt;h3 id=&quot;runtime-improvements&quot;&gt;Runtime improvements&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/9876&quot;&gt;Trino upgraded to Java 17&lt;/a&gt;. This
upgrade improves the overall speed and lowers the memory footprint of Trino with
various performance fixes to the JVM and garbage collectors. Trino uses the G1
garbage collector which can now more efficiently reclaim memory and reduce pause
times.&lt;/p&gt;

&lt;p&gt;Aside from the work of performing the upgrade itself, we get a lot of these
performance enhancements for free. On top of performance, upgrading to Java 17
adds new language features that make it easier to write and maintain
high-quality code.&lt;/p&gt;

&lt;p&gt;To learn more, read &lt;a href=&quot;/blog/2022/07/14/trino-updates-to-java-17.html&quot;&gt;this blog 
post&lt;/a&gt; and watch episode 36
of &lt;a href=&quot;https://trino.io/episodes/36.html&quot;&gt;the Trino Community Broadcast&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Along with the Java upgrade, Trino now has a Docker image for ppc64le and added
CLI support for ARM64, which means Trino’s Docker image can run on AWS Graviton
processors and the image and CLI can run on the new MacBooks.&lt;/p&gt;

&lt;h3 id=&quot;security&quot;&gt;Security&lt;/h3&gt;

&lt;p&gt;Trino added the following improvements and features relevant for authentication,
authorization, and integration with other security systems:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;There were many updates to &lt;a href=&quot;https://trino.io/docs/current/security/oauth2.html&quot;&gt;OAuth 2.0
 authentication&lt;/a&gt;, such as support for OAuth
 2.0 refresh tokens and allowing access token passthrough with refresh tokens
 enabled. We also added support for &lt;a href=&quot;https://trino.io/docs/current/security/oauth2.html#openid-connect-discovery&quot;&gt;automatic discovery of OpenID
 Connect&lt;/a&gt;
 metadata with OAuth 2.0 authentication, support for groups in OAuth 2.0 claims,
 and reduced latency for OAuth 2.0 authentication.&lt;/li&gt;
  &lt;li&gt;The Hive, Iceberg, and Delta Lake connectors added support for AWS Security
 Token Service (STS) credentials for authentication with the Glue catalog, and
 now allow specifying an AWS role session name via the S3 security mapping
 configuration.&lt;/li&gt;
&lt;/ul&gt;
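&lt;p&gt;For illustration, the S3 security mapping is configured as a JSON file of
mapping rules. The following is a minimal sketch with placeholder bucket,
account, and role values, showing the new &lt;code&gt;roleSessionName&lt;/code&gt;
property:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
  &quot;mappings&quot;: [
    {
      &quot;prefix&quot;: &quot;s3://example-bucket/&quot;,
      &quot;iamRole&quot;: &quot;arn:aws:iam::123456789012:role/example_role&quot;,
      &quot;roleSessionName&quot;: &quot;trino-example-session&quot;
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;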

&lt;h3 id=&quot;object-storage-connectors-hive-iceberg-delta-lake-hudi&quot;&gt;Object storage connectors (Hive, Iceberg, Delta Lake, Hudi)&lt;/h3&gt;

&lt;p&gt;One of the most common uses for Trino is as a data lakehouse query engine.
This year we not only added two connectors in this category, but also delivered
performance improvements across the board through file reader and writer
work.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Earlier this year, we added the &lt;a href=&quot;https://trino.io/docs/current/connector/delta-lake.html&quot;&gt;Delta Lake
 connector&lt;/a&gt; to finally
 reach everyone using Trino in the Delta Lake community. Delta Lake is a table
 format that improves on the Hive table format in areas like better support for
 ACID transactions. After the initial release, we added read and write support
 on Google Cloud Storage, added support for Databricks 10.4 LTS, and improved
 overall performance of the connector. To learn more about the Delta Lake
 connector, watch the &lt;a href=&quot;https://trino.io/episodes/34.html&quot;&gt;Trino Community Broadcast on Delta 
 Lake&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/connector/hudi.html&quot;&gt;The Hudi connector&lt;/a&gt; is a
 more recent addition, but it’s just as exciting. Hudi was created at Uber with
 the goal of handling real-time ingestion into a data lake. This connector is the
 youngest of these object storage connectors, so stay tuned to see
 more features land around this connector. See how Robinhood uses &lt;a href=&quot;https://trino.io/episodes/34.html&quot;&gt;Hudi and
 Trino in the Trino Community Broadcast&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;The Iceberg connector had a massive amount of improvements as well, bringing
 it to the same production-ready level as the Hive connector. Iceberg now has
 new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;expire_snapshots&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_orphan_files&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OPTIMIZE&lt;/code&gt; procedures.
 These capabilities, along with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt;, are the keys to being an
 effective lakehouse query engine. This year, Iceberg added support for the Glue
 metastore, the Avro file format, file-based access control, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt; and
 time travel syntax. Iceberg also received many performance improvements and
 lower latency when querying tables with many files.&lt;/li&gt;
  &lt;li&gt;Although it seems like Hive is gradually on its way out, many users
 still depend on the Hive connector being performant. Hive received support for
 S3 Select pushdown for JSON data, support for IBM Cloud Object Storage,
 improved performance when querying partitioned Hive tables, and the new
 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;flush_metadata_cache()&lt;/code&gt; procedure.&lt;/li&gt;
&lt;/ul&gt;
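&lt;p&gt;As a quick illustration of the Iceberg table maintenance commands mentioned
above, the following sketch assumes a hypothetical &lt;code&gt;example.orders&lt;/code&gt;
table; the retention threshold value is illustrative:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Compact small files into larger ones
ALTER TABLE example.orders EXECUTE optimize;

-- Remove snapshots older than seven days
ALTER TABLE example.orders EXECUTE expire_snapshots(retention_threshold =&gt; '7d');
&lt;/code&gt;&lt;/pre&gt;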

&lt;h3 id=&quot;other-connectors&quot;&gt;Other connectors&lt;/h3&gt;

&lt;p&gt;A major feature of Trino is the availability of connectors to query all
sorts of databases with SQL, all at the speed that Trino users are used to.
Here are some of the major improvements that landed for these connectors in 2022:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;A new MariaDB connector.&lt;/li&gt;
  &lt;li&gt;Performance improvements with various pushdowns in the MongoDB, MySQL, Oracle,
 PostgreSQL and SQL Server connectors.&lt;/li&gt;
  &lt;li&gt;Support for bulk data insertion in SQL Server connector.&lt;/li&gt;
  &lt;li&gt;Added a query passthrough table function to numerous connectors.&lt;/li&gt;
  &lt;li&gt;Expanded SQL features for various connectors by adding support for
 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TRUNCATE TABLE&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE&lt;/code&gt;/&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DROP&lt;/code&gt; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SCHEMA&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt;, and others.&lt;/li&gt;
  &lt;li&gt;Updated the Cassandra connector to support v5 and v6 protocols.&lt;/li&gt;
  &lt;li&gt;A collection of improvements to the Pinot and BigQuery connectors.&lt;/li&gt;
&lt;/ul&gt;
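&lt;p&gt;The query passthrough table function mentioned above hands a query directly
to the underlying data source for execution, which is useful for syntax Trino
does not support itself. A sketch, assuming a hypothetical PostgreSQL catalog
named &lt;code&gt;example&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SELECT *
FROM TABLE(
    example.system.query(
        query =&gt; 'SELECT id, name FROM public.users'));
&lt;/code&gt;&lt;/pre&gt;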

&lt;h3 id=&quot;bug-fixes&quot;&gt;Bug fixes&lt;/h3&gt;

&lt;p&gt;Any software includes issues and bugs, and Trino is no exception. Thanks to our
community we learned about many of them, and fixed even more. Continue to test new
releases and report issues. Check out &lt;a href=&quot;https://trino.io/docs/current/release.html#releases-2022&quot;&gt;all the release notes for
details&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;updates-in-the-trino-ecosystem&quot;&gt;Updates in the Trino ecosystem&lt;/h2&gt;

&lt;p&gt;Outside of the excitement within the main Trino project, there was a great deal
going on in the larger Trino community and ecosystem:&lt;/p&gt;

&lt;h3 id=&quot;trino-the-definitive-guide-second-edition&quot;&gt;Trino: The Definitive Guide second edition&lt;/h3&gt;

&lt;p&gt;Martin, Manfred, and Matt released the &lt;a href=&quot;/blog/2022/10/03/the-definitive-guide-2.html&quot;&gt;second edition of Trino: The Definitive
Guide&lt;/a&gt;. This update of the
book from O’Reilly fixed errata, expanded the deployment coverage to include newer
Kubernetes installation methods, and documented the features added since the
first edition. Along with this, &lt;a href=&quot;https://simpligility.ca/2022/12/trino-guide-for-everyone-in-2023/&quot;&gt;efforts
are underway to translate this
book&lt;/a&gt; to
different languages. Huge thanks to everyone involved in this!&lt;/p&gt;

&lt;h3 id=&quot;starburst-provides-trino-in-the-cloud&quot;&gt;Starburst provides Trino in the cloud&lt;/h3&gt;

&lt;p&gt;As a major community supporter, &lt;a href=&quot;https://starburst.io/&quot;&gt;Starburst&lt;/a&gt; helped us
with events, marketing, developer relations, and partner cooperation. Starburst
also provided a large part of development and code contributions to Trino and
its related projects. Starburst acquired Varada and integrated the object
storage indexing technology, and they shipped many Starburst Enterprise releases
for self-managed deployments. On top of all that amazing work, Starburst
launched &lt;a href=&quot;https://www.starburst.io/platform/starburst-galaxy/&quot;&gt;Starburst Galaxy&lt;/a&gt;
as a powerful, multi-cloud SaaS offering of Trino. Security, cluster management,
a query editor, and many other features are included in this new platform.&lt;/p&gt;

&lt;h3 id=&quot;amazon-upgrades-athena&quot;&gt;Amazon upgrades Athena&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;/blog/2022/12/01/athena.html&quot;&gt;Athena version three rolled out&lt;/a&gt;
and is now based on a recent Trino release. This is great news for Athena users
who were missing the many performance gains, expanded SQL support, and other
features from Trino, since the prior versions are based on old Presto releases.
As a result, the large Athena community and their feedback and knowledge have
become more integrated with the Trino community, and we are seeing positive
impact for Trino releases already.&lt;/p&gt;

&lt;h3 id=&quot;dbt-trino&quot;&gt;dbt-trino&lt;/h3&gt;

&lt;p&gt;dbt users rejoice! The &lt;a href=&quot;https://docs.getdbt.com/reference/warehouse-setups/trino-setup&quot;&gt;official dbt-Trino
integration&lt;/a&gt;
made it into dbt this year! This means that anyone using dbt can now read and
write data to and from multiple data sources through Trino. If you want to dive into
it, &lt;a href=&quot;https://docs.starburst.io/blog/2022-11-30-dbt0-introduction.html&quot;&gt;check out this blog
post&lt;/a&gt; written
by the contributors of this integration.&lt;/p&gt;
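&lt;p&gt;Connecting dbt to Trino is configured through a profile. The following is a
minimal sketch with placeholder values; the exact options depend on your
dbt-trino version and authentication method, so consult the dbt-trino
documentation:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;my_project:
  target: dev
  outputs:
    dev:
      type: trino
      host: trino.example.com
      port: 8080
      user: example_user
      database: hive
      schema: analytics
      threads: 4
&lt;/code&gt;&lt;/pre&gt;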

&lt;h3 id=&quot;python-client-improvements&quot;&gt;Python client improvements&lt;/h3&gt;

&lt;p&gt;Development of the
&lt;a href=&quot;https://github.com/trinodb/trino-python-client&quot;&gt;trino-python-client&lt;/a&gt; doubled
this year. A major focus was performance improvements in the SQLAlchemy
integration, along with a wide range of bug fixes.&lt;/p&gt;

&lt;h3 id=&quot;airflow-integration&quot;&gt;Airflow integration&lt;/h3&gt;

&lt;p&gt;The long-awaited &lt;a href=&quot;https://airflow.apache.org/docs/apache-airflow-providers-trino/stable/index.html&quot;&gt;Trino/Airflow
integration&lt;/a&gt;
landed this year. This paired well with the new task-retry and fault-tolerant
execution features. To learn more about the full capabilities of pairing Trino’s
new fault-tolerant execution mode with Airflow, check out &lt;a href=&quot;https://www.youtube.com/watch?v=xKDN7RUJ5i4&quot;&gt;Philippe Gagnon’s
talk at this year’s Trino Summit&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;metabase-driver&quot;&gt;Metabase driver&lt;/h3&gt;

&lt;p&gt;A lot of folks in the community were asking for a &lt;a href=&quot;https://github.com/metabase/metabase/issues/17532&quot;&gt;Trino/Metabase
driver&lt;/a&gt; after Trino updated
its name. This was a large blocker for anyone who wanted to move to Trino and
used Metabase. Through a collaboration between Metabase and Starburst engineers,
the &lt;a href=&quot;https://github.com/starburstdata/metabase-driver&quot;&gt;metabase-driver&lt;/a&gt; for
Trino was released, and we saw numerous users migrate to Trino.&lt;/p&gt;

&lt;h2 id=&quot;2023-roadmap&quot;&gt;2023 Roadmap&lt;/h2&gt;

&lt;p&gt;The upcoming roadmap was &lt;a href=&quot;https://youtu.be/mUq_h3oArp4?t=799&quot;&gt;covered in detail&lt;/a&gt;
by Martin at Trino Summit. To avoid extending this blog even further, we’ll
leave you with the featured project that covers many aspects of the Trino core
engine.&lt;/p&gt;

&lt;h3 id=&quot;project-hummingbird&quot;&gt;Project Hummingbird&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/14237&quot;&gt;Project Hummingbird&lt;/a&gt; aims to
improve Trino’s columnar and vectorized evaluation engine. Every year we report
on many incremental performance improvements. These improvements are typically
small in isolation but have a large aggregate impact. This incremental approach
is the real key to improving query engine performance, and there is always room
for further optimization. If you want to get involved with this exciting
project, or to learn about the latest innovations as they are being discussed,
join the #project-hummingbird channel in &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;the Trino Slack
workspace&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;2022 was by far the busiest year this bunny has seen. Trino has
continued growing as we’ve attracted more contributors. We believe this trend
will continue in 2023 as we begin to put more process in place around managing
pull requests. Remember to get the word out and share anything you genuinely
think is cool or important for others to hear! Looking forward to an even more
successful 2023 Trino nation!&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen, Manfred Moser, Cole Bowden, Martin Traverso </name>
        </author>
      

      <summary>It’s that time of the year where everyone gives excessively broad or niche predictions about the finance market, venture capital, or even the data industry. And we are now bombarded with “year-in-review” summaries where we find out just how much data is being collected to generate those summaries. End-of-year reflections are always useful because you can find patterns of what’s going well and what’s going poorly. It’s also good to pause and take stock of the things that did go well, because without that, you’ll only be looking at the list of things that you still have to do, and that isn’t healthy for anybody. In that spirit, let’s reflect on what we’ve been able to accomplish as a community this year, as well as what to look forward to in the next year!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/2022-review/cbb-reflection.png" />
      
    </entry>
  
    <entry>
      <title>Cleaning up the Trino pull request backlog</title>
      <link href="https://trino.io/blog/2023/01/09/cleaning-up-the-trino-backlog.html" rel="alternate" type="text/html" title="Cleaning up the Trino pull request backlog" />
      <published>2023-01-09T00:00:00+00:00</published>
      <updated>2023-01-09T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/01/09/cleaning-up-the-trino-backlog</id>
      <content type="html" xml:base="https://trino.io/blog/2023/01/09/cleaning-up-the-trino-backlog.html">&lt;p&gt;At some point in the lifecycle of a successful open source project, it reaches a
point where the number of incoming pull requests (PRs) outpaces the project’s
ability to get code merged. It happens for a huge variety of reasons, including
developers moving on to other projects before tying up every loose end,
reviewers who miss a request for review, and because some stagnant PRs were
never going to happen and should have been closed two years ago. The GitHub
notification system doesn’t do anyone any favors, either. Having too many open
PRs is a problem for a project, because they make it harder to tell what is
being worked on and what may as well be dead code walking.&lt;/p&gt;

&lt;p&gt;And when we cross 700 open pull requests in Trino, constantly adding a few more
to the pile every week, what do we do? We clean it up! Let’s talk about how
we’re doing it, why we’re doing it that way, and how we’re planning on
preventing this from happening again. The end result should be some process
improvements that make contributing to Trino a better, faster, and more painless
experience.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;spring-cleaning&quot;&gt;Spring cleaning&lt;/h2&gt;

&lt;p&gt;The “how” is an easy thing to talk about. The Trino developer relations team is
in the process of going through all open PRs, from oldest to newest, manually
taking a look at each one and checking in on how we may want to proceed. For PRs
that the author appears to have abandoned without responding to a review, we close
them, encouraging the authors to reopen them if they decide
they want to continue work. For everything else, though, we’ve been taking a
more measured approach, offering to help facilitate reviews or discussion for
these long-lasting bits of code that may still have a chance of making their way
into Trino.&lt;/p&gt;

&lt;p&gt;To anyone who’s managed a repository before, this may seem like more effort than
necessary. You can add a bot to close anything that’s been stale or inactive for
too long, and problem solved, right? Sure, that does solve the problem, but it
creates a couple of other problems.&lt;/p&gt;

&lt;p&gt;First, and perhaps most importantly: it’s not very human. Having a pull request
that you put time and effort into get shut down by a bot without having another
person swing by to say hello can be demoralizing, and it builds a negative
experience that might discourage future contributions to the project. We want
our contributors to like Trino and to enjoy the process of adding on to it, and
a GitHub bot slamming the door shut on their hard work isn’t going to help with
that. Having a bot do our work for us would also deprive us of a valuable
learning opportunity. Manually checking in on each pull request that slipped
through the cracks has allowed us to identify pain points in Trino code reviews
which we can try to mitigate moving forwards, and it’s provided a ton of
valuable insights for deciding on how to best improve the process.&lt;/p&gt;

&lt;p&gt;Second, and perhaps even more significant: there’s a lot of cool stuff we’d be
missing out on if we automatically closed everything. While going through the
backlog, we’ve found dozens of year-old pull requests that still have a lot of
value for Trino and only needed someone to take another look at them. For some,
the author may be missing, but the ideas are good and the PR can be handed off
to someone else to carry the torch and get it across the finish line. For
others, the author is still happy and ready to iterate on it, and all that’s
needed to get the ball rolling again is to ping a reviewer or two to take
another look. We’ve even found a couple PRs that were approved and ready to go,
and all it took was a simple click of the merge button. The effort-to-impact
ratio on that is off the charts - think of all the value we’d be missing out on
if we’d automatically closed those!&lt;/p&gt;

&lt;p&gt;The result of the effort so far has been excellent.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/backlog-blog/open-pull-requests-graph.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;We’re not completely done with the cleanup effort, but as you can see, we’re
slowing down. Our oldest PRs are increasingly recent, still in development,
and worth having open. Going from a peak of 700+ open pull requests to around
300 is a massive improvement, and the goal is to end up in the vicinity of about
200 open pull requests in Trino at any point in time.&lt;/p&gt;

&lt;h2 id=&quot;keeping-things-pristine&quot;&gt;Keeping things pristine&lt;/h2&gt;

&lt;p&gt;But with the cleanup being so manual, the next challenge is stopping the pull
requests from steadily piling back up while we’re not paying attention to them.
The fix for that is simple - we’re going to keep paying attention. The Trino
developer relations team is planning on tracking and getting involved in two
categories of pull requests to keep the number of open PRs stable.&lt;/p&gt;

&lt;p&gt;The first category is pull requests that don’t get any immediate attention from
a reviewer. While Trino reviewers are overall excellent and quick to take a look
at incoming pull requests, about five percent slip through the cracks, where a
contributor submits something that receives no reviews or comments and lives on
in the pull request backlog. That’s not a good experience for the contributor,
and it’s not good for Trino, either, because that contribution could have a lot
of value. We plan on stopping this from happening by implementing workflows
which spring Trino developer relations into action when these situations arise.
If a pull request goes a few days without a comment, we’ll be the safety net to
ask questions, get engineers involved, and make sure that at least a few pairs
of eyes take a look at every incoming PR in a timely manner.&lt;/p&gt;

&lt;p&gt;The second category is pull requests that get some reviews, but eventually
stagnate or stop being actively worked on. This happens for a lot of reasons,
but in all cases, if a pull request goes a few weeks with no activity, the
developer relations team will be checking in. Our goal will be to figure out the
proper path forward, whether that’s flagging down some reviewers again,
communicating that the pull request should be closed, or anything else. The end
result should be that nothing slips through the cracks and ends up going months
without human contact. If an author vanishes or everyone gets too busy to look
at a pull request again, though, the final stop will ultimately be a stale bot
which closes pull requests that have gone a few months with no activity.&lt;/p&gt;

&lt;p&gt;With all these processes in place, contributors should never feel like their
efforts are going unnoticed. Submitted code should be reviewed quickly,
iterated on in a timely manner, and merged without much delay. In situations
where a pull request is &lt;em&gt;not&lt;/em&gt; going to be merged, the Trino developer relations
team should be able to chime in quickly to make that clear, saving contributors
from wasting time and effort on a false impression that their code will be
landed. And if you have any questions, concerns, or suggestions about all of
this, don’t hesitate to reach out to us directly on the Trino Slack using
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@devrel-team&lt;/code&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Cole Bowden</name>
        </author>
      

      <summary>At some point in the lifecycle of a successful open source project, it reaches a point where the number of incoming pull requests (PRs) outpace the project’s ability to get code merged. It happens for a huge variety of reasons, including developers moving on to other projects before tying up every loose end, reviewers who miss a request for review, and because some stagnant PRs were never going to happen and should have been closed two years ago. The GitHub notification system doesn’t do anyone any favors, either. Having too many open PRs is a problem for a project, because they make it harder to tell what is being worked on and what may as well be dead code walking. And when we cross 700 open pull requests in Trino, constantly adding a few more to the pile every week, what do we do? We clean it up! Let’s talk about how we’re doing it, why we’re doing it that way, and how we’re planning on preventing this from happening again. The end result should be some process improvements that make contributing to Trino a better, faster, and more painless experience.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/backlog-blog/so-many-pull-requests.png" />
      
    </entry>
  
    <entry>
      <title>Using Trino to analyze a product-led growth (PLG) user activation funnel</title>
      <link href="https://trino.io/blog/2022/12/23/trino-summit-2022-upsolver-recap.html" rel="alternate" type="text/html" title="Using Trino to analyze a product-led growth (PLG) user activation funnel" />
      <published>2022-12-23T00:00:00+00:00</published>
      <updated>2022-12-23T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/12/23/trino-summit-2022-upsolver-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/12/23/trino-summit-2022-upsolver-recap.html">&lt;p&gt;As the holiday season approaches, we have reached the end of our
&lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;Trino Summit 2022 recap posts&lt;/a&gt;.
With the last talk of the summit, Mei Long from Upsolver gave an insightful
overview of how they use data to inform product decisions.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/MCB_1furnAo&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-summit-2022/Trino@Upsolver.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;When talking about product-led growth (PLG), it helps to start by defining what
it even means. The core idea is simple: see how users engage with your product,
and make decisions based on how you can improve the product to better serve
those users. At Upsolver, the goal of PLG is to maximize user value. The issue
is that while this can be simple in some situations, when you’re delivering
complicated analytics tools, it’s not always immediately clear what features
would be the most valuable or useful. You need a lot of data to glean a lot of
insight, and you need to make sure those insights can lead to action. And of
course, you need to be absolutely certain that your data is high-quality,
accurate, and trustworthy, lest you end up accidentally giving a customer a
ten million dollar discount.&lt;/p&gt;

&lt;p&gt;Mei explores the initial pass at using analytics to drive PLG at Upsolver,
letting her intern use a tool called Amplitude that worked for a time and for
limited use cases. As Upsolver grew, the analytics requirements did, too, and
Amplitude wasn’t powerful enough for Upsolver’s use case, nor for the more
complicated queries and analysis that needed to be run.&lt;/p&gt;

&lt;p&gt;Want to guess what query engine they swapped to using? Trino. Mei dives into a
quick demo that shows how Upsolver ingests all of its streaming data and stores
it for Trino to query, driving down time-to-insight to make it quick and
efficient to ask questions and make decisions based on those answers. With Trino
at the ready, Upsolver has never been better-equipped to work towards PLG.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, please consider sharing this on
Twitter, Reddit, LinkedIn, HackerNews or anywhere on the web. Use the social
card and link to &lt;a href=&quot;https://trino.io/blog/2022/12/23/trino-summit-2022-upsolver-recap.html&quot;&gt;https://trino.io/blog/2022/12/23/trino-summit-2022-upsolver-recap.html&lt;/a&gt;. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/upsolver-social.png&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Mei Long, Cole Bowden</name>
        </author>
      

      <summary>As the holiday season approaches, we have reached the end of our Trino Summit 2022 recap posts. With the last talk of the summit, Mei Long from Upsolver gave an insightful overview of how they use data to inform product decisions.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/upsolver.jpg" />
      
    </entry>
  
    <entry>
      <title>Using Trino with Apache Airflow for (almost) all your data problems</title>
      <link href="https://trino.io/blog/2022/12/21/trino-summit-2022-astronomer-recap.html" rel="alternate" type="text/html" title="Using Trino with Apache Airflow for (almost) all your data problems" />
      <published>2022-12-21T00:00:00+00:00</published>
      <updated>2022-12-21T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/12/21/trino-summit-2022-astronomer-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/12/21/trino-summit-2022-astronomer-recap.html">&lt;p&gt;As we close in on the final talks from &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;Trino Summit 2022&lt;/a&gt;, this next talk dives into how to set up
Trino for batch processing. Trino has historically been well-known for
facilitating fast adhoc analytics queries as opposed to long-running, resource
intensive batch/ETL queries. This is due to the fact that Trino kills queries
that run out of resources in order to prioritize faster query execution. Earlier
this year, Trino added features to better support batch queries with a new 
&lt;a href=&quot;https://trino.io/blog/2022/05/05/tardigrade-launch.html&quot;&gt;fault-tolerant execution mode&lt;/a&gt;.
This mode backs up intermediate data during execution time, allowing Trino to
restart individual query tasks on failure rather than a query stage or the query
itself.&lt;/p&gt;
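&lt;p&gt;As a sketch of what enabling this mode involves, fault-tolerant execution is
configured on the cluster along with an exchange manager that stores the spooled
intermediate data; the bucket name below is a placeholder:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# config.properties
retry-policy=TASK

# exchange-manager.properties
exchange-manager.name=filesystem
exchange.base-directories=s3://example-exchange-spooling
&lt;/code&gt;&lt;/pre&gt;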

&lt;p&gt;Batch queries don’t typically involve human intervention and run asynchronously.
These tasks may depend on each other and have a complex workflow. This talk
describes how to orchestrate this complexity using Airflow’s new Trino
integration to run Trino batch queries to solve (almost) all your data problems.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/xKDN7RUJ5i4&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-summit-2022/Trino@Astronomer.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;In this talk, we’re going to hear from Philippe, a Trino contributor and
Solutions Architect at Astronomer, the company building a SaaS product around
Apache Airflow. Philippe describes a fictional trading scenario that initially
follows a traditional warehousing approach to storing data. This architecture
has data sources that are queried and submitted as raw data into a centralized
warehouse. Within the warehouse itself, the raw data is transformed into data
ready to be consumed.&lt;/p&gt;

&lt;p&gt;This model enforces centralization, in which one team runs the platform and
builds the integration between producers and consumers. This team focuses on the
technical aspects of the data platform, which further separates it from the
business use case. As source databases evolve, the central data team must keep up with these
changes. As the data consumers that rely on the data infrastructure grow, this
team commonly becomes a bottleneck.&lt;/p&gt;

&lt;p&gt;Trino allows you to move the queries as close as possible to the federated data
sources, removing the labor-intensive process of moving data into stages
before ingesting it into a central warehouse. This doesn’t mean that data
movement is no longer a necessity, but the necessity shifts from an availability
concern to a performance and scalability concern.&lt;/p&gt;

&lt;p&gt;Without investing into more resources, your data professionals are able to work
closely with producers and stakeholders with a shared understanding of the
domain. This increases data literacy and data availability throughout your
organization.&lt;/p&gt;

&lt;p&gt;Trino is not only for fast adhoc analytics with a human in the loop, but now
provides a fault-tolerant execution mode that enables it to run resource-intensive
batch jobs. This, paired with the federation capabilities, makes Trino
able to ingest any data that can be represented in a tabular format. Users can
implement user-defined functions and run transformations using SQL without
involving intermediate systems.&lt;/p&gt;

&lt;p&gt;Running Trino batch queries at scale requires building complex interdependencies
between different tasks and monitoring for any failures that occur. This
configuration also demands reactive automation to handle failing instances.
Apache Airflow is an open-source platform for developing, scheduling, and
monitoring batch-oriented workflows on systems like Trino, making it a perfect
complement for handling these intensive queries at scale.&lt;/p&gt;

&lt;p&gt;Even before introducing fault-tolerant execution mode, &lt;a href=&quot;https://engineering.salesforce.com/how-to-etl-at-petabyte-scale-with-trino-5fe8ac134e36/&quot;&gt;Trino was already being
used to run batch queries at scale&lt;/a&gt;.
In these scenarios, Trino and a tool like Airflow already work well together
because these jobs will take time and likely nobody wants to wait around to run
the pipeline components in sequence. Fault-tolerant execution mode brings the
Trino and Airflow combination to the forefront because Trino is expected to be
adopted as a batch query engine as the learning curve to run ETL jobs on Trino
becomes as gentle as that of other tools in the space.&lt;/p&gt;

&lt;p&gt;Philippe dives into building out basic Airflow jobs to run over Trino and
introduces the concept of a directed acyclic graph (DAG). He then explores
multiple useful features that help break down large jobs into manageable tasks,
and jobs that can adjust the schedule based on runtime execution. Sharded job
creation splits large batch jobs into smaller tasks that can easily be retried.
Dynamic task mapping splits jobs into smaller tasks based on data observed at
runtime. Finally, a new feature called data-aware scheduling can schedule tasks
based on interdependencies between datasets.&lt;/p&gt;
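
&lt;p&gt;The sharding idea can be sketched in plain Python. This is not the Airflow
API, just an illustration of how a large backfill window is cut into
independently retryable date-range shards, the same kind of list a dynamically
mapped task would expand over:&lt;/p&gt;

```python
from datetime import date, timedelta

def shard_backfill(start, end, days_per_shard=1):
    # Cut the backfill window from start to end into small date-range shards.
    # Each shard becomes one task that can be retried on its own.
    total_days = (end - start).days + 1
    shards = []
    for offset in range(0, total_days, days_per_shard):
        shard_start = start + timedelta(days=offset)
        shard_end = min(shard_start + timedelta(days=days_per_shard - 1), end)
        shards.append((shard_start, shard_end))
    return shards

# Ten days of data becomes ten single-day shards.
shards = shard_backfill(date(2022, 12, 1), date(2022, 12, 10))
```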

&lt;p&gt;To get started with Trino in Apache Airflow, check out the
&lt;a href=&quot;https://airflow.apache.org/docs/apache-airflow-providers-trino/stable/index.html&quot;&gt;Airflow Trino provider documentation&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, please consider sharing this on
Twitter, Reddit, LinkedIn, HackerNews or anywhere on the web. Use the social
card and link to &lt;a href=&quot;https://trino.io/blog/2022/12/21/trino-summit-2022-astronomer-recap.html&quot;&gt;https://trino.io/blog/2022/12/21/trino-summit-2022-astronomer-recap.html&lt;/a&gt;. If you think Trino is awesome, 
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/astronomer-social.png&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Philippe Gagnon, Brian Olsen</name>
        </author>
      

      <summary>As we close in on the final talks from Trino Summit 2022, this next talk dives into how to set up Trino for batch processing. Trino has historically been well-known for facilitating fast adhoc analytics queries as opposed to long-running, resource intensive batch/ETL queries. This is due to the fact that Trino kills queries that run out of resources in order to prioritize faster query execution. Earlier this year, Trino added features to better support batch queries with a new fault-tolerant execution mode. This mode backs up intermediate data during execution time, allowing Trino to restart individual query tasks on failure rather than a query stage or the query itself. Batch queries don’t typically involve human intervention and run asynchronously. These tasks may depend on each other and have a complex workflow. This talk describes how to orchestrate this complexity using Airflow’s new Trino integration to run Trino batch queries to solve (almost) all your data problems.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/astronomer.jpg" />
      
    </entry>
  
    <entry>
      <title>Journey to Iceberg with Trino</title>
      <link href="https://trino.io/blog/2022/12/19/trino-summit-2022-sk-telecom-recap.html" rel="alternate" type="text/html" title="Journey to Iceberg with Trino" />
      <published>2022-12-19T00:00:00+00:00</published>
      <updated>2022-12-19T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/12/19/trino-summit-2022-sk-telecom-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/12/19/trino-summit-2022-sk-telecom-recap.html">&lt;p&gt;This post comes from &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;the second half of the Trino Summit 2022 sessions&lt;/a&gt;. Our friends JaeChang and Jennifer from
SK Telecom traveled across the globe from South Korea to join us in person! SK
Telecom recently had some issues scaling Trino on the Hive model, among other
issues that come with Hive. While some initial tweaking helped speed things up,
it ultimately never solved the problem. After switching to Iceberg, SK Telecom
ran initial performance tests with some very impressive results. In this talk,
Jennifer and JaeChang describe their journey to Iceberg with Trino.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/V9_aPLXATh8&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-summit-2022/Trino@SK-Telecom.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;SK Telecom is a South Korean telecom company that has built and operated an
on-premise data platform based on open source software to determine
manufacturing yield since 2015. SK Telecom’s goal has always been to build an observable
federated data platform on open source software at scale.&lt;/p&gt;

&lt;p&gt;SK Telecom manages on-premise Hadoop clusters to store their data. Previously,
they used tools like
&lt;a href=&quot;https://hadoop.apache.org/docs/stable/hadoop-distcp/DistCp.html&quot;&gt;distcp&lt;/a&gt; to
make data available in one center. SK Telecom started using Presto in 2016 and
shifted to Trino in 2021. To run batch queries on their warehouse, Trino workers
are deployed on HDFS data nodes. There is also an adhoc Trino cluster deployed
to manage federated queries over multiple data silos from an array of disparate
data sources. This was one of the slow and brittle processes that Trino
replaced. They chose Trino because it simplifies querying novel big data systems
and combines that data with data from more commonplace systems for their users.&lt;/p&gt;

&lt;p&gt;As Trino adoption within the company grew to 300 requests per minute, they
eventually faced challenges with scaling. Not only were the number of
requests growing, but the range of data being queried grew as well; users were
evaluating petabytes of data, with terabyte-sized query input processed across
hundreds of nodes. Many user queries were blocked while waiting for resources to
become available. In response, the data engineering team began investigating how
they could both scale and improve individual query performance.&lt;/p&gt;

&lt;p&gt;To find the root cause, SK Telecom’s data engineers investigated cluster
behavior beyond what was exposed in the web UI. They began collecting all the
query plan JSON files, coordinator and worker JMX stats, system metrics, and
Trino logs to build out their own metrics dashboard. The two main
causes were that input data was too large, and there were spikes in the number
of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BlockedSplit&lt;/code&gt; operations leading to queries being blocked while waiting for
other tasks to complete. They initially aimed to address this by increasing
thread counts and tuning related settings, but these changes
still didn’t achieve the desired results. The ultimate bottleneck was the Hive
metastore and the expensive list operations that caused many of the blocking
operations to finish slowly.&lt;/p&gt;

&lt;p&gt;At this point, the team reevaluated their needs to consider alternative
solutions. They needed a better indexing strategy on the data with a flexible
partitioning strategy. They also needed to remove the bottleneck on the metadata
for this data while still maintaining compatibility across multiple query
engines as Hive did.&lt;/p&gt;

&lt;p&gt;The team looked at the existing set of novel data lake connectors available in
Trino version 356, which at the time only included Iceberg. SK Telecom was 
immediately impressed by the metadata indexing in the Iceberg project. They 
particularly liked Iceberg’s snapshot isolation as data is created or modified.
They were able to speed up queries using data file pruning on partition and
column stats stored in the manifest file.&lt;/p&gt;

&lt;p&gt;After running a benchmark, the team found that Iceberg reduced the input data
size from hundreds of gigabytes down to under ten. They also
investigated adding a large number of partitions to continue lowering the input
data, but found that there’s a tradeoff where creating too many partitions
increases query planning time. Ultimately, they found a sweet spot where the
input data size was around six gigabytes and planning only took 70 milliseconds.&lt;/p&gt;
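
&lt;p&gt;For readers trying this themselves, partitioning is declared as a table
property in the Trino Iceberg connector. The catalog, schema, and column names
below are hypothetical:&lt;/p&gt;

```sql
-- Partition by day so file pruning can skip data outside the queried range;
-- avoid over-partitioning, which inflates query planning time.
CREATE TABLE iceberg.example_schema.events (
    event_time timestamp(6),
    device_id varchar,
    payload varchar
)
WITH (
    partitioning = ARRAY['day(event_time)']
);
```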

&lt;p&gt;This summary is just the tip of the iceberg of all the information JaeChang and
Jennifer shared with us about how Iceberg helped SK Telecom with their Trino
scaling issues. Watch this incredible talk to learn more if you’re considering
taking the leap from Hive to Iceberg!&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, please consider sharing this on
Twitter, Reddit, LinkedIn, HackerNews or anywhere on the web. Use the social
card and link to &lt;a href=&quot;https://trino.io/blog/2022/12/19/trino-summit-2022-sk-telecom-recap.html&quot;&gt;https://trino.io/blog/2022/12/19/trino-summit-2022-sk-telecom-recap.html&lt;/a&gt;. If you think Trino is awesome, 
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/sk-telecom-social.png&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>JaeChang Song, Jennifer Oh, Brian Olsen</name>
        </author>
      

      <summary>This post comes from the second half of Trino Summit 2022 session. Our friends JaeChang and Jennifer from SK Telecom traveled across the globe from South Korea to join us in person! SK Telecom recently had some issues scaling Trino on the Hive model, among other issues that come with Hive. While some initial tweaking helped speed things up, it ultimately never solved the problem. After switching to Iceberg, SK Telecom ran initial performance tests with some very impressive results. In this talk, Jennifer and JaeChang describe their journey to Iceberg with Trino.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/sk-telecom.jpg" />
      
    </entry>
  
    <entry>
      <title>Trino at Quora: Speed, cost, reliability challenges, and tips</title>
      <link href="https://trino.io/blog/2022/12/16/trino-summit-2022-quora-recap.html" rel="alternate" type="text/html" title="Trino at Quora: Speed, cost, reliability challenges, and tips" />
      <published>2022-12-16T00:00:00+00:00</published>
      <updated>2022-12-16T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/12/16/trino-summit-2022-quora-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/12/16/trino-summit-2022-quora-recap.html">&lt;p&gt;As we near the end of the &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;Trino Summit 2022 recap series&lt;/a&gt;, it’s time to take a stop at Quora. At
Quora, being an engineer responsible for maintaining Trino comes with its fair
share of challenges. With concerns about cost, performance, and reliability,
Quora has taken several creative steps to ensure that they get the most out of
Trino. Other Trino users may be able to learn a few neat tips and tricks to
do the same by tuning in.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/Q03DzL_fm-I&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-summit-2022/Trino@Quora.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;Trino at Quora is used in the big ways that we’re all familiar with. It receives
queries from a variety of clients and services, then executes those queries
on an S3 data lake and Hive metastore to return results at high speeds. With a
wide variety of clients, Quora gets the most out of Trino, using it for ad-hoc
analysis, but also for ETL, backfill jobs, A/B testing, and time series queries.
But as with any large system being used for so many things, this isn’t without a
few challenges.&lt;/p&gt;

&lt;p&gt;The first challenge is a universal one - how can Quora keep the costs of running
Trino to a minimum? One of the biggest strategies was to migrate to AWS Graviton
instances to run Trino clusters, as they have proven to be more cost-efficient
than AMD- and Intel-based EC2 instances at Quora. Graviton does have lower
availability, though, so they sometimes must be complemented with some AMD/Intel
instances in order to avoid any downtime. Auto-scaling also led to great cost
savings, as the workloads varied based on time of day. By tracking usage,
ramping up the number of machines during the busy workday, and ramping back
down when fewer jobs are in progress, Quora was able to minimize
idle machines and cut back on unnecessary spending. Finally, and perhaps most
obviously, the team at Quora worked to make ETL queries more efficient. By using
partitions effectively and creating a tool to detect inefficient queries
scanning too many partition keys, the result is efficient queries that take less
time and use fewer resources, saving on cost.&lt;/p&gt;

&lt;p&gt;Up next - how could Quora maximize Trino’s performance? With data analysts
expecting quick runtimes and occasionally running into problems, fine-tuning
Trino to run as well as it possibly can isn’t always an easy task. One
particular major issue they found at Quora was that some worker nodes which ran
for 24 hours or more straight would utilize less CPU and run slowly, bogging
things down. The fix? Gracefully restart worker nodes that run for over a day,
and implement a detector to flag and restart any nodes which showed signs of
behaving slowly.&lt;/p&gt;

&lt;p&gt;The final big concern at Quora is reliability, as users expect Trino to be up
and running whenever they need it. In one instance, they found that overwriting
a specific configuration option caused a cluster to crash repeatedly and
slow down to a crawl. The issue was that they’d steadily been bumping the value
of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.min-expire-age&lt;/code&gt; configuration property up and up and up from the
default value of 15 minutes, until eventually, unexpired query history was using
up too much memory and causing the cluster to falter. Lowering the value back
down to something more advisable saved the day in that situation. But wanting to
avoid similar situations from happening again, Quora built extensive monitoring
tools to track the health of their Trino clusters. They ensure that even when
user error does cause problems, those problems are flagged and alerts are sent
out, bringing the data engineering team to the rescue.&lt;/p&gt;
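
&lt;p&gt;For reference, the property in question lives in the coordinator’s
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;config.properties&lt;/code&gt;. A
value close to the default keeps completed-query history from accumulating in
coordinator memory:&lt;/p&gt;

```properties
# config.properties: minimum age before a completed query can be removed
# from coordinator memory (default 15m; very large values retain history
# and grow heap usage)
query.min-expire-age=15m
```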

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, please consider sharing this on
Twitter, Reddit, LinkedIn, HackerNews or anywhere on the web. Use the social
card and link to &lt;a href=&quot;https://trino.io/blog/2022/12/16/trino-summit-2022-quora-recap.html&quot;&gt;https://trino.io/blog/2022/12/16/trino-summit-2022-quora-recap.html&lt;/a&gt;. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/quora-social.jpg&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Yifan Pan, Cole Bowden</name>
        </author>
      

      <summary>As we near the end of the Trino Summit 2022 recap series, it’s time to take a stop at Quora. At Quora, being an engineer responsible for maintaining Trino comes with its fair share of challenges. With concerns about cost, performance, and reliability, Quora has taken several creative steps to ensure that they get the most out of Trino. Other Trino users may be able to learn a few neat tips and tricks to do the same by tuning in.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/quora.jpg" />
      
    </entry>
  
    <entry>
      <title>43: Trino saves trips with Alluxio</title>
      <link href="https://trino.io/episodes/43.html" rel="alternate" type="text/html" title="43: Trino saves trips with Alluxio" />
      <published>2022-12-15T00:00:00+00:00</published>
      <updated>2022-12-15T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/43</id>
      <content type="html" xml:base="https://trino.io/episodes/43.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Brian Olsen, Developer Advocate at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/bitsondatadev&quot;&gt;@bitsondatadev&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Manfred Moser, Director of Information Engineering at 
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Bin Fan, VP of Open Source at Alluxio and PMC maintainer of Alluxio open 
source and TSC member of Presto (&lt;a href=&quot;https://twitter.com/binfan&quot;&gt;@binfan&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/beinan/&quot;&gt;Beinan Wang&lt;/a&gt;, Software Engineer at 
Alluxio and Presto committer&lt;/li&gt;
&lt;/ul&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/43/alluxio-trino.jpeg&quot; /&gt;
&lt;br /&gt;
The Alluxio crew at Trino Summit 2022. &lt;br /&gt;
From left to right:
&lt;a href=&quot;https://www.linkedin.com/in/beinan/&quot;&gt;Beinan Wang&lt;/a&gt;,
&lt;a href=&quot;https://www.linkedin.com/in/bin-fan/&quot;&gt;Bin Fan&lt;/a&gt;,
&lt;a href=&quot;https://www.linkedin.com/in/bitsondatadev/&quot;&gt;Brian Olsen&lt;/a&gt;,
&lt;a href=&quot;https://www.linkedin.com/in/dennyglee/&quot;&gt;Denny Lee&lt;/a&gt;,
&lt;a href=&quot;https://www.linkedin.com/in/hopechong/&quot;&gt;Hope Wang&lt;/a&gt;,
&lt;a href=&quot;https://www.linkedin.com/in/jasminechenwang/&quot;&gt;Jasmine Wang&lt;/a&gt;.
&lt;br /&gt;
Somehow Denny Lee from &lt;a href=&quot;https://delta.io/&quot;&gt;Delta Lake&lt;/a&gt; snuck in there
😉. Love the data community vibes on this one.

&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-episode-data-caching-and-orchestration&quot;&gt;Concept of the episode: Data caching and orchestration&lt;/h2&gt;

&lt;p&gt;Out of all those petabytes of data you store, only a small fraction of it is
creating business value for you today. When you scan the same data multiple
times and transfer it over the wire, you’re wasting time, compute cycles, and
ultimately money. This gets worse when you’re pulling data across regions or
clouds from disaggregate Trino clusters. In situations like these, caching
solutions can make a tremendous impact on the latency and cost of your queries.&lt;/p&gt;

&lt;h3 id=&quot;trino-without-caching&quot;&gt;Trino without caching&lt;/h3&gt;

&lt;p&gt;There seems to be a sizeable portion of the community who aren’t using a
caching solution. Not all workloads will really benefit from caching. If you
are performing more writes than reads, the cache will need to constantly be
invalidated before performing each read. If you are scanning all your data to
run daily migrations, you would not benefit from caching. However, one of the
most common use cases where Trino shines is interactive adhoc analytics. This 
type of querying is very fast in Trino, especially when using modern storage 
formats like Iceberg.&lt;/p&gt;

&lt;h3 id=&quot;two-types-of-caching&quot;&gt;Two types of caching&lt;/h3&gt;

&lt;p&gt;There are two types of caching used with Trino. The first type caches the
results of a common query or subquery, so that any query whose predicates
overlap can reuse the cached results.&lt;/p&gt;

&lt;p&gt;The other type is file or object caching. Rather than storing the results of
the query, you are caching the files from a file or object store that are
scanned as part of the query.&lt;/p&gt;

&lt;p&gt;In this episode, we will focus on the latter type of caching. This will apply to
connectors like Hive, Iceberg, Delta Lake, and Hudi.&lt;/p&gt;
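
&lt;p&gt;A toy sketch in Python can make the distinction concrete. This is purely
illustrative and not how Trino, Rubix, or Alluxio are implemented: a result
cache is keyed by the query text, while a file cache is keyed by the storage
path of the objects the query scans:&lt;/p&gt;

```python
result_cache = {}
file_cache = {}

def run_query(sql, execute):
    # Result caching: an equivalent query reuses the computed result.
    key = ' '.join(sql.split()).lower()
    if key not in result_cache:
        result_cache[key] = execute(sql)
    return result_cache[key]

def read_object(path, fetch):
    # File/object caching: repeated scans reuse the fetched bytes,
    # no matter which query is doing the scanning.
    if path not in file_cache:
        file_cache[path] = fetch(path)
    return file_cache[path]

calls = []
run_query('SELECT 1', lambda sql: calls.append(sql) or [1])
result = run_query('select  1', lambda sql: calls.append(sql) or [1])
# The second, equivalent query is served from the result cache.
```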

&lt;h3 id=&quot;hive-connector-caching&quot;&gt;Hive connector caching&lt;/h3&gt;

&lt;p&gt;Trino has an &lt;a href=&quot;https://trino.io/docs/current/connector/hive-caching.html&quot;&gt;embedded caching engine&lt;/a&gt;
in the Hive connector. This is convenient as it ships with Trino, however, it 
does not work outside the Hive connector. The caching engine is 
&lt;a href=&quot;https://github.com/qubole/rubix&quot;&gt;Rubix&lt;/a&gt;. While this system works for simple
Hive use cases, it fails to address use cases outside of Hive and hasn’t been
maintained since 2020. There are many features missing like security features
and support for more compute engines.&lt;/p&gt;

&lt;h3 id=&quot;what-is-alluxio&quot;&gt;What is Alluxio?&lt;/h3&gt;

&lt;p&gt;Alluxio is the world’s first open source data orchestration technology for
analytics and AI in the cloud. It provides a common interface that enables
computation frameworks to connect to numerous storage systems.
Alluxio’s memory-first tiered architecture enables data access at speeds orders
of magnitude faster than existing solutions. Alluxio was originally developed at
the Berkeley AMPLab, &lt;a href=&quot;https://amplab.cs.berkeley.edu/wp-content/uploads/2014/11/2014_socc_tachyon.pdf&quot;&gt;and was originally called Tachyon&lt;/a&gt;.
It was less focused on caching and data orchestration and more focused on
fault-tolerance via lineage and other techniques borrowed from Spark.&lt;/p&gt;

&lt;p&gt;Alluxio lies between data driven applications, such as Trino and Apache Spark,
and various persistent storage systems, such as Amazon S3, Google Cloud Storage,
HDFS, Ceph, and MinIO. Alluxio unifies the data stored in these different
storage systems, presenting unified client APIs and a global namespace to its
upper layer data driven applications.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/43/alluxio-architecture.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Alluxio is commonly used as a distributed shared caching service so compute
engines talking to Alluxio can transparently cache frequently accessed data,
especially from remote locations, to provide in-memory I/O throughput. Alluxio
also enables unifying all data storage under a single namespace. This can make
things simpler if your data is stored across different systems, different
regions, or different clouds.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/43/inside-alluxio.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Source: &lt;a href=&quot;https://docs.alluxio.io/os/user/stable/en/Overview.html&quot;&gt;https://docs.alluxio.io/os/user/stable/en/Overview.html&lt;/a&gt;&lt;/p&gt;

&lt;h3 id=&quot;what-is-data-orchestration&quot;&gt;What is data orchestration?&lt;/h3&gt;

&lt;p&gt;A data orchestration platform abstracts data access across storage systems,
virtualizes all the data, and presents the data via standardized APIs with
global namespace to data-driven applications. At the same time, it should have
caching functionality to enable fast access to warm data. In summary, a data
orchestration platform provides data-driven applications data accessibility,
data locality, and data elasticity.&lt;/p&gt;

&lt;p&gt;Source: &lt;a href=&quot;https://www.alluxio.io/blog/data-orchestration-the-missing-piece-in-the-data-world/&quot;&gt;https://www.alluxio.io/blog/data-orchestration-the-missing-piece-in-the-data-world/&lt;/a&gt;&lt;/p&gt;

&lt;h3 id=&quot;trino-and-alluxio-expedia-use-case&quot;&gt;Trino and Alluxio: Expedia use case&lt;/h3&gt;

&lt;p&gt;Expedia needed to have the ability to query cross cluster over different regions
while simplifying the interface to their local data sources.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;100%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/43/expedia-trino-alluxio.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Source: &lt;a href=&quot;https://www.alluxio.io/blog/unifying-cross-region-access-in-the-cloud-at-expedia-group-the-path-toward-data-mesh-in-the-brand-world/&quot;&gt;Unifying cross-region access in the cloud at Expedia Group — The path toward data mesh in the brand world&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-episode-alluxioalluxio-pr-13000-add-a-doc-for-trino&quot;&gt;PR of the episode: Alluxio/alluxio PR 13000 Add a doc for Trino&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/Alluxio/alluxio/pull/13000&quot;&gt;This episode’s PR&lt;/a&gt; is actually
not located in a Trino repository. This PR comes from the Alluxio repository. It
happened in the wake of the rebranding from Presto to Trino. PRs like this
helped the Trino community grow awareness around the new name, as well as fix
any issues that occurred with the hasty renaming we had to do.&lt;/p&gt;

&lt;p&gt;This was submitted by Alluxio engineer, &lt;a href=&quot;https://github.com/yuzhu&quot;&gt;David Zhu&lt;/a&gt;.
A huge thanks to David and his contributions to Trino as well!&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-episode-running-trino-on-alluxio&quot;&gt;Demo of the episode: Running Trino on Alluxio&lt;/h2&gt;

&lt;p&gt;This demo of the episode covers how to configure Alluxio to use write-through
caching to MinIO. This is done using the Iceberg connector with only one change
to the location property on the table from the Trino perspective.&lt;/p&gt;
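
&lt;p&gt;The gist of the change, sketched with hypothetical catalog, schema, table,
and Alluxio master names, is that the table’s
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;location&lt;/code&gt; property
points at Alluxio instead of directly at MinIO, and Alluxio writes the data
through to the underlying store:&lt;/p&gt;

```sql
-- Hypothetical names; the location now routes reads and writes through Alluxio
CREATE TABLE iceberg.lakehouse.events (
    id bigint,
    name varchar
)
WITH (
    location = 'alluxio://alluxio-master:19998/lakehouse/events'
);
```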

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/yaxPEWRpEzc&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;To follow this demo, copy the code located in the 
&lt;a href=&quot;https://github.com/bitsondatadev/trino-getting-started/tree/main/community_tutorials/alluxio/trino-alluxio-iceberg-minio&quot;&gt;trino-getting-started repo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Check out the in-person and virtual
&lt;a href=&quot;https://www.meetup.com/pro/trino-community/&quot;&gt;Trino Meetup groups&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>Federating them all on Starburst Galaxy</title>
      <link href="https://trino.io/blog/2022/12/14/trino-summit-2022-starburst-recap.html" rel="alternate" type="text/html" title="Federating them all on Starburst Galaxy" />
      <published>2022-12-14T00:00:00+00:00</published>
      <updated>2022-12-14T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/12/14/trino-summit-2022-starburst-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/12/14/trino-summit-2022-starburst-recap.html">&lt;p&gt;As the &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;Trino Summit 2022 recap post series&lt;/a&gt; continues on, I have been reading all the
wonderful posts by our awesome speakers, facilitated by the Trino developer
relations team. Because I have a perpetual fear of missing out, I convinced them
that I should get in on the fun. For this latest installment in the series, I
will be recapping my very own Trino Summit talk. Basically, I’m ripping off
Bo Burnham’s comedy bit where he &lt;a href=&quot;https://youtu.be/FZVMB8mrNO0?t=35&quot;&gt;reacts to his own reaction video&lt;/a&gt;,
blog style.&lt;/p&gt;

&lt;p&gt;In this session, I demonstrate building a data lakehouse architecture with
&lt;a href=&quot;https://www.starburst.io/platform/starburst-galaxy/&quot;&gt;Starburst Galaxy&lt;/a&gt;, the
fastest and easiest way to get up and running with Trino.
Before I dive into the recap, I want to thank the Trino community for showing
up. I am grateful that I was able to meet and learn from so many members of the
community in person.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/Zfmxwu0m98k&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;The premise of this example is that we have Pokémon Go data being ingested into
S3, which contains each Pokémon’s encounter information. This includes the
geo-location data of where each Pokémon spawned, and how long the Pokémon could
be found at that location. What we don’t have is any
information on that Pokémon’s abilities. That information is contained in the
Pokédex stored in MongoDB which I’ve cleverly nicknamed &lt;strong&gt;PokéMongoDB&lt;/strong&gt;. It
includes data about all the Pokémon including type, legendary status,
catch rate, and more. To create meaningful insights from our data, we need
to combine the incoming geo-location data with the static dimension CSV table
located in MongoDB.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/starburst-architecture.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;To do this, I build out a reporting structure in the data lake using
Starburst Galaxy. The first step is to read the raw data stored in the land
layer, then clean and optimize that data into more performant ORC files in the
structure layer. Finally, I join the spawn data and Pokédex data together into a
single table that is cleaned and ready to be utilized by a data consumer.
Next I apply role-based access control capabilities within Starburst
Galaxy, which provides the proper data governance so that data consumers only
have read permissions on that final table. I then create some visualizations to
analyze which Pokémon are common to spawn in the San Francisco area.&lt;/p&gt;

&lt;p&gt;I walk through all the setup required to put this data lakehouse architecture
into action, including creating my catalogs, cluster, schemas, and tables. After
incorporating open table formats, applying native security, and building
out a reporting structure, I have confidence that my data lakehouse is built
to last, and end up with some really cool final Pokémon graphs.&lt;/p&gt;

&lt;h2 id=&quot;helpful-links&quot;&gt;Helpful links&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Sign up for &lt;a href=&quot;https://www.starburst.io/platform/starburst-galaxy/start/&quot;&gt;Starburst Galaxy&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Read the &lt;a href=&quot;https://docs.starburst.io/starburst-galaxy/index.html&quot;&gt;docs&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Try a
&lt;a href=&quot;https://docs.starburst.io/starburst-galaxy/tutorials/index.html&quot;&gt;tutorial&lt;/a&gt; for yourself&lt;/li&gt;
  &lt;li&gt;Register for &lt;a href=&quot;https://www.starburst.io/datanova/?utm_source=event&amp;amp;utm_medium=datanova&amp;amp;utm_campaign=[…]Event-Datanova-social-promo&amp;amp;utm_content=trinosummitrecapblog&quot;&gt;Datanova&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on
Twitter, Reddit, LinkedIn, HackerNews or anywhere on the web. Use the social
card and link to &lt;a href=&quot;https://trino.io/blog/2022/12/14/trino-summit-2022-starburst-recap.html&quot;&gt;https://trino.io/blog/2022/12/14/trino-summit-2022-starburst-recap.html&lt;/a&gt;. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/starburst-social.jpg&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Monica Miller</name>
        </author>
      

      <summary>As the Trino Summit 2022 recap post series continues on, I have been reading all the wonderful posts by our awesome speakers, facilitated by the Trino developer relations team. Because I have a perpetual fear of missing out, I convinced them that I should get in on the fun. For this latest installment in the series, I will be recapping my very own Trino Summit talk. Basically, I’m ripping off Bo Burnham’s comedy bit where he reacts to his own reaction video, blog style. In this session, I demonstrate building a data lakehouse architecture with Starburst Galaxy, the fastest and easiest way to get up and running with Trino. Before I dive into the recap, I want to thank the Trino community for showing up. I am grateful that I was able to meet and learn from so many members of the community in person.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/starburst.jpg" />
      
    </entry>
  
    <entry>
      <title>Trino for large scale ETL at Lyft</title>
      <link href="https://trino.io/blog/2022/12/12/trino-summit-2022-lyft-recap.html" rel="alternate" type="text/html" title="Trino for large scale ETL at Lyft" />
      <published>2022-12-12T00:00:00+00:00</published>
      <updated>2022-12-12T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/12/12/trino-summit-2022-lyft-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/12/12/trino-summit-2022-lyft-recap.html">&lt;p&gt;Buckle up for the next &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;post in the Trino Summit 2022 recap series&lt;/a&gt;. In this post, we’re covering the talk
given by Lyft engineers, Charles and Ritesh, on how they have not only scaled
Trino as adoption grew, but done so with fewer nodes and more effective usage.
They have also started using Trino more for ETL rather than just interactive
analytics. Get ready for a smooth ride as Lyft brings you large scale ETL with
Trino.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/FL3c1Ue7YWM&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-summit-2022/Trino@Lyft.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;Lyft uses Trino to perform ETL jobs reading 10 petabytes of data per day and
writing 100 terabytes per day. They run 250,000 queries per day, with around
2,000 unique users. This requires approximately 750 EC2 instances scaling up or
down with an autoscaler. Over 90 percent of queries complete within one to
three minutes.&lt;/p&gt;

&lt;p&gt;In the last year, Lyft cut their number of Trino nodes in half, while increasing
their workloads. This is possible due to recent improvements in Trino and
upgrades in Java versions. Lyft is not using fault-tolerant execution, but has
started seeing interest in using Trino for ETL jobs due to the faster
turnaround. Some issues Lyft has faced include how resource-hungry Trino
is, as well as the coordinator being a single point of failure
for queries executing on a cluster.&lt;/p&gt;

&lt;p&gt;Lyft was one of the earliest companies to really push using Trino for ETL use
cases. They built custom best-effort rollback code in Apache Airflow. If a query
fails, the operation reverts to the state before it began. Lyft runs
four Trino clusters split by the type of workload run on each cluster. Their best
practices include careful use of broadcast joins, query sharding, and scaling
writers for ETL loads.&lt;/p&gt;
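
&lt;p&gt;The best-effort rollback idea can be sketched roughly as follows. This is a
minimal, hypothetical illustration of the pattern, not Lyft’s actual Airflow
code; the class and function names are made up:&lt;/p&gt;

```python
# Toy model of best-effort rollback: snapshot state before an operation,
# and restore that snapshot if the operation fails. Names are illustrative.

class BestEffortRollback:
    def __init__(self, table):
        # 'table' stands in for mutable table state; real code would
        # snapshot partitions or metadata, not an in-memory dict.
        self.table = table
        self.snapshot = None

    def run(self, query_fn):
        # capture the state before the operation begins
        self.snapshot = dict(self.table)
        try:
            query_fn(self.table)
            return True
        except Exception:
            # best effort: revert to the pre-operation state
            self.table.clear()
            self.table.update(self.snapshot)
            return False
```

&lt;p&gt;A real implementation would snapshot table partitions or metastore entries
rather than an in-memory dict, but the control flow is the same.&lt;/p&gt;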

&lt;p&gt;One final point Lyft made is that keeping up with the rapid release cycle of
Trino was a challenge. Lyft showcases their regression testing using their query
replay framework. This session is a smooth five out of five ride. Enjoy!&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, please consider sharing this on
Twitter, Reddit, LinkedIn, HackerNews or anywhere on the web. Use the social
card and link to &lt;a href=&quot;https://trino.io/blog/2022/12/12/trino-summit-2022-lyft-recap.html&quot;&gt;https://trino.io/blog/2022/12/12/trino-summit-2022-lyft-recap.html&lt;/a&gt;. If you think Trino is awesome, 
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/lyft-social.png&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Charles Song, Ritesh Varyani, Brian Olsen</name>
        </author>
      

      <summary>Buckle up for the next post in the Trino Summit 2022 recap series. In this post, we’re covering the talk given by Lyft engineers, Charles and Ritesh, on how they have not only scaled Trino as adoption grew, but done so with fewer nodes and more effective usage. They have also started using Trino more for ETL rather than just interactive analytics. Get ready for a smooth ride as Lyft brings you large scale ETL with Trino.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/lyft.jpg" />
      
    </entry>
  
    <entry>
      <title>Rewriting History: Migrating petabytes of data to Apache Iceberg using Trino</title>
      <link href="https://trino.io/blog/2022/12/09/trino-summit-2022-shopify-recap.html" rel="alternate" type="text/html" title="Rewriting History: Migrating petabytes of data to Apache Iceberg using Trino" />
      <published>2022-12-09T00:00:00+00:00</published>
      <updated>2022-12-09T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/12/09/trino-summit-2022-shopify-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/12/09/trino-summit-2022-shopify-recap.html">&lt;p&gt;Rolling right along with another one of &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;our Trino Summit 2022 recap posts&lt;/a&gt;, we’re excited to bring you the engaging
talk from Marc Laforet at Shopify. He talked about the ordeal (or, if you look
at it in a positive light, the privilege) of migrating petabytes of data from
Hive to Iceberg table formats with the help of Trino. With details on why
Shopify chose to move to Iceberg, the various migration strategies that were
considered, and the ultimate process of moving all that data while the Trino
Iceberg connector was still in active development, it’s an insightful talk that
you don’t want to miss.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/nJBBw-xnLU8&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-summit-2022/Shopify@Trino.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;As with many other Trino users, it should come as no surprise that Shopify
has a lot of data to work with. First-party data comes in from a few different
sources, and there’s a mountain of modelled data to go along with it. In
Shopify’s case, one of the issues was that some data sets were built on top of
custom table formats. On top of that, the architecture wasn’t scaled with a
careful plan in mind, leading to limited interoperability of datasets among
various tools. With data scientists unable to unify data across different tools
and storages, it was time for a change.&lt;/p&gt;

&lt;p&gt;When you’ve got tons of data that isn’t currently in one place, what’s the fix?
Create a central lakehouse for all the data to be accessible from, a
single-service portal that could serve all users’ needs. The first question was
which table format to use, and if the title of the blog post didn’t already give
it away, they chose to go with Apache Iceberg. It was an easy, central vision
to work towards: all data in a centralized lakehouse stored in Iceberg, then
queryable by Trino.&lt;/p&gt;

&lt;p&gt;Having a plan and putting that plan into action are two different things,
though. When nothing is already in Iceberg, moving it all there is a migration
on the scale of thousands of tables and petabytes of data. In Marc’s words from
the talk, once Shopify committed to the migration and invested resources into
it, the realization was, “crap, now I have to build it.” Even worse, because the
old data was primarily in gzipped JSON format, it all needed to be rewritten…
and so it was.&lt;/p&gt;

&lt;p&gt;Then, enter Trino! With new Iceberg-based tables, Trino was identified as the
right tool for the job to process all that data. This wasn’t without snags, as
the migration happened while the Iceberg connector was still under active
development. There were a few incidents where Shopify hit
an issue, and an update or bugfix to Trino’s Iceberg connector solved
those problems in a matter of days or weeks.&lt;/p&gt;

&lt;p&gt;The result of all of this? Some incredible benchmark results. Large tables saw a
96% reduction in planning time, a 96% reduction in cumulative user memory, and a
95% reduction in query execution time. That’s going from thousands
of terabytes of memory to under 100, and from a query that took an hour to run
to one taking three minutes. For the absolute largest table at Shopify, some
queries saw a 99.9% reduction in execution time. Yes, that number is real.&lt;/p&gt;

&lt;p&gt;Moral of the story? If you find yourself using an old Hive table with outdated
file formats, lamenting the resources you need and the time it takes, the
decision is easy. Migrate to Iceberg with Trino. Shopify has shown us the way,
and the full talk has plenty of useful advice for how to best go about it.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on
Twitter, Reddit, LinkedIn, HackerNews or anywhere on the web. Use the social
card and link to &lt;a href=&quot;https://trino.io/blog/2022/12/09/trino-summit-2022-shopify-recap.html&quot;&gt;https://trino.io/blog/2022/12/09/trino-summit-2022-shopify-recap.html&lt;/a&gt;. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/shopify-social.png&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Marc Laforet, Cole Bowden</name>
        </author>
      

      <summary>Rolling right along with another one of our Trino Summit 2022 recap posts, we’re excited to bring you the engaging talk from Marc Laforet at Shopify. He talked about the ordeal (or, if you look at it in a positive light, the privilege) of migrating petabytes of data from Hive to Iceberg table formats with the help of Trino. With details on why Shopify chose to move to Iceberg, the various migration strategies that were considered, and the ultimate process of moving all that data while the Trino Iceberg connector was still in active development, it’s an insightful talk that you don’t want to miss.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/shopify.jpg" />
      
    </entry>
  
    <entry>
      <title>Elevating data fabric to data mesh: Solving data needs in hybrid data lakes</title>
      <link href="https://trino.io/blog/2022/12/07/trino-summit-2022-comcast-recap.html" rel="alternate" type="text/html" title="Elevating data fabric to data mesh: Solving data needs in hybrid data lakes" />
      <published>2022-12-07T00:00:00+00:00</published>
      <updated>2022-12-07T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/12/07/trino-summit-2022-comcast-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/12/07/trino-summit-2022-comcast-recap.html">&lt;p&gt;Tune in for the next &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;post in the Trino Summit 2022 recap series&lt;/a&gt;. In this post, we’re joining Saj from
Comcast to talk about their migration from data fabric to data mesh. Saj
shows you that there is more to the buzzword than meets the eye. He gives a
solid overview of why Comcast is taking data mesh to heart.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/sSWBi7bBotQ&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-summit-2022/Trino@Comcast.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;Comcast engineer Sajuman Joseph walks us through Comcast’s move from
its initial use case of using Trino to power a data fabric architecture to
including more governance features by leveraging Trino. Data fabric enables
querying data across distributed data sets, but importantly, it allows Comcast
to transparently migrate data across on-prem and cloud storage without impacting
users.&lt;/p&gt;

&lt;p&gt;Despite offering query federation, data fabric still lacks the
higher-quality experience that data mesh aims to provide. Access to the data
matters, but so do data quality checks and a dedicated
owner to ensure the data is correct and consumable. The ownership is split by
domains defined by Comcast. It is the responsibility of the owners to ensure
data quality, compliance, and security on the data they own. This data can be
exposed internally or externally as a data product. While many of the drivers
for this are done through company policy, there are technical means to make this
possible. This includes improving metadata on the data, access logs, global
data catalogs, and managing data access.&lt;/p&gt;

&lt;p&gt;Trino facilitates a single point of access and is a primary location where
policies are enforced. Comcast created an engine called the Enterprise Policy
Hub which syncs with all data stores and compute engines to enforce company
policy and update metadata on all data across Comcast. Trino, along with other
query engines, consults this engine to determine what information a user has
access to and who owns the data, and it creates an audit trail of the queries
that are run.&lt;/p&gt;
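
&lt;p&gt;As a toy model of that flow, the access check plus audit trail could look
like the sketch below. This is not Comcast’s actual Enterprise Policy Hub; all
names are illustrative:&lt;/p&gt;

```python
# Minimal sketch of a policy hub that query engines consult for access
# decisions and data ownership, recording every decision as an audit entry.

class PolicyHub:
    def __init__(self):
        self.grants = {}     # user -> set of tables the user may read
        self.owners = {}     # table -> owning domain
        self.audit_log = []  # (user, table, allowed) tuples

    def grant(self, user, table, owner):
        # register an access grant and the table's owning domain
        self.grants.setdefault(user, set()).add(table)
        self.owners[table] = owner

    def check_access(self, user, table):
        allowed = table in self.grants.get(user, set())
        # every decision is recorded, producing an audit trail
        self.audit_log.append((user, table, allowed))
        return allowed
```

&lt;p&gt;A query engine would call something like &lt;code&gt;check_access&lt;/code&gt; before
planning a query, and the audit log would feed compliance reporting.&lt;/p&gt;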

&lt;p&gt;There are still some open challenges Comcast is looking to overcome. Data
discovery is a large challenge for anyone looking to find a specific table and
figure out who is responsible for updating it. Another interesting area Comcast
is researching is automating the retention and minimization of data copies.
This talk is exciting and gives a pretty clear roadmap to some beneficial
changes many teams can make to improve the quality and governance of their data
sets.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. Use the social card and
link to &lt;a href=&quot;https://trino.io/blog/2022/12/07/trino-summit-2022-comcast-recap.html&quot;&gt;https://trino.io/blog/2022/12/07/trino-summit-2022-comcast-recap.html&lt;/a&gt;. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/comcast-social.jpg&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Sajuman Joseph, Brian Olsen</name>
        </author>
      

      <summary>Tune in for the next post in the Trino Summit 2022 recap series. In this post, we’re joining Saj from Comcast to talk about their migration from data fabric to data mesh. Saj shows you that there is more to the buzzword than meets the eye. He gives a solid overview of why Comcast is taking data mesh to heart.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/comcast.jpg" />
      
    </entry>
  
    <entry>
      <title>AI and poetry meet Trino</title>
      <link href="https://trino.io/blog/2022/12/06/poetry-ai.html" rel="alternate" type="text/html" title="AI and poetry meet Trino" />
      <published>2022-12-06T00:00:00+00:00</published>
      <updated>2022-12-06T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/12/06/poetry-ai</id>
      <content type="html" xml:base="https://trino.io/blog/2022/12/06/poetry-ai.html">&lt;p&gt;Sometimes when working in the depth of the query engine core, the planner, the
optimizer, or some other tricky problem or code, you run out of steam and need a
distraction. Your brain will quietly work on the problem in the background while
you get to read or do something interesting.&lt;/p&gt;

&lt;p&gt;During one break we found out more about GPT-3 and the recent improvements on
the OpenAI project. So we had to try.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;It’s pretty amazing that you can feed the AI a prompt just like we did:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Write a short rhyming poem explaining Trino in easy but accurate terms
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And after a few trials, the best one produced by GPT-3 was this:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Trino is a tech platform,&lt;br /&gt;
That makes data integration a breeze,&lt;br /&gt;
It’s designed to help your business grow,&lt;br /&gt;
And optimise your systems with ease.&lt;br /&gt;
&lt;br /&gt;
It links together different sources,&lt;br /&gt;
Both big and small,&lt;br /&gt;
Gathers data of different types,&lt;br /&gt;
Helping you make decisions for all.&lt;br /&gt;
&lt;br /&gt;
So Trino is an integration tool,&lt;br /&gt;
That connects data for you,&lt;br /&gt;
It’ll ease the burden of data sync,&lt;br /&gt;
Giving you more time to do what you do.&lt;br /&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Cool, huh? Have a look at another prompt:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Write a short rhyming poem explaining Trino, the SQL query engine in complex
and accurate terms.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And the compelling result:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Trino is a query engine that does more than just run,&lt;br /&gt;
It’s a SQL engine with performance that stuns.&lt;br /&gt;
It is secure, efficient, and kind,&lt;br /&gt;
Providing real-time analytics of any kind.&lt;br /&gt;
&lt;br /&gt;
It’s a distributed system that’s built to scale,&lt;br /&gt;
Processing queries that never fail.&lt;br /&gt;
Connecting to myriad data sources,&lt;br /&gt;
It can quickly analyze any course.&lt;br /&gt;
&lt;br /&gt;
It’s flexible, reliable, and fast,&lt;br /&gt;
And is designed to make your queries last.&lt;br /&gt;
It can handle large volumes of data with ease,&lt;br /&gt;
Making the most complex queries a breeze.&lt;br /&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you are interested in learning more, you can look at the
&lt;a href=&quot;https://arstechnica.com/information-technology/2022/11/openai-conquers-rhyming-poetry-with-new-gpt-3-update/&quot;&gt;blog post on Ars Technica&lt;/a&gt;
or go and try it yourself &lt;a href=&quot;https://beta.openai.com/playground/&quot;&gt;on the playground&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Enjoy while we are heading back to &lt;a href=&quot;https://github.com/trinodb/trino/pulls&quot;&gt;working on Trino pull
requests&lt;/a&gt; and other code now.&lt;/p&gt;

&lt;p&gt;Martin and Marcos&lt;/p&gt;</content>

      
        <author>
          <name>Martin Traverso, Marcos Traverso</name>
        </author>
      

      <summary>Sometimes when working in the depth of the query engine core, the planner, the optimizer, or some other tricky problem or code, you run out of steam and need a distraction. Your brain will quietly work on the problem in the background while you get to read or do something interesting. During one break we found out more about GPT-3 and the recent improvements on the OpenAI project. So we had to try.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/images/graphics/trino-openapi-header.png" />
      
    </entry>
  
    <entry>
      <title>Leveraging Trino to power data at Goldman Sachs</title>
      <link href="https://trino.io/blog/2022/12/05/trino-summit-2022-goldman-sachs-recap.html" rel="alternate" type="text/html" title="Leveraging Trino to power data at Goldman Sachs" />
      <published>2022-12-05T00:00:00+00:00</published>
      <updated>2022-12-05T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/12/05/trino-summit-2022-goldman-sachs-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/12/05/trino-summit-2022-goldman-sachs-recap.html">&lt;p&gt;Continuing with &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;the Trino Summit 2022 sessions posts&lt;/a&gt;, we’re diving into an insightful
lightning talk from &lt;a href=&quot;https://www.goldmansachs.com&quot;&gt;Goldman Sachs&lt;/a&gt;. They explore
how they use Trino to help ensure data quality across the board for all users
and customers. By using Trino to federate their various data sources, querying
everything in one place provides them with the flexibility they need. With that
flexibility, they can validate that all data is as it should be where that data
lives, settling any concerns that may exist about data integrity.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/g9fLA3tFG-Q&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;Validating data quality can be a tricky and complicated process. Data resides
in many sources, with different rules and different processes for checking
quality. Goldman’s data ingestion team may not have a detailed understanding
of all data sets. Despite that, there is a need to autonomously verify and
validate all data to be confident in its quality and integrity. The solution to
this challenge? A queryable data quality platform powered by Trino.&lt;/p&gt;

&lt;p&gt;The underlying data quality platform’s logic handles the validation. Resting
on top of it is Trino, the scalable, fast solution to ensure that users can
query what they need. Even when the platform is profiling the data, enforcing
various quality rules, and validating the data in different ways, Trino is there
to provide access to everything contained within, proving that quality, speed,
and accessibility don’t need to be tradeoffs.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on
Twitter, Reddit, LinkedIn, HackerNews or anywhere on the web. Use the social
card and link to &lt;a href=&quot;https://trino.io/blog/2022/12/05/trino-summit-2022-goldman-sachs-recap.html&quot;&gt;https://trino.io/blog/2022/12/05/trino-summit-2022-goldman-sachs-recap.html&lt;/a&gt;. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/goldman-social.png&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Sumit Halder, Siddhant Chadha, Suman-Newton, Ramesh Bhanan, Cole Bowden</name>
        </author>
      

      <summary>Continuing with the Trino Summit 2022 sessions posts, we’re diving into an insightful lightning talk from Goldman Sachs. They explore how they use Trino to help ensure data quality across the board for all users and customers. By using Trino to federate their various data sources, querying everything in one place provides them with the flexibility they need. With that flexibility, they can validate that all data is as it should be where that data lives, settling any concerns that may exist about data integrity.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/goldman-sachs.png" />
      
    </entry>
  
    <entry>
      <title>Trino delivers for Amazon Athena</title>
      <link href="https://trino.io/blog/2022/12/01/athena.html" rel="alternate" type="text/html" title="Trino delivers for Amazon Athena" />
      <published>2022-12-01T00:00:00+00:00</published>
      <updated>2022-12-01T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/12/01/athena</id>
      <content type="html" xml:base="https://trino.io/blog/2022/12/01/athena.html">&lt;p&gt;Our community just keeps growing! Today, it is time to reach out and welcome
another large group of Trino users. The release of the new engine version for
&lt;a href=&quot;https://aws.amazon.com/athena&quot;&gt;Amazon Athena&lt;/a&gt; upgrades Athena from a
rather old version of Trino to a recent one. This update brings a ton of
improvements from the Trino project to the users of the popular cloud-based
query service.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;shared-history&quot;&gt;Shared history&lt;/h2&gt;

&lt;p&gt;Amazon Athena and Trino share a long history. From the beginning of Athena, the
query engine under the hood was Trino, then still called Presto. Athena created
a low-maintenance, powerful access mode to your data in S3 and beyond. It
combined the performance and features of Trino with the convenience of a cloud
service, which enabled new users and use cases. You could take advantage of
Trino without needing a team of experts to deploy and operate a Trino cluster
for your organization. In fact, we wrote about this in the first edition of
&lt;strong&gt;Trino: The Definitive Guide&lt;/strong&gt;. There is also a section in the &lt;a href=&quot;/blog/2022/10/03/the-definitive-guide-2.html&quot;&gt;new second
edition&lt;/a&gt; that you can get for
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;free from Starburst&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;time-flies&quot;&gt;Time flies&lt;/h2&gt;

&lt;p&gt;But since the initial release of Athena, time has not stood still. In fact, the
Trino project has accelerated in &lt;a href=&quot;/blog/2022/08/04/decade-innovation.html&quot;&gt;innovation, features, and releases
tremendously&lt;/a&gt;. Until now, Athena
users missed out on these improvements. However, with the update, Amazon Athena
users now get access to many of these great features. As &lt;a href=&quot;https://aws.amazon.com/about-aws/whats-new/2022/10/amazon-athena-announces-upgraded-query-engine/&quot;&gt;AWS mentions in the
announcement&lt;/a&gt;,
“over 50 new SQL functions, 30 new features, and more than 90 query performance
improvements” are now available due to the upgrade to a new version of Trino. These
include &lt;a href=&quot;/blog/2021/05/19/row_pattern_matching.html&quot;&gt;Row pattern recognition with MATCH_RECOGNIZE&lt;/a&gt;, &lt;a href=&quot;/blog/2021/03/10/introducing-new-window-features.html&quot;&gt;new window features&lt;/a&gt;, support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TRUNCATE&lt;/code&gt; statements, and many others.&lt;/p&gt;

&lt;p&gt;Performance improvements in our core engine and all the Trino connectors show up
in every release note. The &lt;a href=&quot;https://aws.amazon.com/blogs/big-data/upgrade-to-athena-engine-version-3-to-increase-query-performance-and-access-more-analytics-features/&quot;&gt;improvements observed by the Athena team in their
benchmarks&lt;/a&gt;
show the resulting gains nicely. This is great evidence that our approach of
constantly working on small improvements wherever we find potential works well.
This approach is necessary since Trino already operates at a very high
performance level, where, like an elite athlete, every small improvement matters.&lt;/p&gt;

&lt;p&gt;It is also important to note that these improvements are only in the Trino
version of the engine, since the &lt;a href=&quot;/blog/2022/08/02/leaving-facebook-meta-best-for-trino.html&quot;&gt;Presto project does not include these
features&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;client-tools-and-collaboration&quot;&gt;Client tools and collaboration&lt;/h2&gt;

&lt;p&gt;Athena users also benefit from improvements in supporting client tools such as
Python clients, dbt, Metabase, and others. Working with other communities is of
critical importance to the Trino project. The &lt;a href=&quot;https://trino.io/episodes/40.html&quot;&gt;innovations in our Iceberg
connector&lt;/a&gt;, which are all now also available to
Athena users, are a great example of how we can lead the way together. Working with
contributors from Amazon and other companies and projects has yielded some
amazing improvements. At the &lt;a href=&quot;https://trino.io/episodes/42.html&quot;&gt;Trino summit and contributor
congregation&lt;/a&gt;, we reconnected in person and
established even closer collaboration.&lt;/p&gt;

&lt;h2 id=&quot;looking-forward&quot;&gt;Looking forward&lt;/h2&gt;

&lt;p&gt;So, what is next for Trino and Athena users? First up, you should upgrade to the
new Trino engine in Athena, and avoid the legacy Presto engine.&lt;/p&gt;

&lt;p&gt;Second, check out some of the great presentations from &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;Trino Summit 2022&lt;/a&gt; and &lt;a href=&quot;https://trino.io/episodes/42.html&quot;&gt;hear about some of our
impressions&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;And last but not least, stay tuned for more goodness. Trino already shipped
further releases that included support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt;, table functions, and more
performance improvements. The Athena team is working hard on updating Trino for
your benefit regularly.&lt;/p&gt;

&lt;p&gt;Celebrating our &lt;a href=&quot;/blog/2022/09/12/tenth-birthday-celebration-recap.html&quot;&gt;first decade of the Trino project this last summer&lt;/a&gt; has shown a great trajectory for
the project and the community, and it looks like the next decade is going to be
even better!&lt;/p&gt;

&lt;p&gt;Sending a warm welcome from the Trino community to the Amazon Athena team and
users. Now you know that you were Trino users all along.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Martin and Manfred&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Martin Traverso</name>
        </author>
      

      <summary>Our community just keeps growing! Today, it is time to reach out and welcome another large group of Trino users. The release of the new engine version for Amazon Athena upgrades Athena to a recent version of Trino from a rather old version. This update brings a ton of improvements from the Trino project to the users of the popular cloud-based query service.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/trino-light.png" />
      
    </entry>
  
    <entry>
      <title>Optimizing Trino using spot instances with Zillow</title>
      <link href="https://trino.io/blog/2022/12/01/trino-summit-2022-zillow-recap.html" rel="alternate" type="text/html" title="Optimizing Trino using spot instances with Zillow" />
      <published>2022-12-01T00:00:00+00:00</published>
      <updated>2022-12-01T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/12/01/trino-summit-2022-zillow-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/12/01/trino-summit-2022-zillow-recap.html">&lt;p&gt;In this installment of &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;the Trino Summit 2022 sessions posts&lt;/a&gt;, we jump into an exciting topic by folks
from &lt;a href=&quot;https://www.zillow.com&quot;&gt;Zillow&lt;/a&gt; about running Trino on spot instances.
Spot instances are cheap, ephemeral nodes that reduce overall compute costs;
they are cheaper because they are not guaranteed to remain available.&lt;/p&gt;

&lt;p&gt;In this session, Zillow engineers talk about how they use Trino on spots to take
advantage of the cost savings while handling the transitory nature of spots.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/vz9reBUgQTE&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-summit-2022/Trino@Zillow.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;Zillow’s BI platform team is tasked with enabling access to data and metrics
from their data lake in a self-serving and performant manner. The platform must
handle generating up-to-date reports and metrics to unlock time-critical
opportunities. They also need to enable ad hoc analytics across multiple domains
within Zillow.&lt;/p&gt;

&lt;p&gt;There are close to 600 data pipelines and 65,000 queries running daily. The
average read covers 600 terabytes of data, and the average P95 time is around
20 seconds. They have six Trino clusters that service various workflows based on
load. These are all deployed on Amazon EKS with a range of eight to 60 workers
based on CPU utilization.&lt;/p&gt;

&lt;p&gt;When deploying Trino on EKS, Zillow uses worker groups, which enables them to
collocate nodes in AWS local zones. It also made it possible to choose spot 
instances, which are 90% cheaper than regular on-demand instances. A critical
aspect they needed to cover was to correctly tune the percentage of nodes that
were spot instances. They created pools of nodes that were entirely on-demand
for coordinators, since a coordinator going down brings down the entire cluster.
Other pools used for workers are tuned to an optimal blend of spot and
on-demand.&lt;/p&gt;

&lt;p&gt;Watch this session to learn how to properly optimize the number of spot
instances running for your Trino clusters, without losing reliability of your
service. Also learn some ways that Zillow is planning on using the
fault-tolerant execution mode.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, please consider sharing this on
Twitter, Reddit, LinkedIn, HackerNews or anywhere on the web. Use the social
card and link to &lt;a href=&quot;https://trino.io/blog/2022/12/01/trino-summit-2022-zillow-recap.html&quot;&gt;https://trino.io/blog/2022/12/01/trino-summit-2022-zillow-recap.html&lt;/a&gt;. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/zillow-social.png&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Santhosh Venkatraman, Rupesh Kumar Perugu, Brian Olsen</name>
        </author>
      

      <summary>In this installment of the Trino Summit 2022 sessions posts, we jump into an exciting topic by folks from Zillow about running Trino on spot instances. Spot instances are cheap and ephemeral nodes that lead to reduced overall compute costs. Spot instances are cheaper as they are not guaranteed to remain available. In this session, Zillow engineers talk about how they use Trino on spots to take advantage of the cost savings while handling the transitory nature of spots.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/zillow.jpg" />
      
    </entry>
  
    <entry>
      <title>Enterprise-ready Trino at Bloomberg: One Giant Leap Toward Data Mesh!</title>
      <link href="https://trino.io/blog/2022/11/30/trino-summit-2022-bloomberg-recap.html" rel="alternate" type="text/html" title="Enterprise-ready Trino at Bloomberg: One Giant Leap Toward Data Mesh!" />
      <published>2022-11-30T00:00:00+00:00</published>
      <updated>2022-11-30T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/11/30/trino-summit-2022-bloomberg-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/11/30/trino-summit-2022-bloomberg-recap.html">&lt;p&gt;This post continues &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;a larger series of posts&lt;/a&gt; on the Trino Summit 2022 sessions.
Following the &lt;a href=&quot;/blog/2022/11/28/trino-summit-2022-apple-recap.html&quot;&gt;Trino at Apple talk&lt;/a&gt;, engineers from Bloomberg shared
the latest about their additions to Trino. Bloomberg uses Trino to federate huge
amounts of disparate financial data together. When you have many users with
different use cases and resource needs, you need something to ensure that the
huge workloads don’t bully the small ones. Enter the Trino Load Balancer, a
privacy-aware solution to help maintain high availability while still treating
data security as the first-class citizen that it should be.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/ePr-iVQ5ri4&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-summit-2022/Trino-at-Bloomberg.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;Bloomberg collects data, creates experimental data, and ingests data from
vendors. Its data analysts then refine, clean, and structure that data using
whatever their preferred method is, generating even more diverse data. Internal
teams and clients then want to look at and query that generated data, too. Sound
like a data mesh? That’s because it is. Trino isn’t new at Bloomberg, and it’s
been in use to help federate all of those varying data sets into one unified
access point.&lt;/p&gt;

&lt;p&gt;When trying to deploy multiple Trino clusters for such a wide array of users who
demand high uptime, high throughput, and fast response times, the Trino
coordinator becomes a single point of failure. There’s the risk of
infrastructure outages, the need to shut things down for occasional upgrades,
and some users run high-throughput jobs for millions of rows while others are
expecting low-latency jobs for only hundreds. Keeping Trino up, running, and
meeting all users’ expectations is no small task.&lt;/p&gt;

&lt;p&gt;And that’s where the Trino Load Balancer comes in! As a fork of the open-source
presto-gateway, it helps to do exactly what it says on the tin for Trino:
balance workloads. By being aware of what’s running on each cluster and how many
resources are being used, it can direct traffic to the ideal clusters to meet
each user’s needs. And with a brief demo, we get a look at how data owners
can set policies that are respected within the load balancer, ensuring that
users can only access and query what they’re supposed to.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on
Twitter, Reddit, LinkedIn, HackerNews or anywhere on the web. Use the social
card and link to &lt;a href=&quot;https://trino.io/blog/2022/11/30/trino-summit-2022-bloomberg-recap.html&quot;&gt;https://trino.io/blog/2022/11/30/trino-summit-2022-bloomberg-recap.html&lt;/a&gt;. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/bloomberg-social.png&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Vishal Jadhav, Pablo Arteaga, Cole Bowden</name>
        </author>
      

      <summary>This post continues a larger series of posts on the Trino Summit 2022 sessions. Following the Trino at Apple talk, engineers from Bloomberg shared the latest about their additions to Trino. Bloomberg uses Trino to federate huge amounts of disparate financial data together. When you have many users with different use cases and resource needs, you need something to ensure that the huge workloads don’t bully the small ones. Enter the Trino Load Balancer, a privacy-aware solution to help maintain high availability while still treating data security as the first-class citizen that it should be.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/bloomberg.jpg" />
      
    </entry>
  
    <entry>
      <title>Trino at Apple</title>
      <link href="https://trino.io/blog/2022/11/28/trino-summit-2022-apple-recap.html" rel="alternate" type="text/html" title="Trino at Apple" />
      <published>2022-11-28T00:00:00+00:00</published>
      <updated>2022-11-28T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/11/28/trino-summit-2022-apple-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/11/28/trino-summit-2022-apple-recap.html">&lt;p&gt;This post continues &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;a larger series of posts&lt;/a&gt; on the Trino Summit 2022 sessions.
Following the &lt;a href=&quot;/blog/2022/11/22/trino-summit-2022-state-of-trino-keynote-recap.html&quot;&gt;Keynote: State of Trino session&lt;/a&gt;, engineers from Apple shared the
current usage of Trino at Apple. They discuss how they support Trino as a
service for multiple end-users, and the critical features that drew Apple to
Trino. They wrap up with some challenges they have faced and some development
they have planned to contribute to Trino.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/3afcRK6Yvio&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-summit-2022/Trino@Apple.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;blockquote&gt;
  &lt;p&gt;Trino is deployed at scale in Apple, and it continues to see tremendous
adoption across multiple teams at Apple. &lt;em&gt;Yathi Peddyshetty, Software Engineer @ Apple&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The commonplace ad hoc and BI analytics use cases make up a lot of how Apple uses
Trino today. They also have increasing uses in federated querying and A/B 
testing.&lt;/p&gt;

&lt;p&gt;To deploy Trino as a service, Apple has an in-house Kubernetes operator to
manage the Trino cluster lifecycles. They also created an orchestrator to
provision clusters and simplify their creation and management. They expose this
as a self-service console that allows users to provision their own clusters on
request. Their custom orchestrator also takes care of autoscaling and other
technical complexities of maintaining a scalable Trino system.&lt;/p&gt;

&lt;p&gt;Apple primarily uses Iceberg, Hive, and Cassandra connectors. They have a heavy
focus on Apache Iceberg as their table format and have contributed a significant
number of PRs to improve interoperability between Trino and Spark and to increase
coverage of Iceberg APIs. Other challenges Apple faces stem from the lack of
flexible routing of queries to achieve zero downtime, and having pluggable
optimizer rules and operators.&lt;/p&gt;

&lt;p&gt;Apple has various features on their roadmap to eventually contribute to the
community. These include exposing remaining functionality in the Iceberg APIs,
support for all partition transforms, predicate pushdowns, bucketed joins, simple
aggregate pushdowns, Iceberg native views in Trino, and more.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, please consider sharing this on
Twitter, Reddit, LinkedIn, HackerNews or anywhere on the web. Use the social
card and link to &lt;a href=&quot;https://trino.io/blog/2022/11/28/trino-summit-2022-apple-recap.html&quot;&gt;https://trino.io/blog/2022/11/28/trino-summit-2022-apple-recap.html&lt;/a&gt;. If you think Trino is awesome, 
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/apple-social.png&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Vinitha Gankidi, Yathi Peddyshetty, Brian Olsen</name>
        </author>
      

      <summary>This post continues a larger series of posts on the Trino Summit 2022 sessions. Following the Keynote: State of Trino session, engineers from Apple shared the current usage of Trino at Apple. They discuss how they support Trino as a service for multiple end-users, and the critical features that drew Apple to Trino. They wrap up with some challenges they have faced and some development they have planned to contribute to Trino.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/apple.jpg" />
      
    </entry>
  
    <entry>
      <title>Trino Summit 2022 recap: The state of Trino</title>
      <link href="https://trino.io/blog/2022/11/22/trino-summit-2022-state-of-trino-keynote-recap.html" rel="alternate" type="text/html" title="Trino Summit 2022 recap: The state of Trino" />
      <published>2022-11-22T00:00:00+00:00</published>
      <updated>2022-11-22T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/11/22/trino-summit-2022-state-of-trino-keynote-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/11/22/trino-summit-2022-state-of-trino-keynote-recap.html">&lt;p&gt;To kick off the &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;Trino Summit 2022&lt;/a&gt;,
we heard from Trino co-creators Martin Traverso, Dain Sundstrom, and David
Phillips. Martin gave a talk on the state of Trino and project plans for 2023,
then opened the floor to questions from the community. You can watch a recording
of the talk, or read on if you’re only interested in the highlights.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/mUq_h3oArp4&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-summit-2022/State-of-Trino-Nov-2022.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;So what &lt;em&gt;has&lt;/em&gt; happened in Trino over the last year?&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2022/08/08/trino-tenth-birthday.html&quot;&gt;We celebrated Trino’s 10th birthday!&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;It was the busiest year in project history, with 600+ contributors, 4000+
commits, and near-weekly releases.&lt;/li&gt;
  &lt;li&gt;Tons of new features were added, including &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt;, JSON functions, table
functions, fault-tolerant execution (look forward to a lot of talking about it
in later recaps!), upgrading to Java 17, and a slide so dense with other
goodies that it needed two columns.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And what’s coming down the pipeline?&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/14237&quot;&gt;Project Hummingbird&lt;/a&gt;, a large
set of core engine improvements.&lt;/li&gt;
  &lt;li&gt;Expanded table function support, including accepting tables as arguments.&lt;/li&gt;
  &lt;li&gt;Extra community support, so that contributors have an easier and better time
getting code merged into Trino.&lt;/li&gt;
  &lt;li&gt;New connectors, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE/DROP CATALOG&lt;/code&gt;, query tracing, and more!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There were also tons of great questions asked by live and online attendees
answered by Dain, David, and Martin, so if you want to hear more, take a listen
to the full talk!&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. Use the social card and
link to &lt;a href=&quot;https://trino.io/blog/2022/11/22/trino-summit-2022-state-of-trino-keynote-recap.html&quot;&gt;https://trino.io/blog/2022/11/22/trino-summit-2022-state-of-trino-keynote-recap.html&lt;/a&gt;. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/keynote-social.png&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Martin Traverso, Dain Sundstrom, David Phillips, Cole Bowden</name>
        </author>
      

      <summary>To kick off the Trino Summit 2022, we heard from Trino co-creators Martin Traverso, Dain Sundstrom, and David Phillips. Martin gave a talk on the state of Trino and project plans for 2023, then opened the floor to questions from the community. You can watch a recording of the talk, or read on if you’re only interested in the highlights.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/keynote-header.jpeg" />
      
    </entry>
  
    <entry>
      <title>Trino Summit 2022 recap</title>
      <link href="https://trino.io/blog/2022/11/21/trino-summit-2022-recap.html" rel="alternate" type="text/html" title="Trino Summit 2022 recap" />
      <published>2022-11-21T00:00:00+00:00</published>
      <updated>2022-11-21T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/11/21/trino-summit-2022-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/11/21/trino-summit-2022-recap.html">&lt;p&gt;Trino Summit 2022 was, in a word, invigorating. I’m still coming off the high 
from the amount of energy I gained from being at this summit, meeting many of
you face-to-face for the first time. Most surprisingly, I learned that Trino
contributor James Petty from AWS was actually not famous painter
&lt;a href=&quot;https://en.wikipedia.org/wiki/Bob_Ross&quot;&gt;Bob Ross&lt;/a&gt;.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/james-petty.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;If you’ve ever planned a conference, you know that there are a lot of details to
iron out, and you can be left exhausted by the end. After this year’s Trino
Summit though, rather than being worn out, I felt like it ended too quickly and
I simply wanted more time to chat with everyone. A single day was simply not
enough, and now all I can think about is the next summit. We not only got to
hear an incredible lineup of talks and discussions from first-time Trino Summit
speakers like Apple, Shopify, and Lyft, but also had many engaging discussions
outside the auditorium.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/swag.jpg&quot; /&gt;
&lt;img src=&quot;/assets/blog/trino-summit-2022/authors.jpg&quot; /&gt;
&lt;img src=&quot;/assets/blog/trino-summit-2022/talking-1.jpg&quot; /&gt;
&lt;img src=&quot;/assets/blog/trino-summit-2022/talking-2.jpg&quot; /&gt;&lt;/p&gt;

&lt;p&gt;There were cross-community discussions between Delta Lake, Airflow, and Alluxio
about how to turbo-charge Trino integrations with these communities. There were
many companies talking about best practices and gotchas while migrating from
Hive to Iceberg or Delta Lake. Others wanted to learn how to use fault-tolerant
execution. I spoke with managers of companies like LinkedIn and Bloomberg who
wanted to help develop their engineers to get more involved with contributing to
Trino. We all finally got to see the faces of people we had been talking to for
the past two to three years for the first time. People were getting their free
copies of Trino: The Definitive Guide signed by Manfred, Matt, and Martin and
brought home other swag. After a long day of talks, we wrapped Trino Summit up
with two happy hours on the roof of the Commonwealth Club, watching the sunset
over the San Francisco Bay Bridge.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/speech.jpg&quot; /&gt;
&lt;img src=&quot;/assets/blog/trino-summit-2022/happy-hour.jpg&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;session-summaries&quot;&gt;Session summaries&lt;/h2&gt;

&lt;p&gt;I would like to quickly summarize a few short takeaways I had from each talk at
the summit. I highly recommend you watch the full videos on the Trino YouTube
channel, which are linked in the titles:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=mUq_h3oArp4&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Keynote: State of Trino&lt;/a&gt;
(&lt;a href=&quot;/blog/2022/11/22/trino-summit-2022-state-of-trino-keynote-recap.html&quot;&gt;Read more&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Trino co-creator, Martin, covers recently developed features, community 
statistics, and discusses roadmap features like Project Hummingbird.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Dain and David join Martin on the stage to answer audience questions.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=mUq_h3oArp4&quot;&gt;&lt;img width=&quot;40%&quot; src=&quot;/assets/blog/trino-summit-2022/keynote.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=3afcRK6Yvio&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Trino at Apple&lt;/a&gt;
(&lt;a href=&quot;/blog/2022/11/28/trino-summit-2022-apple-recap.html&quot;&gt;Read more&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Apple has an in-house k8s operator to manage Trino cluster lifecycles, and an
orchestrator to provision and simplify cluster creation and management.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Apple has a heavy focus on Apache Iceberg as their table format and has
contributed a significant number of PRs to improve interoperability between
Trino and Spark and to increase coverage of Iceberg APIs.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=3afcRK6Yvio&quot;&gt;&lt;img width=&quot;40%&quot; src=&quot;/assets/blog/trino-summit-2022/apple.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=ePr-iVQ5ri4&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Enterprise-ready Trino at Bloomberg: One Giant Leap Toward Data Mesh!&lt;/a&gt;
(&lt;a href=&quot;/blog/2022/11/30/trino-summit-2022-bloomberg-recap.html&quot;&gt;Read more&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Bloomberg uses Trino to centralize access to their massive number of catalogs
across many different departments.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;To offer Trino-as-a-Service for varying workloads, they use a Trino Load
Balancer (a fork of the popular presto-gateway project at Lyft) to add new
functionality. In talking with them after their presentation, the Bloomberg
team expressed an interest in wanting to open source this work to the
community as a more generalized solution than the gateway project.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=ePr-iVQ5ri4&quot;&gt;&lt;img width=&quot;40%&quot; src=&quot;/assets/blog/trino-summit-2022/bloomberg.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=vz9reBUgQTE&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Optimizing Trino using spot instances&lt;/a&gt;
(&lt;a href=&quot;/blog/2022/12/01/trino-summit-2022-zillow-recap.html&quot;&gt;Read more&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;In an attempt to minimize costs, Zillow is measuring the efficacy of running
Trino ETL jobs on spot instances.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;This currently runs the risk of query failures and retries, but future work
will look at utilizing the new fault-tolerant execution mode to mitigate
retries in the event of failure.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=vz9reBUgQTE&quot;&gt;&lt;img width=&quot;40%&quot; src=&quot;/assets/blog/trino-summit-2022/zillow.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=g9fLA3tFG-Q&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Leveraging Trino to Power Data at Goldman Sachs&lt;/a&gt;
(&lt;a href=&quot;/blog/2022/12/05/trino-summit-2022-goldman-sachs-recap.html&quot;&gt;Read more&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Goldman Sachs uses Trino to power their data quality service, taking advantage
of the fact that Trino centralizes all visibility across their platform.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=g9fLA3tFG-Q&quot;&gt;&lt;img width=&quot;40%&quot; src=&quot;/assets/blog/trino-summit-2022/goldman-sachs.png&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=sSWBi7bBotQ&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Elevating data fabric to data mesh: Solving data needs in hybrid datalakes&lt;/a&gt;
(&lt;a href=&quot;/blog/2022/12/07/trino-summit-2022-comcast-recap.html&quot;&gt;Read more&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Comcast takes us through their Trino architecture journey, providing the
history of their Data Fabric service and discussing the data governance
and culture changes required to realize a Data Mesh with Trino.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=sSWBi7bBotQ&quot;&gt;&lt;img width=&quot;40%&quot; src=&quot;/assets/blog/trino-summit-2022/comcast.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=nJBBw-xnLU8&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Rewriting History: Migrating petabytes of data to Apache Iceberg using Trino&lt;/a&gt;
(&lt;a href=&quot;/blog/2022/12/09/trino-summit-2022-shopify-recap.html&quot;&gt;Read more&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Shopify recently migrated many of its workloads to Trino. One of the first
hurdles was dealing with many issues in the Hive table format, so they quickly
upgraded to the Iceberg table format.&lt;/li&gt;
  &lt;li&gt;They initially encountered numerous issues, but experienced incredibly fast
turnaround on fixes from the Trino project that resolved their issues during
the migration.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;There’s also a benchmark showing how moving to a columnar format and the
Iceberg table format drastically improves results.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=nJBBw-xnLU8&quot;&gt;&lt;img width=&quot;40%&quot; src=&quot;/assets/blog/trino-summit-2022/shopify.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=FL3c1Ue7YWM&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Trino for Large Scale ETL at Lyft&lt;/a&gt;
(&lt;a href=&quot;/blog/2022/12/12/trino-summit-2022-lyft-recap.html&quot;&gt;Read more&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Lyft is using Trino to perform ETL jobs scanning 10PB of data per day, and
writing 100TB per day. They are not using fault-tolerant execution.&lt;/li&gt;
  &lt;li&gt;In the last year, Lyft cut their number of Trino nodes in half, while
increasing the volume of their workloads due to recent improvements in Trino
and upgrades in Java versions.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Keeping up with the rapid release cycle of Trino was a challenge, and Lyft
showcases their regression testing using their query replay framework.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=FL3c1Ue7YWM&quot;&gt;&lt;img width=&quot;40%&quot; src=&quot;/assets/blog/trino-summit-2022/lyft.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=Zfmxwu0m98k&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Federating them all on Starburst Galaxy&lt;/a&gt;
(&lt;a href=&quot;/blog/2022/12/14/trino-summit-2022-starburst-recap.html&quot;&gt;Read more&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Running and scaling Trino is difficult. Starburst showcases Starburst Galaxy,
a SaaS data platform built around the Trino query engine.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The talk demos running federated queries over Pokémon data scattered across
MongoDB and Iceberg tables.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=Zfmxwu0m98k&quot;&gt;&lt;img width=&quot;40%&quot; src=&quot;/assets/blog/trino-summit-2022/starburst.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=Q03DzL_fm-I&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Trino at Quora: Speed, Cost, Reliability Challenges and Tips&lt;/a&gt;
(&lt;a href=&quot;/blog/2022/12/16/trino-summit-2022-quora-recap.html&quot;&gt;Read more&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Quora uses a large number of Trino clusters for ad-hoc queries, ETL, time
series, A/B testing, and backfill workloads.&lt;/li&gt;
  &lt;li&gt;Quora initially faced high costs on Trino due to inefficient use of
resources.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;To address this they migrated to use Graviton instances, implemented
autoscaling, and optimized query efficiency.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=Q03DzL_fm-I&quot;&gt;&lt;img width=&quot;40%&quot; src=&quot;/assets/blog/trino-summit-2022/quora.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=V9_aPLXATh8&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Journey to Iceberg with SK Telecom&lt;/a&gt;
(&lt;a href=&quot;/blog/2022/12/19/trino-summit-2022-sk-telecom-recap.html&quot;&gt;Read more&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The speakers travelled all the way from South Korea to join us in person.&lt;/li&gt;
  &lt;li&gt;SK Telecom had a multitude of performance issues that all stemmed from the
lack of flexibility in the Hive model and metastore.&lt;/li&gt;
  &lt;li&gt;They migrated to Iceberg to address performance issues, and gained the added
benefits of Iceberg’s table format to improve developer workflow.&lt;/li&gt;
  &lt;li&gt;Housekeeping operations like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;optimize&lt;/code&gt; were already addressed by the Iceberg
community and quickly added to Trino.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;This reduced query processing time by 80%.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=V9_aPLXATh8&quot;&gt;&lt;img width=&quot;40%&quot; src=&quot;/assets/blog/trino-summit-2022/sk-telecom.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=xKDN7RUJ5i4&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Using Trino with Apache Airflow for (almost) all your data problems&lt;/a&gt;
(&lt;a href=&quot;/blog/2022/12/21/trino-summit-2022-astronomer-recap.html&quot;&gt;Read more&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Airflow is a highly functional and well-adopted workflow management platform
to schedule jobs on your data platform.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The Trino integration for Airflow recently landed, and this coincided with
the GA arrival of fault-tolerant execution mode in Trino.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=xKDN7RUJ5i4&quot;&gt;&lt;img width=&quot;40%&quot; src=&quot;/assets/blog/trino-summit-2022/astronomer.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=MCB_1furnAo&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; How we use Trino to analyze our Product-led Growth (PLG) user activation funnel&lt;/a&gt;
(&lt;a href=&quot;/blog/2022/12/23/trino-summit-2022-upsolver-recap.html&quot;&gt;Read more&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Upsolver solves a lot of common data problems on their platform.&lt;/li&gt;
  &lt;li&gt;One such problem is measuring activation rates in a product-led growth team.
This requires acting on many sources of data.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Trino makes a natural fit to address the issues of joining this data together.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=MCB_1furnAo&quot;&gt;&lt;img width=&quot;40%&quot; src=&quot;/assets/blog/trino-summit-2022/upsolver.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;federate-em-all&quot;&gt;Federate ‘em all&lt;/h2&gt;

&lt;p&gt;After a whole day of throwing Trino balls out to the crowd, we got to see a
nice metaphor for federated data by throwing them all in the air and yelling,
“Federate ‘em all!”&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/balls.jpg&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;trino-contributor-congregation&quot;&gt;Trino Contributor Congregation&lt;/h2&gt;

&lt;p&gt;The day after the summit, we invited a relatively small group of our
contributors to meet for the inaugural Trino Contributor Congregation (TCC).
This gathered many of our long-time and heavy Trino contributors. We had folks
from companies like Starburst, AWS, Apple, Bloomberg, Lyft, Comcast, LinkedIn,
Treasure Data, and others. Let’s dive into some of the topics we discussed.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/contributor-congregation.jpg&quot; /&gt;&lt;/p&gt;

&lt;p&gt;We discussed feature proposals like:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The Trino load balancer, an adaptation of the popular gateway project from Lyft.&lt;/li&gt;
  &lt;li&gt;A Ranger plugin to be maintained by the Trino community rather than rely on the Ranger project.&lt;/li&gt;
  &lt;li&gt;A Snowflake connector that was traditionally held back by the lack of infrastructure.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We discussed the need for better shared testing datasets beyond TPC-H and
TPC-DS that are more representative of the real workloads many users run.&lt;/p&gt;

&lt;p&gt;We discussed the need for a clearer process for contributors to follow to
minimize the time to get features merged and avoid stale PRs. This is being
addressed by the backlog grooming performed by the developer relations team, and
assigning maintainers to own various PRs. While there is never a promise to
merge a PR, improving the turnaround and communication on PRs is crucial to keep
contributors happy and improve the health of the project.&lt;/p&gt;

&lt;p&gt;While we were sad that not everyone could make the in-person TCC, we plan to
have virtual TCCs on a more frequent cadence and have the in-person TCCs
alongside larger in-person events. Getting these TCCs right is core to growing
the maintainership and continued success of the Trino project.&lt;/p&gt;

&lt;p&gt;We hope all of you who could join us in-person and online enjoyed yourselves. We
all had such a blast! Stay tuned for updates on the next Trino Summit location!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/bun-bun-bye.jpg&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>Trino Summit 2022 was in a word, invigorating. I’m still coming off the high from the amount of energy I gained from being at this summit, meeting many of you face-to-face for the first time. Most surprisingly, I learned that Trino contributor James Petty from AWS was actually not famous painter Bob Ross.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/stage.jpg" />
      
    </entry>
  
    <entry>
      <title>42: Trino Summit 2022 recap</title>
      <link href="https://trino.io/episodes/42.html" rel="alternate" type="text/html" title="42: Trino Summit 2022 recap" />
      <published>2022-11-17T00:00:00+00:00</published>
      <updated>2022-11-17T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/42</id>
      <content type="html" xml:base="https://trino.io/episodes/42.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Brian Olsen, Developer Advocate at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/bitsondatadev&quot;&gt;@bitsondatadev&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate at 
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Manfred Moser, Director of Information Engineering at 
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Brian Zhan, Product Manager at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/brianzhan1&quot;&gt;@brianzhan1&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/claudiusli&quot;&gt;Claudius Li&lt;/a&gt;, Product Manager at
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dain Sundstrom, Trino creator and CTO at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/daindumb&quot;&gt;@daindumb&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Martin Traverso, Trino creator and CTO at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/mtraverso&quot;&gt;@mtraverso&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-402-to-403&quot;&gt;Releases 402 to 403&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-402.html&quot;&gt;Trino 402&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for column comments in Hive and Iceberg views.&lt;/li&gt;
  &lt;li&gt;Support predicate pushdown on temporal types in MongoDB connector.&lt;/li&gt;
  &lt;li&gt;Faster &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OR&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nullif&lt;/code&gt;, and arithmetic operations in SQL Server connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-403.html&quot;&gt;Trino 403&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt; in MongoDB.&lt;/li&gt;
  &lt;li&gt;Faster aggregations.&lt;/li&gt;
  &lt;li&gt;Faster data transfers with fault-tolerant execution.&lt;/li&gt;
  &lt;li&gt;Faster &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW SCHEMAS&lt;/code&gt; in BigQuery.&lt;/li&gt;
  &lt;li&gt;Faster &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;expire_snapshots&lt;/code&gt; in Apache Iceberg.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the release notes for
&lt;a href=&quot;https://trino.io/docs/current/release/release-402.html&quot;&gt;Trino 402&lt;/a&gt;,
and
&lt;a href=&quot;https://trino.io/docs/current/release/release-403.html&quot;&gt;Trino 403&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;trino-summit-2022-recap&quot;&gt;Trino Summit 2022 recap&lt;/h2&gt;

&lt;p&gt;This episode we’re doing a recap of both the Trino Summit and the first Trino
Contributor Congregation. We dive into what everyone’s favorite Trino Summit
sessions were. Then we cover key takeaways from the Trino Contributor
Congregation, which took place the day after.&lt;/p&gt;

&lt;p&gt;Check out the in-person and virtual
&lt;a href=&quot;https://www.meetup.com/pro/trino-community/&quot;&gt;Trino Meetup groups&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>Top five reasons to attend Trino Summit 2022</title>
      <link href="https://trino.io/blog/2022/10/31/trino-summit-2022-teaser-3.html" rel="alternate" type="text/html" title="Top five reasons to attend Trino Summit 2022" />
      <published>2022-10-31T00:00:00+00:00</published>
      <updated>2022-10-31T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/10/31/trino-summit-2022-teaser-3</id>
      <content type="html" xml:base="https://trino.io/blog/2022/10/31/trino-summit-2022-teaser-3.html">&lt;p&gt;This blog post wraps up a series of 
&lt;a href=&quot;/blog/2022/09/22/trino-summit-2022-teaser.html&quot;&gt;previous posts&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/blog/2022/10/19/trino-summit-2022-teaser-2.html&quot;&gt;teasing Trino Summit 2022&lt;/a&gt;.
The conference is free and takes place in San Francisco, California on November
10th. Join us either in-person or virtually!&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;
        Register now
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;!--more--&gt;

&lt;p&gt;Let’s dive right into the five reasons you should attend Trino Summit 2022. If
you’re not into these lists, go ahead and 
&lt;a href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;register now&lt;/a&gt;!&lt;/p&gt;

&lt;h3 id=&quot;1-hear-speakers-from-industry-leading-companies-talk-about-their-trino-architecture-and-use-cases&quot;&gt;1. Hear speakers from industry leading companies talk about their Trino architecture and use cases&lt;/h3&gt;

&lt;p&gt;This year’s summit features industry leaders with varying workloads and
use cases. There are also sessions on tips and tricks to scale and lower the
cost of running Trino in production. Users from the following companies speak
about their challenges and how they use Trino to help overcome them:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Apple&lt;/li&gt;
  &lt;li&gt;Astronomer&lt;/li&gt;
  &lt;li&gt;Bloomberg&lt;/li&gt;
  &lt;li&gt;Comcast&lt;/li&gt;
  &lt;li&gt;Goldman Sachs&lt;/li&gt;
  &lt;li&gt;Lyft&lt;/li&gt;
  &lt;li&gt;Quora&lt;/li&gt;
  &lt;li&gt;Shopify&lt;/li&gt;
  &lt;li&gt;SK Telecom&lt;/li&gt;
  &lt;li&gt;Starburst&lt;/li&gt;
  &lt;li&gt;Upsolver&lt;/li&gt;
  &lt;li&gt;Zillow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To see more information about the talks and the agenda for the conference, check
out the &lt;a href=&quot;https://www.starburst.io/info/trinosummit#agenda&quot;&gt;Trino Summit 2022 agenda&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;2-meet-the-authors-of-the-trino-the-definitive-guide-and-get-that-trino-swag&quot;&gt;2. Meet the authors of the &lt;strong&gt;&lt;em&gt;Trino: The Definitive Guide&lt;/em&gt;&lt;/strong&gt; and get that Trino swag&lt;/h3&gt;

&lt;p&gt;This year, we are giving away autographed copies of the recently updated
&lt;a href=&quot;/blog/2022/10/03/the-definitive-guide-2.html&quot;&gt;&lt;strong&gt;Trino: The Definitive Guide&lt;/strong&gt;&lt;/a&gt; to attendees.
Already have a physical copy? Visit the Trino booth to get your book signed and
meet authors 
&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;Manfred Moser&lt;/a&gt;,
&lt;a href=&quot;https://twitter.com/mfullertweets&quot;&gt;Matt Fuller&lt;/a&gt;, and
&lt;a href=&quot;https://twitter.com/mtraverso&quot;&gt;Martin Traverso&lt;/a&gt; who literally wrote the book on
Trino.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;33%&quot; src=&quot;/assets/ttdg2-cover.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;We will be giving away swag packs containing an autographed copy of Trino: The
Definitive Guide, a Trino Summit 2022 shirt, a Commander Bun Bun plushie, and 
more to both virtual and in-person attendees! This will be done during our
sponsored giveaway breaks between sessions, where we challenge both in-person and
virtual attendees in a race against time to bag the swag!&lt;/p&gt;

&lt;h3 id=&quot;3-federate-em-all&quot;&gt;3. Federate ‘em all&lt;/h3&gt;

&lt;p&gt;This year’s summit will be a free event that federates both data and humans. The
theme extends from a popular show that many of you know called Pokémon. To
understand the connection here, let’s break down what we mean by federate ‘em
all. In the same way that Pokémon protagonist Ash Ketchum catches and trains
heterogeneous creatures called Pokémon, Trino queries and filters heterogeneous
data sets from various data sources.&lt;/p&gt;

&lt;p&gt;If you’re not familiar with Pokémon, a losing strategy is to train just one or
two Pokémon as different types of Pokémon are better suited to different tasks.
In the same way, centralizing all of your data to a single data warehouse or
data lake doesn’t make sense either. There are different use cases and 
different needs across the company. Rather than spending your time building
brittle one-size-fits-all architectures, Trino enables you to connect to
&lt;a href=&quot;https://trino.io/docs/current/connector.html&quot;&gt;multiple data sources&lt;/a&gt; using ANSI SQL.&lt;/p&gt;

&lt;iframe src=&quot;https://www.youtube.com/embed/o2MJvRKG14M&quot; width=&quot;800&quot; height=&quot;500&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px;
margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt;
&lt;/iframe&gt;

&lt;h3 id=&quot;4-experience-beautiful-san-francisco&quot;&gt;4. Experience beautiful San Francisco&lt;/h3&gt;

&lt;p&gt;For those attending in-person, you will get to enjoy the beautiful San Francisco
area. The &lt;a href=&quot;https://www.starburst.io/info/trinosummit/#location&quot;&gt;Commonwealth Club&lt;/a&gt;
is located right on the San Francisco Bay. The building is beautiful, with a
large auditorium for the main event and plenty of floors and rooms for socializing.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;/assets/blog/trino-summit-2022/commonwealth-club.jpeg&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;At the end of the summit, we will have a happy hour on the scenic roof-deck that
gazes over the San Francisco bay at the iconic Oakland Bay Bridge.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; src=&quot;/assets/blog/trino-summit-2022/san-francisco.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;We know this only applies to our in-person attendees, but remember if you join
us virtually, there are still plenty of resources to network and interact 
throughout the conference. We will be taking questions from our virtual audience
and there will also be a chat forum to discuss with attendees from across the
globe. Plus, unlike those of us attending in-person, no travel is required and
pajamas are optional during the event!&lt;/p&gt;

&lt;h3 id=&quot;5-collaborate-with-some-of-the-best-minds-working-on-trino&quot;&gt;5. Collaborate with some of the best minds working on Trino&lt;/h3&gt;

&lt;p&gt;Trino is a relatively new paradigm compared to the rest of the data world. If you
just realized that you don’t have to move all your data into one location,
you’re on the right track. However, there’s still a lot to learn when it comes
to scaling out a query engine that over time grows in usage. To get this right,
you need a community to be successful. The creators Martin, Dain, and David, and
many of the core contributors of Trino will be attending, along with a large
list of folks that are using multiple clusters over hundreds of petabytes of
data.&lt;/p&gt;

&lt;p&gt;Tap into this incredibly passionate group of Trino enthusiasts to augment your
experience with this revolutionary query engine!&lt;/p&gt;

&lt;h2 id=&quot;register-for-the-summit&quot;&gt;Register for the summit&lt;/h2&gt;

&lt;p&gt;Make sure you register quickly for in-person attendance, as it is limited to
250 seats. Spots are running out quickly, so don’t wait!&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;
        Register now
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;h2 id=&quot;announcing-the-final-round-of-sessions-and-the-agenda&quot;&gt;Announcing the final round of sessions and the agenda!&lt;/h2&gt;

&lt;p&gt;Now for the final list of sessions to announce for this year’s Trino Summit!
This week is quite the reveal as we are showcasing a talk of how engineers at
Apple use Trino for their analytics challenges! 🎉🤯&lt;/p&gt;

&lt;p&gt;We also have three more amazing guests that are heavy hitters in the data and
analytics tech scene.&lt;/p&gt;

&lt;h3 id=&quot;trino-at-apple&quot;&gt;Trino at Apple&lt;/h3&gt;

&lt;p&gt;In this talk the audience will learn how Apple uses Trino to accelerate
analytics, the challenges we face deploying analytics at scale at Apple, and the
areas we would like to collaborate on with the community.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Vinitha Gankidi, Software engineer at Apple&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Yathindranath Peddyshetty, Software engineer at Apple&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;enterprise-ready-trino-at-bloomberg-one-giant-leap-toward-data-mesh&quot;&gt;Enterprise-ready Trino at Bloomberg: One Giant Leap Toward Data Mesh!&lt;/h3&gt;

&lt;p&gt;Enterprises like Bloomberg love Trino. It allows us to embrace the data mesh
with ease. Providing Trino as a service in a highly available, configurable, and
access-controlled manner has been a key enabler for us in this paradigm shift.
Join us to learn how we have leveraged open-source components to achieve these
goals at Bloomberg.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Pablo Arteaga, Software Engineer at Bloomberg&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Vishal Jadhav, Software Engineer at Bloomberg&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;leveraging-trino-to-power-data-quality-at-goldman-sachs&quot;&gt;Leveraging Trino to power data quality at Goldman Sachs&lt;/h3&gt;

&lt;p&gt;Data is at the core of today’s business processes. We are responsible for making
accurate, timely, and modeled data available to our analytics and application
teams. The source of these datasets can be quite heterogeneous like HDFS, S3,
Sybase, Snowflake, Elasticsearch, and more. Also, with an increase in data
volume, velocity, and variety, data quality assurance is extremely critical to
ensure the trustworthiness of data and mark it usable for consumers to use with
confidence. We have leveraged Trino to make high-quality data centrally
accessible through an efficient, secure, governed, and unified way of performing
analytics.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Sumit Halder, Vice President at Goldman Sachs&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Ramesh Bhanan, Vice President at Goldman Sachs&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Siddhant Chadha, Associate at Goldman Sachs&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Suman Baliganahalli Narayan Murthy, Vice President at Goldman Sachs&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;optimizing-trino-using-spot-instances&quot;&gt;Optimizing Trino using spot instances&lt;/h3&gt;

&lt;p&gt;Trino is a critical tool used at Zillow for doing analytics on the data lake. In this
talk we aim to give a general overview of how we leverage Trino and dive deeper
into the optimizations we have done for scaling Trino at Zillow using Spot
instances.&lt;/p&gt;

&lt;p&gt;In this session, we will show how fault-tolerant execution mode enables more
cost-effective and resilient execution when running Trino on Spot instances.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Rupesh Kumar Perugu, Senior Software Engineer at Zillow&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Santhosh Venkatraman, Software Engineer at Zillow&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That finalizes all of our sessions! To see them all, check out the
&lt;a href=&quot;https://www.starburst.io/info/trinosummit#agenda&quot;&gt;Trino Summit 2022 agenda&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Get excited, the conference is in less than two weeks so don’t forget to
&lt;a href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;register&lt;/a&gt;, and as always, &lt;strong&gt;&lt;em&gt;Federate them
all&lt;/em&gt;&lt;/strong&gt;! It is really shaping up to be an educational and fun-filled event with
Trino experts and aficionados.&lt;/p&gt;

&lt;p&gt;A huge thanks to our sponsors: Starburst, Privacera, Monte Carlo, Immuta,
CubeJS, Delta Lake, Hightouch, Backblaze, Databricks, Alluxio, and Tabular!&lt;/p&gt;

&lt;p&gt;Well that’s a wrap, we’ll see you all in T-minus ten days!&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>This blog post wraps up a series of previous posts teasing Trino Summit 2022. The conference is free and takes place in San Francisco, California on November 10th. Join us either in-person or virtually! Register now</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/summit-logo.png" />
      
    </entry>
  
    <entry>
      <title>41: Trino puts on its Hudi</title>
      <link href="https://trino.io/episodes/41.html" rel="alternate" type="text/html" title="41: Trino puts on its Hudi" />
      <published>2022-10-27T00:00:00+00:00</published>
      <updated>2022-10-27T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/41</id>
      <content type="html" xml:base="https://trino.io/episodes/41.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Brian Olsen, Developer Advocate at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/bitsondatadev&quot;&gt;@bitsondatadev&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate at 
 &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Sagar Sumit, Software Engineer at 
 &lt;a href=&quot;https://www.onehouse.ai&quot;&gt;Onehouse&lt;/a&gt; (&lt;a href=&quot;https://twitter.com/sagarsumit6&quot;&gt;@sagarsumit6&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/yueluhelloworld&quot;&gt;Grace (Yue) Lu&lt;/a&gt;, Software
Engineer at &lt;a href=&quot;https://robinhood.com&quot;&gt;Robinhood&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;register-for-trino-summit-2022&quot;&gt;Register for Trino Summit 2022!&lt;/h2&gt;

&lt;p&gt;Trino Summit 2022 is coming around the corner! This &lt;strong&gt;free&lt;/strong&gt; event on November
10th will take place in-person at the Commonwealth Club in San Francisco, CA or
can also be attended remotely!&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/o2MJvRKG14M&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;Read about the recently announced speaker sessions and details in these blog posts:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2022/09/22/trino-summit-2022-teaser.html&quot;&gt;Trino Summit 2022 first post&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2022/10/19/trino-summit-2022-teaser-2.html&quot;&gt;Trino Summit 2022 second post&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;You can register for the conference&lt;/a&gt;
at any time. We must limit in-person registrations to 250 
attendees, so register soon if you plan to attend in person!&lt;/p&gt;

&lt;h2 id=&quot;releases-396-to-401&quot;&gt;Releases 396 to 401&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-396.html&quot;&gt;Trino 396&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improved performance when processing strings.&lt;/li&gt;
  &lt;li&gt;Faster writing of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;array&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;map&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;row&lt;/code&gt; types to Parquet.&lt;/li&gt;
  &lt;li&gt;Support for pushing down complex join criteria to connectors.&lt;/li&gt;
  &lt;li&gt;Support for column and table comments in BigQuery connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-397.html&quot;&gt;Trino 397&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;S3 Select pushdown for JSON data in Hive connector.&lt;/li&gt;
  &lt;li&gt;Faster &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date_trunc&lt;/code&gt; predicates over partition columns in Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Reduced query latency with Glue catalog in Iceberg connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-398.html&quot;&gt;Trino 398&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New Hudi connector.&lt;/li&gt;
  &lt;li&gt;Improved performance for Parquet data in Delta Lake, Hive and Iceberg connectors.&lt;/li&gt;
  &lt;li&gt;Support for column comments in Accumulo connector.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;timestamp&lt;/code&gt; type in Pinot connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-399.html&quot;&gt;Trino 399&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Faster joins.&lt;/li&gt;
  &lt;li&gt;Faster reads of decimal values in Parquet data.&lt;/li&gt;
  &lt;li&gt;Support for writing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;array&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;row&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;timestamp&lt;/code&gt; columns in BigQuery.&lt;/li&gt;
  &lt;li&gt;Support for predicate pushdown involving datetime types in MongoDB.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-400.html&quot;&gt;Trino 400&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for TRUNCATE in BigQuery connector.&lt;/li&gt;
  &lt;li&gt;Support for the Pinot proxy.&lt;/li&gt;
  &lt;li&gt;Improved latency when querying Iceberg tables with many files.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-401.html&quot;&gt;Trino 401&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improved performance and reliability of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Support for writing to Google Cloud Storage in Delta Lake.&lt;/li&gt;
  &lt;li&gt;Support for IBM Cloud Object Storage in Hive.&lt;/li&gt;
  &lt;li&gt;Support for writes with fault-tolerant execution in MySQL, PostgreSQL, and SQL
Server.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Additional highlights worth a mention according to Cole:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The new Hudi connector is worth mentioning twice. It was in the works for a
while, and we’re really excited it has arrived and continues to improve.&lt;/li&gt;
  &lt;li&gt;Trino 396 added support for version three of the Delta Lake writer, then Trino
401 added support for version four, so we’ve jumped from two to four since the
last time you saw us!&lt;/li&gt;
  &lt;li&gt;There have been a ton of fixes to table and column comments across a wide
variety of connectors.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the release notes for
&lt;a href=&quot;https://trino.io/docs/current/release/release-396.html&quot;&gt;Trino 396&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-397.html&quot;&gt;Trino 397&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-398.html&quot;&gt;Trino 398&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-399.html&quot;&gt;Trino 399&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-400.html&quot;&gt;Trino 400&lt;/a&gt;,
and
&lt;a href=&quot;https://trino.io/docs/current/release/release-401.html&quot;&gt;Trino 401&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-week-intro-to-hudi-and-the-hudi-connector&quot;&gt;Concept of the week: Intro to Hudi and the Hudi connector&lt;/h2&gt;

&lt;p&gt;This week we’re talking about the Hudi connector that was added in version 398.&lt;/p&gt;

&lt;h3 id=&quot;what-is-apache-hudi&quot;&gt;What is Apache Hudi?&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://hudi.apache.org/&quot;&gt;Apache Hudi&lt;/a&gt; (pronounced “hoodie”) is a streaming
data lakehouse platform that combines warehouse and database functionality. Hudi
is a table format that enables transactions, efficient upserts and deletes, advanced
indexing, streaming ingestion services, data clustering and compaction
optimizations, and concurrency control.&lt;/p&gt;

&lt;p&gt;Hudi is not just a table format; it also provides many services aimed at building
efficient incremental batch pipelines. Hudi was born out of Uber and is used at
companies like Amazon, ByteDance, and Robinhood.&lt;/p&gt;

&lt;h3 id=&quot;merge-on-read-mor-and-copy-on-write-cow-tables&quot;&gt;Merge on read (MOR) and copy on write (COW) tables&lt;/h3&gt;

&lt;p&gt;The Hudi table format and services aim to provide a suite of tools that make
Hudi adaptable to both realtime and batch use cases on the data lake. Hudi lays
out data following either
&lt;a href=&quot;https://hudi.apache.org/docs/next/table_types#merge-on-read-table&quot;&gt;merge on read&lt;/a&gt;,
which optimizes writes over reads, or
&lt;a href=&quot;https://hudi.apache.org/docs/next/table_types#copy-on-write-table&quot;&gt;copy on write&lt;/a&gt;,
which optimizes reads over writes.&lt;/p&gt;
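&lt;p&gt;As a rough sketch of how the table type is chosen (the table and column names
here are hypothetical), Hudi’s Spark SQL support selects the layout through the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;type&lt;/code&gt; table property:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- merge on read: optimizes writes over reads
CREATE TABLE hudi_events (
  id BIGINT,
  name STRING,
  ts BIGINT
) USING hudi
TBLPROPERTIES (
  type = 'mor',
  primaryKey = 'id',
  preCombineField = 'ts'
);

-- use type = 'cow' instead for copy on write,
-- which optimizes reads over writes
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;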

&lt;h3 id=&quot;hudi-metadata-table&quot;&gt;Hudi metadata table&lt;/h3&gt;

&lt;p&gt;The &lt;a href=&quot;https://hudi.apache.org/docs/next/metadata&quot;&gt;Hudi metadata table&lt;/a&gt; can
improve the read and write performance of your queries. Its main purpose is to
eliminate the requirement for the “list files” operation. That requirement stems
from how
&lt;a href=&quot;/blog/2020/10/20/intro-to-hive-connector.html&quot;&gt;Hive-modelled SQL tables&lt;/a&gt;
point to entire directories rather than to specific files with ranges. Pointing
to files with ranges helps prune out files outside the query criteria.&lt;/p&gt;

&lt;h3 id=&quot;hudi-data-layout&quot;&gt;Hudi data layout&lt;/h3&gt;

&lt;p&gt;Hudi uses
&lt;a href=&quot;https://hudi.apache.org/docs/next/file_layouts&quot;&gt;multiversion concurrency control (MVCC)&lt;/a&gt;,
where a compaction action merges logs and base files to produce new file slices,
and a cleaning action removes unused or older file slices to reclaim space on the
file system.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;100%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/41/hudi-mvcc-files.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h3 id=&quot;robinhood-trino-and-hudi-use-cases&quot;&gt;Robinhood’s Trino and Hudi use cases&lt;/h3&gt;

&lt;p&gt;One of the well-known users of Trino and Hudi is Robinhood. Grace (Yue) Lu, who
&lt;a href=&quot;https://www.youtube.com/watch?v=gFTDQGRXOus&quot;&gt;joined us at Trino Summit 2021&lt;/a&gt;,
covers Robinhood’s architecture and use cases for Trino and Hudi.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/41/robinhood-hudi-trino-architecture.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;Robinhood ingests data via Debezium and streams it into Hudi. Then Trino is able
to read data as it becomes available in Hudi.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/41/robinhood-use-cases.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;Hudi and Trino support critical use cases like IPO company stock allocation,
liquidity risk monitoring, clearing settlement reports, and generally fresher
metrics reporting and analysis.&lt;/p&gt;

&lt;h3 id=&quot;the-current-state-of-the-trino-hudi-connector&quot;&gt;The current state of the Trino Hudi connector&lt;/h3&gt;

&lt;p&gt;Before we had 
&lt;a href=&quot;https://trino.io/docs/current/connector/hudi.html&quot;&gt;the official Hudi connector&lt;/a&gt;,
many, like Robinhood, had to use the Hive connector. They were therefore not
able to take advantage of the metadata table and many other optimizations Hudi
provides out of the box.&lt;/p&gt;

&lt;p&gt;The connector addresses that and now enables using some Hudi abstractions.
However, the connector is currently limited to read-only mode and doesn’t
support writes. Spark remains the primary system for writing data into Hudi
tables that Trino then queries. Check out the demo to see the connector in action.&lt;/p&gt;
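&lt;p&gt;Setting up the connector follows the usual catalog pattern. A minimal sketch of a
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hudi.properties&lt;/code&gt; catalog file, assuming a Hive
metastore at a placeholder host:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;connector.name=hudi
hive.metastore.uri=thrift://example.net:9083
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;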

&lt;h3 id=&quot;upcoming-features-in-hudi-connector&quot;&gt;Upcoming features in Hudi connector&lt;/h3&gt;

&lt;p&gt;First we want to improve read support and cover all query types. As a
next step we aim to add DDL support.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The connector only supports copy on write tables, and soon we will add merge
on read table support.&lt;/li&gt;
  &lt;li&gt;Hudi has multiple 
&lt;a href=&quot;https://hudi.apache.org/docs/next/table_types#query-types&quot;&gt;query types&lt;/a&gt;.
Support for snapshot queries is coming shortly.&lt;/li&gt;
  &lt;li&gt;Integration with the metadata table.&lt;/li&gt;
  &lt;li&gt;Utilization of the column statistics index.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;pr-14445-fault-tolerant-execution-for-postgresql-and-mysql-connectors&quot;&gt;PR 14445: Fault-tolerant execution for PostgreSQL and MySQL connectors&lt;/h2&gt;

&lt;p&gt;This &lt;a href=&quot;https://github.com/trinodb/trino/pull/14445&quot;&gt;PR of the episode&lt;/a&gt; was
contributed by Matthew Deady (&lt;a href=&quot;https://github.com/mwd410&quot;&gt;@mwd410&lt;/a&gt;). The
improvements enable writes to PostgreSQL and MySQL when fault-tolerant execution
is enabled (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;retry-policy&lt;/code&gt; is set to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TASK&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;QUERY&lt;/code&gt;). This update included a
few changes to the core classes used by connectors that rely on JDBC clients to
connect to the underlying database. For example, Matthew was able to build on this PR by
adding a few additional changes to get this working for SQL Server in
&lt;a href=&quot;https://github.com/trinodb/trino/pull/14730&quot;&gt;PR 14730&lt;/a&gt;.&lt;/p&gt;
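&lt;p&gt;As a minimal sketch, enabling this amounts to setting the retry policy in the
coordinator’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;config.properties&lt;/code&gt;. Note
that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TASK&lt;/code&gt; retries also require a
fault-tolerant exchange manager to be configured separately:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# etc/config.properties
retry-policy=TASK
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;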

&lt;p&gt;Thank you so much to Matthew for extending our fault-tolerant execution to
connectors using JDBC clients! As usual, thanks to all the reviewers and
maintainers who got these across the line!&lt;/p&gt;

&lt;h2 id=&quot;demo-using-the-hudi-connector&quot;&gt;Demo: Using the Hudi Connector&lt;/h2&gt;

&lt;p&gt;Let’s start up a local Trino coordinator and Hive metastore. Clone the
repository and navigate to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hudi/trino-hudi-minio&lt;/code&gt; directory. Then
start up the containers using Docker Compose.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone git@github.com:bitsondatadev/trino-getting-started.git
cd community_tutorials/hudi/trino-hudi-minio
docker-compose up -d
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For now, you will need to import data using the Spark and Scala method we detail
in the video. In the near term we will provide a SparkSQL variant, and we will
update this demo to show Trino DDL support when it lands.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SHOW CATALOGS;

SHOW SCHEMAS IN hudi;

SHOW TABLES IN hudi.default;

SELECT COUNT(*) FROM hudi.default.hudi_coders_hive;

SELECT * FROM hudi.default.hudi_coders_hive;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://hudi.apache.org/&quot;&gt;Hudi&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://onehouse.io&quot;&gt;Onehouse&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Blog posts&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://robinhood.engineering/author-balaji-varadarajan-e3f496815ebf&quot;&gt;Fresher Data Lake on S3&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Check out the in-person and virtual
&lt;a href=&quot;https://www.meetup.com/pro/trino-community/&quot;&gt;Trino Meetup groups&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino Summit 2022: Federating humans and data</title>
      <link href="https://trino.io/blog/2022/10/19/trino-summit-2022-teaser-2.html" rel="alternate" type="text/html" title="Trino Summit 2022: Federating humans and data" />
      <published>2022-10-19T00:00:00+00:00</published>
      <updated>2022-10-19T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/10/19/trino-summit-2022-teaser-2</id>
      <content type="html" xml:base="https://trino.io/blog/2022/10/19/trino-summit-2022-teaser-2.html">&lt;p&gt;Trino has long been the de facto standard for querying large data sets over your
cloud or on-prem storage, also known as data lakes. This year’s Trino Summit theme
will instead showcase Trino’s other claim to fame: query federation. Trino is a
query engine providing an access point that exposes ANSI SQL across
&lt;a href=&quot;https://trino.io/docs/current/connector.html&quot;&gt;multiple data sources&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I urge you to join us either in-person or virtually if you are a fan of Trino,
big data, open source, data engineering, Java, or all the above! This conference
is free and takes place in San Francisco, California on November 10th.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;register-for-the-summit&quot;&gt;Register for the summit&lt;/h2&gt;

&lt;p&gt;I can’t help but bring up the analogy of how Trino federates heterogeneous data
while this Trino Summit will federate many of us in the community from all
corners of the world. It really gives an appreciation for the international
reach of Trino and makes me look forward to more in-person events!&lt;/p&gt;

&lt;p&gt;Trino Summit will be held at the Commonwealth Club in San Francisco, California.
Make sure you register quickly for in-person registration, as it is limited to
250 seats. Virtual registration is also picking up quickly so register today!&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;
        Register now
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;h3 id=&quot;get-an-autographed-copy-of-trino-the-definitive-guide-2nd-ed&quot;&gt;Get an autographed copy of Trino: The Definitive Guide, 2nd ed.&lt;/h3&gt;

&lt;p&gt;Want to meet the authors who literally wrote the book on Trino? Visit 
&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;Manfred Moser&lt;/a&gt;,
&lt;a href=&quot;https://twitter.com/mfullertweets&quot;&gt;Matt Fuller&lt;/a&gt;, and
&lt;a href=&quot;https://twitter.com/mtraverso&quot;&gt;Martin Traverso&lt;/a&gt; at the Trino booth during the
conference. Bring your hard copy of &lt;a href=&quot;/blog/2022/10/03/the-definitive-guide-2.html&quot;&gt;&lt;strong&gt;Trino: The Definitive Guide&lt;/strong&gt;&lt;/a&gt; to get it signed by the authors!&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/ttdg2-cover.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;Don’t have a book? We’ll be giving away autographed copies of the book
throughout the conference!&lt;/p&gt;

&lt;h3 id=&quot;trino-summit-2022-teaser&quot;&gt;Trino Summit 2022 teaser&lt;/h3&gt;

&lt;p&gt;Check out the teaser for this year’s Trino Summit and get ready to &lt;strong&gt;&lt;em&gt;Federate ‘em
all&lt;/em&gt;&lt;/strong&gt;!&lt;/p&gt;

&lt;iframe src=&quot;https://www.youtube.com/embed/o2MJvRKG14M&quot; width=&quot;800&quot; height=&quot;500&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px;
margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt;
&lt;/iframe&gt;

&lt;h2 id=&quot;announcing-the-second-round-of-sessions-and-speakers&quot;&gt;Announcing the second round of sessions and speakers&lt;/h2&gt;

&lt;p&gt;As mentioned in the &lt;a href=&quot;/blog/2022/09/22/trino-summit-2022-teaser.html&quot;&gt;previous summit teaser&lt;/a&gt;, we announced some of our exciting
lineup of speakers! The topics range from architectures like data mesh and data
lakehouse, to running Trino at scale with fault-tolerant execution, and of
course, query federation.&lt;/p&gt;

&lt;p&gt;We have a full roster planned, but check out the next round of fully confirmed
sessions. Stay tuned for one more blog post as we announce the final sessions in
our agenda as they are confirmed!&lt;/p&gt;

&lt;h3 id=&quot;sk-telecoms-journey-to-iceberg&quot;&gt;SK Telecom’s journey to Iceberg&lt;/h3&gt;

&lt;p&gt;SK Group is one of South Korea’s largest conglomerates, covering
industries from manufacturing to telecommunications. SK Telecom runs an
on-premise data platform at petabyte scale using Trino as the query engine. We
chose Trino for its ability to connect to heterogeneous data sources and its
fast performance, which plays a key role in our data platform.&lt;/p&gt;

&lt;p&gt;As data volumes and user demands to analyze long-term data increased, the Trino
Hive connector faced several challenges. Queries with an input data size
exceeding a terabyte put a great burden on the cluster. This caused many jobs to
fail, which is problematic because Trino’s resource sharing architecture affects
multiple users when a heavy query occurs.&lt;/p&gt;

&lt;p&gt;To address this situation, we optimized the data structure, tuned queries, and
used resource groups to isolate queries, but none of this fixed the problem.
We investigated Apache Iceberg and realized it could address some of these
scaling issues we were facing. In this talk, we will share our journey.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;JaeChang Song, Data Engineer at SKTelecom and Trino/Iceberg Contributor&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Jennifer OH, Data Engineer at SKTelecom&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;elevating-data-fabric-to-data-mesh-solving-data-needs-in-hybrid-data-lakes&quot;&gt;Elevating Data Fabric to Data Mesh: solving data needs in hybrid data lakes&lt;/h3&gt;

&lt;p&gt;At Comcast, we have long had a complex hybrid data lake environment that consists of
data lakes on-prem and in multiple cloud environments. Comcast uses Trino to
bridge the data in these environments using an architecture we call Data Fabric.
Data Fabric is an abstraction layer that uses an internally built connector that
connects to multiple instances of Trino. This enables us to query across all
of these environments from a single Trino instance.&lt;/p&gt;

&lt;p&gt;In recent years, emerging architectures like Data Mesh have nicely complemented
the goals we have been building toward for years. While we have effectively
implemented some aspects of a Data Mesh, there are still core tenets that
cannot be addressed by Trino alone. This is the journey we are on at Comcast,
and we would like to share our experience so far, the challenges we overcame, and
the ones yet to be resolved. Data abstraction, availability, movement, and
governance are the topics we will touch upon in this session.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Sajumon Joseph, Sr Principal Architect&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Pavan Madhineni, Sr. Manager; Product Development Engineering&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;trino-at-quora-speed-cost-reliability-challenges-and-tips&quot;&gt;Trino at Quora: Speed, Cost, Reliability Challenges and Tips&lt;/h3&gt;

&lt;p&gt;Trino has become an essential part of Quora’s tech stack and a major component
of our A/B testing framework that powers our decision-making on the product.
Trino has brought a lot of advantages to us. However, at Quora’s scale, we face
cost, speed, and reliability challenges when operating Trino.&lt;/p&gt;

&lt;p&gt;In this session, we will talk about how we resolve the challenges. Some
approaches are: auto-scale Trino clusters, experiment with different cluster and
JVM configurations, and instance types, build checkers to detect slow workers
and inefficient queries, and set up extensive monitoring.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Yifan Pan, Software Engineer of Data Infrastructure Team at Quora; 
Administrator/Primary Owner of Trino infrastructure at Quora&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;how-we-use-trino-to-analyze-our-product-led-growth-plg-user-activation-funnel&quot;&gt;How we use Trino to analyze our Product-led Growth (PLG) user activation funnel&lt;/h3&gt;

&lt;p&gt;Being a PLG company, we must track and analyze every action our users perform
within the product to remove friction and maximize usage and satisfaction. To
understand how effectively and quickly users become educated and then active in
the product, we had to instrument the user journey from signup to the Aha moment
and beyond.&lt;/p&gt;

&lt;p&gt;There are many tools on the market that can be used to analyze user behavior,
but none met our needs. In this session you will learn how we built a data
architecture to collect, model, and enrich user behavior events to optimize
Trino query performance that accelerated our ability to understand and improve
user conversion rates.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Roy Hasson, Head of Product at Upsolver&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;I hope you all are as excited as we are to finally federate the Trino community
face-to-face! This conference is shaping up to be educational, fun, and filled
with Trino experts and aficionados.&lt;/p&gt;

&lt;p&gt;Stay tuned for new developments in upcoming blog posts, don’t forget to
&lt;a href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;register&lt;/a&gt;, and always, &lt;strong&gt;&lt;em&gt;Federate them
all&lt;/em&gt;&lt;/strong&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>Trino has long been the de facto standard for querying large data sets over your cloud or on-prem storage, also known as data lakes. This Trino Summit’s theme instead will showcase Trino’s other claim to fame: query federation. Trino is a query engine providing an access point that exposes ANSI SQL across multiple data sources. I urge you to join us either in-person or virtually if you are a fan of Trino, big data, open source, data engineering, Java, or all the above! This conference is free and takes place in San Francisco, California on November 10th.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/summit-logo.png" />
      
    </entry>
  
    <entry>
      <title>Release of the second edition of Trino: The Definitive Guide</title>
      <link href="https://trino.io/blog/2022/10/03/the-definitive-guide-2.html" rel="alternate" type="text/html" title="Release of the second edition of Trino: The Definitive Guide" />
      <published>2022-10-03T00:00:00+00:00</published>
      <updated>2022-10-03T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/10/03/the-definitive-guide-2</id>
      <content type="html" xml:base="https://trino.io/blog/2022/10/03/the-definitive-guide-2.html">&lt;p&gt;It was time for a refresh. A little while ago in April 2021, we announced
the &lt;a href=&quot;https://trino.io/blog/2021/04/21/the-definitive-guide.html&quot;&gt;Trino version of our definitive guide&lt;/a&gt;. But again, Trino as a project and community
has continued to innovate and grow. Numerous smaller and larger details changed,
and the examples and resources needed to be fixed.&lt;/p&gt;

&lt;p&gt;Today, we are happy to announce that after a few months of updates, testing, and
editing, the second edition of &lt;strong&gt;Trino: The Definitive Guide&lt;/strong&gt; is available.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h2 id=&quot;get-a-free-copy-from-starburst-now&quot;&gt;&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;Get a free copy from Starburst now!&lt;/a&gt;&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;!--more--&gt;

&lt;p&gt;The &lt;a href=&quot;https://www.oreilly.com/library/view/trino-the-definitive/9781098137229/&quot;&gt;new edition of the book from
O’Reilly&lt;/a&gt;
is available in digital formats as well as physical copies. You can find more
information about the book on &lt;a href=&quot;/trino-the-definitive-guide.html&quot;&gt;our permanent page about
it&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The book is now updated to Trino release 392 for all filenames, installation
methods, commands, names and properties. We addressed all problems that our
readers found and reported to us as well.&lt;/p&gt;

&lt;p&gt;We updated to Java 17 usage, added more SQL statements, and added info about
&lt;a href=&quot;https://trino.io/blog/2022/09/20/python-progress.html&quot;&gt;Python tools like dbt&lt;/a&gt; and clients like Metabase. We talk about the lakehouse architecture and new
connectors like Iceberg and Delta Lake.&lt;/p&gt;

&lt;p&gt;So what are you waiting for? Go get a copy, check out the &lt;a href=&quot;https://github.com/trinodb/trino-the-definitive-guide&quot;&gt;updated example code
repository&lt;/a&gt;, &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/README.md&quot;&gt;give us a
star&lt;/a&gt;, provide feedback,
and contact us on &lt;a href=&quot;/slack.html&quot;&gt;Slack&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred, Martin, and Matt&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;And one last tip, join us at &lt;a href=&quot;/blog/2022/09/22/trino-summit-2022-teaser.html&quot;&gt;Trino Summit 2022&lt;/a&gt; in San Francisco in November for a chat
and maybe even a signed hardcopy of the book.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Martin Traverso, Matt Fuller</name>
        </author>
      

      <summary>It was time for a refresh. A little while ago in April 2021, we announced the Trino version of our definitive guide. But again, Trino as a project and community has continued to innovate and grow. Numerous smaller and larger details changed, and the examples and resources needed to be fixed. Today, we are happy to announce that after a few months of updates, testing, and editing, the second edition of Trino: The Definitive Guide is available. Get a free copy from Starburst now!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/ttdg2-cover.png" />
      
    </entry>
  
    <entry>
      <title>Trino Summit 2022 will be legendary</title>
      <link href="https://trino.io/blog/2022/09/22/trino-summit-2022-teaser.html" rel="alternate" type="text/html" title="Trino Summit 2022 will be legendary" />
      <published>2022-09-22T00:00:00+00:00</published>
      <updated>2022-09-22T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/09/22/trino-summit-2022-teaser</id>
      <content type="html" xml:base="https://trino.io/blog/2022/09/22/trino-summit-2022-teaser.html">&lt;p&gt;Commander Bun Bun is back and this year we have an exciting lineup of speakers.
Topics range from architectures like data mesh and data lakehouse, to running
Trino at scale with fault-tolerant execution, and query federation. This 
conference is free and takes place on November 10th. The summit is a hybrid
event for in-person and virtual attendance. Find out more details below!&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;register-for-the-summit&quot;&gt;Register for the summit&lt;/h2&gt;

&lt;p&gt;This year’s Trino Summit will be hosted at the Commonwealth Club in San 
Francisco, CA. In-person registration is limited to 250 seats so make sure you
register quickly before spots run out!&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;
        Register now
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;h3 id=&quot;trino-summit-2022-teaser&quot;&gt;Trino Summit 2022 teaser&lt;/h3&gt;

&lt;p&gt;Get ready to federate them all this year! Many times when folks think of Trino,
their first instinct is to consider the data lake use case where it replaces
Hive or other data lakehouse query engines. However, this summit will also drill
into the lesser discussed query federation use case. Federate ‘em all!&lt;/p&gt;

&lt;iframe src=&quot;https://www.youtube.com/embed/o2MJvRKG14M&quot; width=&quot;800&quot; height=&quot;500&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px;
margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt;
&lt;/iframe&gt;

&lt;h2 id=&quot;announcing-the-first-sessions-and-speakers&quot;&gt;Announcing the first sessions and speakers&lt;/h2&gt;

&lt;p&gt;We have a full roster planned, but here is a glance at a few fully confirmed
sessions. Stay tuned for future blog posts as we announce more sessions as they
are confirmed!&lt;/p&gt;

&lt;h3 id=&quot;state-of-trino-keynote&quot;&gt;State of Trino keynote&lt;/h3&gt;

&lt;p&gt;Hear the latest on the state of the open source Trino project. Trino
is the award-winning MPP SQL query engine. In this session, Trino creators
discuss the latest features that have landed in the last year, the roadmap for
the year ahead, and community growth highlights.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Martin Traverso, Co-Creator of Trino and CTO, Starburst&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Dain Sundstrom, Co-Creator of Trino and CTO, Starburst&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;David Phillips, Co-Creator of Trino and CTO, Starburst&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;trino-for-large-scale-etl-at-lyft&quot;&gt;Trino for large scale ETL at Lyft&lt;/h3&gt;

&lt;p&gt;At Lyft, we are processing petabytes of data daily through Trino
for various use cases. A single query can run for as long as 4 hours with
terabytes of memory reserved. There are quite a few challenges in operating Trino
ETL at such a scale: how to make all queries as performant as possible with low
failure rates; how to define clusters, routing groups, and resource
groups for changing volume across a day; how to keep our commitment to user SLOs
during unexpected spikes; and so on.&lt;/p&gt;

&lt;p&gt;We’ll share what we’ve done with our config tuning, large query/user
identification, autoscaling, and fault-tolerant features to run Trino at
such a scale. We’ll also share our upcoming challenges and our plans to take
Trino adoption further across the company.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Charles Song, Senior Software Engineer at Lyft&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;rewriting-history-migrating-petabytes-of-data-to-apache-iceberg-using-trino&quot;&gt;Rewriting history: Migrating petabytes of data to Apache Iceberg using Trino&lt;/h3&gt;

&lt;p&gt;Dataset interoperability between data platform components continues to
be a difficult hurdle to overcome. This shortcoming often results in siloed
data and frustrated users. Although open table formats like Apache Iceberg aim
to break down these silos by providing a consistent and scalable table
abstraction, migrating your pre-existing data archive to a new format can still
be daunting. This talk will outline challenges we faced when rewriting petabytes
of Shopify’s data into the Iceberg table format using the Trino engine. In a
rapidly evolving landscape, I will highlight recent contributions to Trino’s
Iceberg integration that made our work possible, while also illustrating how we
designed our system to scale. Topics will include: what to consider when
designing your migration strategy, how we optimized Trino’s write performance,
and how to recover from corrupt table states. Finally, I will compare the query
performance of old and migrated datasets using Shopify’s datasets as
benchmarks.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Marc Laforet, Senior Data Engineer at Shopify&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;federating-them-all-on-starburst-galaxy&quot;&gt;Federating them all on Starburst Galaxy!&lt;/h3&gt;

&lt;p&gt;You’ve federated them all on Trino, but to beat the Elite Four at
Indigo Plateau, every data trainer needs help. In this talk, I will cover how
Starburst Galaxy is the fastest path to query federation and walk through a demo
that trainers can follow later. We’ll also cover cool features like schema
discovery and fault-tolerant execution. The queries we’ll run will use Pokémon
data so that you don’t have to witness yet another taxi cab or iris data set.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Monica Miller, Developer Advocate at Starburst&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;using-trino-with-apache-airflow-for-almost-all-your-data-problems&quot;&gt;Using Trino with Apache Airflow for (almost) all your data problems&lt;/h3&gt;

&lt;p&gt;Trino is incredibly effective at enabling users to extract insights
quickly and effectively from large amounts of data located in dispersed and
heterogeneous federated data systems. However, some business data problems are
more complex than interactive analytics use cases, and are best broken down into
a sequence of interdependent steps, a.k.a. a workflow. For these use cases,
dedicated software is often required in order to schedule and manage these
processes with a principled approach. In this session, we will look at how we
can leverage Apache Airflow to orchestrate Trino queries into complex workflows
that solve practical batch processing problems, all the while avoiding
repetitive, redundant data movement.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Philippe Gagnon, Solutions Architect at Astronomer&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Stay tuned for new developments in upcoming blog posts, don’t forget to
&lt;a href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;register&lt;/a&gt;, and as always, federate them
all!&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen, Dain Sundstrom</name>
        </author>
      

      <summary>Commander Bun Bun is back and this year we have an exciting lineup of speakers. Topics range from architectures like data mesh and data lakehouse, to running Trino at scale with fault-tolerant execution, and query federation. This conference is free and takes place on November 10th. The summit is a hybrid event for in-person and virtual attendance. Find out more details below!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/summit-logo.png" />
      
    </entry>
  
    <entry>
      <title>Trino charms Python</title>
      <link href="https://trino.io/blog/2022/09/20/python-progress.html" rel="alternate" type="text/html" title="Trino charms Python" />
      <published>2022-09-20T00:00:00+00:00</published>
      <updated>2022-09-20T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/09/20/python-progress</id>
      <content type="html" xml:base="https://trino.io/blog/2022/09/20/python-progress.html">&lt;p&gt;Wow, have we ever come a long way with Python support for Trino. It feels like
ages ago that we talked about DB-API, trino-python-client, SQLAlchemy, Apache
Superset, and more in &lt;a href=&quot;https://trino.io/episodes/12.html&quot;&gt;Trino Community Broadcast episode
12&lt;/a&gt;. More recently we talked about dbt in
&lt;a href=&quot;https://trino.io/episodes/21.html&quot;&gt;episode 21&lt;/a&gt; and &lt;a href=&quot;https://trino.io/episodes/30.html&quot;&gt;episode
30&lt;/a&gt;, but there is so much more for Pythonistas,
Pythonians, Python programmers, and simply users of Python-powered tools.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;where-are-we-now&quot;&gt;Where are we now&lt;/h2&gt;

&lt;p&gt;Python usage shows up with nearly every Trino deployment these days, and we
have had some really great developments for you all in recent months:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.starburst.io&quot;&gt;Starburst&lt;/a&gt; has really ramped up the contributions to
the foundation of a lot of Python tools connecting to Trino. The
&lt;a href=&quot;https://github.com/trinodb/trino-python-client&quot;&gt;trino-python-client&lt;/a&gt; receives
improvements regularly and is definitely a first-class client at the same
level as the JDBC driver or the CLI.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.getdbt.com/&quot;&gt;dbt Labs&lt;/a&gt; and Starburst have worked hard on
launching and improving the &lt;a href=&quot;https://github.com/starburstdata/dbt-trino&quot;&gt;dbt-trino
project&lt;/a&gt; and enabling automated
data transformation flows.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://airflow.apache.org/&quot;&gt;Apache Airflow&lt;/a&gt; use cases abound, and the
&lt;a href=&quot;/blog/2022/07/13/how-to-use-airflow-to-schedule-trino-jobs.html&quot;&gt;integration is improving&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://superset.apache.org/&quot;&gt;Apache Superset&lt;/a&gt; and
&lt;a href=&quot;https://preset.io/&quot;&gt;Preset&lt;/a&gt; continue to add features and treat Trino as a
major data source and integration, and we should probably have another Trino
Community Broadcast episode to see that all in action.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://airbyte.com/&quot;&gt;Airbyte&lt;/a&gt; was &lt;a href=&quot;/blog/2022/05/17/cinco-de-trino-recap.html&quot;&gt;demoed at Cinco de Trino&lt;/a&gt; and is &lt;a href=&quot;/blog/2022/05/24/an-opinionated-guide-to-consolidating-our-data.html&quot;&gt;widely used by companies such as
Lyft&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And of course there are well-known usages such as notebooks everywhere, on your
workstation, in your company, and out in the cloud. But is there more? There
must be!&lt;/p&gt;

&lt;h2 id=&quot;what-else-could-we-do&quot;&gt;What else could we do&lt;/h2&gt;

&lt;p&gt;All of these developments are great for our users. I want to encourage you all
to try these tools and learn how amazing they are with Trino. At the same time,
it feels like there has to be even more. The Python ecosystem is so large, and
there are probably dozens of use cases we have never heard about, have not
considered, or have not even dreamed about in our wildest dreams.&lt;/p&gt;

&lt;p&gt;On the other hand I am sure there are still problems with these tools and
integrations. What is an edge case for us, might be a daily task for you. What
we consider hard and complicated, might be just what you have to deal with
anyway. And in the spirit of constant improvement, we really want to fix these
things and make it all amazing. But we need your help.&lt;/p&gt;

&lt;h2 id=&quot;let-us-know-what-you-think&quot;&gt;Let us know what you think&lt;/h2&gt;

&lt;p&gt;This is now your opportunity to tell us what you need to make your Trino and
Python experience better.&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://forms.gle/4bzMPZxby6E4xKm98&quot; target=&quot;_blank&quot;&gt;
        Help Trino and Python
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Trino, Python, and all the tools in the ecosystem go from strength to strength.
With your help and input, we want to supercharge the tooling to hero levels.&lt;/p&gt;

&lt;p&gt;Join us in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;python-client&lt;/code&gt; channel on &lt;a href=&quot;https://trino.io/community.html&quot;&gt;Trino Slack&lt;/a&gt;,
and don’t forget to &lt;a href=&quot;https://forms.gle/4bzMPZxby6E4xKm98&quot;&gt;answer that survey&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Thanks, and see you at the &lt;a href=&quot;/blog/2022/06/30/trino-summit-call-for-speakers.html&quot;&gt;Trino Summit 2022&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred, Brian, and Dain&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Brian Zhan, Dain Sundstrom</name>
        </author>
      

      <summary>Wow, have we ever come a long way with Python support for Trino. It feels like ages ago that we talked about DB-API, trino-python-client, SQLAlchemy, Apache Superset, and more in Trino Community Broadcast episode 12. More recently we talked about dbt in episode 21 and episode 30, but there is so much more for Pythonistas, Pythonians, Python programmers, and simply users of Python-powered tools.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/images/logos/python.png" />
      
    </entry>
  
    <entry>
      <title>Trino&apos;s tenth birthday celebration recap</title>
      <link href="https://trino.io/blog/2022/09/12/tenth-birthday-celebration-recap.html" rel="alternate" type="text/html" title="Trino&apos;s tenth birthday celebration recap" />
      <published>2022-09-12T00:00:00+00:00</published>
      <updated>2022-09-12T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/09/12/tenth-birthday-celebration-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/09/12/tenth-birthday-celebration-recap.html">&lt;p&gt;What an exciting month we had in August! August marked the ten-year birthday of
the Trino project. Don’t worry if you missed all the excitement, as we’ve
condensed it all in this post.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;blog-posts&quot;&gt;Blog posts&lt;/h2&gt;

&lt;p&gt;We felt it necessary to chronicle the larger events that happened in the last
decade of the project through the lens of where we are today.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2022/08/02/leaving-facebook-meta-best-for-trino.html&quot;&gt;Why leaving Facebook/Meta was the best thing we could do for the Trino Community&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2022/08/04/decade-innovation.html&quot;&gt;A decade of query engine innovation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2022/08/08/trino-tenth-birthday.html&quot;&gt;Happy tenth birthday Trino!&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We shared these posts on Hacker News, and the Facebook and query innovation
posts both hit the front page. This resulted in one of the highest page-view
counts the Trino website has seen in a single day - more than 25k views!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-tenth-birthday/hn-top.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;trino-ten-year-timeline-video&quot;&gt;Trino ten-year timeline video&lt;/h2&gt;

&lt;p&gt;Another way we celebrated was creating an epic ten-year montage video that
chronicles the incredible journey starting with the Presto project’s humble
beginnings, and how it evolved into the success that Trino is today:&lt;/p&gt;

&lt;iframe src=&quot;https://www.youtube.com/embed/hPD95_-bZZw&quot; width=&quot;800&quot; height=&quot;500&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px;
margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt;
&lt;/iframe&gt;

&lt;h2 id=&quot;birthday-celebration-with-the-creators-of-trino&quot;&gt;Birthday celebration with the creators of Trino&lt;/h2&gt;

&lt;p&gt;To cap things off last month, we hosted a meetup with the creators to reflect
on the last ten years, laugh and listen to some stories from the early days,
talk about the exciting features currently launching, and speculate on the next
ten years of Trino. Here are some highlights you may have missed:&lt;/p&gt;

&lt;h3 id=&quot;adding-dynamic-catalogs&quot;&gt;Adding dynamic catalogs&lt;/h3&gt;

&lt;p&gt;Dain discusses what dynamic catalogs could look like in Trino. Currently, to add
catalogs in Trino, you need to add the new catalog configuration file and then
restart Trino. With dynamic catalogs, you can add and remove these catalogs at
runtime with no restart required. There is still no guarantee of exactly when
this feature would arrive, but some of the foundations are currently being 
added. &lt;a href=&quot;https://www.youtube.com/clip/UgkxkYmwM6gmw9-GceMUb5IxqIKm0qNXt3fY&quot; target=&quot;_blank&quot;&gt; 
&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Dain dives into this a bit
more in this clip&lt;/a&gt;&lt;/p&gt;
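
&lt;p&gt;For context, here is a sketch of what the current, static flow looks like: you
drop a properties file into &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;etc/catalog&lt;/code&gt; on every node and restart
them. The catalog name and connection details below are placeholders:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# etc/catalog/examplepg.properties -- hypothetical PostgreSQL catalog
connector.name=postgresql
connection-url=jdbc:postgresql://db.example.net:5432/exampledb
connection-user=trino
connection-password=secret
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Dynamic catalogs aim to make the equivalent change possible at runtime, with no
restart.&lt;/p&gt;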

&lt;h3 id=&quot;vectorization-and-performance&quot;&gt;Vectorization and performance&lt;/h3&gt;

&lt;p&gt;As more marketing around vectorized databases has come up recently, many have
asked if Trino will be following the trend. This question comes up at an
interesting time, as
&lt;a href=&quot;https://trino.io/episodes/36.html&quot;&gt;Trino now requires Java 17 to run&lt;/a&gt;. Java 17
comes with a lot of capabilities for vectorization, and while we are excited to
start looking into these capabilities, simply updating workloads to use
vectorization doesn’t pack the performance punch that many would expect. The
answer is more complex:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Do modern workloads benefit from vectorization? 
&lt;a href=&quot;https://www.youtube.com/clip/UgkxmPAur8thP_D-_GpCcg-sqprEAqwWdyck&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
See Martin’s answer to this&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Is there a benefit to vectorization over Java’s auto-vectorization?
&lt;a href=&quot;https://www.youtube.com/clip/Ugkx1AKbq0jQyZhOH4MKNf3LO4i9kZAmLqpJ&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
Sometimes, but Dain elaborates on when&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;If not vectorization, what type of performance improvements does Trino focus on?
&lt;a href=&quot;https://www.youtube.com/clip/UgkxQwDYDS6evVJelNVjWAgrIhzg_Q-cAEyq&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
Martin and Dain list some simple but impactful ones&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;The debate around query time optimization versus runtime adaption.
&lt;a href=&quot;https://www.youtube.com/clip/Ugkxt5ryTBP-EPEEo_OOcW2PKvNiJkj5n8UR&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
Which should you optimize first?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;polymorphic-table-functions&quot;&gt;Polymorphic table functions&lt;/h3&gt;

&lt;p&gt;One feature that is top-of-mind for everyone in the Trino project is
&lt;a href=&quot;/blog/2022/07/22/polymorphic-table-functions.html&quot;&gt;polymorphic table functions&lt;/a&gt;,
or simply “table functions” as Dain prefers to call them.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;What is a table function?
&lt;a href=&quot;https://www.youtube.com/clip/Ugkx62IKgPd_v9eGBaPUHP2hyaRkWSXh8w8h&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
David and Dain discuss standard and polymorphic table functions&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Could we rewrite the &lt;a href=&quot;https://trino.io/docs/current/connector/googlesheets&quot;&gt;Google Sheets connector&lt;/a&gt;
as a table function?
&lt;a href=&quot;https://www.youtube.com/clip/UgkxKIhplQHgEULQkSrjKs4M5w8oNdQMJaoL&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
David and Dain discuss how this would work&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Why table functions are so incredibly powerful.
&lt;a href=&quot;https://www.youtube.com/clip/UgkxQcokpdgPjiuMKMC5-3HwHvlbmZjxAvxe&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
Eric and Dain talk about why PTFs are a game changer&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about polymorphic table functions, check out the
recent &lt;a href=&quot;https://trino.io/episodes/38.html&quot;&gt;Trino Community Broadcast episode&lt;/a&gt; that
covers the potential of these functions in much more detail.&lt;/p&gt;

&lt;h3 id=&quot;the-early-days-of-presto-and-trino&quot;&gt;The early days of Presto and Trino&lt;/h3&gt;

&lt;p&gt;We wanted to get some insight into what the early days of the project looked
like, and how Martin, Dain, David, and Eric began the daunting task of designing
and building a distributed query engine from scratch. Some of the discussions
were interesting while others were downright hilarious. Here are some steps you
can take to write your own query engine, at least if you want to do it the way
the Trino creators did it:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Look up a bunch of research papers to see how others are doing this 📑.
  &lt;a href=&quot;https://www.youtube.com/clip/gkxGjPYZRx8rhtAndyho7AZgsM4e9wG9Jt4&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
  Video&lt;/a&gt;
    &lt;ul&gt;
      &lt;li&gt;Side note: Papers tend to be highly aspirational and skip important fundamentals.
&lt;a href=&quot;https://www.youtube.com/clip/Ugkx6Hqe5iglsTgrR9hVo9U3ITi8LSxxMu4U&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
Video&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Address the real challenges of making a query engine.
  &lt;a href=&quot;https://www.youtube.com/clip/Ugkx57PezuXyRWHrxxxoLaKni6jqFZ-StwY-&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
  Video&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Take your initial version and just throw it away 😂🗑🚮.
  &lt;a href=&quot;https://www.youtube.com/clip/UgkxJz7zve36QJZZDdtC3S29vI-Ak1jRifAH&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
  Video&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Expand outside the initial use cases by learning from other companies and
  building community 👥.
  &lt;a href=&quot;https://www.youtube.com/clip/UgkxQrBl0BzOrjvwDcEN4KAAyqehcRUc1tsf&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
  Video&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Cause a &lt;a href=&quot;https://en.wikipedia.org/wiki/Brownout_(software_engineering)&quot;&gt;brownout&lt;/a&gt;
  on the Facebook network 📉.
  &lt;a href=&quot;https://www.youtube.com/clip/Ugkx6SyQTFgwX_kdeH018VGt2pMUbldvuKtC&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
  Video&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Realize the system you replaced was actually faster in some cases, but
  for all the wrong reasons ❌🙅.
  &lt;a href=&quot;https://www.youtube.com/clip/UgkxTqBY2nMAALn-OkglE5DT9dHlBuC18qf8&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
  Video&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After a lot of the initial work was done, Presto was deployed at Facebook and
soon after open sourced. From here, we know that the velocity of the project
picked up and once the project was independent of Facebook, the features took
off even more. While everything may seem calculated in hindsight, it was a lot
of hard work to grow the community and adoption around Presto and now Trino.
The creators knew they were making a project that would be utilized outside the
walls of Facebook, but
&lt;a href=&quot;https://www.youtube.com/clip/Ugkxh2J-1bi1rUoBpuld_FAuXYZgz2bvqPPx&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;  they could never have 
anticipated the sheer scale of adoption Trino would see&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;We hope you enjoyed all the fun we had celebrating these first ten years of the
Trino project. We are thrilled to think of what the following decades will
bring. We’d like to leave you with closing thoughts from Dain:&lt;/p&gt;

&lt;iframe src=&quot;https://www.youtube.com/embed/6TFLKcF24HM?clip=Ugkx5bFnjvRX0USjk8vgRJdqLwZQo7Ffg0xm&amp;amp;clipt=ELfJ2gEY8o7eAQ&quot; width=&quot;800&quot; height=&quot;500&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px;
margin-bottom:5px; max-width: 100%;&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;
&lt;/iframe&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>What an exciting month we had in August! August marked the ten-year birthday of the Trino project. Don’t worry if you missed all the excitement, as we’ve condensed it all in this post.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-tenth-birthday/creators.jpeg" />
      
    </entry>
  
    <entry>
      <title>40: Trino&apos;s cold as Iceberg!</title>
      <link href="https://trino.io/episodes/40.html" rel="alternate" type="text/html" title="40: Trino&apos;s cold as Iceberg!" />
      <published>2022-09-08T00:00:00+00:00</published>
      <updated>2022-09-08T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/40</id>
      <content type="html" xml:base="https://trino.io/episodes/40.html">&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/15/trino-iceberg.png&quot; /&gt;&lt;br /&gt;
Looks like Commander Bun Bun is safe on this Iceberg&lt;br /&gt;
&lt;a href=&quot;https://joshdata.me/iceberger.html&quot;&gt;https://joshdata.me/iceberger.html&lt;/a&gt;
&lt;/p&gt;

&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Brian Olsen, Developer Advocate at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/bitsondatadev&quot;&gt;@bitsondatadev&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate at
 &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Ryan Blue, creator of Iceberg and CEO at
 &lt;a href=&quot;https://tabular.io&quot;&gt;Tabular&lt;/a&gt; (&lt;a href=&quot;https://github.com/rdblue&quot;&gt;@rdblue&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Sam Redai, Developer Advocate at &lt;a href=&quot;https://tabular.io&quot;&gt;Tabular&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/samuelredai&quot;&gt;@samuelredai&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/tomnats&quot;&gt;Tom Nats&lt;/a&gt;, Director of Customer Solutions at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;register-for-trino-summit-2022&quot;&gt;Register for Trino Summit 2022!&lt;/h2&gt;

&lt;p&gt;Trino Summit 2022 is just around the corner! This &lt;strong&gt;free&lt;/strong&gt; event on November
10th will take place in person at the Commonwealth Club in San Francisco, CA,
and can also be attended remotely! If you want to present, the
&lt;a href=&quot;https://sessionize.com/trino-summit-2022/&quot;&gt;call for speakers&lt;/a&gt; is open until
September 15th.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;You can register for the conference&lt;/a&gt;
at any time. We must limit in-person registrations to 250
attendees, so register soon if you plan on attending in person!&lt;/p&gt;

&lt;h2 id=&quot;releases-394-to-395&quot;&gt;Releases 394 to 395&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-394.html&quot;&gt;Trino 394&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;JSON output format for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXPLAIN&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Improved performance for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIKE&lt;/code&gt; expressions.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query&lt;/code&gt; table function in BigQuery connector.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; support in BigQuery connector.&lt;/li&gt;
  &lt;li&gt;TLS support in Pinot connector.&lt;/li&gt;
&lt;/ul&gt;
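
&lt;p&gt;As an illustration, the new JSON output format for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXPLAIN&lt;/code&gt; is selected with the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FORMAT&lt;/code&gt; option; the query itself is an arbitrary example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;EXPLAIN (FORMAT JSON)
SELECT nationkey, count(*)
FROM tpch.tiny.nation
GROUP BY nationkey;
&lt;/code&gt;&lt;/pre&gt;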

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-395.html&quot;&gt;Trino 395&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Faster &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; queries.&lt;/li&gt;
  &lt;li&gt;Better performance for large clusters.&lt;/li&gt;
  &lt;li&gt;Improved memory efficiency for aggregations and fault tolerant execution.&lt;/li&gt;
  &lt;li&gt;Faster aggregations over decimal columns.&lt;/li&gt;
  &lt;li&gt;Support for dynamic function resolution.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Additional highlights worth a mention according to Cole:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The improved performance of inserts on Delta Lake, Hive, and Iceberg is a huge
one. We’re not entirely sure how much it’ll matter in production use cases, but
some of the benchmarks suggested it could be massive - one test showed a 75%
reduction in query duration.&lt;/li&gt;
  &lt;li&gt;Dynamic function resolution in the SPI is going to unlock some very neat
possibilities down the line.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the release notes for
&lt;a href=&quot;https://trino.io/docs/current/release/release-394.html&quot;&gt;Trino 394&lt;/a&gt;,
and
&lt;a href=&quot;https://trino.io/docs/current/release/release-395.html&quot;&gt;Trino 395&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-week-latest-features-in-apache-iceberg-and-the-iceberg-connector&quot;&gt;Concept of the week: Latest features in Apache Iceberg and the Iceberg connector&lt;/h2&gt;

&lt;p&gt;It has been over a year since we had Ryan on the Trino Community Broadcast as
a guest to discuss what Apache Iceberg is and how it can be used with Trino.
Since then, the adoption of Iceberg in our community has skyrocketed. Iceberg is
delivering as a much better alternative to the Hive table format.&lt;/p&gt;

&lt;p&gt;The initial phase of the Iceberg connector in Trino aimed to provide fast and
interoperable read support. A typical usage was Trino alongside other query
engines like Apache Spark, which supported many of the data modification
language (DML) SQL features on Iceberg. One of the biggest requests we got as
adoption increased was the ability to do everything through Trino. This episode
dives into some of the latest features that were missing from the early
iterations of the Iceberg connector, as well as what has changed in Iceberg
itself!&lt;/p&gt;

&lt;h3 id=&quot;what-is-apache-iceberg&quot;&gt;What is Apache Iceberg?&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://iceberg.apache.org/&quot;&gt;Iceberg&lt;/a&gt; is a next-generation table format that
defines a standard around the metadata used to map data to a SQL query engine.
It addresses a lot of the maintainability and reliability issues many engineers
experienced with the way
&lt;a href=&quot;/blog/2020/10/20/intro-to-hive-connector.html&quot;&gt;Hive modeled SQL tables&lt;/a&gt;
over big data files.&lt;/p&gt;

&lt;p&gt;One common confusion to point out is that a table format is not equivalent to a
file format like ORC or Parquet. The table format is the layer that maintains
the metadata mapping these files to the concept of a table and other common
database abstractions.&lt;/p&gt;
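
&lt;p&gt;A simplified sketch of an Iceberg table in storage makes the distinction
concrete: the files under &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data/&lt;/code&gt; use a file format, while everything
under &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;metadata/&lt;/code&gt; belongs to the table format:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;example_table/
├── data/                    # ORC or Parquet data files
└── metadata/
    ├── v2.metadata.json     # table metadata: schema, partition spec, snapshots
    ├── snap-....avro        # manifest list for a snapshot
    └── ...-m0.avro          # manifests pointing at data files, with statistics
&lt;/code&gt;&lt;/pre&gt;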

&lt;p&gt;This episode assumes you have some basic knowledge of Trino and Iceberg already. If
you are new to Iceberg or need a refresher, we recommend the two older episodes
about Iceberg and Trino basics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/episodes/14.html&quot;&gt;14: Iceberg: March of the Trinos&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/episodes/15.html&quot;&gt;15: Iceberg right ahead!&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;why-iceberg-over-other-formats&quot;&gt;Why Iceberg over other formats?&lt;/h3&gt;

&lt;p&gt;There have been some great advancements in big data technologies that brought
back SQL and data warehouse capabilities. However, Hive and Hive-like table
formats are still missing some capabilities due to limitations that Hive tables
have, such as dropping and reintroducing stale data unintentionally. On top of
that, Hive tables require a lot of knowledge of Hive internals. Some recent
formats aim to remain backwards compatible with Hive, but inadvertently
reintroduce these limitations.&lt;/p&gt;

&lt;p&gt;This is not the case with Iceberg. Iceberg has the broadest query engine
support and puts a heavy emphasis on being an interoperable format. This
improves the flexibility users have to address a wider array of use cases,
which may involve querying across a system like Snowflake and a data lakehouse
running on Iceberg. All of this is made possible by the
&lt;a href=&quot;https://iceberg.apache.org/spec&quot;&gt;Iceberg specification&lt;/a&gt; that all these query
engines must follow.&lt;/p&gt;

&lt;p&gt;Finally, a great video presented by Ryan Blue that dives into Iceberg is,
“&lt;a href=&quot;https://www.youtube.com/watch?v=_GW3GYZK66U&quot;&gt;Why you shouldn’t care about Iceberg&lt;/a&gt;.”&lt;/p&gt;

&lt;h3 id=&quot;metadata-catalogs&quot;&gt;Metadata catalogs&lt;/h3&gt;

&lt;p&gt;Catalogs, in the context of Iceberg, refer to the central storage of metadata.
Catalogs are also used to provide the atomic compare-and-swap needed to support
&lt;a href=&quot;https://iceberg.apache.org/docs/latest/reliability&quot;&gt;serializable isolation in Iceberg&lt;/a&gt;.
We’ll refer to them as metadata catalogs to avoid confusion with Trino
&lt;a href=&quot;https://trino.io/docs/current/sql/show-catalogs.html&quot;&gt;catalogs&lt;/a&gt;.&lt;/p&gt;
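&lt;p&gt;The compare-and-swap can be sketched in a few lines. This is a toy model,
not the Iceberg API: a commit succeeds only if the table pointer still
references the metadata file the writer started from, which is what lets the
catalog serialize concurrent commits.&lt;/p&gt;

```python
# Toy model of the atomic compare-and-swap an Iceberg catalog performs when
# committing a new table metadata file. Names are illustrative, not Iceberg APIs.
class ToyCatalog:
    def __init__(self, current_metadata):
        self.current_metadata = current_metadata

    def commit(self, expected_metadata, new_metadata):
        """Swap the table pointer only if no other writer committed first."""
        if self.current_metadata != expected_metadata:
            return False  # lost the race; caller must re-read and retry
        self.current_metadata = new_metadata
        return True

catalog = ToyCatalog("v1.metadata.json")
ok = catalog.commit("v1.metadata.json", "v2.metadata.json")     # wins the swap
stale = catalog.commit("v1.metadata.json", "v3.metadata.json")  # stale base fails
```

&lt;p&gt;A writer whose commit fails re-reads the current metadata, reapplies its
changes, and retries, which is how concurrent writers stay serializable.&lt;/p&gt;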

&lt;p&gt;The two existing catalogs supported in Trino’s Iceberg connector are the
&lt;a href=&quot;https://trino.io/docs/current/connector/iceberg.html#hive-metastore-catalog&quot;&gt;Hive Metastore Service&lt;/a&gt;
and the AWS metastore counterpart of the Hive Metastore, Glue. While this
provides a nice migration from the Hive model, many are looking to replace these
rather cumbersome catalogs with something lightweight. It turns out that
the Iceberg connector only uses the Hive Metastore Service to point to the
top-level metadata files of an Iceberg table, while the majority of the
metadata lives in metadata files in storage. This makes it even more compelling
to replace the complex Hive service with simpler alternatives. Two popular
catalogs outside of these
are the &lt;a href=&quot;https://iceberg.apache.org/docs/latest/jdbc&quot;&gt;JDBC catalog&lt;/a&gt; and the
&lt;a href=&quot;https://github.com/apache/iceberg/pull/4348&quot;&gt;REST catalog&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;There are two PRs in progress to support these metadata catalogs in Trino:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/11772&quot;&gt;Trino PR 11772: Support JDBC catalog in Iceberg connector&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/13294&quot;&gt;Trino PR 13294: Add Iceberg RESTSessionCatalog Implementation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
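&lt;p&gt;For reference, the metadata catalog is chosen with a single property in the
Trino catalog configuration. The following is a minimal sketch using the
documented Hive Metastore catalog type; the host name is illustrative, and the
property values for the in-progress JDBC and REST catalogs are defined in the
PRs above and may still change.&lt;/p&gt;

```properties
# etc/catalog/iceberg.properties -- example values only
connector.name=iceberg
iceberg.catalog.type=HIVE_METASTORE
hive.metastore.uri=thrift://example.net:9083
```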

&lt;h3 id=&quot;branching-tagging-and-auditing-oh-my&quot;&gt;Branching, tagging, and auditing, oh my!&lt;/h3&gt;

&lt;p&gt;Another feature set that is coming in Iceberg is the ability to use
&lt;a href=&quot;https://github.com/apache/iceberg/pull/5364&quot;&gt;refs to alias your snapshots&lt;/a&gt;.
This would enable branching and tagging behavior similar to git, treating each
snapshot as a commit. This is yet another way to simplify moving between
known states of the data in Iceberg.&lt;/p&gt;

&lt;p&gt;On a related note, branching and tagging will eventually be used in the
&lt;a href=&quot;https://tabular.io/blog/integrated-audits&quot;&gt;audit integration in Iceberg&lt;/a&gt;.
Auditing allows you to push a soft commit by making a snapshot available, but
it is not initially published to the primary table. This is achieved using Spark
and setting the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;spark.wap.id&lt;/code&gt; configuration property. This enables interesting
patterns like the
&lt;a href=&quot;https://www.dremio.com/subsurface/write-audit-publish-pattern-via-apache-iceberg/&quot;&gt;Write-Audit-Publish (WAP) pattern&lt;/a&gt;,
where you first write the data, audit it using a data quality tool like
&lt;a href=&quot;https://greatexpectations.io&quot;&gt;Great Expectations&lt;/a&gt;, and lastly publish the data
to be visible from the main table. Currently, auditing has to use the
cherry-pick operation to publish. This becomes more streamlined with branching
and tagging.&lt;/p&gt;

&lt;h3 id=&quot;the-puffin-file-format&quot;&gt;The Puffin file format&lt;/h3&gt;

&lt;p&gt;The &lt;a href=&quot;https://iceberg.apache.org/puffin-spec&quot;&gt;Puffin file format&lt;/a&gt; is a
companion to file formats like &lt;a href=&quot;https://parquet.apache.org/&quot;&gt;Parquet&lt;/a&gt; and
&lt;a href=&quot;https://orc.apache.org/&quot;&gt;ORC&lt;/a&gt;, rather than a replacement for them. This format stores information such as indexes
and statistics about data managed in an Iceberg table that cannot be stored
directly within the Iceberg manifest. A Puffin file contains arbitrary pieces of
information called “blobs”, along with metadata necessary to interpret them.&lt;/p&gt;

&lt;p&gt;This format &lt;a href=&quot;https://www.mail-archive.com/dev@iceberg.apache.org/msg03593.html&quot;&gt;was proposed&lt;/a&gt;
by long-time Trino maintainer, &lt;a href=&quot;https://github.com/findepi&quot;&gt;Piotr Findeisen @findepi&lt;/a&gt;,
to address a performance issue noted when using Trino on Iceberg. The Puffin
format is a great extension for those using Iceberg tables, as it enables better
query plans in Trino at the file level.&lt;/p&gt;

&lt;h3 id=&quot;pyiceberg&quot;&gt;pyIceberg&lt;/h3&gt;

&lt;p&gt;The &lt;a href=&quot;https://github.com/apache/iceberg/tree/master/python&quot;&gt;pyIceberg library&lt;/a&gt;
is an exciting development that lets users easily read data from Iceberg
tables directly into their own Python code.&lt;/p&gt;

&lt;h3 id=&quot;trino-iceberg-connector-updates&quot;&gt;Trino Iceberg connector updates&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/sql/merge&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt;&lt;/a&gt; (&lt;a href=&quot;https://github.com/trinodb/trino/pull/7933&quot;&gt;PR&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/sql/update&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt;&lt;/a&gt; (&lt;a href=&quot;https://github.com/trinodb/trino/pull/12026&quot;&gt;PR&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/sql/delete&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt;&lt;/a&gt; (&lt;a href=&quot;https://github.com/trinodb/trino/pull/11886&quot;&gt;PR&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Time travel (&lt;a href=&quot;https://github.com/trinodb/trino/pull/10258&quot;&gt;PR&lt;/a&gt;) was initially
released in
&lt;a href=&quot;https://trino.io/docs/current/release/release-385.html#iceberg-connector&quot;&gt;version 385&lt;/a&gt;,
the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@&lt;/code&gt; syntax for snapshots/time travel
&lt;a href=&quot;https://github.com/trinodb/trino/pull/10768&quot;&gt;was deprecated&lt;/a&gt; in
&lt;a href=&quot;https://trino.io/docs/current/release/release-387.html#iceberg-connector&quot;&gt;version 387&lt;/a&gt;,
and there were two bug fixes for this feature in versions
&lt;a href=&quot;https://trino.io/docs/current/release/release-386.html#iceberg-connector&quot;&gt;386&lt;/a&gt; and
&lt;a href=&quot;https://trino.io/docs/current/release/release-388.html#iceberg-connector&quot;&gt;388&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/connector/iceberg.html#alter-table-set-properties&quot;&gt;Partition migration&lt;/a&gt;
(&lt;a href=&quot;https://github.com/trinodb/trino/pull/12259&quot;&gt;PR&lt;/a&gt;).
While Trino was already able to read tables with these migrations applied by other query
engines, this feature allows Trino to write these changes as well.&lt;/li&gt;
  &lt;li&gt;The following three features are table maintenance commands.
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/connector/iceberg.html#optimize&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;optimize&lt;/code&gt;&lt;/a&gt;
(&lt;a href=&quot;https://github.com/trinodb/trino/pull/10497&quot;&gt;PR&lt;/a&gt;), which is equivalent to
the Spark SQL
&lt;a href=&quot;https://iceberg.apache.org/docs/latest/spark-procedures/#rewrite_data_files&quot;&gt;rewrite_data_files&lt;/a&gt;.&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/connector/iceberg.html#expire-snapshots&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;expire_snapshots&lt;/code&gt;&lt;/a&gt;
(&lt;a href=&quot;https://github.com/trinodb/trino/pull/10810&quot;&gt;PR&lt;/a&gt;), which uses the same name
as the Spark procedure.&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/connector/iceberg.html#remove-orphan-files&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;remove_orphan_files&lt;/code&gt;&lt;/a&gt;
(&lt;a href=&quot;https://github.com/trinodb/trino/pull/10810&quot;&gt;PR&lt;/a&gt;), which uses the same name
as the Spark procedure.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Iceberg v2 support (&lt;a href=&quot;https://github.com/trinodb/trino/pull/11880&quot;&gt;PR1&lt;/a&gt;, &lt;a href=&quot;https://github.com/trinodb/trino/pull/12351&quot;&gt;PR2&lt;/a&gt;, &lt;a href=&quot;https://github.com/trinodb/trino/pull/12749&quot;&gt;PR3&lt;/a&gt;, &lt;a href=&quot;https://github.com/trinodb/trino/pull/11642&quot;&gt;PR4&lt;/a&gt;, &lt;a href=&quot;https://github.com/trinodb/trino/pull/9881&quot;&gt;PR5&lt;/a&gt;, and many more…)&lt;/li&gt;
&lt;/ul&gt;
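&lt;p&gt;The maintenance commands above are all invoked with the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER TABLE ... EXECUTE&lt;/code&gt;
syntax. A short sketch against a hypothetical table, following the connector
documentation linked above; the retention thresholds are illustrative values:&lt;/p&gt;

```sql
-- Compact small files (Trino's equivalent of Spark's rewrite_data_files)
ALTER TABLE iceberg.logging.logs EXECUTE optimize;

-- Drop snapshots older than the retention threshold
ALTER TABLE iceberg.logging.logs EXECUTE expire_snapshots(retention_threshold => '7d');

-- Delete files no longer referenced by any snapshot
ALTER TABLE iceberg.logging.logs EXECUTE remove_orphan_files(retention_threshold => '7d');
```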

&lt;p&gt;Almost every release has some sort of Iceberg improvement around
&lt;a href=&quot;https://github.com/trinodb/trino/pull/13636&quot;&gt;planning&lt;/a&gt; or
&lt;a href=&quot;https://github.com/trinodb/trino/pull/13395&quot;&gt;pushdown&lt;/a&gt;. If you want all the
latest features and performance improvements described here, it’s important to
keep up with the latest Trino version.&lt;/p&gt;

&lt;h2 id=&quot;pr-13111-scale-table-writers-per-task-based-on-throughput&quot;&gt;PR 13111: Scale table writers per task based on throughput&lt;/h2&gt;

&lt;p&gt;This &lt;a href=&quot;https://github.com/trinodb/trino/pull/13111&quot;&gt;PR of the episode&lt;/a&gt; was
contributed by Gaurav Sehgal (&lt;a href=&quot;https://github.com/gaurav8297&quot;&gt;@gaurav8297&lt;/a&gt;) to
enable Trino to automatically scale writers. The PR scales the number of
writers per task on each worker based on write throughput.&lt;/p&gt;

&lt;p&gt;You can enable this feature by setting &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;scale_task_writers&lt;/code&gt; to true in your
configuration. Initial test results show up to a sixfold speed increase.&lt;/p&gt;

&lt;p&gt;Thank you so much to Gaurav and all the reviewers that got this PR through!&lt;/p&gt;

&lt;h2 id=&quot;demo-dml-operations-on-iceberg-using-trino&quot;&gt;Demo: DML operations on Iceberg using Trino&lt;/h2&gt;

&lt;p&gt;For this episode’s demo, we use the same schema as the demo we ran in
&lt;a href=&quot;https://trino.io/episodes/15.html&quot;&gt;episode 15&lt;/a&gt;, and revise the syntax to
include new features.&lt;/p&gt;

&lt;p&gt;Let’s start up a local Trino coordinator and Hive metastore. Clone the
repository and navigate to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iceberg/trino-iceberg-minio&lt;/code&gt; directory. Then
start up the containers using Docker Compose.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone git@github.com:bitsondatadev/trino-getting-started.git
cd trino-getting-started/iceberg/trino-iceberg-minio
docker-compose up -d
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now open up your favorite Trino client and connect it to
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;localhost:8080&lt;/code&gt; to run the following commands:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/**
 * Make sure to first create a bucket named &quot;logging&quot; in MinIO before running
 */
CREATE SCHEMA iceberg.logging
WITH (location = &apos;s3a://logging/&apos;);

/**
 * Create table
 */
CREATE TABLE iceberg.logging.logs (
   level varchar NOT NULL,
   event_time timestamp(6) with time zone NOT NULL,
   message varchar NOT NULL,
   call_stack array(varchar)
)
WITH (
   format_version = 2, -- New property to specify Iceberg spec format. Default 2
   format = &apos;ORC&apos;,
   partitioning = ARRAY[&apos;day(event_time)&apos;,&apos;level&apos;]
);

/**
 * Inserting two records. Notice event_time is on the same day but different hours.
 */

INSERT INTO iceberg.logging.logs VALUES
(
  &apos;ERROR&apos;,
  timestamp &apos;2021-04-01 12:23:53.383345&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;,
  &apos;1 message&apos;,
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;]
),
(
  &apos;ERROR&apos;,
  timestamp &apos;2021-04-01 13:36:23&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;,
  &apos;2 message&apos;,
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;]
);

SELECT * FROM iceberg.logging.logs;
SELECT * FROM iceberg.logging.&quot;logs$partitions&quot;;

/**
 * Notice one partition was created for both records at the day granularity.
 */

/**
 * Update the partitioning from daily to hourly 🎉
 */
ALTER TABLE iceberg.logging.logs
SET PROPERTIES partitioning = ARRAY[&apos;hour(event_time)&apos;];

/**
 * Inserting three records. Notice event_time is on the same day but different hours.
 */
INSERT INTO iceberg.logging.logs VALUES
(
  &apos;ERROR&apos;,
  timestamp &apos;2021-04-01 15:55:23&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;,
  &apos;3 message&apos;,
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;]
),
(
  &apos;WARN&apos;,
  timestamp &apos;2021-04-01 15:55:23&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;,
  &apos;4 message&apos;,
  ARRAY [&apos;bad things could be happening&apos;]
),
(
  &apos;WARN&apos;,
  timestamp &apos;2021-04-01 16:55:23&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;,
  &apos;5 message&apos;,
  ARRAY [&apos;bad things could be happening&apos;]
);

SELECT * FROM iceberg.logging.logs;
SELECT * FROM iceberg.logging.&quot;logs$partitions&quot;;

/**
 * Now there are three partitions:
 * 1) One partition at the day granularity containing our original records.
 * 2) One at the hour granularity for hour 15 containing two new records.
 * 3) One at the hour granularity for hour 16 containing the last new record.
 */

SELECT * FROM iceberg.logging.logs
WHERE event_time &amp;lt; timestamp &apos;2021-04-01 16:55:23&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;;

/**
 * This query correctly returns 4 records with only the first two partitions
 * being touched. Now let&apos;s check the snapshots.
 */


SELECT * FROM iceberg.logging.logs;

SELECT * FROM iceberg.logging.&quot;logs$snapshots&quot;;

/**
 * Update
 */
UPDATE
  iceberg.logging.logs
SET
  call_stack = call_stack || &apos;WHALE HELLO THERE!&apos;
WHERE
  lower(level) = &apos;warn&apos;;

SELECT * FROM iceberg.logging.logs;

SELECT * FROM iceberg.logging.&quot;logs$snapshots&quot;;

/**
 * Read data from an old snapshot (Time travel)
 *
 * Old way: SELECT * FROM iceberg.logging.&quot;logs@2806470637437034115&quot;;
 */

SELECT * FROM iceberg.logging.logs FOR VERSION AS OF 2806470637437034115;

/**
 * Merge
 */
CREATE TABLE iceberg.logging.src (
   level varchar NOT NULL,
   message varchar NOT NULL,
   call_stack array(varchar)
)
WITH (
   format = &apos;ORC&apos;
);

INSERT INTO iceberg.logging.src VALUES
 (
   &apos;ERROR&apos;,
   &apos;3 message&apos;,
   ARRAY [&apos;This one will not show up because it is an ERROR&apos;]
 ),
 (
   &apos;WARN&apos;,
   &apos;4 message&apos;,
   ARRAY [&apos;This should show up&apos;]
 ),
 (
   &apos;WARN&apos;,
   &apos;5 message&apos;,
   ARRAY [&apos;This should show up as well&apos;]
 );

MERGE INTO iceberg.logging.logs AS t
USING iceberg.logging.src AS s
ON s.message = t.message
WHEN MATCHED AND s.level = &apos;ERROR&apos;
        THEN DELETE
WHEN MATCHED
    THEN UPDATE
        SET message = s.message || &apos;-updated&apos;,
            call_stack = s.call_stack || t.call_stack;

DROP TABLE iceberg.logging.logs;

DROP SCHEMA iceberg.logging;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
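&lt;p&gt;To see why the first two rows landed in one daily partition while the later
inserts split by hour, note that the Iceberg specification defines the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;day&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hour&lt;/code&gt; partition transforms as the
number of days or hours elapsed since the Unix epoch. A small sketch of that
arithmetic, with timestamps simplified to UTC (the demo literals use
America/Los_Angeles):&lt;/p&gt;

```python
# Sketch of Iceberg's day() and hour() partition transforms, which the spec
# defines as days/hours elapsed since the Unix epoch (UTC assumed here).
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def day_transform(ts):
    """Partition value under partitioning = ARRAY['day(event_time)']."""
    return int((ts - EPOCH).total_seconds()) // 86400

def hour_transform(ts):
    """Partition value under partitioning = ARRAY['hour(event_time)']."""
    return int((ts - EPOCH).total_seconds()) // 3600

first = datetime(2021, 4, 1, 12, 23, 53, tzinfo=timezone.utc)
second = datetime(2021, 4, 1, 13, 36, 23, tzinfo=timezone.utc)

# Same day value, so both rows share one daily partition...
same_day = day_transform(first) == day_transform(second)
# ...but different hour values, so hourly partitioning splits them.
same_hour = hour_transform(first) == hour_transform(second)
```

&lt;p&gt;This matches the partitions shown in the demo comments: the original daily
partition stays intact, while rows written after the ALTER TABLE land in hourly
partitions.&lt;/p&gt;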

&lt;p&gt;This is just the tip of the iceberg, showing the powerful &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; statement
and the other features we have added for Iceberg!&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://iceberg.apache.org/&quot;&gt;Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://tabular.io&quot;&gt;Tabular&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://iceberg.apache.org/community&quot;&gt;Iceberg Community&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://iceberg.apache.org/talks&quot;&gt;Iceberg Talks&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://iceberg.apache.org/blogs&quot;&gt;Iceberg Blogs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Blog posts&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/05/03/a-gentle-introduction-to-iceberg.html&quot;&gt;Trino on ice I: A gentle introduction to Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/12/in-place-table-evolution-and-cloud-compatibility-with-iceberg.html&quot;&gt;Trino on ice II: In-place table evolution and cloud compatibility with Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/30/iceberg-concurrency-snapshots-spec.html&quot;&gt;Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/08/12/deep-dive-into-iceberg-internals.html&quot;&gt;Trino on ice IV: Deep dive into Iceberg internals&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Check out the in-person and virtual
&lt;a href=&quot;https://www.meetup.com/pro/trino-community/&quot;&gt;Trino Meetup groups&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Looks like Commander Bun Bun is safe on this Iceberg https://joshdata.me/iceberger.html</summary>

      
      
    </entry>
  
    <entry>
      <title>Make your Trino data pipelines production ready with Great Expectations</title>
      <link href="https://trino.io/blog/2022/08/24/data-pipelines-production-ready-great-expectations.html" rel="alternate" type="text/html" title="Make your Trino data pipelines production ready with Great Expectations" />
      <published>2022-08-24T00:00:00+00:00</published>
      <updated>2022-08-24T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/08/24/data-pipelines-production-ready-great-expectations</id>
      <content type="html" xml:base="https://trino.io/blog/2022/08/24/data-pipelines-production-ready-great-expectations.html">&lt;p&gt;An important aspect of a good data pipeline is ensuring data quality. 
You need to verify that the data is what you’re expecting it to be at any given
state. &lt;a href=&quot;https://greatexpectations.io/&quot;&gt;Great Expectations&lt;/a&gt; is an open source
tool created in Python that allows you to write detailed tests called
&lt;a href=&quot;https://docs.greatexpectations.io/docs/terms/expectation/&quot;&gt;expectations&lt;/a&gt;
against your data. Users write these expectations to run validations against the
data as it enters your system. These expectations are expressed as methods in
Python, and stored in JSON and YAML files. One great advantage of expectations 
is the human readable documentation that results from these tests. As you roll
out different versions of the code, you get alerted to any unexpected changes
and have version-specific generated documentation for what changed. Let’s learn
how to write expectations on tables in Trino!&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;the-need-for-data-quality&quot;&gt;The need for data quality&lt;/h2&gt;

&lt;p&gt;Managing data pipelines is not for the faint of heart. Nodes fail, you run
out of memory, bursty traffic causes abnormal behavior, and that’s just the tip
of the iceberg. Lots of Trino community members build sophisticated
data pipelines and data applications using Trino. Building data pipelines in
Trino became more common with the addition of a
&lt;a href=&quot;/blog/2022/05/05/tardigrade-launch.html&quot;&gt;fault-tolerant execution mode&lt;/a&gt; to
safeguard against failures when executing long-running and 
resource-intensive queries.&lt;/p&gt;

&lt;p&gt;Aside from all the infrastructure problems that concern data teams, another
category of problems that has remained silent for quite some time is
data quality. Faulty data comes in, which can either cause data pipelines to
fail, or go unnoticed and cause inaccurate downstream reporting.
Knowledge is scattered among domain experts, technical experts, and the code and
data itself. Maintenance becomes time-consuming and expensive. Documentation
gets out of date and unreliable. This is why data quality checks using
libraries like Great Expectations are so important when writing ETL applications.&lt;/p&gt;
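&lt;p&gt;Conceptually, an expectation is a named, reusable check that returns a
structured validation result. The following is a toy, standard-library sketch of
that idea only, and deliberately not the Great Expectations API, which provides
methods with similar names on its own validator objects:&lt;/p&gt;

```python
# Toy sketch of the "expectation" idea -- NOT the Great Expectations API.
# Real suites call methods on validator objects and store results as JSON;
# this stdlib version only mirrors the shape of the concept.

def expect_column_values_to_be_between(rows, column, min_value, max_value):
    """Check an integer column and return a small, result-like dict."""
    unexpected = [r[column] for r in rows
                  if r[column] not in range(min_value, max_value + 1)]
    return {"success": not unexpected, "unexpected_values": unexpected}

# Example: hit points in a pokedex row should fall between 1 and 255.
rows = [{"name": "bulbasaur", "hp": 45}, {"name": "glitchmon", "hp": 999}]
result = expect_column_values_to_be_between(rows, "hp", 1, 255)
```

&lt;p&gt;A failing result like this one is exactly what you want surfaced before bad
rows flow into downstream reporting.&lt;/p&gt;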

&lt;h2 id=&quot;improve-data-quality-in-trino-with-great-expectations&quot;&gt;Improve data quality in Trino with Great Expectations&lt;/h2&gt;

&lt;p&gt;As data quality moves to the forefront of the Trino community, the Great
Expectations and Trino communities have partnered to do some events together:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=pcqAOq3O3Ts&amp;amp;list=PLFnr63che7wZij92ynF_egatbsrH7by7T&amp;amp;index=3&quot;&gt;Trino meetup to discuss Great Expectations&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=4SieRmibb0U&quot;&gt;Great Expectations meetup to discuss Trino&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://superconductive.ai/&quot;&gt;Superconductive&lt;/a&gt; joined this year’s mini Trino 
Summit event 
&lt;a href=&quot;https://www.youtube.com/watch?v=kfJ63DNbAuI&amp;amp;list=PLFnr63che7wYDHjUsmp43THLmAlqPDHlM&quot;&gt;Cinco de Trino&lt;/a&gt;
to showcase using 
&lt;a href=&quot;https://www.youtube.com/watch?v=9HE6LawCHP8&amp;amp;list=PLFnr63che7wYDHjUsmp43THLmAlqPDHlM&amp;amp;index=7&quot;&gt;managed solutions for Great Expectations and Trino&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Today, we’re walking through a demo that showcases a scenario with Trino running
as the data lake query engine with multiple phases of data transformation on
some Pokemon data sets. At each phase, we need to validate the schema, row
counts, and various other properties of the data. We use a Trino Hive table
over CSV for the ingest phase, and then move to Iceberg tables for the
structure and consume phases. This is one of the great uses of Trino, in that you
can operate using any of the popular table formats.&lt;/p&gt;

&lt;h2 id=&quot;trino-and-great-expectations-demo&quot;&gt;Trino and Great Expectations demo&lt;/h2&gt;

&lt;p&gt;In this scenario, we’re going to ingest Pokemon pokedex data and Pokemon Go 
spawn location data which lands as raw CSV files in our data lake. We then use
Trino’s Hive catalog to read the data from the landing files, then clean and
optimize that raw data into more performant ORC files in the structure tables.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/data-pipelines-production-ready-great-expectations/trino-ge-lakehouse.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The last step is to join and transform the spawn data and pokedex data into a
single table that is cleaned and ready to be utilized by a data analyst, data
scientist, or other data consumer. Every area of the pipeline where the data is
transformed opens up a liability. The state can go from good to bad when
infrastructure fails or is updated as newer versions of the pipeline roll out.
This is where adding Great Expectations is crucial.&lt;/p&gt;

&lt;p&gt;Now that you have a better understanding of the scenario, feel free to watch the
video, and try running it yourself!&lt;/p&gt;

&lt;iframe src=&quot;https://www.youtube.com/embed/h6UYOilESfQ&quot; width=&quot;800&quot; height=&quot;500&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; 
margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; 
&lt;/iframe&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md waves-effect waves-light&quot; href=&quot;https://github.com/bitsondatadev/trino-datalake/blob/main/tutorials/expecting-greatness-from-trino.md&quot;&gt;Try this Trino demo yourself »&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;While data quality has always been a requirement, the standards for it increase
as the complexity of data lakes increases. It is a necessity that improves the
trust that data consumers have in the data. Dive into the 
&lt;a href=&quot;https://docs.greatexpectations.io/docs/guides/connecting_to_your_data/database/trino/&quot;&gt;Great Expectations documentation&lt;/a&gt;
to learn more about the existing Trino support. If you run into any issues while
running the demo, reach out on &lt;a href=&quot;/slack.html&quot;&gt;Slack&lt;/a&gt; and let us 
know!&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen, Brian Zhan</name>
        </author>
      

      <summary>An important aspect of a good data pipeline is ensuring data quality. You need to verify that the data is what you’re expecting it to be at any given state. Great Expectations is an open source tool created in Python that allows you to write detailed tests called expectations against your data. Users write these expectations to run validations against the data as it enters your system. These expectations are expressed as methods in Python, and stored in JSON and YAML files. One great advantage of expectations is the human readable documentation that results from these tests. As you roll out different versions of the code, you get alerted to any unexpected changes and have version-specific generated documentation for what changed. Let’s learn how to write expectations on tables in Trino!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/data-pipelines-production-ready-great-expectations/trino-ge.png" />
      
    </entry>
  
    <entry>
      <title>39: Raft floats on Trino to federate silos</title>
      <link href="https://trino.io/episodes/39.html" rel="alternate" type="text/html" title="39: Raft floats on Trino to federate silos" />
      <published>2022-08-18T00:00:00+00:00</published>
      <updated>2022-08-18T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/39</id>
      <content type="html" xml:base="https://trino.io/episodes/39.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;p&gt;In this episode, we are talking to two engineers from 
&lt;a href=&quot;https://goraft.tech/&quot;&gt;Raft&lt;/a&gt; and discuss how they use Trino to connect data
silos that exist across different departments in various government sectors:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/edwardwmorgan/&quot;&gt;Edward Morgan&lt;/a&gt;, 
Senior Platform Engineer/DevSecOps Manager at Raft&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/steve-morgan-b9bb6642/&quot;&gt;Steve Morgan&lt;/a&gt;, Chief
Data Engineer at Raft&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;register-for-trino-summit-2022&quot;&gt;Register for Trino Summit 2022!&lt;/h2&gt;

&lt;p&gt;Trino Summit 2022 is just around the corner! This will be a hybrid event on
November 10th that will take place in-person at the Commonwealth Club in San 
Francisco, CA and can also be attended remotely!  If you want to present, the 
&lt;a href=&quot;https://sessionize.com/trino-summit-2022/&quot;&gt;call for speakers&lt;/a&gt; is open until
September 15th.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;You can register for the conference&lt;/a&gt;
at any time. We must limit in-person registrations to 250 
attendees, so register soon if you plan on attending in person!&lt;/p&gt;

&lt;h2 id=&quot;releases-392-to-393&quot;&gt;Releases 392 to 393&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-392.html&quot;&gt;Trino 392&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for dynamic filtering with fault-tolerant query execution.&lt;/li&gt;
  &lt;li&gt;Support for correlated subqueries in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt; queries.&lt;/li&gt;
  &lt;li&gt;Support for Amazon S3 Select pushdown for JSON files.&lt;/li&gt;
  &lt;li&gt;Support for Avro format in Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Faster queries when filtering by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;__time&lt;/code&gt; column in Druid.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-393.html&quot;&gt;Trino 393&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Improved performance of highly selective &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt; queries.&lt;/li&gt;
  &lt;li&gt;Experimental docker image for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ppc64le&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Dynamic filtering support for various connectors.&lt;/li&gt;
  &lt;li&gt;Support for JSON and bytes type in Pinot.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Additional highlights worth a mention according to Manfred:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Lots of other improvements on Delta Lake, Hive, and Iceberg connectors.&lt;/li&gt;
  &lt;li&gt;Merge support in a bunch of connectors.&lt;/li&gt;
&lt;li&gt;OAuth 2.0 refresh token fixes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the release notes for
&lt;a href=&quot;https://trino.io/docs/current/release/release-392.html&quot;&gt;Trino 392&lt;/a&gt;,
and
&lt;a href=&quot;https://trino.io/docs/current/release/release-393.html&quot;&gt;Trino 393&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-episode-trino-at-raft&quot;&gt;Concept of the episode: Trino at Raft&lt;/h2&gt;

&lt;p&gt;Raft provides consulting services and is particularly skilled at DevSecOps. One
particular challenge they face is dealing with fragmented government
infrastructure. In this episode, we dive in to learn how Trino enables Raft to
supply government sector clients with a data fabric solution. Raft takes a
special stance on using and contributing to open source solutions that run well
on the cloud.&lt;/p&gt;

&lt;h3 id=&quot;intro-to-software-factories&quot;&gt;Intro to software factories&lt;/h3&gt;

&lt;blockquote&gt;
  &lt;p&gt;A “software factory” is an organized approach to software development that
provides software design and development teams a repeatable, well-defined path
to create and update software. It results in a robust, compliant, and more
resilient process for delivering applications to production.
– &lt;a href=&quot;https://tanzu.vmware.com/software-factory&quot;&gt;VMWare&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is a push against previous attempts by larger government contractors
who tried to build one-size-fits-all solutions that ultimately failed. The new
wave of government solutions relies on methodologies similar to those of the
software industry, with additional rules and standards around the technologies
they can adopt in the stack.&lt;/p&gt;

&lt;p&gt;Software factories are now common practice for government agencies, as they
can take standardized software stacks that go through rigorous validation to
make sure they meet government standards. One important requirement for these
stacks is that they can be deployed in virtually any environment. A common way
to achieve this is using Kubernetes and containers.&lt;/p&gt;

&lt;h3 id=&quot;standards-and-anatomy-of-a-stack&quot;&gt;Standards and anatomy of a stack&lt;/h3&gt;

&lt;p&gt;With the movement towards standardization, government contractors generally
build their stack using Kubernetes templates. Kubernetes underpins each of these
stacks, while telemetry, monitoring, and policy agents are layered on top.
Raft wanted to provide a “single pane of glass” over the existing
fragmented systems that the Department of Defense (DoD) operates on, so they
began to develop a stack that included Trino as their method to connect data
across various silos.&lt;/p&gt;

&lt;h3 id=&quot;data-fabric-at-raft&quot;&gt;Data Fabric at Raft&lt;/h3&gt;

&lt;p&gt;Data Fabric is an attempt to give government agencies the ability to set up
a data mesh that is backed by Trino. Trino fits well in this narrative, as it
provides SQL-over-everything: data analysts and data scientists only need to
know SQL.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Data Fabric MVP is an end-to-end DataOps capability that can be deployed at the
edge, in the cloud, and in disconnected environments within minutes. It provides
a single control plane for normalizing and combining disparate data lakes, 
platforms, silos, and formats into SQL using Trino for batch data and Apache 
Pinot for user facing streaming analytics.&lt;/p&gt;

  &lt;p&gt;Data Fabric is driven by cloud native policy using Open Policy Agent (OPA) 
integrated with Trino and Kafka to provide row and column level obfuscation. It
provides enterprise data catalog to view data lineage, properties, and data
owners from multiple data platforms. – &lt;a href=&quot;https://datafabric.goraft.tech/&quot;&gt;Raft&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3 id=&quot;security-concerns-around-trino&quot;&gt;Security concerns around Trino&lt;/h3&gt;

&lt;p&gt;A common first question the Raft team gets asked is whether Trino is a high
security concern. The idea that Trino can connect to multiple data sources from
one location raises the fear that individuals may gain access to information at
a higher classification level than they are cleared for. The team has to educate
users on best practices to ensure this problem doesn’t occur: you need a
separate deployment of Data Fabric for each classification level, along with
correctly defined policies in OPA that restrict visibility to information above
a user’s clearance.&lt;/p&gt;

&lt;h3 id=&quot;iron-bank-container-repository&quot;&gt;Iron Bank container repository&lt;/h3&gt;

&lt;p&gt;Iron Bank is a central repository of digitally-signed container images, 
including open-source and commercial off-the-shelf software, hardened to the 
DoD’s exacting specifications. Approved containers in Iron Bank have DoD-wide 
reciprocity across all classifications, accelerating the security approval 
process from months or even years down to weeks.&lt;/p&gt;

&lt;p&gt;To be considered for inclusion into Iron Bank, container images must meet
rigorous DoD software security standards. It is an extensive, continuous,
complicated effort for even the most sophisticated IT teams. Continuously
maintaining and managing hardening pipelines while incorporating evolving DoD
specifications and addressing new vulnerabilities (CVEs) can severely stretch
your resources, even if you have advanced tooling and experience in-house. 
(&lt;a href=&quot;https://oteemo.com/accelerate-to-iron-bank/&quot;&gt;Source&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;The Trino Docker image 
&lt;a href=&quot;https://repo1.dso.mil/dsop?filter=trino&quot;&gt;is available in Iron Bank&lt;/a&gt; and is
maintained by folks at &lt;a href=&quot;https://www.boozallen.com/&quot;&gt;Booz Allen Hamilton&lt;/a&gt;. Their
hard work makes it possible for Trino to be deployed in DoD environments.&lt;/p&gt;

&lt;h2 id=&quot;pull-requests-of-the-episode-pr-13354-add-s3-select-pushdown-for-json-files&quot;&gt;Pull request of the episode: PR 13354: Add S3 Select pushdown for JSON files&lt;/h2&gt;

&lt;p&gt;This &lt;a href=&quot;https://github.com/trinodb/trino/pull/13354&quot;&gt;PR of the episode&lt;/a&gt; was 
contributed by &lt;a href=&quot;https://github.com/preethiratnam&quot;&gt;preethiratnam&lt;/a&gt;. This pull
request enables S3 Select pushdown during a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; operation on JSON files. The 
pushdown logic is restricted to root JSON fields only, similar to CSV. S3 Select
does support nested column filtering on JSON files, but to limit the scope of
this change, that support is planned for a later PR.&lt;/p&gt;

&lt;p&gt;It’s already expensive enough to query JSON files, as you pay a hefty penalty
for deserialization. This at least filters out a lot of rows. Thanks to 
&lt;a href=&quot;https://github.com/arhimondr&quot;&gt;Andrii Rosa &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;arhimondr&lt;/code&gt;&lt;/a&gt; for the review.&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-episode-running-great-expectations-on-a-trino-data-lakehouse-tutorial&quot;&gt;Demo of the episode: Running Great Expectations on a Trino Data Lakehouse Tutorial&lt;/h2&gt;

&lt;p&gt;For this episode’s demo, you’ll need a local Trino coordinator, MinIO instance, 
Hive metastore, and an edge node where various data libraries like Great
Expectations can run. Clone the 
&lt;a href=&quot;https://github.com/bitsondatadev/trino-datalake&quot;&gt;trino-datalake&lt;/a&gt; 
repository and navigate to the root directory in your CLI. Then 
start up the containers using Docker Compose.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone git@github.com:bitsondatadev/trino-datalake.git

cd trino-datalake

docker-compose up -d
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The rest of the demo is available in 
&lt;a href=&quot;https://github.com/bitsondatadev/trino-datalake/blob/main/tutorials/expecting-greatness-from-trino.md&quot;&gt;this markdown tutorial&lt;/a&gt;
and is covered in the video demo below.&lt;/p&gt;

&lt;div class=&quot;youtube-video-container&quot;&gt;
  &lt;iframe width=&quot;702&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/h6UYOilESfQ&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h2 id=&quot;question-of-the-episode-how-can-i-deploy-trino-on-kubernetes-without-using-helm-charts&quot;&gt;Question of the episode: How can I deploy Trino on Kubernetes without using Helm charts?&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trinodb.slack.com/archives/C0305TQ05KL/p1660685654979289&quot;&gt;Full question from Trino Slack&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This user was not able to use Helm due to a restriction at their company. They
needed the raw Kubernetes YAML files to deploy Trino.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Answer:&lt;/em&gt; While Helm offers very nice ways to deploy directly to
a service that understands Helm charts, you can also use Helm on your machine to
generate all the Kubernetes YAML configuration files. This can be done using the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;helm template&lt;/code&gt; command. See more on this in the 
&lt;a href=&quot;https://trino.io/episodes/31.html&quot;&gt;Trinetes episode&lt;/a&gt; that details this command.&lt;/p&gt;
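
&lt;p&gt;As a rough sketch, assuming the Trino community Helm chart from the
&lt;a href=&quot;https://github.com/trinodb/charts&quot;&gt;trinodb/charts&lt;/a&gt; repository, rendering the
raw manifests locally could look like the following. The release name
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;my-trino&lt;/code&gt; and the output directory are placeholders:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;helm repo add trino https://trinodb.github.io/charts

helm template my-trino trino/trino --output-dir ./manifests

kubectl apply --recursive -f ./manifests
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You can also inspect or version-control the generated files before applying
them, which is often exactly what restricted environments require.&lt;/p&gt;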

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://datafabric.goraft.tech/&quot;&gt;Raft Data Fabric&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/raft_tech&quot;&gt;Raft Twitter&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/company/raft-tech/&quot;&gt;Raft LinkedIn&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://boards.greenhouse.io/raft&quot;&gt;Raft Jobs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Blogs&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://goraft.tech/2022/08/15/trino-sql-everything.html&quot;&gt;Trino - SQL to rule them all&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.airforce-technology.com/news/raft-wins-usaf-sbir-phase-iii-contract/&quot;&gt;Raft wins USAF SBIR Phase III contract for data centralisation services&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://goraft.tech/blog/&quot;&gt;Raft Blog&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Check out the in-person and virtual
&lt;a href=&quot;https://www.meetup.com/pro/trino-community/&quot;&gt;Trino Meetup groups&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests</summary>

      
      
    </entry>
  
    <entry>
      <title>Happy tenth birthday Trino!</title>
      <link href="https://trino.io/blog/2022/08/08/trino-tenth-birthday.html" rel="alternate" type="text/html" title="Happy tenth birthday Trino!" />
      <published>2022-08-08T00:00:00+00:00</published>
      <updated>2022-08-08T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/08/08/trino-tenth-birthday</id>
      <content type="html" xml:base="https://trino.io/blog/2022/08/08/trino-tenth-birthday.html">&lt;p&gt;It’s inspiring and mindblowing to reflect on the ten year journey that has
produced the community around Trino. Trino is the community-driven fork from
Presto, the distributed big data SQL query engine created at Facebook in 2012. We
are a community of engineers, scientists, analysts, and visionaries that work in
a fast paced world where the expectations on the time to insights from our
analytics and the scale of the data are ever-increasing. Sometimes words only do
so much justice to encompass a journey like this one, so we created a video to
let you experience it yourself! Enjoy!&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;trinos-first-ten-years-video&quot;&gt;Trino’s first ten years video&lt;/h1&gt;

&lt;iframe src=&quot;https://www.youtube.com/embed/hPD95_-bZZw&quot; width=&quot;800&quot; height=&quot;500&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; 
margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; 
&lt;/iframe&gt;

&lt;p&gt;As you watch the video and think back to the five years Presto and Trino shared,
you begin to appreciate the organic development of the community, and the
excitement around the solution space that the project brought to big data. As a
baseline, Trino offers a faster and more interactive alternative to accessing
data stored in HDFS via Hive. But the project didn’t stop there. Development of
the SPI abstracted metadata and storage access to different
systems, making Trino a suitable engine to query an entire data ecosystem from
one location using ANSI SQL! Since the projects split, development on Trino has
skyrocketed beyond the original project, adding an array of features that
we’ve listed in the &lt;a href=&quot;/blog/2022/08/04/decade-innovation.html&quot;&gt;evolution of the Trino architecture blog post&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-tenth-birthday/trajectory.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;To really celebrate this milestone, we wanted to offer some exciting ways for
you to learn more about Trino, and spin up Trino on your own system to play
around with it. We have a list of blogs, project stats, and ways to get involved
below. Starburst is also celebrating by offering free Trino birthday t-shirts
when you 
&lt;a href=&quot;https://www.starburst.io/sweepstakes/?utm_campaign=space-quest&quot;&gt;complete their Space Quest League mission&lt;/a&gt;.
Also don’t forget to attend 
&lt;a href=&quot;/blog/2022/06/30/trino-summit-call-for-speakers.html&quot;&gt;our annual Trino Summit in November&lt;/a&gt;!&lt;/p&gt;

&lt;h1 id=&quot;learn-more-about-trino&quot;&gt;Learn more about Trino&lt;/h1&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/p/a5a1088d3114&quot;&gt;Intro to Trino for the Trinewbie&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2022/08/02/leaving-facebook-meta-best-for-trino.html&quot;&gt;Why leaving Facebook/Meta was the best thing we could do for the Trino Community&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2022/08/04/decade-innovation.html&quot;&gt;A decade of query engine innovation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/12/27/announcing-trino.html&quot;&gt;We’re rebranding PrestoSQL to Trino&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/01/01/2019-summary.html&quot;&gt;Summary of features in 2019&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/01/08/2020-review.html&quot;&gt;Summary of features in 2020&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/12/31/trino-2021-a-year-of-growth.html&quot;&gt;Summary of features in 2021&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;getting-started-with-trino&quot;&gt;Getting started with Trino&lt;/h1&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/10/20/intro-to-hive-connector.html&quot;&gt;A gentle introduction to the Hive connector&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2021/05/03/a-gentle-introduction-to-iceberg.html&quot;&gt;Trino on ice I: A gentle introduction to Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2019/07/04/cbo-introduction.html&quot;&gt;Introduction to the Trino cost-based optimizer&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/bitsondatadev/trino-getting-started&quot;&gt;Trino getting started repository&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;community-statistics&quot;&gt;Community statistics&lt;/h1&gt;

&lt;ul&gt;
  &lt;li&gt;28250+ commits 💻 in GitHub&lt;/li&gt;
  &lt;li&gt;5750+ stargazers ⭐ in GitHub&lt;/li&gt;
  &lt;li&gt;7350+ members 👋 in Slack&lt;/li&gt;
  &lt;li&gt;6950+ pull requests merged ✅ in GitHub&lt;/li&gt;
  &lt;li&gt;4000+ issues 📝 created in GitHub&lt;/li&gt;
  &lt;li&gt;3750+ followers 🐦 on Twitter&lt;/li&gt;
  &lt;li&gt;650+ average weekly members 💬 in Slack&lt;/li&gt;
  &lt;li&gt;1050+ subscribers 📺 in YouTube&lt;/li&gt;
  &lt;li&gt;38 Trino Community Broadcast ▶️ episodes&lt;/li&gt;
  &lt;li&gt;264 Presto + Trino 🚀 releases (not including PrestoDB releases since the 
fork)&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;join-our-community&quot;&gt;Join our community&lt;/h1&gt;

&lt;ul&gt;
  &lt;li&gt;Join the &lt;a href=&quot;/slack.html&quot;&gt;Trino Slack workspace&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Watch the &lt;a href=&quot;/broadcast/&quot;&gt;Trino Community Broadcast&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Subscribe to the &lt;a href=&quot;https://www.youtube.com/c/trinodb&quot;&gt;Trino YouTube channel&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Follow us on the &lt;a href=&quot;https://twitter.com/trinodb&quot;&gt;trinodb Twitter account&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Give us a star on the &lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;Trino GitHub repository&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Follow us on the &lt;a href=&quot;https://www.linkedin.com/company/trino-software-foundation&quot;&gt;Trino LinkedIn account&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;trino-summit-2022&quot;&gt;Trino Summit 2022&lt;/h1&gt;

&lt;p&gt;We hope you all join us in celebrating Trino’s birthday today. If you want to 
learn even more, 
&lt;a href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;sign up for our hybrid event, Trino Summit, on the 10th of November 2022&lt;/a&gt;.
If you have a talk you’d like to give around Trino, the 
&lt;a href=&quot;https://www.starburst.io/info/trinosummit/#sponsors&quot;&gt;call for speakers&lt;/a&gt; is open
until September 15th.&lt;/p&gt;

&lt;p&gt;Join our community. We look forward to having you!&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen, Martin Traverso, Dain Sundstrom, David Phillips, Eric Hwang</name>
        </author>
      

      <summary>It’s inspiring and mindblowing to reflect on the ten year journey that has produced the community around Trino. Trino is the community-driven fork from Presto, the distributed big data SQL query engine created at Facebook in 2012. We are a community of engineers, scientists, analysts, and visionaries that work in a fast paced world where the expectations on the time to insights from our analytics and the scale of the data are ever-increasing. Sometimes words only do so much justice to encompass a journey like this one, so we created a video to let you experience it yourself! Enjoy!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-tenth-birthday/how-it-started-going.png" />
      
    </entry>
  
    <entry>
      <title>A decade of query engine innovation</title>
      <link href="https://trino.io/blog/2022/08/04/decade-innovation.html" rel="alternate" type="text/html" title="A decade of query engine innovation" />
      <published>2022-08-04T00:00:00+00:00</published>
      <updated>2022-08-04T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/08/04/decade-innovation</id>
      <content type="html" xml:base="https://trino.io/blog/2022/08/04/decade-innovation.html">&lt;p&gt;It’s amazing how far we have come! Our massively-parallel processing SQL query
engine, Trino, has really grown up. We have moved beyond just querying object
stores using Hive, beyond just one company using the project, beyond usage in
Silicon Valley, beyond simple SQL &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; statements, and definitely also
beyond our expectations. Let’s have a look at some of the great technical and
architectural changes the project underwent, and how we all benefit from the
&lt;a href=&quot;/blog/2022/08/02/leaving-facebook-meta-best-for-trino.html&quot;&gt;commitment to quality, openness and collaboration&lt;/a&gt;.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;runtime-and-deployment&quot;&gt;Runtime and deployment&lt;/h2&gt;

&lt;p&gt;Starting with how you even install and run Trino, numerous changes came about
in the last decade. We moved from Java 7 to Java 8, then to Java 11, and &lt;a href=&quot;/blog/2022/07/14/trino-updates-to-java-17.html&quot;&gt;only
recently to the latest supported Java LTS release - Java 17&lt;/a&gt;. Each time we
benefited from the innovations in runtime performance as well as the
improved Java language features. With &lt;strong&gt;Java 17&lt;/strong&gt;, we are just starting to take
advantage of many of these improvements.&lt;/p&gt;

&lt;p&gt;When it comes to actually &lt;a href=&quot;https://trino.io/episodes/35.html&quot;&gt;running and deploying
Trino&lt;/a&gt;, the &lt;strong&gt;tarball&lt;/strong&gt; is still a good choice
for simple installation and as a base for other packages. Over time we added
&lt;strong&gt;RPM&lt;/strong&gt; archive support, which is being replaced more and more by Docker
&lt;strong&gt;containers&lt;/strong&gt;. The container images also enable modern deployment on Kubernetes
with &lt;a href=&quot;https://github.com/trinodb/charts&quot;&gt;our Helm chart&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;And let us add one last note about deployments. Trino was always designed to
work on large servers. However, the actual growth over a decade in the real world
has been amazing to see. Machine sizes keep growing to hundreds of CPU cores and
closer to a terabyte of memory, and these truly large machines are now running
as clusters with many workers of that size. More and more of these
deployments take advantage of our added support for the &lt;strong&gt;ARM processor
architecture&lt;/strong&gt; and the increasing availability of suitable servers from the
cloud providers.&lt;/p&gt;

&lt;h2 id=&quot;security&quot;&gt;Security&lt;/h2&gt;

&lt;p&gt;What is security, authentication, authorization? None of this
existed in the first releases of Trino. Two years after launch we added the first
simple authentication and authorization support. The days when Kerberos
was critical and you needed to use the Java KeyStore in most deployments are
long gone. The wide adoption of Trino led to improvements such as support for
&lt;a href=&quot;https://trino.io/docs/current/security/internal-communication.html&quot;&gt;automatic certificate creation and TLS for internal
communication&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/security/secrets.html&quot;&gt;secret injection from environment
variables&lt;/a&gt;, and the many
&lt;a href=&quot;https://trino.io/docs/current/security/authentication-types.html&quot;&gt;authentication
types&lt;/a&gt;
starting with LDAP and password file, to the modern OAuth 2.0 and SSO systems.
Trino supports fine-grained access control and &lt;a href=&quot;https://trino.io/docs/current/language/sql-support.html#security-operations&quot;&gt;security management SQL commands
like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GRANT&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;REVOKE&lt;/code&gt;&lt;/a&gt;.
You can secure connections from client tools, and use numerous methods to ensure
secured access to your data sources.&lt;/p&gt;

&lt;h2 id=&quot;client-tools-and-integrations&quot;&gt;Client tools and integrations&lt;/h2&gt;

&lt;p&gt;In the very beginning, all you could do was submit a query to the &lt;a href=&quot;https://trino.io/docs/current/develop/client-protocol.html&quot;&gt;client REST
API&lt;/a&gt;. Very quickly
we added the &lt;a href=&quot;https://trino.io/docs/current/installation/cli.html&quot;&gt;Trino CLI&lt;/a&gt;
and the &lt;a href=&quot;https://trino.io/docs/current/installation/jdbc.html&quot;&gt;JDBC driver&lt;/a&gt;. And
while it has continued to be widely used in the community, and gathered great
features such as command-completion and history, different output formats, and
much more, the Trino CLI is not the only tool anymore. The JDBC driver, the
&lt;a href=&quot;https://github.com/trinodb/trino-python-client&quot;&gt;Python client&lt;/a&gt;, the &lt;a href=&quot;https://github.com/trinodb/trino-go-client&quot;&gt;Go
client&lt;/a&gt;, and the ODBC driver from
&lt;a href=&quot;https://starburst.io/&quot;&gt;Starburst&lt;/a&gt;, all expanded the support for different
client tools. You can query Trino in your Java-based IDE, such as IntelliJ
IDEA, or database tool, such as &lt;a href=&quot;https://dbeaver.io/&quot;&gt;DBeaver&lt;/a&gt; or
&lt;a href=&quot;https://www.metabase.com/&quot;&gt;Metabase&lt;/a&gt;. You can take advantage of visualizations
in &lt;a href=&quot;https://superset.apache.org/&quot;&gt;Apache Superset&lt;/a&gt;, or automate with &lt;a href=&quot;https://airflow.apache.org/&quot;&gt;Apache
Airflow&lt;/a&gt;, &lt;a href=&quot;https://www.getdbt.com/&quot;&gt;dbt&lt;/a&gt;, or
&lt;a href=&quot;https://flink.apache.org/&quot;&gt;Apache Flink&lt;/a&gt;. And many commercial tools such as
&lt;a href=&quot;https://www.tableau.com/&quot;&gt;Tableau&lt;/a&gt;, &lt;a href=&quot;https://www.looker.com/&quot;&gt;Looker&lt;/a&gt;,
&lt;a href=&quot;https://powerbi.microsoft.com/&quot;&gt;PowerBI&lt;/a&gt;, or
&lt;a href=&quot;https://www.thoughtspot.com/&quot;&gt;ThoughtSpot&lt;/a&gt; also proudly support Trino users.&lt;/p&gt;

&lt;h2 id=&quot;sql&quot;&gt;SQL&lt;/h2&gt;

&lt;p&gt;All the client tools and integrations rely on the rich SQL support of Trino,
which has grown tremendously. Purely analytics-related support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; and
all its complexities was not enough. Trino gained support for data management to
create schemas and tables, but also views and materialized views. And with that
&lt;a href=&quot;https://trino.io/docs/current/language/sql-support.html#write-operations&quot;&gt;write support we needed &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt;, and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt;&lt;/a&gt;.
That’s all done and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; is next. But the core language features were not
able to satisfy the needs of our users. We added functions for a large variety
of topics ranging from simple string and &lt;a href=&quot;https://trino.io/docs/current/functions/datetime.html&quot;&gt;date
functions&lt;/a&gt; to &lt;a href=&quot;https://trino.io/docs/current/functions/json.html&quot;&gt;JSON
support&lt;/a&gt;, &lt;a href=&quot;https://trino.io/docs/current/functions/geospatial.html&quot;&gt;geospatial
functions&lt;/a&gt;, and many
others.&lt;/p&gt;

&lt;p&gt;From the core language perspective we added newer SQL functionality, such as
&lt;a href=&quot;/blog/2021/05/19/row_pattern_matching.html&quot;&gt;window functions and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; support&lt;/a&gt;. Currently we are on a journey to implement
&lt;a href=&quot;/blog/2022/07/22/polymorphic-table-functions.html&quot;&gt;support for table functions, including polymorphic table functions&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;connectors-and-data-sources&quot;&gt;Connectors and data sources&lt;/h2&gt;

&lt;p&gt;When it comes to the new SQL language features, there are two categories. There
are generic functions and statements that build on top of commonly used
functionality like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt;. These typically work with any connector and therefore
any data source. And then there are SQL language features that need support in
a connector. After all, inserting data into PostgreSQL and into an object storage
system are very different operations. Our community has been hard at work however, and
numerous connectors have gone way beyond simple read-only access.&lt;/p&gt;

&lt;p&gt;Looking at the number of available connectors, innovation has been tremendous.
The original Hive connector with support for HDFS and a Hive Metastore Service,
became a powerhouse of features. Support for object storage systems including
Amazon S3 and compatible systems, Azure Data Lake Storage, and Google Cloud
Storage, was supplemented by support for Amazon Glue as metastore. We also
constantly added support for different file formats in these systems, and
improved performance for ORC, Parquet, Avro, and others.&lt;/p&gt;

&lt;p&gt;The initial idea to support other data sources led to connectors for over a
dozen other databases, including relational systems such as
&lt;a href=&quot;https://www.postgresql.org/&quot;&gt;PostgreSQL&lt;/a&gt;,
&lt;a href=&quot;https://www.oracle.com/database/&quot;&gt;Oracle&lt;/a&gt;, &lt;a href=&quot;https://www.microsoft.com/en-us/sql-server&quot;&gt;SQL
Server&lt;/a&gt;, and many others. We also
gained support for &lt;a href=&quot;https://www.elastic.co/elasticsearch/&quot;&gt;Elasticsearch&lt;/a&gt; and
&lt;a href=&quot;https://www.opensearch.org/&quot;&gt;OpenSearch&lt;/a&gt;, &lt;a href=&quot;https://www.mongodb.com/&quot;&gt;MongoDB&lt;/a&gt;,
&lt;a href=&quot;https://kafka.apache.org/&quot;&gt;Apache Kafka&lt;/a&gt;, and other systems that traditionally
are not available to query with SQL. Trino unlocks completely new use cases for
these systems.&lt;/p&gt;

&lt;p&gt;The wide range of supported systems includes traditional data lakes and data
warehouses. With the emerging new table formats and the related Trino
connectors, our project is a powerful tool to run your lakehouse system. &lt;a href=&quot;https://delta.io/&quot;&gt;Delta
Lake&lt;/a&gt; and &lt;a href=&quot;https://iceberg.apache.org/&quot;&gt;Apache Iceberg&lt;/a&gt;
connectors are already capable of full read and write operations and include
numerous other features. An &lt;a href=&quot;https://hudi.apache.org/&quot;&gt;Apache Hudi&lt;/a&gt; connector is
in the works and coming soon.&lt;/p&gt;

&lt;p&gt;We also have robust and widely used connectors for real-time analytics systems
like &lt;a href=&quot;https://pinot.apache.org/&quot;&gt;Apache Pinot&lt;/a&gt;, &lt;a href=&quot;https://druid.apache.org/&quot;&gt;Apache
Druid&lt;/a&gt; and &lt;a href=&quot;https://clickhouse.com/&quot;&gt;ClickHouse&lt;/a&gt;,
that are constantly improved by the community.&lt;/p&gt;

&lt;h2 id=&quot;query-processing-and-performance&quot;&gt;Query processing and performance&lt;/h2&gt;

&lt;p&gt;Last but not least, these queries also need to be processed. From the start, high
efficiency and low latency were core design goals, and with features like
native compilation the resulting performance surpassed other systems. Over the
years our query analyzer and planner were supplemented by more and more
sophisticated algorithms and features. Connectors learned to retrieve and manage
table statistics, the optimizer was created and morphed into a &lt;a href=&quot;/blog/2019/07/04/cbo-introduction.html&quot;&gt;cost-based
optimizer&lt;/a&gt;, and we added further
improvements that benefit query processing performance. We added dynamic
filtering, &lt;a href=&quot;/blog/2020/06/14/dynamic-partition-pruning.html&quot;&gt;dynamic partition pruning&lt;/a&gt;, predicate pushdown, join pushdown,
aggregate function pushdown and numerous others. Each of these improvements was
also finely tuned, and runs in production with huge workloads providing us more
data on how to improve next.&lt;/p&gt;

&lt;p&gt;One large pivot we recently added was &lt;a href=&quot;/blog/2022/05/05/tardigrade-launch.html&quot;&gt;fault-tolerant query
execution mode&lt;/a&gt;. Query execution
can survive cluster node failures when this feature is enabled. Parts of the
execution can be retried and query processing can proceed. Trino is moving on
from being the best analytics engine to being the best query engine for many more use
cases!&lt;/p&gt;

&lt;h2 id=&quot;looking-forward&quot;&gt;Looking forward&lt;/h2&gt;

&lt;p&gt;As you can see, there is a lot to look back on and celebrate. But while we are
definitely proud of our successes working with the community, we have no time to rest.
There are many more improvements we are working on. Just to tease you a bit, let
us mention that there will be more polymorphic table functions, new
lakehouse connectors and features, more client tools, and maybe even dynamic
configuration of the cluster.&lt;/p&gt;

&lt;p&gt;What would you like to add? Join us to celebrate and innovate towards your
favorite features. And who knows, we might see you at the &lt;a href=&quot;/blog/2022/06/30/trino-summit-call-for-speakers.html&quot;&gt;Trino Summit&lt;/a&gt; in November, or in a
future episode of the &lt;a href=&quot;/broadcast/index.html&quot;&gt;Trino Community Broadcast&lt;/a&gt;.&lt;/p&gt;

      
        <author>
          <name>Manfred Moser, Martin Traverso, Dain Sundstrom, David Phillips</name>
        </author>
      

      <summary>It’s amazing how far we have come! Our massively-parallel processing SQL query engine, Trino, has really grown up. We have moved beyond just querying object stores using Hive, beyond just one company using the project, beyond usage in Silicon Valley, beyond simple SQL SELECT statements, and definitely also beyond our expectations. Let’s have a look at some of the great technical and architectural changes the project underwent, and how we all benefit from the commitment to quality, openness and collaboration.</summary>

      
      
    </entry>
  
    <entry>
      <title>Why leaving Facebook/Meta was the best thing we could do for the Trino Community</title>
      <link href="https://trino.io/blog/2022/08/02/leaving-facebook-meta-best-for-trino.html" rel="alternate" type="text/html" title="Why leaving Facebook/Meta was the best thing we could do for the Trino Community" />
      <published>2022-08-02T00:00:00+00:00</published>
      <updated>2022-08-02T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/08/02/leaving-facebook-meta-best-for-trino</id>
      <content type="html" xml:base="https://trino.io/blog/2022/08/02/leaving-facebook-meta-best-for-trino.html">&lt;p&gt;It might surprise some that our departure from Facebook was one of the simplest 
decisions we’ve ever made. Many posts that discuss leaving a FAANG company focus
on leaving some grand sum of money or prestige of working at the company. For 
us, we were leaving the company where we had launched a project that we knew 
would quickly outgrow the walls of Facebook, and solve a much larger set of 
problems in the analytics domain. At the time we didn’t quite anticipate that 
Presto, a distributed SQL query engine for big data analytics, would be adopted 
around the globe by thousands of companies and an overwhelming number of 
industries. We appreciate Facebook for serving as the launchpad that inspired 
others to adopt Presto. Despite the harmonious beginnings, once the needs of the
community and Facebook no longer aligned, we had to leave, but we’ll get to that
part shortly.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/leaving-facebook-meta-best-for-trino/original-gang.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;people-make-up-communities-not-companies&quot;&gt;People make up communities, not companies&lt;/h2&gt;

&lt;p&gt;When we created Presto, it was clear to us that it needed to be open source.
Presto started in 2012, just before the Facebook IPO. The culture was very
conducive to starting an open source project. At that time, Facebook was working
on Open Compute which ended up disrupting the hardware industry, and we wanted
to achieve a similar impact for the analytics industry with Presto. We lobbied for and
gained approval from the VP of Infrastructure, Jay Parikh, and released 
&lt;a href=&quot;https://web.archive.org/web/20220203224702/https://www.computerworld.com/article/2485668/facebook-goes-open-source-with-query-engine-for-big-data.html&quot;&gt;Presto as an open source project&lt;/a&gt;. It’s something that we wanted to
do from the beginning, because we had worked with open source projects and 
believed that the most successful projects are open source.&lt;/p&gt;

&lt;p&gt;Getting other people and companies involved makes for a healthier project. You
end up not just building something that satisfies your needs, but needs from
everyone else, and in turn, you benefit. We reached out personally to
people from companies like Airbnb, Dropbox, Netflix, and LinkedIn to get them
involved because we wanted to bootstrap a real community. Five people at
Facebook hacking away was not enough. We actually had these companies beta test
Presto, so that when we launched, the problems that they had found were fixed.&lt;/p&gt;

&lt;p&gt;It’s important to understand why that’s beneficial to really grasp our
philosophy behind open source. In reality, when we say we’re getting more
companies involved, that’s true, but more importantly, we’re getting people
involved. Individuals in the tech space are interested in solving technology
problems. Companies are interested in solving problems that benefit their board,
investors, and their customers. It’s incredibly common to see an overlap in the
problems that engineers, analysts, and scientists are interested in solving with
the problems that companies need to solve, but it’s never guaranteed.&lt;/p&gt;

&lt;p&gt;Moreover, the interest of a company is very susceptible to change from company
growth, IPOs, acquisitions, directional pivots, and general political and
cultural changes. As people start to put their time and energy into a project,
their own identity starts to blend with the success of the project. This is much
less the case with corporations. Since corporations include many people, it
only takes a small set of people in the right positions to decide that a project
is no longer aligned with the direction or goals of the company.&lt;/p&gt;

&lt;p&gt;Those of us in the Trino Software Foundation believe that 
&lt;a href=&quot;https://venturebeat.com/2021/08/27/who-owns-open-source-projects-people-or-companies/&quot;&gt;individuals that work on Trino actually make up the community&lt;/a&gt; and not the companies who so graciously allow their employees to
contribute. We view our community as visionaries that want to solve problems and
build systems that last for decades into the future. We don’t allow near-sighted
decisions that may affect the quality of the system, or that may diminish the
value of the application to the greater problem space. Most people do not want
to work on something for years, and then have the company change direction and
throw away all their work.&lt;/p&gt;

&lt;p&gt;To be clear, we’re not saying it’s a bad thing when a company moves in another
direction. That is the nature of business and having corporate involvement can
also be a healthy component of open source. To us, however, the core of what
makes a project long-lasting and beneficial for everyone using the product are
the people who are there building the system and interested in the problem
space. So what happened at Facebook that caused us to leave?&lt;/p&gt;

&lt;h2 id=&quot;why-we-left-facebook&quot;&gt;Why we left Facebook&lt;/h2&gt;

&lt;p&gt;As Presto became central to the infrastructure of prominent projects in Facebook,
it attracted the attention of engineers and managers at Facebook who wanted to 
work on this project. This is a strong sign of success, but some of these folks
did not have the same commitment to the open-source community. This was the
source of much of the conflict, as engaging in open source takes a lot of time
and effort, and we had a strict policy of “no one is special”: everyone’s code
was reviewed, and even Facebook engineers had to earn commit rights. Engineers
at Facebook are strongly motivated to create “memorable” work to advance in the
company, and to them this extra process was just slowing things down. Feedback
from these engineers ultimately culminated in the managers deciding to give
automatic commit rights to any Facebook engineer working on Presto, so that
these engineers could move faster.&lt;/p&gt;

&lt;p&gt;You may think Facebook engineers or managers are the big bad wolf in this
scenario, but they really are not. Engineers at these highly competitive
companies must create memorable work, or they will not get the promotions they
deserve. And if you are a junior engineer and do not get promoted, you get
fired. Corporate leaders also have the right to change how they allocate
resources to work on open-source projects. There’s nothing inherently wrong with
any of this. The problem was changing the commitment we made to keep the
open-source community neutral. It was at that point we knew that we had to
create a fork of the project if we wanted to keep the community’s interest at
the forefront for the project to remain healthy.&lt;/p&gt;

&lt;p&gt;It was also at this point we made our single biggest mistake. We didn’t change
the name away from Presto. It was admittedly hard to walk away from a name we
all knew and loved. We believed that we had set up the project, so that the name
“Presto” was owned by the community and not Facebook. The truth is that once the
community walked out of the project, Facebook was the only one left in Presto
and they became the sole owner. But, the biggest reason this was absolutely the
wrong choice is much simpler; it made the people that stayed at Facebook really
angry. We expected Facebook to do what they really wanted: stop doing the extra
open-source work, fork internally, and leave the community alone. Instead, they
somehow found the motivation to do a lot of work to set up a competing project.
Finally, we spent two additional years continuing to build the Presto name
rather than building the new name and brand. In hindsight, all of this was just
dumb, and we were suffering from our own sunk cost fallacy. So we continued
under the Presto name with the distinguishing suffix of PrestoSQL versus the
original project’s PrestoDB.&lt;/p&gt;

&lt;h2 id=&quot;building-the-trino-community&quot;&gt;Building the Trino community&lt;/h2&gt;

&lt;p&gt;The new PrestoSQL project gave a new home to the existing Presto community. It
provided a project that focused on the open source community and not just the
needs of Facebook. It also gave us time to troubleshoot problems of people who
used Presto. This is what we were doing internally at Facebook but instead we
applied our knowledge of the system towards the community. This was one of the
reasons why leaving Facebook was so beneficial. As we worked closer with
everyone else, we started learning what areas of the project we should focus on
and it turns out that many of the things we were working on at Facebook were
simply not problems that all the other people in the community were facing. This
wasn’t the only benefit to us leaving Facebook, though.&lt;/p&gt;

&lt;p&gt;The hardest part about making a new project successful is user adoption. 
Building great software doesn’t organically build a community. Presto gained 
some of its initial popularity because Facebook used it. We never had to try 
very hard to develop the community initially as the Facebook brand did a great 
job at getting people’s attention. But this community was exclusive to Silicon 
Valley companies. Leaving Facebook acted as a forcing function for us to build 
the community in a classic grassroots way. We went out and started talking to 
people, getting people connected, doing more promotions and events. We were 
pretty motivated after we left. However, all of this is a lot of work for a few
programmers, and while it’s great to see people respond to your work, it takes a
lot out of you. This created the conditions for community members to step up and
become more involved in the new project.&lt;/p&gt;

&lt;p&gt;We saw the pattern repeat when
&lt;a href=&quot;/blog/2020/12/27/announcing-trino.html&quot;&gt;we were forced to rebrand and changed the name to Trino&lt;/a&gt;.
We doubled down again on developing the community, and again participation
accelerated. It’s because of this that we believe the Trino community is stronger
than ever before.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/leaving-facebook-meta-best-for-trino/stars.jpeg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Since the split, the Trino release cadence has accelerated far beyond the pace
we sustained while running Presto. Once brand confusion was settled with the
change to the Trino name, the community numbers skyrocketed and we saw 
&lt;a href=&quot;/blog/2021/12/31/trino-2021-a-year-of-growth.html&quot;&gt;unprecedented growth in metrics like GitHub stars, YouTube subscribers, and Slack members&lt;/a&gt;. 
We have many new community-driven features released in Trino that we will be
discussing in more detail in another blog post coming soon. To name a few, Trino now 
&lt;a href=&quot;/blog/2022/05/05/tardigrade-launch.html&quot;&gt;supports fault-tolerant execution mode&lt;/a&gt;,
&lt;a href=&quot;https://github.com/trinodb/trino/issues/37&quot;&gt;revamped its timestamp support&lt;/a&gt;, 
&lt;a href=&quot;/blog/2020/06/14/dynamic-partition-pruning.html&quot;&gt;dynamic partition pruning&lt;/a&gt;,
&lt;a href=&quot;/blog/2022/07/22/polymorphic-table-functions.html&quot;&gt;polymorphic table functions&lt;/a&gt;,
&lt;a href=&quot;/blog/2021/03/10/introducing-new-window-features.html&quot;&gt;advanced window functions&lt;/a&gt;, 
and much much more!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/leaving-facebook-meta-best-for-trino/trajectory.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;These metrics help confirm our experience from previous open source projects and
from Trino. In the long run, individual-driven open source projects tend to lead
to healthier communities and ecosystems than company-driven open
source projects. We believe that, we practice that, and we are now reaping the
benefits of it as we close the pages of the first decade of this remarkable
project. We can’t begin to express how thankful we are to all of you who
believed in us and have helped grow Trino to what it is today. Also, we do
thank the Facebook leadership, especially Jay Parikh, who gave us the green
light to create and open source Presto from the beginning. We are looking
forward to the twentieth and thirtieth anniversaries as we continue to disrupt
the analytics industry and improve the lives of those who work in it.&lt;/p&gt;</content>

      
        <author>
          <name>Martin Traverso, Dain Sundstrom, and David Phillips</name>
        </author>
      

      <summary>It might surprise some that our departure from Facebook was one of the simplest decisions we’ve ever made. Many posts that discuss leaving a FAANG company focus on leaving some grand sum of money or prestige of working at the company. For us, we were leaving the company where we had launched a project that we knew would quickly outgrow the walls of Facebook, and solve a much larger set of problems in the analytics domain. At the time we didn’t quite anticipate that Presto, a distributed SQL query engine for big data analytics, would be adopted around the globe by thousands of companies and an overwhelming number of industries. We appreciate Facebook for serving as the launchpad that inspired others to adopt Presto. Despite the harmonious beginnings, once the needs of the community and Facebook no longer aligned, we had to leave, but we’ll get to that part shortly.</summary>

      
      
    </entry>
  
    <entry>
      <title>Diving into polymorphic table functions with Trino</title>
      <link href="https://trino.io/blog/2022/07/22/polymorphic-table-functions.html" rel="alternate" type="text/html" title="Diving into polymorphic table functions with Trino" />
      <published>2022-07-22T00:00:00+00:00</published>
      <updated>2022-07-22T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/07/22/polymorphic-table-functions</id>
      <content type="html" xml:base="https://trino.io/blog/2022/07/22/polymorphic-table-functions.html">&lt;p&gt;In the Trino community, we know that being the coolest query engine is a tough
job. We boldly face the intricacies of the SQL standard to bring you the newest
and most powerful features. Today, we proudly announce that as of release 381,
Trino is on its way to full support for polymorphic table functions (PTFs).&lt;/p&gt;

&lt;p&gt;In this blog post, we are explaining the concept of table functions and 
exploring how they can be leveraged. We also look at what we have already 
implemented, and take a sneak peek into the future.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h3 id=&quot;definition-time&quot;&gt;Definition time&lt;/h3&gt;

&lt;p&gt;There are several kinds of functions you can call in a SQL query: scalar
functions, aggregate functions, and window functions. They might process the
input row by row (scalar) or all at once (aggregate). One thing they have in
common is that they return scalar values. Table functions are different. They
return tables. In a query, they can appear in any place where a table reference
shows up such as a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FROM&lt;/code&gt; clause:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;my_table_function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;foo&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You can also use table functions in joins:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;my_table_function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;bar&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;another_table_function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Polymorphic table functions (PTFs) are a subset of table functions where the
schema of the returned table is determined dynamically. The returned table
schema can depend on the arguments you pass to the function.&lt;/p&gt;
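&lt;p&gt;To make that concrete, here is a sketch of two calls to a hypothetical
table function. The function name and the resulting schemas are made up purely
for illustration:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- might return a table with columns (name varchar, price decimal)
SELECT * FROM TABLE(read_file(&apos;products.csv&apos;));

-- the same function might return (id bigint, visited timestamp)
SELECT * FROM TABLE(read_file(&apos;visits.csv&apos;));
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;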

&lt;h3 id=&quot;ok-but-why-are-we-so-excited&quot;&gt;OK, but why are we so excited?&lt;/h3&gt;

&lt;p&gt;We are excited because this feature is a real game changer! Polymorphic table
functions make SQL extensible, provide a framework for processing data in
previously impossible ways, and can act as a bridge between the Trino engine and
external systems or resources you might need for processing your data.
Additionally, polymorphic table functions are standard SQL, and they are very
convenient to use.&lt;/p&gt;

&lt;h3 id=&quot;what-is-available-in-trino-today&quot;&gt;What is available in Trino today?&lt;/h3&gt;

&lt;p&gt;So far, we have added a framework for table functions that are executed by
the connector. Although this is not the full PTF feature yet, we couldn’t wait
to bring it to life. We added query pass-through table functions for JDBC-based
connectors and Elasticsearch. They mostly go by the name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query&lt;/code&gt;, and they take
a single argument: the query text.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;postgresql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;system&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;query&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt;
        &lt;span class=&quot;s1&quot;&gt;&apos;SELECT
          name
        FROM
          tpch.nation
        WHERE
          nationkey = 0&apos;&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And this will return:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;  &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;---------&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;ALGERIA&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;row&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Something you may not notice from that example is that the entire query you
pass as the “query” argument is executed directly by PostgreSQL. Whatever
connector you’re using, the query argument you pass needs to be written so that
it works on the underlying database. On the flip side, and more excitingly, if
you have a legacy query that relies on non-standard SQL syntax specific to a
database and would be difficult to rewrite for Trino, you can now pass that
entire query down to the connector by wrapping it in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query&lt;/code&gt;
function, skipping the need to migrate it.&lt;/p&gt;

&lt;p&gt;Besides PostgreSQL, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query&lt;/code&gt; table function has equivalent implementations
for Druid, MySQL, Oracle, Redshift, SQL Server, MariaDB, and SingleStore.
Elasticsearch has a similar function called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;raw_query&lt;/code&gt;. You can check out the
&lt;a href=&quot;https://trino.io/docs/current/connector.html&quot;&gt;Trino docs for each supported connector&lt;/a&gt;
for full details.&lt;/p&gt;

&lt;p&gt;But while we’re here, another cool example to showcase is using query
pass-through to take advantage of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MODEL&lt;/code&gt; clause in Oracle:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;SUBSTR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;country&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;country&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;SUBSTR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;product&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;15&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;product&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;nb&quot;&gt;year&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;sales&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;oracle&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;system&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;query&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;SELECT
        *
      FROM
        sales_view
      MODEL
        RETURN UPDATED ROWS
        MAIN
          simple_model
        PARTITION BY
          country
        MEASURES
          sales
        RULES
          (sales[&apos;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Bounce&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;, 2001] = 1000,
          sales[&apos;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Bounce&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;, 2002] = sales[&apos;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Bounce&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;, 2001] + sales[&apos;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Bounce&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;, 2000],
          sales[&apos;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Y&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Box&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;, 2002] = sales[&apos;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Y&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Box&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;, 2001])
      ORDER BY
        country&apos;&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You can pass an entire query through to leverage a feature that isn’t a part of
the SQL standard, and with that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MODEL&lt;/code&gt; clause, Oracle can do some fancy
multidimensional array processing for you right then and there, returning the
results as a table back into Trino. We don’t want to get too sidetracked delving
into the specifics of non-Trino tech, so if you want to learn more about what
you can do, check out the connectors you use, and see what cool possibilities
are out there!&lt;/p&gt;

&lt;h2 id=&quot;whats-next&quot;&gt;What’s next?&lt;/h2&gt;

&lt;p&gt;Now that we’ve discussed what PTFs are, how they work in Trino, and what they do
today, it’s useful to look forward to what’s coming next. The next thing we’re
working on is adding the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query&lt;/code&gt; function to BigQuery.&lt;/p&gt;

&lt;h3 id=&quot;big-ideas&quot;&gt;Big ideas&lt;/h3&gt;

&lt;p&gt;Beyond what’s currently planned, there’s a lot that polymorphic table functions
can do for us. One function that engineers and analysts commonly request
in Trino is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PIVOT&lt;/code&gt;. This capability dynamically groups the distinct
values of an input column and turns each value into a set of columns in the
output table. PTFs could enable such a PIVOT-like transformation on data,
which otherwise isn’t included in the standard SQL specification.&lt;/p&gt;
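&lt;p&gt;As a sketch, a PIVOT-style table function could be invoked like the
following. No such function exists in Trino yet, so the name and signature are
purely illustrative:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- hypothetical: turn each distinct value of the quarter column into its own
-- output column, filled with the sum of the sales column
SELECT
  *
FROM
  TABLE(pivot(
    input =&amp;gt; TABLE(quarterly_sales),
    pivot_column =&amp;gt; &apos;quarter&apos;,
    value_column =&amp;gt; &apos;sales&apos;));
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;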

&lt;p&gt;Another exciting possibility is the ability to write scripts that transform or
generate tables in popular languages like Python, Scala, or JavaScript. These
could add even more capabilities that SQL is missing.&lt;/p&gt;

&lt;h3 id=&quot;looking-forward&quot;&gt;Looking forward&lt;/h3&gt;

&lt;p&gt;The journey to full PTF support in Trino has just begun. A dedicated operator
for table functions is the next big thing. Right now, Trino can handle PTFs, but
they must be pushed down to the connector and executed there. The Trino engine
does not yet know how to execute them. With an operator, the Trino engine will
be able to control and handle table function execution, and we will be able to
pass tables as arguments to table functions. This will unlock the full potential
of PTFs in Trino, and empower Trino to solve a new class of problems and expand
its potential for application in many new domains.&lt;/p&gt;

&lt;p&gt;If you have any questions or ideas for table functions that you would find
useful, reach out to us on the &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;Trino Slack&lt;/a&gt;, and
we would love to hear your thoughts and feedback. We’ll also be doing a Trino
Community Broadcast on PTFs on July 28th @ 1pm EDT, so tune in then to have your
questions answered live!&lt;/p&gt;

&lt;p&gt;If you want to learn more about how to implement PTFs, we are working on another
blog post for you already.&lt;/p&gt;

&lt;p&gt;Happy querying!&lt;/p&gt;</content>

      
        <author>
          <name>Kasia Findeisen, Brian Olsen, and Cole Bowden</name>
        </author>
      

      <summary>In the Trino community, we know that being the coolest query engine is a tough job. We boldly face the intricacies of the SQL standard to bring you the newest and most powerful features. Today, we proudly announce that as of release 381, Trino is on its way to full support for polymorphic table functions (PTFs). In this blog post, we are explaining the concept of table functions and exploring how they can be leveraged. We also look at what we have already implemented, and take a sneak peek into the future.</summary>

      
      
    </entry>
  
    <entry>
      <title>38: Trino tacks on polymorphic table functions</title>
      <link href="https://trino.io/episodes/38.html" rel="alternate" type="text/html" title="38: Trino tacks on polymorphic table functions" />
      <published>2022-07-21T00:00:00+00:00</published>
      <updated>2022-07-21T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/38</id>
      <content type="html" xml:base="https://trino.io/episodes/38.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;p&gt;In this episode we have the pleasure to chat with a couple familiar faces who
have been hard at work building and understanding the features we’re talking
about today:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/kasiafi&quot;&gt;Kasia Findeisen&lt;/a&gt;, Trino Maintainer&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/mtraverso&quot;&gt;Martin Traverso&lt;/a&gt;, Trino Cocreator and Maintainer&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-387-to-391&quot;&gt;Releases 387 to 391&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-387.html&quot;&gt;Trino 387&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for writing ORC Bloom filters for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;varchar&lt;/code&gt; columns.&lt;/li&gt;
  &lt;li&gt;Support for querying Pinot via the gRPC endpoint.&lt;/li&gt;
  &lt;li&gt;Support for predicate pushdown on string columns in Redis.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OPTIMIZE&lt;/code&gt; on Iceberg tables with non-identity partitioning.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-388.html&quot;&gt;Trino 388&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for JSON output in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXPLAIN&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Improved performance for row data types.&lt;/li&gt;
  &lt;li&gt;Support for OAuth 2.0 refresh tokens.&lt;/li&gt;
  &lt;li&gt;Support for table and column comments in Delta Lake.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-389.html&quot;&gt;Trino 389&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improved performance for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;row&lt;/code&gt; type and aggregation.&lt;/li&gt;
  &lt;li&gt;Faster joins when spilling to disk is disabled.&lt;/li&gt;
  &lt;li&gt;Improved performance when writing non-structural types to Parquet.&lt;/li&gt;
  &lt;li&gt;New &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;raw_query&lt;/code&gt; table function for full query pass-through in Elasticsearch.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-390.html&quot;&gt;Trino 390&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for setting comments on views.&lt;/li&gt;
  &lt;li&gt;Improved UNNEST performance.&lt;/li&gt;
  &lt;li&gt;Support for Databricks runtime 10.4 LTS in Delta Lake connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-391.html&quot;&gt;Trino 391&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for AWS Athena partition projection.&lt;/li&gt;
  &lt;li&gt;Faster writing of Parquet data in Iceberg and Delta Lake.&lt;/li&gt;
  &lt;li&gt;Support for reading BigQuery external tables.&lt;/li&gt;
  &lt;li&gt;Support for table and column comments in BigQuery.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Additional highlights and notes according to Manfred:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2022/07/14/trino-updates-to-java-17.html&quot;&gt;Java 17 arrived as required runtime in 390&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Remove support for Elasticsearch versions below 6.6.0, add testing for OpenSearch 1.1.0.&lt;/li&gt;
  &lt;li&gt;New raw query table function in Elasticsearch can replace old full text search and query pass-through support.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the release notes for
&lt;a href=&quot;https://trino.io/docs/current/release/release-387.html&quot;&gt;Trino 387&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-388.html&quot;&gt;Trino 388&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-389.html&quot;&gt;Trino 389&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-390.html&quot;&gt;Trino 390&lt;/a&gt;,
and &lt;a href=&quot;https://trino.io/docs/current/release/release-391.html&quot;&gt;Trino 391&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-episode-polymorphic-table-functions&quot;&gt;Concept of the episode: Polymorphic table functions&lt;/h2&gt;

&lt;p&gt;We normally cover a broad variety of topics in the Trino community broadcast,
exploring different technical details, pull requests, and neat things that are
going on in Trino at large. This episode, however, we’re going to be more
focused, only taking a look at a particular piece of functionality that we’re
all very excited about: polymorphic table functions, or PTFs for short. If
you’re unfamiliar with what this means, that can sound like technobabble word
soup, so we can start exploring this with a simple question…&lt;/p&gt;

&lt;h3 id=&quot;what-is-a-table-function&quot;&gt;What is a table function?&lt;/h3&gt;

&lt;p&gt;The easiest answer to this question is that it’s a function which returns a
table. Scalar, aggregate, and window functions all work a little differently,
but ultimately, they all return a single value each time they are invoked. Table
functions are unique in that they return an entire table. This gives them some
interesting properties that we’ll dive into, but it also means that you can only
invoke them in situations where you’d use a full table, such as a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FROM&lt;/code&gt; clause:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;my_table_function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;foo&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You can also use table functions in joins:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;my_table_function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;bar&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;another_table_function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And while that’s all neat, it raises the question…&lt;/p&gt;

&lt;h4 id=&quot;what-can-you-do-with-table-functions&quot;&gt;What can you do with table functions?&lt;/h4&gt;

&lt;p&gt;While standard table functions are cool, they have to return a pre-defined
schema, which limits their flexibility. However, they still have some
interesting uses as means of shortening queries or performing multiple
operations at once. If you frequently find yourself selecting from the same
table with a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WHERE&lt;/code&gt; clause checking equality to a specific column but with a
different value each time, you could define a table function which takes that
value as a parameter and allows you to skip all the copying and pasting just for
the sake of one line changing. You could take an extremely lengthy sub-query
with multiple joins and abbreviate it to something as short as one of the
examples above, and then use that in other queries. Or, if you want to update a
table, but you also want to insert into another table as part of the same
operation, you could combine those two steps into one table function, ensuring
that users won’t forget the second part of that process.&lt;/p&gt;
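
&lt;p&gt;As a quick hypothetical sketch (the function name and signature here are
invented for illustration, not part of Trino), that first use case could turn a
recurring filter query into a one-liner:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;-- instead of repeating: SELECT * FROM orders WHERE status = &apos;SHIPPED&apos;
SELECT
    *
FROM
    TABLE(orders_with_status(&apos;SHIPPED&apos;));
&lt;/code&gt;&lt;/pre&gt;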

&lt;p&gt;So table functions are functions that return tables. It really is that simple,
and we’re already two-thirds of the way to understanding what polymorphic table
functions are. And now it’s time to add in that fun ‘polymorphic’ word.&lt;/p&gt;

&lt;h3 id=&quot;what-makes-a-table-function-polymorphic&quot;&gt;What makes a table function polymorphic?&lt;/h3&gt;

&lt;p&gt;A polymorphic table function is a type of table function where the schema of
the returned table is determined dynamically. This means that the returned table
data, including its schema, can be determined by the arguments you pass to the
function. As you might imagine, that makes PTFs far more powerful than an
ordinary, run-of-the-mill table function.&lt;/p&gt;
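
&lt;p&gt;To make that concrete, here is a purely hypothetical invocation (this
function does not exist in Trino) where the argument decides which columns the
returned table has, so the output schema changes from one call to the next:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;-- returns a table containing only col1
SELECT * FROM TABLE(project(TABLE(my.table), DESCRIPTOR(&quot;col1&quot;)));

-- same function, different argument: returns a table with col2 and col3
SELECT * FROM TABLE(project(TABLE(my.table), DESCRIPTOR(&quot;col2&quot;, &quot;col3&quot;)));
&lt;/code&gt;&lt;/pre&gt;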

&lt;h4 id=&quot;what-can-you-do-with-polymorphic-table-functions&quot;&gt;What can you do with polymorphic table functions?&lt;/h4&gt;

&lt;p&gt;When you’re not determining the schema of the returned table well in advance,
you get the flexibility to do some pretty crazy things. It can be as simple as
adding or removing columns as part of the function, or it can be as complex as
building and returning an entirely new table based on some input data.&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-episode-the-many-ways-you-can-leverage-ptfs&quot;&gt;Demo of the episode: The many ways you can leverage PTFs&lt;/h2&gt;

&lt;p&gt;But we’ve talked enough at a high level about what PTFs are, so now it’s a good
time to look at what PTFs can actually do for you to make your life as a Trino
user easier, better, and more efficient.&lt;/p&gt;

&lt;h3 id=&quot;possible-polymorphic-table-functions&quot;&gt;Possible polymorphic table functions&lt;/h3&gt;

&lt;p&gt;One thing to note - all the examples we’re about to look at are &lt;em&gt;hypothetical&lt;/em&gt;.
We’re working to bring functions similar to these to Trino soon, but there are a
few things left to implement before we get there. For now, these examples are
meant to highlight why we’re implementing PTFs; we’ll look at what you can
currently do with them a little later. When it does come time to implement
these functions, they won’t look exactly as they do here.&lt;/p&gt;

&lt;h4 id=&quot;select-except&quot;&gt;Select except&lt;/h4&gt;

&lt;p&gt;Imagine a table with 10 columns, named col1, col2, col3, etc. If you want to
select all the columns except the first one from that table, you end up with a
query that looks like:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;col2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col6&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col7&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col9&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col10&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;my&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;table&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;But that’s long, and it’s a pain to type, and it gets messy, especially if your
column names aren’t extremely short due to being part of a contrived example.
With a simple PTF, you could get the same result with:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;excl_function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;data&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;my&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;table&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;columns_to_exclude&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;&quot;col1&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now, this isn’t a great PTF, because it’s going to take more time to implement
than it takes to just write out your column names, and at least when we’re using
only 10 columns and short column names, invoking the function takes more writing
than doing it the old-fashioned way. Also, this is going to perform worse than
writing the query the ordinary way. As a rule of thumb, if it can be written
with normal SQL, it will be more performant when done that way. There are plans
to work on optimizing PTFs, but that’s not going to happen soon, so for the time
being, we’re focusing on how they enable things which previously couldn’t
be done at all, rather than making queries look nicer or cleaner.&lt;/p&gt;

&lt;p&gt;All that said, we wanted to include this example because this does a good job at
demonstrating how polymorphic table functions can work and what they can do for
you. But it’s a simple example, and now we can look at some which are a little
more complex and a little more practical.&lt;/p&gt;

&lt;h4 id=&quot;csvreader&quot;&gt;CSVreader&lt;/h4&gt;

&lt;p&gt;If you’ve ever tried to create a table from a CSV file, you know it can be a
painful experience. You have to be very explicit and very diligent, and there’s
a lot of manual cross-checking involved in ensuring that each column aligns
perfectly and is correctly typed for the columns present in the CSV. Enter
polymorphic table functions, here to save the day.&lt;/p&gt;

&lt;p&gt;Remember, this is hypothetical, so by the time we get to implementing something
similar to this in Trino, it will certainly look different. But a table function
like this will be defined on the connector, so all the end user needs to worry
about is what its signature might look like:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;FUNCTION&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CSVreader&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Filename&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;VARCHAR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;FloatCols&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DEFAULT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;DateCols&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DEFAULT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;RETURNS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;One key thing to note here is the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DESCRIPTOR&lt;/code&gt; type. It is a type that describes
a list of column names, and there will be a function to convert a parameterized
list to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DESCRIPTOR&lt;/code&gt; type. Other than that, everything else here does what
you’d expect - you pass the function the name of the CSV file, the columns which
should be typed as floats, and the columns which should have a date typing. All
unspecified columns will still be handled as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;varchar&lt;/code&gt;. Calling the function
might look something like:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;CSVreader&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;Filename&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;my_file.csv&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;FloatCols&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;&quot;principle&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;&quot;interest&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;DateCols&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;&quot;due_date&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Given a CSV with this content:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-csv&quot;&gt;docno,name,due_date,principle,interest
123,Alice,01/01/2014,234.56,345.67
234,Bob,01/01/2014,654.32,543.21
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Such a function would return a table that looks like:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;docno&lt;/th&gt;
      &lt;th&gt;name&lt;/th&gt;
      &lt;th&gt;due_date&lt;/th&gt;
      &lt;th&gt;principle&lt;/th&gt;
      &lt;th&gt;interest&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;123&lt;/td&gt;
      &lt;td&gt;Alice&lt;/td&gt;
      &lt;td&gt;2014-01-01&lt;/td&gt;
      &lt;td&gt;234.56&lt;/td&gt;
      &lt;td&gt;345.67&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;234&lt;/td&gt;
      &lt;td&gt;Bob&lt;/td&gt;
      &lt;td&gt;2014-01-01&lt;/td&gt;
      &lt;td&gt;654.32&lt;/td&gt;
      &lt;td&gt;543.21&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;With a well-written PTF, the days of toiling over parsing a CSV into SQL are
over!&lt;/p&gt;

&lt;h4 id=&quot;pivot&quot;&gt;Pivot&lt;/h4&gt;

&lt;p&gt;Pivot is an oft-requested feature which hasn’t been built in Trino because it
isn’t a part of the standard SQL specification. A &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PIVOT&lt;/code&gt; keyword or built-in
function isn’t planned, but with PTFs, we can support &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PIVOT&lt;/code&gt;-like functionality
without needing to deviate from SQL.&lt;/p&gt;

&lt;p&gt;A &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PIVOT&lt;/code&gt; PTF might have the following definition:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;FUNCTION&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Pivot&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Input_table&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PASS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;THROUGH&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ROW&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SEMANTICS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Output_pivot_columns&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Input_pivot_columns1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Input_pivot_columns2&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DEFAULT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Input_pivot_columns3&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DEFAULT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Input_pivot_columns4&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DEFAULT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Input_pivot_columns5&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DEFAULT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;RETURNS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;But before we look at how you can invoke this, there are a few clauses here
that are worth explaining…&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PASS THROUGH&lt;/code&gt; means that the input data (and all of its rows) will be fully
available in the output. The alternative to this is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NO PASS THROUGH&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WITH ROW SEMANTICS&lt;/code&gt; means that the result will be determined on a row-by-row
basis. The alternative to this is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WITH SET SEMANTICS&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And of course, the function takes some parameters, so a good function author
defines what those parameters do.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;‘Input_table’ is the input table. It can be any table.&lt;/li&gt;
  &lt;li&gt;‘Output_pivot_columns’ is the list of column names to be created in the
pivot table.&lt;/li&gt;
  &lt;li&gt;The ‘Input_pivot_columns’ parameters name the columns to be pivoted into
the output columns. The first one is required, but you can specify additional
groupings. The number of input columns in each group to be pivoted and the
number of output columns must be the same.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now you’ve got a PIVOT function and you understand how to invoke it, so all
you need to do is listen to &lt;a href=&quot;https://youtu.be/8w3wmQAMoxQ?t=82&quot;&gt;Ross from Friends&lt;/a&gt;
and make it happen:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;D&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;D&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;P&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;accttype&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;P&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;acctvalue&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;Pivot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;Input_table&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;My&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;Data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;D&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;Output_pivot_columns&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;accttype&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;acctvalue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;Input_pivot_columns1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;accttype1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;acctvalue1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;Input_pivot_columns2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;accttype2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;acctvalue2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;P&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If we presume we have this data in My.Data:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;ID&lt;/th&gt;
      &lt;th&gt;Name&lt;/th&gt;
      &lt;th&gt;accttype1&lt;/th&gt;
      &lt;th&gt;acctvalue1&lt;/th&gt;
      &lt;th&gt;accttype2&lt;/th&gt;
      &lt;th&gt;acctvalue2&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;123&lt;/td&gt;
      &lt;td&gt;Alice&lt;/td&gt;
      &lt;td&gt;external&lt;/td&gt;
      &lt;td&gt;20000&lt;/td&gt;
      &lt;td&gt;internal&lt;/td&gt;
      &lt;td&gt;350&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;234&lt;/td&gt;
      &lt;td&gt;Bob&lt;/td&gt;
      &lt;td&gt;external&lt;/td&gt;
      &lt;td&gt;25000&lt;/td&gt;
      &lt;td&gt;internal&lt;/td&gt;
      &lt;td&gt;120&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The output of that query will be:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;ID&lt;/th&gt;
      &lt;th&gt;Name&lt;/th&gt;
      &lt;th&gt;accttype&lt;/th&gt;
      &lt;th&gt;acctvalue&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;123&lt;/td&gt;
      &lt;td&gt;Alice&lt;/td&gt;
      &lt;td&gt;external&lt;/td&gt;
      &lt;td&gt;20000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;123&lt;/td&gt;
      &lt;td&gt;Alice&lt;/td&gt;
      &lt;td&gt;internal&lt;/td&gt;
      &lt;td&gt;350&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;234&lt;/td&gt;
      &lt;td&gt;Bob&lt;/td&gt;
      &lt;td&gt;external&lt;/td&gt;
      &lt;td&gt;25000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;234&lt;/td&gt;
      &lt;td&gt;Bob&lt;/td&gt;
      &lt;td&gt;internal&lt;/td&gt;
      &lt;td&gt;120&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;You can see the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PASS THROUGH&lt;/code&gt; clause in action when you select D.id and D.name.&lt;/p&gt;

&lt;h4 id=&quot;execr&quot;&gt;ExecR&lt;/h4&gt;

&lt;p&gt;As a bonus cherry on top, and as an example of something very fun that you can
do with PTFs, how about executing an entire script written in R?&lt;/p&gt;

&lt;p&gt;A connector could provide a function with the signature:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;FUNCTION&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ExecR&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Script&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;VARCHAR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Input_table&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PASS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;THROUGH&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SET&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SEMANTICS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Rowtype&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;RETURNS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The inputs here are the script, which can simply be pasted into the query as
text; the input table, which contains the data for the script to run on; and a
descriptor for the row type, since there’s otherwise no way for the engine to
know the shape of the output after running the R script. It’s worth pointing
out that, contrary to the PIVOT example, this function has &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NO PASS THROUGH&lt;/code&gt;, because
the R script will not have the ability to copy input rows into output rows.&lt;/p&gt;

&lt;p&gt;Invoking this function is relatively straightforward:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;ExecR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;Script&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;...&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;Input&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;My&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;Data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;Rowtype&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;col1&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;VARCHAR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col2&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;REAL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col3&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;FLOAT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;R&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And depending on your script and your data, you can make this as simple or as
extreme as you’d like!&lt;/p&gt;

&lt;h2 id=&quot;pull-request-of-the-episode-pr-12325-support-query-pass-through-for-jdbc-based-connectors&quot;&gt;Pull request of the episode: PR 12325: Support query pass-through for JDBC-based connectors&lt;/h2&gt;

&lt;p&gt;We’ve spent a lot of time talking about hypothetical value that we will be able
to derive from polymorphic table functions sometime down the line, but we should
also pump the brakes a little and take a look at what we &lt;em&gt;already&lt;/em&gt; have in Trino
in terms of polymorphic table functions. This PR, authored by Kasia Findeisen,
was the first code to land in Trino that allowed access to PTFs. It’s just one
particular PTF, but it’s pretty neat, so we can jump into it with a demo and an
explanation for how we’re already changing the game with PTFs.&lt;/p&gt;

&lt;h3 id=&quot;demo-of-the-episode-2-using-connector-specific-features-with-query-pass-through&quot;&gt;Demo of the episode #2: Using connector-specific features with query pass-through&lt;/h3&gt;

&lt;p&gt;Trino sticks to the SQL standard, which means that custom extensions and syntax
aren’t supported. If you’re using a Trino connector where the underlying
database has a neat feature that isn’t a part of the SQL standard, you were
previously unable to take advantage of it, and you knew it wasn’t going
to be added to Trino. But now with query pass-through, you can leverage any of
the cool non-standard extensions that belong to connectors! We’ll look at a
couple of different examples, but keep in mind: because this is pushing an entire
query down to the connector, the possibilities depend on what the
underlying database is capable of.&lt;/p&gt;

&lt;h4 id=&quot;group_concat-in-mysql&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP_CONCAT()&lt;/code&gt; in MySQL&lt;/h4&gt;

&lt;p&gt;In a table where we have employees and their manager ID, but no direct way to
list managers with all their employees, we can push down a query to MySQL and
use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP_CONCAT()&lt;/code&gt; to combine them all into one column with this query:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT
  *
FROM
  TABLE(
    mysql.system.query(
      query =&amp;gt; &apos;SELECT
        manager_id, GROUP_CONCAT(employee_id)
      FROM
        company.employees
      GROUP BY
        manager_id&apos;
    )
  );
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
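To make concrete what that query returns, here is a plain-Python sketch of the same grouping logic; the rows are invented sample data, not from the episode:

```python
from collections import defaultdict

# Invented (manager_id, employee_id) rows standing in for the
# company.employees table referenced in the query above.
rows = [(1, 101), (1, 102), (2, 201), (1, 103), (2, 202)]

# GROUP BY manager_id, then GROUP_CONCAT(employee_id): join every
# grouped value into a single comma-separated string per group.
groups = defaultdict(list)
for manager_id, employee_id in rows:
    groups[manager_id].append(str(employee_id))

result = {m: ",".join(ids) for m, ids in groups.items()}
print(result)  # {1: '101,102,103', 2: '201,202'}
```

The result is one row per manager, with all of that manager's employee IDs collapsed into a single string column.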

&lt;h4 id=&quot;model-clause-in-oracle&quot;&gt;MODEL clause in Oracle&lt;/h4&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MODEL&lt;/code&gt; clause in Oracle is an incredibly powerful way to manipulate and
view data. As it’s not part of the ANSI standard, it’s specific to Oracle, but if you want
to use it, now you can! Through polymorphic table functions, you can generate
and perform sophisticated calculations on multidimensional arrays - try saying
that five times fast. We don’t have the time to explain everything about how
this feature works, but if you want clarification, you can check out
&lt;a href=&quot;https://docs.oracle.com/cd/B19306_01/server.102/b14223/sqlmodel.htm&quot;&gt;the Oracle documentation on MODEL&lt;/a&gt;
and try it out for yourself.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT
  SUBSTR(country, 1, 20) country,
  SUBSTR(product, 1, 15) product,
  year,
  sales
FROM
  TABLE(
    oracle.system.query(
      query =&amp;gt; &apos;SELECT
        *
      FROM
        sales_view
      MODEL
        RETURN UPDATED ROWS
        MAIN
          simple_model
        PARTITION BY
          country
        MEASURES
          sales
        RULES
          (sales[&apos;Bounce&apos;, 2001] = 1000,
          sales[&apos;Bounce&apos;, 2002] = sales[&apos;Bounce&apos;, 2001] + sales[&apos;Bounce&apos;, 2000],
          sales[&apos;Y Box&apos;, 2002] = sales[&apos;Y Box&apos;, 2001])
      ORDER BY
        country&apos;
    )
  );
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
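If the MODEL syntax looks opaque, the RULES section is roughly a set of spreadsheet-style assignments over cells keyed by (product, year). A loose Python analogue of those three rules, with invented starting numbers, might look like:

```python
# Cells keyed by (product, year); the seed values here are invented,
# standing in for whatever sales_view would actually contain.
sales = {
    ("Bounce", 2000): 200.0,
    ("Bounce", 2001): 300.0,
    ("Y Box", 2001): 500.0,
}

# The three RULES from the query above, translated cell by cell:
sales[("Bounce", 2001)] = 1000.0
sales[("Bounce", 2002)] = sales[("Bounce", 2001)] + sales[("Bounce", 2000)]
sales[("Y Box", 2002)] = sales[("Y Box", 2001)]

print(sales[("Bounce", 2002)])  # 1200.0
```

With RETURN UPDATED ROWS, only the cells touched by the rules come back in the result.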

&lt;p&gt;Funnily enough, Oracle also supports polymorphic table functions, so if you
wanted to, you could use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query&lt;/code&gt; function to then invoke a PTF in Oracle,
including any of the hypothetical examples we went into above! PTFs inside of
PTFs are possible! …though probably not the best idea.&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-episode-where-are-we-at-and-whats-coming-next&quot;&gt;Question of the episode: Where are we at, and what’s coming next?&lt;/h2&gt;

&lt;p&gt;Right now, there are a few things on the radar for moving forward with PTFs. The
first and simpler task at hand is expanding the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query&lt;/code&gt; function to other
connectors. We started with the JDBC connectors, but we have also landed a
similar function called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;raw_query&lt;/code&gt; for Elasticsearch, are working on a BigQuery
implementation, and there may still be more to come.&lt;/p&gt;

&lt;p&gt;On a broader scope, the reason this was the first PTF that was implemented is
because Trino doesn’t have to do anything to make it work. The next big step in
powering PTFs up is to create an operator and make the engine aware of them, so
that the engine can handle and process PTFs itself, which will open the door to
the wide array of possibilities we explored earlier.&lt;/p&gt;

&lt;p&gt;And finally, once that’s done, we plan on empowering you, the Trino community,
to go out and actually &lt;em&gt;make&lt;/em&gt; some polymorphic table functions. You can already
implement them today, but with some limitations: you can’t use table or
descriptor arguments, and the connector has to perform the execution. But once
the full framework for PTFs has been built, those examples from earlier (and
many others) will still need to be implemented. There is a
&lt;a href=&quot;https://trino.io/docs/current/develop/table-functions.html&quot;&gt;developer guide&lt;/a&gt; on
implementing table functions that exists today, but there are plans to expand
it so that it’s easier to go in and add the PTFs that will make a difference
for you and your workflows.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Check out the in-person and virtual
&lt;a href=&quot;https://www.meetup.com/pro/trino-community/&quot;&gt;Trino Meetup groups&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino updates to Java 17</title>
      <link href="https://trino.io/blog/2022/07/14/trino-updates-to-java-17.html" rel="alternate" type="text/html" title="Trino updates to Java 17" />
      <published>2022-07-14T00:00:00+00:00</published>
      <updated>2022-07-14T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/07/14/trino-updates-to-java-17</id>
      <content type="html" xml:base="https://trino.io/blog/2022/07/14/trino-updates-to-java-17.html">&lt;p&gt;You’ve already read the title, and it’s exciting news - as of Trino version 390,
which releases today, Trino has officially been updated from Java 11 to Java 17.
This has a few implications, the most important of which is that if you aren’t
running the Docker image (which automatically comes with the correct version of
Java) and you’ve been running Trino on Java 16 or older, you’ll need to update
Java to run Trino versions 390 and later. It’s also worth mentioning that newer
versions of Java, such as Java 18 or 19, are not supported - they might work,
but they haven’t been tested or benchmarked - Java 17 is the new, recommended
version for Trino.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;The reason this change is exciting is that using a new and better version of
Java will make Trino better, too! This initial change is an update to the
runtime version, or what the Trino engine uses while it runs. Because the Java
runtime performs slightly better on the whole with this update, you may see
some small, across-the-board performance improvements when switching from Java
11 to Java 17. So when you’ve got the time, we strongly recommend making the
upgrade!&lt;/p&gt;

&lt;p&gt;The plan is to update the build to Java 17 a few weeks from now, which will also
allow us to use Java 17 APIs and the changes to the language in Trino code. With
new language features, there are more tools in the development toolkit, and
it’ll allow us to write cleaner and better code moving forwards.&lt;/p&gt;

&lt;p&gt;This upgrade has been in the works for a while and been a long time coming, so
if you want to learn more about the specifics, one of the best places to check
that out is the Trino Community Broadcast. Updating to Java 17 was the focus of
&lt;a href=&quot;https://trino.io/episodes/36.html&quot;&gt;episode 36&lt;/a&gt;, and we also talked about it
previously in &lt;a href=&quot;https://trino.io/episodes/35.html&quot;&gt;episode 35&lt;/a&gt;. If you want to
check out the code changes that made this happen, you can view
&lt;a href=&quot;https://github.com/trinodb/trino/issues/9876&quot;&gt;the tracking issue on Github&lt;/a&gt; for
more information.&lt;/p&gt;

&lt;p&gt;And finally, we want to give a shoutout to &lt;a href=&quot;https://github.com/wendigo&quot;&gt;Mateusz Gajewski&lt;/a&gt;
for all the hard work in driving this change.&lt;/p&gt;</content>

      
        <author>
          <name>Cole Bowden</name>
        </author>
      

      <summary>You’ve already read the title, and it’s exciting news - as of Trino version 390, which releases today, Trino has officially been updated from Java 11 to Java 17. This has a few implications, the most important of which is that if you aren’t running the Docker image (which automatically comes with the correct version of Java) and you’ve been running Trino on Java 16 or older, you’ll need to update Java to run Trino versions 390 and later. It’s also worth mentioning that newer versions of Java, such as Java 18 or 19, are not supported - they might work, but they haven’t been tested or benchmarked - Java 17 is the new, recommended version for Trino.</summary>

      
      
    </entry>
  
    <entry>
      <title>How to use Airflow with Trino</title>
      <link href="https://trino.io/blog/2022/07/13/how-to-use-airflow-to-schedule-trino-jobs.html" rel="alternate" type="text/html" title="How to use Airflow with Trino" />
      <published>2022-07-13T00:00:00+00:00</published>
      <updated>2022-07-13T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/07/13/how-to-use-airflow-to-schedule-trino-jobs</id>
      <content type="html" xml:base="https://trino.io/blog/2022/07/13/how-to-use-airflow-to-schedule-trino-jobs.html">&lt;p&gt;The recent addition of the &lt;a href=&quot;/docs/current/admin/fault-tolerant-execution.html&quot;&gt;fault-tolerant
execution&lt;/a&gt; architecture,
delivered to Trino by Project Tardigrade, makes the use of Trino for running
your ETL workloads an even more compelling alternative than ever before. We’ve
set up a demo environment for you to easily give it a try in &lt;a href=&quot;https://www.starburst.io/platform/starburst-galaxy/&quot;&gt;Starburst
Galaxy&lt;/a&gt;.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;With Project Tardigrade providing an out-of-the-box solution with advanced
resource-aware task scheduling and granular retries at the task/query level, we still
need a robust tool to schedule and manage workloads themselves. Apache
Airflow is a great choice for this purpose.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://airflow.apache.org/&quot;&gt;Apache Airflow&lt;/a&gt; is a widely used workflow engine that allows you to schedule and
run complex data pipelines. Airflow provides many plug-and-play operators and
hooks to integrate with many third-party services like Trino.&lt;/p&gt;

&lt;p&gt;To get started using Airflow to run data pipelines with Trino you need to
complete the following steps:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Install Apache Airflow 2.10+&lt;/li&gt;
  &lt;li&gt;Install the TrinoHook&lt;/li&gt;
  &lt;li&gt;Create a Trino connection in Airflow&lt;/li&gt;
  &lt;li&gt;Deploy a TrinoOperator&lt;/li&gt;
  &lt;li&gt;Deploy your DAGs&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;installing-apache-airflow-in-docker&quot;&gt;Installing Apache Airflow in Docker&lt;/h2&gt;

&lt;p&gt;The best way to get going, if you don’t already have an Airflow cluster
available, is to run Airflow in a container using Docker Compose. Just be
aware that this is not best practice for a production environment.&lt;/p&gt;

&lt;p&gt;Requirements for the host:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Docker&lt;/li&gt;
  &lt;li&gt;Docker Compose 1.28+&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Step 1) Create a directory named airflow for all our configuration files.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ mkdir airflow
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Step 2) In the airflow directory create three subdirectories called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dags&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;plugins&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;logs&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ cd airflow
$ mkdir dags plugins logs
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Step 3) Download the Airflow Docker Compose YAML file.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ curl -LfO &apos;https://airflow.apache.org/docs/apache-airflow/stable/docker-compose.yaml&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Step 4) Create an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.env&lt;/code&gt; configuration file:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ echo -e &quot;AIRFLOW_UID=$(id -u)&quot; &amp;gt; .env
$ echo &quot;AIRFLOW_GID=0&quot; &amp;gt;&amp;gt; .env 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Step 5) Start the Airflow containers:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ docker-compose up -d
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;installing-the-trinohook&quot;&gt;Installing the TrinoHook&lt;/h2&gt;

&lt;p&gt;If you are running Airflow in Docker, you need to install the TrinoHook in
all of the Docker containers that use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;apache/airflow:x.x.x&lt;/code&gt; image.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ docker ps 
CONTAINER ID   IMAGE                  PORTS                              NAMES
cffdfaeb757e   apache/airflow:2.3.0   0.0.0.0:8080-&amp;gt;8080/tcp             airflow_airflow-webserver_1
b0e72f479a66   apache/airflow:2.3.0   8080/tcp                           airflow_airflow-worker_1
4cdb11b3e5e3   apache/airflow:2.3.0   8080/tcp                           airflow_airflow-triggerer_1
41d3c3107ddb   apache/airflow:2.3.0   0.0.0.0:5555-&amp;gt;5555/tcp, 8080/tcp   airflow_flower_1
229a11e9cdd3   apache/airflow:2.3.0   8080/tcp                           airflow_airflow-scheduler_1
68160240857d   postgres:13            5432/tcp                           airflow_postgres_1
a96b98da85df   redis:latest           6379/tcp                           airflow_redis_1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To install the TrinoHook, run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pip install apache-airflow-providers-trino&lt;/code&gt; in
the first five containers. Run the following command once for each of those
containers, replacing the container ID with the one from your deployment.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ docker exec -it &amp;lt;container_id&amp;gt; pip install apache-airflow-providers-trino
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once you have done that you need to restart all five containers:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ docker container restart &amp;lt;container_id_1&amp;gt; ... &amp;lt;container_id_5&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;creating-a-trino-connection&quot;&gt;Creating a Trino connection&lt;/h2&gt;

&lt;p&gt;After you have installed the TrinoHook and restarted Airflow, you can create a
connection to your Trino cluster through the Airflow web UI. If you just
installed Airflow, go to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;http://localhost:8080&lt;/code&gt; in your browser and log in.
Unless changed, the default username and password are both &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;airflow&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Go to &lt;strong&gt;Admin&lt;/strong&gt; &amp;gt; &lt;strong&gt;Connections&lt;/strong&gt;.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;75%&quot; src=&quot;/assets/blog/trino-airflow-blog/airflow-connections.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Click on the blue button to &lt;strong&gt;Add a new record&lt;/strong&gt;.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;75%&quot; src=&quot;/assets/blog/trino-airflow-blog/airflow-new-connection.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Select &lt;strong&gt;Trino&lt;/strong&gt; from the &lt;strong&gt;Connection Type&lt;/strong&gt; dropdown and provide the following information:&lt;/p&gt;

&lt;table&gt;
  &lt;tr&gt;
   &lt;td&gt;Connection Id&lt;/td&gt;
   &lt;td&gt;Whatever you want to call your connection.&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;
    Host
   &lt;/td&gt;
   &lt;td&gt;The hostname or IP address of your Trino cluster, e.g., &lt;code&gt;localhost&lt;/code&gt;, &lt;code&gt;10.10.10.1&lt;/code&gt;, or &lt;code&gt;www.mytrino.com&lt;/code&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Schema&lt;/td&gt;
   &lt;td&gt;A schema in your Trino cluster.&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Login&lt;/td&gt;
   &lt;td&gt;The username of the user that Airflow uses to connect to Trino. Best practice would be to create a service account like ‘airflow’. Just understand that this user’s access level is used when executing SQL statements in Trino.&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Password&lt;/td&gt;
   &lt;td&gt;The password of the user that Airflow uses to connect to Trino if authentication is enabled.&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Port&lt;/td&gt;
   &lt;td&gt;The port where the Trino Web UI can be accessed, e.g., &lt;code&gt;8080&lt;/code&gt;, &lt;code&gt;8443&lt;/code&gt;.&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Extra&lt;/td&gt;
   &lt;td&gt;Additional settings, like &lt;code&gt;protocol:https&lt;/code&gt; if using TLS, or &lt;code&gt;verify:false&lt;/code&gt; if you are using a self-signed certificate.&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;Be aware that the test button might not actually return any feedback for Trino connections.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;/assets/blog/trino-airflow-blog/airflow-add-connection.png&quot; /&gt;
&lt;/p&gt;

&lt;h2 id=&quot;deploying-a-trinooperator&quot;&gt;Deploying a TrinoOperator&lt;/h2&gt;

&lt;p&gt;At the time of writing this article there is no TrinoOperator, so you have to
write your own. You can find an implementation in the following section to get you started. This operator allows you to
execute any SQL statement that Trino supports, such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SET SESSION&lt;/code&gt;, and others. You can run multiple statements in a single task so
that they are part of a single Trino session.&lt;/p&gt;

&lt;p&gt;To create the TrinoOperator, use your favorite text editor to create a file called
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino_operator.py&lt;/code&gt; with the following code in it, and place it in the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;airflow/plugins&lt;/code&gt; directory you created earlier. Airflow automatically loads the code, and you are ready to start
writing DAGs.&lt;/p&gt;

&lt;p&gt;For those new to Airflow, DAG (Directed Acyclic Graph) is a core Airflow
concept, a collection of tasks with dependencies and relationships that indicate
to Airflow how they should be executed. DAGs are written in Python.&lt;/p&gt;
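As a toy illustration of the concept (plain Python with the standard library, not Airflow code): a DAG is just tasks plus dependency edges, and a valid run order is any topological ordering of them.

```python
from graphlib import TopologicalSorter  # Python 3.9+ standard library

# Hypothetical pipeline: two extract tasks feed a transform, which
# feeds a load. Each key maps a task to the tasks it depends on.
dag = {
    "transform": {"extract_a", "extract_b"},
    "load": {"transform"},
}

order = list(TopologicalSorter(dag).static_order())
# Both extracts run before the transform, and the load runs last.
print(order)
```

Airflow's scheduler does essentially this, plus retries, scheduling intervals, and distribution across workers.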

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;airflow.models.baseoperator&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BaseOperator&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;airflow.utils.decorators&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;apply_defaults&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;airflow.providers.trino.hooks.trino&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TrinoHook&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;typing&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Sequence&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Callable&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Optional&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;handler&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;fetchall&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TrinoCustomHook&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TrinoHook&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;autocommit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;bool&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;False&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Optional&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;dict&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;handler&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Optional&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Callable&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;sh&quot;&gt;&quot;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;:sphinx-autoapi-skip:&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&quot;&quot;&lt;/span&gt;

        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;super&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TrinoHook&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;autocommit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;autocommit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;handler&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;handler&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TrinoOperator&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BaseOperator&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;template_fields&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Sequence&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,)&lt;/span&gt;

    &lt;span class=&quot;nd&quot;&gt;@apply_defaults&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;__init__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;trino_conn_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;**&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;kwargs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;nf&quot;&gt;super&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;__init__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;**&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;kwargs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;trino_conn_id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;trino_conn_id&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;execute&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;context&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;task_instance&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;context&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;task&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;

        &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;Creating Trino connection&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;hook&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TrinoCustomHook&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;trino_conn_id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;trino_conn_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;n&quot;&gt;sql_statements&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;

        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;isinstance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql_statements&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql_statements&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;strip&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)))&lt;/span&gt;

            &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;Executing single sql statement&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
                &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hook&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;get_first&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

            &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;Executing multiple sql statements&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
                &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hook&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;autocommit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;False&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;isinstance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql_statements&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sql_statement&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sql_statements&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;extend&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql_statement&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;strip&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))))&lt;/span&gt;

            &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;Executing multiple sql statements&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hook&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;autocommit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;False&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
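&lt;p&gt;The statement-splitting behavior in the execute method can be sketched in isolation. The following is an illustrative extraction, not part of the operator itself; the function name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;split_statements&lt;/code&gt; is hypothetical, since the operator does this inline:&lt;/p&gt;

```python
# Illustrative sketch of the splitting used in execute() above;
# split_statements is a hypothetical name, the operator inlines this logic.
def split_statements(sql: str) -> list:
    # Split on ';' and drop empty fragments, so a trailing semicolon
    # does not produce an empty statement.
    return list(filter(None, sql.strip().split(';')))

# One statement: the operator takes the hook.get_first() path.
print(split_statements("select count(1) from tpch.tiny.customer"))
# Two statements: the operator takes the hook.run() path.
# Note the fragments are not individually stripped of leading whitespace.
print(split_statements("set time zone 'UTC'; select now();"))
```

&lt;p&gt;Fragments after the first keep their leading space, which Trino accepts without complaint.&lt;/p&gt;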

&lt;h2 id=&quot;deploying-a-dag&quot;&gt;Deploying a DAG&lt;/h2&gt;

&lt;p&gt;Now that you have deployed the TrinoOperator, you can start writing DAGs for
your data pipelines. Let’s write and deploy a simple sample DAG. Just like the
TrinoOperator, DAGs are deployed into the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;airflow/dags&lt;/code&gt;
directory you created earlier.&lt;/p&gt;
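&lt;p&gt;Deployment amounts to placing the Python files in that directory. A minimal shell sketch, assuming the airflow home from earlier and using the file names from this post:&lt;/p&gt;

```shell
# Hypothetical layout sketch: operator and DAG files side by side in airflow/dags
mkdir -p airflow/dags
touch airflow/dags/trino_operator.py airflow/dags/my_first_trino_dag.py
ls airflow/dags
```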

&lt;p&gt;Create a file called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;my_first_trino_dag.py&lt;/code&gt; with the following code, and save it in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;airflow/dags&lt;/code&gt; directory.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pendulum&lt;/span&gt;

&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;airflow&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DAG&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;airflow.operators.python_operator&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PythonOperator&lt;/span&gt;

&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;trino_operator&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TrinoOperator&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;## This method is called by task2 (below) to retrieve and print to the logs the return value of task1
&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;print_command&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;**&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;kwargs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;task_instance&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kwargs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;task_instance&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
        &lt;span class=&quot;nf&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;Return Value: &lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;task_instance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;xcom_pull&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;task_ids&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;task_1&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;return_value&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;DAG&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;default_args&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;depends_on_past&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;False&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;dag_id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;my_first_trino_dag&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;schedule_interval&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;0 8 * * *&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;start_date&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pendulum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;datetime&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2022&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tz&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;US/Central&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;catchup&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;False&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;tags&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;example&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dag&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;## Task 1 runs a Trino select statement to count the number of records 
&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;## in the tpch.tiny.customer table
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;task1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TrinoOperator&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;task_id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;task_1&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;trino_conn_id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;trino_connection&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;select count(1) from tpch.tiny.customer&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;## Task 2 is a Python Operator that runs the print_command method above 
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;task2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;PythonOperator&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;task_id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;print_command&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;python_callable&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;print_command&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;provide_context&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;dag&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dag&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;## Task 3 demonstrates how you can use results from previous statements in new SQL statements
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;task3&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TrinoOperator&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;task_id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;task_3&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;trino_conn_id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;trino_connection&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;select { { task_instance.xcom_pull(task_ids=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;task_1&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;,key=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;return_value&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;)[0] } }&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;## Task 4 demonstrates how you can run multiple statements in a single session.  
&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;## Best practice is to run a single statement per task; however, statements that change session 
&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;## settings must be run in a single task. The set time zone statements in this example will 
&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;## not affect any future tasks, but the two now() functions would return timestamps for the 
&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;## time zone set before they were run.
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;task4&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TrinoOperator&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;task_id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;task_4&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;trino_conn_id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;trino_connection&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;set time zone &lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;America/Chicago&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;; select now(); set time zone &lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;UTC&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt; ; select now()&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;## The following syntax determines the dependencies between all the DAG tasks.
&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;## Task 1 will have to complete successfully before any other tasks run.
&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;## Tasks 3 and 4 won&apos;t run until Task 2 completes.
&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;## Tasks 3 and 4 can run in parallel if there are enough worker threads. 
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;task1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;task2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;task3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;task4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
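&lt;p&gt;To see why the templated SQL in task3 works, consider how the Jinja expression resolves once Airflow renders it. This is a plain-Python sketch, not the Airflow API: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;xcom_store&lt;/code&gt; and the 1500 row count are hypothetical stand-ins for the real XCom backend and for the value returned by task_1.&lt;/p&gt;

```python
# Hypothetical stand-in for Airflow's XCom store; not the Airflow API.
# The (1500,) tuple mimics the single row returned by hook.get_first().
xcom_store = {('task_1', 'return_value'): (1500,)}

def xcom_pull(task_ids, key='return_value'):
    # Mimics TaskInstance.xcom_pull for this sketch only.
    return xcom_store[(task_ids, key)]

# Airflow renders "select {{ task_instance.xcom_pull(...)[0] }}" into a
# concrete SQL string before the operator runs it:
rendered_sql = "select {}".format(
    xcom_pull(task_ids='task_1', key='return_value')[0])
print(rendered_sql)  # select 1500
```

&lt;p&gt;Trino then executes the fully rendered statement; the templating happens entirely on the Airflow side.&lt;/p&gt;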

&lt;p&gt;Just like the TrinoOperator, DAGs are picked up and compiled by Airflow
automatically. When Airflow fails to compile your DAG, it displays an error
message at the top of the main page where all the DAGs are listed. You can
refresh this page a few times until your DAG is either added to the list or an
error message appears. You can expand the message to see the source of the
error; usually the information provided is enough to understand the issue.&lt;/p&gt;

&lt;p&gt;Once the DAG shows up in your list, you can trigger a manual run using the
play button on the right. I recommend switching to the Graph view, using the
action links on the right, to see how tasks change status as they run.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;75%&quot; src=&quot;/assets/blog/trino-airflow-blog/airflow-dag.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;You can see logs for each task by clicking on the corresponding box and selecting Log from the options at the top.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;60%&quot; src=&quot;/assets/blog/trino-airflow-blog/airflow-task.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Check out the logs for the print_command task to see the return value of the select statement from task_1.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;60%&quot; src=&quot;/assets/blog/trino-airflow-blog/airflow-logs.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;As you can see, output from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;print()&lt;/code&gt; commands can be found in these logs.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Apache Airflow has been around for many years now. It is used by many large
companies in production environments. The open source project has an active
community, and I expect that in the near future we will have an official
TrinoHook with additional out-of-the-box functionality. While there might be a
slight learning curve for new users, I think it is worth it.&lt;/p&gt;

&lt;p&gt;On the Trino side there are some exciting enhancements for &lt;a href=&quot;/docs/current/admin/fault-tolerant-execution.html&quot;&gt;fault-tolerant
execution&lt;/a&gt; on
the roadmap of Project Tardigrade that will make Trino and Airflow an even
better combination.&lt;/p&gt;

&lt;p&gt;Stay tuned.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note from Trino community&lt;/em&gt;: We welcome blog submissions from the community. If
you have blog ideas, send a message in the #dev chat. We will mail you
Trino swag as a token of appreciation for successful submissions. Enter the &lt;a href=&quot;https://join.slack.com/t/trinodb/shared_invite/zt-1aek3l6bn-ZMsvFZJqP1ULx5pU17WP1Q&quot;&gt;Trino
Slack&lt;/a&gt;
and join the conversation in the #project-tardigrade
&lt;a href=&quot;https://join.slack.com/share/enQtMzc3OTczMzkxNDU0OC1mNzEyOWUzNjUyMTgyNDU3ZGJlYTZjYTllYTI1ZmFhMDBlMzYwZWQzOGVkMjhhOGNlMmQ5MWIxM2RmNzZjNWY0&quot;&gt;channel&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md waves-effect waves-light&quot; href=&quot;https://cutt.ly/airflow-reddit&quot;&gt;Discuss on Reddit&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md waves-effect waves-light&quot; href=&quot;https://news.ycombinator.com/item?id=32100426&quot;&gt;Discuss On Hacker News&lt;/a&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Willie Valdez</name>
        </author>
      

      <summary>The recent addition of the fault-tolerant execution architecture, delivered to Trino by Project Tardigrade, makes the use of Trino for running your ETL workloads an even more compelling alternative than ever before. We’ve set up a demo environment for you to easily give it a try in Starburst Galaxy.</summary>

      
      
    </entry>
  
    <entry>
      <title>Announcing the 2022 Trino Summit</title>
      <link href="https://trino.io/blog/2022/06/30/trino-summit-call-for-speakers.html" rel="alternate" type="text/html" title="Announcing the 2022 Trino Summit" />
      <published>2022-06-30T00:00:00+00:00</published>
      <updated>2022-06-30T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/06/30/trino-summit-call-for-speakers</id>
      <content type="html" xml:base="https://trino.io/blog/2022/06/30/trino-summit-call-for-speakers.html">&lt;p&gt;We are pleased to announce the upcoming 2022 Trino Summit. The summit is
scheduled as a &lt;em&gt;hybrid&lt;/em&gt; event on the 10th of November 2022, and attendance is
free! You will be able to join us online, or you can make the trip to San
Francisco and meet us at the Commonwealth Club on the downtown waterfront.
Please be aware that spots at the live event are limited, so register soon if
you want to attend. Please also be aware that you need to register regardless of
whether you’ll be joining us in-person or online.&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;
        Register to attend
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;Starburst is the lead sponsor for the summit, but they welcome other sponsors to
help make this a successful event for the Trino community. If that interests you
or your employer, you should &lt;a href=&quot;mailto:events@starburst.io&quot;&gt;contact the Starburst team for more information.&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;If you’d like to share your knowledge and information about Trino usage and give
a talk at this year’s Trino Summit, we’re putting out a call for speakers. We
will be accepting submissions from now until September 15th, but we recommend
submitting soon, because slots are filling up fast.&lt;/p&gt;

&lt;p&gt;We’re looking for intermediate to advanced-level talks on a variety of themes.
If you have an interesting story about how you were able to leverage Trino,
found a neat way to extend it with a custom plugin, or swapped to Trino for a
performance win, we’d love to hear about it. We’re excited to expand our speaker
lineup with talks from the broader Trino community. If you’re interested, you
can check out the speaker registration page for more information.&lt;/p&gt;

&lt;p&gt;And of course, we’re looking forward to seeing you there, whether in-person or
online!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Update from 15th September 2022:&lt;/em&gt; The call for speakers is closed. Thank you
for all your submissions.&lt;/p&gt;</content>

      
        <author>
          <name>Cole Bowden</name>
        </author>
      

      <summary>We are pleased to announce the upcoming 2022 Trino Summit. The summit is scheduled as a hybrid event on the 10th of November 2022, and attendance is free! You will be able to join us online, or you can make the trip to San Francisco and meet us at the Commonwealth Club on the downtown waterfront. Please be aware that spots at the live event are limited, so register soon if you want to attend. Please also be aware that you need to register regardless of whether you’ll be joining us in-person or online. Register to attend Starburst is the lead sponsor for the summit, but they welcome other sponsors to help make this a successful event for the Trino community. If that interests you or your employer, you should contact the Starburst team for more information.</summary>

      
      
    </entry>
  
    <entry>
      <title>Using Trino as a batch processing engine</title>
      <link href="https://trino.io/blog/2022/06/24/trino-meetup-extract-trino-load.html" rel="alternate" type="text/html" title="Using Trino as a batch processing engine" />
      <published>2022-06-24T00:00:00+00:00</published>
      <updated>2022-06-24T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/06/24/trino-meetup-extract-trino-load</id>
      <content type="html" xml:base="https://trino.io/blog/2022/06/24/trino-meetup-extract-trino-load.html">&lt;p&gt;This past week, &lt;a href=&quot;https://github.com/arhimondr&quot;&gt;Andrii Rosa&lt;/a&gt; hosted a virtual
Trino meetup on the topic of using Trino as a batch processing engine. You can
view the talk from the meetup embedded below. Andrii dives into the history of
Trino as an engine for batch ETL (extract, transform, load) processing, some
challenges related to that, as well as the new fault-tolerant execution
capabilities being added to Trino and how they improve it for batch ETL use
cases.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;
&lt;iframe width=&quot;560&quot; height=&quot;400&quot; src=&quot;https://www.youtube.com/embed/2Ywqbz4T-Sw?t=1116&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;
&lt;/iframe&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;Andrii also gives an update on the work in progress with fault-tolerant
execution, where we are today, and what’s planned for the near future. The
meetup wraps up with an attendee Q&amp;amp;A. If you’d like to learn more,
go check out the talk!&lt;/p&gt;</content>

      
        <author>
          <name>Cole Bowden</name>
        </author>
      

      <summary>This past week, Andrii Rosa hosted a virtual Trino meetup on the topic of using Trino as a batch processing engine. You can view the talk from the meetup embedded below. Andrii dives into the history of Trino as an engine for batch ETL (extract, transform, load) processing, some challenges related to that, as well as the new fault-tolerant execution capabilities being added to Trino and how they improve it for batch ETL use cases.</summary>

      
      
    </entry>
  
    <entry>
      <title>37: Trino powers up the community support</title>
      <link href="https://trino.io/episodes/37.html" rel="alternate" type="text/html" title="37: Trino powers up the community support" />
      <published>2022-06-16T00:00:00+00:00</published>
      <updated>2022-06-16T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/37</id>
      <content type="html" xml:base="https://trino.io/episodes/37.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;p&gt;In this episode we have the pleasure of chatting with our colleagues, who now
make the Trino community better every day:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden/&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate at Starburst&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/n1neinchnick&quot;&gt;Jan Waś&lt;/a&gt;, Software Engineer at Starburst&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/KostasPardalis&quot;&gt;Kostas Pardalis&lt;/a&gt;, Group Product Manager at Starburst&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/Moni4489&quot;&gt;Monica Miller&lt;/a&gt;, Developer Advocate at Starburst&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-382-to-386&quot;&gt;Releases 382 to 386&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-382.html&quot;&gt;Trino 382&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for reading wildcard tables in the BigQuery connector.&lt;/li&gt;
  &lt;li&gt;Support for adding columns in the Delta Lake connector.&lt;/li&gt;
  &lt;li&gt;Support updating Iceberg table partitioning.&lt;/li&gt;
  &lt;li&gt;Improved &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; performance in the MySQL, Oracle, and PostgreSQL connectors.&lt;/li&gt;
  &lt;li&gt;Basic authentication in the Prometheus connector.&lt;/li&gt;
  &lt;li&gt;Exchange spooling on Google Cloud Storage.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-383.html&quot;&gt;Trino 383&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;json_exists&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;json_query&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;json_value&lt;/code&gt; functions.&lt;/li&gt;
  &lt;li&gt;Support for table comments in the Delta Lake connector.&lt;/li&gt;
  &lt;li&gt;Support IAM roles for exchange spooling on S3.&lt;/li&gt;
  &lt;li&gt;Improved performance for aggregation queries.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-384.html&quot;&gt;Trino 384&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for the new pass-through query table function for Druid, MariaDB, MySQL,
Oracle, PostgreSQL, Redshift, SingleStore, and SQL Server.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-385.html&quot;&gt;Trino 385&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;json_array&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;json_object&lt;/code&gt; functions.&lt;/li&gt;
  &lt;li&gt;Support for time travel syntax in the Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;timestamp(p)&lt;/code&gt; type in MariaDB connector.&lt;/li&gt;
  &lt;li&gt;Performance improvements in Iceberg connector.&lt;/li&gt;
&lt;/ul&gt;
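
&lt;p&gt;As a quick illustration of the new time travel syntax, queries can read an
Iceberg table as of a past snapshot ID or timestamp. This is a minimal sketch;
the table name, snapshot ID, and timestamp below are made-up placeholders:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/**
 * Read a hypothetical Iceberg table as of a specific snapshot ID.
 */
SELECT *
FROM iceberg.logging.logs FOR VERSION AS OF 4256368211521163325;

/**
 * Read the same table as of a point in time.
 */
SELECT *
FROM iceberg.logging.logs FOR TIMESTAMP AS OF TIMESTAMP &apos;2022-06-01 00:00:00 UTC&apos;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The snapshot IDs available for a table can be listed by querying its
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&quot;logs$snapshots&quot;&lt;/code&gt; metadata table.&lt;/p&gt;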

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-386.html&quot;&gt;Trino 386&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improved performance for fault-tolerant query execution.&lt;/li&gt;
  &lt;li&gt;Faster queries on Delta Lake.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Additional highlights worth a mention according to Manfred:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;383 had a regression, don’t use it.&lt;/li&gt;
  &lt;li&gt;As mentioned last time, exchange spooling is now supported on the three major
cloud object storage systems.&lt;/li&gt;
  &lt;li&gt;Query pass-through table function is a massive feature. We are adding this to
other connectors, and more details are coming in a future special episode.&lt;/li&gt;
  &lt;li&gt;Special props to &lt;a href=&quot;https://github.com/kasiafi&quot;&gt;Kasia&lt;/a&gt; for all the new JSON functions.&lt;/li&gt;
  &lt;li&gt;Phoenix 4 support is gone.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the release notes for
&lt;a href=&quot;https://trino.io/docs/current/release/release-382.html&quot;&gt;Trino 382&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-383.html&quot;&gt;Trino 383&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-384.html&quot;&gt;Trino 384&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-385.html&quot;&gt;Trino 385&lt;/a&gt;,
and
&lt;a href=&quot;https://trino.io/docs/current/release/release-386.html&quot;&gt;Trino 386&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-episode-how-to-strengthen-the-trino-community&quot;&gt;Concept of the episode: How to strengthen the Trino community&lt;/h2&gt;

&lt;p&gt;What is community, and why has this word seen more use around technical projects,
particularly those in the open-source space? There’s really no formal definition
of community in the context of technology. David Spinks, author of the book
“The Business of Belonging”, defines community as:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;A group of people who feel a shared sense of belonging.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For technical projects, this sense of belonging generally comes from the shared
affinity towards a specific product, like Trino, or it could be a brand that
hosts many products, like Google or Microsoft. There’s a lot that could be
discussed here regarding why communities have become an essential ingredient in
a project’s success. The quick answer I like to offer is that projects,
open-source or proprietary, that have strong communities behind them
innovate and grow faster, and are more successful overall.&lt;/p&gt;

&lt;p&gt;As such, the Trino Software Foundation (TSF) recognizes that Trino will only be
as successful as the health of the community that builds, tests, uses, and 
shares it. The activities around building a technical community fall between
engineering, marketing, and customer enablement. A common name that encompasses
the individuals who work in this space is developer relations, or DevRel for
short. The goal of our work with the maintainers, contributors, users, and all
other members of the community is the following:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Grow all aspects of the Trino project and the Trino community to empower
current and future members of the community.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We introduce some new faces who are stewards on our journey to grow the
adoption of our favorite query engine, explain what each of them does, and show
how their work impacts you as a community member! Most importantly, you can
learn how to get involved and help us figure out how best to navigate ideas,
issues, or any other contributions you may have that help Trino be the best
query engine.&lt;/p&gt;

&lt;h3 id=&quot;improving-the-onboarding-and-getting-started-pages&quot;&gt;Improving the onboarding and getting started pages&lt;/h3&gt;

&lt;p&gt;We don’t really have a seamless onboarding experience for new users. Many
members have asked where to get started. One logical place people tend to go
when browsing the front page of the Trino site is the
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;getting started tab&lt;/a&gt;, which is ironically
still on the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino.io/download.html&lt;/code&gt; page. This page primarily contains
the latest binary downloads, some community links, and links to books and other
reading material.&lt;/p&gt;

&lt;p&gt;The main thing you don’t really see is much getting started material. A lot of
the material is intermediate level at best. There are not many beginner-level
guides to offer the self-service onboarding many are looking for when they just
want to play around without having to wait for anyone to respond. Brian and
Monica have started some work in this area to make the onboarding simpler.&lt;/p&gt;

&lt;p&gt;A very common piece of self-service getting started material is the
&lt;a href=&quot;https://github.com/bitsondatadev/trino-getting-started&quot;&gt;trino-getting-started&lt;/a&gt;
repo that Brian created to host demonstrations for the broadcast and to show off
new features or connector capabilities. It has been a good way to offer
newcomers a simple environment to get started. However, the only way to find
this repository is to ask someone first. It would be ideal to showcase
getting started materials as part of the default experience of learning about
Trino.&lt;/p&gt;

&lt;p&gt;Monica is now working on building up some demos using SaaS products like
Starburst Galaxy as another method of using Trino without needing to install
Docker or use any of your own hardware to run through some examples.
These options are typically more UI driven and much more approachable for
members of the community who aren’t engineers or administrators.&lt;/p&gt;

&lt;h3 id=&quot;release-process&quot;&gt;Release process&lt;/h3&gt;

&lt;h4 id=&quot;filling-out-a-pull-request&quot;&gt;Filling out a pull request&lt;/h4&gt;

&lt;p&gt;We’ve got a handy PR template that exists for all contributors to use when
they’re submitting a pull request to Trino. Most of it is simple and
self-explanatory. We ask you to describe what’s happening, where the change is
happening, and what type of change it is. These are for the sake of the
reviewers, giving them important context so they understand what’s going on
when they review the code. For simpler changes, it’s not usually necessary to go
into a ton of detail here, but it’s nice to give a little summary for anyone looking at the PR.&lt;/p&gt;

&lt;p&gt;The next steps are what really matter for every single PR that’s going to be
merged - the documentation and release notes for a change. These are about
communicating to our users. Documentation refers to Trino docs, not code
comments. If Trino users need to be told how to use the feature you’re
changing because of how you’re changing it, that means we need to have
documentation for it. The PR template gives the options for how to go about
this, but it’s incredibly helpful to have this filled out. Similarly, we ask
whether or not release notes are necessary for the change, and what release
notes you propose for your change. Generally speaking, if it needs to be
documented, it almost always should have a release note. Even if it isn’t
documented, a release note is often a good idea - things like performance
improvements don’t require our users to change how they use Trino, but they
won’t mind knowing that something has gotten better! The release process
involves heavy editing of release notes, so it’s ok for the suggested note to be
imperfect.&lt;/p&gt;

&lt;h3 id=&quot;what-is-developer-experience-devex&quot;&gt;What is developer experience (DevEx)?&lt;/h3&gt;

&lt;p&gt;Trino is a technology that is built by developers, but also heavily used by 
developers. We want to ensure that the experience of both contributors and users
of Trino is the best possible. To do that, we have to focus on many different
aspects of this experience, from committing code to the CLIs and tools we offer
for debugging queries, and most importantly to building a sustainable community
that can give answers and drive the future of the project. This is what DevEx
means for Trino.&lt;/p&gt;

&lt;h3 id=&quot;community-metrics&quot;&gt;Community metrics&lt;/h3&gt;

&lt;p&gt;A while ago we started gathering metrics related to &lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;the Trino GitHub repository&lt;/a&gt;.
This helped us identify issues like huge CI queue times. Most importantly, we can verify
that the changes we made improved things, and by how much.&lt;/p&gt;

&lt;p&gt;In February this year, the 95th percentile of the CI queue time (not even the
total run time!) was almost 7 hours. Trino uses public GitHub runners, and only
60 jobs can run concurrently. This is a bottleneck because Trino has extensive
test coverage for the core engine, all connectors, and other plugins. Because we
can’t increase the number of runners, we looked into doing impact analysis to
skip tests for modules not impacted by any change in a pull request.&lt;/p&gt;

&lt;p&gt;Since April, the 95th percentile of the CI queue time is under 1 hour, even 
though the number of contributions is at an all-time high.&lt;/p&gt;

&lt;p&gt;We keep track of these selected metrics in reports we create by running queries
using the Trino CLI, saving the results in a markdown file, and publishing them
as static pages using GitHub Pages. The data is gathered using
Trino connectors for the GitHub API and Git repositories. A GitHub Actions
workflow runs on a schedule and spins up a Trino server, so there’s no
infrastructure to maintain, except for a single S3 bucket. All of it is publicly
available in the &lt;a href=&quot;https://github.com/nineinchnick/trino-cicd&quot;&gt;nineinchnick/trino-cicd&lt;/a&gt;
repository, which links to the GitHub Pages site with the reports.&lt;/p&gt;

&lt;p&gt;We continue to add more reports, like tracking flaky tests or pull request 
activity:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://nineinchnick.github.io/trino-cicd/reports/flaky/&quot;&gt;Flaky tests&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://nineinchnick.github.io/trino-cicd/reports/pr/&quot;&gt;Pull request activity&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By being data-driven and transparent, we make sure to provide a good
experience for everyone, and this also helps us figure out where to focus more
resources.&lt;/p&gt;

&lt;p&gt;We’re open to suggestions on what to track and which metrics to report on, so 
feel free to open issues and pull requests in the repository mentioned above, or
start a thread on the Trino Slack.&lt;/p&gt;

&lt;h3 id=&quot;pull-request-triage&quot;&gt;Pull request triage&lt;/h3&gt;

&lt;p&gt;One of the things we’ve been tracking over the last couple of weeks has been the
state of incoming PRs. We want to make sure that
each PR reaches a maintainer, and that they all receive timely feedback after
asking for a review. The goal in looking into this process is to help
streamline and improve the time-to-initial-comment. The pleasant discovery
is that it doesn’t seem like we have a lot of room to improve here. Not
to pat ourselves on the back too heavily, but PRs find their way to maintainers
and get an initial review quite quickly, so there’s little work to be done on
that front.&lt;/p&gt;

&lt;p&gt;Our next exploration is tracking PRs that don’t quickly get
approved and merged, monitoring their life cycle, and making sure follow-up
reviews happen in a timely manner as well. We now know that we are
effective at giving initial feedback on a PR, but we also want to make sure that
these PRs aren’t falling off a cliff or turning into a long, drawn-out process
where each development iteration is slower than the last.&lt;/p&gt;

&lt;h2 id=&quot;pull-requests-of-the-episode-pr-12259-support-updating-iceberg-table-partitioning&quot;&gt;Pull requests of the episode: PR 12259: Support updating Iceberg table partitioning&lt;/h2&gt;

&lt;p&gt;This month’s &lt;a href=&quot;https://github.com/trinodb/trino/issues/12259&quot;&gt;PR of the episode&lt;/a&gt;
was contributed by &lt;a href=&quot;https://github.com/alexjo2144&quot;&gt;alexjo2144&lt;/a&gt;. This feature is
an exciting update on the ability to modify the partition specification of a
table in Iceberg. This is an update since Brian
&lt;a href=&quot;/blog/2021/07/12/in-place-table-evolution-and-cloud-compatibility-with-iceberg.html&quot;&gt;wrote about this feature&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;At the time of writing, Trino is able to perform reads from tables that have 
multiple partition spec changes but partition evolution write support does not
yet exist.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This brings us much closer to feature parity with other query engines for
managing Iceberg tables entirely through Trino. Thanks to our friend
&lt;a href=&quot;https://github.com/findinpath&quot;&gt;Marius Grama &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;findinpath&lt;/code&gt;&lt;/a&gt; for the review.&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-episode-iceberg-table-partition-migrations&quot;&gt;Demo of the episode: Iceberg table partition migrations&lt;/h2&gt;

&lt;p&gt;For this episode’s demo, you’ll need a local Trino coordinator, MinIO instance,
and Hive metastore backed by a database. Clone the 
&lt;a href=&quot;https://github.com/bitsondatadev/trino-getting-started&quot;&gt;trino-getting-started&lt;/a&gt; 
repository and navigate to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iceberg/trino-iceberg-minio&lt;/code&gt; directory. Then 
start up the containers using Docker Compose.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone git@github.com:bitsondatadev/trino-getting-started.git

cd trino-getting-started/iceberg/trino-iceberg-minio

docker-compose up -d
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This demo is actually very similar to a demo we did in 
&lt;a href=&quot;/episodes/15.html&quot;&gt;episode 15&lt;/a&gt;, except now we get to showcase one of Iceberg’s
most exciting features, partition evolution.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/**
 * Make sure to first create a bucket named &quot;logging&quot; in MinIO before running
 */

CREATE SCHEMA iceberg.logging
WITH (location = &apos;s3a://logging/&apos;);

CREATE TABLE iceberg.logging.logs (
   level varchar NOT NULL,
   event_time timestamp(6) with time zone NOT NULL,
   message varchar NOT NULL,
   call_stack array(varchar)
)
WITH (
   format = &apos;ORC&apos;,
   partitioning = ARRAY[&apos;day(event_time)&apos;]
);

/**
 * Inserting two records. Notice event_time is on the same day but different hours.
 */

INSERT INTO iceberg.logging.logs VALUES 
(
  &apos;ERROR&apos;, 
  timestamp &apos;2021-04-01 12:23:53.383345&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;, 
  &apos;1 message&apos;,
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;]
),
(
  &apos;ERROR&apos;, 
  timestamp &apos;2021-04-01 13:36:23&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;, 
  &apos;2 message&apos;, 
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;]
);

SELECT * FROM iceberg.logging.logs;
SELECT * FROM iceberg.logging.&quot;logs$partitions&quot;;

/**
 * Notice one partition was created for both records at the day granularity.
 */

/**
 * Update the partitioning from daily to hourly 🎉
 */
ALTER TABLE iceberg.logging.logs 
SET PROPERTIES partitioning = ARRAY[&apos;hour(event_time)&apos;];

/**
 * Inserting three records. Notice event_time is on the same day but different hours.
 */
INSERT INTO iceberg.logging.logs VALUES 
(
  &apos;ERROR&apos;, 
  timestamp &apos;2021-04-01 15:55:23&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;, 
  &apos;3 message&apos;, 
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;]
), 
(
  &apos;WARN&apos;, 
  timestamp &apos;2021-04-01 15:55:23&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;, 
  &apos;4 message&apos;, 
  ARRAY [&apos;bad things could be happening&apos;]
), 
(
  &apos;WARN&apos;, 
  timestamp &apos;2021-04-01 16:55:23&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;, 
  &apos;5 message&apos;, 
  ARRAY [&apos;bad things could be happening&apos;]
);

SELECT * FROM iceberg.logging.logs;
SELECT * FROM iceberg.logging.&quot;logs$partitions&quot;;

/**
 * Now there are three partitions:
 * 1) One partition at the day granularity containing our original records.
 * 2) One at the hour granularity for hour 15 containing two new records.
 * 3) One at the hour granularity for hour 16 containing the last new record.
 */

SELECT * FROM iceberg.logging.logs 
WHERE event_time &amp;lt; timestamp &apos;2021-04-01 16:55:23&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;;

/**
 * This query correctly returns 4 records with only the first two partitions
 * being touched. 
 */

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There have been a lot of cool things going into the Iceberg connector these days,
and another exciting one that came out in release 381 was the support for
&lt;a href=&quot;https://github.com/trinodb/trino/pull/12026&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt; in Iceberg&lt;/a&gt;. So we’re
going to showcase that:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/**
 * Update
 */
UPDATE
  iceberg.logging.logs
SET
  call_stack = call_stack || &apos;WHALE HELLO THERE!&apos;
WHERE
  lower(level) = &apos;warn&apos;;

DROP TABLE iceberg.logging.logs;

DROP SCHEMA iceberg.logging;

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;question-of-the-episode-can-i-force-a-pushdown-join-into-a-connected-data-source&quot;&gt;Question of the episode: Can I force a pushdown join into a connected data source?&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://www.trinoforum.org/t/forcing-push-down-join-into-connected-data-source/177&quot;&gt;Full question from Trino Forum&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Is there a way to “quote” a subquery, to tell the Trino planner to just push
down the query and not bother making a sub plan?&lt;/p&gt;

&lt;p&gt;I have a star schema, with one huge table (&amp;gt;100M rows) and a dimension table
that has static attributes of the huge table.
The dimension table is filtered to create a map that is joined to the huge
table. The result is grouped by a dimension, and finally some of the metrics
from the huge table are aggregated to calculate stats.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Answer:&lt;/em&gt; We’ve recently introduced Polymorphic Table Functions to Trino in 
version 381.&lt;/p&gt;

&lt;p&gt;In version 384, which was just released a few days ago, the query table function
was added in PR 12325.&lt;/p&gt;

&lt;p&gt;For a quick example in MySQL:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;trino&amp;gt; USE mysql.tiny;
USE
trino:tiny&amp;gt; SELECT * FROM TABLE(system.query(query =&amp;gt; &apos;SELECT 1 a&apos;));
a
---
1
(1 row)

trino:tiny&amp;gt; SELECT * FROM TABLE(system.query(query =&amp;gt; &apos;SELECT @@version&apos;));
@@version
-----------
8.0.29
(1 row)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This runs the command verbatim on the underlying database (not exactly a
pushdown, but a pass-through) and returns the results to Trino as a table.
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT @@version&lt;/code&gt; is MySQL-specific syntax; its output comes back as a table
that Trino can then process further.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests</summary>

      
      
    </entry>
  
    <entry>
      <title>Building A Modern Data Stack for QazAI</title>
      <link href="https://trino.io/blog/2022/06/08/building-a-modern-data-stack-for-qaz-ai.html" rel="alternate" type="text/html" title="Building A Modern Data Stack for QazAI" />
      <published>2022-06-08T00:00:00+00:00</published>
      <updated>2022-06-08T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/06/08/building-a-modern-data-stack-for-qaz-ai</id>
      <content type="html" xml:base="https://trino.io/blog/2022/06/08/building-a-modern-data-stack-for-qaz-ai.html">&lt;p&gt;At QazAI, we build data lakes as a service for companies.  In the original
architecture, we get raw data in S3, transform the S3 data with Hive, and then
delivered the data to business units via our datamart built on Clickhouse (for optimal delivery speeds). Over time, we were dragged down by the slower speeds and high costs of running Hive, and started shopping for a faster and cheaper open source engine to do our ETL data transformations.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;100%&quot; src=&quot;/assets/blog/qaz-ai-modern-data-stack/old-architecture.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;This diagram shows our existing stack. The big problem to solve was that the
Hadoop cluster was extremely inefficient. This led to slow queries and up
to 10x higher costs.&lt;/p&gt;

&lt;p&gt;Like many others, I was initially drawn to Trino to run analytics over Hive
tables because of its speed, but found many other advantages as well. Key among
them are the following characteristics.&lt;/p&gt;

&lt;h2 id=&quot;speed&quot;&gt;Speed&lt;/h2&gt;

&lt;p&gt;Queries ran 10 to 100 times faster compared to our old stack. It was fantastic,
simply beyond our expectations.&lt;/p&gt;

&lt;h2 id=&quot;standard-sql&quot;&gt;Standard SQL&lt;/h2&gt;

&lt;p&gt;Trino speaks a standard SQL dialect that everyone already knew. Data analysts
loved getting to use a dialect they were already familiar with.&lt;/p&gt;

&lt;h2 id=&quot;federated-analytics&quot;&gt;Federated analytics&lt;/h2&gt;

&lt;p&gt;Trino has the ability to connect with other databases and run federated
queries. After I had connected all the available data sources, I showed the
results to the data analysts. They were simply amazed, and some were shocked
when a join between tables from various databases completed successfully. To
emphasize: this saved days of work. You could join data from other data
sources straight away, avoiding the need to create a staging layer in the data
warehouse.&lt;/p&gt;

&lt;h2 id=&quot;simplicity-of-setup&quot;&gt;Simplicity of setup&lt;/h2&gt;

&lt;p&gt;Trino just works out of the box. This is what makes it great. As open source
users, we’re used to going through a complicated software setup process. But
with Trino, there’s no need to deploy anything else. You simply install packages
from the open source repository, and things work. It’s magical. To top that off,
Trino feels like a commercial product with its detailed documentation and active
Slack community that is willing to help you out on everything.&lt;/p&gt;

&lt;h2 id=&quot;exploring-trino-as-an-option-for-etl&quot;&gt;Exploring Trino as an option for ETL&lt;/h2&gt;

&lt;p&gt;A great number of connectors, standard SQL, high processing speed - all these
advantages raise an obvious question: ‘Why not use Trino for ETL processes as
well?’&lt;/p&gt;

&lt;p&gt;At QazAI, the key blocker to using Trino for ETL was that Trino lacked fault
tolerance. As a result, our pipelines did not have reliable landing times and
required a lot of manual monitoring.&lt;/p&gt;

&lt;p&gt;This is precisely what made Project Tardigrade so exciting for us. Proving that
Trino is indeed a true community-driven project, community members embarked on
Project Tardigrade to bring fault-tolerant execution to Trino. Its main feature
is the ability to divide a query into phases and restart only the failed phases.
We’ve been running tests to explore this: an ETL pipeline on Trino running on 5
bare-metal nodes is 20 times faster than the equivalent ETL on our old stack of
Sqoop, HDFS, Hive, and custom Python scripts.&lt;/p&gt;
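&lt;p&gt;Fault-tolerant execution is switched on through configuration rather than SQL.
Here is a minimal sketch, assuming intermediate data is spooled to a hypothetical
S3 bucket named spooling-bucket; property names can change between Trino
versions, so check the documentation for your release:&lt;/p&gt;

```properties
# config.properties: retry individual tasks of a query when they fail
retry-policy=TASK

# exchange-manager.properties: spool intermediate exchange data durably
exchange-manager.name=filesystem
exchange.base-directories=s3://spooling-bucket
```

&lt;p&gt;With task-level retries, a long ETL query no longer has to restart from scratch
when a single worker fails.&lt;/p&gt;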

&lt;h2 id=&quot;testing-trino-for-etl&quot;&gt;Testing Trino for ETL&lt;/h2&gt;

&lt;p&gt;Let’s play a bit with the well-known DVD rental sample database.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;75%&quot; src=&quot;/assets/blog/qaz-ai-modern-data-stack/rentaldb-schema.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;For instance, we create the database shown above in PostgreSQL and work with the &lt;em&gt;rental&lt;/em&gt; table.&lt;/p&gt;

&lt;p&gt;First, we move the table from PostgreSQL to our warehouse in HDFS and Hive.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;test&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dvd_rental&lt;/span&gt;  
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;PARQUET&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; 
	&lt;span class=&quot;n&quot;&gt;rental_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rental_date&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;timestamp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rental_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;inventory_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customer_id&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;customer_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;return_date&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;timestamp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;return_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;staff_id&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;staff_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;last_update&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;timestamp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;last_update&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;postgresqldvd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rental&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now we perform the same operation, but this time into an Iceberg table on S3 with hidden partitioning.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;iceberg2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dvd_rental&lt;/span&gt;  
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;partitioning&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ARRAY&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;month(rental_date)&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;bucket(inventory_id, 10)&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;format&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;PARQUET&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; 
	&lt;span class=&quot;n&quot;&gt;rental_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;rental_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;inventory_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customer_id&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;customer_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;return_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;staff_id&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;staff_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;last_update&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;postgresqldvd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rental&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Great. What if there is a need to enrich the data with the employees’ and
customers’ names? To do this, we move the required tables to the core layer and
then apply denormalization.&lt;/p&gt;

&lt;p&gt;Here we move the dimension tables.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;test&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dvd_staff&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;PARQUET&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; 
	&lt;span class=&quot;n&quot;&gt;staff_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;first_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;last_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;address_id&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;address_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;email&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;store_id&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;store_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;active&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;username&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;password&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;last_update&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;timestamp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;last_update&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;picture&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;postgresqldvd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;staff&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;test&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dvd_customer&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;PARQUET&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; 
	&lt;span class=&quot;n&quot;&gt;customer_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;store_id&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;store_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;first_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;last_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;email&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;address_id&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;address_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;activebool&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;create_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;last_update&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;timestamp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;last_update&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;active&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;postgresqldvd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Let’s join the staff and customer tables to the rental table.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;test&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dvd_core_rental&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;PARQUET&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;rental_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;rental_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;inventory_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;cst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;first_name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;customer_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;--cast(customer_id as integer) as customer_id,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;cst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;last_name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;customer_lastname&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;return_date&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;timestamp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;return_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;stf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;first_name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;staff_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;--cast(staff_id as integer) as staff_id,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;stf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;last_name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;staff_lastname&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;rnt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;last_update&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;test&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dvd_rental&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rnt&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;LEFT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;test&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dvd_customer&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cst&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rnt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customer_id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customer_id&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;LEFT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;test&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dvd_staff&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stf&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rnt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;staff_id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;staff_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If this table is required by data analysts, we can easily move it to the data mart (the ClickHouse layer we use to deliver data to end users).&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;clickhouse&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;default&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rental_analysis_table&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;rental_id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;rental_date&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;inventory_id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;customer_name&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
	&lt;span class=&quot;n&quot;&gt;customer_lastname&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;return_date&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;staff_name&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;staff_lastname&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;last_update&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;date&lt;/span&gt;   
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;engine&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;MergeTree&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;order_by&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ARRAY&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;customer_name&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;customer_lastname&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;A simple insert/select query is all it takes.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;clickhouse&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;default&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rental_analysis_table&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;test&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dvd_core_rental&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
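&lt;p&gt;After the load, a quick sanity check compares row counts between the Hive
source and the ClickHouse target, using the same catalog and table names as
above:&lt;/p&gt;

```sql
-- Both counts should match if the insert completed successfully.
SELECT
    (SELECT count(*) FROM hive.test.dvd_core_rental) AS source_rows,
    (SELECT count(*) FROM clickhouse.default.rental_analysis_table) AS target_rows
```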

&lt;p&gt;Alternatively, we can move the data mart to ClickHouse directly from PostgreSQL, without intermediate data layers.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;clickhouse&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;default&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rental_analysis_table&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;rental_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;rental_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;inventory_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;cst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;first_name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;customer_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
	&lt;span class=&quot;n&quot;&gt;cst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;last_name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;customer_lastname&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;return_date&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;timestamp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;return_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;stf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;first_name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;staff_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
	&lt;span class=&quot;n&quot;&gt;stf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;last_name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;staff_lastname&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;rnt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;last_update&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;postgresqldvd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rental&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rnt&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;LEFT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;postgresqldvd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customer&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cst&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rnt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customer_id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customer_id&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;LEFT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;postgresqldvd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;staff&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stf&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rnt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;staff_id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;staff_id&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Great.&lt;/p&gt;

&lt;p&gt;One may object that this sample dataset is small, with only 16,000 rows,
while production ETL mostly runs over huge tables containing millions or
billions of rows. Let’s test that. We work with the &lt;em&gt;tpch&lt;/em&gt; catalog at
scale factor 3000.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;75%&quot; src=&quot;/assets/blog/qaz-ai-modern-data-stack/tpch-schema.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;For testing, we consider three tables: &lt;em&gt;lineitem&lt;/em&gt; (18 billion rows),
&lt;em&gt;orders&lt;/em&gt; (450 million rows) and &lt;em&gt;partsupp&lt;/em&gt; (2.4 billion rows).&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;iceberg2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tpch_sf3000_orders&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;-- (450 M)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;ORC&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tpch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sf3000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;iceberg2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tpch_sf3000_lineitem&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;-- (18 B)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;ORC&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tpch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sf3000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lineitem&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;iceberg2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tpch_sf3000_partsupp&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;-- (2.4 B)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;ORC&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tpch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sf3000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;partsupp&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then, we join all three of these tables as shown in the ER diagram.
Let’s make it more challenging by turning off one of the workers mid-query,
which would normally result in a query failure. To enable automatic retries of
failed queries, we set &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;retry-policy=QUERY&lt;/code&gt; in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;config.properties&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;iceberg2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tpch_sf3000_lineitem_joined&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;ORC&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orderkey&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;partkey&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;suppkey&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;linenumber&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;quantity&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;extendedprice&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;discount&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tax&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;returnflag&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;linestatus&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shipdate&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;commitdate&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;receiptdate&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shipinstruct&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shipmode&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;comment&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;psupp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;availqty&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;psupp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;supplycost&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;ord&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shippriority&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;ord&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;totalprice&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;iceberg2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tpch_sf3000_lineitem&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;LEFT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;iceberg2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tpch_sf3000_partsupp&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;psupp&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;partkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;psupp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;partkey&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;suppkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;psupp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;suppkey&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;LEFT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;iceberg2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tpch_sf3000_orders&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ord&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orderkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ord&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orderkey&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The query completed in 4 hours. During processing, worker 22 was turned
off; the query was automatically started over and completed successfully. The
query joined all three tables (&lt;em&gt;the triple join&lt;/em&gt;): 18 billion rows x
2.4 billion rows x 450 million rows.&lt;/p&gt;

&lt;p&gt;This experiment gave us the confidence to move forward with our plans to
rebuild our architecture around Trino, performing analytical and
transformational workloads directly on data in S3, which allows us to remove
HDFS and Hive from these processes.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;100%&quot; src=&quot;/assets/blog/qaz-ai-modern-data-stack/new-architecture.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;As a result, we will achieve faster pipelines.&lt;/p&gt;

&lt;p&gt;A huge thanks to the Trino development team and the Trino community for an
excellent product, which I enjoy using and which lets me go beyond conventional
usage patterns.&lt;/p&gt;

&lt;p&gt;If you are looking for help building your data warehouse, or if you’re
interested in joining us at QazAI, feel free to reach out to me, Baurzhan Kuspayev, on the &lt;a href=&quot;https://join.slack.com/t/trinodb/shared_invite/zt-1aek3l6bn-ZMsvFZJqP1ULx5pU17WP1Q&quot;&gt;Trino Slack&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note from the Trino community&lt;/em&gt;: We welcome blog submissions from the community. If you have blog ideas, please send a message in the #dev channel on the &lt;a href=&quot;https://join.slack.com/t/trinodb/shared_invite/zt-1aek3l6bn-ZMsvFZJqP1ULx5pU17WP1Q&quot;&gt;Trino Slack&lt;/a&gt;. We will mail you Trino swag as a token of appreciation for successful submissions.&lt;/p&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md waves-effect waves-light&quot; href=&quot;https://cutt.ly/qaz-ai-trino-reddit&quot;&gt;Discuss on Reddit&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md waves-effect waves-light&quot; href=&quot;https://news.ycombinator.com/item?id=31672725&quot;&gt;Discuss on Hacker News&lt;/a&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Baurzhan Kuspayev</name>
        </author>
      

      <summary>At QazAI, we build data lakes as a service for companies. In the original architecture, we got raw data in S3, transformed the S3 data with Hive, and then delivered the data to business units via our datamart built on ClickHouse (for optimal delivery speeds). Over time, we were dragged down by the slower speeds and high costs of running Hive, and started shopping for a faster and cheaper open source engine to do our ETL data transformations.</summary>

      
      
    </entry>
  
    <entry>
      <title>An opinionated guide to consolidating our data</title>
      <link href="https://trino.io/blog/2022/05/24/an-opinionated-guide-to-consolidating-our-data.html" rel="alternate" type="text/html" title="An opinionated guide to consolidating our data" />
      <published>2022-05-24T00:00:00+00:00</published>
      <updated>2022-05-24T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/05/24/an-opinionated-guide-to-consolidating-our-data</id>
      <content type="html" xml:base="https://trino.io/blog/2022/05/24/an-opinionated-guide-to-consolidating-our-data.html">&lt;h2 id=&quot;maximizing-your-experience-with-zero-choices&quot;&gt;Maximizing your experience with zero choices.&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;I’m publishing this blog post in partnership with the Trino community to go
along with a lightning talk I’m giving for their event, Cinco de Trino. This article
was originally published &lt;a href=&quot;https://abhi-vaidyanatha.medium.com/an-opinionated-guide-to-consolidating-your-data-b09386b2b9b5&quot;&gt;on Abhi’s Medium
site&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“My data is all over the place and attempting to analyze or query it is not
only time consuming and expensive, but also emotionally taxing.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;!--more--&gt;

&lt;p&gt;Maybe you haven’t heard those exact words before, but data consolidation is a
real problem. It is common for organizations to have correlated data stored in
various silos or APIs. Performing consistent operations across these various
data sources requires understanding both architecture and surgery, skills that
you may not have picked up as a data practitioner. If you’re part of the Trino
community and are reading this post, you’ve likely encountered poorly
performing queries due to unconsolidated data.&lt;/p&gt;

&lt;p&gt;In the past, the data engineering world was not graced with the same level of
love and &lt;a href=&quot;https://tailwindcss.com/&quot;&gt;tooling&lt;/a&gt; as other communities, so we were
expected to make do with whatever came our way. In order to perform the wildly
basic task of moving our data around, we were asked to tithe large sums of money
to the closed-source ELT overlords.&lt;/p&gt;

&lt;p&gt;So where does that leave us? Thankfully things have changed, so here’s how you
can move all your data to a central location for free (well, minus the
infrastructure costs) while making few architectural choices.&lt;/p&gt;

&lt;h2 id=&quot;the-tool&quot;&gt;The tool&lt;/h2&gt;
&lt;p&gt;You don’t have too many choices for FOSS ELT/ETL.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://airbyte.com/&quot;&gt;Airbyte&lt;/a&gt; has recently been making waves as the main
contender for open-source ELT. As of writing this article, it’s only been around
for about two years, during which it’s established itself as one of the fastest
growing startups in existence. It requires three terminal commands to deploy and
is managed entirely through a UI, so it’s operable by many. It also supports
syncing your data incrementally, so you don’t need to resync existing data when
you want to sync new data. It is relatively new, so some of the polish that
comes with an established project is not there yet. Think of it like a
precocious child.&lt;/p&gt;

&lt;p&gt;You could use &lt;a href=&quot;https://meltano.com/&quot;&gt;Meltano&lt;/a&gt; to take advantage of the large
&lt;a href=&quot;https://www.singer.io/&quot;&gt;Singer&lt;/a&gt; connector ecosystem, but it’s more complicated
to set up and is more of a holistic ops platform, which may be excessive for
your use case.&lt;/p&gt;

&lt;p&gt;You could also use this esoteric project called KETL that is only available at
this sketchy SourceForge &lt;a href=&quot;https://sourceforge.net/projects/ketl/&quot;&gt;link&lt;/a&gt;. But
maybe don’t do that.&lt;/p&gt;

&lt;p&gt;For consolidating your data, use Airbyte. It’s straightforward to set up,
requires minimal configuration, and has tightly scoped responsibilities.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;https://miro.medium.com/max/640/1*zqLMo7P3o_HG7EJ2E1dbpg.png&quot; /&gt;
&lt;/p&gt;

&lt;h2 id=&quot;the-destination&quot;&gt;The destination&lt;/h2&gt;

&lt;p&gt;Let’s use a data lake. Its unstructured nature leaves us more flexibility,
and we’ll assume that our data has not been processed or filtered yet.&lt;/p&gt;

&lt;p&gt;Data warehouses are more expensive, require more upkeep, and benefit from the
ETL paradigm as opposed to ELT. Airbyte is an ELT tool focused mostly on the EL
bit, which makes it easier to use with unstructured data lakes.&lt;/p&gt;

&lt;p&gt;Additionally, S3 supports query engines such as Trino, which will allow us to
query and analyze our data once it’s been consolidated. Trino also functions as a
powerful data lake transformation engine, so if you’re on the fence due to data
malleability, this might help bring you over.&lt;/p&gt;

&lt;p&gt;We could use Azure Blob Storage or GCS, but for this tutorial, I’ll be keeping
it simple with Amazon S3. If you’ve set up an S3 bucket and IAM, skip the next
paragraph.&lt;/p&gt;

&lt;p&gt;Create an S3 bucket with default settings and grab an access key from IAM. To do
this, head to the top right of the screen in the AWS Management Console where
your account name is shown and then click on &lt;strong&gt;Security Credentials&lt;/strong&gt;. Click
&lt;strong&gt;Create New Access Key&lt;/strong&gt; and save that information for later.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;https://miro.medium.com/max/1202/1*mYeldXLcvi7iPBDZ1GKEug.png&quot; /&gt;
&lt;/p&gt;

&lt;h2 id=&quot;the-deployment&quot;&gt;The deployment&lt;/h2&gt;

&lt;p&gt;Today, we’ll be deploying Airbyte locally on a workstation. Alternatively, you
can deploy it on your own infrastructure, but this requires managing networking
and security, which is unpalatable for a quick demonstration. If you want your
syncs to continue running in perpetuity, you’ll want to deploy Airbyte
externally to your machine. For a guide to deploying Airbyte on EC2 click
&lt;a href=&quot;https://docs.airbyte.com/deploying-airbyte/on-aws-ec2&quot;&gt;here&lt;/a&gt;. For a guide to
deploying Airbyte on Kubernetes, click
&lt;a href=&quot;https://docs.airbyte.com/deploying-airbyte/on-plural&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To begin, install &lt;a href=&quot;https://www.docker.com/products/docker-desktop/&quot;&gt;Docker&lt;/a&gt; and
docker-compose on your workstation.&lt;/p&gt;

&lt;p&gt;Then clone the repository and spin up Airbyte with docker-compose.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone git@github.com:airbytehq/airbyte.git
cd airbyte
docker-compose up
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once you see the following banner, you’re good to go.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;https://miro.medium.com/max/1148/1*7Fg7Vwi5vgkg94SYRuACLQ.png&quot; /&gt;
&lt;/p&gt;

&lt;h2 id=&quot;the-data-sources&quot;&gt;The data sources&lt;/h2&gt;

&lt;p&gt;Head over to localhost:8000 on your machine, complete the sign-up flow, and
you’ll be greeted with an onboarding workflow. We’re going to skip this workflow
to emulate a traditional usage of Airbyte. Click on the Sources tab in the left
sidebar and click on +New Source. This is where we’ll be setting up all of our
disparate data sources.&lt;/p&gt;

&lt;p&gt;Search for your data sources in the drop down and fill out the required
configuration. If you’re having trouble setting up a particular data source,
head to the &lt;a href=&quot;https://docs.airbyte.com/&quot;&gt;Airbyte docs&lt;/a&gt;. There’s a dedicated page
for every connector; for example, this is the &lt;a href=&quot;https://docs.airbyte.com/integrations/sources/google-analytics-v4&quot;&gt;setup
guide&lt;/a&gt; for
the Google Analytics source. If you’re just testing Airbyte out, use the PokeAPI
source, as it lets you sync dummy data with no authentication. If your required
data source doesn’t exist, you can request it
&lt;a href=&quot;https://airbyte.com/connector-requests&quot;&gt;here&lt;/a&gt; or build it yourself by heading
&lt;a href=&quot;https://docs.airbyte.com/connector-development/&quot;&gt;here&lt;/a&gt; (isn’t open-source
great?).&lt;/p&gt;

&lt;p&gt;Once you have all of your data sources set up, it will look something like this.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;https://miro.medium.com/max/1400/1*6_sNtdhFKkSnicyqe2Hhmg.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Now we just need to set up our connection to S3 and we are good to go.&lt;/p&gt;

&lt;h2 id=&quot;the-destination-again&quot;&gt;The destination (again)&lt;/h2&gt;

&lt;p&gt;Head over to the &lt;em&gt;Destinations&lt;/em&gt; tab in the left sidebar and follow the same
process for setting up our connection to S3. Click on &lt;em&gt;+New Destination&lt;/em&gt; and
search for S3. Then fill out the configuration for your bucket. We’ll now use
that access key that we generated earlier!&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;https://miro.medium.com/max/1400/1*24LRs9-dB7l35DgsXU6pqQ.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;For output format, I recommend using Parquet for analytics purposes. It’s a
&lt;a href=&quot;https://www.qubole.com/tech-blog/columnar-format-in-data-lakes-for-dummies/&quot;&gt;columnar storage
format&lt;/a&gt;,
which is optimized for reads. JSON, CSV, and Avro are supported, but will be
less performant on read.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;https://miro.medium.com/max/1400/1*tVw2sbTLYDlHpKB97M7cKg.png&quot; /&gt;
&lt;/p&gt;

&lt;h2 id=&quot;the-connection&quot;&gt;The connection&lt;/h2&gt;

&lt;p&gt;Finally, head over to the &lt;strong&gt;Connections&lt;/strong&gt; tab in the sidebar and click &lt;strong&gt;+New
Connection&lt;/strong&gt;. You will need to repeat this process for each data source that you
have set up. Select any existing source, then pick the S3 destination that you
set up from the drop down. I failed to set up a connection with my GitHub
source, so I navigated to the Airbyte Troubleshooting Discourse and filed an
issue. Response times are really fast there, so I’ll likely be able to resolve
this within a day or two.&lt;/p&gt;

&lt;p&gt;You will then be greeted with the following connection setup page. For most
analytics jobs, syncing more frequently than every 24 hours is expensive and
overkill, so stick with the default. For sources that support it, click on the
sync mode in the streams table to use the &lt;strong&gt;Incremental / Append&lt;/strong&gt; sync mode.
This ensures that every time you sync, Airbyte will check for new data and only
pull in data that you haven’t synced before.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;https://miro.medium.com/max/1400/1*FZyFWtb3P4sqO77p-WZjAw.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Once you hit &lt;strong&gt;Set up connection&lt;/strong&gt;, Airbyte will run your first sync! You can
click into your connection to get access to the sync logs, replication settings,
and transformation settings if supported.&lt;/p&gt;

&lt;p&gt;Checking our S3 bucket, we can see that our data has successfully arrived! If
you’re just testing things out, you’re done.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;https://miro.medium.com/max/1400/1*qrEc7u2hiUUZv4TO5qOv6A.png&quot; /&gt;
&lt;/p&gt;

&lt;h2 id=&quot;the-analysis&quot;&gt;The analysis&lt;/h2&gt;

&lt;p&gt;Now that you’ve set up your data pipelines, if you want to run transformation
jobs, Trino enables that use case well — Lyft, Pinterest, and Shopify have all
done this to great success. There’s also a &lt;a href=&quot;https://github.com/starburstdata/dbt-trino&quot;&gt;dbt-trino
plugin&lt;/a&gt; managed by the folks over at
Starburst. Alternatively, you could also accomplish this using &lt;a href=&quot;https://docs.aws.amazon.com/AmazonS3/latest/userguide/tutorial-s3-object-lambda-uppercase.html&quot;&gt;S3 Object
Lambda&lt;/a&gt;
if you want to stay within the AWS landscape when possible.&lt;/p&gt;

&lt;p&gt;Once your data is in a queryable state, you can now use
&lt;a href=&quot;https://trino.io/docs/current/connector/hive-s3.html&quot;&gt;Trino&lt;/a&gt; or your favorite
query engine to your heart’s content! If you want to get started with querying
these heterogeneous data sources using Trino, here’s a &lt;a href=&quot;https://janakiev.com/blog/presto-trino-s3/&quot;&gt;getting-started
guide&lt;/a&gt; on how to do that. Finally,
join the &lt;a href=&quot;https://airbyte.com/community&quot;&gt;Airbyte&lt;/a&gt; and
&lt;a href=&quot;https://trino.io/community.html&quot;&gt;Trino&lt;/a&gt; communities to find more about how
others are consolidating and querying their data.&lt;/p&gt;</content>

      
        <author>
          <name>Abhi Vaidyanatha</name>
        </author>
      

      <summary>Maximizing your experience with zero choices. I’m publishing this blog post in partnership with the Trino community to go along a lightning talk I’m giving for their event, Cinco de Trino. This article was originally published on Abhi’s Medium site “My data is all over the place and attempting to analyze or query it is not only time consuming and expensive, but also emotionally taxing.”</summary>

      
      
    </entry>
  
    <entry>
      <title>36: Trino plans to jump to Java 17</title>
      <link href="https://trino.io/episodes/36.html" rel="alternate" type="text/html" title="36: Trino plans to jump to Java 17" />
      <published>2022-05-19T00:00:00+00:00</published>
      <updated>2022-05-19T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/36</id>
      <content type="html" xml:base="https://trino.io/episodes/36.html">&lt;h2 id=&quot;releases-379-to-381&quot;&gt;Releases 379 to 381&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-379.html&quot;&gt;Trino 379&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New MariaDB connector&lt;/li&gt;
  &lt;li&gt;Performance improvements for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JOIN&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNION&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Support for Google Cloud Storage in the Delta Lake connector&lt;/li&gt;
  &lt;li&gt;Support for Pinot 0.10&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-380.html&quot;&gt;Trino 380&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Update Cassandra connector to support v5 and v6 protocols.&lt;/li&gt;
  &lt;li&gt;Rename properties controlling Hive view parsing.&lt;/li&gt;
  &lt;li&gt;Allow changing file and table format with the Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Add support for bulk data insertion in SQL Server connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-381.html&quot;&gt;Trino 381&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt; in Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Experimental support for table functions.&lt;/li&gt;
  &lt;li&gt;Support for exchange spooling on Azure Blob Storage.&lt;/li&gt;
  &lt;li&gt;Support reading snapshot tables and materialized views in BigQuery connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Additional highlights worth a mention according to Manfred:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Next is exchange spooling on &lt;a href=&quot;https://github.com/trinodb/trino/pull/12360&quot;&gt;Google Cloud Storage&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Framework for table functions is in place, implementations in connectors are coming.&lt;/li&gt;
  &lt;li&gt;Keeping &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ldap.ssl-trust-certificate&lt;/code&gt; as a legacy config avoids upgrade failures.&lt;/li&gt;
  &lt;li&gt;Introduce the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;least-waste&lt;/code&gt; low memory task killer policy.&lt;/li&gt;
  &lt;li&gt;Disable auto-suggestion in the CLI.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the release notes for
&lt;a href=&quot;https://trino.io/docs/current/release/release-379.html&quot;&gt;Trino 379&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-380.html&quot;&gt;Trino 380&lt;/a&gt;,
and
&lt;a href=&quot;https://trino.io/docs/current/release/release-381.html&quot;&gt;Trino 381&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;cinco-de-trino-recap-blog-post&quot;&gt;Cinco de Trino recap blog post&lt;/h3&gt;

&lt;p&gt;Check out this blog post that details all the cool talks that took place at 
&lt;a href=&quot;/blog/2022/05/17/cinco-de-trino-recap.html&quot;&gt;Cinco de Trino&lt;/a&gt; and
includes video resources. This was a mini version of the Trino Summit, which
will take place later this year.&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-episode-will-trino-be-making-a-vectorized-c-version-of-trino-workers&quot;&gt;Question of the episode: Will Trino be making a vectorized C++ version of Trino workers?&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trinodb.slack.com/archives/CFLB9AMBN/p1638450883102500&quot;&gt;Full question from Trino Slack&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Answer:&lt;/em&gt; Writing a C++ worker would require each plugin to be implemented in
C++ as well. However, you don’t need C++ for vectorization. Java already does a
technique called &lt;a href=&quot;https://web.archive.org/web/20211111020334/http://daniel-strecker.com/blog/2020-01-14_auto_vectorization_in_java/&quot;&gt;auto-vectorization&lt;/a&gt;
which we will demonstrate later in the show! Java 17 also introduces the new 
&lt;a href=&quot;https://openjdk.java.net/jeps/414&quot;&gt;Vector API&lt;/a&gt; which unlocks complex usage 
patterns that we can invest in moving forward. However, there’s much more to
making operations fast than just bare metal speed, and that is what we are going
to focus on.&lt;/p&gt;

&lt;p&gt;To demonstrate this, I’d like to use an analogy. Comparing C++ and Java
implementations is like comparing the two fastest men in the world. Usain Bolt
holds the most world records in men’s track to this date, and his teammate Yohan
Blake holds many of the second-place titles. Most of us know Usain Bolt is the
faster of the two, and you may not have known or remembered Yohan’s name before.
Yet here’s something crazy: Yohan has beaten Usain Bolt in a few races. The two
are so close in speed that the difference comes down to milliseconds. The catch
in this analogy is that speed is the only thing that matters in an Olympic race.
However, programming languages and frameworks involve many more tradeoffs.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/36/usain-bolt-yohan-blake.webp&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;The point is, Java is fast and more importantly, it removes a lot of burden
maintaining and scaling out the code. This is conducive to a healthy open-source
project, and lowers the barrier for collaboration. Rather than go against this 
and take on the feat of having to rewrite an entire system in C++, why not lean
into the incredible innovation recent Java features have to offer to improve
performance even more?&lt;/p&gt;

&lt;p&gt;Another important aspect is rather than chasing the fastest bare metal speed,
it’s also incredibly important to dedicate time into ensuring that Trino’s
optimizer is producing the best possible plans to avoid doing unnecessary work.
To continue with the analogy, in a 100m race on a 400m track, imagine we have
Usain and Yohan go head to head. We may expect that Usain will likely win, given
his track record. However, if Usain is given the wrong instructions and runs in
the wrong direction (300m), my bets are that Yohan will win the race.&lt;/p&gt;

&lt;p&gt;In essence, Trino, while still benefiting from bare metal performance
improvements in the JVM, will focus on not wasting time on suboptimal query
plans before or during runtime. So many optimizations are constantly being added
in every release that the result is a work-smarter-not-harder query engine.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-episode-java-17-and-rearchitecting-trino&quot;&gt;Concept of the episode: Java 17 and rearchitecting Trino&lt;/h2&gt;

&lt;p&gt;As Trino prepares to &lt;a href=&quot;https://github.com/trinodb/trino/issues/9876&quot;&gt;update to Java 17&lt;/a&gt;,
we wanted to give a glimpse at what has happened between the current required
JDK version, JDK 11, and future version JDK 17. Both of these versions are
long-term support versions, and in the four years from 11 to 17 
&lt;a href=&quot;https://openjdk.java.net/projects/jdk/17/jeps-since-jdk-11&quot;&gt;a lot of exciting improvements were added&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;java-17-updates&quot;&gt;Java 17 updates&lt;/h3&gt;

&lt;p&gt;Here are some &lt;a href=&quot;https://openjdk.java.net/projects/jdk/17/jeps-since-jdk-11&quot;&gt;updates coming up in Java 17&lt;/a&gt;.&lt;/p&gt;

&lt;h4 id=&quot;performance&quot;&gt;Performance&lt;/h4&gt;

&lt;p&gt;There were several JDK Enhancement Proposals (JEP) that improve performance as
well as many small changes to the JVM:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.java.net/jeps/339&quot;&gt;JEP 339&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.java.net/jeps/352&quot;&gt;JEP 352&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.java.net/jeps/356&quot;&gt;JEP 356&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.java.net/jeps/387&quot;&gt;JEP 387&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.java.net/jeps/412&quot;&gt;JEP 412&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Performance is a multifaceted topic that includes factors like throughput, 
latency, memory footprint, startup, ramp-up, pause times, and shutdown time.&lt;/p&gt;

&lt;p&gt;You can use standardized benchmarks like 
&lt;a href=&quot;https://www.spec.org/jbb2015/&quot;&gt;SPECjbb® 2015&lt;/a&gt; to test a Java application in 
most of these performance factors. Aside from the formalized benchmarks, it’s 
interesting to see the Java community come up with microbenchmarks to test 
relative speedups of JVMs on their own applications.
&lt;a href=&quot;https://www.optaplanner.org/blog/2021/09/15/HowMuchFasterIsJava17.html&quot;&gt;This user benchmark&lt;/a&gt;
found an 8.66% improvement in speed when using the G1 garbage collector. They
isolated modules of their application to measure each microbenchmark separately.&lt;/p&gt;

&lt;p&gt;Martin did a similar test late last year, and reported anywhere from 10-15% 
improvement in speed in Java 17 using the G1 garbage collector. This is an 
exciting development and we hope to publish more about this as we get closer to 
updating.&lt;/p&gt;

&lt;h4 id=&quot;garbage-collectors&quot;&gt;Garbage collectors&lt;/h4&gt;

&lt;p&gt;Although garbage collectors are performance enhancements in their own right, 
there are so many exciting changes around garbage collectors between Java 11 and
Java 17 that they earn their own section.&lt;/p&gt;

&lt;p&gt;First, not one but two concurrent garbage collectors have made their way out
of incubation and are ready for use.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.java.net/jeps/377&quot;&gt;JEP 377: ZGC: A Scalable Low-Latency Garbage Collector&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.java.net/jeps/379&quot;&gt;JEP 379: Shenandoah: A Low-Pause-Time Garbage Collector&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Aside from that, there are a bunch of big improvements to G1.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.java.net/jeps/344&quot;&gt;JEP 344: Abortable Mixed Collections for G1&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.java.net/jeps/345&quot;&gt;JEP 345: NUMA-Aware Memory Allocation for G1&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.java.net/jeps/346&quot;&gt;JEP 346: Promptly Return Unused Committed Memory from G1&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In a &lt;a href=&quot;https://kstefanj.github.io/2021/11/24/gc-progress-8-17.html&quot;&gt;fantastic writeup and benchmark&lt;/a&gt;
by Stefan Johansson, they ran &lt;a href=&quot;https://www.spec.org/jbb2015/&quot;&gt;SPECjbb® 2015&lt;/a&gt;
to evaluate the improvements of different garbage collectors across the LTS
versions.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/36/throughput.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://kstefanj.github.io/2021/11/24/gc-progress-8-17.html&quot;&gt;Stefan Johansson&apos;s Blog&lt;/a&gt;
&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/36/latency.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://kstefanj.github.io/2021/11/24/gc-progress-8-17.html&quot;&gt;Stefan Johansson&apos;s Blog&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;Pay attention to this chart, as it showcases the advantage of having a 
concurrent garbage collector like ZGC or Shenandoah that doesn’t interfere with
your application code. It’s incredible that 99% of the GC operations only took 
0.1ms. Wild!&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/36/p99-pause.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://kstefanj.github.io/2021/11/24/gc-progress-8-17.html&quot;&gt;Stefan Johansson&apos;s Blog&lt;/a&gt;
&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/36/footprint.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://kstefanj.github.io/2021/11/24/gc-progress-8-17.html&quot;&gt;Stefan Johansson&apos;s Blog&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;Take particular note of the massive improvement of G1. This is especially 
exciting because G1 is recommended for Trino usage. It’s still too early to 
determine whether ZGC or Shenandoah will perform better overall, as it depends
on the context in which the JVM is running. One thing to look forward to is the 
incredible drop in memory footprint over the different versions!&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/36/g1-memory-footprint.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://www.youtube.com/watch?v=0BpY132mKm0&quot;&gt;Java YouTube Channel&lt;/a&gt;
&lt;/p&gt;

&lt;h4 id=&quot;vector-api-2nd-incubator-status&quot;&gt;Vector API (2nd incubator status)&lt;/h4&gt;

&lt;p&gt;One available capability that is still incubating is the 
&lt;a href=&quot;https://openjdk.java.net/jeps/414&quot;&gt;Vector API&lt;/a&gt;. Trino currently takes advantage
of the auto-vectorization that comes for free when the compiler detects a
suitable loop, like this one taken from Daniel Strecker’s
&lt;a href=&quot;https://web.archive.org/web/20211111020334/http://daniel-strecker.com/blog/2020-01-14_auto_vectorization_in_java/&quot;&gt;auto-vectorization blog&lt;/a&gt;:&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;cm&quot;&gt;/**
 * Run with this command to show native assembly:&amp;lt;br/&amp;gt;
 * java -XX:+UnlockDiagnosticVMOptions
 * -XX:CompileCommand=print,VectorizationMicroBenchmark.square
 * VectorizationMicroBenchmark
 */&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;VectorizationMicroBenchmark&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;

    &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;square&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;length&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;];&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// line 11&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;args&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1024&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;];&lt;/span&gt;

        &lt;span class=&quot;c1&quot;&gt;// repeatedly invoke the method under test. this&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;// causes the JIT compiler to optimize the method&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;square&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Without auto-vectorization, the compiler emits the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;vmulss&lt;/code&gt; (multiply scalar 
single-precision) instruction; with auto-vectorization, it emits &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;vmulps&lt;/code&gt; (multiply packed
single-precision), a SIMD instruction the JIT compiler substituted for us
without manual intervention.&lt;/p&gt;

&lt;p&gt;However, this isn’t always so straightforward to detect. As you can see from the
comments in the example, special criteria need to be met. For this, you can use
the Vector API to directly interface with SIMD and GPU instructions. We will 
show more on this in the demo.&lt;/p&gt;
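To make this more concrete, here is a minimal, hypothetical sketch (not code from Trino itself; the class and method names are invented for illustration) of the same squaring loop written explicitly against the incubating Vector API:

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical example: the square() loop from above, written
// explicitly against the incubating jdk.incubator.vector API.
public class VectorSquare {
    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    static void square(float[] a) {
        int i = 0;
        // Process full SIMD-width chunks of the array.
        int upperBound = SPECIES.loopBound(a.length);
        for (; i < upperBound; i += SPECIES.length()) {
            FloatVector v = FloatVector.fromArray(SPECIES, a, i);
            v.mul(v).intoArray(a, i); // one SIMD multiply per chunk
        }
        // Scalar tail loop for the leftover elements.
        for (; i < a.length; i++) {
            a[i] = a[i] * a[i];
        }
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f, 5f};
        square(a);
        System.out.println(java.util.Arrays.toString(a)); // [1.0, 4.0, 9.0, 16.0, 25.0]
    }
}
```

Because the API is still incubating in Java 17, compiling and running this requires `--add-modules=jdk.incubator.vector`, just like the demo at the end of this episode.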

&lt;h4 id=&quot;language-features&quot;&gt;Language features&lt;/h4&gt;

&lt;p&gt;Beyond the performance improvements, Java 17 includes some exciting new Java 
language updates and improvements. While some may not consider this as exciting
as performance boosts, language enhancements make it easier to write higher 
quality and maintainable code. This is especially important for an open source 
project that is maintained by many individuals.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;A very useful change for Trino is the new support for 
&lt;a href=&quot;https://openjdk.java.net/jeps/378&quot;&gt;multiline text blocks&lt;/a&gt;. This allows you to 
go from having to write a SQL query represented in a one-dimensional string 
literal like this:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;  String query = &quot;SELECT \&quot;emp_id\&quot;, \&quot;last_name\&quot; FROM \&quot;employee\&quot;\n&quot; +
                 &quot;WHERE \&quot;city\&quot; = &apos;Indianapolis&apos;\n&quot; +
                 &quot;ORDER BY \&quot;emp_id\&quot;, \&quot;last_name\&quot;;\n&quot;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;

    &lt;p&gt;to a much more readable two-dimensional string block like this:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;  String query = &quot;&quot;&quot;
                 SELECT &quot;emp_id&quot;, &quot;last_name&quot; FROM &quot;employee&quot;
                 WHERE &quot;city&quot; = &apos;Indianapolis&apos;
                 ORDER BY &quot;emp_id&quot;, &quot;last_name&quot;;
                 &quot;&quot;&quot;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The new &lt;a href=&quot;https://openjdk.java.net/jeps/361&quot;&gt;switch expressions&lt;/a&gt; remove the
difficult-to-read syntax of switches that led to many bugs and confusing code
in the past, particularly the ambiguity of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;break;&lt;/code&gt; statement logic:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;  switch (day) {
      case MONDAY:
      case FRIDAY:
      case SUNDAY:
          System.out.println(6);
          break;
      case TUESDAY:
          System.out.println(7);
          break;
      case THURSDAY:
      case SATURDAY:
          System.out.println(8);
          break;
      case WEDNESDAY:
          System.out.println(9);
          break;
  }
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;

    &lt;p&gt;is made much easier to reason about using a functional clause to define the
  correct code to execute for a set of labels:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;  switch (day) {
      case MONDAY, FRIDAY, SUNDAY -&amp;gt; System.out.println(6);
      case TUESDAY                -&amp;gt; System.out.println(7);
      case THURSDAY, SATURDAY     -&amp;gt; System.out.println(8);
      case WEDNESDAY              -&amp;gt; System.out.println(9);
  }
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Having to cast an object after checking its type has always been an
annoyance to many Java developers. 
&lt;a href=&quot;https://openjdk.java.net/jeps/394&quot;&gt;Pattern Matching for instanceof&lt;/a&gt; makes this
go away. Look at this example you may be familiar with:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;  if (obj instanceof String) {
      String s = (String) obj;    // grr...
      ...
  }
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;

    &lt;p&gt;Now imagine you don’t have to have a cast statement for every one of these
  lying around in your codebase:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;  if (obj instanceof String s) {
      // Let pattern matching do the work!
      ...
  }
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://openjdk.java.net/jeps/358&quot;&gt;Helpful NullPointerExceptions&lt;/a&gt; are
particularly exciting, as confusing null errors no longer force you to chase
down where in the code they happened. Instead, new information is added to the
exception message that tells you precisely which part of the expression was
null.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
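As a quick, hypothetical illustration of that last point (the class and record names here are invented), with helpful NullPointerExceptions the message names the exact call that produced the null instead of just a line number:

```java
// Hypothetical example showing helpful NullPointerException messages (JEP 358),
// which are on by default in Java 17.
public class NpeDemo {
    record Address(String city) {}
    record Person(Address address) {}

    public static void main(String[] args) {
        Person p = new Person(null); // address is null
        try {
            String city = p.address().city();
            System.out.println(city);
        } catch (NullPointerException e) {
            // The message spells out which part of the chain was null, e.g.:
            //   Cannot invoke "...Address.city()" because the return value
            //   of "...Person.address()" is null
            System.out.println(e.getMessage());
        }
    }
}
```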

&lt;h3 id=&quot;rearchitecting-trino&quot;&gt;Rearchitecting Trino&lt;/h3&gt;

&lt;p&gt;With all these exciting changes, what does this mean for Trino? Let’s first dive 
into the thing that many of our users dread…upgrading.&lt;/p&gt;

&lt;h4 id=&quot;upgrade-to-java-17-when-its-time&quot;&gt;Upgrade to Java 17 (When it’s time)&lt;/h4&gt;

&lt;p&gt;As mentioned before, Java 17 is the current LTS version, following Java 11. Java
17 provides significant improvements that we outlined before. We believe that 
once we update, everyone should be running version 17 to get the best experience
out of Trino. Moving to Java 17 allows us to take advantage of many improvements
to the JDK and the Java language that were introduced since Java 11. There are 
some reasons people say they can’t update.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;Updating Java in all the clients and code that calls Trino is tedious.&lt;/p&gt;

    &lt;blockquote&gt;
      &lt;p&gt;Luckily, you only need to update Java on the servers that Trino runs on.
 The client or CLI can still run any version of Java.&lt;/p&gt;
    &lt;/blockquote&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;There are conflicting Java versions on the nodes that Trino servers run on.&lt;/p&gt;

    &lt;blockquote&gt;
      &lt;p&gt;If you are running another application that depends on Java on the same
 node, you shouldn’t be; ideally, Trino runs on its own servers. If there’s a
 smaller application to, for example, monitor Trino, then you should be able to
 install a separate version of Java for it.&lt;/p&gt;
    &lt;/blockquote&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;There is a company policy requiring specific JDKs be installed on all 
 servers.&lt;/p&gt;

    &lt;blockquote&gt;
      &lt;p&gt;You can have side-by-side installs of multiple versions of the JDK and use 
 the appropriate one. You just need to launch Trino with the correct Java
 command. If your company is against using a newer JDK, you can point out the
 arguments above to update the policy to at least include JDK 17.&lt;/p&gt;
    &lt;/blockquote&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;h4 id=&quot;iterating-and-improving-trino&quot;&gt;Iterating and improving Trino&lt;/h4&gt;

&lt;p&gt;We’re also in the process of revamping the core execution engine, which 
enables us to implement the following improvements:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Perform adaptive evaluation of expressions based on runtime cost.&lt;/li&gt;
  &lt;li&gt;Specialize evaluation for different data encodings (RLE, dictionary, etc.).&lt;/li&gt;
  &lt;li&gt;Implement tighter evaluation loops that make it easier for the VM to vectorize
automatically and generate better machine code.&lt;/li&gt;
  &lt;li&gt;Implement evaluation of certain operations more efficiently by taking 
advantage of SIMD or GPU-based processing.&lt;/li&gt;
  &lt;li&gt;Columnar evaluation.&lt;/li&gt;
&lt;/ul&gt;
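To illustrate why specializing for data encodings pays off, here is a simplified, hypothetical sketch (not Trino's actual block implementation; the types are invented): with a run-length-encoded column, an expression only needs to run once per run rather than once per row.

```java
import java.util.function.LongUnaryOperator;

// Hypothetical sketch of encoding-specialized evaluation: applying f(x)
// over a run-length-encoded column touches each distinct run once.
public class RleEval {
    // An RLE block: values[i] repeats counts[i] times.
    record RleBlock(long[] values, int[] counts) {}

    // Specialized evaluation: one call to f per run, not per row.
    static RleBlock evalRle(RleBlock block, LongUnaryOperator f) {
        long[] out = new long[block.values().length];
        for (int i = 0; i < out.length; i++) {
            out[i] = f.applyAsLong(block.values()[i]);
        }
        return new RleBlock(out, block.counts()); // run lengths are unchanged
    }

    public static void main(String[] args) {
        // Column 5,5,5,9,9 encoded as two runs: f(x)=x*2 is evaluated only twice.
        RleBlock col = new RleBlock(new long[]{5, 9}, new int[]{3, 2});
        RleBlock doubled = evalRle(col, x -> x * 2);
        System.out.println(doubled.values()[0] + " " + doubled.values()[1]); // 10 18
    }
}
```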

&lt;h4 id=&quot;project-hummingbird&quot;&gt;Project Hummingbird&lt;/h4&gt;

&lt;p&gt;Just as we did with the efforts around 
&lt;a href=&quot;/blog/2022/05/05/tardigrade-launch.html&quot;&gt;Project Tardigrade&lt;/a&gt;, we
want to centralize these efforts around a set of motivated community members
under a project with a cool name.&lt;/p&gt;

&lt;p&gt;After some discussion, we would like to announce &lt;em&gt;Project Hummingbird&lt;/em&gt; is the
new banner for the efforts around improving performance and concentrated updates
to the core of Trino.&lt;/p&gt;

&lt;p&gt;We chose hummingbirds as mascots because they are adaptive, light, and fast. 
Hummingbirds are the only birds with the incredible capability to fly in any 
direction. As Trino evolves into a query engine capable of adapting to its 
environment during query runtime, it is akin to these agile and beautiful 
creatures.&lt;/p&gt;

&lt;h4 id=&quot;vectorization-is-not-a-silver-bullet&quot;&gt;Vectorization is not a silver bullet&lt;/h4&gt;

&lt;p&gt;There are many ways to parallelize the operations that we run on the Trino
server. There’s inter-node parallelization, which splits the data to be operated
on across nodes. There’s also intra-node parallelization, which generally refers
to multithreading across a CPU’s cores.&lt;/p&gt;
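As a minimal illustration of intra-node parallelism (a generic Java sketch, not Trino internals), a parallel stream splits a computation across CPU cores via the common fork/join pool:

```java
import java.util.stream.LongStream;

// Generic illustration of intra-node parallelism with parallel streams.
public class ParallelSum {
    public static void main(String[] args) {
        long n = 10_000_000L;

        // Intra-node parallelization: the range is split across CPU cores.
        long parallel = LongStream.rangeClosed(1, n).parallel().sum();

        // Single-threaded baseline for comparison.
        long serial = LongStream.rangeClosed(1, n).sum();

        // Both produce the same result; for small inputs the fork/join
        // coordination overhead can outweigh any parallel speedup.
        System.out.println(parallel == serial); // true
    }
}
```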

&lt;p&gt;As we start to move towards vectorization, we become more hardware 
dependent, and just like with any other hardware setting, your mileage may vary
depending on the limitations of the resources Trino is running on.&lt;/p&gt;

&lt;p&gt;Further, any time parallelization is applied, there is generally some overhead 
to coordinate lookups, shuffling more data across processors, and so on.&lt;/p&gt;

&lt;h2 id=&quot;pull-requests-of-the-episode-pr-4649-disable-jit-byte-code-recompilation-cutoffs-in-default-jvmconfig&quot;&gt;Pull requests of the episode: PR 4649: Disable JIT byte code recompilation cutoffs in default jvm.config&lt;/h2&gt;

&lt;p&gt;This episode’s &lt;a href=&quot;https://github.com/trinodb/trino/pull/4649&quot;&gt;pull request&lt;/a&gt; was
added by &lt;a href=&quot;https://github.com/shubhamtagra&quot;&gt;Shubham Tagra&lt;/a&gt; to raise the JVM’s
JIT recompilation cutoffs so that large methods don’t hit them. If these limits
are hit, the JIT compiler calls an uncommon_trap to deoptimize the code. If the
function is continually retried, continuous deopt or a “deopt storm” can occur,
causing a large CPU loss. The underlying behavior is actually a bug in the JVM,
so this pull request provides a workaround.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-XX:PerMethodRecompilationCutoff=10000
-XX:PerBytecodeRecompilationCutoff=10000
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Multiple companies, from
&lt;a href=&quot;/blog/2021/10/06/jvm-issues-at-comcast.html&quot;&gt;Comcast&lt;/a&gt; to
&lt;a href=&quot;https://shopify.engineering/faster-trino-query-execution-infrastructure&quot;&gt;Shopify&lt;/a&gt;,
had reported “random slowness” issues that were resolved when these JVM
settings were added.&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-episode-fizzbuzz---simd-style&quot;&gt;Demo of the episode: FizzBuzz - SIMD style!&lt;/h2&gt;

&lt;p&gt;Today I’m stealing, no wait, borrowing a project created by our friend
&lt;a href=&quot;https://twitter.com/gunnarmorling&quot;&gt;Gunnar Morling&lt;/a&gt;. It implements the
well-known &lt;a href=&quot;https://www.morling.dev/blog/fizzbuzz-simd-style/&quot;&gt;FizzBuzz&lt;/a&gt; game,
but generates the resulting patterns from the game programmatically,
SIMD-style.&lt;/p&gt;

&lt;p&gt;Make sure you &lt;a href=&quot;https://stackoverflow.com/questions/52524112&quot;&gt;install JDK 17&lt;/a&gt; 
before running this code.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone git@github.com:bitsondatadev/simd-fizzbuzz.git

mvn clean verify

java --add-modules=jdk.incubator.vector -jar target/benchmarks.jar -f 1 -wi 5 -i 5
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
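&lt;p&gt;To give a feel for the trick the demo exploits, here is a minimal plain-Python
sketch of the same idea (my own illustration, not Gunnar’s actual Java Vector API
code): the fizz/buzz pattern repeats every 15 numbers, so each element can be
resolved with a precomputed lookup instead of branching on every value.&lt;/p&gt;

```python
# Plain-Python sketch of the core trick in SIMD-style FizzBuzz (illustrative
# only, not the actual Java Vector API implementation): the fizz/buzz pattern
# repeats with period 15, so results come from a precomputed lookup applied to
# each element instead of per-element divisibility branching.

# None means "keep the number itself"; strings replace it.
PATTERN = [None, None, "Fizz", None, "Buzz", "Fizz", None, None,
           "Fizz", "Buzz", None, "Fizz", None, None, "FizzBuzz"]

def fizzbuzz(start, count):
    """Return FizzBuzz results for the numbers start..start+count-1."""
    out = []
    for i in range(start, start + count):
        masked = PATTERN[(i - 1) % 15]  # position in the repeating pattern
        out.append(masked if masked is not None else str(i))
    return out

print(fizzbuzz(1, 15))
```

A vectorized implementation applies the same lookup to whole lanes of numbers at
once, which is what the Vector API benchmark above measures.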

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs and Documentation&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.java.net/projects/jdk/17/jeps-since-jdk-11&quot;&gt;JEPs in JDK 17 integrated since JDK 11&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://shopify.engineering/faster-trino-query-execution-infrastructure&quot;&gt;Shopify’s Path to a Faster Trino Query Execution: Infrastructure&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Videos&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=yQqBqix7yTA&quot;&gt;Vector API and Record Serialization&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=1JeoNr6-pZw&quot;&gt;The Vector API in JDK 17&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=0BpY132mKm0&quot;&gt;JDK 8 to JDK 18 in Garbage Collection: 10 Releases, 2000+ Enhancements&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=e2lXj_t7ZBc&quot;&gt;Concurrent Garbage collectors: ZGC &amp;amp; Shenandoah&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Releases 379 to 381</summary>

      
      
    </entry>
  
    <entry>
      <title>Cinco de Trino recap: Learn how to build an efficient data lake</title>
      <link href="https://trino.io/blog/2022/05/17/cinco-de-trino-recap.html" rel="alternate" type="text/html" title="Cinco de Trino recap: Learn how to build an efficient data lake" />
      <published>2022-05-17T00:00:00+00:00</published>
      <updated>2022-05-17T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/05/17/cinco-de-trino-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/05/17/cinco-de-trino-recap.html">&lt;p&gt;When Trino (formerly PrestoSQL) arrived on the scene almost 10 years ago, it
immediately became known as the much faster alternative to the data warehouse
of big data, Apache Hive. The use cases that you, as the community, have built
have far exceeded anything we had imagined in complexity. Together we’ve made
Trino not only the fastest way to interactively query large data sets, but also
a convenient way to run federated queries across data sources, making moving all
the data optional.&lt;/p&gt;

&lt;p&gt;At Cinco de Trino, we came full circle, back to the next iteration of analytics
architecture: the data lake. This conference offers advice from industry
thought leaders about how to use the best lakehouse tools with Trino to manage
that data complexity. Hear from speakers like Martin Traverso
(Trino), Dain Sundstrom (Trino), James Campbell (Great Expectations), Jeremy
Cohen (dbt Labs), Ryan Blue (Iceberg), Denny Lee (Delta Lake), and Vinoth Chandar
(Hudi). You can watch the talks on-demand on the
&lt;a href=&quot;https://www.youtube.com/playlist?list=PLFnr63che7wYDHjUsmp43THLmAlqPDHlM&quot;&gt;Cinco de Trino playlist&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In this post, I’d like to cover the key items from each talk you won’t want to 
miss.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h3 id=&quot;keynote-trino-as-a-data-lakehouse&quot;&gt;Keynote: Trino as a data lakehouse&lt;/h3&gt;

&lt;p&gt;Trino co-creator Martin Traverso covers where Trino fits into the data lake
and brings you a sneak peek of the future of Trino. Polymorphic table
functions and adaptive query planning are just some of the many exciting
features Martin walks us through.&lt;/p&gt;

&lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/gwV3smFiGEg&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;h3 id=&quot;project-tardigrade&quot;&gt;Project Tardigrade&lt;/h3&gt;

&lt;p&gt;If you have one takeaway from the conference, let it be this: there’s a new way
in town to get 60% cost savings on your Trino deployment. Cory Darby walks
through how the fault-tolerant execution architecture has enabled
BlueCat to auto-scale their Trino clusters and run on spot instances, which
yielded massive cost savings. Zebing Lin goes through how this happens behind
the scenes, and how you can run resource-intensive ETL jobs using the failure
recovery delivered by the team behind Project Tardigrade.&lt;/p&gt;

&lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/MYBoeB_lQmo&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md waves-effect waves-light&quot; href=&quot;https://trino.io/blog/2022/05/05/tardigrade-launch.html&quot;&gt;Learn more in the Project Tardigrade blog »&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md waves-effect waves-light&quot; href=&quot;https://github.com/bitsondatadev/trino-getting-started/tree/main/kubernetes/tardigrade-eks&quot;&gt;Try Project Tardigrade Yourself »&lt;/a&gt;&lt;/p&gt;

&lt;h3 id=&quot;starburst-galaxy-lab&quot;&gt;Starburst Galaxy lab&lt;/h3&gt;

&lt;p&gt;Starburst Galaxy enables you to get Trino up and running without spending
your time on setting up, scaling, and maintaining the infrastructure.
Trino co-creator Dain Sundstrom walks you through a fun-filled lab that
demonstrates how to use Starburst Galaxy, a Trino-as-a-service solution, to
generate &lt;a href=&quot;https://db-engines.com/en/ranking&quot;&gt;database rankings&lt;/a&gt; by ingesting,
cleaning, and analyzing Twitter and Stack Overflow data.&lt;/p&gt;

&lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/WQNqqkBd_Jo&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;h3 id=&quot;engineering-data-reliability-with-great-expectations&quot;&gt;Engineering data reliability with Great Expectations&lt;/h3&gt;

&lt;p&gt;Let’s be honest: when we claim to have run “tests” for our data pipelines, we
usually mean we checked that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;input != NULL&lt;/code&gt;, or that the dashboard isn’t broken.
James Campbell showcases the Great Expectations connector for Trino, which is
officially launched as the new way to write
expectations (data quality checks) for your data.&lt;/p&gt;

&lt;p&gt;What excites us the most?&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;The ability to take advantage of far more sophisticated data quality tests
than what any of us would write.&lt;/li&gt;
  &lt;li&gt;Having a really awesome UI to manage expectations.&lt;/li&gt;
  &lt;li&gt;The data source view that makes it easy to dynamically test your custom
data quality checks against backends.&lt;/li&gt;
&lt;/ol&gt;

&lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/9HE6LawCHP8&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;h3 id=&quot;bring-your-data-into-your-data-lake-with-airbyte&quot;&gt;Bring your data into your data lake with Airbyte&lt;/h3&gt;

&lt;p&gt;The first step of doing any analytics is bringing your data into the data lake.
Ingestion engines are a game changer for centralizing your data in the data lake.
Until recently, there was no open source software to choose from in this category.
In just 10 minutes, Abhi Vaidyanatha takes us through the journey of taking in
data from various places into your choice of data lake.&lt;/p&gt;

&lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/3E0jb4d2p0U&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md waves-effect waves-light&quot; href=&quot;https://abhi-vaidyanatha.medium.com/an-opinionated-guide-to-consolidating-your-data-b09386b2b9b5&quot;&gt;Read Abhi’s article about Airbyte + Trino »&lt;/a&gt;&lt;/p&gt;

&lt;h3 id=&quot;transforming-your-data-with-dbt&quot;&gt;Transforming your data with dbt&lt;/h3&gt;

&lt;p&gt;Ever had 300 lines of SQL in front of you, and wasted lots of time sifting
through it to find which part of the code to edit to check for duplicate
customers?&lt;/p&gt;

&lt;p&gt;Imagine having to update a decimal precision used frequently throughout that SQL
statement. What we &amp;lt;3 the most about dbt is that data engineering becomes much
more like software engineering, where you code in a much more modular way. Along
the way, you get many benefits. The ones we love the most? The data lineage graph
and automatic documentation. That’s stuff we always say is important, but never do.&lt;/p&gt;

&lt;p&gt;Even for dbt experts, there’s something new to learn. Jeremy Cohen goes through
the new capabilities Trino brings to dbt, while showcasing cool features like
macros, a flexible alternative to SQL-defined functions.&lt;/p&gt;

&lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/UYS75sjTziU&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md waves-effect waves-light&quot; href=&quot;https://github.com/dbt-labs/trino-dbt-tpch-demo&quot;&gt;Check out Jeremy’s demo repo »&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;choosing-the-best-data-lakehouse-format-for-you&quot;&gt;Choosing the best data lakehouse format for you&lt;/h2&gt;

&lt;p&gt;Ever wonder about all the hype with the new table formats? Why is everyone
choosing Iceberg, Delta Lake, or Hudi over Hive? The founders of each of these
modern table formats showcase them and let you be the
judge of which format makes more sense for your architecture. Below are the
highlights:&lt;/p&gt;

&lt;h3 id=&quot;iceberg&quot;&gt;Iceberg&lt;/h3&gt;

&lt;p&gt;Ryan Blue dives into important elements of your data lakehouse architecture that
affect daily operations and slow down developer efficiency. He then covers how
Iceberg is the solution he realized to solve those issues.&lt;/p&gt;

&lt;p&gt;The first special element of Iceberg is that it intentionally breaks
compatibility with the Hive format to bring you features like partition and
schema evolution within the same table. On the surface this may seem trivial,
as we’ve conditioned our minds to accept the limitations of Hive-like formats.&lt;/p&gt;

&lt;p&gt;The second special element is that Iceberg is built on a community-driven
specification that enables anyone to implement the same calls as the Iceberg
library.&lt;/p&gt;

&lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/1oXmBbB77ak&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;h3 id=&quot;delta-lake&quot;&gt;Delta Lake&lt;/h3&gt;

&lt;p&gt;90% of the time that our Trino data pipelines break, it’s because someone
committed a bad upstream change. With Delta Lake time travel (coming soon!), you
won’t need to spend a whole day pinpointing that bad change: just travel back in
time and identify which change it was. Denny Lee gives us a compelling
argument for why users desire ACID guarantees in their data lakehouse, and how
Delta Lake solves for that.&lt;/p&gt;

&lt;p&gt;Similar to Iceberg, Delta Lake offers optimistic concurrency, which allows
multiple writers to write to the same Delta Lake table while maintaining ACID
constraints on the data.&lt;/p&gt;

&lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/TB9Dxv71LxQ&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;h3 id=&quot;hudi-coming-soon-to-trino&quot;&gt;Hudi [Coming Soon to Trino]&lt;/h3&gt;

&lt;p&gt;The coolest part of the talk? Open up a world of new possibilities with near 
real-time analytics in Trino with Hudi. With Hudi, you get to serve real-time 
production systems, debug live issues, and more.&lt;/p&gt;

&lt;p&gt;Vinoth Chandar showcases the compelling use cases that drove innovation around
Hudi at Uber. He then covers how, in his view, the architectures of data lakes
and lakehouses are starting to merge, and the implications this has for open
versus proprietary architectures.&lt;/p&gt;

&lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/r-fF9uqzUdE&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;h3 id=&quot;touch-talk-and-see-your-data-with-tableau&quot;&gt;Touch, talk, and see your data with Tableau&lt;/h3&gt;

&lt;p&gt;Tableau is our favorite data visualization tool, and in this session, Vlad 
Usatin of Tableau shares how to use Tableau to directly visualize your Trino 
data.&lt;/p&gt;

&lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/b6kKqNIMvuM&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;p&gt;Thank you to all who attended or viewed; we hope to see you again at our
upcoming events later this year. Continue the conversation in our
&lt;a href=&quot;https://join.slack.com/t/trinodb/shared_invite/zt-18acr4bvr-0DtaCwiLOrv1zetGnV_w~w&quot;&gt;Trino Slack&lt;/a&gt;.&lt;/p&gt;

      
        <author>
          <name>Brian Olsen, Brian Zhan</name>
        </author>
      

      <summary>When Trino (formerly PrestoSQL) arrived on the scene almost 10 years ago, it immediately became known as the much faster alternative to the data warehouse of big data, Apache Hive. The use cases that you, as the community, have built had far exceeded anything we had imagined in complexity. Together we’ve made Trino not only the fastest way to interactively query large data sets, but also a convenient way to run federated queries across data sources to make moving all the data optional. At Cinco de Trino, we came full circle back to the next iteration of analytics architecture with the data lake. This conference offers advice from industry thought leaders about how to use best lakehouse tools with Trino to manage that data complexity. Hear from industry thought leaders like Martin Traverso (Trino), Dain Sundstrom (Trino), James Campbell (Great Expectations), Jeremy Cohen (DBT Labs), Ryan Blue (Iceberg), Denny Lee (Delta Lake), Vinoth Chandar (Hudi). You can watch the talks on-demand on the Cinco de Trino playlist. In this post, I’d like to cover the key items from each talk you won’t want to miss.</summary>

      
      
    </entry>
  
    <entry>
      <title>Project Tardigrade delivers ETL at Trino speeds to early users</title>
      <link href="https://trino.io/blog/2022/05/05/tardigrade-launch.html" rel="alternate" type="text/html" title="Project Tardigrade delivers ETL at Trino speeds to early users" />
      <published>2022-05-05T00:00:00+00:00</published>
      <updated>2022-05-05T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/05/05/tardigrade-launch</id>
      <content type="html" xml:base="https://trino.io/blog/2022/05/05/tardigrade-launch.html">&lt;p&gt;After six months of challenging work on Project Tardigrade, we are ready to
launch. With this project we improved the user experience of running the
resource-intensive queries that are common in the Extract, Transform, Load (ETL)
and batch processing space. It required some significant and fascinating
engineering to get us to the current state. The latest Trino release includes
all the work from Project Tardigrade. Read on to learn how it all works, and
how to enable fault-tolerant execution in Trino.&lt;/p&gt;

&lt;p align=&quot;center&quot; width=&quot;100%&quot;&gt;
    &lt;img width=&quot;50%&quot; src=&quot;/assets/blog/tardigrade-launch/tardigrade-logo.png&quot; /&gt;
&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;what-is-project-tardigrade&quot;&gt;What is Project Tardigrade?&lt;/h2&gt;

&lt;p&gt;What we love most about Trino is that you get fast query speeds, and you can
iterate fast with intuitive error messages, interactive experience, and query
federation.&lt;/p&gt;

&lt;p&gt;One of the big problems that has persisted for a long time is that configuring,
tuning, and managing Trino for long-running ETL workloads is very difficult.
Following are just some of the problems you have to deal with:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Reliable landing times:&lt;/em&gt; Queries that run for hours can fail. Restarting
them from scratch wastes resources and makes it hard for you to meet
your completion time requirements.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Cost-efficient clusters:&lt;/em&gt; Trino queries that need terabytes of distributed
memory require extremely large clusters due to the lack of iterative
execution.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Concurrency:&lt;/em&gt; Multiple independent clients may submit their queries
concurrently. Due to the lack of available resources at a certain moment some
of these queries may need to be killed and restarted from zero after a
while. This makes the landing time even more unpredictable.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://engineering.salesforce.com/how-to-etl-at-petabyte-scale-with-trino-5fe8ac134e36&quot;&gt;Structuring your workload&lt;/a&gt;
to avoid these problems can be done by a team of experts, but that expertise is
not accessible to most Trino users.&lt;/p&gt;

&lt;p&gt;The goal of Project Tardigrade is to provide an “out of the box” solution for the
problems mentioned above. We’ve designed a new
&lt;a href=&quot;https://github.com/trinodb/trino/wiki/Fault-Tolerant-Execution&quot;&gt;fault-tolerant execution architecture&lt;/a&gt;
that allows us to implement advanced resource-aware scheduling with granular
retries.&lt;/p&gt;

&lt;p&gt;Following are some of the benefits and results:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;When your long-running queries experience a failure, they don’t have to start
from scratch.&lt;/li&gt;
  &lt;li&gt;When queries require more memory than currently available in the cluster
they are still able to succeed.&lt;/li&gt;
  &lt;li&gt;When multiple queries are submitted concurrently they are able to share
resources in a fair way, and make steady progress.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino does all the hard work of allocating, configuring, and maintaining query
processing behind the scenes. Instead of spending time tuning Trino clusters to
match your workload requirements, or reorganizing your workload to match your
Trino cluster capabilities, you can spend your time on analytics and delivering
business value. And most importantly, your heart won’t skip a beat when you
wake up in the morning wondering whether that query landed on time.&lt;/p&gt;

&lt;h2 id=&quot;what-did-we-test-so-far&quot;&gt;What did we test so far?&lt;/h2&gt;

&lt;p&gt;Since there’s no publicly available testing query set for ETL use cases, we
handcrafted more than a hundred ETL-like queries based on the
&lt;a href=&quot;https://github.com/trinodb/trino-verifier-queries/tree/main/src/main/resources/queries/tpch/etl&quot;&gt;TPC-H&lt;/a&gt;
and
&lt;a href=&quot;https://github.com/trinodb/trino-verifier-queries/tree/main/src/main/resources/queries/tpcds/etl&quot;&gt;TPC-DS&lt;/a&gt;
datasets.&lt;/p&gt;

&lt;p&gt;To simulate real-world settings, we deployed a cluster of
15 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;m5.8xlarge&lt;/code&gt; nodes
&lt;a href=&quot;https://trino.io/docs/current/admin/fault-tolerant-execution.html&quot;&gt;configured for fault-tolerant execution&lt;/a&gt;
and repeatedly executed thousands of queries over
datasets of different sizes (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;10GB&lt;/code&gt; / &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1TB&lt;/code&gt; / &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;10TB&lt;/code&gt;). The queries were
executed sequentially as well as with concurrency factors of 5, 10, and 20.
Failure recovery capabilities were tested by crashing a random node in the
cluster every couple of minutes while running a live workload.&lt;/p&gt;

&lt;p&gt;To validate new resource management capabilities we submitted all 22
&lt;a href=&quot;https://github.com/trinodb/trino-verifier-queries/tree/main/src/main/resources/queries/tpch/etl&quot;&gt;TPC-H&lt;/a&gt;
based queries simultaneously with fault-tolerant execution enabled and disabled.
With fault-tolerant execution disabled, only two of them succeeded, while the
remaining twenty queries failed with resource-related issues, such as
running out of memory. With fault-tolerant execution enabled, all of the
queries succeeded with no issues.&lt;/p&gt;

&lt;h2 id=&quot;how-do-i-enable-fault-tolerant-execution&quot;&gt;How do I enable fault-tolerant execution?&lt;/h2&gt;

&lt;p&gt;Fault-tolerant execution can only be enabled for an entire cluster.&lt;/p&gt;

&lt;p&gt;In general, we recommend splitting your long-running ETL queries and
short-running interactive workloads to run on different clusters.
This ensures that long-running ETL queries do not impact interactive workloads
and cause a bad user experience. Also note that any short-running,
interactive queries on a fault-tolerant cluster may experience higher latencies
due to the checkpointing mechanism.&lt;/p&gt;

&lt;h3 id=&quot;1-add-an-s3-bucket-for-checkpointing&quot;&gt;1. Add an S3 bucket for checkpointing&lt;/h3&gt;

&lt;p&gt;First you need to create an S3 bucket for spooling. We recommend configuring a
bucket lifecycle rule to automatically expire abandoned objects in the event of
a node crash. You can configure these rules using the
&lt;a href=&quot;https://docs.aws.amazon.com/cli/latest/reference/s3api/put-bucket-lifecycle-configuration.html&quot;&gt;s3api&lt;/a&gt;
CLI, as shown in the tutorial linked below.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;{
    &quot;Rules&quot;: [
        {
            &quot;Expiration&quot;: {
                &quot;Days&quot;: 1
            },
            &quot;ID&quot;: &quot;Expire&quot;,
            &quot;Filter&quot;: {},
            &quot;Status&quot;: &quot;Enabled&quot;,
            &quot;NoncurrentVersionExpiration&quot;: {
                &quot;NoncurrentDays&quot;: 1
            },
            &quot;AbortIncompleteMultipartUpload&quot;: {
                &quot;DaysAfterInitiation&quot;: 1
            }
        }
    ]
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
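&lt;p&gt;If you prefer to script the bucket setup, the same rule can be applied
programmatically. The following is a minimal sketch using boto3, the AWS SDK for
Python; the bucket name is a placeholder, and credentials are assumed to be
available in your environment:&lt;/p&gt;

```python
def expire_after_one_day_rules():
    """Build the lifecycle configuration shown above: expire current objects,
    noncurrent versions, and incomplete multipart uploads after one day."""
    return {
        "Rules": [
            {
                "ID": "Expire",
                "Filter": {},
                "Status": "Enabled",
                "Expiration": {"Days": 1},
                "NoncurrentVersionExpiration": {"NoncurrentDays": 1},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 1},
            }
        ]
    }

if __name__ == "__main__":
    # Requires boto3 and AWS credentials; the bucket name is a placeholder.
    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket="example-trino-spooling-bucket",
        LifecycleConfiguration=expire_after_one_day_rules(),
    )
```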

&lt;h3 id=&quot;2-configure-the-trino-exchange-manager&quot;&gt;2. Configure the Trino exchange manager&lt;/h3&gt;

&lt;p&gt;Second you need to configure the exchange manager. Add the file
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;exchange-manager.properties&lt;/code&gt; in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;etc&lt;/code&gt; folder of your Trino installation on
the coordinator and all workers with the following content:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;exchange-manager.name=filesystem
exchange.base-directories=s3://&amp;lt;bucket-name&amp;gt;
exchange.s3.region=us-east-1
exchange.s3.aws-access-key=&amp;lt;access-key&amp;gt;
exchange.s3.aws-secret-key=&amp;lt;secret-key&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;3-enable-task-level-retries&quot;&gt;3. Enable task level retries&lt;/h3&gt;

&lt;p&gt;Lastly, you need to configure and enable task level retries by adding the
following properties to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;config.properties&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;retry-policy=TASK
query.hash-partition-count=50
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Note: more than 50 partitions is currently not supported by the filesystem
exchange implementation.&lt;/p&gt;

&lt;h3 id=&quot;4-optional-recommended-settings&quot;&gt;4. Optional recommended settings&lt;/h3&gt;

&lt;p&gt;It is also recommended to enable compression to reduce the amount of data spooled
on S3 (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;exchange.compression-enabled=true&lt;/code&gt;) as well as reduce the low memory
killer delay to allow the resource manager to unblock nodes running short on memory
faster (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.low-memory-killer.delay=0s&lt;/code&gt;). Additionally, we recommend enabling
automatic writer scaling to optimize output file size for tables created with
Trino (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;scale-writers=true&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;To increase overall throughput and reduce resource-related task retries, we
recommend adjusting the concurrency settings based on the hardware
configuration you have chosen.&lt;/p&gt;

&lt;p&gt;Following are the settings for the hardware used in our testing (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;32&lt;/code&gt; vCPUs,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;128GB&lt;/code&gt; memory and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;10Gbit/s&lt;/code&gt; network):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;task.concurrency=8
task.writer-count=4
fault-tolerant-execution-target-task-input-size=4GB
fault-tolerant-execution-target-task-split-count=64
fault-tolerant-execution-task-memory=5GB
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;By default Trino is configured to wait up to five minutes for a task to recover
before considering it lost and rescheduling it. This timeout
can be increased or reduced as necessary by adjusting the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.remote-task.max-error-duration&lt;/code&gt; configuration property. For example:
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.remote-task.max-error-duration=1m&lt;/code&gt;&lt;/p&gt;

&lt;h2 id=&quot;deploying-on-aws-with-helm-and-kubernetes&quot;&gt;Deploying on AWS with Helm and Kubernetes&lt;/h2&gt;

&lt;p&gt;To test out Tardigrade features, you need at least a cluster with a dedicated
coordinator and two workers for a minimal level of parallelism and performance.
The quickest and easiest way to satisfy the specifications mentioned
above is by using the
&lt;a href=&quot;https://artifacthub.io/packages/helm/trino/trino&quot;&gt;Trino Helm chart&lt;/a&gt; with the
provided &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;values.yml&lt;/code&gt; below, and deploying a cluster to the AWS EKS cloud
service. If you are not familiar with deploying Trino on Kubernetes, we
recommend you take a look at the Trino Community Broadcast episodes covering
&lt;a href=&quot;https://trino.io/episodes/24.html&quot;&gt;local Trino on Kubernetes&lt;/a&gt; and
&lt;a href=&quot;https://trino.io/episodes/31.html&quot;&gt;deploying Trino on EKS&lt;/a&gt;.&lt;/p&gt;

&lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/4isawxYjDnE&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md waves-effect waves-light&quot; href=&quot;https://github.com/bitsondatadev/trino-getting-started/tree/main/kubernetes/tardigrade-eks&quot;&gt;Try Project Tardigrade Yourself »&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;closing-notes&quot;&gt;Closing notes&lt;/h2&gt;

&lt;p&gt;Project Tardigrade has been a great success for us already. We learned a lot
and significantly improved Trino. Now we are ready to share this with
you all, and look forward to fixing anything you find. We really want you to push
the limits, and let us know what you find.&lt;/p&gt;

&lt;p&gt;If running fast batch jobs on the fastest state-of-the-art query engine 
interests you, consider playing around with the tutorial above and giving us 
your feedback. You can reach us on the &lt;a href=&quot;https://bit.ly/3IFlNXy&quot;&gt;#project-tardigrade&lt;/a&gt; 
channel in our &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;Slack&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you would like to write about your experience and results, or become a
contributor, also let us know on the &lt;a href=&quot;https://bit.ly/3IFlNXy&quot;&gt;#project-tardigrade&lt;/a&gt;
channel. We are happy to send you Tardigrade swag as a thank you.&lt;/p&gt;

&lt;p&gt;Thanks for reading and learning with us today. Happy Querying!&lt;/p&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md waves-effect waves-light&quot; href=&quot;https://www.reddit.com/r/dataengineering/comments/uj2aez/etl_at_trino_speeds_and_a_stepbystep_tutorial_on/&quot;&gt;Discuss on Reddit&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md waves-effect waves-light&quot; href=&quot;https://news.ycombinator.com/item?id=31276058&quot;&gt;Discuss On Hacker News&lt;/a&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Andrii Rosa, Brian Olsen, Brian Zhan, Lukasz Osipiuk, Martin Traverso, Zebing Lin</name>
        </author>
      

      <summary>After six months of challenging work on Project Tardigrade, we are ready to launch. With the project we improved the user experience of running resource intensive queries that are common in the Extract, Transform, Load (ETL) and batch processing space. It required some significant and fascinating engineering to get us to the current status. The latest Trino release includes all the work from Project Tardigrade. Read on to learn how it all works, and how to enable the fault-tolerant execution in Trino.</summary>

      
      
    </entry>
  
    <entry>
      <title>35: Packaging and modernizing Trino</title>
      <link href="https://trino.io/episodes/35.html" rel="alternate" type="text/html" title="35: Packaging and modernizing Trino" />
      <published>2022-04-21T00:00:00+00:00</published>
      <updated>2022-04-21T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/35</id>
      <content type="html" xml:base="https://trino.io/episodes/35.html">&lt;h2 id=&quot;releases-375-to-378&quot;&gt;Releases 375 to 378&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-375.html&quot;&gt;Trino 375&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for table comments in the MySQL connector.&lt;/li&gt;
  &lt;li&gt;Improved predicate pushdown for PostgreSQL.&lt;/li&gt;
  &lt;li&gt;Performance improvements for aggregations with filters.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-376.html&quot;&gt;Trino 376&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Better performance when reading Parquet data.&lt;/li&gt;
  &lt;li&gt;Join pushdown for MySQL.&lt;/li&gt;
  &lt;li&gt;Aggregation pushdown for Oracle.&lt;/li&gt;
  &lt;li&gt;Support table and column comments in ClickHouse connector.&lt;/li&gt;
  &lt;li&gt;Support for adding and deleting schemas in Accumulo connector.&lt;/li&gt;
  &lt;li&gt;Support system truststore in CLI and JDBC driver.&lt;/li&gt;
  &lt;li&gt;Two-way TLS/SSL certificate validation with LDAP authentication.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-377.html&quot;&gt;Trino 377&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for standard SQL &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trim&lt;/code&gt; syntax.&lt;/li&gt;
  &lt;li&gt;Better performance for Glue metastore.&lt;/li&gt;
  &lt;li&gt;Join pushdown for SQL Server connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-378.html&quot;&gt;Trino 378&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;to_base32&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;from_base32&lt;/code&gt; functions.&lt;/li&gt;
  &lt;li&gt;New &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;expire_snapshots&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_orphan_files&lt;/code&gt; table procedures for Iceberg.&lt;/li&gt;
  &lt;li&gt;Faster planning of queries with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IN&lt;/code&gt; predicates.&lt;/li&gt;
  &lt;li&gt;Faster query planning for Hive, Delta Lake, Iceberg, MySQL, PostgreSQL, and
SQL Server connectors.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Additional highlights worth a mention according to Manfred:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Generally lots of improvements on Hive, Delta Lake, Iceberg, and main
JDBC-based connectors.&lt;/li&gt;
  &lt;li&gt;Full Iceberg v2 table format support, first for read and later also for write
operations, is getting closer and closer.&lt;/li&gt;
  &lt;li&gt;Table statistics support for PostgreSQL, MySQL, and SQL Server connectors,
including automatic join pushdown.&lt;/li&gt;
  &lt;li&gt;Fix failure of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DISTINCT .. LIMIT&lt;/code&gt; operator when input data is dictionary
encoded.&lt;/li&gt;
  &lt;li&gt;Add new page to display the runtime information of all workers in the cluster
in Web UI.&lt;/li&gt;
  &lt;li&gt;Remove &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;user&lt;/code&gt; property requirement in JDBC driver.&lt;/li&gt;
  &lt;li&gt;Require &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;internal-communication.shared-secret&lt;/code&gt; value with authentication
usage, a breaking change for many users that have not set that secret.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the release notes for
&lt;a href=&quot;https://trino.io/docs/current/release/release-375.html&quot;&gt;Trino 375&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-376.html&quot;&gt;Trino 376&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-377.html&quot;&gt;Trino 377&lt;/a&gt;,
and
&lt;a href=&quot;https://trino.io/docs/current/release/release-378.html&quot;&gt;Trino 378&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-episode-packaging-trino&quot;&gt;Concept of the episode: Packaging Trino&lt;/h2&gt;

&lt;p&gt;To adopt Trino you typically need to run it on a cluster of machines. These can
be bare metal servers, virtual machines, or even containers. The Trino project
provides a few binary packages to allow you to install Trino:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;tarball&lt;/li&gt;
  &lt;li&gt;rpm&lt;/li&gt;
  &lt;li&gt;container image&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of them include a bunch of Java libraries that constitute Trino
and all the plugins. As a result there are only a few requirements. You need a
Linux operating system, since some of the libraries and code require Linux
indirectly, and a Java 11 runtime.&lt;/p&gt;

&lt;p&gt;Beyond that there is just the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bin/launcher&lt;/code&gt; script, which is highly recommended, but
not required. It can be used as a service script or for manually
starting, stopping, and checking the status of Trino, and it only needs Python.&lt;/p&gt;

&lt;h3 id=&quot;tarball&quot;&gt;Tarball&lt;/h3&gt;

&lt;p&gt;The tarball is a gzip-compressed tar archive. For installation you just need to
extract the archive anywhere. It contains the following directory structure.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bin&lt;/code&gt;, the launcher script and related files&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lib&lt;/code&gt;, all globally needed libraries&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;plugins&lt;/code&gt;, connectors and other plugins with their own libraries each in
separate sub-directories&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You need to create the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;etc&lt;/code&gt; directory with the needed configuration, since the
tarball does not include any defaults, and you cannot start the application
without those.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;etc/catalog/*.properties&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;etc/config.properties&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;etc/jvm.config&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;etc/log.properties&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;etc/node.properties&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note that all these files are within the created &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;etc&lt;/code&gt; directory.&lt;/p&gt;
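<p>As a minimal sketch for a single-node test setup, the central configuration
files could look like the following. All values are illustrative and need to be
adjusted for your environment:</p>

&lt;p&gt;As a minimal sketch for a single-node test setup, the central configuration
files could look like the following. All values are illustrative and need to be
adjusted for your environment:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# etc/config.properties
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
discovery.uri=http://localhost:8080

# etc/node.properties
node.environment=test
node.id=trino-node-1
node.data-dir=/var/trino/data
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;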

&lt;h3 id=&quot;rpm&quot;&gt;RPM&lt;/h3&gt;

&lt;p&gt;The RPM archive is suitable for RPM-based Linux distributions, but testing is
not very thorough across different versions and distributions.&lt;/p&gt;

&lt;p&gt;It adapts the tarball content to the Linux file system hierarchy, hooks the
launcher script up as a daemon script, and adds default configuration files. That
allows you to start Trino right after installing the package, as well as on system
restarts.&lt;/p&gt;

&lt;p&gt;Locations used are &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/etc/trino&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/var/lib/trino&lt;/code&gt;, and others. These are
configured via the launcher script parameters.&lt;/p&gt;

&lt;p&gt;In a nutshell the RPM adds some convenience, but narrows down the supported
Linux distributions. It still requires Java and Python installation and
management.&lt;/p&gt;

&lt;h3 id=&quot;container-image&quot;&gt;Container image&lt;/h3&gt;

&lt;p&gt;The container image for Trino adds the necessary Linux, Java, and Python, and
adapts Trino to the container setup.&lt;/p&gt;

&lt;p&gt;The container adds even more convenience, since it is ready to use out of the
box. It allows usage on Kubernetes with the help of the &lt;a href=&quot;https://github.com/trinodb/charts&quot;&gt;Helm
charts&lt;/a&gt;, and includes the required operating
system and application parts automatically.&lt;/p&gt;

&lt;h3 id=&quot;customization&quot;&gt;Customization&lt;/h3&gt;

&lt;p&gt;All three packages Trino ships are just defaults. They all require further
configuration to adapt Trino to your specific needs in terms of hardware,
connected data sources, security configuration, and so on. All of this can be
done manually or with many existing tools.&lt;/p&gt;

&lt;p&gt;However, you can also take it a step further and create your own package suited
to your needs. The tarball can be used as the source for any customization to create
your own package. The following is a list of options and scenarios:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Use the tarball, but remove unused plugins.&lt;/li&gt;
  &lt;li&gt;Use the tarball as source to create your own specific package. For example a
deb archive for usage with Ubuntu, or an apk package for Alpine.&lt;/li&gt;
  &lt;li&gt;Create your own RPM similar to &lt;a href=&quot;https://github.com/simpligility/trino-packages&quot;&gt;Manfred’s proof of
concept&lt;/a&gt; that pulls out the
Trino RPM package creation into a separate project.&lt;/li&gt;
  &lt;li&gt;Create your own container image with different base distro, custom set of
plugins, and even with all your configuration baked into the image.&lt;/li&gt;
&lt;/ul&gt;
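&lt;p&gt;For example, a custom container image with your catalog configuration baked in
can be sketched with a short Dockerfile. The image tag and file paths are
examples only:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;FROM trinodb/trino:378
# bake your catalog configuration into the image
COPY catalog/*.properties /etc/trino/catalog/
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;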

&lt;h3 id=&quot;others&quot;&gt;Others&lt;/h3&gt;

&lt;p&gt;You can also use &lt;a href=&quot;https://formulae.brew.sh/formula/trino&quot;&gt;brew on macOS&lt;/a&gt;, but
that is not suitable for production usage. It is more of a convenient way to get a local
Trino for playing around.&lt;/p&gt;
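&lt;p&gt;For example, assuming a working Homebrew setup:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;brew install trino
trino --version
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;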

&lt;h2 id=&quot;additional-topic-of-the-episode-modernizing-trino-with-java-17&quot;&gt;Additional topic of the episode: Modernizing Trino with Java 17&lt;/h2&gt;

&lt;p&gt;Currently Java 11 is required for Trino. Java 17 is the latest and greatest Java
LTS release with lots of good performance, security, and language improvements.
The community has been working hard to make Java 17 support a reality. At this
stage core Trino fully supports Java 17. Starburst Galaxy, for example, already uses
Java 17.&lt;/p&gt;

&lt;p&gt;The maintainers and contributors would like to move to fully support and also
require Java 17 soon. Here is where your input comes in, and we ask that you
let us know your thoughts about questions such as the following:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Are you looking forward to the new Java 17 language features and other
improvements as a contributor to Trino?&lt;/li&gt;
  &lt;li&gt;Are you already using Java 17 with Trino? In production or just testing?&lt;/li&gt;
  &lt;li&gt;If we require Java 17 in the next months, can you update to use Java 17 with
Trino?&lt;/li&gt;
  &lt;li&gt;If not, what are some of the hurdles?&lt;/li&gt;
  &lt;li&gt;Are you okay with staying at an older release, until you can use Java 17?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let us know on the #dev channel on Trino Slack or ping us directly. You can also
chime in on the &lt;a href=&quot;https://github.com/trinodb/trino/issues/9876&quot;&gt;roadmap issue&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;pull-requests-of-the-episode-worker-stats-in-the-web-ui&quot;&gt;Pull requests of the episode: Worker stats in the Web UI&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;https://github.com/trinodb/trino/issues/11653&quot;&gt;PR of the episode&lt;/a&gt; was
submitted by &lt;a href=&quot;https://github.com/whutpencil&quot;&gt;GitHub user whutpencil&lt;/a&gt;, and adds a
significant new feature to the web UI. It exposes the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;system.runtime.nodes&lt;/code&gt;
information, that is, statistics for each worker, in brand new pages. What a great
effort! Special thanks also go out to &lt;a href=&quot;https://github.com/dedep&quot;&gt;Dawid Adamek
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dedep&lt;/code&gt;&lt;/a&gt; for the review.&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-episode-tarball-installation-and-new-web-ui-feature&quot;&gt;Demo of the episode: Tarball installation and new Web UI feature&lt;/h2&gt;

&lt;p&gt;In the demo of the month Manfred shows a worker installation added to a local
tarball install of a coordinator, and then demos the Web UI with the new feature
from the pull request of the month.&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-episode-are-write-operations-in-delta-lake-supported-for-tables-stored-on-hdfs&quot;&gt;Question of the episode: Are write operations in Delta Lake supported for tables stored on HDFS?&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trinodb.slack.com/archives/CGB0QHWSW/p1650331073409229&quot;&gt;Full question from Slack&lt;/a&gt;:
I was trying the Delta Lake connector. I noticed that write operations are
supported for tables stored on Azure ADLS Gen2, S3 and S3-compatible storage.
Does that mean write operations are not supported for tables stored on HDFS?&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Answer:&lt;/em&gt; HDFS is always implicitly supported for data lake connectors. It isn’t
called out because it is assumed.&lt;/p&gt;

&lt;p&gt;The confusion actually came from an error message the user received after
creating a Delta Lake table in Spark. They then tried inserting
a record into the table through IntelliJ IDEA and received the following error
message:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Unsupported target SQL type: -155
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;They thought the problem might be a wrong data type for the birthdate column, and
then used the statement below to insert a record into the table.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;INSERT INTO
  presto.people10m (id, firstname, middlename, lastname, gender, birthdate, ssn, salary)
VALUES (1, &apos;a&apos;, &apos;b&apos;, &apos;c&apos;, &apos;male&apos;, timestamp &apos;1990-01-01 00:00:00 +00:00&apos;, &apos;d&apos;, 10);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;However, they got an error message like this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Query 20220419_031201_00015_8qe76 failed:
Cannot write to table in hdfs://masters/presto.db/people10m; hdfs not supported
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This turned out to be an issue in the IntelliJ client, not in Trino.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.starburst.io/info/cinco-de-trino/&quot;&gt;Cinco de Trino&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/events/285087048/&quot;&gt;Constructing an Intelligent Data Trellis from your Data Mesh&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Releases 375 to 378</summary>

      
      
    </entry>
  
    <entry>
      <title>34: A big delta for Trino</title>
      <link href="https://trino.io/episodes/34.html" rel="alternate" type="text/html" title="34: A big delta for Trino" />
      <published>2022-03-17T00:00:00+00:00</published>
      <updated>2022-03-17T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/34</id>
      <content type="html" xml:base="https://trino.io/episodes/34.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;p&gt;In this episode Manfred has the pleasure to chat with two colleagues, who
are working on making Trino better every day:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/claudiusli&quot;&gt;Claudius Li&lt;/a&gt;, Product Manager at Starburst&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/jhlodin&quot;&gt;Joe Lodin&lt;/a&gt;, Information Engineer at Starburst&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Brian is out to add another member to his family!&lt;/p&gt;

&lt;h2 id=&quot;releases-372-373-and-374&quot;&gt;Releases 372, 373, and 374&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-372.html&quot;&gt;Trino 372&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trim_array&lt;/code&gt; function.&lt;/li&gt;
  &lt;li&gt;Support for reading ZSTD-compressed Avro files.&lt;/li&gt;
  &lt;li&gt;Support for column comments in Iceberg.&lt;/li&gt;
  &lt;li&gt;Support for Kerberos authentication in Kudu connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-373.html&quot;&gt;Trino 373&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New Delta Lake connector.&lt;/li&gt;
  &lt;li&gt;Improved performance of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIKE&lt;/code&gt; when querying Elasticsearch and PostgreSQL.&lt;/li&gt;
  &lt;li&gt;Improved performance when querying partitioned Hive tables.&lt;/li&gt;
  &lt;li&gt;Support access to S3 via HTTP proxy.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-374.html&quot;&gt;Trino 374&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Faster &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt; queries.&lt;/li&gt;
  &lt;li&gt;Vim/Emacs editing mode for CLI.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TRUNCATE TABLE&lt;/code&gt; in Cassandra connector.&lt;/li&gt;
  &lt;li&gt;Support &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;uint&lt;/code&gt; types in ClickHouse.&lt;/li&gt;
  &lt;li&gt;Support for Glue Metastore in Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Add &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE/DROP SCHEMA&lt;/code&gt;, table and column comments in MongoDB.&lt;/li&gt;
  &lt;li&gt;Improved pushdown for PostgreSQL.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Additional highlights from Manfred:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Timeout configuration for LDAP authentication.&lt;/li&gt;
  &lt;li&gt;Values related to fault-tolerant execution in Web UI.&lt;/li&gt;
  &lt;li&gt;JDBC &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Driver.getProperties&lt;/code&gt; enables more client applications like DBVisualizer.&lt;/li&gt;
  &lt;li&gt;Vi and Emacs editing modes for interactive CLI usage.&lt;/li&gt;
  &lt;li&gt;Performance improvements in PostgreSQL connector.&lt;/li&gt;
  &lt;li&gt;SingleStore JDBC driver usage, end of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;memsql&lt;/code&gt; name.&lt;/li&gt;
  &lt;li&gt;Documentation for the atop connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the
&lt;a href=&quot;https://trino.io/docs/current/release/release-372.html&quot;&gt;Trino 372&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-373.html&quot;&gt;Trino 373&lt;/a&gt;, and
&lt;a href=&quot;https://trino.io/docs/current/release/release-374.html&quot;&gt;Trino 374&lt;/a&gt; release
notes.&lt;/p&gt;

&lt;h2 id=&quot;project-tardigrade-update&quot;&gt;Project Tardigrade update&lt;/h2&gt;

&lt;p&gt;The team around Project Tardigrade joined us in &lt;a href=&quot;./32.html&quot;&gt;episode 32&lt;/a&gt; to talk
about fault tolerant execution of queries in Trino. Now they have posted a
&lt;a href=&quot;/blog/2022/02/16/tardigrade-project-update.html&quot;&gt;status update on our blog&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It looks like things are really coming along well, and Joe has joined the effort
to &lt;a href=&quot;../docs/current/admin/fault-tolerant-execution.html&quot;&gt;create a first user-facing documentation
set&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The team has also posted a status update on the #project-tardigrade Slack
channel. Everything is ready for the community to perform first real world
testing, and help us make this a great feature set for Trino.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-episode-a-new-connector-for-delta-lake-object-storage&quot;&gt;Concept of the episode: A new connector for Delta Lake object storage&lt;/h2&gt;

&lt;p&gt;It is great to have a new connector in Trino, but what does that even mean?
Let’s find out.&lt;/p&gt;

&lt;h3 id=&quot;what-is-a-connector&quot;&gt;What is a connector?&lt;/h3&gt;

&lt;p&gt;Just a quick refresher. Trino allows you to query many different data sources
with SQL statements. You enable that by creating a &lt;em&gt;catalog&lt;/em&gt; that contains the
configuration to connect to a specific &lt;em&gt;data source&lt;/em&gt;. The data source can be a
relational database, a NoSQL database, or an object storage system. A &lt;em&gt;connector&lt;/em&gt; is
the translation layer that maps the concepts in the data source to the Trino
concepts of schemas, tables, rows, columns, data types, and so on. The connector
needs to know how to retrieve the data itself from the data source, and also how to
interact with the metadata.&lt;/p&gt;

&lt;p&gt;Here are some example metadata questions to answer:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;What are the available tables in schema &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;xyz&lt;/code&gt;?&lt;/li&gt;
  &lt;li&gt;What columns does table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;abc&lt;/code&gt; have and what are the data types?&lt;/li&gt;
  &lt;li&gt;What file format is used by the storage for table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;efg&lt;/code&gt;?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And some queries about the actual data:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Give me the top 100 rows from table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Give me all files in partition &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; in the directory &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So having a connector for your data source in Trino is a big deal. A connector
unlocks the data to all your SQL analytics powered by Trino, and the underlying
data source doesn’t even have to support SQL.&lt;/p&gt;

&lt;h3 id=&quot;what-is-delta-lake&quot;&gt;What is Delta Lake?&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://delta.io/&quot;&gt;Delta Lake&lt;/a&gt; is an evolution of the Hive/Hadoop object
storage data source. It is an open-source storage format. Data is stored in
files, typically using binary formats such as Parquet or ORC. Metadata is stored
in a Hive Metastore Service (HMS).&lt;/p&gt;

&lt;p&gt;Delta Lake supports ACID transactions, time travel, and many other features that
are lacking in the legacy Hive/Hadoop setup. This combination of traditional
data lake storage with data warehouse features is often called a lake house.&lt;/p&gt;

&lt;h3 id=&quot;history-of-the-new-connector&quot;&gt;History of the new connector&lt;/h3&gt;

&lt;p&gt;Delta Lake is fully open source, and part of the larger enterprise platform for
a lake house offered by &lt;a href=&quot;https://databricks.com/&quot;&gt;Databricks&lt;/a&gt;.
&lt;a href=&quot;https://www.starburst.io/&quot;&gt;Starburst&lt;/a&gt; has supported Delta Lake users with a
connector for &lt;a href=&quot;https://docs.starburst.io/index.html#sep&quot;&gt;Starburst Enterprise&lt;/a&gt;
for nearly two years. To foster further adoption and innovation with the
community, the connector was &lt;a href=&quot;https://docs.starburst.io/blog/2022-03-15-delta-lake.html&quot;&gt;donated to Trino in
version 373&lt;/a&gt; and continues to
be improved.&lt;/p&gt;

&lt;h2 id=&quot;pull-requests-of-the-episode-add-delta-lake-connector-and-documentation&quot;&gt;Pull requests of the episode: Add Delta Lake connector and documentation&lt;/h2&gt;

&lt;p&gt;Over 25 developers helped &lt;a href=&quot;https://github.com/jirassimok&quot;&gt;Jakob&lt;/a&gt; with the effort
to &lt;a href=&quot;https://github.com/trinodb/trino/pull/10897&quot;&gt;open-source the connector&lt;/a&gt;. It
is a heavy lift to migrate such a full-featured connector into Trino. By
comparison the &lt;a href=&quot;https://github.com/trinodb/trino/pull/11229&quot;&gt;documentation was
easy&lt;/a&gt;, but it is very important for
enabling you to use the connector. Well done everyone!&lt;/p&gt;

&lt;p&gt;Let’s have a look at the code in a bit more detail. A couple of key facts:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The Delta Lake connector is just another plugin like all other connectors.&lt;/li&gt;
  &lt;li&gt;This is a feature-rich connector supporting read and write operations.&lt;/li&gt;
  &lt;li&gt;It shares implementation details with Hive and Iceberg connectors such as HMS
access, Parquet and ORC file readers, and so on.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;demo-of-the-episode-delta-lake-connector-in-action&quot;&gt;Demo of the episode: Delta Lake connector in action&lt;/h2&gt;

&lt;p&gt;Now let’s have a look at all this in action. In the demo Claudius uses
docker-compose to start up an HMS as the metastore, MinIO as object storage, and of
course Trino as the query engine.&lt;/p&gt;

&lt;p&gt;If you want to follow along, all resources used for the demo are &lt;a href=&quot;https://github.com/bitsondatadev/trino-getting-started/tree/main/community_tutorials/delta-lake&quot;&gt;available on
our getting started
repository&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Here is the sample catalog &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delta.properties&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-properties highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;py&quot;&gt;connector.name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;delta-lake&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;hive.metastore.uri&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;thrift://hive-metastore:9083&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;hive.s3.endpoint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;http://minio:9000&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;hive.s3.aws-access-key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;minio&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;hive.s3.aws-secret-key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;minio123&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;hive.s3.path-style-access&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;true&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;delta.enable-non-concurrent-writes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once everything is up and running we can start playing.&lt;/p&gt;

&lt;p&gt;Verify that the catalog is available:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SHOW&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CATALOGS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Check if there are any schemas:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SHOW&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SCHEMAS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Let’s create a new schema:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SCHEMA&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;location&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;s3a://claudiustestbucket/myschema&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Create a table, insert some records, and then verify:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mytable&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mytable&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;John&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;Jane&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mytable&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Run a query to get more data and insert it into a new table:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myothertable&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mytable&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myothertable&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now for some data manipulation:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;UPDATE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myothertable&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SET&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;Jonathan&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myothertable&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;DELETE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myothertable&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myothertable&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And finally, let’s clean up:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;ALTER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mytable&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;EXECUTE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;optimize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;file_size_threshold&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;10MB&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ANALYZE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myothertable&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;DROP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myothertable&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;DROP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mytable&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;DROP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SCHEMA&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As you can see, with Trino and Delta Lake you get full create, read, update,
and delete operations on your lakehouse.&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-episode-how-do-i-secure-the-connection-from-a-trino-cluster-to-the-data-source&quot;&gt;Question of the episode: How do I secure the connection from a Trino cluster to the data source?&lt;/h2&gt;

&lt;p&gt;Since we talked about connectors earlier, you already know that the
configuration for accessing a data source is assembled to create a catalog. This
approach uses a properties file in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;etc/catalog&lt;/code&gt;. For example, let’s look at the
recently updated &lt;a href=&quot;../docs/current/connector/sqlserver.html&quot;&gt;SQL Server connector
documentation&lt;/a&gt;:&lt;/p&gt;

&lt;div class=&quot;language-properties highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;py&quot;&gt;connector.name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;sqlserver&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;connection-url&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;jdbc:sqlserver://&amp;lt;host&amp;gt;:&amp;lt;port&amp;gt;;database=&amp;lt;database&amp;gt;;encrypt=false&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;connection-user&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;root&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;connection-password&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;secret&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The connector uses username and password authentication. It connects using the
JDBC driver, which in turn enables TLS by default. A number of other connectors
also use JDBC drivers with username and password authentication, but the details
vary a lot. However, for all of them you can use &lt;a href=&quot;../docs/current/security/secrets.html&quot;&gt;secrets support in
Trino&lt;/a&gt; to reference environment
variables instead of hardcoding passwords.&lt;/p&gt;
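
&lt;p&gt;As a sketch, the same catalog could pull its credentials from environment
variables with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;${ENV:...}&lt;/code&gt; secrets syntax. The variable names
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SQLSERVER_USER&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SQLSERVER_PASSWORD&lt;/code&gt; are just examples, and must be set in the
environment of every node in the cluster:&lt;/p&gt;

&lt;div class=&quot;language-properties highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;connector.name=sqlserver
connection-url=jdbc:sqlserver://&amp;lt;host&amp;gt;:&amp;lt;port&amp;gt;;database=&amp;lt;database&amp;gt;
connection-user=${ENV:SQLSERVER_USER}
connection-password=${ENV:SQLSERVER_PASSWORD}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;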

&lt;p&gt;When it comes to other connectors the details of securing a connection vary even
more. Ultimately the answer to how to secure the connection, and if that is even
possible, is the usual “It depends”. Luckily you can check the documentation for
each connector to find out more and ping us on Slack if you need more help.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://docs.starburst.io/blog/2022-03-15-delta-lake.html&quot;&gt;Starburst donates the Delta Lake connector to Trino&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/events/282794002/&quot;&gt;Operating Trino at Scale at Robinhood&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests</summary>

      
      
    </entry>
  
    <entry>
      <title>33: Trino becomes highly available for high demand</title>
      <link href="https://trino.io/episodes/33.html" rel="alternate" type="text/html" title="33: Trino becomes highly available for high demand" />
      <published>2022-02-17T00:00:00+00:00</published>
      <updated>2022-02-17T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/33</id>
      <content type="html" xml:base="https://trino.io/episodes/33.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Ramesh Bhanan, Vice President at &lt;a href=&quot;https://developer.gs.com/discover/home&quot;&gt;Goldman Sachs&lt;/a&gt;
  (&lt;a href=&quot;https://www.linkedin.com/in/ramesh-bhanan-byndoor/&quot;&gt;@ramesh-bhanan-byndoor&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Sambit Dikshit, Managing Director, Tech Fellow at &lt;a href=&quot;https://developer.gs.com/discover/home&quot;&gt;Goldman Sachs&lt;/a&gt;
  (&lt;a href=&quot;https://www.linkedin.com/in/sambitdixit/&quot;&gt;@sambitdixit&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Siddhant Chadha, Senior Data Engineer at &lt;a href=&quot;https://developer.gs.com/discover/home&quot;&gt;Goldman Sachs&lt;/a&gt;
 (&lt;a href=&quot;https://www.linkedin.com/in/siddhant-chadha-838136142/&quot;&gt;@siddhant-chadha&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Suman Baliganahalli Narayan Murthy, Vice President at &lt;a href=&quot;https://developer.gs.com/discover/home&quot;&gt;Goldman Sachs&lt;/a&gt;
 (&lt;a href=&quot;https://www.linkedin.com/in/suman-b-n-08-03-1990/&quot;&gt;@suman-b-n&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Sumit Halder, Vice President at &lt;a href=&quot;https://developer.gs.com/discover/home&quot;&gt;Goldman Sachs&lt;/a&gt;
  (&lt;a href=&quot;https://www.linkedin.com/in/sumit-halder-a3732482/&quot;&gt;@sumit-halder&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-369-370-and-371&quot;&gt;Releases 369, 370, and 371&lt;/h2&gt;

&lt;p&gt;Trino 369&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Experimental support for task level retries.&lt;/li&gt;
  &lt;li&gt;Support for groups in OAuth2 claims.&lt;/li&gt;
  &lt;li&gt;Column comments in ClickHouse connector.&lt;/li&gt;
  &lt;li&gt;Write Bloom filters in ORC files.&lt;/li&gt;
  &lt;li&gt;Procedure for optimizing Iceberg tables.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino 370&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add CLI support for ARM64.&lt;/li&gt;
  &lt;li&gt;Improved performance for ORC.&lt;/li&gt;
  &lt;li&gt;Improved performance for map and row types.&lt;/li&gt;
  &lt;li&gt;Reduced latency for OAuth2.0 authentication.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino 371&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for secrets and user group selector in resource group manager.&lt;/li&gt;
  &lt;li&gt;Support AWS role session name in S3 security mapping configuration.&lt;/li&gt;
  &lt;li&gt;Many bug fixes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Notes from Manfred&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for using PostgreSQL and Oracle as backend database for resource
groups.&lt;/li&gt;
  &lt;li&gt;Remove &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;spill-order-by&lt;/code&gt;,  &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;spill-window-operator&lt;/code&gt;, and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.max-total-memory-per-node&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Add support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER MATERIALIZED VIEW ... SET PROPERTIES&lt;/code&gt; in the engine.&lt;/li&gt;
  &lt;li&gt;Prevent hanging query execution on failures with phased execution policy.&lt;/li&gt;
  &lt;li&gt;Support for renaming schemas in PostgreSQL and Redshift connectors.&lt;/li&gt;
  &lt;li&gt;Lots of improvements on the ClickHouse connector, thanks Yuya!&lt;/li&gt;
  &lt;li&gt;Update to newer ClickHouse version removed support for Altinity 20.3.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$properties&lt;/code&gt; table and other hidden tables in Iceberg connector, including
docs.&lt;/li&gt;
  &lt;li&gt;Automatically adjust &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ulimit&lt;/code&gt; setting when using the RPM package.&lt;/li&gt;
  &lt;li&gt;Docker images changed to UBI.&lt;/li&gt;
  &lt;li&gt;Remove support/need for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;allow-drop-table&lt;/code&gt; catalog property in JDBC connectors.&lt;/li&gt;
  &lt;li&gt;A bunch of SPI changes.&lt;/li&gt;
  &lt;li&gt;DML with Iceberg connector with fault tolerant mode and more Tardigrade improvements.&lt;/li&gt;
  &lt;li&gt;Drop support for Kudu 1.13.0.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the &lt;a href=&quot;https://trino.io/docs/current/release/release-369.html&quot;&gt;Trino
369&lt;/a&gt;, &lt;a href=&quot;https://trino.io/docs/current/release/release-370.html&quot;&gt;Trino
370&lt;/a&gt;, and &lt;a href=&quot;https://trino.io/docs/current/release/release-371.html&quot;&gt;Trino
371&lt;/a&gt; release notes.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-month-high-availability-with-trino&quot;&gt;Concept of the month: High availability with Trino&lt;/h2&gt;

&lt;p&gt;Goldman Sachs uses Trino to reduce last-mile ETL, and provide a unified way of 
accessing data through federated joins. Making a variety of data sets from 
different sources available in one spot for their data science team was a tall 
order. Data must be quickly accessible to data consumers, and systems like Trino
must be reliable for users to trust this singular access point for their data.&lt;/p&gt;

&lt;p&gt;In order for analysts and data scientists to use these services, they first need
to trust in the system. It was vital to Goldman Sachs that Trino have high 
availability. In the event of any failure, another Trino cluster is available to
process requests.&lt;/p&gt;

&lt;h3 id=&quot;integrating-trino-into-the-goldman-sachs-internal-ecosystem&quot;&gt;Integrating Trino into the Goldman Sachs internal ecosystem&lt;/h3&gt;

&lt;p&gt;Before high availability was a concern, the team had to first integrate Trino to
meet their requirements. This included integrating with internal security 
systems, observability systems, and credential stores. It also meant
adding integration with their governance services that manage cataloguing
services and data discovery engines. Finally, while many of the Trino connectors
that the team intended to use exist, there were many missing features and 
performance enhancements that would lead to a better user experience and more 
adoption. The team has since taken it upon themselves to work on these features
and contribute them back to Trino. We will cover some of these contributions in
the PR segment of this show.&lt;/p&gt;

&lt;h3 id=&quot;achieving-scaling-and-high-availability&quot;&gt;Achieving scaling and high availability&lt;/h3&gt;

&lt;p&gt;Once the team had much of Trino running for some initial use cases, the next
step was to improve support for more simultaneous use cases and highly
concurrent workloads. The team wanted users to trust the system, so as they
scaled, the ability to run blue-green deployments, enable resource isolation,
and keep clusters highly available through failures became much more pertinent.&lt;/p&gt;

&lt;h3 id=&quot;trino-ecosystem-at-goldman-sachs&quot;&gt;Trino ecosystem at Goldman Sachs&lt;/h3&gt;

&lt;p&gt;Here is an overview of the Goldman Sachs ecosystem. It showcases the preexisting
services that needed to connect to Trino, the catalogs supported, and the method
in which Goldman Sachs achieves high availability through supporting multiple
clusters in various groups.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/33/trinoecosystem.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://developer.gs.com/blog/posts/enabling-highly-available-trino-clusters-at-goldman-sachs&quot;&gt;Goldman Sachs Blog&lt;/a&gt;
&lt;/p&gt;

&lt;h3 id=&quot;dynamic-query-routing&quot;&gt;Dynamic query routing&lt;/h3&gt;

&lt;p&gt;To ensure that all the clusters receive an even distribution of traffic, the
team created services that enable dynamic query routing across the different
cluster groups.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/33/trinodynamicqueryrouting.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://developer.gs.com/blog/posts/enabling-highly-available-trino-clusters-at-goldman-sachs&quot;&gt;Goldman Sachs Blog&lt;/a&gt;
&lt;/p&gt;

&lt;h3 id=&quot;query-routing-components&quot;&gt;Query routing components&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.envoyproxy.io/&quot;&gt;Envoy Proxy&lt;/a&gt; - open source edge and service proxy
that provides features such as routing, traffic management, load balancing, 
external authorization, rate limiting, and more.&lt;/li&gt;
&lt;/ul&gt;
&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/33/trinocontrolplane.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://developer.gs.com/blog/posts/enabling-highly-available-trino-clusters-at-goldman-sachs&quot;&gt;Goldman Sachs Blog&lt;/a&gt;
&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Cluster Groups - a cluster group is a set of Trino clusters that can
be assigned traffic by the router service.&lt;/li&gt;
  &lt;li&gt;Cluster Metadata Service - a service that provides the Envoy routers with all
the cluster-related configurations.&lt;/li&gt;
  &lt;li&gt;Router Service
    &lt;ul&gt;
      &lt;li&gt;Envoy Control Plane - The Envoy Control Plane is an xDS gRPC-based service
that is responsible for providing dynamic configurations to Envoy.&lt;/li&gt;
      &lt;li&gt;Upstream Cluster Selection - Envoy provides HTTP filters to parse and modify
both request and response headers. We use a custom Lua filter to parse the 
request and extract the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x-trino-user&lt;/code&gt; header. Then, we call the router 
service, which returns the upstream cluster address.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;
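
&lt;p&gt;A minimal sketch of such a Lua filter, assuming Envoy’s
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;envoy.filters.http.lua&lt;/code&gt; filter and a router service
registered as an upstream cluster named
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;router_service&lt;/code&gt;. The cluster name, path, and response
handling are illustrative, not the exact Goldman Sachs implementation:&lt;/p&gt;

&lt;div class=&quot;language-lua highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Called by Envoy for every request passing through the Lua HTTP filter.
function envoy_on_request(request_handle)
  -- Extract the Trino user set by the client.
  local user = request_handle:headers():get(&quot;x-trino-user&quot;)

  -- Ask the router service which upstream cluster should serve this user.
  local headers, body = request_handle:httpCall(
    &quot;router_service&quot;,
    {
      [&quot;:method&quot;] = &quot;GET&quot;,
      [&quot;:path&quot;] = &quot;/route?user=&quot; .. (user or &quot;&quot;),
      [&quot;:authority&quot;] = &quot;router_service&quot;
    },
    nil,  -- no request body
    500)  -- timeout in milliseconds

  -- Expose the chosen cluster in a header that the Envoy route
  -- configuration matches on to select the upstream Trino cluster.
  request_handle:headers():replace(&quot;x-upstream-cluster&quot;, body)
end
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;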

&lt;h2 id=&quot;pr-of-the-month-pr-8956-add-support-for-external-db-for-schema-management-in-mongodb-connector&quot;&gt;PR of the month: PR 8956 Add support for external db for schema management in MongoDB connector&lt;/h2&gt;

&lt;p&gt;This month’s &lt;a href=&quot;https://github.com/trinodb/trino/pull/8956&quot;&gt;PR of the month&lt;/a&gt; comes
from today’s guest Siddhant to solve &lt;a href=&quot;https://github.com/trinodb/trino/issues/8887&quot;&gt;this issue related to the MongoDB connector&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Siddhant created the issue in response to a common problem that MongoDB
connector users face when they don’t have write capability in the MongoDB system.
Since MongoDB has no implicit schema, Trino uses a schema definition that is
written to a special MongoDB database. This PR enables users without write access
to configure an external location to store their schema, avoiding this issue.&lt;/p&gt;

&lt;p&gt;Thanks, Siddhant, for raising this issue, as it’s one that beginners using
the MongoDB connector commonly face.&lt;/p&gt;

&lt;h2 id=&quot;bonus-pr-of-the-month-pr-8202-metadata-for-alias-in-elasticsearch-connector-only-uses-the-first-mapping&quot;&gt;Bonus PR of the month: PR 8202 Metadata for alias in Elasticsearch connector only uses the first mapping&lt;/h2&gt;

&lt;p&gt;This bonus &lt;a href=&quot;https://github.com/trinodb/trino/pull/8202&quot;&gt;PR of the month&lt;/a&gt; comes
from another one of today’s guests, Suman. It solves multiple issues, meaning 
this feature is in high demand!&lt;/p&gt;

&lt;p&gt;The problem brought up by these issues also has to do with how we map
schemas onto NoSQL databases that don’t implicitly have a schema. In this case
Elasticsearch stores its schema in an object called a mapping. This mapping can
be strict or dynamic for various portions of the document that gets inserted.
The object that correlates to a table in Elasticsearch is called an index. To
keep Elasticsearch fast, multiple indexes are created periodically to support a
given document type, similar to partitioning in a database. In general, these
indexes follow a very common mapping for a given type, but the reality is that
Elasticsearch allows you to vary from the mapping. Trino currently simplifies
the way this is done by only reading the first mapping and assuming that all
indexes and documents follow this schema. This pull request addresses this issue
by scanning a much larger sample of mappings and merging the schemas to handle
any conflicts. It then goes further to cache these merged mappings for a given
amount of time.&lt;/p&gt;

&lt;p&gt;Thanks for all of your continued work on this Suman! It will help a lot!&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-month-trino-fiddle-a-tool-for-easy-online-testing-and-sharing-of-trino-sql-problems-and-their-solutions&quot;&gt;Demo of the month: Trino Fiddle: A tool for easy online testing and sharing of Trino SQL problems and their solutions&lt;/h2&gt;

&lt;p&gt;This month’s demo showcases Trino Fiddle, a tool that Brian adapted from the
&lt;a href=&quot;http://sqlfiddle.com/&quot;&gt;SQL Fiddle&lt;/a&gt; tool. It allows Trino users to share problems
and answer questions that other Trino users are facing.&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-month-does-trino-support-carbondata&quot;&gt;Question of the month: Does Trino support CarbonData?&lt;/h2&gt;

&lt;p&gt;This month’s &lt;a href=&quot;https://www.trinoforum.org/t/142&quot;&gt;question of the month&lt;/a&gt; 
comes from &lt;a href=&quot;https://www.trinoforum.org/u/masayyed/summary&quot;&gt;Mahebub Sayyed&lt;/a&gt; on 
Trino Forum. Mahebub asks, “Does Trino support CarbonData?”&lt;/p&gt;

&lt;p&gt;The answer is a little tricky, but it can be done!&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://carbondata.apache.org/&quot;&gt;CarbonData&lt;/a&gt; currently maintains a connector 
called &lt;a href=&quot;https://mvnrepository.com/artifact/org.apache.carbondata/carbondata-presto&quot;&gt;carbondata-presto&lt;/a&gt; 
that works with an older version of Trino, version 333 (an io.prestosql version 
&lt;a href=&quot;https://trino.io/blog/2020/12/27/announcing-trino.html&quot;&gt;before the rename&lt;/a&gt;). 
Someone has already opened &lt;a href=&quot;https://github.com/apache/carbondata/pull/4198&quot;&gt;a PR to update this connector to a current Trino version&lt;/a&gt;, 
but work stalled in the middle of 2021 and it hasn’t made much progress 
recently.&lt;/p&gt;

&lt;p&gt;That being said, you could build and use 
&lt;a href=&quot;https://github.com/czy006/carbondata/tree/trino-358-alpha/integration/trino&quot;&gt;the Trino version of the connector&lt;/a&gt; 
this person was working on, and see if it works for you. If you are running on a 
version of Trino that is older than 351, you should be able to use the existing 
carbondata-presto connector.&lt;/p&gt;

&lt;p&gt;If anyone feels motivated, it would be wonderful if you could help get this 
contributed to the CarbonData project, or even work with them to have it land
in the Trino project!&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs and resources&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://developer.gs.com/blog/posts/enabling-highly-available-trino-clusters-at-goldman-sachs&quot;&gt;Enabling Highly Available Trino Clusters at Goldman Sachs&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=-5mlZGjt6H4&quot;&gt;Video: Building a Federated Cost-Effective Highly Efficient Query Platform&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://developer.gs.com/blog/posts&quot;&gt;Goldman Sachs Developer Blog&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.goldmansachs.com/careers/&quot;&gt;Goldman Sachs Careers Page&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/gsdeveloper&quot;&gt;Follow @GSDeveloper on Twitter&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests</summary>

      
      
    </entry>
  
    <entry>
      <title>Tardigrade Project Update</title>
      <link href="https://trino.io/blog/2022/02/16/tardigrade-project-update.html" rel="alternate" type="text/html" title="Tardigrade Project Update" />
      <published>2022-02-16T00:00:00+00:00</published>
      <updated>2022-02-16T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/02/16/tardigrade-project-update</id>
      <content type="html" xml:base="https://trino.io/blog/2022/02/16/tardigrade-project-update.html">&lt;p&gt;Over the last couple of months we’ve added support for full query retries, landed experimental support 
for task level retries and provided a proof of concept implementation of a distributed exchange plugin 
(description below). We are still working on improving scheduling algorithms as well as optimizing 
exchange plugin implementation to make the task level retries fully usable.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Here is a quick summary of our progress so far:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Added support for &lt;a href=&quot;https://github.com/trinodb/trino/pull/9361&quot;&gt;automatic query retries&lt;/a&gt;. This functionality 
is ready to use and can be enabled by setting the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;retry_policy=QUERY&lt;/code&gt; session property. Now 
&lt;a href=&quot;https://github.com/trinodb/trino/pull/10507&quot;&gt;it is possible&lt;/a&gt; to enable automatic retries for queries that 
produce more than &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;32MB&lt;/code&gt; of output. Dynamic filtering is now also 
&lt;a href=&quot;https://github.com/trinodb/trino/pull/10274&quot;&gt;fully supported&lt;/a&gt; with automatic query retries enabled.&lt;/li&gt;
  &lt;li&gt;Landed an &lt;a href=&quot;https://github.com/trinodb/trino/pull/9818&quot;&gt;initial set of changes&lt;/a&gt; to support task level retries. 
To enable task level retries, a plugin implementing the 
&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/core/trino-spi/src/main/java/io/trino/spi/exchange/ExchangeManager.java&quot;&gt;ExchangeManager&lt;/a&gt; 
interface has to be installed.&lt;/li&gt;
  &lt;li&gt;Landed a &lt;a href=&quot;https://github.com/trinodb/trino/pull/10823&quot;&gt;proof of concept implementation&lt;/a&gt; of the 
&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/core/trino-spi/src/main/java/io/trino/spi/exchange/ExchangeManager.java&quot;&gt;ExchangeManager&lt;/a&gt; 
interface. The implementation is fully functional, however we are still &lt;a href=&quot;https://github.com/trinodb/trino/issues/11050&quot;&gt;working on optimizing the read path&lt;/a&gt;. 
Also for now, only S3 compatible file systems are supported.&lt;/li&gt;
  &lt;li&gt;Added support for automatic retries in &lt;a href=&quot;https://github.com/trinodb/trino/issues/10252&quot;&gt;Hive&lt;/a&gt; and &lt;a href=&quot;https://github.com/trinodb/trino/pull/10622&quot;&gt;Iceberg&lt;/a&gt;. 
Supporting automatic retries for &lt;a href=&quot;https://github.com/trinodb/trino/issues/10254&quot;&gt;JDBC based connectors&lt;/a&gt; is up for grabs.&lt;/li&gt;
  &lt;li&gt;Implemented &lt;a href=&quot;https://github.com/trinodb/trino/pull/10837&quot;&gt;weight based split assignment&lt;/a&gt; for balanced work distribution between fault tolerant tasks.&lt;/li&gt;
  &lt;li&gt;Working on &lt;a href=&quot;https://github.com/trinodb/trino/pull/11023&quot;&gt;adaptive sizing strategy for intermediate tasks&lt;/a&gt; to minimize scheduling overhead 
while keeping the cost of a single task failure to a minimum.&lt;/li&gt;
  &lt;li&gt;Making progress on introducing an &lt;a href=&quot;https://github.com/trinodb/trino/pull/10432&quot;&gt;advanced memory aware scheduling&lt;/a&gt; that would allow us 
to better support memory intensive queries, improve resource utilization and ensure fair resource allocation between queries.&lt;/li&gt;
  &lt;li&gt;Started working on &lt;a href=&quot;https://github.com/trinodb/trino/issues/9935&quot;&gt;supporting dynamic filtering&lt;/a&gt; for queries with task level retries enabled.&lt;/li&gt;
  &lt;li&gt;Working on &lt;a href=&quot;https://github.com/trinodb/trino/issues/10734&quot;&gt;accommodating failed attempts&lt;/a&gt; in various internal statistics reported by 
the engine (e.g.: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;QueryInfo&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;QueryCompletedEvent&lt;/code&gt;). &lt;a href=&quot;https://github.com/trinodb/trino/issues/10754&quot;&gt;UI changes&lt;/a&gt; will come next.&lt;/li&gt;
&lt;/ul&gt;
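
&lt;p&gt;To make the retry setup above concrete, query level retries are controlled by 
the session property mentioned earlier. Here is a minimal sketch from the Trino 
CLI; the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tpch&lt;/code&gt; catalog is just an assumed example, and exact property 
values may change while the feature is experimental:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- enable automatic retries of the entire query on failure
SET SESSION retry_policy = &apos;QUERY&apos;;

-- subsequent queries in this session are retried automatically if they fail
select count(*) from tpch.tiny.orders;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Task level retries additionally require an exchange manager plugin to be 
installed, as described above.&lt;/p&gt;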

&lt;p&gt;Over the next couple of weeks we are planning to focus on:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/11050&quot;&gt;Optimizing read path for the reference implementation of the exchange plugin&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Landing &lt;a href=&quot;https://github.com/trinodb/trino/pull/10432&quot;&gt;memory aware scheduling for fault tolerant execution&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Landing &lt;a href=&quot;https://github.com/trinodb/trino/pull/11023&quot;&gt;adaptive sizing for intermediate tasks&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/10734&quot;&gt;Accommodating failed attempts into query statistics reporting&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Making progress on &lt;a href=&quot;https://github.com/trinodb/trino/issues/9935&quot;&gt;supporting dynamic filtering&lt;/a&gt; for queries with task level retries enabled&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The current state of development can be tracked by following this &lt;a href=&quot;https://github.com/trinodb/trino/issues/9101&quot;&gt;issue&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Stay tuned!&lt;/p&gt;</content>

      
        <author>
          <name>Andrii Rosa</name>
        </author>
      

      <summary>Over the last couple of months we’ve added support for full query retries, landed experimental support for task level retries and provided a proof of concept implementation of a distributed exchange plugin (description below). We are still working on improving scheduling algorithms as well as optimizing exchange plugin implementation to make the task level retries fully usable.</summary>

      
      
    </entry>
  
    <entry>
      <title>32: Trino Tardigrade: Try, try, and never die</title>
      <link href="https://trino.io/episodes/32.html" rel="alternate" type="text/html" title="32: Trino Tardigrade: Try, try, and never die" />
      <published>2022-01-20T00:00:00+00:00</published>
      <updated>2022-01-20T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/32</id>
      <content type="html" xml:base="https://trino.io/episodes/32.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Andrii Rosa, Software Engineer at &lt;a href=&quot;https://www.starburst.io&quot;&gt;Starburst&lt;/a&gt;
  (&lt;a href=&quot;https://www.linkedin.com/in/andrii-rosa-79578561/&quot;&gt;@andrii-rosa-79578561&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Brian Zhan, Product Manager at &lt;a href=&quot;https://www.starburst.io&quot;&gt;Starburst&lt;/a&gt;
  (&lt;a href=&quot;https://twitter.com/brianzhan1&quot;&gt;@brianzhan1&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Lukasz Osipiuk, Software Engineer at &lt;a href=&quot;https://www.starburst.io&quot;&gt;Starburst&lt;/a&gt;
  (&lt;a href=&quot;https://twitter.com/losipiuk&quot;&gt;@losipiuk&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Martin Traverso, Trino &amp;amp; Presto Co-founder and CTO at &lt;a href=&quot;https://www.starburst.io&quot;&gt;Starburst&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/mtraverso&quot;&gt;@mtraverso&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Zebing Lin, Software Engineer at &lt;a href=&quot;https://www.starburst.io&quot;&gt;Starburst&lt;/a&gt;
 (&lt;a href=&quot;https://www.linkedin.com/in/linzebing/&quot;&gt;@linzebing&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;trino-summit-2021&quot;&gt;Trino Summit 2021&lt;/h2&gt;

&lt;p&gt;If you missed &lt;a href=&quot;https://www.starburst.io/resources/trino-summit/&quot;&gt;Trino Summit 2021&lt;/a&gt;,
you can watch it on demand, for free!&lt;/p&gt;

&lt;h2 id=&quot;releases-367-and-368&quot;&gt;Releases 367 and 368&lt;/h2&gt;

&lt;p&gt;Martin’s official announcements merged into one:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Lineage tracking for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WITH&lt;/code&gt; clauses and subqueries.&lt;/li&gt;
  &lt;li&gt;Option to hide inaccessible columns in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT *&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;flush_metadata_cache()&lt;/code&gt; procedure for the Hive connector.&lt;/li&gt;
  &lt;li&gt;Improve performance of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DECIMAL&lt;/code&gt; type.&lt;/li&gt;
  &lt;li&gt;File-based access control for the Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TIME&lt;/code&gt; type in the SingleStore connector.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BINARY&lt;/code&gt; type in the Phoenix connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred’s additional notes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Prevent data loss on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DROP SCHEMA&lt;/code&gt; in Hive and Iceberg connectors.&lt;/li&gt;
  &lt;li&gt;New default query execution policy &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;phased&lt;/code&gt; brings performance improvements.&lt;/li&gt;
  &lt;li&gt;And finally, numerous smaller improvements around memory management and query
processing for our project Tardigrade.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the &lt;a href=&quot;https://trino.io/docs/current/release/release-367.html&quot;&gt;Trino
367&lt;/a&gt; and &lt;a href=&quot;https://trino.io/docs/current/release/release-368.html&quot;&gt;Trino
368&lt;/a&gt; release notes.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-month-introducing-project-tardigrade&quot;&gt;Concept of the month: Introducing Project Tardigrade&lt;/h2&gt;

&lt;p&gt;Before we jump right into the project, let’s cover some of the history of ETL and
data warehousing to better understand the problems that Tardigrade solves.&lt;/p&gt;

&lt;h3 id=&quot;why-do-people-want-to-do-etl-in-trino&quot;&gt;Why do people want to do ETL in Trino?&lt;/h3&gt;

&lt;p&gt;Trino is used for Extract, Transform, Load (ETL) workloads in many companies,
like Salesforce, Shopify, Slack, and older versions of Trino at Facebook.&lt;/p&gt;

&lt;p&gt;First, the most important thing is query speed. Queries run a lot faster in 
Trino. Open data stack technologies like Hive and Spark retry the query from 
intermediate checkpoints when something fails. However, there’s a performance 
cost to this. Trino has always been focused on delivering query results as 
quickly as possible. Now, Trino performs task-level retries, enabling failure 
recovery where needed for longer-running queries. More on this later 
though.&lt;/p&gt;

&lt;p&gt;Second, most companies have widely dispersed and fragmented data. It’s typical
for most companies to have different storage systems for different use cases.
This only becomes more commonplace when a merger and acquisition happens, and
you have a ton of data stored in yet another location. The acquiring company 
ends up having key information living in a bunch of different places. The net 
result is that the data engineer ends up spending weeks to write that simple 
dashboard. The data scientist trying to understand a trend gets impeded whenever
trying to draw data from a new source and gives up.&lt;/p&gt;

&lt;p&gt;Third, data engineers want to spend their time writing business logic, not 
moving SQL between engines. Unfortunately, this is where they end up spending 
much of their time. Many do their ad-hoc analytics in Trino, because it provides
a far more interactive experience than any other engine. If they don’t just use
Trino, they have a 1,000 line SQL ETL job that they now need to convert into
another dialect. You just need to search “convert Spark Presto SQL Stack 
Overflow” to see the numerous challenges that people face moving between 
engines.&lt;/p&gt;

&lt;p&gt;Whether it’s the optimizations in one engine not working in the other, a UDF in
Trino not existing in Spark, strange differences in the SQL dialect tripping 
people up, or being extremely difficult to debug, these factors always cause a 
delay in completing their tasks. Data engineers are especially paranoid about 
converting SQL correctly. Imagine reporting an incorrect revenue metric 
externally, billing a user of your platform the incorrect amount, or delivering
the wrong content to users due to any of these issues.&lt;/p&gt;

&lt;h3 id=&quot;why-are-people-reluctant-to-do-their-etl-in-trino&quot;&gt;Why are people reluctant to do their ETL in Trino?&lt;/h3&gt;

&lt;p&gt;Before the drive for big data and technologies like Hadoop showed up on the 
scene, systems like Teradata, Netezza, and Oracle were used to run ETL pipelines
in a largely offline manner. If a query failed, you simply had to restart it. 
Vendors would brag about the low failure rate of their systems.&lt;/p&gt;

&lt;p&gt;As Big Data came to the forefront, systems like the &lt;a href=&quot;https://static.googleusercontent.com/media/research.google.com/en//archive/gfs-sosp2003.pdf&quot;&gt;Google File System&lt;/a&gt;,
that largely inspired the design for the Hadoop Distributed File System, aimed 
to build large distributed systems that supported fault-tolerance. In essence,
faults were expected, and if a node in the system failed, no data would be lost.&lt;/p&gt;

&lt;p&gt;At this same time, compute and storage systems were becoming separate systems. 
Just as storage was built with fault-tolerance, compute systems like MapReduce
that processed and transformed data were also &lt;a href=&quot;https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf&quot;&gt;built with fault tolerance in mind&lt;/a&gt;.
Apache Hive is a syntax and metadata layer that enables generating MapReduce 
jobs without having to write code. Apache Spark came on the analytics scene
by &lt;a href=&quot;https://www.usenix.org/system/files/conference/nsdi12/nsdi12-final138.pdf&quot;&gt;introducing lineage&lt;/a&gt; 
as a way for engineers to have more control over how and when their datasets
are flushed to disk. This technique, while novel, still took a very pessimistic
view that allowing faults was the worst case scenario to avoid.&lt;/p&gt;

&lt;p&gt;When Trino was created, it was designed with speed in mind. Trino creators 
Martin, Dain, and David chose not to add fault-tolerance to Trino as they
recognized the tradeoff of fast analytics. Due to the nature of the streaming 
exchange in Trino, all tasks are interconnected. A failure of any task results in
a query failure. To support long-running queries, Trino has to be able to 
tolerate task failures.&lt;/p&gt;

&lt;p&gt;Having an all-or-nothing architecture makes it significantly more difficult to 
tolerate faults, regardless of how rare they are. The likelihood of a failure 
grows with the time it takes to complete a query. This risk also increases as 
the resource demands, such as memory requirements of a query, grow. It’s 
impossible to know the exact memory requirements for processing a query upfront.
In addition to increased likelihood of a failure, the impact of failing a long 
running query is much higher, as it often results in a significant waste of time
and resources.&lt;/p&gt;

&lt;p&gt;You may think all-or-nothing is a model destined to fail, especially when 
scaling to petabytes of data. On the contrary, Trino’s predecessor Presto was 
commonly used to execute batch workloads at this scale at Facebook. Even today,
companies like &lt;a href=&quot;https://medium.com/salesforce-engineering/how-to-etl-at-petabyte-scale-with-trino-5fe8ac134e36&quot;&gt;Salesforce&lt;/a&gt;, 
&lt;a href=&quot;https://www.starburst.io/resources/trino-summit/?wchannelid=2ug6mgs5ao&amp;amp;wmediaid=j1eq196a4y&quot;&gt;Doordash&lt;/a&gt;, 
and many others, use Trino at petabyte scale to handle ETL workloads. While it 
is possible to scale Trino to run petabyte-scale ETL pipelines, you really have
to know what you’re doing.&lt;/p&gt;

&lt;p&gt;Resource management is another challenge. Users don’t know exactly what 
resource utilization to expect from a query they submit. It is challenging to 
properly size the cluster and to avoid resource related failures.&lt;/p&gt;

&lt;p&gt;In essence, most people avoid using Trino for ETL because they lack the 
understanding of how to correctly configure Trino at scale.&lt;/p&gt;

&lt;h3 id=&quot;what-are-the-limitations-of-the-current-architecture&quot;&gt;What are the limitations of the current architecture?&lt;/h3&gt;

&lt;p&gt;In the current architecture Trino plans all tasks for processing a specific 
query upfront. These tasks interconnect with one another as the results from
one task are the input for the next. This interdependency is necessary, but 
if any task fails along the way, it breaks the entire chain.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/32/interconnected-tasks.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;Data is streamed through the task graph with no intermediate checkpointing. The 
only state of query execution is the internal, volatile state of the operators 
running within tasks.&lt;/p&gt;

&lt;p&gt;As stated before, this architecture has advantages. Most notably high throughput
and low latency. Yet it implies some limitations too. Probably the most natural
one is that it does not allow for granular failure recovery. If one of the tasks
dies, there is no way to restart processing from some intermediate state. The 
only option is to rerun the whole query from the very beginning.&lt;/p&gt;

&lt;p&gt;The other notable limitation is around memory consumption. With static task 
placement we have little control over resource utilization on nodes.&lt;/p&gt;

&lt;p&gt;Finally, the current architecture makes many decisions upfront during query
planning. The engine creates a query plan based on incomplete data using table 
statistics, or blindly, if statistics are not available. After the coordinator 
creates the plan and query processing has started, there aren’t many ways to 
adapt, even though much more information becomes available at runtime. For 
example, we cannot change the number of tasks for a stage. If we observe data 
skew, we can’t move tasks away from the overworked node so that the affected 
tasks have more resources at hand. We cannot change the plan for a subquery if 
we notice that a decision already made is not optimal.&lt;/p&gt;

&lt;h3 id=&quot;trino-engine-improvements-with-project-tardigrade&quot;&gt;Trino engine improvements with Project Tardigrade&lt;/h3&gt;

&lt;p&gt;Project Tardigrade aims to break the all-or-nothing execution barriers. It opens
many new opportunities around resource management, adaptive query optimization,
and failure recovery. We will use a technique called spooling that stores 
intermediate data in an efficient buffering layer at stage boundaries. The 
buffer stores intermediate results for the duration of a query or a stage, 
depending on the context. The project is named after the microscopic &lt;a href=&quot;https://en.wikipedia.org/wiki/Tardigrade&quot;&gt;Tardigrades&lt;/a&gt;
that are the world’s most indestructible creatures, akin to the resiliency we 
are adding to Trino.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/32/tardigrade-logo.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;Buffering intermediate results makes it possible to execute queries iteratively.
For example, the engine can process one or several tasks at a time, effectively 
reducing memory pressure, and allow memory intensive queries to succeed without 
a need to expand the cluster. Tardigrade can significantly lower the cost of 
operation, especially when only a small number of queries 
requires more memory than is available.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/32/tardigrade-buffers.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h4 id=&quot;adaptive-planning&quot;&gt;Adaptive planning&lt;/h4&gt;

&lt;p&gt;The engine may also decide to re-optimize the query at stage boundaries. When
the engine buffers the intermediate data, it is possible to get better insight
into the nature of the data as it’s processed and adapt query plans accordingly.
For example, when the cost based optimizer makes a bad decision, because of 
incorrect statistics or estimates, it can pick the wrong type of join, or a 
suboptimal join order. The engine can then suspend the query, re-optimize the 
plan, and resume processing. Additionally, it may allow the engine to discover 
skewed datasets, and change query plans accordingly. This may significantly 
improve efficiency and landing time for workloads that are &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JOIN&lt;/code&gt; heavy.&lt;/p&gt;

&lt;h4 id=&quot;resource-management&quot;&gt;Resource management&lt;/h4&gt;

&lt;p&gt;Iterative query processing allows us to be more flexible at resource management.
Resource allocation can be adjusted as the queries run. For example, when a 
cluster is idle, we may allow a single query to utilize all available resources
on a cluster. When more workload kicks in, the resource allocation for the 
initial query can be gradually reduced, and available resources can be granted
to newly submitted workloads. With this model it is also significantly easier to
implement auto scaling. When the submitted workload requires more resources than
currently available in the cluster, the engine can request more nodes. Or the
opposite, if the cluster is underutilized it is easier to return resources when 
there’s no need to wait for slow running tasks. Being able to better manage 
available resources, and adjust the resource pool based on the current workload 
submitted, would make the engine significantly more cost effective.&lt;/p&gt;

&lt;h4 id=&quot;fine-grained-failure-recovery&quot;&gt;Fine-grained failure recovery&lt;/h4&gt;

&lt;p&gt;Last, but not least, with project Tardigrade we are going to provide 
fine-grained failure recovery. The buffering introduced at stage boundaries 
allows for a transparent restart of failed tasks. Fine grained failure recovery
would make completion time for ETL pipelines significantly more predictable. 
Also, it opens the opportunity of running ETL workloads on much cheaper, widely 
available spot instances that can further optimize operational costs.&lt;/p&gt;

&lt;h3 id=&quot;opportunities-that-tardigrade-opens&quot;&gt;Opportunities that Tardigrade opens&lt;/h3&gt;

&lt;p&gt;In summary, in Project Tardigrade we work on the following improvements to Trino:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Predictable query completion times.&lt;/li&gt;
  &lt;li&gt;The ability to scale up or down to match the workload at runtime.&lt;/li&gt;
  &lt;li&gt;Fine grained resource management.&lt;/li&gt;
  &lt;li&gt;Non-homogenous hardware.&lt;/li&gt;
  &lt;li&gt;Adaptive resource limits for tasks.&lt;/li&gt;
  &lt;li&gt;Graceful Shutdown improvement.&lt;/li&gt;
  &lt;li&gt;Cheaper compute costs using spot instances that have weaker availability guarantees.&lt;/li&gt;
  &lt;li&gt;Adaptive query replanning during runtime as context changes.&lt;/li&gt;
  &lt;li&gt;Handling of situations where certain tasks are affected by data skew.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;efficient-exchange-data-buffering-implementation&quot;&gt;Efficient exchange data buffering implementation&lt;/h3&gt;

&lt;p&gt;This all sounds incredible, but it raises the question of how best to implement
these buffers. Enabling task-level retries requires us to store intermediate 
exchange data in a “distributed buffer”. Careful design is needed to minimize 
the impact buffering has on query performance.&lt;/p&gt;

&lt;p&gt;A naive implementation is to use cloud object storage as intermediate storage.
This allows you to scale without maintaining a separate service. This is the 
initial option we are using as a prototype buffer. It is intended as a 
proof-of-concept and should be good enough for small clusters of ten to twenty
nodes. This option can be slow and won’t support high-cardinality exchanges. The
number of files grows quadratically with the number of partitions. Trino then 
has to keep track of the metadata of all these files in order to plan and schedule
which tasks require which files for the query. With a high number of files, 
there is a memory cost to hold that metadata. There is also a penalty for the time
and bandwidth it takes on the network to list them all. This is the well-known 
“many small files” problem in big data.&lt;/p&gt;

&lt;h4 id=&quot;distributed-memory-with-spilling-as-a-buffer&quot;&gt;Distributed memory with spilling as a buffer&lt;/h4&gt;

&lt;p&gt;This solution requires a long-running managed service, but improves performance.
Depending on the design we choose, we can use write-ahead buffers to output data 
belonging to the same partition and provide sequential I/O to downstream tasks.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;70%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/32/buffer-implementation.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-month-task-retries-with-project-tardigrade&quot;&gt;Demo of the month: Task retries with Project Tardigrade&lt;/h2&gt;

&lt;p&gt;In this month’s demo, Zebing showcases task retries using Project Tardigrade 
after throwing his EC2 instance out the window! See what happens next…&lt;/p&gt;

&lt;div class=&quot;youtube-video-container&quot;&gt;
  &lt;iframe width=&quot;702&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/Tnd-QsDCd2Q&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h2 id=&quot;pr-of-the-month-pr-10319-trino-lineage-fails-for-aliasedrelation&quot;&gt;PR of the month: PR 10319 Trino lineage fails for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AliasedRelation&lt;/code&gt;&lt;/h2&gt;

&lt;p&gt;This month’s &lt;a href=&quot;https://github.com/trinodb/trino/pull/10319&quot;&gt;PR of the month&lt;/a&gt; was
created to resolve &lt;a href=&quot;https://github.com/trinodb/trino/issues/10272&quot;&gt;an issue&lt;/a&gt; 
reported by Lyft Data Infrastructure Engineer, Arup Malakar (&lt;a href=&quot;https://github.com/amalakar&quot;&gt;@amalakar&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Arup reported that Trino lineage fails to capture upstream columns when a join
and a transformation are used together. More generally, this issue applied to any 
column used with a function whose arguments come from an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AliasedRelation&lt;/code&gt;. Starburst
engineer Praveen Krishna (&lt;a href=&quot;https://github.com/Praveen2112&quot;&gt;@Praveen2112&lt;/a&gt;) 
resolved the issue two days later, and with the help of Arup and the Lyft team,
verified that the fix works!&lt;/p&gt;

&lt;p&gt;Thanks to both Arup and Praveen for the fix!&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-month-how-do-you-cast-json-to-varchar-with-trino&quot;&gt;Question of the month: How do you cast JSON to varchar with Trino?&lt;/h2&gt;

&lt;p&gt;This month’s &lt;a href=&quot;https://stackoverflow.com/questions/70701325&quot;&gt;question of the month&lt;/a&gt; 
comes from &lt;a href=&quot;https://stackoverflow.com/users/10924136&quot;&gt;Borislav Blagoev&lt;/a&gt; on Stack
Overflow. He asks, “How do you cast JSON to varchar with Trino?”&lt;/p&gt;

&lt;p&gt;This was answered by &lt;a href=&quot;https://stackoverflow.com/users/2501279&quot;&gt;Guru Stron&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;Use &lt;a href=&quot;https://trino.io/docs/current/functions/json.html#json_format&quot;&gt;json_format&lt;/a&gt;/
&lt;a href=&quot;https://trino.io/docs/current/functions/json.html#json_parse&quot;&gt;json_parse&lt;/a&gt; to handle json object conversions instead of casting:&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;select json_parse(&apos;{&quot;property&quot;: 1}&apos;) objstring_to_json, json_format(json &apos;{&quot;property&quot;: 2}&apos;) jsonobj_to_string
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Output:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;objstring_to_json&lt;/th&gt;
      &lt;th&gt;jsonobj_to_string&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;{&quot;property&quot;:1}&lt;/td&gt;
      &lt;td&gt;{&quot;property&quot;:2}&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
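&lt;p&gt;The reason a plain cast is not recommended: in Trino, casting a JSON string to
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;varchar&lt;/code&gt; removes the surrounding quotes, while
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;json_format&lt;/code&gt; preserves the JSON text exactly. A small
illustrative query:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;select cast(json &apos;&quot;abc&quot;&apos; as varchar) cast_result, json_format(json &apos;&quot;abc&quot;&apos;) format_result
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The cast should return the bare string &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;abc&lt;/code&gt;,
while &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;json_format&lt;/code&gt; keeps the quotes:
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&quot;abc&quot;&lt;/code&gt;.&lt;/p&gt;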

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs and resources&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/salesforce-engineering/how-to-etl-at-petabyte-scale-with-trino-5fe8ac134e36&quot;&gt;How to ETL at Petabyte-Scale with Trino&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino 2021 Wrapped: A Year of Growth</title>
      <link href="https://trino.io/blog/2021/12/31/trino-2021-a-year-of-growth.html" rel="alternate" type="text/html" title="Trino 2021 Wrapped: A Year of Growth" />
      <published>2021-12-31T00:00:00+00:00</published>
      <updated>2021-12-31T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/12/31/trino-2021-a-year-of-growth</id>
      <content type="html" xml:base="https://trino.io/blog/2021/12/31/trino-2021-a-year-of-growth.html">&lt;p&gt;As we reflect on Trino’s journey in 2021, one thing stands out. Compared to 
previous years, growth accelerated even further. Yes, that is what every
year-in-retrospect blog post says, but here it carries special significance.
This week marked the one-year anniversary since the 
project &lt;a href=&quot;https://trino.io/blog/2020/12/27/announcing-trino.html&quot;&gt;dropped the Presto name and moved to the Trino name&lt;/a&gt;.
Immediately after the announcement, the &lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;Trino GitHub repository&lt;/a&gt;
started trending in number of stargazers. Up until this point, the PrestoSQL
GitHub repository had only amassed 1,600 stargazers in the two years since it 
had split from the PrestoDB repository. However, within four months after the 
renaming, the number of stargazers had doubled. GitHub stars, issues, pull 
requests and commits started growing at a new trajectory.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p align=&quot;center&quot;&gt;
 &lt;a href=&quot;https://twitter.com/bitsondatadev/status/1344028682126565381&quot; target=&quot;_blank&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;/assets/blog/2021-review/trending.png&quot; /&gt;
 &lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;At the time of writing, we just hit 4,600 stargazers on GitHub. This means we 
have grown by over 3,000 stargazers in the last year, a 187% increase. While we 
are on the subject, let’s talk about the health of the Trino community.&lt;/p&gt;

&lt;h2 id=&quot;2021-by-the-numbers&quot;&gt;2021 by the numbers&lt;/h2&gt;

&lt;p&gt;Let’s take a look at the Trino project growth by the numbers:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;3679 new commits 💻 in GitHub&lt;/li&gt;
  &lt;li&gt;3015 new stargazers ⭐ in GitHub&lt;/li&gt;
  &lt;li&gt;2450 new members 👋 in Slack&lt;/li&gt;
  &lt;li&gt;1979 pull requests merged ✅ in GitHub&lt;/li&gt;
  &lt;li&gt;1213 issues 📝 created in GitHub&lt;/li&gt;
  &lt;li&gt;988 new followers 🐦 on Twitter&lt;/li&gt;
  &lt;li&gt;525 average weekly members 💬 in Slack&lt;/li&gt;
  &lt;li&gt;491 new subscribers 📺 in YouTube&lt;/li&gt;
  &lt;li&gt;23 Trino Community Broadcast ▶️ episodes&lt;/li&gt;
  &lt;li&gt;17 Trino 🚀 releases&lt;/li&gt;
  &lt;li&gt;13 blog ✍️ posts&lt;/li&gt;
  &lt;li&gt;10 Trino 🍕 meetups&lt;/li&gt;
  &lt;li&gt;1 Trino ⛰️ Summit&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Along with the growth we’ve seen on GitHub, &lt;a href=&quot;https://twitter.com/trinodb&quot;&gt;the Trino Twitter&lt;/a&gt; 
account grew its followers by 47% this year. &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;The Trino Slack community&lt;/a&gt;,
where much of the troubleshooting and development discussion happens, saw a
75% growth, nearing 6,000 members. Finally, &lt;a href=&quot;https://www.youtube.com/c/TrinoDB&quot;&gt;the Trino YouTube channel&lt;/a&gt;
has seen an impressive 280% growth in subscribers.&lt;/p&gt;

&lt;p&gt;A lot of the increase on this channel was due to the &lt;a href=&quot;/broadcast/&quot;&gt;Trino Community Broadcast&lt;/a&gt;, 
which brought users and contributors from the community together to cover 23
episodes on the following topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;7 episodes on the Trino ecosystem (dbt, Amundsen, Debezium, Superset)&lt;/li&gt;
  &lt;li&gt;4 episodes on the Trino project (Renaming Trino, Intro to Trino, Trinewbies)&lt;/li&gt;
  &lt;li&gt;4 episodes on Trino connectors (Iceberg, Druid, Pinot)&lt;/li&gt;
  &lt;li&gt;4 episodes on Trino internals (Distributed Hash-Joins, Dynamic Filtering, Views)&lt;/li&gt;
  &lt;li&gt;2 episodes on Trino using Kubernetes (Trinetes series)&lt;/li&gt;
  &lt;li&gt;2 episodes on Trino users (LinkedIn, Resurface)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While stargazers, subscribers, episodes, and followers tell the story of the 
growing awareness of the Trino project with the new name, what about the actual
rate of development on the project?&lt;/p&gt;

&lt;p&gt;At the start of the year, there were 21,924 commits. This year, we pushed 3,679 
commits to the repository, sitting at over 25,600 now. Looking at the graph, this
keeps us pretty consistent with 2020’s throughput.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;75%&quot; src=&quot;/assets/blog/2021-review/commits.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;With the project’s trajectory displayed in numbers, let’s examine the top 
features that landed in Trino this year.&lt;/p&gt;

&lt;h2 id=&quot;features&quot;&gt;Features&lt;/h2&gt;

&lt;p&gt;Here’s a high-level list of the most exciting features that made their way into
Trino in 2021. For details, and to keep up with new releases, check out the &lt;a href=&quot;https://trino.io/docs/current/release.html&quot;&gt;release notes&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;sql-language-improvements&quot;&gt;SQL language improvements&lt;/h3&gt;

&lt;p&gt;SQL language support is crucial for the increasing complexities of queries and 
usage of Trino. In 2021 we added numerous new language features and 
improvements:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2021/05/19/row_pattern_matching.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt;&lt;/a&gt;,
a feature that allows for complex analysis across multiple rows. To learn more 
about this feature, watch &lt;a href=&quot;/episodes/23.html&quot;&gt;the Community Broadcast show&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/sql/select.html#window-clause&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WINDOW&lt;/code&gt;&lt;/a&gt; clause.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2021/03/10/introducing-new-window-features.html#new%20features&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RANGE&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROWS&lt;/code&gt;&lt;/a&gt;
keywords for use within a window function.&lt;/li&gt;
  &lt;li&gt;Time travel support and syntax, like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FOR VERSION AS OF&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FOR TIMESTAMP AS OF&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/sql/update.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt;&lt;/a&gt; is supported.&lt;/li&gt;
  &lt;li&gt;Subquery expressions that return multiple columns. Example: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT x = (VALUES (1, &apos;a&apos;))&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Add support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER MATERIALIZED VIEW&lt;/code&gt; … &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RENAME TO&lt;/code&gt; …&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/functions/geospatial.html#from_geojson_geometry&quot;&gt;from_geojson_geometry/to_geojson_geometry&lt;/a&gt; functions.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/functions/ipaddress.html#ip-address-contains&quot;&gt;contains&lt;/a&gt; 
function for checking if a CIDR contains an IP address.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/functions/aggregate.html#listagg&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;listagg&lt;/code&gt;&lt;/a&gt;
function that returns concatenated values separated by a specified separator.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/functions/string.html#soundex&quot;&gt;soundex&lt;/a&gt; function
that checks phonetic similarity of two strings.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/functions/conversion.html#format_number&quot;&gt;format_number&lt;/a&gt; function.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/sql/set-time-zone.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SET TIME ZONE&lt;/code&gt;&lt;/a&gt; to set the
 current time zone for the session.&lt;/li&gt;
  &lt;li&gt;Arbitrary queries in &lt;a href=&quot;https://trino.io/docs/current/sql/show-stats.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW STATS&lt;/code&gt;&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CURRENT_CATALOG&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CURRENT_SCHEMA&lt;/code&gt; session functions.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TRUNCATE TABLE&lt;/code&gt; which allows for a more efficient delete.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DENY&lt;/code&gt; statement, which enables you to remove a user’s or group’s access via SQL.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IN &amp;lt;catalog&amp;gt;&lt;/code&gt; clause to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE ROLE&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DROP ROLE&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GRANT ROLE&lt;/code&gt;, 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;REVOKE ROLE&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SET ROLE&lt;/code&gt; to specify the target catalog of the statement 
instead of using the current session catalog.&lt;/li&gt;
&lt;/ul&gt;
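&lt;p&gt;As a small taste of these additions, here is an illustrative query using the
new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;listagg&lt;/code&gt; function:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;select listagg(name, &apos;, &apos;) within group (order by name) names
from (values &apos;banana&apos;, &apos;apple&apos;, &apos;cherry&apos;) t(name)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This should return a single row with the value
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;apple, banana, cherry&lt;/code&gt;.&lt;/p&gt;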

&lt;h3 id=&quot;query-processing-improvements&quot;&gt;Query processing improvements&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Added support for automatic query retries (this feature is very experimental
with some limitations for now).&lt;/li&gt;
  &lt;li&gt;Transparent query retries.&lt;/li&gt;
  &lt;li&gt;Updated the behavior of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROW&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JSON&lt;/code&gt; cast to produce &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JSON&lt;/code&gt; objects instead
of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JSON&lt;/code&gt; arrays.&lt;/li&gt;
  &lt;li&gt;Column and table lineage tracking in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;QueryCompletedEvent&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
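&lt;p&gt;As an illustration of the changed &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROW&lt;/code&gt; to
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JSON&lt;/code&gt; cast, field names now become JSON object keys:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;select cast(cast(row(1, &apos;a&apos;) as row(x integer, y varchar)) as json) row_as_json
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This should now produce the object &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;{&quot;x&quot;:1,&quot;y&quot;:&quot;a&quot;}&lt;/code&gt;
rather than the array &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[1,&quot;a&quot;]&lt;/code&gt;.&lt;/p&gt;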

&lt;h2 id=&quot;performance-improvements&quot;&gt;Performance improvements&lt;/h2&gt;

&lt;p&gt;Improved performance for the following operations:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Querying Parquet data for files containing column indexes.&lt;/li&gt;
  &lt;li&gt;Reading dictionary-encoded Parquet files.&lt;/li&gt;
  &lt;li&gt;Queries using &lt;a href=&quot;https://trino.io/docs/current/functions/window.html#rank&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rank()&lt;/code&gt;&lt;/a&gt; window function.&lt;/li&gt;
  &lt;li&gt;Queries using &lt;a href=&quot;https://trino.io/docs/current/functions/aggregate.html#sum&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sum()&lt;/code&gt;&lt;/a&gt;
and &lt;a href=&quot;https://trino.io/docs/current/functions/aggregate.html#avg&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;avg()&lt;/code&gt;&lt;/a&gt; for 
decimal types.&lt;/li&gt;
  &lt;li&gt;Queries using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt; with single grouping column.&lt;/li&gt;
  &lt;li&gt;Aggregation on decimal values.&lt;/li&gt;
  &lt;li&gt;Evaluation of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WHERE&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; clause.&lt;/li&gt;
  &lt;li&gt;Computing the product of decimal values with precision larger than 19.&lt;/li&gt;
  &lt;li&gt;Queries that process row or array data.&lt;/li&gt;
  &lt;li&gt;Queries that contain a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DISTINCT&lt;/code&gt; clause.&lt;/li&gt;
  &lt;li&gt;Reduced memory usage and improved performance of joins.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY LIMIT&lt;/code&gt; performance was improved when data was pre-sorted.&lt;/li&gt;
  &lt;li&gt;Node-local dynamic filtering.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;security&quot;&gt;Security&lt;/h2&gt;

&lt;p&gt;Added the following improvements and features relevant for authentication, 
authorization and integration with other security systems:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Automatic configuration of TLS for 
&lt;a href=&quot;https://trino.io/docs/current/security/internal-communication.html&quot;&gt;secure internal communication&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Handling of Server Name Indication (SNI) for multiple TLS certificates.
This removes the need to provision per-worker TLS certificates.&lt;/li&gt;
  &lt;li&gt;Access control for materialized views.&lt;/li&gt;
  &lt;li&gt;OAuth2/OIDC &lt;a href=&quot;https://trino.io/docs/current/security/oauth2.html&quot;&gt;opaque access tokens&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Configuring HTTP proxy for OAuth2 authentication.&lt;/li&gt;
  &lt;li&gt;Configuring &lt;a href=&quot;https://trino.io/docs/current/security/authentication-types.html#multiple-password-authenticators&quot;&gt;multiple password authentication plugins&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Hiding inaccessible columns from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT *&lt;/code&gt; statement.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;data-sources&quot;&gt;Data Sources&lt;/h2&gt;

&lt;h3 id=&quot;bigquery-connector&quot;&gt;BigQuery connector&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Added &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE TABLE&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DROP TABLE&lt;/code&gt; support.&lt;/li&gt;
  &lt;li&gt;Added support for case insensitive name matching for BigQuery views.&lt;/li&gt;
  &lt;li&gt;Support reading &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bignumeric&lt;/code&gt; type whose precision is less than or equal to 
38.&lt;/li&gt;
  &lt;li&gt;Added support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE SCHEMA&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DROP SCHEMA&lt;/code&gt; statements.&lt;/li&gt;
  &lt;li&gt;Improved support for BigQuery datetime and timestamp types.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;cassandra-connector&quot;&gt;Cassandra connector&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Mapped Cassandra &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;uuid&lt;/code&gt; type to Trino &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;uuid&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Added support for Cassandra &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tuple&lt;/code&gt; type.&lt;/li&gt;
  &lt;li&gt;Changed minimum number of speculative executions from two to one.&lt;/li&gt;
  &lt;li&gt;Support for reading user-defined types.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;clickhouse-connector&quot;&gt;ClickHouse connector&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Added &lt;a href=&quot;https://trino.io/docs/current/connector/clickhouse.html&quot;&gt;ClickHouse connector&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Improved performance of aggregation queries by computing aggregations within 
ClickHouse. Currently, the following aggregate functions are eligible for
pushdown: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;count&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;min&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;max&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sum&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;avg&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Added support for dropping columns.&lt;/li&gt;
  &lt;li&gt;Map ClickHouse &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UUID&lt;/code&gt; columns as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UUID&lt;/code&gt; type in Trino instead of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VARCHAR&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;hdfs-s3-azure-and-cloud-object-storage-systems&quot;&gt;HDFS, S3, Azure and cloud object storage systems&lt;/h3&gt;

&lt;p&gt;A core use case of Trino is connecting to a data lake with the Hive and
Iceberg connectors. These connectors differ from most, as Trino acts as the sole
query engine rather than delegating queries to another system. Here are some of
the changes made to these connectors:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Enabled Glue statistics to support better query planning when using AWS.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt; support for ACID tables&lt;/li&gt;
  &lt;li&gt;A lot of Hive view improvements.&lt;/li&gt;
  &lt;li&gt;Parquet column indexes.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;target_max_file_size&lt;/code&gt; configuration to control the file size of data written
by Trino.&lt;/li&gt;
  &lt;li&gt;Streaming uploads to S3 by default to improve performance and reduce disk usage.&lt;/li&gt;
  &lt;li&gt;Improved performance for tables with small files and partitioned tables.&lt;/li&gt;
  &lt;li&gt;Transparent redirection from a Hive catalog to Iceberg catalog if the table is
an Iceberg table.&lt;/li&gt;
  &lt;li&gt;Updated to Iceberg 0.11.0 behavior for transforms of dates and timestamps
before 1970.&lt;/li&gt;
  &lt;li&gt;Added procedure &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;system.flush_metadata_cache()&lt;/code&gt; to flush metadata caches.&lt;/li&gt;
  &lt;li&gt;Avoid generating splits for empty files.&lt;/li&gt;
  &lt;li&gt;Sped up Iceberg query performance when dynamic filtering can be leveraged.&lt;/li&gt;
  &lt;li&gt;Increased Iceberg performance when reading timestamps from Parquet files.&lt;/li&gt;
  &lt;li&gt;Improved Iceberg performance for queries on nested data through dereference
pushdown.&lt;/li&gt;
  &lt;li&gt;Added support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT OVERWRITE&lt;/code&gt; operations on S3-backed tables.&lt;/li&gt;
  &lt;li&gt;Made the Iceberg &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;uuid&lt;/code&gt; type available.&lt;/li&gt;
  &lt;li&gt;Trino views made available in Iceberg.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;elasticsearch-connector&quot;&gt;Elasticsearch connector&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Added support for reading fields as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;json&lt;/code&gt; values.&lt;/li&gt;
  &lt;li&gt;Fixed failure when documents contain fields of unsupported types.&lt;/li&gt;
  &lt;li&gt;Added support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;scaled_float&lt;/code&gt; type.&lt;/li&gt;
  &lt;li&gt;Added support for assuming an IAM role.&lt;/li&gt;
  &lt;li&gt;Added retry requests with backoff when Elasticsearch is overloaded.&lt;/li&gt;
  &lt;li&gt;Better support for Elastic Cloud.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;mongodb-connector&quot;&gt;MongoDB connector&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Added &lt;a href=&quot;https://trino.io/docs/current/connector/mongodb.html#timestamp_objectid&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;timestamp_objectid()&lt;/code&gt;&lt;/a&gt;
function.&lt;/li&gt;
  &lt;li&gt;Enabled &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mongodb.socket-keep-alive&lt;/code&gt; config property by default.&lt;/li&gt;
  &lt;li&gt;Add support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;json&lt;/code&gt; type.&lt;/li&gt;
  &lt;li&gt;Support reading MongoDB &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DBRef&lt;/code&gt; type.&lt;/li&gt;
  &lt;li&gt;Allow skipping creation of an index for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_schema&lt;/code&gt; collection, if it 
already exists.&lt;/li&gt;
  &lt;li&gt;Added support to redact the value of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mongodb.credentials&lt;/code&gt; in the server log.&lt;/li&gt;
  &lt;li&gt;Added support for dropping columns.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;mysql-connector&quot;&gt;MySQL connector&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Added support for reading and writing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;timestamp&lt;/code&gt; values with precision higher
than three.&lt;/li&gt;
  &lt;li&gt;Added support for predicate pushdown on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;timestamp&lt;/code&gt; columns.&lt;/li&gt;
  &lt;li&gt;Exclude an internal &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sys&lt;/code&gt; schema from schema listings.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;pinot-connector&quot;&gt;Pinot connector&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Updated Pinot connector to be compatible with versions &amp;gt;= 0.8.0 and drop 
support for older versions.&lt;/li&gt;
  &lt;li&gt;Added support for pushdown of filters on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;varbinary&lt;/code&gt; columns to Pinot.&lt;/li&gt;
  &lt;li&gt;Fixed incorrect results for queries that contain aggregations and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IN&lt;/code&gt; and 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NOT IN&lt;/code&gt; filters over varchar columns.&lt;/li&gt;
  &lt;li&gt;Fixed failure for queries with filters on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;real&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;double&lt;/code&gt; columns having 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;+Infinity&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-Infinity&lt;/code&gt; values.&lt;/li&gt;
  &lt;li&gt;Implemented aggregation pushdown.&lt;/li&gt;
  &lt;li&gt;Allowed HTTPS URLs in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pinot.controller-urls&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;phoenix-connector&quot;&gt;Phoenix connector&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Phoenix 5 support was added.&lt;/li&gt;
  &lt;li&gt;Reduced memory usage for some queries.&lt;/li&gt;
  &lt;li&gt;Improved performance by adding ability to parallelize queries within Trino.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;features-added-to-various-connectors&quot;&gt;Features added to various connectors&lt;/h3&gt;

&lt;p&gt;In addition to the above some more features were added that apply to connectors
that use common code. These features improve performance using:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-352.html#mysql-connector&quot;&gt;Statistical aggregate function pushdown &lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-353.html&quot;&gt;TopN pushdown and join pushdown&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-353.html&quot;&gt;Improved planning times by reducing number of connections opened&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-356.html&quot;&gt;Improved performance by improving metadata caching hit rate&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-357.html&quot;&gt;Rule based identifier mapping support&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-360.html&quot;&gt;DELETE, non-transactional inserts and write-batch-size &lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-361.html&quot;&gt;Metadata cache max size&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-365.html&quot;&gt;TRUNCATE TABLE&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-366.html&quot;&gt;Improved handling of Gregorian - Julian switch for date type&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Ensured correctness when pushing down predicates and topN to remote system 
that is case-insensitive or sorts differently from Trino.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;runtime-improvements&quot;&gt;Runtime improvements&lt;/h2&gt;

&lt;p&gt;There are a lot of performance improvements to list from the &lt;a href=&quot;https://trino.io/docs/current/release.html&quot;&gt;release notes&lt;/a&gt;.
Here are a few examples:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improved coordinator CPU utilization.&lt;/li&gt;
  &lt;li&gt;Improved query performance by reducing CPU overhead of repartitioning data 
across worker nodes.&lt;/li&gt;
  &lt;li&gt;Reduced graceful shutdown time for worker nodes.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;everything-else&quot;&gt;Everything else&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/admin/event-listeners-http.html&quot;&gt;HTTP Event listener&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Added support for ARM64 in the &lt;a href=&quot;https://hub.docker.com/r/trinodb/trino&quot;&gt;Trino Docker image&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Added &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;clear&lt;/code&gt; command to the Trino CLI to clear the screen.&lt;/li&gt;
  &lt;li&gt;Improved tab completion for the Trino CLI.&lt;/li&gt;
  &lt;li&gt;Custom connector metrics.&lt;/li&gt;
  &lt;li&gt;Fixed many, many, many bugs!&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;trino-summit&quot;&gt;Trino Summit&lt;/h2&gt;

&lt;p&gt;In 2021 we also enjoyed a successful inaugural Trino Summit, hosted by 
Starburst, with well over 500 attendees. There were wonderful talks
given at this event from companies like Doordash, EA, LinkedIn, Netflix, 
Robinhood, StreamNative, and Tabular. If you missed this event, we have the 
&lt;a href=&quot;https://www.starburst.io/resources/trino-summit/&quot;&gt;recordings and slides available&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;As a teaser, the event started with Commander Bun Bun playing guitar to AC/DC’s
“Back In Black”.&lt;/p&gt;

&lt;iframe src=&quot;https://www.youtube.com/embed/c_qUp0SGeKE&quot; width=&quot;800&quot; height=&quot;500&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; 
margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; 
&lt;/iframe&gt;

&lt;h2 id=&quot;renaming-from-prestosql-to-trino&quot;&gt;Renaming from PrestoSQL to Trino&lt;/h2&gt;

&lt;p&gt;As mentioned above, we renamed the project this year. What followed was an 
outpouring of support, and some shock, from the larger tech community. Community 
members immediately got to work. The project had to move practically overnight 
from the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;io.prestosql&lt;/code&gt; namespace to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;io.trino&lt;/code&gt;, and a 
&lt;a href=&quot;https://trino.io/blog/2021/01/04/migrating-from-prestosql-to-trino.html&quot;&gt;migration blog post&lt;/a&gt;
was published. Because the Linux Foundation moved quickly to enforce the Presto
trademark, users had to adapt just as quickly.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
 &lt;a href=&quot;https://twitter.com/trinodb/status/1343330429684703232?s=20&quot; target=&quot;_blank&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;100%&quot; src=&quot;/assets/blog/2021-review/tweets.png&quot; /&gt;
 &lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;This &lt;a href=&quot;https://stackoverflow.com/questions/67414714&quot;&gt;confused many in the community&lt;/a&gt;,
especially once the old PrestoSQL accounts were taken down by the
Linux Foundation. The &lt;a href=&quot;https://prestosql.io&quot;&gt;https://prestosql.io&lt;/a&gt; site had broken documentation links,
JDBC urls had to change from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jdbc:presto&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jdbc:trino&lt;/code&gt;, header protocol
names had to be changed from prefix &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;X-Presto-&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;X-Trino-&lt;/code&gt;, and various other
user impacting changes had to be made in the matter of weeks. Even the legacy 
Docker images were removed from the &lt;a href=&quot;https://hub.docker.com/r/prestosql/presto&quot;&gt;prestosql/presto Docker repository&lt;/a&gt;,
causing disruptions for many users who immediately had to upgrade to the 
&lt;a href=&quot;https://hub.docker.com/r/trinodb/trino&quot;&gt;trinodb/trino Docker repository&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We reached out to multiple projects to have them update their compatibility
for Trino.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/dbeaver/dbeaver/pull/10925&quot;&gt;DBeaver&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/pinterest/querybook/issues/509&quot;&gt;QueryBook&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/Homebrew/homebrew-core/pull/83185&quot;&gt;Homebrew&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/dbt-labs/dbt-presto/issues/39&quot;&gt;dbt&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/dungdm93/sqlalchemy-trino/issues/20&quot;&gt;sqlalchemy&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/sqlpad/sqlpad/pull/974&quot;&gt;sqlpad&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/apache/superset/pull/13105&quot;&gt;Apache Superset&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/getredash/redash/pull/5411&quot;&gt;Redash&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/akullpp/awesome-java/pull/917&quot;&gt;Awesome Java&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/MunGell/awesome-for-beginners/pull/933&quot;&gt;Awesome For Beginners&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/apache/airflow/pull/15187&quot;&gt;Airflow&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/lyft/presto-gateway/issues/134&quot;&gt;trino-gateway&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/metabase/metabase/issues/17532&quot;&gt;Metabase&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;and so much more…&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Despite the breaking changes, once the immediate hurdles were behind us, the
community was not only excited and supportive about the brand change, but
especially enamored with the new mascot. Our adorable bunny was soon
after &lt;a href=&quot;/episodes/10.html&quot;&gt;named Commander Bun Bun by the community&lt;/a&gt;.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
 &lt;a href=&quot;https://twitter.com/jtannady/status/1346888143459545092&quot; target=&quot;_blank&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;/assets/blog/2021-review/cbb.png&quot; /&gt;
 &lt;/a&gt;
&lt;/p&gt;

&lt;h2 id=&quot;2022-roadmap-project-tardigrade&quot;&gt;2022 Roadmap: Project Tardigrade&lt;/h2&gt;

&lt;p&gt;One of the interesting developments that came out of Trino Summit was a feature
Trino co-creator, Martin, talked about in &lt;a href=&quot;https://www.starburst.io/resources/trino-summit/?wchannelid=2ug6mgs5ao&amp;amp;wmediaid=o264qw85dj&quot;&gt;the State of Trino presentation&lt;/a&gt;.
He proposed adding granular fault-tolerance and features to improve performance
in the core engine. While Trino has been proven to run batch analytics workloads
at scale, many have avoided long-running batch jobs for fear of a query failure.
The fault-tolerance feature is a first step toward first-class support in Trino
for long-running batch queries at massive scale.&lt;/p&gt;

&lt;p&gt;The granular fault-tolerance is being thoughtfully crafted to maintain the
speed advantage that Trino has over other query engines, while increasing the
resiliency of queries. In other words, rather than restarting an entire query
when it runs out of resources or fails for any other reason, only a subset of
the query is retried. To support this, intermediate stage data is persisted to
replicated RAM or SSD.&lt;/p&gt;

&lt;p&gt;&lt;a title=&quot;Schokraie E, Warnken U, Hotz-Wagenblatt A, Grohme MA, Hengherr S, et al. (2012), CC BY 2.5 &amp;lt;https://creativecommons.org/licenses/by/2.5&amp;gt;, via Wikimedia Commons&quot; href=&quot;https://commons.wikimedia.org/wiki/File:SEM_image_of_Milnesium_tardigradum_in_active_state_-_journal.pone.0045682.g001-2.png&quot;&gt;&lt;img width=&quot;512&quot; alt=&quot;SEM image of Milnesium tardigradum in active state - journal.pone.0045682.g001-2&quot; src=&quot;https://upload.wikimedia.org/wikipedia/commons/thumb/c/cd/SEM_image_of_Milnesium_tardigradum_in_active_state_-_journal.pone.0045682.g001-2.png/512px-SEM_image_of_Milnesium_tardigradum_in_active_state_-_journal.pone.0045682.g001-2.png&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The project to introduce granular fault-tolerance into Trino is called
Project Tardigrade. It is a focus for many contributors now, and we will
introduce you to the details in the coming months. The project is named after
the microscopic tardigrades, the world’s most indestructible creatures, a nod
to the resiliency we are adding to Trino’s queries. We look forward to telling
you more as features unfold.&lt;/p&gt;

&lt;p&gt;Along with Project Tardigrade comes a series of changes focused on faster
performance in the query engine using columnar evaluation, adaptive planning,
and better scheduling for SIMD and GPU processors. We will also be working on
dynamically resolved functions, MERGE support, time travel queries in data lake
connectors, Java 17, improved caching mechanisms, and much, much more!&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;In summary, living this first year under the banner of Trino was nothing short
of a wild endeavor. Any engineer knows that naming things is hard, and renaming
things is all the more difficult.&lt;/p&gt;

&lt;p&gt;As we head into 2022, we can be certain of one thing. Trino will be reaching
into newer areas of development and breaking norms just as it did as Presto in
previous eras. The addition of native fault-tolerance to a lightning-fast query
engine will bring Trino to a new level of adoption. Keep your eyes peeled for
more about Project Tardigrade.&lt;/p&gt;

&lt;p&gt;Along with Project Tardigrade, we are looking forward to another year filled
with features, issues, and suggestions from our amazing and passionate community.
Thank you all for an incredible year. We can’t wait to see what you all bring in
2022!&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen, Martin Traverso, Manfred Moser</name>
        </author>
      

      <summary>As we reflect on Trino’s journey in 2021, one thing stands out. Compared to previous years we have seen even further accelerated, tremendous growth. Yes, this is what all these year-in-retrospect blog posts say, but this has some special significance to it. This week marked the one-year anniversary since the project dropped the Presto name and moved to the Trino name. Immediately after the announcement, the Trino GitHub repository started trending in number of stargazers. Up until this point, the PrestoSQL GitHub repository had only amassed 1,600 stargazers in the two years since it had split from the PrestoDB repository. However, within four months after the renaming, the number of stargazers had doubled. GitHub stars, issues, pull requests and commits started growing at a new trajectory.</summary>

      
      
    </entry>
  
    <entry>
      <title>31: Trinites II: Trino on AWS Kubernetes Service</title>
      <link href="https://trino.io/episodes/31.html" rel="alternate" type="text/html" title="31: Trinites II: Trino on AWS Kubernetes Service" />
      <published>2021-12-16T00:00:00+00:00</published>
      <updated>2021-12-16T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/31</id>
      <content type="html" xml:base="https://trino.io/episodes/31.html">&lt;h2 id=&quot;trino-summit-2021&quot;&gt;Trino Summit 2021&lt;/h2&gt;

&lt;p&gt;If you missed &lt;a href=&quot;https://www.starburst.io/resources/trino-summit/&quot;&gt;Trino Summit 2021&lt;/a&gt;,
you can watch it on demand, for free!&lt;/p&gt;

&lt;h2 id=&quot;releases-365-and-366&quot;&gt;Releases 365 and 366&lt;/h2&gt;

&lt;p&gt;Martin’s official announcement mentioned the following highlights:&lt;/p&gt;

&lt;p&gt;Trino 365&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Aggregations in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TRUNCATE TABLE&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Compatibility with Pinot 0.8.0&lt;/li&gt;
  &lt;li&gt;HTTP proxy support for OAuth2 authentication&lt;/li&gt;
  &lt;li&gt;Many improvements to Iceberg connector&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Release notes: &lt;a href=&quot;https://trino.io/docs/current/release/release-365.html&quot;&gt;https://trino.io/docs/current/release/release-365.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Trino 366&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for automatic query retries&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DENY&lt;/code&gt; security rules&lt;/li&gt;
  &lt;li&gt;Performance optimizations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Release notes: &lt;a href=&quot;https://trino.io/docs/current/release/release-366.html&quot;&gt;https://trino.io/docs/current/release/release-366.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Manfred’s additional notes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Cool new SQL like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TRUNCATE TABLE&lt;/code&gt; and support for time travel&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;contains&lt;/code&gt; function for IP check in CIDR&lt;/li&gt;
  &lt;li&gt;Lots of performance and correctness fixes on Hive and Iceberg connectors&lt;/li&gt;
  &lt;li&gt;Drop support for old Pinot versions&lt;/li&gt;
  &lt;li&gt;Support for Hive to Iceberg redirects&lt;/li&gt;
  &lt;li&gt;Automatic TLS for internal communication&lt;/li&gt;
  &lt;li&gt;Support for Java 17&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And a last note, full Java 17 support is becoming a reality.&lt;/p&gt;

&lt;p&gt;More detailed information is available in the &lt;a href=&quot;https://trino.io/docs/current/release/release-365.html&quot;&gt;365&lt;/a&gt;
and &lt;a href=&quot;https://trino.io/docs/current/release/release-366.html&quot;&gt;366&lt;/a&gt; release notes.&lt;/p&gt;

&lt;p&gt;To play around with query retries, you need to set the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;retry_policy&lt;/code&gt; session
property to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;QUERY&lt;/code&gt; with the following command: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SET SESSION retry_policy=QUERY;&lt;/code&gt;&lt;/p&gt;

&lt;h2 id=&quot;log4shell&quot;&gt;Log4Shell&lt;/h2&gt;

&lt;p&gt;There’s a new vulnerability in town that has the potential to affect Java
projects that use some Log4j2 versions. It is called Log4Shell, and it does not
affect Trino. Read &lt;a href=&quot;https://trino.io/blog/2021/12/13/log4shell-does-not-affect-trino.html&quot;&gt;the blog for more details&lt;/a&gt;.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;/assets/episode/31/log4shell.jpeg&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-month-replicasets-deployments-and-services&quot;&gt;Concept of the month: ReplicaSets, Deployments, and Services&lt;/h2&gt;

&lt;p&gt;In &lt;a href=&quot;/episodes/24.html&quot;&gt;the first installment of Trinetes&lt;/a&gt;, we talked about what 
containerization is and why we use it. We covered the difference between tools
like docker-compose and container orchestration systems like Kubernetes (k8s).
Finally, we went over the first k8s object called a &lt;a href=&quot;https://kubernetes.io/docs/concepts/workloads/pods/&quot;&gt;&lt;em&gt;pod&lt;/em&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;As a reminder, a pod is the basic unit of deployment in a k8s cluster. In this
episode, we cover how to scale, deploy, and connect these pods. If you are 
missing some context, you should review &lt;a href=&quot;/episodes/24.html&quot;&gt;the first installment of this series&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;replicasets&quot;&gt;ReplicaSets&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Replicas&lt;/em&gt; are one or more instances created from the same pod definition. In k8s,
the object used to manage replication is a &lt;a href=&quot;https://kubernetes.io/docs/concepts/workloads/controllers/replicaset/&quot;&gt;&lt;em&gt;ReplicaSet&lt;/em&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;ReplicaSets provide high availability by managing multiple instances based on a
pod definition in the k8s cluster. Kubernetes automatically brings up replacements
for any pod instances in a ReplicaSet that go down, based on the number of
replicas you specify in the definition.&lt;/p&gt;

&lt;p&gt;Replication also enables load balancing IO traffic over multiple pods. You gain 
the flexibility to scale up or down as traffic increases or decreases without 
any downtime.&lt;/p&gt;

&lt;p&gt;To scale the number of pods in a live ReplicaSet, you can update the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;replicas&lt;/code&gt;
value in the ReplicaSet definition file, then run the following command to
apply it:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl replace -f replicaset-definition.yml
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You can also edit the live ReplicaSet without changing the local file:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl edit replicaset &amp;lt;replicaset-name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
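&lt;p&gt;For reference, a minimal &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;replicaset-definition.yml&lt;/code&gt; might look like the
following sketch; the names and labels here are only placeholders:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: trino-worker
  labels:
    app: trino
spec:
  replicas: 2
  selector:
    matchLabels:
      app: trino
  template:
    metadata:
      labels:
        app: trino
    spec:
      containers:
        - name: trino-worker
          image: trinodb/trino:latest
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Alternatively, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;kubectl scale replicaset &amp;lt;replicaset-name&amp;gt; --replicas=4&lt;/code&gt; changes
the replica count directly without editing any file.&lt;/p&gt;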

&lt;h3 id=&quot;labels-and-selectors&quot;&gt;Labels and selectors&lt;/h3&gt;

&lt;p&gt;Kubernetes objects have &lt;a href=&quot;https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/&quot;&gt;labels&lt;/a&gt; 
which are just key/value properties used to identify and dynamically group k8s
objects. Labels should be meaningful and relevant to k8s users to easily 
comprehend things like which application, version, component, and environment 
certain objects belong to. Labels are shared across instances, and so they are 
not unique.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors&quot;&gt;Selectors&lt;/a&gt;
specify a grouping of labels used to target a set of objects when deploying or
applying other operations over those objects. For example, a ReplicaSet
identifies the set of pods it manages with its selector. When creating the
ReplicaSet, k8s creates the pods defined in the ReplicaSet’s template. If the
pods crash, k8s brings up new pods and associates the new pods with the
ReplicaSet.&lt;/p&gt;
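&lt;p&gt;As a quick illustration, the same label syntax works on the command line to
filter objects; the label values here assume the labels used by the Trino Helm
chart shown later in this episode:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl get pods -l app=trino,component=worker
kubectl get all --selector release=tcb
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;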

&lt;h3 id=&quot;deployments&quot;&gt;Deployments&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://kubernetes.io/docs/concepts/workloads/controllers/deployment/&quot;&gt;Deployment&lt;/a&gt;
objects allow you to take a ReplicaSet and perform actions on that set,
such as creation, rolling updates, rollbacks, pod updates, and so on.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/31/deployment.png&quot; /&gt;&lt;br /&gt;
 Source: https://www.udemy.com/course/learn-kubernetes/
&lt;/p&gt;

&lt;p&gt;The best way to start making sense of these concepts is to look at the k8s
configuration files. You can render them from the Trino Helm chart with:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;helm template tcb trino/trino --version 0.3.0
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Below is the generated deployment configuration,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino/templates/deployment-worker.yaml&lt;/code&gt;, with comments that delineate which
object each section of the configuration defines.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#-------------------------Deployment-----------------------------
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tcb-trino-worker
  labels:
    app: trino
    chart: trino-0.3.0
    release: tcb
    heritage: Helm
    component: worker
spec:
#-------------------------ReplicaSet-----------------------------
  replicas: 2
  selector:
    matchLabels:
      app: trino
      release: tcb
      component: worker
  template:
#----------------------------Pod---------------------------------
    metadata:
      labels:
        app: trino
        release: tcb
        component: worker
    spec:
      volumes:
        - name: config-volume
          configMap:
            name: tcb-trino-worker
        - name: catalog-volume
          configMap:
            name: tcb-trino-catalog
      imagePullSecrets:
        - name: registry-credentials
      containers:
        - name: trino-worker
          image: &quot;trinodb/trino:latest&quot;
          imagePullPolicy: IfNotPresent
          env:
            []
          volumeMounts:
            - mountPath: /etc/trino
              name: config-volume
            - mountPath: /etc/trino/catalog
              name: catalog-volume
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /v1/info
              port: http
          readinessProbe:
            httpGet:
              path: /v1/info
              port: http
          resources:
            {}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;configmap&quot;&gt;ConfigMap&lt;/h3&gt;

&lt;p&gt;You may have noticed that the pods define volumes that refer to an
object called a &lt;a href=&quot;https://kubernetes.io/docs/concepts/configuration/configmap/&quot;&gt;&lt;em&gt;ConfigMap&lt;/em&gt;&lt;/a&gt;.
This is a way to store non-confidential data in the form of key-value pairs.&lt;/p&gt;

&lt;p&gt;ConfigMaps are how the Trino chart loads the &lt;a href=&quot;https://trino.io/docs/current/installation/deployment.html#configuring-trino&quot;&gt;Trino configurations&lt;/a&gt; 
in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/etc/trino&lt;/code&gt; directory on the containers. The ConfigMap file, 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino/templates/configmap-worker.yaml&lt;/code&gt;, defines the files loaded into the 
worker nodes. The only real difference between the worker and coordinator
ConfigMaps is in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;config.properties&lt;/code&gt; file, which specifies whether the node is a coordinator.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;apiVersion: v1
kind: ConfigMap
metadata:
  name: tcb-trino-worker
  labels:
    app: trino
    chart: trino-0.3.0
    release: tcb
    heritage: Helm
    component: worker
data:
  node.properties: |
    node.environment=production
    node.data-dir=/data/trino
    plugin.dir=/usr/lib/trino/plugin

  jvm.config: |
    -server
    -Xmx8G
    -XX:+UseG1GC
    -XX:G1HeapRegionSize=32M
    -XX:+UseGCOverheadLimit
    -XX:+ExplicitGCInvokesConcurrent
    -XX:+HeapDumpOnOutOfMemoryError
    -XX:+ExitOnOutOfMemoryError
    -Djdk.attach.allowAttachSelf=true
    -XX:-UseBiasedLocking
    -XX:ReservedCodeCacheSize=512M
    -XX:PerMethodRecompilationCutoff=10000
    -XX:PerBytecodeRecompilationCutoff=10000
    -Djdk.nio.maxCachedBufferSize=2000000

  config.properties: |
    coordinator=false
    http-server.http.port=8080
    query.max-memory=4GB
    query.max-memory-per-node=1GB
    query.max-total-memory-per-node=2GB
    memory.heap-headroom-per-node=1GB
    discovery.uri=http://tcb-trino:8080

  log.properties: |
    io.trino=INFO
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The only other ConfigMap defines the &lt;a href=&quot;https://trino.io/docs/current/installation/deployment.html#catalog-properties&quot;&gt;catalog properties files&lt;/a&gt;
in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/etc/trino/catalog&lt;/code&gt; folder. This ConfigMap only defines two catalogs.
They expose the TPC-H and TPC-DS benchmark datasets.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;apiVersion: v1
kind: ConfigMap
metadata:
  name: tcb-trino-catalog
  labels:
    app: trino
    chart: trino-0.3.0
    release: tcb
    heritage: Helm
    role: catalogs
data:
  tpch.properties: |
    connector.name=tpch
    tpch.splits-per-node=4
  tpcds.properties: |
    connector.name=tpcds
    tpcds.splits-per-node=4
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
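&lt;p&gt;Once the cluster is reachable, you can verify these catalogs from the
&lt;a href=&quot;https://trino.io/docs/current/client/cli.html&quot;&gt;Trino CLI&lt;/a&gt;; this sketch assumes a
tunnel to the coordinator on port 8080:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;trino --server http://localhost:8080 --catalog tpch --schema tiny
trino:tiny&gt; SELECT count(*) FROM nation;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;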

&lt;h3 id=&quot;networking&quot;&gt;Networking&lt;/h3&gt;

&lt;p&gt;Unlike in the Docker world, where containers run directly on the host and can
be exposed from it, pods in a k8s cluster run in a private network.
Kubernetes exposes the internal IP address of the pod via the IP address of the
k8s node and a unique port.&lt;/p&gt;

&lt;p&gt;While these IP addresses can be used to address pods internally, it’s not a
good idea, as they are dynamic and subject to change upon termination and
recreation. Instead, you set up routing that handles addressing via pod name
rather than IP address.&lt;/p&gt;

&lt;p&gt;When you have multiple k8s nodes, you have multiple IP addresses set up for
the nodes. The routing software must be set up to handle the assignment of the
internal networks to each node to avoid conflicts across the cluster. This type
of functionality exists in cloud services, such as Amazon EKS, Google GKE, and
Azure AKS.&lt;/p&gt;

&lt;h3 id=&quot;services&quot;&gt;Services&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://kubernetes.io/docs/concepts/services-networking/service/&quot;&gt;&lt;em&gt;Services&lt;/em&gt;&lt;/a&gt; 
establish connectivity between different pods and can make pods available 
from the external k8s node IP address. This enables loose coupling between 
microservices in applications.&lt;/p&gt;

&lt;p&gt;The above example shows a NodePort service. There are three service types.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;ClusterIP - the service creates a virtual IP inside the cluster to enable 
communication between different services. This service is the default when you
don’t specify a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;type&lt;/code&gt; value under &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;spec&lt;/code&gt; in the configuration.&lt;/li&gt;
  &lt;li&gt;NodePort - the service exposes the internal address of a pod using the IP 
address and port of the node it is running on.&lt;/li&gt;
  &lt;li&gt;Load Balancer - the service creates a load balancer for the application in 
supported cloud providers. We won’t cover this one, but it is used when 
we create our cluster in EKS using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;eksctl&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here’s a diagram of the ClusterIP networking between different ReplicaSets.&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/31/clusterip.png&quot; /&gt;&lt;br /&gt;
 Source: https://www.udemy.com/course/learn-kubernetes/
&lt;/p&gt;

&lt;p&gt;NodePorts establish connectivity to a specific ReplicaSet of pod instances. They
cannot provide a generically accessible IP address for services to communicate
with one another.&lt;/p&gt;

&lt;p&gt;In our case, we configure an external IP address for the coordinator.
The Helm chart defines a ClusterIP service to accomplish this. Notice the
selector targets the Trino app, the release label, and only the coordinator 
component, which we know is one node.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;apiVersion: v1
kind: Service
metadata:
  name: tcb-trino
  labels:
    app: trino
    chart: trino-0.3.0
    release: tcb
    heritage: Helm
spec:
  type: ClusterIP
  ports:
    - port: 8080
      targetPort: http
      protocol: TCP
      name: http
  selector:
    app: trino
    release: tcb
    component: coordinator
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;nodeport&quot;&gt;NodePort&lt;/h3&gt;

&lt;p&gt;The &lt;a href=&quot;https://kubernetes.io/docs/concepts/services-networking/service/#type-nodeport&quot;&gt;&lt;em&gt;NodePort&lt;/em&gt;&lt;/a&gt;
service type creates a proxy service that forwards traffic from a specific port
on the node to the pod.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/31/service.png&quot; /&gt;&lt;br /&gt;
 Source: https://www.udemy.com/course/learn-kubernetes/
&lt;/p&gt;

&lt;p&gt;There are three ports when setting up a NodePort.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;TargetPort - the port number on the pod itself, to which the service forwards traffic.&lt;/li&gt;
  &lt;li&gt;Port - the port used by the service.&lt;/li&gt;
  &lt;li&gt;NodePort - the port that is exposed by the worker node and made available 
externally. NodePorts can only be in the range of 30000 - 32767.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The only required port to set is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;port&lt;/code&gt;. By default &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;targetPort&lt;/code&gt; is the 
same as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;port&lt;/code&gt; and nodePort is automatically assigned a free port in the 
allowed range. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ports&lt;/code&gt; is also an array which is why the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-&lt;/code&gt; char is used.&lt;/p&gt;
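&lt;p&gt;To make the three ports concrete, here is a hedged sketch of a NodePort
service for the coordinator; it is not part of the Helm chart, and the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nodePort&lt;/code&gt; value is an arbitrary choice in the allowed range:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;apiVersion: v1
kind: Service
metadata:
  name: tcb-trino-nodeport
spec:
  type: NodePort
  ports:
    - port: 8080        # port used by the service
      targetPort: 8080  # port on the pod the service forwards to
      nodePort: 30080   # port exposed externally on the k8s node
  selector:
    app: trino
    release: tcb
    component: coordinator
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;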

&lt;h3 id=&quot;amazon-eks-elastic-kubernetes-service&quot;&gt;Amazon EKS (Elastic Kubernetes Service)&lt;/h3&gt;

&lt;p&gt;Amazon EKS is a managed container service to run and scale Kubernetes
applications in the cloud. EKS provides k8s clusters in the cloud without you
having to manage the whole k8s platform yourself. Unlike with
your own k8s cluster, you can’t log into the control plane node in EKS, although
you won’t need to. You can access the workers, which are usually EC2 instances.&lt;/p&gt;

&lt;p&gt;There are &lt;a href=&quot;https://docs.aws.amazon.com/eks/latest/userguide/create-cluster.html&quot;&gt;many steps involved in setting up a Kubernetes cluster&lt;/a&gt; 
on EKS, unless you use a simple command line tool called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;eksctl&lt;/code&gt; that
provisions the cluster for you.&lt;/p&gt;

&lt;h3 id=&quot;eksctl&quot;&gt;eksctl&lt;/h3&gt;

&lt;p&gt;From the &lt;a href=&quot;https://eksctl.io/&quot;&gt;eksctl website&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;eksctl&lt;/code&gt; is a simple CLI tool for creating and managing clusters on EKS - 
Amazon’s managed Kubernetes service for EC2. It is written in Go, uses 
CloudFormation, was created by Weaveworks and it welcomes contributions from 
the community. Create a basic cluster in minutes with just one command.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2 id=&quot;demo-of-the-month-deploy-trino-k8s-to-amazon-eks&quot;&gt;Demo of the month: Deploy Trino k8s to Amazon EKS&lt;/h2&gt;

&lt;p&gt;First, you’ll need to install the following tools if you haven’t done so already:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/weaveworks/eksctl&quot;&gt;eksctl&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://kubernetes.io/docs/tasks/tools/&quot;&gt;kubectl&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://helm.sh/docs/intro/install/&quot;&gt;helm&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then you need to add your IAM credentials to the 
&lt;a href=&quot;https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html#cli-configure-files-where&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;~/.aws/credentials&lt;/code&gt; file&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Check the latest k8s version that is available on EKS.
&lt;a href=&quot;https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html&quot;&gt;https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html&lt;/a&gt;&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;eksctl create cluster \
 --name tcb-cluster \
 --version 1.21 \
 --region us-east-1 \
 --nodegroup-name k8s-tcb-cluster \
 --node-type t2.large \
 --nodes 2
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The command completes in 10 to 15 minutes. This is the first output you
see:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;2021-12-16 01:25:17 [ℹ]  eksctl version 0.76.0
2021-12-16 01:25:17 [ℹ]  using region us-east-1
2021-12-16 01:25:17 [ℹ]  setting availability zones to [us-east-1a us-east-1e]
2021-12-16 01:25:17 [ℹ]  subnets for us-east-1a - public:192.168.0.0/19 private:192.168.64.0/19
2021-12-16 01:25:17 [ℹ]  subnets for us-east-1e - public:192.168.32.0/19 private:192.168.96.0/19
2021-12-16 01:25:17 [ℹ]  nodegroup &quot;k8s-tcb-cluster&quot; will use &quot;&quot; [AmazonLinux2/1.21]
2021-12-16 01:25:17 [ℹ]  using Kubernetes version 1.21
2021-12-16 01:25:17 [ℹ]  creating EKS cluster &quot;tcb-cluster&quot; in &quot;us-east-1&quot; region with managed nodes
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;After some time, you notice that two EC2 instances have come up. The final
output of the tool should look like this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;2021-12-16 02:00:17 [ℹ]  waiting for at least 2 node(s) to become ready in &quot;k8s-tcb-cluster&quot;
2021-12-16 02:00:17 [ℹ]  nodegroup &quot;k8s-tcb-cluster&quot; has 2 node(s)
2021-12-16 02:00:17 [ℹ]  node &quot;ip-192-168-2-123.ec2.internal&quot; is ready
2021-12-16 02:00:17 [ℹ]  node &quot;ip-192-168-55-167.ec2.internal&quot; is ready
2021-12-16 02:00:18 [ℹ]  kubectl command should work with &quot;~/.kube/config&quot;, try &apos;kubectl get nodes&apos;
2021-12-16 02:00:18 [✔]  EKS cluster &quot;tcb-cluster&quot; in &quot;us-east-1&quot; region is ready
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Take special note that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;eksctl&lt;/code&gt; overwrote your k8s configuration to point to
the EKS cluster instead of a local cluster. To test that you can connect, run:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl get nodes
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You should see two nodes running. From here, everything is simple: to install
Trino, reuse the same Helm chart that we used to deploy Trino locally. Because
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;eksctl&lt;/code&gt; updated your configuration, the exact same command now deploys to EKS.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;helm install tcb trino/trino --version 0.3.0
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;After you’ve installed the Helm chart, wait a minute or two for the Trino 
service to fully start and run:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl get deployments
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You should see output showing that the coordinator and both workers are available.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
tcb-trino-coordinator   1/1     1            1           67s
tcb-trino-worker        2/2     2            2           67s
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To connect to the cluster, the Helm output gives pretty good instructions on how
to create a tunnel from the cluster to your local laptop.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Get the application URL by running these commands:
  export POD_NAME=$(kubectl get pods --namespace default -l &quot;app=trino,release=tcb,component=coordinator&quot; -o jsonpath=&quot;{.items[0].metadata.name}&quot;)
  echo &quot;Visit http://127.0.0.1:8080 to use your application&quot;
  kubectl port-forward $POD_NAME 8080:8080
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Run that, then go to &lt;a href=&quot;http://127.0.0.1:8080&quot;&gt;http://127.0.0.1:8080&lt;/a&gt;, and you should see the Trino UI.&lt;/p&gt;

&lt;p&gt;To clear out the Helm install, run:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl delete service --all
kubectl delete deployment --all
kubectl delete configmap --all
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To tear down the entire k8s cluster, run:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;eksctl delete cluster --name tcb-cluster --region us-east-1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;pr-of-the-month-pr-8921-support-truncate-table-statement&quot;&gt;PR of the month: PR 8921: Support TRUNCATE TABLE statement&lt;/h2&gt;

&lt;p&gt;This week’s &lt;a href=&quot;https://github.com/trinodb/trino/issues/8921&quot;&gt;PR of the month&lt;/a&gt;
implements &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TRUNCATE TABLE&lt;/code&gt;. This command is very similar to a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt; statement,
except that it does not perform deletes on individual rows. This
ends up being a much faster operation than &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt;, as it uses fewer system
and logging resources.&lt;/p&gt;
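
&lt;p&gt;For example, truncating a hypothetical &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders&lt;/code&gt; table removes all of its rows in a single statement:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;TRUNCATE TABLE orders;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;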

&lt;p&gt;Thanks to Yuya Ebihira for adding the support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TRUNCATE TABLE&lt;/code&gt;.&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-month-how-do-i-run-systemsync_partition_metadata-with-different-catalogs&quot;&gt;Question of the month: How do I run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;system.sync_partition_metadata&lt;/code&gt; with different catalogs?&lt;/h2&gt;

&lt;p&gt;This week’s &lt;a href=&quot;https://trinodb.slack.com/archives/CFLB9AMBN/p1639094856214800&quot;&gt;question of the month&lt;/a&gt; 
comes from Yu on Slack. Yu asks:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Hi team, in the following system procedure, how can we specify the catalog name?
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;system.sync_partition_metadata(schema_name, table_name, mode, case_sensitive)&lt;/code&gt;
We are using multiple catalogs and we need to call this procedure against 
non-default catalog.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I answered this with a link back to our &lt;a href=&quot;/episodes/5.html&quot;&gt;fifth episode&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;You need to set the catalog either in the jdbc string as I do in the video, or
you need to set the session catalog variable,
&lt;a href=&quot;https://trino.io/docs/current/sql/set-session.html&quot;&gt;https://trino.io/docs/current/sql/set-session.html&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
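
&lt;p&gt;To make that concrete, here is a hypothetical example against a Hive catalog
named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive&lt;/code&gt; with a partitioned &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;web.page_views&lt;/code&gt; table (all names are illustrative). You
can either set the session catalog, or qualify the procedure with the catalog
directly:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;USE hive.web;
CALL system.sync_partition_metadata(&apos;web&apos;, &apos;page_views&apos;, &apos;FULL&apos;);

-- or qualify the procedure with the catalog directly
CALL hive.system.sync_partition_metadata(&apos;web&apos;, &apos;page_views&apos;, &apos;FULL&apos;);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;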

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs and resources&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://normanlimxk.com/2021/12/07/run-trino-presto-on-minikube-on-aws/&quot;&gt;Run Trino/Presto on Minikube on AWS&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/episodes/24.html&quot;&gt;Trinetes I: Trino on Kubernetes TCB episode&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://sbakiu.medium.com/diy-analytics-platform-66638cc6a92f&quot;&gt;DIY Analytics Platform&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=p6xDCz00TxU&quot;&gt;AWS EKS - Create Kubernetes cluster on Amazon EKS: the easy way&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Trino Summit 2021</summary>

      
      
    </entry>
  
    <entry>
      <title>Log4Shell does not affect Trino</title>
      <link href="https://trino.io/blog/2021/12/13/log4shell-does-not-affect-trino.html" rel="alternate" type="text/html" title="Log4Shell does not affect Trino" />
      <published>2021-12-13T00:00:00+00:00</published>
      <updated>2021-12-13T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/12/13/log4shell-does-not-affect-trino</id>
      <content type="html" xml:base="https://trino.io/blog/2021/12/13/log4shell-does-not-affect-trino.html">&lt;p&gt;In the last few days we had a surge of folks in our community reaching out with
concerns over the &lt;a href=&quot;https://www.lunasec.io/docs/blog/log4j-zero-day/&quot;&gt;Log4Shell exploit&lt;/a&gt;
(&lt;a href=&quot;https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-44228&quot;&gt;CVE-2021-44228&lt;/a&gt;),
and we want to inform you that &lt;strong&gt;Trino is not affected&lt;/strong&gt;. Trino does not use log4j
in the core engine or runtime classes. There are some connectors that include 
the log4j dependency from client dependencies, but are either not used or are 
not versions affected by the Log4Shell vulnerability. Regular security reviews, 
including code and dependency analysis, are part of the regular development 
process. As we learn more we will update the code to keep vulnerabilities out of
the code.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;/assets/blog/log4shell/log4shell.jpeg&quot; /&gt;
&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;trino-connectors-with-the-log4j-dependency&quot;&gt;Trino connectors with the Log4j dependency&lt;/h2&gt;

&lt;p&gt;If you do a search in the Trino repository, you’ll notice that a direct
dependency on log4j shows up in two of the connectors, Accumulo
and Elasticsearch.&lt;/p&gt;

&lt;h3 id=&quot;accumulo&quot;&gt;Accumulo&lt;/h3&gt;

&lt;p&gt;The Accumulo connector depends on log4j 1.2.17, which, although not vulnerable
to Log4Shell, has other vulnerabilities. These vulnerabilities do not apply to
how we’ve used the loggers in the connector code. To be clear, despite the small
use of this logger in the Accumulo connector, there is no threat even if
you are using it. We are &lt;a href=&quot;https://github.com/trinodb/trino/issues/8781&quot;&gt;working on removing&lt;/a&gt;
the uses of this log4j library in an upcoming release to avoid any confusion.&lt;/p&gt;

&lt;h3 id=&quot;elasticsearch&quot;&gt;Elasticsearch&lt;/h3&gt;

&lt;p&gt;The Elasticsearch connector did have an affected dependency
&lt;a href=&quot;https://github.com/trinodb/trino/commit/2018a94253d48cfdce283538855ee65950f9be3d&quot;&gt;that was recently removed&lt;/a&gt;.
Log4j was not being used by the connector, so despite the existence of the
dependency in the Elasticsearch connector, there was no direct use of the
vulnerable library.&lt;/p&gt;

&lt;h2 id=&quot;avoiding-future-introduction-of-log4shell&quot;&gt;Avoiding future introduction of Log4Shell&lt;/h2&gt;

&lt;p&gt;We take security seriously on the Trino project, as it provides a single point 
of access to your data sources. We’re taking precautionary measures to protect 
against the vulnerability from creeping its way into future versions. In version
366, we’re removing that dependency and &lt;a href=&quot;https://github.com/trinodb/trino/commit/10ba96c63ed3875d9dcca335e49bc73f5c0a6a8c&quot;&gt;adding a dedicated rule&lt;/a&gt;
to the build process to ban log4j as a direct dependency.&lt;/p&gt;
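
&lt;p&gt;As a rough sketch of what such a ban can look like with the Maven enforcer
plugin (the exact rule in the Trino build may differ):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;lt;bannedDependencies&amp;gt;
  &amp;lt;excludes&amp;gt;
    &amp;lt;exclude&amp;gt;log4j:log4j&amp;lt;/exclude&amp;gt;
    &amp;lt;exclude&amp;gt;org.apache.logging.log4j:log4j-core&amp;lt;/exclude&amp;gt;
  &amp;lt;/excludes&amp;gt;
&amp;lt;/bannedDependencies&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;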

&lt;h2 id=&quot;what-should-you-do&quot;&gt;What should you do?&lt;/h2&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;Rest assured that there is no vulnerability in your Trino cluster.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;If you’ve created your own plugin with one of the affected log4j libraries, 
you should upgrade as quickly as possible to 2.15.0 or higher.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;In the coming weeks, upgrade to the 366 release at your convenience.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We know there can be a lot of concern when vulnerabilities come up. We wish you
all the best of luck while you work hard to mitigate the risk of exploits in 
your systems. If you have any questions, reach out on the &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;Trino Slack&lt;/a&gt;.&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>In the last few days we had a surge of folks in our community reaching out with concerns over the Log4Shell exploit (CVE-2021-44228), and we want to inform you that Trino is not affected. Trino does not use log4j in the core engine or runtime classes. There are some connectors that include the log4j dependency from client dependencies, but are either not used or are not versions affected by the Log4Shell vulnerability. Regular security reviews, including code and dependency analysis, are part of the regular development process. As we learn more we will update the code to keep vulnerabilities out of the code.</summary>

      
      
    </entry>
  
    <entry>
      <title>30: Trino and dbt, a hot data mesh</title>
      <link href="https://trino.io/episodes/30.html" rel="alternate" type="text/html" title="30: Trino and dbt, a hot data mesh" />
      <published>2021-11-17T00:00:00+00:00</published>
      <updated>2021-11-17T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/30</id>
      <content type="html" xml:base="https://trino.io/episodes/30.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;José Cabeda, Data Engineer at &lt;a href=&quot;https://www.talkdesk.com&quot;&gt;Talkdesk&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/jecabeda&quot;&gt;@jecabeda&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Przemek Denkiewicz, Cloud Ecosystem Engineer at &lt;a href=&quot;https://www.starburst.io&quot;&gt;Starburst&lt;/a&gt;
  (&lt;a href=&quot;https://twitter.com/hovaesco&quot;&gt;@hovaesco&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;trino-summit-2021&quot;&gt;Trino Summit 2021&lt;/h2&gt;

&lt;p&gt;If you missed &lt;a href=&quot;https://www.starburst.io/resources/trino-summit/&quot;&gt;Trino Summit 2021&lt;/a&gt;,
you can watch it on demand, for free!&lt;/p&gt;

&lt;h2 id=&quot;release-364&quot;&gt;Release 364&lt;/h2&gt;

&lt;p&gt;Trino 364 shipped on the first of November, just after our last episode. 
Martin’s official announcement mentioned the following highlights:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for dynamic filtering in Iceberg connector&lt;/li&gt;
  &lt;li&gt;Performance improvements when querying small files&lt;/li&gt;
  &lt;li&gt;Procedure to merge small files in Hive tables&lt;/li&gt;
  &lt;li&gt;Support for Cassandra UUID type&lt;/li&gt;
  &lt;li&gt;Support for MemSQL datetime and timestamp types&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred’s additional notes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER MATERIALIZED VIEW ... RENAME TO&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;A whole bunch of performance improvements&lt;/li&gt;
  &lt;li&gt;Elasticsearch connector no longer fails with unsupported types&lt;/li&gt;
  &lt;li&gt;A lot of improvements on Hive and Iceberg connectors&lt;/li&gt;
  &lt;li&gt;Hive connector has optimize procedure now!&lt;/li&gt;
  &lt;li&gt;Parquet and Avro fixes and improvements&lt;/li&gt;
  &lt;li&gt;Web UI performance improvement for long query texts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the &lt;a href=&quot;https://trino.io/docs/current/release/release-364.html&quot;&gt;release notes&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-week-trino-and-dbt-a-hot-data-mesh&quot;&gt;Concept of the week: Trino and dbt, a hot data mesh&lt;/h2&gt;

&lt;p&gt;Data mesh, the buzzword that follows data lakehouse, may feel rather irrelevant
to many. This is especially true for those that just want to move from a Hive
and HDFS cluster, or from a cloud data warehouse, to storing data in object
storage and querying it with Trino.&lt;/p&gt;

&lt;p&gt;While data mesh is certainly in the hype cycle phase, it’s actually not a new
idea and has very sound principles. Many companies have written their own 
software and created organizational policies that align with the strategies 
outlined by the data mesh principles. In essence, these principles aim to make
data management for analytics platforms decentralized. This means decentralizing
the infrastructure and data engineers managing it to different domains (or 
products) within a company.&lt;/p&gt;

&lt;p&gt;What’s really exciting about data mesh is that much of the technology today 
makes these theoretical principles more of a reality without having to invent 
your own services. The author of &lt;a href=&quot;https://martinfowler.com/articles/data-mesh-principles.html&quot;&gt;data mesh&lt;/a&gt;,
Zhamak Dehghani, lays out four principles that characterize a data mesh:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Domain-oriented, decentralized data ownership and architecture&lt;/li&gt;
  &lt;li&gt;Data as a product&lt;/li&gt;
  &lt;li&gt;Self-serve data infrastructure as a platform&lt;/li&gt;
  &lt;li&gt;Federated computational governance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s see what the engineers from Talkdesk are doing to implement their data 
mesh.&lt;/p&gt;

&lt;h3 id=&quot;talkdesk&quot;&gt;Talkdesk&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://www.talkdesk.com&quot;&gt;Talkdesk&lt;/a&gt; is a contact center as a service. Talkdesk
was created at a &lt;a href=&quot;https://www.twilio.com&quot;&gt;Twilio&lt;/a&gt; Hackathon in 2011. They just 
hit a 10 billion dollar valuation. As a fast-growing startup, they are evolving
their product strategy quickly, and regularly have large data sets to
analyze.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/30/talkdesk-scale.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;The Talkdesk product is deployed in cloud infrastructure and provides all the 
infrastructure for operating a call center. Its architecture is heavily 
event-driven. Dealing with realtime events at scale is difficult and requires a 
reactive and flexible architecture.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/30/talkdesk-events.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;The early architecture for the analytics platform followed a traditional
approach using Spark and Fivetran to ingest data into Redshift. It had various
pipelines to update the data for downstream consumption.&lt;/p&gt;

&lt;p&gt;This centralized workflow made communication across data entity management much
simpler as it all exists on the same team. However, scaling caused increased 
backlogs, which delayed analysis and deployments. It also made it difficult to 
handle different use cases like realtime and historical use cases.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/30/talkdesk-architecture.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;The analytics and transactional use cases are varied and overlapping. Live
data typically feeds into stateful databases that update as data arrives;
to analyze data in motion, you need a realtime database. Historical data, in
contrast, keeps multiple copies of different states over time, which enables
trend analysis over longer periods rather than just the present moment. One
challenge Talkdesk faced was building a robust architecture that supports
analyzing live data, with the latest changes as they arrive in the OLTP
databases, while meeting all the analytics use cases.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/30/olap-oltp.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;To enable analytics across the various use cases, Talkdesk integrated Trino into
their workflow to read data across both live and historic data and merge them.
Using Trino enabled reading from live data feeding into their stateful data 
stores, and reads across historic data stores to produce data in the form needed
to support Talkdesk products.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;90%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/30/talkdesk-architecture-2.0.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;Trino is also used to hide the complexity of the data platform, and allows
merging data across multiple relational and object stores.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;60%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/30/talkdesk-architecture-2.0-external.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h3 id=&quot;why-dbt&quot;&gt;Why dbt?&lt;/h3&gt;

&lt;p&gt;In &lt;a href=&quot;/episodes/21.html&quot;&gt;episode 21&lt;/a&gt; we discussed using dbt and Trino in detail. As
we mentioned there:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;dbt is a transformation workflow tool that lets teams quickly and 
collaboratively deploy analytics code, following software engineering best 
practices like modularity, CI/CD, testing, and documentation. It enables 
anyone who knows SQL to build production-grade data pipelines.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You can achieve modular, repeatable, and testable units of processing by 
defining various models and definitions for the data pipelines. For example:&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/30/dbt-definition.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;
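
&lt;p&gt;To make this concrete, a dbt model is just a SQL file, optionally with Jinja
templating. A minimal, hypothetical model (all names are illustrative) might
look like this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;{{ config(materialized=&apos;table&apos;) }}

SELECT
    agent_id,
    count(*) AS total_calls
FROM {{ ref(&apos;stg_calls&apos;) }}
GROUP BY agent_id
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;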

&lt;p&gt;Using the definitions above, Talkdesk engineers were able to consolidate all
these tasks into a much more simplified graph of operations.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/30/dbt-results.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h3 id=&quot;why-data-mesh&quot;&gt;Why data mesh?&lt;/h3&gt;

&lt;p&gt;While a lot of focus has gone into the technology aspects of data mesh, there is
also a lot to be said about the implications on the data team and 
socio-political policies that come with data mesh. Talkdesk also made structural
changes to their team to improve their data mesh strategy.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/30/talkdesk-data-team.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h3 id=&quot;how-data-mesh-affects-the-everyday-life-of-data-engineers&quot;&gt;How does data mesh affect the everyday life of data engineers?&lt;/h3&gt;

&lt;p&gt;There is a real fear that comes around when management changes business 
policies. It can be hard to tell how these policies trickle down and affect
the engineer’s every day work life. In general, engineers become more entrenched
in different domains rather than trying to manage all domains under one 
architecture. Data engineers are distributed to product teams and specialize
in the domain’s data models. They also have specific knowledge of how to use
the self-service platform to integrate across other teams.&lt;/p&gt;

&lt;h3 id=&quot;comparing-microservices-based-applications-to-the-data-mesh&quot;&gt;Comparing microservices-based applications to the data mesh&lt;/h3&gt;

&lt;p&gt;When we think of a functional system for deploying and managing
microservices-based applications, there are several features that we’ve come to
expect. It is very easy to compare the features of microservices-based
applications to the features of a data mesh, as laid out in the &lt;a href=&quot;https://blog.starburst.io/data-mesh-a-software-engineers-perspective&quot;&gt;Data Mesh: A Software Engineer’s Perspective&lt;/a&gt;
blog post.&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-partitioned-table-tests-and-fixed-pr-9757&quot;&gt;PR of the week: Partitioned table tests and fixed PR 9757&lt;/h2&gt;

&lt;p&gt;This week’s &lt;a href=&quot;https://github.com/trinodb/trino/pull/9757&quot;&gt;PR of the week&lt;/a&gt;
is for the Iceberg connector. Release 364 had quite a few improvements for 
Iceberg and handled small issues that could cause query failure in some
scenarios. This PR addressed a query failure when reading a partition on a 
UUID column.&lt;/p&gt;

&lt;p&gt;Thanks to Piotr Findeisen for fixing this and many other bugs, as well as
improving performance in the Iceberg connector!&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-week-whats-the-difference-between-location-and-external_location&quot;&gt;Question of the week: What’s the difference between &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;location&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;external_location&lt;/code&gt;?&lt;/h2&gt;

&lt;p&gt;This week’s &lt;a href=&quot;https://www.trinoforum.org/t/105&quot;&gt;question of the week&lt;/a&gt; comes from 
Aakash Nand on Slack, later ported to the Trino Forum. Aakash asks:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;When creating a Hive table in Trino, what is the difference between 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;external_location&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;location&lt;/code&gt; . If I have to create external table I have
to use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;external_location&lt;/code&gt; right? What is the difference between these two?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This was answered by Arkadiusz Czajkowski:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Tables created with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;location&lt;/code&gt; are managed tables. You have full control over 
them from their creation to modification. Tables created with 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;external_location&lt;/code&gt; are tables created by third party systems. We just access 
them mostly for read. I would encourage you to use location in your case.&lt;/p&gt;
&lt;/blockquote&gt;
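
&lt;p&gt;A hypothetical example of each with the Hive connector (the catalog, schema,
table names, and path are illustrative; the managed table simply uses the
schema’s default location):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- managed table: Trino controls the data and its location
CREATE TABLE hive.web.page_views (view_time timestamp, user_id bigint)
WITH (format = &apos;ORC&apos;);

-- external table: data written by another system, mostly read-only
CREATE TABLE hive.web.page_views_external (view_time timestamp, user_id bigint)
WITH (format = &apos;ORC&apos;, external_location = &apos;s3://my-bucket/page_views/&apos;);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;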

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs and resources&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/geekculture/trino-dbt-a-match-in-sql-heaven-1df2a3d12b5e&quot;&gt;Trino + dbt = a match made in SQL heaven? Blog&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/episodes/21.html&quot;&gt;Trino + dbt = a match made in SQL heaven? TCB episode&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://martinfowler.com/articles/data-mesh-principles.html&quot;&gt;Data Mesh Principles&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://blog.starburst.io/data-mesh-a-software-engineers-perspective&quot;&gt;Data Mesh: A Software Engineer’s Perspective&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests</summary>

      
      
    </entry>
  
    <entry>
      <title>29: What is Trino and the Hive connector</title>
      <link href="https://trino.io/episodes/29.html" rel="alternate" type="text/html" title="29: What is Trino and the Hive connector" />
      <published>2021-10-28T00:00:00+00:00</published>
      <updated>2021-10-28T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/29</id>
      <content type="html" xml:base="https://trino.io/episodes/29.html">&lt;h2 id=&quot;release-364&quot;&gt;Release 364&lt;/h2&gt;

&lt;p&gt;Release 364 is just around the corner. Here is Manfred’s release preview:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER MATERIALIZED VIEW ... RENAME TO&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;A whole bunch of performance improvements&lt;/li&gt;
  &lt;li&gt;Elasticsearch connector no longer fails if fields with unsupported types exist&lt;/li&gt;
  &lt;li&gt;Hive connector has optimize procedure now!&lt;/li&gt;
  &lt;li&gt;Parquet and Avro fixes and improvements&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-week-what-is-trino&quot;&gt;Concept of the week: What is Trino?&lt;/h2&gt;

&lt;p&gt;Trino is the project created by Martin Traverso, Dain Sundstrom, David Phillips,
and Eric Hwang in 2012 to replace the 300PB Hive data warehouse at Facebook. The
goal of Trino is to run fast ad-hoc analytics queries over big data file systems
like HDFS and object stores like S3.&lt;/p&gt;

&lt;p&gt;An initially unintended but now characteristic feature of Trino is its ability 
to execute federated queries over various distributed data sources. This
includes, but is not limited to: Accumulo, BigQuery, Apache Cassandra, 
ClickHouse, Druid, Elasticsearch, Google Sheets, Apache Iceberg, Apache Hive, 
JMX, Apache Kafka, Kinesis, Kudu, MongoDB, MySQL, Oracle, Apache Phoenix, 
Apache Pinot, PostgreSQL, Prometheus, Redis, Redshift, SingleStore (MemSQL), 
Microsoft SQL Server.&lt;/p&gt;

&lt;p&gt;How does Trino query across everything from data lakes, SQL, and NoSQL databases
at unprecedented speeds? It helps to start by going over Trino’s architecture:&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/29/1-architecture.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://trino.io/blog/2021/04/21/the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;Trino consists of two types of nodes, &lt;em&gt;coordinator&lt;/em&gt; and &lt;em&gt;worker&lt;/em&gt; nodes. The
coordinator plans and schedules the processing of SQL queries, which are
submitted by users directly or through connected SQL reporting tools. The workers
carry out most of the processing, reading data from the source and
performing the various operations within the tasks they are assigned.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/29/2-SPI.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://trino.io/blog/2021/04/21/the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;Trino is able to query multiple data sources by exposing a common interface
called the SPI (Service Provider Interface) that enables the core engine to
treat interactions with each data source the same way. Each connector implements
the SPI, which includes exposing metadata, statistics, and data locations, and
establishing one or more connections with the underlying data source.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/29/3-parser-planner.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://trino.io/blog/2021/04/21/the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;Many of these interfaces are used in the coordinator during the analysis and 
planning phases. The analyzer, for example, uses the metadata SPI to make sure
the table in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FROM&lt;/code&gt; clause actually exists in the data source.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/29/4-distributed-query-plan.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://trino.io/blog/2021/04/21/the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;Once a logical query plan is generated, the coordinator converts it to a
distributed query plan that maps operations into stages containing tasks to be
run on nodes. Stages model the sequence of processing steps as a directed
acyclic graph (DAG).&lt;/p&gt;
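
&lt;p&gt;The idea of stages forming a DAG can be sketched in a few lines of Python
(hypothetical stage names, not Trino’s planner API): two scan stages feed a
join, which feeds an aggregation, and a topological sort yields a valid
scheduling order where every stage runs after its upstream stages.&lt;/p&gt;

```python
# Hedged sketch: stage dependencies of a hypothetical distributed plan.
# Not Trino's planner API -- just the DAG idea using Python's graphlib.
from graphlib import TopologicalSorter

# Each stage lists the upstream stages whose output it consumes.
stage_deps = {
    "join": {"scan_orders", "scan_customers"},
    "aggregate": {"join"},
    "output": {"aggregate"},
}

# static_order() yields stages so that every upstream stage comes first.
schedule = list(TopologicalSorter(stage_deps).static_order())
```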

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/29/5-task-management.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://trino.io/blog/2021/04/21/the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;The coordinator then schedules tasks over the worker nodes as efficiently as 
possible, depending on the physical layout and distribution of the data.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/29/6-splits.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://trino.io/blog/2021/04/21/the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;Data is split and distributed across the worker nodes to provide 
inter-node parallelism.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/29/7-parallelism-over-drivers.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://trino.io/blog/2021/04/21/the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;Once this data arrives at a worker node, it is further divided and processed in
parallel. Workers send the processed data back to the coordinator. Finally, the
coordinator provides the results of the query to the user.&lt;/p&gt;

&lt;h2 id=&quot;pr-8821-add-https-query-event-logger&quot;&gt;PR 8821: Add HTTP/S query event logger&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/8821&quot;&gt;Pull request 8821&lt;/a&gt; enables Trino cluster
owners to log query processing metadata by submitting it to an HTTP endpoint.
This may be used for usage monitoring and alerting, but it can also be used to
extract analytics on cluster usage, such as table and column usage metrics.&lt;/p&gt;

&lt;p&gt;Query events are serialized to JSON and sent to the provided address over HTTP 
or over HTTPS. Configuration allows selecting which events should be included.&lt;/p&gt;

&lt;p&gt;Thanks for the contribution, &lt;a href=&quot;https://github.com/mosiac1&quot;&gt;mosiac1&lt;/a&gt; and others at
Bloomberg!&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/admin/event-listeners-http.html&quot;&gt;Read the docs&lt;/a&gt; 
to learn more about this exciting feature!&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-week-does-the-hive-connector-depend-on-the-hive-runtime&quot;&gt;Question of the week: Does the Hive connector depend on the Hive runtime?&lt;/h2&gt;

&lt;p&gt;This week’s question covers a lot of the confusion around the &lt;a href=&quot;https://trino.io/docs/current/connector/hive.html&quot;&gt;Hive
connector&lt;/a&gt;. In short, the answer 
is that the Hive runtime is not required. There’s more information available in 
the &lt;a href=&quot;https://trino.io/blog/2020/10/20/intro-to-hive-connector.html&quot;&gt;Intro to the Hive Connector blog&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Videos&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=ZwaVZplVmVA&quot;&gt;An Overview of the Starburst Trino Query Optimizer (Karol Sobczak)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Release 364</summary>

      
      
    </entry>
  
    <entry>
      <title>28: Autoscaling streaming ingestion to Trino with Pravega</title>
      <link href="https://trino.io/episodes/28.html" rel="alternate" type="text/html" title="28: Autoscaling streaming ingestion to Trino with Pravega" />
      <published>2021-10-14T00:00:00+00:00</published>
      <updated>2021-10-14T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/28</id>
      <content type="html" xml:base="https://trino.io/episodes/28.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Derek Moore, Software Senior Principal Engineer at &lt;a href=&quot;https://www.delltechnologies.com/en-us/index.htm&quot;&gt;Dell EMC&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/derekm00r3&quot;&gt;@derekm00r3&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Andrew Robertson, Principal Software Engineer at &lt;a href=&quot;https://www.delltechnologies.com/en-us/index.htm&quot;&gt;Dell EMC&lt;/a&gt;
  (&lt;a href=&quot;https://www.linkedin.com/in/andrew-robertson-986b885/&quot;&gt;@andrew-robertson&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Karan Singh, Software Engineer 2 at &lt;a href=&quot;https://www.delltechnologies.com/en-us/index.htm&quot;&gt;Dell EMC&lt;/a&gt;
 (&lt;a href=&quot;https://www.linkedin.com/in/singhkaranrakesh/&quot;&gt;@singhkaranrakesh&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;trino-summit-2021&quot;&gt;Trino Summit 2021&lt;/h2&gt;

&lt;p&gt;Get ready for &lt;a href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;Trino Summit&lt;/a&gt;, coming
October 21st and 22nd! This annual Trino community event is where we gather 
practitioners that deploy Trino at scale and share their experiences and best 
practices with the rest of the community. While the planning for this event was 
a bit chaotic due to the pandemic, we have made the final decision to host the 
event virtually for the safety of all the attendees. We look forward to seeing
you there, and can’t wait to share more information in the coming weeks!&lt;/p&gt;

&lt;h2 id=&quot;release-363&quot;&gt;Release 363&lt;/h2&gt;

&lt;p&gt;Official announcement items from Martin:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New HTTP event listener plugin&lt;/li&gt;
  &lt;li&gt;Insert overwrite for S3-backed tables&lt;/li&gt;
  &lt;li&gt;Support for Elasticsearch &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;scaled_float&lt;/code&gt; type&lt;/li&gt;
  &lt;li&gt;Support for Cassandra &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tuple&lt;/code&gt; type&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;time&lt;/code&gt; type in MySQL connector&lt;/li&gt;
  &lt;li&gt;Support for SQLServer &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;datetimeoffset&lt;/code&gt; type&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred’s additional notes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Misc performance and memory usage improvements&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW ROLES&lt;/code&gt; fix&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXPLAIN ANALYZE&lt;/code&gt; fix for estimate display&lt;/li&gt;
  &lt;li&gt;Numerous improvements for Parquet files in Hive and Iceberg connectors&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More info at &lt;a href=&quot;https://trino.io/docs/current/release/release-363.html&quot;&gt;https://trino.io/docs/current/release/release-363.html&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-week-event-stream-abstractions-and-pravega&quot;&gt;Concept of the week: Event stream abstractions and Pravega&lt;/h2&gt;

&lt;h3 id=&quot;events-and-streams&quot;&gt;Events and streams&lt;/h3&gt;

&lt;p&gt;What is an event? This sounds like a silly question when asked generally. The
answer is less clear when discussing event-driven systems, though. An &lt;strong&gt;event&lt;/strong&gt;
is an action or occurrence that is captured by a sensor or generated by a
source system, and emitted to a sink system. Some examples include user events
from an application, system events in telemetry systems, or sensor events from
monitoring applications.&lt;/p&gt;

&lt;p&gt;What is an event stream? Now knowing what an event is, an &lt;strong&gt;event stream&lt;/strong&gt; is an 
unbounded set of events that are tracked over time.&lt;/p&gt;

&lt;p&gt;In this simple view, an event stream contains a sequential list of events. The
list contains events that have been processed, and some that still need to be 
processed.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/28/event-stream.png&quot; /&gt;&lt;br /&gt;
Cloud Native Computing Foundation Presentation: &lt;a href=&quot;https://www.cncf.io/wp-content/uploads/2020/08/pravega-overview-cncf-apr-2020.pdf&quot;&gt;Source&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;This is very different from a more realistic view of event streams, which
considers that events arrive and are processed in parallel. Event load may also
fluctuate, as events can burst around specific occurrences or follow periodic
patterns. While taking event ingest (writes) into consideration, it is also
important to consider event egress (reads) as part of the problem of
representing event streams.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/28/event-stream-realistic.png&quot; /&gt;&lt;br /&gt;
Cloud Native Computing Foundation Presentation: &lt;a href=&quot;https://www.cncf.io/wp-content/uploads/2020/08/pravega-overview-cncf-apr-2020.pdf&quot;&gt;Source&lt;/a&gt;.
&lt;/p&gt;

&lt;h3 id=&quot;pravega-and-segments&quot;&gt;Pravega and segments&lt;/h3&gt;

&lt;p&gt;Engineers at Dell Labs wanted to find a better abstraction to solve the
problems they saw in existing event streaming systems. This included how to
address this type of constant shift in scaling, while also addressing the
brittle storage abstractions that event streams use today. The storage
abstraction needs to allow for both real-time and historical analytics. The data
within a particular transaction also needs to be consistent.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/28/segment.png&quot; /&gt;&lt;br /&gt;
Cloud Native Computing Foundation Presentation: &lt;a href=&quot;https://www.cncf.io/wp-content/uploads/2020/08/pravega-overview-cncf-apr-2020.pdf&quot;&gt;Source&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;Their solution is Pravega. The core of Pravega models streams around a
storage unit called a segment. A &lt;strong&gt;segment&lt;/strong&gt; is an append-only sequence of bytes
(not events/records). This offers greater flexibility and better parallelism
and serialization over streams. Pravega stream writers are then able to write
in parallel, increasing ingest throughput.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/28/autoscale-parallel-segment.png&quot; /&gt;&lt;br /&gt;
Cloud Native Computing Foundation Presentation: &lt;a href=&quot;https://www.cncf.io/wp-content/uploads/2020/08/pravega-overview-cncf-apr-2020.pdf&quot;&gt;Source&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;You can use &lt;strong&gt;routing keys&lt;/strong&gt; to map events to particular segments. Pravega
enforces order within a specific key, but does not guarantee ordering of events
across keys. The tradeoff is between strict ordering of events and higher
parallelism with better performance.&lt;/p&gt;
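
&lt;p&gt;As a rough sketch of the idea (illustrative only, not Pravega’s actual
implementation), a routing key can be hashed to a point in the key space
[0, 1), with each segment owning a contiguous range of that space. All events
with the same key then land in the same segment, preserving per-key order:&lt;/p&gt;

```python
# Illustrative sketch only -- not Pravega's real hashing scheme.
import hashlib
from bisect import bisect_right

def key_to_unit_interval(routing_key: str) -> float:
    """Hash a routing key to a deterministic point in [0, 1)."""
    digest = hashlib.sha256(routing_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

# Segment boundaries over the key space [0, 1):
# segment 0 owns [0, 0.5), segment 1 owns [0.5, 0.75), segment 2 owns [0.75, 1).
boundaries = [0.5, 0.75]

def segment_for_key(routing_key: str) -> int:
    """Same key always maps to the same segment, so per-key order is kept."""
    return bisect_right(boundaries, key_to_unit_interval(routing_key))
```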

&lt;p&gt;With segments, you can also scale the number of segments up and down
depending on the workload you’re experiencing. Another compelling capability
this enables is managing transactions in the stream. As writers submit data,
they write to a temporary segment, which is merged into a permanent segment on
commit.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/28/segment-transactions.png&quot; /&gt;&lt;br /&gt;
Cloud Native Computing Foundation Presentation: &lt;a href=&quot;https://www.cncf.io/wp-content/uploads/2020/08/pravega-overview-cncf-apr-2020.pdf&quot;&gt;Source&lt;/a&gt;.
&lt;/p&gt;
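
&lt;p&gt;A toy model of this transactional behavior (hypothetical code, not the
Pravega client API): events written inside a transaction accumulate in a
temporary segment, and only become visible atomically when the transaction
commits:&lt;/p&gt;

```python
# Toy model of segment transactions -- hypothetical, not the Pravega client API.
class StreamSegment:
    def __init__(self):
        self.committed = []  # events visible to readers

    def begin(self):
        """Start a transaction backed by a fresh temporary segment."""
        return []

    def commit(self, txn):
        """Merge the temporary segment into the permanent one atomically."""
        self.committed.extend(txn)

    def abort(self, txn):
        """Discard the temporary segment; its events are never visible."""
        txn.clear()

seg = StreamSegment()
txn = seg.begin()
txn.append("event-1")
txn.append("event-2")
seg.commit(txn)  # both events become visible together
```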

&lt;p&gt;The following diagram displays autoscaling splits and merges as specific routing
keys become more popular. For a clearer example, say that the routing keys are
hashed geolocation values for a taxi app, mapped between zero and one. As
certain locations become crowded, let’s say because a lot of people are going
home at the end of the work day and many taxis are in the downtown area, the
downtown routing keys can automatically trigger a split. Once rush hour is over
and traffic slows down, these segments are merged again.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/28/segment-split-merge.png&quot; /&gt;&lt;br /&gt;
Pravega Docs: &lt;a href=&quot;https://pravega.io/docs/nightly/pravega-concepts/#elastic-streams-auto-scaling&quot;&gt;Source&lt;/a&gt;.
&lt;/p&gt;
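
&lt;p&gt;The split and merge of key-space ranges can be sketched as follows
(illustrative only; real Pravega scaling decisions are driven by observed
load on each segment):&lt;/p&gt;

```python
# Illustrative sketch of segment auto-scaling over the key space [0, 1).
def split(ranges, index):
    """Split one segment's key range into two halves (scale up under load)."""
    lo, hi = ranges[index]
    mid = (lo + hi) / 2
    return ranges[:index] + [(lo, mid), (mid, hi)] + ranges[index + 1:]

def merge(ranges, index):
    """Merge two adjacent segments back into one (scale down when load drops)."""
    lo, _ = ranges[index]
    _, hi = ranges[index + 1]
    return ranges[:index] + [(lo, hi)] + ranges[index + 2:]

ranges = [(0.0, 1.0)]
ranges = split(ranges, 0)  # rush hour begins: two segments
ranges = split(ranges, 1)  # downtown keys get hot: three segments
ranges = merge(ranges, 1)  # rush hour ends: back to two segments
```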

&lt;h3 id=&quot;pravega-architecture&quot;&gt;Pravega architecture&lt;/h3&gt;

&lt;p&gt;The Pravega architecture comes with writer groups and reader groups that scale
up and down along with the autoscaling applied to the segments. It consists of
a controller that maintains stream metadata and a segment store that works off
of tier one storage (Apache BookKeeper) and tier two storage (object storage).&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/28/pravega-architecture.png&quot; /&gt;&lt;br /&gt;
Pravega Docs: &lt;a href=&quot;https://pravega.io/docs/nightly/pravega-concepts/#elastic-streams-auto-scaling&quot;&gt;Source&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;Just like Trino, Pravega aims to build a rich set of connectors to systems
that act as sources and sinks. This includes a connector for Trino.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/28/pravega-connectors.png&quot; /&gt;&lt;br /&gt;
Pravega Docs: &lt;a href=&quot;https://pravega.io/docs/nightly/pravega-concepts/#elastic-streams-auto-scaling&quot;&gt;Source&lt;/a&gt;.
&lt;/p&gt;

&lt;h3 id=&quot;pravega-compared-to-other-event-streaming-platforms&quot;&gt;Pravega compared to other event streaming platforms&lt;/h3&gt;

&lt;p&gt;This chart is a very helpful resource to summarize Pravega against other popular
streaming platforms. It comes from the Pravega site, so be sure to check there
for an up-to-date list of these features moving forward.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt;Pravega&lt;/th&gt;
      &lt;th&gt;Kafka&lt;/th&gt;
      &lt;th&gt;Pulsar&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Transactions&lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Event streams&lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Long-term retention&lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Durable by default&lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Auto-scaling&lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Ingestion of large data (video)&lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Efficient at high partition counts&lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Consistent state replication&lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Key-value tables&lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Comparison between Pravega, Kafka, and Pulsar: &lt;a href=&quot;https://pravega.io&quot;&gt;Source&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-week-querying-pravega-from-trino&quot;&gt;Demo of the week: Querying Pravega from Trino&lt;/h2&gt;

&lt;p&gt;This week the Pravega team demonstrates an example from their &lt;a href=&quot;https://github.com/pravega/presto-connector/tree/main/getting-started&quot;&gt;getting-started&lt;/a&gt;
tutorial for the Trino connector.&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pravega-presto-connector-pr-49&quot;&gt;PR of the week: Pravega presto-connector PR 49&lt;/h2&gt;

&lt;p&gt;This week’s &lt;a href=&quot;https://github.com/pravega/presto-connector/pull/49&quot;&gt;PR of the week&lt;/a&gt;
comes not from the Trino repository but rather from the presto-connector
repository. The Trino portion of the repository was committed by Dell engineer
Karan Singh. As the PR states, this makes Pravega available from Trino along
with the original Presto connector.&lt;/p&gt;

&lt;p&gt;Thanks Karan for adding Trino and Andrew for writing the original Presto-Pravega
connector!&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-week-what-is-the-point-of-trino-forum-and-what-is-the-relationship-to-trino-slack&quot;&gt;Question of the week: What is the point of Trino Forum and what is the relationship to Trino Slack?&lt;/h2&gt;

&lt;p&gt;Our &lt;a href=&quot;https://www.trinoforum.org/t/what-is-the-point-of-this-forum-and-what-is-the-relationship-to-trino-slack/28&quot;&gt;question of the week&lt;/a&gt;
comes from the new Trino Forum, which Brian and a few others at Starburst
created. Slack is a much more ad-hoc platform for people to work through
problems rather than to search for and find solutions to problems. The Trino
community has accumulated a great amount of knowledge in Slack, but there is no
way for people to find answers unless they have already joined, and none of the
information discussed there can be found by a search engine like Google.&lt;/p&gt;

&lt;p&gt;Further, a lot of the answers are scattered between different conversations, and
this too can be condensed and simplified. I pondered the best way for us to
expose this and thought about adding an FAQ page on trino.io, but this would get
stale quickly and would require a lot of work to maintain at scale without a
crowdsourcing element. Instead, starting a &lt;a href=&quot;https://www.discourse.org&quot;&gt;Discourse forum&lt;/a&gt;
(not to be confused with Discord) acts as a central repository of knowledge and
makes this information easily searchable. The forum is maintained by some of us
at Starburst, but over time we want more moderators from the community (this
happens through merit and consistent use of Discourse trust levels).&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs and resources&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.cncf.io/online-programs/pravega-rethinking-storage-for-streams/&quot;&gt;Pravega: Rethinking Storage For Streams&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests</summary>

      
      
    </entry>
  
    <entry>
      <title>JVM challenges in production</title>
      <link href="https://trino.io/blog/2021/10/06/jvm-issues-at-comcast.html" rel="alternate" type="text/html" title="JVM challenges in production" />
      <published>2021-10-06T00:00:00+00:00</published>
      <updated>2021-10-06T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/10/06/jvm-issues-at-comcast</id>
      <content type="html" xml:base="https://trino.io/blog/2021/10/06/jvm-issues-at-comcast.html">&lt;p&gt;At Comcast, we have a large on-premise Trino cluster. It enables us to extract
insights from data no matter where it resides, and prepares the company for a
more cloud-centric future. Recently, however, we experienced and overcame
challenges related to the Java virtual machine (JVM). We wanted to share what
we encountered and learned in hopes that it might be useful for the Trino
community.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;jit-recompilation&quot;&gt;JIT recompilation&lt;/h2&gt;

&lt;p&gt;Some users complained that nightly reports were taking far too long to
complete. Queries that ran for six hours made very little progress.&lt;/p&gt;

&lt;p&gt;First, we looked at the queries involved in these nightly reports. We
noticed that all these queries involved two particular tables. In this post,
let’s call them table A and table B.&lt;/p&gt;

&lt;p&gt;Our initial suspicion was that there could be an issue with the table data in
HDFS. Thus, we tried to reproduce the performance problem by using queries that
performed simple scans against these tables.&lt;/p&gt;

&lt;p&gt;We tried a simple table scan with no filters, a range filter on a partitioned
column, and so on. We ran these queries multiple times, and execution times were
consistent. This ruled out a potential problem with HDFS.&lt;/p&gt;

&lt;p&gt;Next, we took a closer look at the portion of the slow-running queries
involving table A, and came up with the simplest possible query that could
demonstrate the problem. We discovered that the following query did not exhibit
the performance problem:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT
 count(a.c1)
FROM
 hive.schema1.A a, hive.schema2.B da
WHERE
 a.day_id = da.date_id
 AND a.day_id BETWEEN &apos;2021-03-22&apos; AND &apos;2021-04-21&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;But adding a predicate, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a.c2 = &apos;4 (Success)&apos;&lt;/code&gt;, caused the performance problem
to appear:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT
 count(a.c1)
FROM
 hive.schema1.A a, hive.schema2.B da
WHERE
 a.day_id = da.date_id
 AND a.day_id BETWEEN &apos;2021-03-22&apos; AND &apos;2021-04-21&apos;
 AND a.c2 = &apos;4 (Success)&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We narrowed the problem down to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Scan/Filter/Project&lt;/code&gt; operator using the
output of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXPLAIN ANALYZE&lt;/code&gt; from Trino. For the query that performed as
expected, this stage had the following CPU stats:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CPU: 2.39h, Scheduled: 4.47h, Input: 17434967615 rows (357.47GB)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For the version of the query with the additional predicate, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a.c2 = &apos;4 (Success)&apos;&lt;/code&gt;,
that exhibited the performance problem, the same stage has the following CPU
stats:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CPU: 3.73d, Scheduled: 48.01d, Input: 17052985227 rows (413.98GB)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This shows that for roughly the equivalent amount of data, Trino used
significantly more CPU (3.73 days versus 2.39 hours, a factor of roughly 37).
Our next step was to determine possible reasons.&lt;/p&gt;

&lt;p&gt;We generated a few &lt;a href=&quot;https://docs.oracle.com/javase/7/docs/technotes/tools/share/jstack.html&quot;&gt;jstack&lt;/a&gt;
and Java flight recorder (JFR) profiles of the Trino Java process from
one of the worker nodes while the scan stage was running. After analyzing these
profiles, we found no obvious problem. Trino performed as expected.&lt;/p&gt;

&lt;p&gt;Next, we looked at the list of tasks in the web UI to see what the distribution
of CPU times for each stage was:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/jvm-issues-at-comcast/web_ui_before.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Some workers have tasks that only use a few minutes of CPU time, and others
have tasks that use up to 2 hours of CPU time! Different query runs would show
this happening on different workers, so it was not a problem with any one
individual worker.&lt;/p&gt;

&lt;p&gt;We discussed this with Starburst engineer &lt;a href=&quot;https://github.com/findepi&quot;&gt;Piotr Findeisen&lt;/a&gt;,
and came to the conclusion that this could potentially be an issue with JVM
code deoptimization. After recompiling a method a certain number of times,
the JVM refuses to do so any more and runs the method in interpreted
mode, which is much slower.&lt;/p&gt;

&lt;p&gt;The evidence for this is what we highlighted above: the CPU used by the
same tasks on different workers varies by a factor of approximately 30. This is
the typical difference between compiled and interpreted code, according to
Piotr’s experience at Starburst.&lt;/p&gt;

&lt;p&gt;The following JVM options were added to the Trino &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jvm.config&lt;/code&gt; file to help
with this issue:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-XX:PerMethodRecompilationCutoff=10000&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-XX:PerBytecodeRecompilationCutoff=10000&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These settings increased the recompilation cutoff limit. They have also been
included in the default &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jvm.config&lt;/code&gt; settings that ship with Trino since the
348 release.&lt;/p&gt;

&lt;p&gt;Since we have been running Trino in production since before these defaults were
added, we did not have these settings in our &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jvm.config&lt;/code&gt;.&lt;/p&gt;

&lt;h3 id=&quot;initial-results&quot;&gt;Initial results&lt;/h3&gt;

&lt;p&gt;Execution time observed with the JVM options in place was 4 minutes and 51
seconds. The CPU stats for the scan/filter/project stage for this query now
look like:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CPU: 3.22h, Scheduled: 7.21h, Input: 17631445897 rows (428.03GB)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The CPU used by individual tasks is much more uniform:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/jvm-issues-at-comcast/web_ui_after.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;code-cache&quot;&gt;Code cache&lt;/h2&gt;

&lt;p&gt;We noticed that the cluster’s overall CPU utilization decreased after the
cluster had been up for a few days, and there would be a few workers where tasks
were running slowly.&lt;/p&gt;

&lt;p&gt;When looking at these workers with slow-running tasks, we found that CPU usage
was very high:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[root@worker-node log]# uptime
 21:36:57 up 20 days, 20:39,  1 user,  load average: 149.92, 152.83, 144.82
[root@worker-node log]#
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We also noticed all these workers had messages like this in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;launcher.log&lt;/code&gt;
file:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[219756.210s][warning][codecache] Try increasing the code heap size using -XX:ProfiledCodeHeapSize=
OpenJDK 64-Bit Server VM warning: CodeHeap &apos;profiled nmethods&apos; is full. Compiler has been disabled.
OpenJDK 64-Bit Server VM warning: Try increasing the code heap size using -XX:ProfiledCodeHeapSize=
CodeHeap &apos;non-profiled nmethods&apos;: size=258436Kb used=235661Kb max_used=257882Kb free=22774Kb
 bounds [0x00007f466f980000, 0x00007f467f5e1000, 0x00007f467f5e1000]
CodeHeap &apos;profiled nmethods&apos;: size=258432Kb used=207330Kb max_used=216383Kb free=51101Kb
 bounds [0x00007f465fd20000, 0x00007f466f980000, 0x00007f466f980000]
CodeHeap &apos;non-nmethods&apos;: size=7420Kb used=1881Kb max_used=3766Kb free=5538Kb
 bounds [0x00007f465f5e1000, 0x00007f465fab1000, 0x00007f465fd20000]
 total_blobs=64220 nmethods=62699 adapters=1432
 compilation: disabled (not enough contiguous free space left)
              stopped_count=4, restarted_count=3
 full_count=3
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once the code cache is full, the JVM won’t compile any additional code until
space is freed.&lt;/p&gt;

&lt;p&gt;We were running with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-XX:ReservedCodeCacheSize&lt;/code&gt; JVM option set to 512M.
To see what’s taking up space in the code cache, we used jcmd:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;jcmd &amp;lt;TRINO_PID&amp;gt; Compiler.CodeHeap_Analytics
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We ran this at various intervals so we could compare how the code cache changed
over time.&lt;/p&gt;
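&lt;p&gt;As a rough aid for those comparisons, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CodeHeap&lt;/code&gt; summary lines can also be
parsed with a short script. The Python sketch below is purely illustrative (the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;parse_code_heaps&lt;/code&gt; helper is our own, not part of Trino or the JDK), and it
assumes the log format shown earlier in this post:&lt;/p&gt;

```python
import re

# Matches summary lines such as:
# CodeHeap 'non-profiled nmethods': size=258436Kb used=235661Kb ...
HEAP_LINE = re.compile(r"CodeHeap '([^']+)': size=(\d+)Kb used=(\d+)Kb")

def parse_code_heaps(log_text):
    """Return a mapping of heap name to utilization fraction (used / size)."""
    heaps = {}
    for name, size, used in HEAP_LINE.findall(log_text):
        heaps[name] = int(used) / int(size)
    return heaps
```

&lt;p&gt;Running this over the warning shown above reports the non-profiled heap at
roughly 91% utilization and the profiled heap at roughly 80%.&lt;/p&gt;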

&lt;p&gt;Thirty of the top 48 non-profiled methods were &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PagesHashStrategy&lt;/code&gt; classes, which are
generated per query. These can’t be removed from the cache until the query
completes, so the amount of cache needed scales with query concurrency. We have
a very busy cluster with significant concurrency at our busiest times.&lt;/p&gt;

&lt;p&gt;Next, we set &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-XX:ReservedCodeCacheSize&lt;/code&gt; to 2G to see how that would help. Since
increasing the size to 2GB, we have not seen the code cache fill while the
cluster has been running. We can also monitor the size of the code cache over
time using JMX. One query that can be used if you have the JMX catalog enabled
on your cluster is:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT
    node,
    regexp_extract(usage, &apos;max=(-?\d*)&apos;, 1) as max,
    regexp_extract(usage, &apos;used=(-?\d*)&apos;, 1) AS used
FROM
  jmx.current.&quot;java.lang:name=codeheap &apos;non-profiled nmethods&apos;,type=memorypool&quot;
ORDER BY used DESC
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;off-heap-memory-usage&quot;&gt;Off heap memory usage&lt;/h2&gt;

&lt;p&gt;One final JVM issue we noticed in our production cluster was that off-heap
memory on some workers grew to be quite large. We allocate approximately 85%
of the physical memory on our workers for the JVM heap. Recently, we received
alerts from our monitoring systems that memory consumption on our workers got
dangerously close to the physical limit on the machines.&lt;/p&gt;

&lt;p&gt;We noticed some memory-related issues from the Alluxio client in the Trino
worker logs on machines generating these high memory alerts. Upon further
investigation, we noticed that Trino was running with the open source version
of the Alluxio client. Trino ships with version 2.4.0 of the Alluxio client. We
are an Alluxio customer and use it in our environment.&lt;/p&gt;

&lt;p&gt;After discussing with Alluxio, they suggested we upgrade to version 2.4.1 of
their Enterprise client which includes a fix for an off-heap memory leak bug.
After upgrading to the Alluxio Enterprise client, the off-heap memory usage
became a lot more stable.&lt;/p&gt;

&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;

&lt;p&gt;This post outlined some of the JVM issues we encountered while running Trino in
production. We only hit many of these issues in our production environment, and
they were difficult to replicate outside of it. Thus, we wanted to write up our
experience in the hope of helping other Trino users in the future!&lt;/p&gt;</content>

      
        <author>
          <name>Sajumon Joseph, David Leach, Bryan Aller, Pavan Madhineni, Lavanya Ragothaman, Pratap Moturi, Pádraig O&apos;Sullivan (Starburst)</name>
        </author>
      

      <summary>At Comcast, we have a large on-premise Trino cluster. It enables us to extract insights from data no matter where it resides, and prepares the company for a more cloud-centric future. Recently, however, we experienced and overcame challenges related to the Java virtual machine (JVM). We wanted to share what we encountered and learned in hopes that it might be useful for the Trino community.</summary>

      
      
    </entry>
  
    <entry>
      <title>27: Trino gits to wade in the data LakeFS</title>
      <link href="https://trino.io/episodes/27.html" rel="alternate" type="text/html" title="27: Trino gits to wade in the data LakeFS" />
      <published>2021-09-30T00:00:00+00:00</published>
      <updated>2021-09-30T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/27</id>
      <content type="html" xml:base="https://trino.io/episodes/27.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Paul Singman, Developer Advocate at &lt;a href=&quot;https://treeverse.io/&quot;&gt;Treeverse&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/datawhisp&quot;&gt;@datawhisp&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;trino-summit-2021&quot;&gt;Trino Summit 2021&lt;/h2&gt;

&lt;p&gt;Get ready for &lt;a href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;Trino Summit&lt;/a&gt;, coming
October 21st and 22nd! This annual Trino community event is where we gather
practitioners who deploy Trino at scale to share their experiences and best
practices with the rest of the community. While the planning for this event was
a bit chaotic due to the pandemic, we have made the final decision to host the 
event virtually for the safety of all the attendees. We look forward to seeing
you there, and can’t wait to share more information in the coming weeks!&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-week-lakefs-and-git-on-object-storage&quot;&gt;Concept of the week: LakeFS and Git on object storage&lt;/h2&gt;

&lt;p&gt;LakeFS offers git-like semantics over your files in the data lake. Akin to the
versioning you can do on Iceberg, you can also version your data with LakeFS, 
and roll back to previous commits when you make a mistake. LakeFS allows you to 
roll out new features in production or prod-like environments with ease and 
isolation from the real data. Join us as we dive into this awesome new way to 
approach versioning on your data!&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/27/trino-lakefs.jpg&quot; /&gt;&lt;br /&gt;
Why we built LakeFS: &lt;a href=&quot;https://lakefs.io/why-we-built-lakefs-atomic-and-versioned-data-lake-operations/&quot;&gt;Source&lt;/a&gt;.
&lt;/p&gt;

&lt;h3 id=&quot;features&quot;&gt;Features&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Exabyte-scale version control&lt;/li&gt;
  &lt;li&gt;Git-like operations: branch, commit, merge, revert&lt;/li&gt;
  &lt;li&gt;Zero-copy branching for frictionless experiments&lt;/li&gt;
  &lt;li&gt;Full reproducibility of data and code&lt;/li&gt;
  &lt;li&gt;Pre-commit/merge hooks for data CI/CD&lt;/li&gt;
  &lt;li&gt;Instantly revert changes to data&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;use-cases&quot;&gt;Use cases&lt;/h3&gt;

&lt;h4 id=&quot;in-development&quot;&gt;In development&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;Experiment - try new tools, upgrade versions, and evaluate code changes in 
isolation. By creating a branch of the data you get an isolated snapshot to run 
experiments over, while others are not exposed. Compare between branches with 
different experiments or to the main branch of the repository to understand a 
change’s impact.&lt;/li&gt;
  &lt;li&gt;Debug - checkout specific commits in a repository’s commit history to 
materialize consistent, historical versions of your data. See the exact state of
your data at the point-in-time of an error to understand its root cause.&lt;/li&gt;
  &lt;li&gt;Collaborate - avoid managing data access at the two extremes of either 
treating your data lake like a shared folder or creating multiple copies of the
data to safely collaborate. Instead, leverage isolated branches managed by 
metadata (not copies of files) to work in parallel.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;during-deployment&quot;&gt;During deployment&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;Version Control - deploy data safely with CI/CD workflows borrowed from 
software engineering best practices. Ingest new data onto an isolated branch, 
perform data validations, then add to production through a merge operation.&lt;/li&gt;
  &lt;li&gt;Test - define pre-merge and pre-commit hooks to run tests that enforce schema 
and validate properties of the data to catch issues before they reach 
production.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;in-production&quot;&gt;In production&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;Roll back - recover from errors by instantly reverting data to a former, 
consistent snapshot of the data lake. Choose any commit in a repository’s commit
 history to revert in one atomic action.&lt;/li&gt;
  &lt;li&gt;Troubleshoot - investigate production errors by starting with a snapshot of 
the inputs to the failed process. Spend less time re-creating the state of 
datasets at the time of failure, and more time finding the solution.&lt;/li&gt;
  &lt;li&gt;Cross-collection consistency - provide consumers multiple synchronized 
collections of data in one atomic, revertible action. Using branches, writers 
provide consistency guarantees across different logical collections - merging to
 the main branch only after all relevant datasets have been created or updated 
 successfully.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Source: &lt;a href=&quot;https://docs.lakefs.io/#use-cases&quot;&gt;https://docs.lakefs.io/#use-cases&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-week-running-trino-on-lakefs&quot;&gt;Demo of the week: Running Trino on LakeFS&lt;/h2&gt;

&lt;p&gt;In order to run Trino and LakeFS, you need Docker installed on your system with at least 4GB
of memory allocated to Docker.&lt;/p&gt;

&lt;p&gt;Let’s start up the LakeFS instance and the required PostgreSQL instance along 
with the typical Trino containers used with the Hive connector. 
Clone the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino-getting-started&lt;/code&gt; repository and navigate to the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;community_tutorials/lakefs/trino-lakefs-minio/&lt;/code&gt; directory.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone git@github.com:bitsondatadev/trino-getting-started.git

cd community_tutorials/lakefs/trino-lakefs-minio/

docker-compose up -d
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once this is done, you can navigate to the following locations to verify that
everything started correctly.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Navigate to &lt;a href=&quot;http://localhost:8000&quot;&gt;http://localhost:8000&lt;/a&gt; to open the LakeFS user interface.&lt;/li&gt;
  &lt;li&gt;Log in with Access Key, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AKIAIOSFODNN7EXAMPLE&lt;/code&gt;, and Secret Access Key, 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Verify that the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;example&lt;/code&gt; repository exists in the UI and open it.&lt;/li&gt;
  &lt;li&gt;The branch &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main&lt;/code&gt; in the repository, found under &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;example/main/&lt;/code&gt;, should be 
empty.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Once you have verified the repository exists, let’s go ahead and create a schema
under the Trino Hive catalog called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;minio&lt;/code&gt;. This catalog previously pointed
directly to MinIO but is now wrapped by LakeFS to add the git-like layer around
the file storage.&lt;/p&gt;

&lt;p&gt;Name the schema &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny&lt;/code&gt;, as that is the schema we copy from the TPCH data set.
Notice the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;location&lt;/code&gt; property of the schema. It now has a namespace
prefix before the actual &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny/&lt;/code&gt; table directory: first the
repository name, then the branch name. All together this follows the pattern
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;protocol&amp;gt;://&amp;lt;repository&amp;gt;/&amp;lt;branch&amp;gt;/&amp;lt;schema&amp;gt;/&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE SCHEMA minio.tiny
WITH (location = &apos;s3a://example/main/tiny&apos;);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
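&lt;p&gt;To make the location pattern concrete, here is a small, hypothetical Python
helper (not part of LakeFS or Trino) that assembles such a location from its
parts:&lt;/p&gt;

```python
def lakefs_location(repository, branch, schema, protocol="s3a"):
    """Assemble a LakeFS-backed location: protocol://repository/branch/schema."""
    return f"{protocol}://{repository}/{branch}/{schema}"

# The location used in the CREATE SCHEMA statement above:
print(lakefs_location("example", "main", "tiny"))  # s3a://example/main/tiny
```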

&lt;p&gt;Now, create two tables, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;customer&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders&lt;/code&gt;, by setting &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;external_location&lt;/code&gt;
to the same namespace used in the schema with the table name appended. The queries
retrieve the data from the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny&lt;/code&gt; TPCH data set.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE TABLE minio.tiny.customer
WITH (
  format = &apos;ORC&apos;,
  external_location = &apos;s3a://example/main/tiny/customer/&apos;
) 
AS SELECT * FROM tpch.tiny.customer;

CREATE TABLE minio.tiny.orders
WITH (
  format = &apos;ORC&apos;,
  external_location = &apos;s3a://example/main/tiny/orders/&apos;
) 
AS SELECT * FROM tpch.tiny.orders;

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Verify that you can see the table directories in LakeFS once they exist.
&lt;a href=&quot;http://localhost:8000/repositories/example/objects?ref=main&amp;amp;path=tiny%2F&quot;&gt;http://localhost:8000/repositories/example/objects?ref=main&amp;amp;path=tiny%2F&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Run a query on these two tables using the standard schema pointing to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main&lt;/code&gt;
branch.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT ORDERKEY, ORDERDATE, SHIPPRIORITY
FROM minio.tiny.customer c, minio.tiny.orders o
WHERE MKTSEGMENT = &apos;BUILDING&apos; AND c.CUSTKEY = o.CUSTKEY AND
ORDERDATE &amp;lt; date&apos;1995-03-15&apos;
GROUP BY ORDERKEY, ORDERDATE, SHIPPRIORITY
ORDER BY ORDERDATE;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Open the &lt;a href=&quot;http://localhost:8000/repositories/example/objects?ref=main&quot;&gt;LakeFS UI again&lt;/a&gt; 
and click on the &lt;strong&gt;Unversioned Changes&lt;/strong&gt; tab. Click &lt;strong&gt;Commit Changes&lt;/strong&gt;. Type a 
commit message on the popup and click &lt;strong&gt;Commit Changes&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Once the changes are committed on branch &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main&lt;/code&gt;, click on the &lt;strong&gt;Branches&lt;/strong&gt; tab.
Click &lt;strong&gt;Create Branch&lt;/strong&gt;. Name the new branch &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sandbox&lt;/code&gt; to branch off of the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main&lt;/code&gt; branch. Now click &lt;strong&gt;Create&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Although a branch called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sandbox&lt;/code&gt; now exists, it only exists
logically. We need to make Trino aware of it by adding another schema and tables
that point to the new branch. Do this by making a new schema called
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny_sandbox&lt;/code&gt; and changing the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;location&lt;/code&gt; property to point to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sandbox&lt;/code&gt;
branch instead of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main&lt;/code&gt; branch.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE SCHEMA minio.tiny_sandbox
WITH (location = &apos;s3a://example/sandbox/tiny&apos;);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny_sandbox&lt;/code&gt; schema exists, we can copy the table definitions
of the original &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;customer&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders&lt;/code&gt; tables. We got
the schema for free earlier by copying it directly from the TPCH data using the
CTAS statements. We don’t want to use CTAS in this case, as it copies not only
the table definition but also the data. This duplication of data is unnecessary
and is exactly what creating a branch in LakeFS avoids. Instead, we just copy the
table definitions using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW CREATE TABLE&lt;/code&gt; statement.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SHOW CREATE TABLE minio.tiny.customer;
SHOW CREATE TABLE minio.tiny.orders;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Take the output and update the schema to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny_sandbox&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;external_location&lt;/code&gt;
to point to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sandbox&lt;/code&gt; for both tables.&lt;/p&gt;
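&lt;p&gt;If you have many tables, this manual edit can be scripted. The following Python
sketch is hypothetical and simply rewrites the schema name and branch prefix in
the DDL text, assuming the catalog, repository, and branch names used in this
demo:&lt;/p&gt;

```python
def retarget_ddl(ddl, old_schema="tiny", new_schema="tiny_sandbox",
                 old_branch="main", new_branch="sandbox"):
    """Point SHOW CREATE TABLE output at another schema and LakeFS branch."""
    # Rewrite the qualified table name, e.g. minio.tiny.customer
    ddl = ddl.replace(f"minio.{old_schema}.", f"minio.{new_schema}.")
    # Rewrite the branch in the external_location path
    ddl = ddl.replace(f"s3a://example/{old_branch}/", f"s3a://example/{new_branch}/")
    return ddl
```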

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE TABLE minio.tiny_sandbox.customer (
   custkey bigint,
   name varchar(25),
   address varchar(40),
   nationkey bigint,
   phone varchar(15),
   acctbal double,
   mktsegment varchar(10),
   comment varchar(117)
)
WITH (
   external_location = &apos;s3a://example/sandbox/tiny/customer&apos;,
   format = &apos;ORC&apos;
);

CREATE TABLE minio.tiny_sandbox.orders (
   orderkey bigint,
   custkey bigint,
   orderstatus varchar(1),
   totalprice double,
   orderdate date,
   orderpriority varchar(15),
   clerk varchar(15),
   shippriority integer,
   comment varchar(79)
)
WITH (
   external_location = &apos;s3a://example/sandbox/tiny/orders&apos;,
   format = &apos;ORC&apos;
);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once these table definitions exist, go ahead and run the same query as before,
but updated to use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny_sandbox&lt;/code&gt; schema instead of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny&lt;/code&gt; schema.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT ORDERKEY, ORDERDATE, SHIPPRIORITY
FROM minio.tiny_sandbox.customer c, minio.tiny_sandbox.orders o
WHERE MKTSEGMENT = &apos;BUILDING&apos; AND c.CUSTKEY = o.CUSTKEY AND
ORDERDATE &amp;lt; date&apos;1995-03-15&apos;
ORDER BY ORDERDATE;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;One last bit of functionality we want to test is merging. To
do this, create a table called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lineitem&lt;/code&gt; in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sandbox&lt;/code&gt; branch using a CTAS
statement.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE TABLE minio.tiny_sandbox.lineitem
WITH (
  format = &apos;ORC&apos;,
  external_location = &apos;s3a://example/sandbox/tiny/lineitem/&apos;
) 
AS SELECT * FROM tpch.tiny.lineitem;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Verify that you can see three table directories in LakeFS including &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lineitem&lt;/code&gt; 
in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sandbox&lt;/code&gt; branch.
&lt;a href=&quot;http://localhost:8000/repositories/example/objects?ref=sandbox&amp;amp;path=tiny%2F&quot;&gt;http://localhost:8000/repositories/example/objects?ref=sandbox&amp;amp;path=tiny%2F&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Verify that you do not see &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lineitem&lt;/code&gt; in the table directories in LakeFS in the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main&lt;/code&gt; branch.
&lt;a href=&quot;http://localhost:8000/repositories/example/objects?ref=main&amp;amp;path=tiny%2F&quot;&gt;http://localhost:8000/repositories/example/objects?ref=main&amp;amp;path=tiny%2F&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can also verify this by running queries against &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lineitem&lt;/code&gt; in the schema
pointing to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sandbox&lt;/code&gt; branch; the same queries should fail against the schema
pointing to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main&lt;/code&gt; branch.&lt;/p&gt;

&lt;p&gt;To make the new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lineitem&lt;/code&gt; table show up in the main branch, first commit
the new change to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sandbox&lt;/code&gt; by again going to the &lt;strong&gt;Unversioned Changes&lt;/strong&gt; tab.
Click &lt;strong&gt;Commit Changes&lt;/strong&gt;. Type a commit message on the popup and click
&lt;strong&gt;Commit Changes&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Once the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lineitem&lt;/code&gt; addition is committed, click on the &lt;strong&gt;Compare&lt;/strong&gt; tab. Set the
base branch to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main&lt;/code&gt; and the compared-to branch to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sandbox&lt;/code&gt;. You should see
the addition of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lineitem&lt;/code&gt; show up in the diff view. Click &lt;strong&gt;Merge&lt;/strong&gt; and click
&lt;strong&gt;Yes&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Once this is merged you should see the table data show up in LakeFS. Verify that
you can see &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lineitem&lt;/code&gt; in the table directories in LakeFS in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main&lt;/code&gt; branch.
&lt;a href=&quot;http://localhost:8000/repositories/example/objects?ref=main&amp;amp;path=tiny%2F&quot;&gt;http://localhost:8000/repositories/example/objects?ref=main&amp;amp;path=tiny%2F&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As before, we won’t be able to query this data from Trino until we run the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW CREATE TABLE&lt;/code&gt; from the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny_sandbox&lt;/code&gt; schema and use the output to create
the table in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny&lt;/code&gt; schema that is pointing to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main&lt;/code&gt;.&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-8762-add-query-error-info-to-cluster-overview-page-in-web-ui&quot;&gt;PR of the week: PR 8762 Add query error info to cluster overview page in web UI&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;https://github.com/trinodb/trino/pull/8762&quot;&gt;PR of the week&lt;/a&gt; adds some
really useful context around query failures in the Trino Web UI. This PR was
created by &lt;a href=&quot;https://github.com/posulliv&quot;&gt;Pádraig O’Sullivan&lt;/a&gt;. For many, it can
be frustrating when a query fails and you have to do a lot of digging before you
understand even the type of error that is happening. This PR gives a better
highlight of what failed so that you don’t have to do a lot of investigation
upfront to get a sense of what is happening and where to look next.&lt;/p&gt;

&lt;p&gt;Thank you so much Pádraig!&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-week-why-are-deletes-so-limited-in-trino&quot;&gt;Question of the week: Why are deletes so limited in Trino?&lt;/h2&gt;

&lt;p&gt;Our &lt;a href=&quot;https://trinodb.slack.com/archives/CGB0QHWSW/p1632775855390300&quot;&gt;question of the week&lt;/a&gt;
comes from Marius Grama on our Trino community Slack. Marius created the 
&lt;a href=&quot;https://github.com/findinpath/dbt-trino-incremental-hive&quot;&gt;dbt-trino&lt;/a&gt; adapter 
and wants to implement &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT OVERWRITE&lt;/code&gt; functionality.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT OVERWRITE&lt;/code&gt; checks whether there are entries in the target table that
also exist in the staging table, and it first deletes the target entries
before inserting the staging entries. Unfortunately, the delete didn’t work for
RDBMS, Hive, or Iceberg. His question is whether this is a limitation of Trino for
all connectors, and how we can approach the “delete” part of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT OVERWRITE&lt;/code&gt;.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs and Resources&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://lakefs.io/hive-metastore-why-its-still-here-and-what-can-replace-it/&quot;&gt;Hive Metastore - Why it’s still here and what can replace it&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://lakefs.io/hive-metastore-it-didnt-age-well/&quot;&gt;Hive Metastore - It didn’t age well&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://lakefs.io/hudi-iceberg-and-delta-lake-data-lake-table-formats-compared/&quot;&gt;Hudi, Iceberg, Delta Lake Table Formats Compared&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://lakefs.io/the-docker-everything-bagel-spin-up-a-local-data-stack/&quot;&gt;The Docker Everything Bagel&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests</summary>

      
      
    </entry>
  
    <entry>
      <title>Announcing Trino Summit</title>
      <link href="https://trino.io/blog/2021/09/23/announcing_trino_summit.html" rel="alternate" type="text/html" title="Announcing Trino Summit" />
      <published>2021-09-23T00:00:00+00:00</published>
      <updated>2021-09-23T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/09/23/announcing_trino_summit</id>
      <content type="html" xml:base="https://trino.io/blog/2021/09/23/announcing_trino_summit.html">&lt;p&gt;Greetings Trino nation,&lt;/p&gt;

&lt;p&gt;Get ready for this year’s virtual Trino Summit event! This year’s summit feels a
little different as the name of the event has changed from Presto to Trino. So
this will be the first event of the project hosted &lt;a href=&quot;https://trino.io/blog/2020/12/27/announcing-trino.html&quot;&gt;under the new banner of Trino&lt;/a&gt;.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;This year’s Summit is hosted by Starburst virtually on October 21st and 22nd. We’d originally set the date for September 15th but later realized that it conflicted with Yom Kippur. While we had originally set out to make this event a hybrid format, we had to make the difficult decision of moving the event to fully virtual in light of the growing health concerns around contracting and spreading the delta variant. If you haven’t registered yet, &lt;a href=&quot;http://starburst.io/trinosummit2021&quot;&gt;register here&lt;/a&gt;. If you planned on attending in person, we will still have your registration and you will still be able to attend virtually.&lt;/p&gt;

&lt;p&gt;Get excited for our great lineup of speakers, panels, and presentations! We’re always on the lookout for speakers who are excited to share their Trino experiences.&lt;/p&gt;

&lt;p&gt;We look forward to seeing you there!&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>Greetings Trino nation, Get ready for this year’s virtual Trino Summit event! This year’s summit feels a little different as the name of the event has changed from Presto to Trino. So this will be the first event of the project hosted under the new banner of Trino.</summary>

      
      
    </entry>
  
    <entry>
      <title>26: Trino discovers data catalogs with Amundsen</title>
      <link href="https://trino.io/episodes/26.html" rel="alternate" type="text/html" title="26: Trino discovers data catalogs with Amundsen" />
      <published>2021-09-16T00:00:00+00:00</published>
      <updated>2021-09-16T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/26</id>
      <content type="html" xml:base="https://trino.io/episodes/26.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;Mark Grover, Co-creator of Amundsen and Founder at &lt;a href=&quot;https://www.stemma.ai/&quot;&gt;Stemma&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/mark_grover&quot;&gt;@mark_grover&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;release-362&quot;&gt;Release 362&lt;/h2&gt;

&lt;p&gt;Official announcement items from Martin are not yet available since the release 
is not out… but soon.&lt;/p&gt;

&lt;p&gt;Manfred’s notes:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Add new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;listagg&lt;/code&gt; function contributed by Marius&lt;/li&gt;
  &lt;li&gt;Join performance and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DISTINCT&lt;/code&gt; performance improvements&lt;/li&gt;
  &lt;li&gt;SQL security related changes in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER SCHEMA&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Add &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IN table&lt;/code&gt; for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE&lt;/code&gt;/&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DROP&lt;/code&gt;/… &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROLE&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Whole bunch of improvements in the BigQuery connector&lt;/li&gt;
  &lt;li&gt;Numerous improvements for Parquet file usage in Hive connector&lt;/li&gt;
  &lt;li&gt;All connector docs now have SQL support section&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-week-data-discovery-and-amundsen&quot;&gt;Concept of the week: Data discovery and Amundsen&lt;/h2&gt;

&lt;p&gt;Data discovery is a process that aids in the analysis of data where siloed data 
has been centralized, and it is difficult to find data or overlap between
disparate data sets. Many teams have their own view of the world when it comes 
to the data they need, but they commonly need to reason about how their data 
relates to data outside of their domain.&lt;/p&gt;

&lt;p&gt;There are typically questions about who owns what data, which helps identify 
the individuals responsible for maintaining standards. There are also issues 
around documenting the data, and around identifying who to call for help when 
problems come up while using it. Data discovery lets analysts find patterns in 
the data and periodically audit data storage practices. Interesting questions 
also arise around existing policies, and discovery can encourage a system of 
record that acts as a shared front end for those data policies.&lt;/p&gt;

&lt;h3 id=&quot;what-is-amundsen&quot;&gt;What is Amundsen?&lt;/h3&gt;

&lt;p&gt;Amundsen provides data discovery by using ETL processes to scrape metadata from
all of the data sources. It creates a central location to collect all that 
metadata and enables search and other analytics of this metadata. Here’s how the
project describes itself on &lt;a href=&quot;https://www.amundsen.io/amundsen/&quot;&gt;the Amundsen website&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Amundsen is a data discovery and metadata engine for improving the 
productivity of data analysts, data scientists and engineers when interacting
with data. It does that today by indexing data resources (tables, dashboards,
streams, etc.) and powering a page-rank style search based on usage patterns 
(e.g. highly queried tables show up earlier than less queried tables).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Amundsen has an architecture that interacts primarily with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;information_schema&lt;/code&gt;
tables, among other metadata, depending on the data source. In Trino’s case, 
&lt;a href=&quot;https://github.com/amundsen-io/amundsen/blob/main/databuilder/databuilder/extractor/presto_view_metadata_extractor.py&quot;&gt;the extractor used&lt;/a&gt; 
connects directly to the Hive metastore database, for Trino views, since 
they’re stored there. Physical tables use the &lt;a href=&quot;https://github.com/amundsen-io/amundsen/blob/main/databuilder/databuilder/extractor/hive_table_metadata_extractor.py&quot;&gt;HiveTableMetadataExtractor&lt;/a&gt;
to load these tables into Amundsen. This makes sense since the data is stored in
the Hive table format. For non-Hive use cases, you generally want to bypass
using Trino (for now) and directly connect Amundsen to each data source.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/26/amundsen-architecture.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;Amundsen includes an ETL framework called &lt;a href=&quot;https://www.amundsen.io/amundsen/databuilder/&quot;&gt;databuilder&lt;/a&gt;
that runs multiple jobs. Jobs contain an ETL task to extract the metadata and 
load it into the two databases that are central to Amundsen, Neo4j and 
Elasticsearch. Neo4j stores the core metadata that is represented on the UI. 
Elasticsearch enables search over the many fields in the metadata. Ingestion via
ETL follows these steps:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Ingest base data to Neo4j.&lt;/li&gt;
  &lt;li&gt;Ingest additional data and decorate Neo4j over base data.&lt;/li&gt;
  &lt;li&gt;Update Elasticsearch index using Neo4j data.&lt;/li&gt;
  &lt;li&gt;Remove stale data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each job contains an ETL task. The task must define an extractor and a loader, 
and optionally a transformer. You can see example configurations for different
extractors on the website, like the &lt;a href=&quot;https://www.amundsen.io/amundsen/databuilder/#hivetablemetadataextractor&quot;&gt;example for the HiveTableMetadataExtractor&lt;/a&gt;.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/26/amundsen-job.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;
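&lt;p&gt;To make the job structure concrete, here is a minimal, self-contained Python 
sketch of the extract/transform/load pattern that a databuilder task composes. 
The class and method names are illustrative stand-ins, not the real Amundsen 
databuilder API:&lt;/p&gt;

```python
# Illustrative sketch of the extract, transform, load pipeline that an
# Amundsen databuilder task composes. All names here are hypothetical
# stand-ins, not actual databuilder classes.

class ListExtractor:
    """Yields one metadata record at a time, like a databuilder extractor."""
    def __init__(self, records):
        self._records = iter(records)

    def extract(self):
        return next(self._records, None)  # None signals exhaustion

class UpperCaseTransformer:
    """Optional step that reshapes records before loading."""
    def transform(self, record):
        return {key: value.upper() for key, value in record.items()}

class MemoryLoader:
    """Collects records, standing in for the Neo4j/Elasticsearch loaders."""
    def __init__(self):
        self.loaded = []

    def load(self, record):
        self.loaded.append(record)

def run_task(extractor, loader, transformer=None):
    """Drains the extractor, optionally transforms, and loads each record."""
    while True:
        record = extractor.extract()
        if record is None:
            break
        if transformer is not None:
            record = transformer.transform(record)
        loader.load(record)

tables = [{"name": "customer"}, {"name": "orders"}]
loader = MemoryLoader()
run_task(ListExtractor(tables), loader, UpperCaseTransformer())
print(loader.loaded)  # [{'name': 'CUSTOMER'}, {'name': 'ORDERS'}]
```

&lt;p&gt;A real databuilder job wires concrete extractors (such as the Hive metastore 
extractor) and loaders into the same shape of task.&lt;/p&gt;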

&lt;p&gt;The metadata is modeled using a graph representation in Neo4j, and optionally
&lt;a href=&quot;https://atlas.apache.org/#/&quot;&gt;Apache Atlas&lt;/a&gt;, to model advanced concepts such as
lineage and other relations.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/26/amundsen-metadata.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;You can learn more about the &lt;a href=&quot;https://www.amundsen.io/amundsen/databuilder/docs/models/&quot;&gt;models in the metadata here&lt;/a&gt;.&lt;/p&gt;

&lt;h4 id=&quot;amundsen-resources&quot;&gt;Amundsen resources&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;Docs: &lt;a href=&quot;https://www.amundsen.io/amundsen/&quot;&gt;https://www.amundsen.io/amundsen/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;GitHub: &lt;a href=&quot;https://github.com/amundsen-io/amundsen&quot;&gt;https://github.com/amundsen-io/amundsen&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;YouTube: &lt;a href=&quot;https://www.youtube.com/playlist?list=PL0UJdxehTNlKnGU_h7k2fzJyvAiufeh1U&quot;&gt;https://www.youtube.com/playlist?list=PL0UJdxehTNlKnGU_h7k2fzJyvAiufeh1U&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Slack: &lt;a href=&quot;https://join.slack.com/t/amundsenworkspace/shared_invite/enQtNTk2ODQ1NDU1NDI0LTc3MzQyZmM0ZGFjNzg5MzY1MzJlZTg4YjQ4YTU0ZmMxYWU2MmVlMzhhY2MzMTc1MDg0MzRjNTA4MzRkMGE0Nzk&quot;&gt;Join&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;amundsen-as-a-subcomponent-to-data-mesh&quot;&gt;Amundsen as a subcomponent to data mesh&lt;/h3&gt;

&lt;p&gt;A new architecture, philosophy, and yes, &lt;a href=&quot;https://www.merriam-webster.com/dictionary/buzzword&quot;&gt;buzzword&lt;/a&gt; 
that is gaining momentum is the &lt;em&gt;data mesh&lt;/em&gt;. While it is certainly not yet 
concretely defined and is still in the research and development phase, data mesh is
gaining a lot of attention as a potential alternative to data lakes and data 
warehouses for analytics solutions.&lt;/p&gt;

&lt;p&gt;Data mesh mirrors the philosophy of microservice architecture. It argues that 
data should be defined and maintained by teams responsible for their business 
domain similar to how the responsibility is delegated at the service layer. 
Since not everyone is going to be a data engineer on the domain team, there must
be some consideration for the architecture of such a platform. The author of 
this paradigm, Zhamak Dehghani, lays out four principles that characterize a data 
mesh. The principles are listed below, with the systems that provide some or all 
of the solution for each principle noted in parentheses.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Domain-oriented decentralized data ownership and architecture (Trino &amp;amp; Amundsen)&lt;/li&gt;
  &lt;li&gt;Data as a product	(Amundsen)&lt;/li&gt;
  &lt;li&gt;Self-serve data infrastructure as a platform (Trino)&lt;/li&gt;
  &lt;li&gt;Federated computational governance (Amundsen to some extent)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;stemma&quot;&gt;Stemma&lt;/h3&gt;

&lt;p&gt;Like with many successful open source projects, there are enterprise products 
that build on and support the open source project. &lt;a href=&quot;https://www.stemma.ai/&quot;&gt;Stemma&lt;/a&gt; 
is the enterprise company that supports Amundsen. It’s founded by Mark and 
others central to the open source project.&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-index-trino-views&quot;&gt;PR of the week: Index Trino views&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;https://github.com/amundsen-io/amundsen/commit/4cfc55d311ca7bc9b02df26ece3b4bde5eedecd6#diff-1c6e94c4ea77e16625f97d4e029f5611d3f3b10d428ab6038edc0b931df4243c&quot;&gt;PR (or should we say commit) of the week&lt;/a&gt;, 
adds the original Trino extractor. As mentioned above, this extractor is only
needed for views, as the physical tables exist in Hive and are retrieved with the
Hive extractor.&lt;/p&gt;

&lt;h3 id=&quot;call-to-contribute-to-amundsen&quot;&gt;Call to contribute to Amundsen&lt;/h3&gt;

&lt;p&gt;If you want to help out, you can consider adding the Trino image similar to 
&lt;a href=&quot;https://github.com/amundsen-io/amundsenfrontendlibrary/commit/4e24bfe1c1cd3c6cf568ee1b3e39580686fafbe6&quot;&gt;this commit completed a while back&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;demo-extracting-metadata-from-hive-metastore-and-loading-it-into-amundsen&quot;&gt;Demo: Extracting metadata from Hive metastore and loading it into Amundsen&lt;/h2&gt;

&lt;p&gt;There were technical difficulties on the day of broadcasting the show, so the
demo was moved to its own separate video.&lt;/p&gt;

&lt;div class=&quot;youtube-video-container&quot;&gt;
  &lt;iframe width=&quot;702&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/m-mL00FkWd0&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;The steps in this demo are adapted from the &lt;a href=&quot;https://www.amundsen.io/amundsen/installation/&quot;&gt;Amundsen installation page&lt;/a&gt;.
Clone this repository and navigate to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino-getting-started/community_tutorials/amundsen&lt;/code&gt; 
directory. For this demo you need at least 3GB of memory allocated to your 
Docker application.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone git@github.com:bitsondatadev/trino-getting-started.git

cd community_tutorials/amundsen

docker-compose up -d
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once all the services are running, clone the Amundsen repository in a separate
terminal. Then navigate to the databuilder folder and install all the 
dependencies:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone --recursive https://github.com/amundsen-io/amundsen.git
cd databuilder
python3 -m venv venv
source venv/bin/activate
pip3 install --upgrade pip
pip3 install -r requirements.txt
python3 setup.py install
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Navigate to MinIO at &lt;a href=&quot;http://localhost:9000&quot;&gt;http://localhost:9000&lt;/a&gt; to create the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny&lt;/code&gt; bucket for the
schema in Trino to map to. In Trino, create a schema and a couple tables in the 
existing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;minio&lt;/code&gt; catalog:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE SCHEMA minio.tiny
WITH (location = &apos;s3a://tiny/&apos;);

CREATE TABLE minio.tiny.customer
WITH (
  format = &apos;ORC&apos;,
  external_location = &apos;s3a://tiny/customer/&apos;
) 
AS SELECT * FROM tpch.tiny.customer;

CREATE TABLE minio.tiny.orders
WITH (
  format = &apos;ORC&apos;,
  external_location = &apos;s3a://tiny/orders/&apos;
) 
AS SELECT * FROM tpch.tiny.orders;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Navigate back to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino-getting-started/community_tutorials/amundsen&lt;/code&gt; directory in the same 
Python virtual environment you just opened.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;cd trino-getting-started/community_tutorials/amundsen
python3 assets/scripts/sample_trino_data_loader.py
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;View the Amundsen UI at &lt;a href=&quot;http://localhost:5000&quot;&gt;http://localhost:5000&lt;/a&gt; and try a test search; it 
should return the tables you just created.&lt;/p&gt;

&lt;p&gt;You can verify dummy data has been ingested into Neo4j by visiting &lt;a href=&quot;http://localhost:7474/browser/&quot;&gt;http://localhost:7474/browser/&lt;/a&gt;.
Log in as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;neo4j&lt;/code&gt; with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;test&lt;/code&gt; password and run 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH (n:Table) RETURN n LIMIT 25&lt;/code&gt; in the query box. You should see a few tables.&lt;/p&gt;

&lt;p&gt;If you have any issues, look at some of the &lt;a href=&quot;https://www.amundsen.io/amundsen/installation/#troubleshooting&quot;&gt;troubleshooting steps&lt;/a&gt;
in the Amundsen installation page.&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-week-can-i-add-a-udf-without-restarting-trino&quot;&gt;Question of the week: Can I add a UDF without restarting Trino?&lt;/h2&gt;

&lt;p&gt;This week’s question comes in from Chen Xuying on the Trino community Slack.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Is there any way to register &lt;a href=&quot;https://trino.io/docs/current/develop/functions.html&quot;&gt;a new user defined function (UDF)&lt;/a&gt; 
and needn’t restart coordinator and worker?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Currently, no. In Java, jar files and all the Java code are loaded at start 
time, so in order to load the files on all the worker nodes and the coordinator, 
you need to restart. There are various ways UDFs could be implemented in a 
dynamic way, so we are still looking for suggestions here.&lt;/p&gt;

&lt;p&gt;One option, as Manfred mentions, would be to load JavaScript as a UDF, since Java
can compile JavaScript. This would allow new functions to be added 
without a restart. There may be other ways to achieve this, and we invite you to
contribute your ideas!&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://martinfowler.com/articles/data-mesh-principles.html&quot;&gt;Data Mesh Principles&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://blog.starburst.io/trino-data-governance-and-accelerating-data-science&quot;&gt;Trino, Data Governance, and Accelerating Data Science&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests Mark Grover, Co-creator of Amundsen and Founder at Stemma (@mark_grover). Release 362</summary>

      
      
    </entry>
  
    <entry>
      <title>25: Trino going through changes</title>
      <link href="https://trino.io/episodes/25.html" rel="alternate" type="text/html" title="25: Trino going through changes" />
      <published>2021-09-02T00:00:00+00:00</published>
      <updated>2021-09-02T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/25</id>
      <content type="html" xml:base="https://trino.io/episodes/25.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Ayush Chauhan, Data Platform Engineer at &lt;a href=&quot;https://www.zomato.com/who-we-are&quot;&gt;Zomato&lt;/a&gt;
 (&lt;a href=&quot;https://www.linkedin.com/in/ayush-chauhan/&quot;&gt;Ayush Chauhan&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Gunnar Morling, Lead of Debezium and Open source software engineer at &lt;a href=&quot;https://www.redhat.com&quot;&gt;Red Hat&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/gunnarmorling&quot;&gt;@gunnarmorling&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Ashhar Hasan, Software Engineer at &lt;a href=&quot;https://starburst.io/&quot;&gt;Starburst&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/hashhar&quot;&gt;@hashhar&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;release-361&quot;&gt;Release 361&lt;/h2&gt;

&lt;p&gt;Official announcement items from Martin:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for OAuth2/OIDC opaque access tokens&lt;/li&gt;
  &lt;li&gt;Aggregation pushdown for Pinot&lt;/li&gt;
  &lt;li&gt;Better performance for Parquet files with column indexes&lt;/li&gt;
  &lt;li&gt;Support for reading fields as JSON values in Elasticsearch&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred’s additional notes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Predicate pushdown in Cassandra&lt;/li&gt;
  &lt;li&gt;Metadata cache size limitation in a few connectors&lt;/li&gt;
  &lt;li&gt;Lots of improvements for Hive view support&lt;/li&gt;
  &lt;li&gt;Glue table statistics improvements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More info at &lt;a href=&quot;https://trino.io/docs/current/release/release-361.html&quot;&gt;https://trino.io/docs/current/release/release-361.html&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-week-change-data-capture&quot;&gt;Concept of the week: Change Data Capture&lt;/h2&gt;

&lt;p&gt;If you know Trino, you know it allows for flexible architectures that include 
many systems with varying use cases they support. We’ve come to accept this 
potpourri of systems as a general modus operandi for most businesses.&lt;/p&gt;

&lt;p&gt;Many times the data gets copied to different systems to accomplish varying use 
cases, from performance and data warehousing to merging cross-cutting data into a 
single store. When copying data between systems, how do these systems stay in 
sync? It’s a critical need especially for Trino to know that the state across 
the data sources we query is valid.&lt;/p&gt;

&lt;p&gt;To answer this, we can use the concept of Change Data Capture (CDC). CDC is a 
powerful concept that considers one or more data sources, called systems of record, 
that store the true state of a system. The systems of record are monitored for
changes, and upon detecting changes, the CDC system propagates them to a 
number of target systems.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/25/cdc.png&quot; /&gt;&lt;br /&gt;
Change Data Capture: &lt;a href=&quot;https://medium.com/event-driven-utopia/a-gentle-introduction-to-event-driven-change-data-capture-683297625f9b&quot;&gt;Source&lt;/a&gt;.
&lt;/p&gt;
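&lt;p&gt;The core loop described above can be sketched in a few lines of self-contained 
Python: a system of record appends every change to an ordered log, and each target 
replays that log to converge on the same state. This is an illustrative model of 
the CDC concept only, not how Debezium is actually implemented:&lt;/p&gt;

```python
# Minimal illustrative model of change data capture: a system of record
# appends every insert/update/delete to a change log, and target systems
# replay the log in order to converge on the same state. Not Debezium's
# actual design.

class SystemOfRecord:
    def __init__(self):
        self.rows = {}
        self.change_log = []  # ordered stream of change events

    def upsert(self, key, value):
        op = "update" if key in self.rows else "insert"
        self.rows[key] = value
        self.change_log.append((op, key, value))

    def delete(self, key):
        del self.rows[key]
        self.change_log.append(("delete", key, None))

class Target:
    """A downstream copy that applies change events in order."""
    def __init__(self):
        self.rows = {}
        self.position = 0  # offset into the source change log

    def sync(self, source):
        for op, key, value in source.change_log[self.position:]:
            if op == "delete":
                self.rows.pop(key, None)
            else:
                self.rows[key] = value
        self.position = len(source.change_log)

source = SystemOfRecord()
replica = Target()
source.upsert("order-1", "placed")
source.upsert("order-2", "placed")
source.upsert("order-1", "delivered")
source.delete("order-2")
replica.sync(source)
print(replica.rows == source.rows)  # True
```

&lt;p&gt;Because the target tracks its position in the log, repeated syncs only replay 
new events, which is the same idea that lets CDC consumers resume after failures.&lt;/p&gt;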

&lt;h3 id=&quot;debezium-for-cdc&quot;&gt;Debezium for CDC&lt;/h3&gt;

&lt;p&gt;One implementation of CDC that has grown tremendously in popularity since its 
inception is called Debezium. According to &lt;a href=&quot;https://debezium.io&quot;&gt;https://debezium.io&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Debezium is an open-source distributed platform for change data capture. Start
it up, point it at your databases, and your apps can start responding to all 
of the inserts, updates, and deletes that other apps commit to your databases.
Debezium is durable and fast, so your apps can respond quickly and never miss
an event, even when things go wrong.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The common way Debezium is deployed in the wild is using &lt;a href=&quot;https://docs.confluent.io/platform/current/connect/index.html&quot;&gt;Kafka Connect&lt;/a&gt; 
and defining the Debezium source connectors. You can then use the Kafka Connect 
ecosystem to write to different targets downstream.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/25/debezium-architecture.png&quot; /&gt;&lt;br /&gt;
The Debezium architecture with Kafka Connect: &lt;a href=&quot;https://debezium.io/documentation/reference/architecture.html&quot;&gt;Source&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;Another alternative, if you don’t want to use Kafka, is to use dedicated Debezium
servers to implement CDC and push the logs to the target database downstream 
using Debezium connectors.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/25/debezium-server-architecture.png&quot; /&gt;&lt;br /&gt;
The Debezium standalone server architecture: &lt;a href=&quot;https://debezium.io/documentation/reference/architecture.html&quot;&gt;Source&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;While CDC is the primary focus, Debezium also provides support for more advanced
concepts such as the &lt;a href=&quot;https://debezium.io/documentation/reference/integrations/outbox.html&quot;&gt;outbox pattern support for Quarkus apps&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;debezium--trino-at-zomato&quot;&gt;Debezium + Trino at Zomato&lt;/h3&gt;

&lt;p&gt;Zomato is a technology platform that connects customers, restaurant partners and
delivery partners, serving their multiple needs. Customers use their platform to
search and discover restaurants, read and write customer generated reviews and 
view and upload photos, order food delivery, book a table and make payments 
while dining-out at restaurants. Clearly there’s a lot of data that can flow
through a platform like this. You’ll have both operational databases to support
the applications in this platform, but also need big data stores to store and
analyze all of this data.&lt;/p&gt;

&lt;p&gt;Here is one of the earlier iterations of Zomato’s big data architecture before
they were able to integrate Debezium. Ayush covers some of the pain points they
experienced before implementing CDC.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/25/zomato-before.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;Once Zomato implemented CDC, they were able to keep their downstream Iceberg 
stores in sync across multiple operational systems. As a result the analytics 
data is now much more dependable.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/25/zomato-after.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-4140-implement-aggregation-pushdown-in-pinot&quot;&gt;PR of the week: PR 4140 Implement aggregation pushdown in Pinot&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;https://github.com/trinodb/trino/pull/6069&quot;&gt;PR of the week&lt;/a&gt; is actually a
throwback to &lt;a href=&quot;/episodes/13.html&quot;&gt;episode thirteen&lt;/a&gt;, &lt;em&gt;Trino takes a sip of Pinot&lt;/em&gt;,
where our guest &lt;a href=&quot;https://twitter.com/ElonAzoulay&quot;&gt;Elon Azoulay&lt;/a&gt; discussed some of
the upcoming features coming to the Pinot connector. Aggregation pushdown
was on that list, and it just landed in the 361 release!&lt;/p&gt;

&lt;p&gt;This PR implements aggregation pushdown for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;COUNT&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AVG&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MIN&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MAX&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SUM&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;COUNT(DISTINCT)&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;approx_distinct&lt;/code&gt;. It is enabled by default and can be 
disabled using the configuration property &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pinot.aggregation-pushdown.enabled&lt;/code&gt; 
or the catalog session property &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;aggregation_pushdown_enabled&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;FYI: &lt;a href=&quot;https://github.com/trinodb/trino/pull/9208&quot;&gt;https://github.com/trinodb/trino/pull/9208&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Thanks Elon!&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-week-is-there-an-array-function-that-flattens-a-row-like-1--a-b-c-into-three-rows&quot;&gt;Question of the week: Is there an array function that flattens a row like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1 | [a, b, c]&lt;/code&gt; into three rows?&lt;/h2&gt;

&lt;p&gt;Our &lt;a href=&quot;https://trinodb.slack.com/archives/CFLB9AMBN/p1630241736052500&quot;&gt;question of the week&lt;/a&gt;
comes from Brian Hudson on our Trino community Slack. Brian is dealing with an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ARRAY&lt;/code&gt;
type in one column and an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INTEGER&lt;/code&gt; column in another. This is common when 
processing nested denormalized data. The goal is to take this row, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1 | [a, b, c]&lt;/code&gt;,
and split the array into three rows:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;1 | a
1 | b
1 | c
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Kasia answered this question by using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNNEST&lt;/code&gt; on the array column. The
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNNEST&lt;/code&gt; statement produces a single column with one row per array element, and a 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JOIN&lt;/code&gt; is performed with the original &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INTEGER&lt;/code&gt; column.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;
WITH t(x, y) AS (VALUES (1, ARRAY[&apos;a&apos;, &apos;b&apos;, &apos;c&apos;]))
SELECT x, y_unnested
FROM t
LEFT JOIN UNNEST (t.y) t2(y_unnested) ON true;

trino&amp;gt; WITH t(x, y) AS (VALUES (1, ARRAY[&apos;a&apos;, &apos;b&apos;, &apos;c&apos;]))
     -&amp;gt; SELECT x, y_unnested
     -&amp;gt; FROM t
     -&amp;gt; LEFT JOIN UNNEST (t.y) t2(y_unnested) ON true;
 x | y_unnested
---+------------
 1 | a
 1 | b
 1 | c
(3 rows)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
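&lt;p&gt;As a side note, the same flattening is often written with a
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CROSS JOIN&lt;/code&gt;. A minimal sketch:&lt;/p&gt;

```sql
-- Same flattening, using CROSS JOIN UNNEST
WITH t(x, y) AS (VALUES (1, ARRAY['a', 'b', 'c']))
SELECT x, y_unnested
FROM t
CROSS JOIN UNNEST(t.y) AS t2(y_unnested);
```

&lt;p&gt;The difference shows up with empty arrays: the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LEFT JOIN ... ON true&lt;/code&gt; form keeps the row with a
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NULL&lt;/code&gt; element, while
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CROSS JOIN&lt;/code&gt; drops it.&lt;/p&gt;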

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs and Resources&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/event-driven-utopia/a-gentle-introduction-to-event-driven-change-data-capture-683297625f9b&quot;&gt;A gentle introduction to Event Driven Change Data Capture&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/event-driven-utopia/a-visual-introduction-to-debezium-32563e23c6b8&quot;&gt;A Visual Introduction to Debezium&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://debezium.io/blog/&quot;&gt;Debezium Blog&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://debezium.io/documentation/reference/&quot;&gt;Debezium Docs&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/debezium/debezium-examples/&quot;&gt;Debezium Examples&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://debezium.io/documentation/online-resources/&quot;&gt;Debezium Resources&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Videos&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.infoq.com/presentations/data-streaming-kafka-debezium/&quot;&gt;Practical Change Data Streaming Use Cases with Apache Kafka &amp;amp; Debezium&lt;/a&gt;
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://speakerdeck.com/gunnarmorling/practical-change-data-streaming-use-cases-with-apache-kafka-and-debezium-qcon-san-francisco-2019&quot;&gt;Slides&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=QYbXDp4Vu-8&quot;&gt;Apache Kafka and Debezium / DevNation Tech Talk&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests</summary>

      
      
    </entry>
  
    <entry>
      <title>24: Trinetes I: Trino on Kubernetes</title>
      <link href="https://trino.io/episodes/24.html" rel="alternate" type="text/html" title="24: Trinetes I: Trino on Kubernetes" />
      <published>2021-08-19T00:00:00+00:00</published>
      <updated>2021-08-19T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/24</id>
      <content type="html" xml:base="https://trino.io/episodes/24.html">&lt;p&gt;This is the first episode in a series where we cover the basics and just enough
advanced Kubernetes features and information to understand how to deploy Trino 
on Kubernetes.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-week-k8s-architecture-containers-pods-and-kubelets&quot;&gt;Concept of the week: K8s architecture: Containers, Pods, and kubelets&lt;/h2&gt;

&lt;p&gt;For this concept of the week, we want to provide you a minimalistic overview of
what you need to know about Kubernetes to deploy Trino to a cluster.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Why Kubernetes?&lt;/strong&gt; Kubernetes is a container orchestration platform that allows
you to declare how containers should be managed using YAML 
configuration files. This definition can be tricky to understand if you don’t
have the proper context. To make sure nobody is left behind, it is useful to 
cover what containers are:&lt;/p&gt;

    &lt;ul&gt;
      &lt;li&gt;
        &lt;p&gt;The traditional way to deploy an application is to take the compiled 
binary of that application and run it directly on hardware with an operating
system. This works, but the application depends heavily on the underlying
hardware and operating system being functional, and multiple applications must
share the same resources. If one application fails and crashes a shared
resource, it can take down every application on that machine.&lt;/p&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;To remove these dependencies, engineers created virtual machines (VMs), 
using a VM manager called a hypervisor that emulates hardware environments 
to host other operating systems. This is a big step forward, as each 
application can now be isolated, but it comes at a great cost: each virtual
machine hosts an entire operating system, making it resource intensive and slow.&lt;/p&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;Containers are the newest type of deployment. Containers enable a logical
isolation of resources while still physically running on shared resources. 
All resources created in the hardware and operating systems exist on the host
system. The isolation restricts any interference from other processes. 
Containers achieve the goals of virtualization without sacrificing much 
performance or efficiency.&lt;/p&gt;
      &lt;/li&gt;
    &lt;/ul&gt;

    &lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/24/container-evolution.svg&quot; /&gt;&lt;br /&gt;
 Source: https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/
&lt;/p&gt;

    &lt;ul&gt;
      &lt;li&gt;Containerization simplified a trend in service oriented architecture called 
microservices. Microservices deploy loosely coupled and modular applications
rather than all-encompassing monolithic applications. With containers, these
applications can be deployed and scaled up quickly across various virtual and
physical machines without affecting other applications on the same machine. 
This is great, but results in new complexities. Some examples are the need 
for new approaches to monitoring the health of applications, scaling the 
applications as requests grow and diminish, redeploying crashed applications, 
and networking the applications together. In summary, all of these activities
can be considered container orchestration and this is exactly what Kubernetes
solves!&lt;/li&gt;
    &lt;/ul&gt;

    &lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/24/load-balancer.jpeg&quot; /&gt;&lt;br /&gt;
 Source: https://www.slideshare.net/devopsdaysaustin/continuously-delivering-microservices-in-kubernetes-using-jenkins&lt;br /&gt;
 Here we have two services that each sit behind a load balancer provided and mapped by the Kubernetes cluster.
&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Kubernetes components and architecture&lt;/strong&gt;:&lt;/p&gt;

    &lt;ul&gt;
      &lt;li&gt;Node - the physical machine or VM running a kubelet and a container runtime.&lt;/li&gt;
      &lt;li&gt;Control plane - the container orchestration layer that exposes the API and 
interfaces to define, deploy, and manage the lifecycle of containers.&lt;/li&gt;
      &lt;li&gt;Cluster - a set of nodes connected to the same control plane.&lt;/li&gt;
      &lt;li&gt;Pod - a single instance of an application, the smallest object in Kubernetes.&lt;/li&gt;
    &lt;/ul&gt;

    &lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/24/components-of-kubernetes.svg&quot; /&gt;&lt;br /&gt;
 Source: https://kubernetes.io/docs/concepts/overview/components/
&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;h3 id=&quot;kubernetes-control-plane-components&quot;&gt;Kubernetes control plane components:&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;API server that nodes connect to; the front end for users and 
 administrators of the cluster.&lt;/li&gt;
  &lt;li&gt;etcd, a distributed key-value store containing all data used to manage 
 the cluster.&lt;/li&gt;
  &lt;li&gt;Scheduler that distributes work across nodes and assigns newly created 
 containers to nodes.&lt;/li&gt;
  &lt;li&gt;Controllers that are the brains behind orchestration, monitoring for 
 nodes going down, and more.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;kubernetes-worker-node-components&quot;&gt;Kubernetes worker node components:&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;container runtime - underlying runtime used to manage containers&lt;/li&gt;
  &lt;li&gt;kubelet - agent that checks the health and manages the pods running on the node based on the desired state provided in the PodSpec&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;kube-proxy - network proxy that maintains network rules applied to nodes and allows network access between Pods in a cluster&lt;/p&gt;

    &lt;p&gt;You can scale up multiple pods on a single node until the node has no more 
resources, at which time a new node needs to be added and pod instances are 
distributed between the nodes.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;So how does this relate to Trino?&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
  &lt;li&gt;Out of the box, Kubernetes can do these key things for Trino.
    &lt;ul&gt;
      &lt;li&gt;Simple scale up and down (manually tell k8s to start or kill Trino pods).&lt;/li&gt;
      &lt;li&gt;Kubernetes supports failover, meaning that your workers will restart if they die.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Advanced features that could exist but are not currently in open source.
    &lt;ul&gt;
      &lt;li&gt;Auto-scaling via the &lt;a href=&quot;https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/&quot;&gt;Horizontal Pod Autoscaler&lt;/a&gt; 
 and custom metrics.&lt;/li&gt;
      &lt;li&gt;Graceful shutdown hooks that you can add to your cluster to delay 
 termination, avoiding failed calls to a worker that has already shut down.&lt;/li&gt;
    &lt;/ul&gt;
    &lt;p align=&quot;center&quot;&gt;
     &lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/24/kubernetes-shutdown.svg&quot; /&gt;&lt;br /&gt;
     Source: https://learnk8s.io/graceful-shutdown
  &lt;/p&gt;
    &lt;p align=&quot;center&quot;&gt;
     &lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/24/graceful-shutdown.svg&quot; /&gt;&lt;br /&gt;
     Source: https://learnk8s.io/graceful-shutdown
  &lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
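&lt;p&gt;As a rough sketch of what such a hook can look like, a
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;preStop&lt;/code&gt; hook on a worker pod can ask Trino to
drain in-flight work before the container is killed. The names, endpoint, and
durations below are illustrative assumptions, not the configuration of any
particular chart:&lt;/p&gt;

```yaml
# Hypothetical worker pod spec fragment (illustrative names and values):
# delay termination so in-flight tasks can drain before the container is killed.
spec:
  terminationGracePeriodSeconds: 60   # time allowed before SIGKILL
  containers:
    - name: trino-worker
      lifecycle:
        preStop:
          exec:
            # Put the worker into shutdown mode, then wait for tasks to drain.
            # The endpoint and sleep duration are assumptions for illustration.
            command:
              - sh
              - -c
              - >-
                curl -s -X PUT -H 'Content-Type: application/json'
                -d '"SHUTTING_DOWN"' localhost:8080/v1/info/state;
                sleep 30
```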

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;What the heck are helm charts then?&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
  &lt;li&gt;Helm is a package manager for Kubernetes&lt;/li&gt;
  &lt;li&gt;Removes the need for managing lots of Kubernetes related yaml files&lt;/li&gt;
  &lt;li&gt;Best way to deploy apps to Kubernetes&lt;/li&gt;
  &lt;li&gt;Charts are available for many different applications&lt;/li&gt;
  &lt;li&gt;Helm chart for Trino&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-11-merge-contributor-version-of-k8s-charts-with-the-community-version&quot;&gt;PR of the week: PR 11 Merge contributor version of k8s charts with the community version&lt;/h2&gt;

&lt;p&gt;This week’s &lt;a href=&quot;https://github.com/trinodb/charts/pull/11&quot;&gt;PR of the week&lt;/a&gt; comes 
from a different repo under the trinodb org, &lt;a href=&quot;https://github.com/trinodb/charts&quot;&gt;trinodb/charts&lt;/a&gt;.
This PR comes from contributor &lt;a href=&quot;https://github.com/valeriano-manassero&quot;&gt;Valeriano Manassero&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Valeriano maintains a &lt;a href=&quot;https://github.com/valeriano-manassero/helm-charts/tree/main/valeriano-manassero/trino&quot;&gt;very useful Helm chart&lt;/a&gt;
that predates the Trino org’s own community chart. This pull
request merges some of the useful features Valeriano added to 
his Trino Helm chart so that they can be maintained in the community version.&lt;/p&gt;

&lt;p&gt;Valeriano’s Trino Helm Chart: &lt;a href=&quot;https://artifacthub.io/packages/helm/valeriano-manassero/trino&quot;&gt;https://artifacthub.io/packages/helm/valeriano-manassero/trino&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It hasn’t been merged yet but we are really looking forward to seeing this get
merged in. Thanks Valeriano!&lt;/p&gt;

&lt;h2 id=&quot;demo-running-the-trino-charts-with-kubectl&quot;&gt;Demo: Running the Trino charts with kubectl&lt;/h2&gt;

&lt;p&gt;For this week’s demo, you need to install &lt;a href=&quot;https://kubernetes.io/docs/tasks/tools/&quot;&gt;kubectl&lt;/a&gt;,
&lt;a href=&quot;https://minikube.sigs.k8s.io/docs/start/&quot;&gt;minikube&lt;/a&gt; using the &lt;a href=&quot;https://minikube.sigs.k8s.io/docs/drivers/docker/&quot;&gt;docker driver&lt;/a&gt;,
and &lt;a href=&quot;https://helm.sh/docs/intro/install/&quot;&gt;helm&lt;/a&gt;. You can find the Trino Helm 
chart on ArtifactHub at this URL.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://artifacthub.io/packages/helm/trino/trino&quot;&gt;https://artifacthub.io/packages/helm/trino/trino&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;First, start your minikube instance.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;minikube start --driver=docker
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now take a quick look at the state of your k8s cluster.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl get all
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Add the template for the different trino catalogs on coordinators and workers.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl apply -f - &amp;lt;&amp;lt;EOF
# Source: trino/templates/configmap-catalog.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: tcb-trino-catalog
  labels:
    app: trino
    chart: trino-0.2.0
    release: tcb
    heritage: Helm
    role: catalogs
data:
  tpch.properties: |
    connector.name=tpch
    tpch.splits-per-node=4
  tpcds.properties: |
    connector.name=tpcds
    tpcds.splits-per-node=4
EOF
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Add the template for a single coordinator configuration.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl apply -f - &amp;lt;&amp;lt;EOF
# Source: trino/templates/configmap-coordinator.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: tcb-trino-coordinator
  labels:
    app: trino
    chart: trino-0.2.0
    release: tcb
    heritage: Helm
    component: coordinator
data:
  node.properties: |
    node.environment=production
    node.data-dir=/data/trino
    plugin.dir=/usr/lib/trino/plugin

  jvm.config: |
    -server
    -Xmx8G
    -XX:+UseG1GC
    -XX:G1HeapRegionSize=32M
    -XX:+UseGCOverheadLimit
    -XX:+ExplicitGCInvokesConcurrent
    -XX:+HeapDumpOnOutOfMemoryError
    -XX:+ExitOnOutOfMemoryError
    -Djdk.attach.allowAttachSelf=true
    -XX:-UseBiasedLocking
    -XX:ReservedCodeCacheSize=512M
    -XX:PerMethodRecompilationCutoff=10000
    -XX:PerBytecodeRecompilationCutoff=10000
    -Djdk.nio.maxCachedBufferSize=2000000

  config.properties: |
    coordinator=true
    node-scheduler.include-coordinator=true
    http-server.http.port=8080
    query.max-memory=4GB
    query.max-memory-per-node=1GB
    query.max-total-memory-per-node=2GB
    memory.heap-headroom-per-node=1GB
    discovery-server.enabled=true
    discovery.uri=http://localhost:8080

  log.properties: |
    io.trino=INFO
EOF
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Add the tcb-trino service definition to run Trino.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl apply -f - &amp;lt;&amp;lt;EOF
# Source: trino/templates/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: tcb-trino
  labels:
    app: trino
    chart: trino-0.2.0
    release: tcb
    heritage: Helm
spec:
  type: ClusterIP
  ports:
    - port: 8080
      targetPort: http
      protocol: TCP
      name: http
  selector:
    app: trino
    release: tcb
    component: coordinator
EOF
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Add the deployment definition for the service.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl apply -f - &amp;lt;&amp;lt;EOF
# Source: trino/templates/deployment-coordinator.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tcb-trino-coordinator
  labels:
    app: trino
    chart: trino-0.2.0
    release: tcb
    heritage: Helm
    component: coordinator
spec:
  selector:
    matchLabels:
      app: trino
      release: tcb
      component: coordinator
  template:
    metadata:
      labels:
        app: trino
        release: tcb
        component: coordinator
    spec:
      securityContext:
        runAsUser: 1000
        runAsGroup: 1000
      volumes:
        - name: config-volume
          configMap:
            name: tcb-trino-coordinator
        - name: catalog-volume
          configMap:
            name: tcb-trino-catalog
      imagePullSecrets:
        - name: registry-credentials
      containers:
        - name: trino-coordinator
          image: &quot;trinodb/trino:latest&quot;
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - mountPath: /etc/trino
              name: config-volume
            - mountPath: /etc/trino/catalog
              name: catalog-volume
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /v1/info
              port: http
          readinessProbe:
            httpGet:
              path: /v1/info
              port: http
          resources:
            {}
EOF
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now check the state of the k8s cluster again.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl get all
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Run the following command to expose the URL and port of the service on the local system.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;minikube service tcb-trino --url
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Clean up all the resources.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl delete pod --all
kubectl delete replicaset --all
kubectl delete service tcb-trino
kubectl delete deployment tcb-trino-coordinator
kubectl delete configmap --all
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now you can run the same demo using the helm chart which includes all of these
templates out-of-the-box. First add the trino helm chart, check the templates
that are produced by helm, and run the install.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# HELM DEMO

helm repo add trino https://trinodb.github.io/charts/

helm template tcb trino/trino --version 0.2.0

helm install tcb trino/trino --version 0.2.0
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now that it’s installed, run the same command to expose the url of the service.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;minikube service tcb-trino --url
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Clean up all the resources.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;minikube delete
helm repo remove trino
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Trino Summit is moving to 100% virtual: &lt;a href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;register here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>This is the first episode in a series where we cover the basics and just enough advanced Kubernetes features and information to understand how to deploy Trino on Kubernetes.</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino on ice IV: Deep dive into Iceberg internals</title>
      <link href="https://trino.io/blog/2021/08/12/deep-dive-into-iceberg-internals.html" rel="alternate" type="text/html" title="Trino on ice IV: Deep dive into Iceberg internals" />
      <published>2021-08-12T00:00:00+00:00</published>
      <updated>2021-08-12T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/08/12/deep-dive-into-iceberg-internals</id>
      <content type="html" xml:base="https://trino.io/blog/2021/08/12/deep-dive-into-iceberg-internals.html">&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;100%&quot; height=&quot;100%&quot; src=&quot;/assets/blog/trino-on-ice/trino-iceberg.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Welcome to the Trino on ice series, covering the details around how the Iceberg
table format works with the Trino query engine. The examples build on each
previous post, so it’s recommended to read the posts sequentially and reference
them as needed later. Here are links to the posts in this series:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/05/03/a-gentle-introduction-to-iceberg.html&quot;&gt;Trino on ice I: A gentle introduction to Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/12/in-place-table-evolution-and-cloud-compatibility-with-iceberg.html&quot;&gt;Trino on ice II: In-place table evolution and cloud compatibility with Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/30/iceberg-concurrency-snapshots-spec.html&quot;&gt;Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/08/12/deep-dive-into-iceberg-internals.html&quot;&gt;Trino on ice IV: Deep dive into Iceberg internals&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So far, this series has covered some very interesting user level concepts of the
Iceberg model, and how you can take advantage of them using the Trino query 
engine. This blog post dives into some implementation details of Iceberg by 
dissecting some of the files that result from various operations carried out
using Trino. To dissect them you use some surgical instrumentation: Trino, Avro
tools, the MinIO client tool, and Iceberg’s core library. Dissecting these files
is useful not only to understand how Iceberg works, but also to aid in
troubleshooting, should you hit issues during ingestion or querying of your
Iceberg table. I like to think of this type of debugging much like a fun game of
Operation, where you’re looking to see what causes the red errors to fly by on
your screen.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-on-ice/operation.gif&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;understanding-iceberg-metadata&quot;&gt;Understanding Iceberg metadata&lt;/h2&gt;

&lt;p&gt;Iceberg can use any compatible metastore, but the Trino Iceberg connector only
supports the Hive metastore and AWS Glue, just like the Hive connector. This is
because there is already a vast amount of testing and support for the Hive
metastore in Trino. Likewise, many Trino use cases that currently run on data
lakes already use the Hive connector, and therefore the Hive metastore. This
makes it the natural leading supported use case, as existing users can easily
migrate from Hive to Iceberg tables. Since the diagram of the Hive connector
architecture gives no indication of which connector is actually executing, it
serves as a diagram for both Hive and Iceberg. The only difference is the
connector used; if you create a table in Hive, you can 
view the same table in Iceberg.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-on-ice/iceberg-metadata.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;To recap the steps taken in the first three posts: the first post created an
events table, and the first two posts ran two insert statements. The first
insert contained three records, while the second insert contained a single
record.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-on-ice/iceberg-snapshot-files.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Up until this point, the state of the files in MinIO hasn’t really been shown,
except for some of the manifest list pointers from the snapshot in the third blog
post. Using the &lt;a href=&quot;https://docs.min.io/minio/baremetal/reference/minio-cli/minio-mc.html&quot;&gt;MinIO client tool&lt;/a&gt;,
you can list files that Iceberg generated through all these operations and then
try to understand what purpose they are serving.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;% mc tree -f local/
local/
└─ iceberg
   └─ logging.db
      └─ events
         ├─ data
         │  ├─ event_time_day=2021-04-01
         │  │  ├─ 51eb1ea6-266b-490f-8bca-c63391f02d10.orc
         │  │  └─ cbcf052d-240d-4881-8a68-2bbc0f7e5233.orc
         │  └─ event_time_day=2021-04-02
         │     └─ b012ec20-bbdd-47f5-89d3-57b9e32ea9eb.orc
         └─ metadata
            ├─ 00000-c5cfaab4-f82f-4351-b2a5-bd0e241f84bc.metadata.json
            ├─ 00001-27c8c2d1-fdbb-429d-9263-3654d818250e.metadata.json
            ├─ 00002-33d69acc-94cb-44bc-b2a1-71120e749d9a.metadata.json
            ├─ 23cc980c-9570-42ed-85cf-8658fda2727d-m0.avro
            ├─ 92382234-a4a6-4a1b-bc9b-24839472c2f6-m0.avro
            ├─ snap-2720489016575682283-1-92382234-a4a6-4a1b-bc9b-24839472c2f6.avro
            ├─ snap-4564366177504223943-1-23cc980c-9570-42ed-85cf-8658fda2727d.avro
            └─ snap-6967685587675910019-1-bcbe9133-c51c-42a9-9c73-f5b745702cb0.avro
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There are a lot of files here, but there are a few patterns you can observe.&lt;/p&gt;

&lt;p&gt;First, the top two directories are named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;metadata&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/&amp;lt;bucket&amp;gt;/&amp;lt;database&amp;gt;/&amp;lt;table&amp;gt;/data/&lt;/code&gt;&lt;br /&gt;
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/&amp;lt;bucket&amp;gt;/&amp;lt;database&amp;gt;/&amp;lt;table&amp;gt;/metadata/&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;As you might expect, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data&lt;/code&gt; contains the actual ORC files split by partition.
This is akin to what you would see in a Hive table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data&lt;/code&gt; directory. What is
really of interest here is the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;metadata&lt;/code&gt; directory. There are specifically
three patterns of files you’ll find here.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/&amp;lt;bucket&amp;gt;/&amp;lt;database&amp;gt;/&amp;lt;table&amp;gt;/metadata/&amp;lt;file-id&amp;gt;.avro&lt;/code&gt;&lt;br /&gt;
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/&amp;lt;bucket&amp;gt;/&amp;lt;database&amp;gt;/&amp;lt;table&amp;gt;/metadata/snap-&amp;lt;snapshot-id&amp;gt;-&amp;lt;version&amp;gt;-&amp;lt;file-id&amp;gt;.avro&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/&amp;lt;bucket&amp;gt;/&amp;lt;database&amp;gt;/&amp;lt;table&amp;gt;/metadata/&amp;lt;version&amp;gt;-&amp;lt;commit-UUID&amp;gt;.metadata.json&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Iceberg maintains a persistent tree structure that manages the snapshots
created for every mutation of the data. This enables not only a concurrency
model that supports serializable isolation, but also cool features like time
travel across a linear progression of snapshots.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-on-ice/iceberg-metastore-files.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This tree structure contains two types of Avro files, manifest lists and
manifest files. Manifest list files contain pointers to various manifest files
and the manifest files themselves point to various data files. This post starts
out by covering these manifest files, and later covers the table metadata files
that are suffixed by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.metadata.json&lt;/code&gt;.&lt;/p&gt;
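&lt;p&gt;To make that traversal concrete, here is a minimal Python sketch. The
dictionaries are toy stand-ins for already-parsed Avro records, and the paths
are made up; only the field names (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;manifest_path&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data_file.file_path&lt;/code&gt;) come from the file formats examined in this post.&lt;/p&gt;

```python
# Sketch: walking Iceberg's persistent tree from one snapshot down to its
# data files. Each manifest-list entry points at a manifest file, and each
# manifest entry points at a data file.

def data_files_for_snapshot(manifest_list, manifests):
    """Collect every data file path reachable from one snapshot."""
    paths = []
    for entry in manifest_list:                            # manifest list level
        for record in manifests[entry["manifest_path"]]:   # manifest level
            paths.append(record["data_file"]["file_path"])
    return paths

# Toy data modeled on the second snapshot in this post (two data files).
manifest_list = [{"manifest_path": "metadata/example-m0.avro"}]
manifests = {
    "metadata/example-m0.avro": [
        {"data_file": {"file_path": "data/event_time_day=2021-04-01/a.orc"}},
        {"data_file": {"file_path": "data/event_time_day=2021-04-02/b.orc"}},
    ],
}

print(data_files_for_snapshot(manifest_list, manifests))
```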

&lt;p&gt;&lt;a href=&quot;/blog/2021/07/30/iceberg-concurrency-snapshots-spec.html&quot;&gt;The last blog covered&lt;/a&gt;
the command in Trino that shows the snapshot information that is stored in the
metastore. Here is that command and its output again for your review.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT manifest_list 
FROM iceberg.logging.&quot;events$snapshots&quot;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;snapshots&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;s3a://iceberg/logging.db/events/metadata/snap-6967685587675910019-1-bcbe9133-c51c-42a9-9c73-f5b745702cb0.avro&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;s3a://iceberg/logging.db/events/metadata/snap-2720489016575682283-1-92382234-a4a6-4a1b-bc9b-24839472c2f6.avro&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;s3a://iceberg/logging.db/events/metadata/snap-4564366177504223943-1-23cc980c-9570-42ed-85cf-8658fda2727d.avro&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;You’ll notice that the query returns the Avro files prefixed with
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;snap-&lt;/code&gt;. These files correspond directly to the snapshot
records stored in the metastore. As the diagram above shows, each snapshot
record in the metastore contains the URL of its manifest list Avro file. Avro
files are binary, so they are not something you can just open up in a text
editor to read. Using the
&lt;a href=&quot;https://downloads.apache.org/avro/avro-1.10.2/java/avro-tools-1.10.2.jar&quot;&gt;avro-tools.jar tool&lt;/a&gt;
distributed by the 
&lt;a href=&quot;https://avro.apache.org/docs/current/index.html&quot;&gt;Apache Avro project&lt;/a&gt;,
you can inspect the contents of these files to get a better understanding
of how Iceberg uses them.&lt;/p&gt;

&lt;p&gt;The first snapshot is generated on the creation of the events table. To
investigate the snapshots, first download the Avro files to your local
filesystem; let’s move them to the home directory. Upon inspecting this first
file, you notice that it contains no records: avro-tools prints only a newline,
which the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jq&lt;/code&gt; JSON command line utility then strips when pretty
printing. This snapshot represents the empty state of the table upon
creation.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;% java -jar  ~/Desktop/avro_files/avro-tools-1.10.0.jar tojson ~/snap-6967685587675910019-1-bcbe9133-c51c-42a9-9c73-f5b745702cb0.avro | jq .
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result (empty):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The second snapshot is a little more interesting and actually shows us the 
contents of a manifest list.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;% java -jar  ~/Desktop/avro_files/avro-tools-1.10.0.jar tojson ~/snap-2720489016575682283-1-92382234-a4a6-4a1b-bc9b-24839472c2f6.avro | jq .
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;{
   &quot;manifest_path&quot;:&quot;s3a://iceberg/logging.db/events/metadata/92382234-a4a6-4a1b-bc9b-24839472c2f6-m0.avro&quot;,
   &quot;manifest_length&quot;:6114,
   &quot;partition_spec_id&quot;:0,
   &quot;added_snapshot_id&quot;:{
      &quot;long&quot;:2720489016575682000
   },
   &quot;added_data_files_count&quot;:{
      &quot;int&quot;:2
   },
   &quot;existing_data_files_count&quot;:{
      &quot;int&quot;:0
   },
   &quot;deleted_data_files_count&quot;:{
      &quot;int&quot;:0
   },
   &quot;partitions&quot;:{
      &quot;array&quot;:[
         {
            &quot;contains_null&quot;:false,
            &quot;lower_bound&quot;:{
               &quot;bytes&quot;:&quot;\u001eI\u0000\u0000&quot;
            },
            &quot;upper_bound&quot;:{
               &quot;bytes&quot;:&quot;\u001fI\u0000\u0000&quot;
            }
         }
      ]
   },
   &quot;added_rows_count&quot;:{
      &quot;long&quot;:3
   },
   &quot;existing_rows_count&quot;:{
      &quot;long&quot;:0
   },
   &quot;deleted_rows_count&quot;:{
      &quot;long&quot;:0
   }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To understand each of the values in each of these rows, you can refer to the 
Iceberg 
&lt;a href=&quot;https://iceberg.apache.org/spec/#manifest-lists&quot;&gt;specification in the manifest list file section&lt;/a&gt;.
Instead of covering these exhaustively, let’s focus on a few key fields. Below
are those fields and their definitions according to the specification.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;manifest_path&lt;/code&gt; - Location of the manifest file.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;partition_spec_id&lt;/code&gt; - ID of a partition spec used to write the manifest; must
be listed in table metadata partition-specs.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;added_snapshot_id&lt;/code&gt; - ID of the snapshot where the manifest file was added.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;partitions&lt;/code&gt; - A list of field summaries for each partition field in the spec.
Each field in the list corresponds to a field in the manifest file’s partition
spec.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;added_rows_count&lt;/code&gt; - Number of rows in all files in the manifest that have
status ADDED, when null this is assumed to be non-zero.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As mentioned above, manifest lists hold references to various manifest files.
These manifest paths are the pointers in the persistent tree that tell any
client using Iceberg where to find all of the manifest files associated with a
particular snapshot. To traverse the tree, you iterate over the manifest paths
to locate every manifest file belonging to the snapshot you want to inspect.
The partition spec id tells you which partition specification was used to write
the manifest; the specs themselves are stored in the table metadata in the
metastore. The added snapshot id tells you which snapshot the manifest file is
associated with. Partitions hold high-level partition bound information to make
queries faster: if a query is looking for a particular value, only the manifest
files whose bounds contain that value need to be traversed. Finally, you get a
few metrics, such as the number of changed rows and data files, one of which is
the count of added rows. The first operation inserted three rows and the second
operation inserted one row, so using the row counts you can easily determine
which manifest file belongs to which operation.&lt;/p&gt;
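&lt;p&gt;As an aside, those partition bound bytes are day-transform values: 4-byte
little-endian integers counting days since 1970-01-01, following the
single-value serialization in the Iceberg specification. A quick Python sketch
decodes the bytes shown in the manifest list above:&lt;/p&gt;

```python
from datetime import date, timedelta

def decode_day_bound(raw):
    """Decode a day-transform partition bound: a 4-byte little-endian
    integer counting days since the Unix epoch (1970-01-01)."""
    days = int.from_bytes(raw, "little", signed=True)
    return date(1970, 1, 1) + timedelta(days=days)

# The bounds shown above: "\u001eI\u0000\u0000" and "\u001fI\u0000\u0000".
lower = decode_day_bound(b"\x1eI\x00\x00")  # 0x0000491e = 18718
upper = decode_day_bound(b"\x1fI\x00\x00")  # 0x0000491f = 18719

print(lower, upper)  # prints: 2021-04-01 2021-04-02
```

&lt;p&gt;The decoded dates match the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time_day&lt;/code&gt; partition directories under
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data/&lt;/code&gt;.&lt;/p&gt;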

&lt;p&gt;The following command shows the final snapshot after both operations executed
and filters out only the fields pointed out above.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;% java -jar  ~/Desktop/avro_files/avro-tools-1.10.0.jar tojson ~/snap-4564366177504223943-1-23cc980c-9570-42ed-85cf-8658fda2727d.avro | jq &apos;. | {manifest_path: .manifest_path, partition_spec_id: .partition_spec_id, added_snapshot_id: .added_snapshot_id, partitions: .partitions, added_rows_count: .added_rows_count }&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;{
   &quot;manifest_path&quot;:&quot;s3a://iceberg/logging.db/events/metadata/23cc980c-9570-42ed-85cf-8658fda2727d-m0.avro&quot;,
   &quot;partition_spec_id&quot;:0,
   &quot;added_snapshot_id&quot;:{
      &quot;long&quot;:4564366177504223700
   },
   &quot;partitions&quot;:{
      &quot;array&quot;:[
         {
            &quot;contains_null&quot;:false,
            &quot;lower_bound&quot;:{
               &quot;bytes&quot;:&quot;\u001eI\u0000\u0000&quot;
            },
            &quot;upper_bound&quot;:{
               &quot;bytes&quot;:&quot;\u001eI\u0000\u0000&quot;
            }
         }
      ]
   },
   &quot;added_rows_count&quot;:{
      &quot;long&quot;:1
   }
}
{
   &quot;manifest_path&quot;:&quot;s3a://iceberg/logging.db/events/metadata/92382234-a4a6-4a1b-bc9b-24839472c2f6-m0.avro&quot;,
   &quot;partition_spec_id&quot;:0,
   &quot;added_snapshot_id&quot;:{
      &quot;long&quot;:2720489016575682000
   },
   &quot;partitions&quot;:{
      &quot;array&quot;:[
         {
            &quot;contains_null&quot;:false,
            &quot;lower_bound&quot;:{
               &quot;bytes&quot;:&quot;\u001eI\u0000\u0000&quot;
            },
            &quot;upper_bound&quot;:{
               &quot;bytes&quot;:&quot;\u001fI\u0000\u0000&quot;
            }
         }
      ]
   },
   &quot;added_rows_count&quot;:{
      &quot;long&quot;:3
   }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In the listing of the manifest files related to the last snapshot, you notice
that the first operation, which inserted three rows, is represented by the
manifest file in the second JSON object. You can determine this from the
snapshot id, as well as from the number of rows added in the operation. The
first JSON object corresponds to the last operation, which inserted a single
row. In other words, manifests are listed in reverse commit order, with the
most recent operation first.&lt;/p&gt;
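&lt;p&gt;That matching can be done mechanically. Here is a small Python sketch that
treats the two JSON objects above as parsed dictionaries; the shortened
manifest paths are placeholders:&lt;/p&gt;

```python
# Match each manifest-list entry to an operation by its added row count.
entries = [
    {"manifest_path": "metadata/23cc980c-m0.avro", "added_rows_count": {"long": 1}},
    {"manifest_path": "metadata/92382234-m0.avro", "added_rows_count": {"long": 3}},
]

def manifests_adding_rows(entries, rows):
    """Return the manifest paths whose operation added exactly `rows` rows."""
    return [e["manifest_path"] for e in entries
            if e["added_rows_count"]["long"] == rows]

print(manifests_adding_rows(entries, 3))  # the three-row INSERT's manifest
```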

&lt;p&gt;The next command performs the same kind of listing you ran on the manifest
list, except this time on the manifest files themselves, to expose their
contents. To begin with, run the command to show the contents of the manifest
file associated with the insertion of three rows.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;% java -jar  ~/avro-tools-1.10.0.jar tojson ~/Desktop/avro_files/92382234-a4a6-4a1b-bc9b-24839472c2f6-m0.avro | jq .
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;{
   &quot;status&quot;:1,
   &quot;snapshot_id&quot;:{
      &quot;long&quot;:2720489016575682000
   },
   &quot;data_file&quot;:{
      &quot;file_path&quot;:&quot;s3a://iceberg/logging.db/events/data/event_time_day=2021-04-01/51eb1ea6-266b-490f-8bca-c63391f02d10.orc&quot;,
      &quot;file_format&quot;:&quot;ORC&quot;,
      &quot;partition&quot;:{
         &quot;event_time_day&quot;:{
            &quot;int&quot;:18718
         }
      },
      &quot;record_count&quot;:1,
      &quot;file_size_in_bytes&quot;:870,
      &quot;block_size_in_bytes&quot;:67108864,
      &quot;column_sizes&quot;:null,
      &quot;value_counts&quot;:{
         &quot;array&quot;:[
            {
               &quot;key&quot;:1,
               &quot;value&quot;:1
            },
            {
               &quot;key&quot;:2,
               &quot;value&quot;:1
            },
            {
               &quot;key&quot;:3,
               &quot;value&quot;:1
            },
            {
               &quot;key&quot;:4,
               &quot;value&quot;:1
            }
         ]
      },
      &quot;null_value_counts&quot;:{
         &quot;array&quot;:[
            {
               &quot;key&quot;:1,
               &quot;value&quot;:0
            },
            {
               &quot;key&quot;:2,
               &quot;value&quot;:0
            },
            {
               &quot;key&quot;:3,
               &quot;value&quot;:0
            },
            {
               &quot;key&quot;:4,
               &quot;value&quot;:0
            }
         ]
      },
      &quot;nan_value_counts&quot;:null,
      &quot;lower_bounds&quot;:{
         &quot;array&quot;:[
            {
               &quot;key&quot;:1,
               &quot;value&quot;:&quot;ERROR&quot;
            },
            {
               &quot;key&quot;:3,
               &quot;value&quot;:&quot;Oh noes&quot;
            }
         ]
      },
      &quot;upper_bounds&quot;:{
         &quot;array&quot;:[
            {
               &quot;key&quot;:1,
               &quot;value&quot;:&quot;ERROR&quot;
            },
            {
               &quot;key&quot;:3,
               &quot;value&quot;:&quot;Oh noes&quot;
            }
         ]
      },
      &quot;key_metadata&quot;:null,
      &quot;split_offsets&quot;:null
   }
}
{
   &quot;status&quot;:1,
   &quot;snapshot_id&quot;:{
      &quot;long&quot;:2720489016575682000
   },
   &quot;data_file&quot;:{
      &quot;file_path&quot;:&quot;s3a://iceberg/logging.db/events/data/event_time_day=2021-04-02/b012ec20-bbdd-47f5-89d3-57b9e32ea9eb.orc&quot;,
      &quot;file_format&quot;:&quot;ORC&quot;,
      &quot;partition&quot;:{
         &quot;event_time_day&quot;:{
            &quot;int&quot;:18719
         }
      },
      &quot;record_count&quot;:2,
      &quot;file_size_in_bytes&quot;:1084,
      &quot;block_size_in_bytes&quot;:67108864,
      &quot;column_sizes&quot;:null,
      &quot;value_counts&quot;:{
         &quot;array&quot;:[
            {
               &quot;key&quot;:1,
               &quot;value&quot;:2
            },
            {
               &quot;key&quot;:2,
               &quot;value&quot;:2
            },
            {
               &quot;key&quot;:3,
               &quot;value&quot;:2
            },
            {
               &quot;key&quot;:4,
               &quot;value&quot;:2
            }
         ]
      },
      &quot;null_value_counts&quot;:{
         &quot;array&quot;:[
            {
               &quot;key&quot;:1,
               &quot;value&quot;:0
            },
            {
               &quot;key&quot;:2,
               &quot;value&quot;:0
            },
            {
               &quot;key&quot;:3,
               &quot;value&quot;:0
            },
            {
               &quot;key&quot;:4,
               &quot;value&quot;:0
            }
         ]
      },
      &quot;nan_value_counts&quot;:null,
      &quot;lower_bounds&quot;:{
         &quot;array&quot;:[
            {
               &quot;key&quot;:1,
               &quot;value&quot;:&quot;ERROR&quot;
            },
            {
               &quot;key&quot;:3,
               &quot;value&quot;:&quot;Double oh noes&quot;
            }
         ]
      },
      &quot;upper_bounds&quot;:{
         &quot;array&quot;:[
            {
               &quot;key&quot;:1,
               &quot;value&quot;:&quot;WARN&quot;
            },
            {
               &quot;key&quot;:3,
               &quot;value&quot;:&quot;Maybeh oh noes?&quot;
            }
         ]
      },
      &quot;key_metadata&quot;:null,
      &quot;split_offsets&quot;:null
   }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now this is a very big output, but in summary, there’s really not too much to
these files. As before, there is a 
&lt;a href=&quot;https://iceberg.apache.org/spec/#manifests&quot;&gt;Manifest section in the Iceberg spec&lt;/a&gt;
that details what each of these fields means. Here are the important fields:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;snapshot_id&lt;/code&gt; - Snapshot id where the file was added, or deleted if status is
two. Inherited when null.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data_file&lt;/code&gt; - Field containing metadata about the data files pertaining to the
manifest file, such as file path, partition tuple, metrics, etc…&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data_file.file_path&lt;/code&gt; - Full URI for the file with FS scheme.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data_file.partition&lt;/code&gt; - Partition data tuple, schema based on the partition
spec.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data_file.record_count&lt;/code&gt; - Number of records in the data file.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data_file.*_count&lt;/code&gt; - Multiple fields that map a column id to the
number of values, nulls, or NaNs in the file. These can be used to quickly
filter out unnecessary read operations.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data_file.*_bounds&lt;/code&gt; - Multiple fields that map a column id to the
lower or upper bound of the column, serialized as binary. The lower bound must
be less than or equal to, and the upper bound greater than or equal to, all
non-null, non-NaN values in the column for the file.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each manifest entry contains the partition and the data file that it maps to.
These files are only scanned if the query criteria are met when checking the
counts, bounds, and other statistics recorded in the file. Ideally, only files
that contain data relevant to the query are scanned at all. Information like
the record count may also help the query planner determine splits and other
details. This particular optimization hasn’t been completed yet, as planning
typically happens before the files are traversed. It is still under discussion
and
&lt;a href=&quot;https://youtu.be/ifXpOn0NJWk?t=2132&quot;&gt;is discussed a bit by Iceberg creator Ryan Blue in a recent meetup&lt;/a&gt;.
If this is something you are interested in, keep posted on the Slack channel and
releases as the Trino Iceberg connector progresses in this area.&lt;/p&gt;
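&lt;p&gt;As an illustration of that pruning logic, here is a simplified Python sketch
of an equality check against the string bounds shown above. A real engine also
consults the null and NaN counts and handles truncated bounds; this only shows
the core min/max comparison.&lt;/p&gt;

```python
# Min/max pruning sketch: a data file can only match `column = value` if the
# value falls between the file's lower and upper bounds for that column.

def may_contain(file_stats, column_id, value):
    lower = file_stats["lower_bounds"][column_id]
    upper = file_stats["upper_bounds"][column_id]
    return value >= lower and upper >= value

# Bounds from the two data files above (column id 1 is the `level` column).
file_a = {"lower_bounds": {1: "ERROR"}, "upper_bounds": {1: "ERROR"}}
file_b = {"lower_bounds": {1: "ERROR"}, "upper_bounds": {1: "WARN"}}

# A query for level = 'INFO' can skip file_a entirely.
print(may_contain(file_a, 1, "INFO"), may_contain(file_b, 1, "INFO"))  # prints: False True
```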

&lt;p&gt;As mentioned above, the last set of files that you find in the metadata
directory are suffixed with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.metadata.json&lt;/code&gt;. These files may seem
a bit strange at first, as they are stored in the JSON format rather than Avro.
This is because they are not part of the persistent tree structure; they are
essentially a copy of the table metadata that is stored in the metastore. You
can find the fields for the table metadata listed
&lt;a href=&quot;https://iceberg.apache.org/spec/#table-metadata-fields&quot;&gt;in the Iceberg specification&lt;/a&gt;.
This metadata is typically stored persistently in a metastore, much like the
Hive metastore, but it could be backed by any datastore that supports
&lt;a href=&quot;https://iceberg.apache.org/spec/#metastore-tables&quot;&gt;an atomic swap (check-and-put) operation&lt;/a&gt;,
which Iceberg requires for its optimistic concurrency model.&lt;/p&gt;

&lt;p&gt;The naming of the table metadata includes a table version and UUID: 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;table-version&amp;gt;-&amp;lt;UUID&amp;gt;.metadata.json&lt;/code&gt;. To commit a new metadata version, which
just adds 1 to the current version number, the writer performs these steps:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;It creates a new table metadata file using the current metadata.&lt;/li&gt;
  &lt;li&gt;It writes the new table metadata to a file following the naming with the next
version number.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;It requests the metastore swap the table’s metadata pointer from the old
location to the new location.&lt;/p&gt;

    &lt;ol&gt;
      &lt;li&gt;If the swap succeeds, the commit succeeded. The new file is now the 
 current metadata.&lt;/li&gt;
      &lt;li&gt;If the swap fails, another writer has already committed its own new
 version. The current writer goes back to step 1.&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
&lt;/ol&gt;
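&lt;p&gt;The steps above amount to a compare-and-swap loop. The Python sketch below
uses an in-memory stand-in for the metastore pointer and made-up file names; it
illustrates the protocol, not the connector’s actual implementation.&lt;/p&gt;

```python
# Sketch of Iceberg's optimistic commit: swap the metadata pointer only if it
# still points at the version the new metadata was based on.

class Metastore:
    """In-memory stand-in for a metastore holding one metadata pointer."""
    def __init__(self, location):
        self.metadata_location = location

    def check_and_put(self, expected, new):
        """Atomically swap the pointer if it still equals `expected`."""
        if self.metadata_location == expected:
            self.metadata_location = new
            return True
        return False

def commit(store, write_new_metadata):
    while True:
        base = store.metadata_location      # step 1: read current metadata
        new = write_new_metadata(base)      # step 2: write the next version
        if store.check_and_put(base, new):  # step 3: request the swap
            return new                      # success: new file is current
        # swap failed: another writer won the race; retry from step 1

store = Metastore("00001-aaaa.metadata.json")
committed = commit(store, lambda base: "00002-bbbb.metadata.json")
print(committed)
```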

&lt;p&gt;If you want to see where this is stored in the Hive metastore, you can reference
the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TABLE_PARAMS&lt;/code&gt; table. At the time of writing, this is the only method of
using the metastore that is supported by the Trino Iceberg connector.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT PARAM_KEY, PARAM_VALUE
FROM metastore.TABLE_PARAMS;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;PARAM_KEY                &lt;/th&gt;
      &lt;th&gt;PARAM_VALUE                                                                                     &lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;EXTERNAL                 &lt;/td&gt;
      &lt;td&gt;TRUE                                                                                            &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;metadata_location        &lt;/td&gt;
      &lt;td&gt;s3a://iceberg/logging.db/events/metadata/00002-33d69acc-94cb-44bc-b2a1-71120e749d9a.metadata.json&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;numFiles                 &lt;/td&gt;
      &lt;td&gt;2                                                                                               &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;previous_metadata_location&lt;/td&gt;
      &lt;td&gt;s3a://iceberg/logging.db/events/metadata/00001-27c8c2d1-fdbb-429d-9263-3654d818250e.metadata.json&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;table_type               &lt;/td&gt;
      &lt;td&gt;iceberg                                                                                         &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;totalSize                &lt;/td&gt;
      &lt;td&gt;5323                                                                                            &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;transient_lastDdlTime    &lt;/td&gt;
      &lt;td&gt;1622865672                                                                                      &lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;So as you can see, the metastore is saying the current metadata location is the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;00002-33d69acc-94cb-44bc-b2a1-71120e749d9a.metadata.json&lt;/code&gt; file. Now you can
dive in to see the table metadata that is being used by the Iceberg connector.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;% cat ~/Desktop/avro_files/00002-33d69acc-94cb-44bc-b2a1-71120e749d9a.metadata.json
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;{
   &quot;format-version&quot;:1,
   &quot;table-uuid&quot;:&quot;32e3c271-84a9-4be5-9342-2148c878227a&quot;,
   &quot;location&quot;:&quot;s3a://iceberg/logging.db/events&quot;,
   &quot;last-updated-ms&quot;:1622865686323,
   &quot;last-column-id&quot;:5,
   &quot;schema&quot;:{
      &quot;type&quot;:&quot;struct&quot;,
      &quot;fields&quot;:[
         {
            &quot;id&quot;:1,
            &quot;name&quot;:&quot;level&quot;,
            &quot;required&quot;:false,
            &quot;type&quot;:&quot;string&quot;
         },
         {
            &quot;id&quot;:2,
            &quot;name&quot;:&quot;event_time&quot;,
            &quot;required&quot;:false,
            &quot;type&quot;:&quot;timestamp&quot;
         },
         {
            &quot;id&quot;:3,
            &quot;name&quot;:&quot;message&quot;,
            &quot;required&quot;:false,
            &quot;type&quot;:&quot;string&quot;
         },
         {
            &quot;id&quot;:4,
            &quot;name&quot;:&quot;call_stack&quot;,
            &quot;required&quot;:false,
            &quot;type&quot;:{
               &quot;type&quot;:&quot;list&quot;,
               &quot;element-id&quot;:5,
               &quot;element&quot;:&quot;string&quot;,
               &quot;element-required&quot;:false
            }
         }
      ]
   },
   &quot;partition-spec&quot;:[
      {
         &quot;name&quot;:&quot;event_time_day&quot;,
         &quot;transform&quot;:&quot;day&quot;,
         &quot;source-id&quot;:2,
         &quot;field-id&quot;:1000
      }
   ],
   &quot;default-spec-id&quot;:0,
   &quot;partition-specs&quot;:[
      {
         &quot;spec-id&quot;:0,
         &quot;fields&quot;:[
            {
               &quot;name&quot;:&quot;event_time_day&quot;,
               &quot;transform&quot;:&quot;day&quot;,
               &quot;source-id&quot;:2,
               &quot;field-id&quot;:1000
            }
         ]
      }
   ],
   &quot;default-sort-order-id&quot;:0,
   &quot;sort-orders&quot;:[
      {
         &quot;order-id&quot;:0,
         &quot;fields&quot;:[
            
         ]
      }
   ],
   &quot;properties&quot;:{
      &quot;write.format.default&quot;:&quot;ORC&quot;
   },
   &quot;current-snapshot-id&quot;:4564366177504223943,
   &quot;snapshots&quot;:[
      {
         &quot;snapshot-id&quot;:6967685587675910019,
         &quot;timestamp-ms&quot;:1622865672882,
         &quot;summary&quot;:{
            &quot;operation&quot;:&quot;append&quot;,
            &quot;changed-partition-count&quot;:&quot;0&quot;,
            &quot;total-records&quot;:&quot;0&quot;,
            &quot;total-data-files&quot;:&quot;0&quot;,
            &quot;total-delete-files&quot;:&quot;0&quot;,
            &quot;total-position-deletes&quot;:&quot;0&quot;,
            &quot;total-equality-deletes&quot;:&quot;0&quot;
         },
         &quot;manifest-list&quot;:&quot;s3a://iceberg/logging.db/events/metadata/snap-6967685587675910019-1-bcbe9133-c51c-42a9-9c73-f5b745702cb0.avro&quot;
      },
      {
         &quot;snapshot-id&quot;:2720489016575682283,
         &quot;parent-snapshot-id&quot;:6967685587675910019,
         &quot;timestamp-ms&quot;:1622865680419,
         &quot;summary&quot;:{
            &quot;operation&quot;:&quot;append&quot;,
            &quot;added-data-files&quot;:&quot;2&quot;,
            &quot;added-records&quot;:&quot;3&quot;,
            &quot;added-files-size&quot;:&quot;1954&quot;,
            &quot;changed-partition-count&quot;:&quot;2&quot;,
            &quot;total-records&quot;:&quot;3&quot;,
            &quot;total-data-files&quot;:&quot;2&quot;,
            &quot;total-delete-files&quot;:&quot;0&quot;,
            &quot;total-position-deletes&quot;:&quot;0&quot;,
            &quot;total-equality-deletes&quot;:&quot;0&quot;
         },
         &quot;manifest-list&quot;:&quot;s3a://iceberg/logging.db/events/metadata/snap-2720489016575682283-1-92382234-a4a6-4a1b-bc9b-24839472c2f6.avro&quot;
      },
      {
         &quot;snapshot-id&quot;:4564366177504223943,
         &quot;parent-snapshot-id&quot;:2720489016575682283,
         &quot;timestamp-ms&quot;:1622865686278,
         &quot;summary&quot;:{
            &quot;operation&quot;:&quot;append&quot;,
            &quot;added-data-files&quot;:&quot;1&quot;,
            &quot;added-records&quot;:&quot;1&quot;,
            &quot;added-files-size&quot;:&quot;746&quot;,
            &quot;changed-partition-count&quot;:&quot;1&quot;,
            &quot;total-records&quot;:&quot;4&quot;,
            &quot;total-data-files&quot;:&quot;3&quot;,
            &quot;total-delete-files&quot;:&quot;0&quot;,
            &quot;total-position-deletes&quot;:&quot;0&quot;,
            &quot;total-equality-deletes&quot;:&quot;0&quot;
         },
         &quot;manifest-list&quot;:&quot;s3a://iceberg/logging.db/events/metadata/snap-4564366177504223943-1-23cc980c-9570-42ed-85cf-8658fda2727d.avro&quot;
      }
   ],
   &quot;snapshot-log&quot;:[
      {
         &quot;timestamp-ms&quot;:1622865672882,
         &quot;snapshot-id&quot;:6967685587675910019
      },
      {
         &quot;timestamp-ms&quot;:1622865680419,
         &quot;snapshot-id&quot;:2720489016575682283
      },
      {
         &quot;timestamp-ms&quot;:1622865686278,
         &quot;snapshot-id&quot;:4564366177504223943
      }
   ],
   &quot;metadata-log&quot;:[
      {
         &quot;timestamp-ms&quot;:1622865672894,
         &quot;metadata-file&quot;:&quot;s3a://iceberg/logging.db/events/metadata/00000-c5cfaab4-f82f-4351-b2a5-bd0e241f84bc.metadata.json&quot;
      },
      {
         &quot;timestamp-ms&quot;:1622865680524,
         &quot;metadata-file&quot;:&quot;s3a://iceberg/logging.db/events/metadata/00001-27c8c2d1-fdbb-429d-9263-3654d818250e.metadata.json&quot;
      }
   ]
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As you can see, these JSON files can quickly grow as you perform updates on
your table. The file contains pointers to all of the snapshots and manifest
list files, much like the output you got from querying the snapshots table
earlier. A really important piece to note is that the schema is stored here;
this is what Trino uses for validation on inserts and reads. As you may expect,
there is also the root location of the table itself, as well as a unique table
identifier. The final part to note about this file is the partition-spec and
partition-specs fields. The partition-spec field holds the current partition
spec, while partition-specs is an array holding every partition spec that has
existed for this table. As pointed out earlier, different manifest files can
use different partition specs. That wraps up all of the metadata file types you
can expect to see in Iceberg!&lt;/p&gt;
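&lt;p&gt;Putting this together, resolving the current manifest list from the table
metadata is a small lookup. Here is a Python sketch over a copy of the JSON
above, trimmed to the relevant fields:&lt;/p&gt;

```python
import json

# Trimmed copy of the table metadata shown above: just enough to resolve the
# current snapshot to its manifest list.
metadata = json.loads("""
{
  "current-snapshot-id": 4564366177504223943,
  "snapshots": [
    {"snapshot-id": 2720489016575682283,
     "manifest-list": "s3a://iceberg/logging.db/events/metadata/snap-2720489016575682283-1-92382234-a4a6-4a1b-bc9b-24839472c2f6.avro"},
    {"snapshot-id": 4564366177504223943,
     "manifest-list": "s3a://iceberg/logging.db/events/metadata/snap-4564366177504223943-1-23cc980c-9570-42ed-85cf-8658fda2727d.avro"}
  ]
}
""")

def current_manifest_list(meta):
    """Find the manifest list of the table's current snapshot."""
    current = meta["current-snapshot-id"]
    for snap in meta["snapshots"]:
        if snap["snapshot-id"] == current:
            return snap["manifest-list"]
    raise KeyError("current snapshot not found")

print(current_manifest_list(metadata))
```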

&lt;p&gt;This post wraps up the Trino on ice series. Hopefully these blog posts serve as
a helpful introduction to what is expected to become a vital part of the open
data lakehouse stack. What are you waiting for? Come join the fun and help us
implement some of the missing features, or go ahead and try 
&lt;a href=&quot;https://github.com/bitsondatadev/trino-getting-started/tree/main/iceberg/trino-iceberg-minio&quot;&gt;Trino on Ice(berg)&lt;/a&gt;
yourself!&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>Welcome to the Trino on ice series, covering the details around how the Iceberg table format works with the Trino query engine. The examples build on each previous post, so it’s recommended to read the posts sequentially and reference them as needed later. Here are links to the posts in this series: Trino on ice I: A gentle introduction to Iceberg Trino on ice II: In-place table evolution and cloud compatibility with Iceberg Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec Trino on ice IV: Deep dive into Iceberg internals So far, this series has covered some very interesting user level concepts of the Iceberg model, and how you can take advantage of them using the Trino query engine. This blog post dives into some implementation details of Iceberg by dissecting some files that result from various operations carried out using Trino. To dissect you must use some surgical instrumentation, namely Trino, Avro tools, the MinIO client tool and Iceberg’s core library. It’s useful to dissect how these files work, not only to help understand how Iceberg works, but also to aid in troubleshooting issues, should you have any issues during ingestion or querying of your Iceberg table. I like to think of this type of debugging much like a fun game of operation, and you’re looking to see what causes the red errors to fly by on your screen.</summary>

      
      
    </entry>
  
    <entry>
      <title>23: Trino looking for patterns</title>
      <link href="https://trino.io/episodes/23.html" rel="alternate" type="text/html" title="23: Trino looking for patterns" />
      <published>2021-08-02T00:00:00+00:00</published>
      <updated>2021-08-02T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/23</id>
      <content type="html" xml:base="https://trino.io/episodes/23.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;Kasia Findeisen, Software Engineer at &lt;a href=&quot;https://starburst.io/&quot;&gt;Starburst&lt;/a&gt;
 (&lt;a href=&quot;https://github.com/kasiafi&quot;&gt;@kasiafi&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;release-360&quot;&gt;Release 360&lt;/h2&gt;

&lt;p&gt;In the last episode we already got a glimpse of this release. Now it is officially out.&lt;/p&gt;

&lt;p&gt;Official announcement items from Martin:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Automatic configuration of TLS for internal communication.&lt;/li&gt;
  &lt;li&gt;Improved correlated subqueries with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Support for assuming an IAM role in Elasticsearch connector.&lt;/li&gt;
  &lt;li&gt;Support for Trino views in Iceberg connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred’s additional notes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Documentation for materialized views SQL commands.&lt;/li&gt;
  &lt;li&gt;Partial support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt; and batch insert support for various JDBC-based connectors.&lt;/li&gt;
  &lt;li&gt;A bunch of performance and correctness fixes.&lt;/li&gt;
  &lt;li&gt;Numerous improvements in the Iceberg connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More info at &lt;a href=&quot;https://trino.io/docs/current/release/release-360.html&quot;&gt;https://trino.io/docs/current/release/release-360.html&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-week-row-pattern-matching-and-match_recognize&quot;&gt;Concept of the week: Row pattern matching and MATCH_RECOGNIZE&lt;/h2&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; syntax was introduced in the SQL:2016
standard. It is a powerful tool for analyzing trends in your data, and Trino
has supported it since
&lt;a href=&quot;https://trino.io/docs/current/release/release-356.html&quot;&gt;version 356&lt;/a&gt;. With
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt;, you can define a pattern using the well-known regular
expression syntax and match it against a set of rows. Upon finding a matching row
sequence, you can retrieve all kinds of detailed or summary information about
the match, and pass it on to the subsequent parts of your
query. This is a new level of what a pure SQL statement can do.&lt;/p&gt;

&lt;p&gt;For more details, &lt;a href=&quot;/blog/2021/05/19/row_pattern_matching.html&quot;&gt;this blog post&lt;/a&gt; 
gives you a taste of row pattern matching capabilities, and a quick overview of 
the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; syntax.&lt;/p&gt;

&lt;p&gt;Let’s look at an example with data similar to the TPC-H data, with the same
goal as in the blog post: detect a “V”-shape of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;price&lt;/code&gt;
values over time for different customers.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;trino&amp;gt; WITH orders(customer_id, order_date, price) AS (VALUES
    (&apos;cust_1&apos;, DATE &apos;2020-05-11&apos;, 100),
    (&apos;cust_1&apos;, DATE &apos;2020-05-12&apos;, 200),
    (&apos;cust_2&apos;, DATE &apos;2020-05-13&apos;,   8),
    (&apos;cust_1&apos;, DATE &apos;2020-05-14&apos;, 100),
    (&apos;cust_2&apos;, DATE &apos;2020-05-15&apos;,   4),
    (&apos;cust_1&apos;, DATE &apos;2020-05-16&apos;,  50),
    (&apos;cust_1&apos;, DATE &apos;2020-05-17&apos;, 100),
    (&apos;cust_2&apos;, DATE &apos;2020-05-18&apos;,   6))
SELECT customer_id, start_price, bottom_price, final_price, start_date, final_date
    FROM orders
        MATCH_RECOGNIZE (
            PARTITION BY customer_id
            ORDER BY order_date
            MEASURES
                START.price AS start_price,
                LAST(DOWN.price) AS bottom_price,
                LAST(UP.price) AS final_price,
                START.order_date AS start_date,
                LAST(UP.order_date) AS final_date
            ONE ROW PER MATCH
            AFTER MATCH SKIP PAST LAST ROW
            PATTERN (START DOWN+ UP+)
            DEFINE
                DOWN AS price &amp;lt; PREV(price),
                UP AS price &amp;gt; PREV(price)
            );

 customer_id | start_price | bottom_price | final_price | start_date | final_date
-------------+-------------+--------------+-------------+------------+------------
 cust_1      |         200 |           50 |         100 | 2020-05-12 | 2020-05-17
 cust_2      |           8 |            4 |           6 | 2020-05-13 | 2020-05-18
(2 rows)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Two matches are detected, one for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cust_1&lt;/code&gt;, and one for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cust_2&lt;/code&gt;.&lt;/p&gt;
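To see what the engine is doing conceptually, here is a small Python sketch that mirrors the `PATTERN (START DOWN+ UP+)` logic: partition by customer, order by date, then scan for a strictly falling run followed by a strictly rising run, skipping past each match. The helper name is hypothetical, and this models only the semantics, not Trino's internals:

```python
from collections import defaultdict

# The same rows as in the VALUES clause above.
orders = [
    ("cust_1", "2020-05-11", 100),
    ("cust_1", "2020-05-12", 200),
    ("cust_2", "2020-05-13", 8),
    ("cust_1", "2020-05-14", 100),
    ("cust_2", "2020-05-15", 4),
    ("cust_1", "2020-05-16", 50),
    ("cust_1", "2020-05-17", 100),
    ("cust_2", "2020-05-18", 6),
]

def v_matches(rows):
    """Find non-overlapping START DOWN+ UP+ matches.

    rows: (date, price) tuples ordered by date. Returns tuples of
    (start_price, bottom_price, final_price, start_date, final_date).
    """
    matches, i = [], 0
    while i < len(rows) - 2:
        j = i
        while j + 1 < len(rows) and rows[j + 1][1] < rows[j][1]:
            j += 1                       # DOWN+: strictly falling run
        if j == i:                       # no falling step from this row
            i += 1
            continue
        bottom = j
        while j + 1 < len(rows) and rows[j + 1][1] > rows[j][1]:
            j += 1                       # UP+: strictly rising run
        if j == bottom:                  # fell but never rose again
            i = bottom
            continue
        matches.append((rows[i][1], rows[bottom][1], rows[j][1],
                        rows[i][0], rows[j][0]))
        i = j + 1                        # AFTER MATCH SKIP PAST LAST ROW
    return matches

by_cust = defaultdict(list)              # PARTITION BY customer_id
for cust, date, price in orders:
    by_cust[cust].append((date, price))

for cust in sorted(by_cust):
    print(cust, v_matches(sorted(by_cust[cust])))   # ORDER BY order_date
```

Running this reproduces the two matches from the query output: `(200, 50, 100, '2020-05-12', '2020-05-17')` for `cust_1` and `(8, 4, 6, '2020-05-13', '2020-05-18')` for `cust_2`.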

&lt;p&gt;The matching algorithm was a collaboration between Martin and Kasia. This 
algorithm &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/core/trino-main/src/main/java/io/trino/operator/window/matcher/Matcher.java&quot;&gt;lives in the Matcher class&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;em&gt;running semantics&lt;/em&gt; is the default in both the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DEFINE&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MEASURES&lt;/code&gt;
clauses. Note that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FINAL&lt;/code&gt; only applies to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MEASURES&lt;/code&gt; clause.&lt;/p&gt;

&lt;p&gt;To sum up, here’s one complex measure expression combining different elements
of the special syntax:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/match-recognize/measure-example.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-8348-document-row-pattern-recognition-in-window&quot;&gt;PR of the week: PR 8348 Document row pattern recognition in window&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;https://github.com/trinodb/trino/pull/8348&quot;&gt;PR of the week&lt;/a&gt; adds 
documentation for applying pattern matching over windows. This is yet another
SQL functionality that Kasia added after getting pattern recognition to work
with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt;.&lt;/p&gt;

&lt;h2 id=&quot;demo-showing-match_recognize-functionality-by-example&quot;&gt;Demo: Showing MATCH_RECOGNIZE functionality by example&lt;/h2&gt;

&lt;p&gt;Here are a few examples that Kasia will be running:&lt;/p&gt;

&lt;p&gt;Demo preview:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;The initial query. That’s mostly the same query that’s in the blog post, the 
differences being:
    &lt;ul&gt;
      &lt;li&gt;Usage of a real table instead of a CTE.&lt;/li&gt;
      &lt;li&gt;Additional sort key for consistent ordering.&lt;/li&gt;
      &lt;li&gt;Two more measures.&lt;/li&gt;
    &lt;/ul&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; SELECT custkey, match_no, start_price, bottom_price, final_price, start_date, final_date, classy
               FROM orders
                   MATCH_RECOGNIZE (
                       PARTITION BY custkey
                       ORDER BY orderdate, orderkey
                       MEASURES
                           START.totalprice AS start_price,
                           LAST(DOWN.totalprice) AS bottom_price,
                           LAST(UP.totalprice) AS final_price,
                           START.orderdate AS start_date,
                           LAST(UP.orderdate) AS final_date,
                           MATCH_NUMBER() AS match_no,
                           CLASSIFIER() AS classy
                       ONE ROW PER MATCH
                       AFTER MATCH SKIP PAST LAST ROW
                       PATTERN (START DOWN+ UP+)
                       DEFINE
                           DOWN AS totalprice &amp;lt; PREV(totalprice),
                           UP AS totalprice &amp;gt; PREV(totalprice)
                       )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The query returns many results (many matches). Wrap it in a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;count(*)&lt;/code&gt; 
aggregation to check how many there are:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; SELECT count(*) FROM (SELECT custkey, match_no, start_price, bottom_price, final_price, start_date, final_date, classy
               FROM orders
                   MATCH_RECOGNIZE (
                       PARTITION BY custkey
                       ORDER BY orderdate, orderkey
                       MEASURES
                           START.totalprice AS start_price,
                           LAST(DOWN.totalprice) AS bottom_price,
                           LAST(UP.totalprice) AS final_price,
                           START.orderdate AS start_date,
                           LAST(UP.orderdate) AS final_date,
                           MATCH_NUMBER() AS match_no,
                           CLASSIFIER() AS classy
                       ONE ROW PER MATCH
                       AFTER MATCH SKIP PAST LAST ROW
                       PATTERN (START DOWN+ UP+)
                       DEFINE
                           DOWN AS totalprice &amp;lt; PREV(totalprice),
                           UP AS totalprice &amp;gt; PREV(totalprice)
                       ))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Modify the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PATTERN&lt;/code&gt; to limit the results. Now searching for a “big V”:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; SELECT count(*) FROM (SELECT custkey, match_no, start_price, bottom_price, final_price, start_date, final_date, classy
               FROM orders
                   MATCH_RECOGNIZE (
                       PARTITION BY custkey
                       ORDER BY orderdate, orderkey
                       MEASURES
                           START.totalprice AS start_price,
                           LAST(DOWN.totalprice) AS bottom_price,
                           LAST(UP.totalprice) AS final_price,
                           START.orderdate AS start_date,
                           LAST(UP.orderdate) AS final_date,
                           MATCH_NUMBER() AS match_no,
                           CLASSIFIER() AS classy
                       ONE ROW PER MATCH
                       AFTER MATCH SKIP PAST LAST ROW
                       PATTERN (START DOWN{3,} UP{4,})
                       DEFINE
                           DOWN AS totalprice &amp;lt; PREV(totalprice),
                           UP AS totalprice &amp;gt; PREV(totalprice)
                       ))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Unwrap from the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;count(*)&lt;/code&gt; aggregation to see the actual matches:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; SELECT custkey, match_no, start_price, bottom_price, final_price, start_date, final_date, classy
               FROM orders
                   MATCH_RECOGNIZE (
                       PARTITION BY custkey
                       ORDER BY orderdate, orderkey
                       MEASURES
                           START.totalprice AS start_price,
                           LAST(DOWN.totalprice) AS bottom_price,
                           LAST(UP.totalprice) AS final_price,
                           START.orderdate AS start_date,
                           LAST(UP.orderdate) AS final_date,
                           MATCH_NUMBER() AS match_no,
                           CLASSIFIER() AS classy
                       ONE ROW PER MATCH
                       AFTER MATCH SKIP PAST LAST ROW
                       PATTERN (START DOWN{3,} UP{4,})
                       DEFINE
                           DOWN AS totalprice &amp;lt; PREV(totalprice),
                           UP AS totalprice &amp;gt; PREV(totalprice)
                       )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Change &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AFTER MATCH SKIP PAST LAST ROW&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AFTER MATCH SKIP TO NEXT ROW&lt;/code&gt; to 
detect overlapping matches:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; SELECT custkey, match_no, start_price, bottom_price, final_price, start_date, final_date, classy
               FROM orders
                   MATCH_RECOGNIZE (
                       PARTITION BY custkey
                       ORDER BY orderdate, orderkey
                       MEASURES
                           START.totalprice AS start_price,
                           LAST(DOWN.totalprice) AS bottom_price,
                           LAST(UP.totalprice) AS final_price,
                           START.orderdate AS start_date,
                           LAST(UP.orderdate) AS final_date,
                           MATCH_NUMBER() AS match_no,
                           CLASSIFIER() AS classy
                       ONE ROW PER MATCH
                       AFTER MATCH SKIP TO NEXT ROW
                       PATTERN (START DOWN{3,} UP{4,})
                       DEFINE
                           DOWN AS totalprice &amp;lt; PREV(totalprice),
                           UP AS totalprice &amp;gt; PREV(totalprice)
                       )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Change &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ONE ROW PER MATCH&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALL ROWS PER MATCH&lt;/code&gt; (also, revert the previous 
change). Discuss the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;classy&lt;/code&gt; column and explain the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;running&lt;/code&gt; semantics using the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;final_date&lt;/code&gt; column as an example:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; SELECT custkey, match_no, start_price, bottom_price, final_price, start_date, final_date, classy
               FROM orders
                   MATCH_RECOGNIZE (
                       PARTITION BY custkey
                       ORDER BY orderdate, orderkey
                       MEASURES
                           START.totalprice AS start_price,
                           LAST(DOWN.totalprice) AS bottom_price,
                           LAST(UP.totalprice) AS final_price,
                           START.orderdate AS start_date,
                           LAST(UP.orderdate) AS final_date,
                           MATCH_NUMBER() AS match_no,
                           CLASSIFIER() AS classy
                       ALL ROWS PER MATCH
                       AFTER MATCH SKIP PAST LAST ROW
                       PATTERN (START DOWN{3,} UP{4,})
                       DEFINE
                           DOWN AS totalprice &amp;lt; PREV(totalprice),
                           UP AS totalprice &amp;gt; PREV(totalprice)
                       )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Change the semantics of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;final_date&lt;/code&gt; column to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FINAL&lt;/code&gt;:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; SELECT custkey, match_no, start_price, bottom_price, final_price, start_date, final_date, classy
               FROM orders
                   MATCH_RECOGNIZE (
                       PARTITION BY custkey
                       ORDER BY orderdate, orderkey
                       MEASURES
                           START.totalprice AS start_price,
                           LAST(DOWN.totalprice) AS bottom_price,
                           LAST(UP.totalprice) AS final_price,
                           START.orderdate AS start_date,
                           FINAL LAST(UP.orderdate) AS final_date,
                           MATCH_NUMBER() AS match_no,
                           CLASSIFIER() AS classy
                       ALL ROWS PER MATCH
                       AFTER MATCH SKIP PAST LAST ROW
                       PATTERN (START DOWN{3,} UP{4,})
                       DEFINE
                           DOWN AS totalprice &amp;lt; PREV(totalprice),
                           UP AS totalprice &amp;gt; PREV(totalprice)
                       )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
&lt;/ol&gt;
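A useful way to think about the `PATTERN` clause used throughout the demo is as a regular expression over row labels. This hypothetical Python sketch (the price series is made up for illustration) classifies each step of a series as a drop or a rise and then searches for the “big V” shape with Python's re module:

```python
import re

# Made-up price series for illustration only.
prices = [200, 150, 120, 90, 60, 70, 90, 120, 160]

# Label each step: 'D' if the price fell versus the previous row,
# 'U' if it rose, 'F' if it stayed flat.
labels = "".join(
    "D" if b < a else "U" if b > a else "F"
    for a, b in zip(prices, prices[1:])
)

# PATTERN (START DOWN{3,} UP{4,}): the starting row is implicit in the
# first comparison, so the regex only needs the DOWN and UP labels --
# at least three drops followed by at least four rises.
big_v = re.compile(r"D{3,}U{4,}")
print(labels)                             # DDDDUUUU
print(big_v.search(labels) is not None)   # True
```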

&lt;h2 id=&quot;question-of-the-week-how-do-you-tag-a-list-of-rows-with-custom-periodic-rules&quot;&gt;Question of the week: How do you tag a list of rows with custom periodic rules?&lt;/h2&gt;

&lt;p&gt;A StackOverflow user asked how to tag orders in a table that meet a certain 
criterion that relies on periodicity. There are certainly some complicated and
inefficient SQL queries that you could craft to address this. However,
with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; it is now possible to do this while taking advantage of the
efficient matching capabilities that Martin and Kasia have added.&lt;/p&gt;

&lt;p&gt;Here is an example orders table represented as a csv table:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Create_time, Order_id, person_id, variable_a
&apos;2021-06-01&apos;, 1234, 2232, 1
&apos;2021-06-02&apos;, 1235, 2232, 0.6
&apos;2021-06-03&apos;, 1236, 2232, 0.33
&apos;2021-06-04&apos;, 1237, 2232, 0.7
&apos;2021-06-05&apos;, 1238, 2232, 0.6
&apos;2021-06-06&apos;, 1239, 2232, 0.4
&apos;2021-06-07&apos;, 1240, 2232, 0.8
&apos;2021-06-08&apos;, 1241, 2232, 0.7
&apos;2021-06-09&apos;, 1242, 2232, 0.4
&apos;2021-06-10&apos;, 1243, 2232, 0.6
&apos;2021-06-11&apos;, 1244, 2232, 0.7
&apos;2021-06-12&apos;, 1245, 2232, 0.6
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The grace period logic produces the final_hit column according to these 
rules:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;is_hit&lt;/code&gt; column is set to true if &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;variable_a&lt;/code&gt; is less than or equal to 0.5.&lt;/li&gt;
  &lt;li&gt;There is a grace period of 4 orders after each hit, so any hit that 
falls within the grace period is ignored. A hit that survives this rule is a
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;final_hit&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Based on this logic, the desired result for the example is:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Create_time, Order_id, person_id, variable_a, is_hit, final_hit
&apos;2021-06-01&apos;, 1234, 2232, 1, NULL, NULL
&apos;2021-06-02&apos;, 1235, 2232, 0.6, NULL, NULL
&apos;2021-06-03&apos;, 1236, 2232, 0.33, true, true
&apos;2021-06-04&apos;, 1237, 2232, 0.7, NULL, NULL
&apos;2021-06-05&apos;, 1238, 2232, 0.6, NULL, NULL
&apos;2021-06-06&apos;, 1239, 2232, 0.4, true, NULL
&apos;2021-06-07&apos;, 1240, 2232, 0.8, NULL, NULL
&apos;2021-06-08&apos;, 1241, 2232, 0.7, NULL, NULL
&apos;2021-06-09&apos;, 1242, 2232, 0.4, true, true
&apos;2021-06-10&apos;, 1243, 2232, 0.6, NULL, NULL
&apos;2021-06-11&apos;, 1244, 2232, 0.7, NULL, NULL
&apos;2021-06-12&apos;, 1245, 2232, 0.6, NULL, NULL
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To accomplish this with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt;, you can use the following statement, 
which produces the desired result:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;WITH data(Create_time, Order_id, person_id, variable_a) AS (
    VALUES
      (DATE &apos;2021-06-01&apos;, 1234, 2232, 1),
      (DATE &apos;2021-06-02&apos;, 1235, 2232, 0.6),
      (DATE &apos;2021-06-03&apos;, 1236, 2232, 0.33),
      (DATE &apos;2021-06-04&apos;, 1237, 2232, 0.7),
      (DATE &apos;2021-06-05&apos;, 1238, 2232, 0.6),
      (DATE &apos;2021-06-06&apos;, 1239, 2232, 0.4),
      (DATE &apos;2021-06-07&apos;, 1240, 2232, 0.8),
      (DATE &apos;2021-06-08&apos;, 1241, 2232, 0.7),
      (DATE &apos;2021-06-09&apos;, 1242, 2232, 0.4),
      (DATE &apos;2021-06-10&apos;, 1243, 2232, 0.6),
      (DATE &apos;2021-06-11&apos;, 1244, 2232, 0.7),
      (DATE &apos;2021-06-12&apos;, 1245, 2232, 0.6)
)
SELECT Create_time, Order_id, person_id, variable_a, if(variable_a &amp;lt;= 0.5, true, null) is_hit, final_hit
FROM data
   MATCH_RECOGNIZE (
     PARTITION BY person_id
     ORDER BY Create_time
     MEASURES if(classifier() = &apos;HIT&apos;, true, null) AS final_hit
     ALL ROWS PER MATCH WITH UNMATCHED ROWS
     AFTER MATCH SKIP PAST LAST ROW
     PATTERN (HIT G{,4})
     DEFINE /* G -- grace period */
            HIT AS HIT.variable_a &amp;lt;= 0.5
  )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
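The same grace-period rule can also be sketched procedurally, which makes it easy to check the logic against the desired result above. This is a hypothetical Python helper, not how Trino evaluates the query:

```python
rows = [  # (create_time, order_id, variable_a), ordered by time
    ("2021-06-01", 1234, 1.0),
    ("2021-06-02", 1235, 0.6),
    ("2021-06-03", 1236, 0.33),
    ("2021-06-04", 1237, 0.7),
    ("2021-06-05", 1238, 0.6),
    ("2021-06-06", 1239, 0.4),
    ("2021-06-07", 1240, 0.8),
    ("2021-06-08", 1241, 0.7),
    ("2021-06-09", 1242, 0.4),
    ("2021-06-10", 1243, 0.6),
    ("2021-06-11", 1244, 0.7),
    ("2021-06-12", 1245, 0.6),
]

def tag_final_hits(rows, grace=4):
    """Mark hits (variable_a <= 0.5) and suppress any hit that falls
    within the `grace` orders following a final hit."""
    result, remaining = [], 0
    for _, order_id, variable_a in rows:
        is_hit = variable_a <= 0.5
        if is_hit and remaining == 0:
            final_hit = True
            remaining = grace            # ignore hits in the next 4 orders
        else:
            final_hit = False
            remaining = max(0, remaining - 1)
        # `or None` mirrors the NULLs in the desired result table.
        result.append((order_id, is_hit or None, final_hit or None))
    return result

# Orders 1236 and 1242 become final hits; 1239 is a hit inside the
# grace period of 1236 and is ignored, matching the table above.
print([oid for oid, _, fh in tag_final_hits(rows) if fh])  # [1236, 1242]
```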

&lt;p&gt;Check out &lt;a href=&quot;https://stackoverflow.com/questions/68095763&quot;&gt;Martin and Kasia’s full answer to this question&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests Kasia Findeisen, Software Engineer at Starburst (@kasiafi). Release 360</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec</title>
      <link href="https://trino.io/blog/2021/07/30/iceberg-concurrency-snapshots-spec.html" rel="alternate" type="text/html" title="Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec" />
      <published>2021-07-30T00:00:00+00:00</published>
      <updated>2021-07-30T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/07/30/iceberg-concurrency-snapshots-spec</id>
      <content type="html" xml:base="https://trino.io/blog/2021/07/30/iceberg-concurrency-snapshots-spec.html">&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;100%&quot; height=&quot;100%&quot; src=&quot;/assets/blog/trino-on-ice/trino-iceberg.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Welcome to the Trino on ice series, covering the details around how the Iceberg
table format works with the Trino query engine. The examples build on each
previous post, so it’s recommended to read the posts sequentially and reference
them as needed later. Here are links to the posts in this series:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/05/03/a-gentle-introduction-to-iceberg.html&quot;&gt;Trino on ice I: A gentle introduction to Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/12/in-place-table-evolution-and-cloud-compatibility-with-iceberg.html&quot;&gt;Trino on ice II: In-place table evolution and cloud compatibility with Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/30/iceberg-concurrency-snapshots-spec.html&quot;&gt;Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/08/12/deep-dive-into-iceberg-internals.html&quot;&gt;Trino on ice IV: Deep dive into Iceberg internals&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the last two blog posts, we’ve covered a lot of cool feature improvements of
Iceberg over the Hive model, and introduced the concepts and issues that table
formats address. I recommend you take a look at those if you haven’t yet. This
blog post closes out the overview of Iceberg features by discussing the
concurrency model Iceberg uses to ensure data integrity, how to use snapshots
via Trino, and the
&lt;a href=&quot;https://iceberg.apache.org/spec/&quot;&gt;Iceberg Specification&lt;/a&gt;.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;concurrency-model&quot;&gt;Concurrency Model&lt;/h2&gt;

&lt;p&gt;One of the core issues with the Hive model is that the metadata and the data
files are stored in distinct locations. Having your data and metadata split
up like this is a recipe for disaster when trying to apply updates to both
systems atomically.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-on-ice/iceberg-metadata.png&quot; alt=&quot;Iceberg metadata diagram of runtime, and file storage&quot; /&gt;&lt;/p&gt;

&lt;p&gt;A very common problem with Hive is that if a writing process failed during
insertion, you would often find the data written to file storage while the
metastore writes never occurred. Or conversely, the metastore writes were
successful, but the data failed to finish writing to file storage due to a 
network or file I/O failure. There’s a good 
&lt;a href=&quot;https://trino.io/episodes/5.html&quot;&gt;Trino Community Broadcast episode&lt;/a&gt; that talks
about a function in Trino that exists to resolve these issues by syncing the
metastore and file storage. You can watch 
&lt;a href=&quot;https://www.youtube.com/watch?v=OXyJFZSsX5w&amp;amp;t=2097s&quot;&gt;a simulation of this error&lt;/a&gt;
in that episode.&lt;/p&gt;

&lt;p&gt;Aside from the issues caused by this split state, there are many 
other issues that stem from the file system itself. In the case of HDFS, 
depending on the specific filesystem implementation you are using, you may have
&lt;a href=&quot;https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/filesystem/introduction.html#Core_Expectations_of_a_Hadoop_Compatible_FileSystem&quot;&gt;different atomicity guarantees for various file systems and their operations&lt;/a&gt;,
such as creating, deleting, and renaming files and directories. HDFS isn’t the
only troublemaker here. Apart from Amazon S3, whose 
&lt;a href=&quot;https://aws.amazon.com/about-aws/whats-new/2020/12/amazon-s3-now-delivers-strong-read-after-write-consistency-automatically-for-all-applications/&quot;&gt;recent announcement brought strong consistency to the S3 service,&lt;/a&gt;
most object storage systems offer only &lt;em&gt;eventual&lt;/em&gt; consistency and may not show
the latest files immediately after writes. Even as storage systems make
progress towards better performance and guarantees, they still
offer no reliable locking mechanism.&lt;/p&gt;

&lt;p&gt;Iceberg addresses all of these issues in a multitude of ways. One of the primary
ways Iceberg introduces transactional guarantees is by storing the metadata in
the same datastore as the data itself. This simplifies handling commit failures
down to rolling back on one system rather than trying to coordinate a rollback
across two systems like in Hive. Writers independently write their metadata and
attempt to perform their operations, needing no coordination with other writers.
The only time the writers coordinate is when they attempt to commit their
operations. To commit, a writer takes a lock on the current snapshot record in a
database. This concurrency model, where writers eagerly do the work upfront, is
called &lt;strong&gt;&lt;em&gt;optimistic concurrency control&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Currently, in Trino, this method still uses the Hive metastore to perform the
lock-and-swap operation necessary to coordinate the final commits. Iceberg 
creator, &lt;a href=&quot;https://www.linkedin.com/in/rdblue/&quot;&gt;Ryan Blue&lt;/a&gt;, 
&lt;a href=&quot;https://youtu.be/-iIY2sOFBRc?t=1351&quot;&gt;covers this lock-and-swap mechanism&lt;/a&gt; and
how the metastore can be replaced with alternate locking methods. In the event
that &lt;a href=&quot;https://iceberg.apache.org/reliability/#concurrent-write-operations&quot;&gt;two writers attempt to commit at the same time&lt;/a&gt;,
the writer that acquires the lock first successfully commits by swapping in its
snapshot as the current snapshot, while the second writer retries applying its
changes. The second writer should have no problem with this, assuming there are
no conflicting changes between the two snapshots.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-on-ice/iceberg-files.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This works similarly to a git workflow where the main branch is the locked
resource, and two developers try to commit their changes at the same time. The
first developer’s changes may conflict with the second developer’s changes. The
second developer is then forced to rebase or merge the first developer’s code
with their changes before committing to the main branch again. The same logic
applies to merging data files. Currently, Iceberg clients use a
&lt;a href=&quot;https://iceberg.apache.org/reliability/#concurrent-write-operations&quot;&gt;copy-on-write mechanism&lt;/a&gt;
that makes a new file out of the merged data in the next snapshot. This enables
accurate time travel and preserves the previous versions of the files. At
the time of writing, upserts via &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE INTO&lt;/code&gt; syntax are not supported in Trino,
but 
&lt;a href=&quot;https://github.com/trinodb/trino/issues/7708&quot;&gt;this is in active development&lt;/a&gt;.
&lt;strong&gt;&lt;em&gt;UPDATE:&lt;/em&gt;&lt;/strong&gt; Since the original writing of this post, the 
&lt;a href=&quot;https://github.com/trinodb/trino/pull/7933&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; syntax exists as of version 393&lt;/a&gt;.&lt;/p&gt;
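
&lt;p&gt;As a rough sketch of what an upsert looks like with this syntax, the following
illustrative example merges a staging table into the events table. The
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;events_staging&lt;/code&gt; table and the join and update columns are hypothetical,
and the example assumes Trino 393 or later:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;MERGE INTO iceberg.logging.events t
USING iceberg.logging.events_staging s
ON t.message = s.message
WHEN MATCHED THEN UPDATE SET level = s.level
WHEN NOT MATCHED THEN INSERT (level, message) VALUES (s.level, s.message);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;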

&lt;p&gt;One of the great benefits of tracking each individual change that gets written
to Iceberg is that you are given a view of the data at every point in time. This
enables a really cool feature that I mentioned earlier called &lt;strong&gt;&lt;em&gt;time travel&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;h2 id=&quot;snapshots-and-time-travel&quot;&gt;Snapshots and Time Travel&lt;/h2&gt;

&lt;p&gt;To showcase snapshots, it’s best to go over a few examples drawing from the
event table we 
&lt;a href=&quot;/blog/2021/05/03/a-gentle-introduction-to-iceberg.html&quot;&gt;created in the previous blog posts&lt;/a&gt;.
This time we’ll only be working with the Iceberg table, as this capability is
not available in Hive. Snapshots give you an immutable view of your data at a
given point in time. They are created automatically on every append or removal
of data. One thing to note is that, for now, they do not capture the state of
your metadata.&lt;/p&gt;

&lt;p&gt;Say that you have created your events table and inserted the three initial rows
as we did previously. Let’s look at the data we get back and see how to check
the existing snapshots in Trino:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT level, message
FROM iceberg.logging.events;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;level&lt;/th&gt;
      &lt;th&gt;message&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;Double oh noes&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;WARN&lt;/td&gt;
      &lt;td&gt;Maybeh oh noes?&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;Oh noes&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;To query the snapshots, append the $ operator to the end of the table name,
followed by the name of the hidden table, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;snapshots&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT snapshot_id, parent_id, operation
FROM iceberg.logging.&quot;events$snapshots&quot;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;snapshot_id&lt;/th&gt;
      &lt;th&gt;parent_id&lt;/th&gt;
      &lt;th&gt;operation&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;7620328658793169607&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt;append&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;2115743741823353537&lt;/td&gt;
      &lt;td&gt;7620328658793169607&lt;/td&gt;
      &lt;td&gt;append&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Let’s take a look at the manifest list files that are associated with each 
snapshot ID. You can tell which file belongs to which snapshot based on the 
snapshot ID embedded in the filename:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT manifest_list
FROM iceberg.logging.&quot;events$snapshots&quot;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;manifest_list&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;s3a://iceberg/logging.db/events/metadata/snap-7620328658793169607-1-cc857d89-1c07-4087-bdbc-2144a814dae2.avro&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;s3a://iceberg/logging.db/events/metadata/snap-2115743741823353537-1-4cb458be-7152-4e99-8db7-b2dda52c556c.avro&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Now, let’s insert another row to the table:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;INSERT INTO iceberg.logging.events
VALUES
(
&apos;INFO&apos;,
timestamp &apos;2021-04-02 00:00:11.1122222&apos;,
&apos;It is all good&apos;,
ARRAY [&apos;Just updating you!&apos;]
);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Let’s check the snapshot table again:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT snapshot_id, parent_id, operation
FROM iceberg.logging.&quot;events$snapshots&quot;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;snapshot_id&lt;/th&gt;
      &lt;th&gt;parent_id&lt;/th&gt;
      &lt;th&gt;operation&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;7620328658793169607&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt;append&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;2115743741823353537&lt;/td&gt;
      &lt;td&gt;7620328658793169607&lt;/td&gt;
      &lt;td&gt;append&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;7030511368881343137&lt;/td&gt;
      &lt;td&gt;2115743741823353537&lt;/td&gt;
      &lt;td&gt;append&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Let’s also verify that our row was added:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT level, message
FROM iceberg.logging.events;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;level&lt;/th&gt;
      &lt;th&gt;message&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;Oh noes&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;INFO&lt;/td&gt;
      &lt;td&gt;It is all good&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;Double oh noes&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;WARN&lt;/td&gt;
      &lt;td&gt;Maybeh oh noes?&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Since Iceberg is already tracking the list of files added and removed at each
snapshot, it would make sense that you can travel back and forth between these
different views of the data, right? This concept is called time travel.
You specify which snapshot you would like to read from, and you see the view of
the data as of that snapshot. In Trino, you use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@&lt;/code&gt;
operator, followed by the ID of the snapshot you wish to read from:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT level, message
FROM iceberg.logging.&quot;events@2115743741823353537&quot;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;level&lt;/th&gt;
      &lt;th&gt;message&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;Double oh noes&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;WARN&lt;/td&gt;
      &lt;td&gt;Maybeh oh noes?&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;Oh noes&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;If you determine there is some issue with your data, you can always roll back to
the previous state permanently as well. In Trino, there is a procedure called
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rollback_to_snapshot&lt;/code&gt; that moves the table state to another snapshot:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CALL system.rollback_to_snapshot(‘logging’, ‘events’, 2115743741823353537);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now that we have rolled back, observe what happens when we query the events
table with:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT level, message
FROM iceberg.logging.events;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;level&lt;/th&gt;
      &lt;th&gt;message&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;Double oh noes&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;WARN&lt;/td&gt;
      &lt;td&gt;Maybeh oh noes?&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;Oh noes&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Notice the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INFO&lt;/code&gt; row is still missing even though we query the table without
specifying a snapshot ID. Just because we rolled back doesn’t mean we’ve lost
the snapshot we rolled back from. In fact, we can roll forward, or as I like to
call it, 
&lt;a href=&quot;https://en.wikipedia.org/wiki/Back_to_the_Future&quot;&gt;back to the future&lt;/a&gt;! In
Trino, you use the same procedure call, but with a successor of the current
snapshot:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CALL system.rollback_to_snapshot(‘logging’, ‘events’, 7030511368881343137)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And now we should be able to query the table again and see the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INFO&lt;/code&gt; row 
return:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT level, message
FROM iceberg.logging.events;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;level&lt;/th&gt;
      &lt;th&gt;message&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;Oh noes&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;INFO&lt;/td&gt;
      &lt;td&gt;It is all good&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;Double oh noes&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;WARN&lt;/td&gt;
      &lt;td&gt;Maybeh oh noes?&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;As expected, the INFO row returns when you roll back to the future.&lt;/p&gt;

&lt;p&gt;Snapshots not only provide a level of immutability that is key to working with
eventually consistent storage, but also give you a rich set of features to
version and move between different versions of your data, much like a git repository.&lt;/p&gt;

&lt;h2 id=&quot;iceberg-specification&quot;&gt;Iceberg Specification&lt;/h2&gt;

&lt;p&gt;Perhaps saving the best for last, the benefit of using Iceberg is the community
that surrounds it, and the support you receive. It can be daunting to have to
choose a project that replaces something so core to your architecture. While
Hive has so many drawbacks, one of the things keeping many companies locked in
is the fear of the unknown. How do you know which table format to choose? Are
there unknown data corruption issues you’re about to take on? What if this
doesn’t scale like it promises on the label? It is worth noting that 
&lt;a href=&quot;https://lakefs.io/hudi-iceberg-and-delta-lake-data-lake-table-formats-compared/&quot;&gt;alternative table formats are also emerging in this space&lt;/a&gt; 
and we encourage you to investigate them for your own use cases. When I sat down
with Iceberg creator Ryan Blue to 
&lt;a href=&quot;https://www.twitch.tv/videos/989098630&quot;&gt;compare Iceberg to other table formats&lt;/a&gt;, 
he claimed the community’s greatest strength is its ability to look forward.
The community intentionally broke compatibility with Hive to enable a
richer level of features. Unlike Hive, the Iceberg project explained its
thinking in a spec.&lt;/p&gt;

&lt;p&gt;The strongest argument I can see for Iceberg is that it has a 
&lt;a href=&quot;https://iceberg.apache.org/spec/&quot;&gt;specification&lt;/a&gt;. This is something that has
largely been missing from Hive and shows a real maturity in how the Iceberg
community has approached the issue. On the Trino project, we think standards are
important. We adhere to many of them ourselves, such as the ANSI SQL syntax, and
exposing the client through a JDBC connection. By creating a standard around
the table format, you’re no longer tied to any particular technology, not even Iceberg
itself. You are adhering to a standard that will hopefully become the de facto
standard over a decade or two, much like Hive did. Having the standard in clear
writing invites multiple communities to the table and brings even more use 
cases. Doing so improves the standards and therefore the technologies that
implement them.&lt;/p&gt;

&lt;p&gt;The previous three blog posts of this series covered the features and massive
benefits of using this novel table format. The following post dives deeper into
how Iceberg achieves some of this functionality, with an overview of some of the
internals and metadata layouts. In the meantime, feel
free to try 
&lt;a href=&quot;https://github.com/bitsondatadev/trino-getting-started/tree/main/iceberg/trino-iceberg-minio&quot;&gt;Trino on Ice(berg)&lt;/a&gt;.&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>Welcome to the Trino on ice series, covering the details around how the Iceberg table format works with the Trino query engine. The examples build on each previous post, so it’s recommended to read the posts sequentially and reference them as needed later. Here are links to the posts in this series: Trino on ice I: A gentle introduction to Iceberg Trino on ice II: In-place table evolution and cloud compatibility with Iceberg Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec Trino on ice IV: Deep dive into Iceberg internals In the last two blog posts, we’ve covered a lot of cool feature improvements of Iceberg over the Hive model. I recommend you take a look at those if you haven’t yet. We introduced concepts and issues that table formats address. This blog closes up the overview of Iceberg features by discussing the concurrency model Iceberg uses to ensure data integrity, how to use snapshots via Trino, and the Iceberg Specification.</summary>

      
      
    </entry>
  
    <entry>
      <title>22: TrinkedIn: LinkedIn gets a Trino promotion</title>
      <link href="https://trino.io/episodes/22.html" rel="alternate" type="text/html" title="22: TrinkedIn: LinkedIn gets a Trino promotion" />
      <published>2021-07-22T00:00:00+00:00</published>
      <updated>2021-07-22T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/22</id>
      <content type="html" xml:base="https://trino.io/episodes/22.html">&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/22/cbb-linkedin.png&quot; /&gt;&lt;br /&gt;
Commander Bun Bun, landing the job!
&lt;/p&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Akshay Rai, Staff Software Engineer at &lt;a href=&quot;https://www.linkedin.com/&quot;&gt;LinkedIn&lt;/a&gt;
 (&lt;a href=&quot;https://www.linkedin.com/in/akshayrai09/&quot;&gt;@akshayrai09&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Jithesh Rajan, Staff Software Engineer at &lt;a href=&quot;https://www.linkedin.com/&quot;&gt;LinkedIn&lt;/a&gt;
 (&lt;a href=&quot;https://www.linkedin.com/in/jithesh-tr-a3185b20/&quot;&gt;@jithesh-tr-a3185b20&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Laura Chen, Staff Software Engineer at &lt;a href=&quot;https://www.linkedin.com/&quot;&gt;LinkedIn&lt;/a&gt;
 (&lt;a href=&quot;https://www.linkedin.com/in/laura-yu-chen-3a75413/&quot;&gt;@laura-yu-chen-3a75413&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Pratham Desai, Software Engineer at &lt;a href=&quot;https://www.linkedin.com/&quot;&gt;LinkedIn&lt;/a&gt;
 (&lt;a href=&quot;https://www.linkedin.com/in/pratham-desai/&quot;&gt;@pratham-desai&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Raju Nalli, Staff Site Reliability Engineer at &lt;a href=&quot;https://www.linkedin.com/&quot;&gt;LinkedIn&lt;/a&gt;
 (&lt;a href=&quot;https://www.linkedin.com/in/rajunalli/&quot;&gt;@rajunalli&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;upcoming-release-and-trino-summit&quot;&gt;Upcoming release and Trino Summit&lt;/h2&gt;

&lt;h3 id=&quot;sneak-peek-items-for-360&quot;&gt;Sneak peek items for 360&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Automatic cluster internal TLS&lt;/li&gt;
  &lt;li&gt;Views support in Iceberg connector&lt;/li&gt;
  &lt;li&gt;Documentation for materialized views SQL commands&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt; and batch insert support for various JDBC-based connectors&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;trino-summit-2021&quot;&gt;Trino Summit 2021&lt;/h3&gt;

&lt;p&gt;Get excited for this year’s &lt;a href=&quot;https://blog.starburst.io/announcing-trino-summit-2021&quot;&gt;Trino Summit&lt;/a&gt;
hosted by &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;. 
&lt;a href=&quot;https://www.starburst.io/info/trino-summit-call-for-papers/&quot;&gt;Registration and call for papers&lt;/a&gt;
is now open!&lt;/p&gt;

&lt;h3 id=&quot;linkedin-is-hiring&quot;&gt;LinkedIn is hiring!&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/jobs/view/2402727250/?alternateChannel=search&amp;amp;refId=VRDXEQNgS2gxtpsJaHPXjQ%3D%3D&amp;amp;trackingId=0GzsJkrXWYt6qHWSUHTvCg%3D%3D&amp;amp;trk=d_flagship3_search_srp_jobs&quot;&gt;Software Engineer - Big Data Platform&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/jobs/view/2291645936/?eBP=CwEAAAF6y0tYtsROpAG7XxMEhLVgpq2rSMwpNv28Q_j06PdFsD_s11eFyh-sIv2rxm_Y8zN-p755Gts-ElMlR6XvK2hOMp3JMnxFPzOnZvvZnv_-oHaBslitgtWzsmJy7_f7BKljmgAUtfinG9WCp1Bpi574HZEBJwAsjzKx-89NUdnIBj_SBIPHES_G2RNqoKp5eZ4c0k7YaVJSuZJTyi2K6KoKJ7njT65FEOWvmS9S80ysbINbXjX_WSz71RNAugEpqIgE9-gB1MhW8tQ9z72jQhbjXMqSuUaYS43zFaP8ImXhjTrhbopTxyxTIN9yst6tvlcPo_T5RNAaf_0e8x_km2SGdw&amp;amp;recommendedFlavor=IN_NETWORK&amp;amp;refId=VRDXEQNgS2gxtpsJaHPXjQ%3D%3D&amp;amp;trackingId=5Qo2D07i3Wl%2FVhGeAvLtew%3D%3D&amp;amp;trk=flagship3_search_srp_jobs&quot;&gt;Senior Software Engineer - Big Data Platform&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-week-trino-at-linkedin&quot;&gt;Concept of the week: Trino at LinkedIn&lt;/h2&gt;

&lt;p&gt;The LinkedIn team covers the concept of the week in &lt;a href=&quot;https://www.youtube.com/watch?v=vlc84xB-Hfs&amp;amp;t=955s&quot;&gt;this section&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-digging-into-join-queries&quot;&gt;PR of the week: Digging into join queries&lt;/h2&gt;

&lt;p&gt;Today our PR of the week is from the future 🔮! 
&lt;a href=&quot;https://github.com/jitheshtr/trino/issues/1&quot;&gt;LinkedIn is currently investigating the issue&lt;/a&gt;.
This gives us a chance to talk about the research aspects that go into a PR.&lt;/p&gt;

&lt;p&gt;Consider a view &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;V&lt;/code&gt; that performs a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNION ALL&lt;/code&gt; over an old table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;O&lt;/code&gt; and a new 
migrated table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;N&lt;/code&gt;. For &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;datepartition&lt;/code&gt; values older than &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;D&lt;/code&gt; (say 2021-06-05), 
data is read from table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;O&lt;/code&gt;, while for dates equal to or greater than &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;D&lt;/code&gt;,
data from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;N&lt;/code&gt; is used.&lt;/p&gt;
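
&lt;p&gt;A sketch of such a view definition, with illustrative table names and date
handling, might look like the following:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE VIEW V AS
SELECT * FROM O WHERE substring(datepartition, 1, 10) &amp;lt; &apos;2021-06-05&apos;
UNION ALL
SELECT * FROM N WHERE substring(datepartition, 1, 10) &amp;gt;= &apos;2021-06-05&apos;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;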

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/22/view-old-new-tables.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;The query in question is:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT * FROM V
WHERE x IN (SELECT x2 FROM Z)
AND cast(substring(datepartition,1,10) as date) &amp;gt;= date(&apos;2021-06-08&apos;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Here, table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Z&lt;/code&gt; has stats available and contains only 17 rows, while the 
data from view &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;V&lt;/code&gt; (which comes entirely from underlying table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;N&lt;/code&gt; for this query) 
has, say, billions of rows.&lt;/p&gt;

&lt;p&gt;This query took about 39 seconds to run before our upgrade 
(PrestoSQL 333). After the upgrade (Trino 352), the runtime increased to 
approximately 35 minutes.&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-week-how-can-i-query-the-hive-views-from-trino&quot;&gt;Question of the week: How can I query the Hive views from Trino?&lt;/h2&gt;

&lt;p&gt;We actually covered the answer in &lt;a href=&quot;/episodes/18.html&quot;&gt;episode 18&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You can use the &lt;a href=&quot;https://engineering.linkedin.com/blog/2020/coral&quot;&gt;Coral&lt;/a&gt; 
project, which translates between different SQL dialects. For example, 
it processes HiveQL statements and converts them to an internal representation using
&lt;a href=&quot;https://calcite.apache.org/&quot;&gt;Apache Calcite&lt;/a&gt;. It then converts the internal
representation to Trino SQL. See &lt;a href=&quot;/docs/current/connector/hive.html#hive-views&quot;&gt;the docs&lt;/a&gt;
for more details.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;100%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/18/coral.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;This diagram shows the creation of a Hive view, then shows the sequence of events 
when Trino reads that view.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/18/hive-view-sequence.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://engineering.linkedin.com/blog/2020/coral&quot;&gt;https://engineering.linkedin.com/blog/2020/coral&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://engineering.linkedin.com/blog/2021/from-daily-dashboards-to-enterprise-grade-data-pipelines&quot;&gt;https://engineering.linkedin.com/blog/2021/from-daily-dashboards-to-enterprise-grade-data-pipelines&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://engineering.linkedin.com/blog/2018/11/using-translatable-portable-UDFs&quot;&gt;https://engineering.linkedin.com/blog/2018/11/using-translatable-portable-UDFs&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://engineering.linkedin.com/blog/2021/fastingest-low-latency-gobblin&quot;&gt;https://engineering.linkedin.com/blog/2021/fastingest-low-latency-gobblin&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;News&lt;/p&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Commander Bun Bun, landing the job!</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino on ice II: In-place table evolution and cloud compatibility with Iceberg</title>
      <link href="https://trino.io/blog/2021/07/12/in-place-table-evolution-and-cloud-compatibility-with-iceberg.html" rel="alternate" type="text/html" title="Trino on ice II: In-place table evolution and cloud compatibility with Iceberg" />
      <published>2021-07-12T00:00:00+00:00</published>
      <updated>2021-07-12T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/07/12/in-place-table-evolution-and-cloud-compatibility-with-iceberg</id>
      <content type="html" xml:base="https://trino.io/blog/2021/07/12/in-place-table-evolution-and-cloud-compatibility-with-iceberg.html">&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;100%&quot; height=&quot;100%&quot; src=&quot;/assets/blog/trino-on-ice/trino-iceberg.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Welcome to the Trino on ice series, covering the details around how the Iceberg
table format works with the Trino query engine. The examples build on each
previous post, so it’s recommended to read the posts sequentially and reference
them as needed later. Here are links to the posts in this series:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/05/03/a-gentle-introduction-to-iceberg.html&quot;&gt;Trino on ice I: A gentle introduction to Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/12/in-place-table-evolution-and-cloud-compatibility-with-iceberg.html&quot;&gt;Trino on ice II: In-place table evolution and cloud compatibility with Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/30/iceberg-concurrency-snapshots-spec.html&quot;&gt;Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/08/12/deep-dive-into-iceberg-internals.html&quot;&gt;Trino on ice IV: Deep dive into Iceberg internals&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;/blog/2021/05/03/a-gentle-introduction-to-iceberg.html&quot;&gt;The first post&lt;/a&gt; 
covered how Iceberg is a table format and not a file format. It demonstrated the
benefits of hidden partitioning in Iceberg in contrast to exposed partitioning 
in Hive. There really is no such thing as “exposed partitioning.” I just thought
that sounded better than not-hidden partitioning. If any of that wasn’t clear, I
recommend either that you stop reading now, or go back to the first post before 
starting this one. This post discusses evolution. No, the post isn’t covering 
Darwinian or Pokémon evolution, but in-place table evolution!&lt;/p&gt;

&lt;!--more--&gt;

&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/blog/trino-on-ice/evolution.gif&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;You may find it a little odd that I am getting excited over tables evolving 
in-place, but as mentioned in the last post, if you have experience performing 
table evolution in Hive, discovering that Iceberg supports partition evolution 
and schema evolution will make you as happy as Ash Ketchum when Charmander 
evolved into Charmeleon. That is, until Charmeleon started treating Ash like a jerk
after the evolution from Charmander. Hopefully, you won’t face the same issue 
when your tables evolve.&lt;/p&gt;

&lt;p&gt;Another important aspect covered here is how Iceberg is developed with cloud
storage in mind. Hive and other data lake technologies were developed with file
systems as their primary storage layer. File systems are still a very common layer
today, but as more companies adopt object storage, those table formats have not 
adapted to the needs of object stores. Let’s dive in!&lt;/p&gt;

&lt;h2 id=&quot;partition-specification-evolution&quot;&gt;Partition Specification evolution&lt;/h2&gt;

&lt;p&gt;In Iceberg, you can update the partition specification, shortened to 
partition spec, on a live table. You do not need to perform a table 
migration as you do in Hive. In Hive, partition specs don’t explicitly exist 
because they are tightly coupled with the creation of the Hive table. This means 
that if you ever need to change the granularity of your data partitions at any point,
you need to create an entirely new table and move all the data to the new 
partition granularity you desire. No pressure on choosing the right granularity
or anything!&lt;/p&gt;

&lt;p&gt;In Iceberg, you’re not required to choose the perfect partition spec 
upfront: you can have multiple partition specs in the same table and query
across partitions of different sizes. How great is that! This means that if 
you’re initially partitioning your data by month, and later you decide to move 
to a daily partitioning spec due to growing ingest from all your new 
customers, you can do so with no migration and query over the table with no 
issue.&lt;/p&gt;

&lt;p&gt;This is conveyed pretty succinctly in this graphic from the Iceberg 
documentation. Through the end of 2008, partitioning occurs at a monthly 
granularity, and starting in 2009, it moves to a daily granularity. When a query 
pulls data from December 14, 2008 through January 13, 2009, the entire month of 
December gets scanned due to the monthly partitioning, but for the dates in 
January, only the first 13 days are scanned to answer the query.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/blog/trino-on-ice/partition-spec-evolution.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;At the time of writing, Trino is able to perform reads from tables that have 
multiple partition spec changes but partition evolution write support does not 
yet exist. &lt;a href=&quot;https://github.com/trinodb/trino/issues/7580&quot;&gt;There are efforts to add this support in the near future&lt;/a&gt;.&lt;/p&gt;
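
&lt;p&gt;A query spanning both partition specs needs no special handling. As a hedged 
sketch, assuming an Iceberg table shaped like the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iceberg.logging.events&lt;/code&gt; table used later in this post, with data in both 
granularities:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Iceberg plans monthly partitions for the 2008 range and
-- daily partitions for the 2009 range within the same scan
SELECT count(*)
FROM iceberg.logging.events
WHERE event_time BETWEEN TIMESTAMP &apos;2008-12-14 00:00:00&apos;
                     AND TIMESTAMP &apos;2009-01-13 23:59:59&apos;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;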

&lt;h2 id=&quot;schema-evolution&quot;&gt;Schema evolution&lt;/h2&gt;

&lt;p&gt;Iceberg also handles schema evolution much more elegantly than Hive. In Hive, 
adding columns works well enough, as data inserted before the schema change 
just reports null for that column. For formats that use column names, like ORC 
and Parquet, deletes are also straightforward for Hive, as it simply ignores 
fields that are no longer part of the table. For unstructured files like CSV 
that rely on the position of the column, deletes still cause issues, as 
deleting one column shifts the rest of the columns. Renames pose an 
issue for all formats in Hive, as data written prior to the rename is not 
rewritten to use the new field name. This effectively works the same as if you deleted 
the old field and added a new column with the new name. This lack of support for
schema evolution across various file types in Hive requires memorizing
the formats underneath various tables, and is very susceptible to user
error if someone executes one of the unsupported operations on the wrong table.&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
  &lt;tr&gt;
    &lt;th colspan=&quot;4&quot;&gt;Hive 2.2.0 schema evolution based on file type and operation.&lt;/th&gt;
  &lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;/td&gt;
    &lt;td&gt;Add&lt;/td&gt;
    &lt;td&gt;Delete&lt;/td&gt;
    &lt;td&gt;Rename&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;CSV/TSV&lt;/td&gt;
    &lt;td&gt;✅&lt;/td&gt;
    &lt;td&gt;❌&lt;/td&gt;
    &lt;td&gt;❌&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;JSON&lt;/td&gt;
    &lt;td&gt;✅&lt;/td&gt;
    &lt;td&gt;✅&lt;/td&gt;
    &lt;td&gt;❌&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;ORC/Parquet/Avro&lt;/td&gt;
    &lt;td&gt;✅&lt;/td&gt;
    &lt;td&gt;✅&lt;/td&gt;
    &lt;td&gt;❌&lt;/td&gt;
  &lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Currently in Iceberg, schemaless position-based data formats such as CSV and TSV
are not supported, though there are &lt;a href=&quot;https://github.com/apache/iceberg/issues/118&quot;&gt;some discussions on adding limited support 
for them&lt;/a&gt;. This would be useful from
a reading standpoint: loading data from CSV into an Iceberg format with all
the guarantees that Iceberg offers.&lt;/p&gt;

&lt;p&gt;While JSON doesn’t rely on positional data, it does have an explicit dependency
on names. This means that if I remove a text column named 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;severity&lt;/code&gt; from a JSON table, and later add a new int column called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;severity&lt;/code&gt;, I 
encounter an error when deserializing the older JSON files that still contain 
string values for that name. Even worse would be if the new 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;severity&lt;/code&gt; column you add has the same type as the original but a semantically 
different meaning. This results in old rows containing values that are 
unknowingly from a different domain, which can lead to wrong analytics. After 
all, someone who adds the new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;severity&lt;/code&gt; column might not even be aware of the 
old &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;severity&lt;/code&gt; column if it was dropped quite some time ago.&lt;/p&gt;

&lt;p&gt;ORC, Parquet, and Avro do not suffer from these issues, as each of these 
formats keeps a schema internal to the file itself and tracks 
changes to the columns through IDs rather than name values or position. Iceberg
uses these unique column IDs to also keep track of the columns as changes are 
applied.&lt;/p&gt;

&lt;p&gt;In general, Iceberg can only allow this small set of file formats due to the 
&lt;a href=&quot;https://iceberg.apache.org/evolution/#correctness&quot;&gt;correctness guarantees&lt;/a&gt; it 
provides. In Trino, you can add, delete, or rename columns using the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER TABLE&lt;/code&gt; command. Here’s an example that continues from the table created 
in the last post, into which three rows were inserted. The DDL statement looked like this.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE TABLE iceberg.logging.events (
  level VARCHAR,
  event_time TIMESTAMP(6), 
  message VARCHAR,
  call_stack ARRAY(VARCHAR)
) WITH (
  format = &apos;ORC&apos;,
  partitioning = ARRAY[&apos;day(event_time)&apos;]
);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Here is an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER TABLE&lt;/code&gt; sequence that adds a new column named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;severity&lt;/code&gt;, 
inserts a row with a value for the new column, renames the column, and queries the 
data.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ALTER TABLE iceberg.logging.events ADD COLUMN severity INTEGER; 

INSERT INTO iceberg.logging.events VALUES 
(
  &apos;INFO&apos;, 
  timestamp 
  &apos;2021-04-01 19:59:59.999999&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;, 
  &apos;es muy bueno&apos;, 
  ARRAY [&apos;It is all normal&apos;], 
  1
);

ALTER TABLE iceberg.logging.events RENAME COLUMN severity TO priority;

SELECT level, message, priority
FROM iceberg.logging.events;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;level&lt;/th&gt;
      &lt;th&gt;message&lt;/th&gt;
      &lt;th&gt;priority&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;Double oh noes&lt;/td&gt;
      &lt;td&gt;NULL&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;WARN&lt;/td&gt;
      &lt;td&gt;Maybeh oh noes?&lt;/td&gt;
      &lt;td&gt;NULL&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;Oh noes&lt;/td&gt;
      &lt;td&gt;NULL&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;INFO&lt;/td&gt;
      &lt;td&gt;es muy bueno&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ALTER TABLE iceberg.logging.events 
DROP COLUMN priority;

SHOW CREATE TABLE iceberg.logging.events;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE TABLE iceberg.logging.events (
   level varchar,
   event_time timestamp(6),
   message varchar,
   call_stack array(varchar)
)
WITH (
   format = &apos;ORC&apos;,
   partitioning = ARRAY[&apos;day(event_time)&apos;]
)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Notice that neither the priority nor the severity column is present in the schema.
As noted in the table above, Hive renames cause issues for all file formats. Yet
in Iceberg, performing all these operations causes no issues with the table and
underlying data.&lt;/p&gt;

&lt;h2 id=&quot;cloud-storage-compatibility&quot;&gt;Cloud storage compatibility&lt;/h2&gt;

&lt;p&gt;Not all developers consider or are aware of the performance implications of 
using Hive over a cloud object storage solution like S3 or Azure Blob Storage. 
One thing to remember is that Hive was developed with the Hadoop Distributed 
File System (HDFS) in mind. HDFS is a filesystem and is particularly well suited
to listing files, because they are stored in a 
contiguous manner. When Hive stores data associated with a table, it assumes 
there is a contiguous layout underneath it and performs list operations that are
expensive on cloud storage systems.&lt;/p&gt;

&lt;p&gt;The common cloud storage systems are typically object stores that do not lay out
the files in a contiguous manner based on paths. Therefore, it becomes very 
expensive to list out all the files in a particular path. Yet, these list 
operations are executed for every partition that could be included in a query, 
even if only a single row in a single file out of thousands of files 
needs to be retrieved to answer the query. Even ignoring the performance costs
for a minute, object stores may also pose issues for Hive due to eventual 
consistency. Inserting and deleting can cause inconsistent results for readers, 
if the files you end up reading are out of date.&lt;/p&gt;

&lt;p&gt;Iceberg avoids all of these issues by tracking the data at the file level, 
rather than the partition level. By tracking the files, Iceberg only accesses 
the files containing data relevant to the query, as opposed to accessing files 
in the same partition looking for the few files that are relevant to the query. 
Further, this allows Iceberg to control for the inconsistency issue in 
cloud-based file systems by using a locking mechanism at the file level. Compare 
the Hive layout versus the Iceberg layout in the image below. As you can see, 
Iceberg makes no assumptions about the data being contiguous or 
not. It simply builds a persistent tree using the snapshot (S) location stored 
in the metadata, which points to the manifest list (ML), which points to 
manifests containing partitions (P). Finally, these manifest files contain the 
file (F) locations and stats that can quickly be used to prune data, versus 
needing to do a list operation and scanning all the files.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;100%&quot; height=&quot;100%&quot; src=&quot;/assets/blog/trino-on-ice/cloud-file-layout.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Referencing the picture above, if you were to run a query where the result set 
only contains rows from file F1, Hive would require a list operation and 
scanning files F2 and F3. In Iceberg, file metadata exists in the manifest 
file, P1, which would have a range on the predicate field that prunes out files 
F2 and F3, and only scans file F1. This example only shows a couple of files, 
but imagine storage that scales up to thousands of files! Listing becomes 
expensive when files are not stored contiguously. Having this 
flexibility in the logical layout is essential to increasing query performance. 
This is especially true on cloud object stores.&lt;/p&gt;
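
&lt;p&gt;You can inspect this file-level bookkeeping from Trino itself. As a hedged 
sketch, assuming the Iceberg connector exposes its metadata tables and using the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;events&lt;/code&gt; table from earlier:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- each row describes one data file tracked by the current snapshot,
-- including the stats Iceberg uses to prune files at planning time
SELECT file_path, record_count, file_size_in_bytes
FROM iceberg.logging.&quot;events$files&quot;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;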

&lt;p&gt;If you want to play around with Iceberg using Trino, check out the 
&lt;a href=&quot;https://trino.io/docs/current/connector/iceberg.html&quot;&gt;Trino Iceberg docs&lt;/a&gt;. 
To avoid issues like the eventual consistency issue, as well as other problems 
of trying to sync operations across systems, Iceberg provides optimistic 
concurrency support, which is covered in more detail in
&lt;a href=&quot;/blog/2021/07/30/iceberg-concurrency-snapshots-spec.html&quot;&gt;the next post&lt;/a&gt;.&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>Welcome to the Trino on ice series, covering the details around how the Iceberg table format works with the Trino query engine. The examples build on each previous post, so it’s recommended to read the posts sequentially and reference them as needed later. Here are links to the posts in this series: Trino on ice I: A gentle introduction to Iceberg Trino on ice II: In-place table evolution and cloud compatibility with Iceberg Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec Trino on ice IV: Deep dive into Iceberg internals The first post covered how Iceberg is a table format and not a file format. It demonstrated the benefits of hidden partitioning in Iceberg in contrast to exposed partitioning in Hive. There really is no such thing as “exposed partitioning.” I just thought that sounded better than not-hidden partitioning. If any of that wasn’t clear, I recommend either that you stop reading now, or go back to the first post before starting this one. This post discusses evolution. No, the post isn’t covering Darwinian or Pokémon evolution, but in-place table evolution!</summary>

      
      
    </entry>
  
    <entry>
      <title>21: Trino + dbt = a match made in SQL heaven?</title>
      <link href="https://trino.io/episodes/21.html" rel="alternate" type="text/html" title="21: Trino + dbt = a match made in SQL heaven?" />
      <published>2021-07-08T00:00:00+00:00</published>
      <updated>2021-07-08T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/21</id>
      <content type="html" xml:base="https://trino.io/episodes/21.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Amy Chen, Partner Solutions Architect at &lt;a href=&quot;https://www.getdbt.com/&quot;&gt;dbt Labs (formerly Fishtown Analytics)&lt;/a&gt;
 (&lt;a href=&quot;https://www.linkedin.com/in/yuanamychen/&quot;&gt;@yuanamychen&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Victor Coustenoble, Solutions Architect at &lt;a href=&quot;https://www.starburst.io/&quot;&gt;Starburst&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/victorcouste&quot;&gt;@victorcouste&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;release-359&quot;&gt;Release 359&lt;/h2&gt;

&lt;p&gt;Martin:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Row pattern recognition for window functions&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SET TIME ZONE&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;timestamp(n)&lt;/code&gt; with precision higher than 3 in MySQL&lt;/li&gt;
  &lt;li&gt;ARM64-compatible docker image&lt;/li&gt;
  &lt;li&gt;Support for granting &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt; privilege&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SET TIME ZONE&lt;/code&gt; is a feature from our guest Marius from last time!&lt;/li&gt;
  &lt;li&gt;ARM64 compatible docker image as well as already existing tar.gz and rpm means usage of Graviton and other ARM64 processors is now available also for Kubernetes users, there are significant cost/performance benefits, try it out&lt;/li&gt;
  &lt;li&gt;wow .. this time it took a whole month from 358 to 359&lt;/li&gt;
  &lt;li&gt;breaking change - need Java 11.0.11&lt;/li&gt;
  &lt;li&gt;more materialized view stuff, and I am working on docs!&lt;/li&gt;
  &lt;li&gt;Fix handling of multiple LDAP user bind patterns - for those of us in larger orgs..&lt;/li&gt;
  &lt;li&gt;network logging in CLI&lt;/li&gt;
  &lt;li&gt;rename &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;connector.name&lt;/code&gt; from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive-hadoop2&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More info at &lt;a href=&quot;https://trino.io/docs/current/release/release-359.html&quot;&gt;https://trino.io/docs/current/release/release-359.html&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-week-can-dbt-connect-to-different-databases-in-the-same-project&quot;&gt;Question of the week: Can dbt connect to different databases in the same project?&lt;/h2&gt;

&lt;p&gt;This week we are going a little out of order from our usual sequence on the
show, since the question really gets to the heart of the concept of the week. We’ll 
cover the question first, then jump into the concept.&lt;/p&gt;

&lt;p&gt;This question was asked on &lt;a href=&quot;https://stackoverflow.com/questions/63002171&quot;&gt;StackOverflow&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;It seems dbt only works for a single database. If my data is in a different 
database, will that still work? For example, if my datalake is using delta, 
but I want to run dbt using Redshift, would dbt still work for this case?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Our guest Victor replied:&lt;/p&gt;

&lt;p&gt;You can use Trino with dbt to connect to multiple databases in the same project.&lt;/p&gt;

&lt;p&gt;The GitHub example project &lt;a href=&quot;https://github.com/victorcouste/trino-dbt-demo&quot;&gt;https://github.com/victorcouste/trino-dbt-demo&lt;/a&gt; 
contains a fully working setup that you can replicate and adapt to your needs.&lt;/p&gt;
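
&lt;p&gt;The core idea is that a dbt model compiled against Trino can join across 
catalogs freely. Here is a hedged sketch of such a model, with hypothetical 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;postgresql&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive&lt;/code&gt; catalogs and made-up table names:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- models/customer_orders.sql (hypothetical model)
-- Trino resolves each fully qualified name to its own data source
select c.name, count(*) as order_count
from postgresql.public.customers c
join hive.sales.orders o
  on o.customer_id = c.id
group by c.name
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;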

&lt;h2 id=&quot;concept-of-the-week&quot;&gt;Concept of the week:&lt;/h2&gt;

&lt;h3 id=&quot;what-is-dbt&quot;&gt;What is dbt?&lt;/h3&gt;

&lt;p&gt;dbt is a transformation workflow tool that lets teams quickly and collaboratively 
deploy analytics code, following software engineering best practices like 
modularity, CI/CD, testing, and documentation. It enables anyone who knows SQL 
to build production-grade data pipelines.&lt;/p&gt;

&lt;p&gt;When referring to dbt, it can mean two slightly different things. dbt Core is 
the open source framework that provides the SQL compiler and framework to manage
your SQL workflow. You can interact with it via a command line interface. In 
addition, dbt Labs offers the fully managed SaaS product dbt Cloud. You can use 
it to handle all of your dbt projects from development to deployment in a single 
browser-based tool. It provides useful features like a full IDE to develop and 
test code, orchestration, logging, and alerting. At the moment, dbt Cloud is not
available for Trino users.&lt;/p&gt;

&lt;p&gt;The framework allows you to check the quality of results, document the lineage, 
manage the changes/versions in the SQL scripts and orchestrate the queries, like
a CI/CD framework but for your data. dbt is not an extract and load tool. The 
focus is on transforming what is already in your data warehouse/data lake.&lt;/p&gt;

&lt;p&gt;Check out these links to learn more:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.getdbt.com/&quot;&gt;https://www.getdbt.com/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://docs.getdbt.com/docs/introduction&quot;&gt;https://docs.getdbt.com/docs/introduction&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;goals-of-dbt-and-how-that-differs-from-trino&quot;&gt;Goals of dbt and how that differs from Trino&lt;/h3&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/21/dbt-trino-architecture.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;Trino is the execution SQL engine and dbt is the framework to manage your SQL 
statements. dbt won’t execute the SQL itself, rather it pushes all of the 
compute down to the SQL engine. This SQL engine can be Trino, or an engine 
included in the data source like the database itself. Using Trino as the SQL 
execution engine allows you to use the same SQL dialect for all connected data 
sources. This includes data sources that natively do not support SQL like object
storage systems, Kafka, Elasticsearch, and many others.&lt;/p&gt;

&lt;h3 id=&quot;transformation-vs-ad-hoc-joins&quot;&gt;Transformation vs ad-hoc joins&lt;/h3&gt;

&lt;p&gt;Transformations done by dbt are in general used to clean and prepare data for 
analytics purposes. dbt is often used to go from raw data to ready-to-use 
data for reporting and analysis, creating database objects like tables or 
views to be consumed by business users and analytics tools.&lt;/p&gt;

&lt;p&gt;On the other hand, even if Trino can also execute SQL to create tables and 
views, these SQL queries are not managed, just executed. Unlike dbt, Trino 
doesn’t have a framework to version, audit, document, and orchestrate SQL 
scripts and their execution. Trino is more often used to execute SQL SELECT 
statements generated by users or BI tools to analyze data in an interactive way.&lt;/p&gt;

&lt;h3 id=&quot;cases-for-why-you-need-both&quot;&gt;Cases for why you need both&lt;/h3&gt;

&lt;p&gt;Trino and dbt are complementary when you need to access different sources from
a single SQL query, or when you need to run SQL queries with good performance on
object storage systems like S3, GCS, ADLS, or HDFS.&lt;/p&gt;

&lt;p&gt;This is where Trino can complement dbt, as dbt can only access a single data 
warehouse connection in a SQL query. In dbt alone, there is no way to query multiple 
storage systems at the same time.&lt;/p&gt;

&lt;p&gt;Trino is recognized for great performance with object storage and data lake 
processing, and with dbt it can transform and prepare data at scale. Trino also 
allows you to run dbt on a traditional, on-premises data warehouse, where 
normally dbt only runs on a modern cloud data warehouse like Snowflake, 
BigQuery, or Redshift.&lt;/p&gt;

&lt;h3 id=&quot;dbt-basics&quot;&gt;dbt basics&lt;/h3&gt;

&lt;p&gt;dbt Labs offers a &lt;a href=&quot;https://docs.getdbt.com/tutorial/setting-up&quot;&gt;good tutorial&lt;/a&gt;
which covers the fundamental topics of dbt for you to learn:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Project: A directory of SQL and YAML files defined with a single project file.&lt;/li&gt;
  &lt;li&gt;Models: A model is a single SQL file where you define your transformations to create a table or a view.&lt;/li&gt;
  &lt;li&gt;Profile: To define connections to your data sources.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then you have other resources like seeds, macros, tests, sources, snapshots.&lt;/p&gt;
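
&lt;p&gt;To give a feel for the profile piece, here is a minimal sketch of a 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;profiles.yml&lt;/code&gt; entry for the Presto/Trino adapter, assuming a local, 
unauthenticated Trino coordinator; the project name, host, catalog, and schema are placeholders:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;trino_project:
  target: dev
  outputs:
    dev:
      type: presto        # adapter type
      method: none        # no authentication
      user: admin
      host: localhost
      port: 8080
      database: hive      # Trino catalog
      schema: default
      threads: 1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;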

&lt;h2 id=&quot;demo-querying-trino-from-a-dbt-project&quot;&gt;Demo: Querying Trino from a dbt project&lt;/h2&gt;

&lt;p&gt;Victor shows us a demo from 
&lt;a href=&quot;https://medium.com/geekculture/trino-dbt-a-match-in-sql-heaven-1df2a3d12b5e&quot;&gt;his blog post that inspired this episode&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you looked at the code, you may have noticed that it used an adapter 
called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dbt-presto&lt;/code&gt;. This adapter derives from the outdated Presto naming and is
still there for interaction with legacy Presto clusters. Although it can work,
it uses an outdated Python client to interact with Trino, and there is an open
&lt;a href=&quot;https://github.com/dbt-labs/dbt-presto/issues/39&quot;&gt;issue to create an official &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dbt-trino&lt;/code&gt; adapter&lt;/a&gt; 
that uses the updated &lt;a href=&quot;https://github.com/trinodb/trino-python-client&quot;&gt;trino-python-client&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want to help with this, reach out on the issue itself and join the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;#db-presto-trino&lt;/code&gt; channel on the dbt Slack. 
&lt;a href=&quot;https://community.getdbt.com/&quot;&gt;https://community.getdbt.com/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After the show &lt;a href=&quot;https://twitter.com/findinpath&quot;&gt;Marius Grama&lt;/a&gt;, started &lt;a href=&quot;https://github.com/findinpath/dbt-trino&quot;&gt;work on
dbt-trino in his own repository&lt;/a&gt;.
Thanks for the quick turnaround Marius!&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-8283-externalised-destination-table-cache-expiry-duration-for-bigquery-connector&quot;&gt;PR of the week: PR 8283 Externalised destination table cache expiry duration for BigQuery Connector&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;https://github.com/trinodb/trino/pull/8283&quot;&gt;PR of the week&lt;/a&gt; was committed 
by Ayush Bilala (&lt;a href=&quot;https://twitter.com/ayushbilala&quot;&gt;Twitter&lt;/a&gt;, &lt;a href=&quot;https://www.linkedin.com/in/ayush-bilala/&quot;&gt;LinkedIn&lt;/a&gt;), a Staff Software Engineer at
Walmart Global Tech.&lt;/p&gt;

&lt;p&gt;This fixes &lt;a href=&quot;https://github.com/trinodb/trino/issues/8236&quot;&gt;issue 8236&lt;/a&gt; by adding
a new configuration property for the BigQuery connector, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bigquery.views-cache-ttl&lt;/code&gt;, 
to allow configuring the cache expiration for BigQuery views.&lt;/p&gt;
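
&lt;p&gt;In practice this is one more line in the BigQuery catalog properties file; a 
hedged sketch with a placeholder project ID, where only 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bigquery.views-cache-ttl&lt;/code&gt; comes from this PR:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# etc/catalog/bigquery.properties
connector.name=bigquery
bigquery.project-id=example-project
# expire cached BigQuery view definitions after 10 minutes
bigquery.views-cache-ttl=10m
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;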

&lt;p&gt;Thanks Ayush!&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;News&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The “frog” book has been &lt;a href=&quot;https://item.jd.com/10028492426649.html&quot;&gt;translated to Chinese&lt;/a&gt;!
 Keep your eyes peeled for the rebrand into Trino for the translation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/15/training-advanced-sql.html&quot;&gt;Advanced SQL Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/30/training-query-tuning.html&quot;&gt;Query Tuning Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/13/training-security.html&quot;&gt;Security Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/27/training-performance.html&quot;&gt;Performance and Tuning Training&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests</summary>

      
      
    </entry>
  
    <entry>
      <title>20: Trino for the Trinewbie</title>
      <link href="https://trino.io/episodes/20.html" rel="alternate" type="text/html" title="20: Trino for the Trinewbie" />
      <published>2021-06-23T00:00:00+00:00</published>
      <updated>2021-06-23T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/20</id>
      <content type="html" xml:base="https://trino.io/episodes/20.html">&lt;script async=&quot;&quot; defer=&quot;&quot; src=&quot;https://buttons.github.io/buttons.js&quot;&gt;&lt;/script&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Marius Grama, Data Engineer at &lt;a href=&quot;https://www.willhaben.at/&quot;&gt;willhaben internet service GmbH &amp;amp; Co KG&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/findinpath&quot;&gt;@findinpath&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-week-trino-for-the-trinewbie&quot;&gt;Concept of the week: Trino for the Trinewbie&lt;/h2&gt;

&lt;p&gt;One of the best and easiest ways to get an understanding of Trino and how to
use it is the book Trino: The Definitive Guide. The next three sections have a few 
excerpts from the book, which does an incredible job of introducing the space 
Trino is in. If you would like to read the book in its entirety, Starburst 
offers &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the digital copy for free&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;the-problems-with-big-data&quot;&gt;The Problems with Big Data&lt;/h3&gt;

&lt;p&gt;Everybody is capturing more and more data from device metrics, user behavior
tracking, business transactions, location data, software and system testing 
procedures and workflows, and much more. The insights gained from understanding
that data and working with it can make or break the success of any initiative,
or even a company.&lt;/p&gt;

&lt;p&gt;At the same time, the diversity of storage mechanisms available for data has 
exploded: relational databases, NoSQL databases, document databases, key-value 
stores, object storage systems, and so on. Many of them are necessary in today’s
organizations, and it is no longer possible to use just one of them.&lt;/p&gt;

&lt;h3 id=&quot;what-is-trino&quot;&gt;What is Trino?&lt;/h3&gt;

&lt;p&gt;Trino is not a database with storage; rather, it simply queries data where it 
lives. When using Trino, storage and compute are decoupled and can be scaled 
independently. Trino represents the compute layer, whereas the underlying data 
sources represent the storage layer.&lt;/p&gt;

&lt;p&gt;This allows Trino to scale up and down its compute resources for query 
processing, based on analytics demand to access this data. There is no need to 
move your data, and provision compute and storage to the exact needs of the 
current queries, or change that regularly, based on your changing query needs.&lt;/p&gt;

&lt;p&gt;Trino can scale the query power by scaling the compute cluster dynamically, and 
the data can be queried right where it lives in the data source. This 
characteristic allows you to greatly optimize your hardware resource needs and 
therefore reduce cost.&lt;/p&gt;

&lt;h3 id=&quot;sql-on-anything&quot;&gt;SQL-on-Anything&lt;/h3&gt;

&lt;p&gt;Trino was initially designed to query data from HDFS. And it can do that very 
efficiently, as you learn later. But that is not where it ends. On the contrary,
Trino is a query engine that can query data from object storage, relational
database management systems (RDBMSs), NoSQL databases, and other systems.&lt;/p&gt;

&lt;p&gt;Trino queries data where it lives and does not require a migration of data to a 
single location. So Trino allows you to query data in HDFS and other distributed
object storage systems. It allows you to query RDBMSs and other data sources. As
such, it can really query data wherever it lives and therefore be a replacement
to the traditional, expensive, and heavy extract, transform, and load (ETL) 
processes. Or at a minimum, it can help you with them and lighten the load. So 
Trino is clearly not just another SQL-on-Hadoop solution.&lt;/p&gt;

&lt;p&gt;Object storage systems include Amazon Web Services (AWS) Simple Storage Service
(S3), Microsoft Azure Blob Storage, Google Cloud Storage, and S3-compatible 
storage such as MinIO and Ceph. Trino can query traditional RDBMSs such as 
Microsoft SQL Server, PostgreSQL, MySQL, Oracle, Teradata, and Amazon Redshift. 
Trino can also query NoSQL systems such as Apache Cassandra, Apache Kafka, 
MongoDB, or Elasticsearch. Trino can query virtually anything and is truly a 
SQL-on-Anything system.&lt;/p&gt;

&lt;p&gt;For users, this means that suddenly they no longer have to rely on specific 
query languages or tools to interact with the data in those specific systems.
They can simply leverage Trino and their existing SQL skills and their 
well-understood analytics, dashboarding, and reporting tools. These tools, 
built on top of SQL, allow analysis of those additional data sets, which 
are otherwise locked in separate systems. Users can even use Trino to query 
across different systems with the SQL they know.&lt;/p&gt;
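&lt;p&gt;As a short illustrative sketch, a single query can join data across two
catalogs with plain SQL. The catalog, schema, and table names below are made up
for this example:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT o.order_id, c.name
FROM postgresql.public.orders o
JOIN hive.default.customers c ON o.customer_id = c.customer_id;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;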

&lt;h3 id=&quot;contributing-to-trino&quot;&gt;Contributing to Trino&lt;/h3&gt;

&lt;p&gt;In this episode, Marius Grama discusses his journey with Trino: joining the
community, his first impressions and experiences, and what led him to make 
sixteen commits over the last three months. We also ask him where he thinks we 
could improve to make the onboarding experience better.&lt;/p&gt;

&lt;p&gt;In the Trino project there are four &lt;a href=&quot;/development/roles.html&quot;&gt;roles&lt;/a&gt;.
You can immediately become a participant or reviewer. To be a contributor, you
need to follow some steps that are covered later in the episode. Likewise, for
maintainers, there is a path to becoming a maintainer that is discussed in 
detail on the roles page.&lt;/p&gt;

&lt;h4 id=&quot;participants&quot;&gt;Participants&lt;/h4&gt;

&lt;blockquote&gt;
  &lt;p&gt;Participants are those who show up and join in discussions about the project. 
Users, developers, and administrators can all be participants, as can 
literally anyone who has the time, energy, and passion to become involved. 
Participants suggest improvements and new features. They report bugs, 
regressions, performance issues, and so on. They work to make Trino better for
everyone.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4 id=&quot;contributors&quot;&gt;Contributors&lt;/h4&gt;

&lt;p&gt;Today’s episode covers the process that a contributor goes through to make a
code change, but simply put:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;A contributor submits code changes to Trino.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4 id=&quot;reviewers&quot;&gt;Reviewers&lt;/h4&gt;

&lt;blockquote&gt;
  &lt;p&gt;A reviewer reads a proposed change to Trino, and assesses how well the change 
aligns with the Trino vision and guidelines. This includes everything from 
high level project vision to low level code style. Everyone is invited and 
encouraged to review others’ contributions – you don’t need to be a maintainer
for that.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4 id=&quot;maintainers&quot;&gt;Maintainers&lt;/h4&gt;

&lt;blockquote&gt;
  &lt;p&gt;A maintainer is responsible for checking in code only after ensuring it has 
been reviewed thoroughly and aligns with the Trino vision and guidelines. In 
addition to merging code, a maintainer actively participates in discussions 
and reviews. Being a maintainer does not grant additional rights in the 
project to make changes, set direction, or anything else that does not align 
with the direction of the project. Instead, a maintainer is expected to bring
these to the project participants as needed to gain consensus. The maintainer
role is for an individual, so if a maintainer changes employers, the role is 
retained. However, if a maintainer is no longer actively involved in the 
project, their maintainer status will be reviewed.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;There is &lt;a href=&quot;https://cwiki.apache.org/confluence/display/Hive/BecomingACommitter&quot;&gt;a writeup on the Apache Hive process to become a committer.&lt;/a&gt;
For context, a committer is equivalent to a maintainer in Trino. This writeup
aligns precisely with the Trino philosophy. Here are a few good quotes from that
article:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Contributors often ask Hive PMC members the question, “What do I need to do in
order to become a committer?” The simple (though frustrating) answer to this 
question is, “If you want to become a committer, behave like a committer.” If 
you follow this advice, then rest assured that the PMC will notice, and 
committership will seek you out rather than the other way around.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
  &lt;p&gt;It should go without saying, but here it is anyway: your participation in the 
project should be a natural part of your work with Hive; if you find yourself 
undertaking tasks “so that you can become a committer”, then you’re doing it 
wrong, young padawan. This is particularly true if your motivations for 
wanting to become a committer are primarily negative or self-centered.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-8135-set-default-time-zone-for-the-current-session&quot;&gt;PR of the week: PR 8135 Set default time zone for the current session&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;https://github.com/trinodb/trino/pull/8135&quot;&gt;PR of the week&lt;/a&gt;, was committed 
by today’s guest, &lt;a href=&quot;https://twitter.com/findinpath&quot;&gt;Marius Grama&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This fixes &lt;a href=&quot;https://github.com/trinodb/trino/issues/8112&quot;&gt;issue 8112&lt;/a&gt; by adding
support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SET TIME ZONE&lt;/code&gt; statement. The specified time zone is 
stored as a session property and has lower precedence than the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sql.forced-session-time-zone&lt;/code&gt; setting.&lt;/p&gt;
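&lt;p&gt;As a quick sketch of how the statement is used in a session (the time zone
value is just an example):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SET TIME ZONE &apos;America/Los_Angeles&apos;;
SELECT current_timezone();
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;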

&lt;p&gt;Thanks Marius!&lt;/p&gt;

&lt;h2 id=&quot;demo-contributing-to-trino&quot;&gt;Demo: Contributing to Trino&lt;/h2&gt;

&lt;p&gt;Here is the video that goes into detail on the steps below on how to contribute
code to Trino!&lt;/p&gt;

&lt;div class=&quot;youtube-video-container&quot;&gt;
  &lt;iframe width=&quot;702&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/gAqYkR2oGgM&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;Download an IDE.&lt;/p&gt;

    &lt;p&gt;First, you need to have an integrated development environment (IDE) to run 
 the code. We recommend &lt;a href=&quot;https://www.jetbrains.com/idea/download/&quot;&gt;IntelliJ Community Edition&lt;/a&gt;
 as it is the standard that is used by developers across the project. Of 
 course, you may use any IDE you like, but there may be issues that others 
 may not be able to help with as readily.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Install Git.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://git-scm.com/&quot;&gt;Git&lt;/a&gt; is distributed version control software 
 used to collaborate on code with other users. You must 
 &lt;a href=&quot;https://git-scm.com/book/en/v2/Getting-Started-Installing-Git&quot;&gt;install Git&lt;/a&gt;
 in order to contribute to the project.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Install Docker.&lt;/p&gt;

    &lt;p&gt;The Trino testing framework runs Trino and other databases it connects to on
 Docker, a tool that runs different services in isolation using containers.&lt;br /&gt;
 Go ahead and &lt;a href=&quot;https://docs.docker.com/engine/install/&quot;&gt;install Docker&lt;/a&gt; on 
 your system.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Create and configure your GitHub account.&lt;/p&gt;

    &lt;p&gt;GitHub is a free Git repository hosting service, and a central point of collaboration
 for the Trino project. If you haven’t done so, please 
 &lt;a href=&quot;https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration&quot;&gt;create and configure your GitHub account&lt;/a&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Make a fork of the Trino repository on GitHub.&lt;/p&gt;

    &lt;p&gt;Navigate to &lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;the Trino repository&lt;/a&gt; and 
 click the “fork” button. Or you can just click it here: &lt;a class=&quot;github-button&quot; href=&quot;https://github.com/trinodb/trino/fork&quot; data-icon=&quot;octicon-repo-forked&quot; data-size=&quot;large&quot;&gt;Fork&lt;/a&gt;.&lt;/p&gt;

    &lt;p&gt;You want to create a fork so that you can save your work without needing the
 special privileges it takes to commit code back to the Trino repository. 
 This way, you can upload (also called a “push” in Git) your code to your 
 fork and later open a pull request into the main Trino repository.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Clone your fork of the Trino repository to your computer and import it into IntelliJ.&lt;/p&gt;

    &lt;p&gt;Execute the following clone command in your terminal:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; git clone git@github.com:&amp;lt;your_username&amp;gt;/trino.git
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;

    &lt;p&gt;Open the &lt;a href=&quot;https://www.jetbrains.com/help/idea/maven-support.html#maven_import_project_start&quot;&gt;Trino project in IntelliJ&lt;/a&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Add the Airlift code style checks to IntelliJ.&lt;/p&gt;

    &lt;p&gt;There are many unspoken rules to code style and formatting in any project. 
Trino is no exception. To make life simpler for contributors and reviewers, 
there is a &lt;a href=&quot;https://raw.githubusercontent.com/airlift/codestyle/master/IntelliJIdea2019/Airlift.xml&quot;&gt;Trino code style definition&lt;/a&gt; 
that &lt;a href=&quot;https://www.jetbrains.com/help/idea/copying-code-style-settings.html&quot;&gt;you can import into IntelliJ&lt;/a&gt; 
so that the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Reformat Code&lt;/code&gt; action formats code in the desired style of the project.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Build the project.&lt;/p&gt;

    &lt;p&gt;One of the greatest resources in Trino history is &lt;a href=&quot;https://gist.github.com/findepi/04c96f0f60dcc95329f569bb0c44a0cd&quot;&gt;this cheat sheet&lt;/a&gt;
created by &lt;a href=&quot;https://twitter.com/findepi&quot;&gt;Piotr Findeisen&lt;/a&gt;. I use it for some
of the commands, but the most important is the “fast” build command at the top.
In your terminal, make sure you are in the root directory of the Trino project, 
and run the following command.&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;./mvnw -pl &apos;!:trino-server-rpm,!:trino-docs,!:trino-proxy,!:trino-verifier,!:trino-benchto-benchmarks&apos; clean install \
-T 2C -nsu \
-DskipTests \
-Dmaven.javadoc.skip=true \
-Dmaven.source.skip=true \
-Dair.check.skip-all=true
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;

    &lt;p&gt;This builds all the modules of the project necessary to run almost everything
in Trino. The build excludes some modules, runs the compiler on multiple 
threads, and skips the tests, Javadoc generation, and the Airlift code style 
checks. If you would like to run the code style checks on a specific module 
(e.g. trino-elasticsearch), you can run the following command.&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;./mvnw -pl &apos;:trino-elasticsearch&apos; clean install \
-T 2C -nsu \
-DskipTests \
-Dmaven.javadoc.skip=true \
-Dmaven.source.skip=true 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Sign the CLA.&lt;/p&gt;

    &lt;p&gt;Sign the &lt;a href=&quot;https://github.com/trinodb/cla/blob/master/Trino%20Foundation%20Individual%20CLA.pdf&quot;&gt;contributor license agreement (CLA)&lt;/a&gt; 
 to agree that all of your code you commit to the project is subject to the 
 Apache License 2.0. Once you sign the agreement, scan and submit the form to
 &lt;a href=&quot;mailto:cla@trino.io&quot;&gt;cla@trino.io&lt;/a&gt;. This email gets checked every few days,
 and you can check if your name has been added to the &lt;a href=&quot;https://github.com/trinodb/cla/blob/master/contributors&quot;&gt;contributors&lt;/a&gt;
 list.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;At this point you can look for an &lt;a href=&quot;https://github.com/trinodb/trino/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22&quot;&gt;issue labeled “good first issue”&lt;/a&gt;.
This label identifies issues that we think are more approachable for developers 
who aren’t as familiar with the Trino repository yet.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;One final thing before you move on to the contribution process. Before you
start jumping in and changing the code, you’ll also want to create a dedicated
branch for your changes. A branch in Git keeps all the changes you make 
isolated in a separate line of work. If something goes wrong, or you need to 
compare with an older branch, you can do so. The default branch may either be
named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;master&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main&lt;/code&gt;. See &lt;a href=&quot;https://git-scm.com/book/en/v2/Git-Branching-Basic-Branching-and-Merging&quot;&gt;more on branching in git&lt;/a&gt;.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To make a branch for your feature, you can run the following command:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git checkout -b my-feature-branch
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Finally, follow the remaining steps in the &lt;a href=&quot;https://trino.io/development/process.html&quot;&gt;contribution process page&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-week-how-do-i-remove-nulls-from-an-array-in-trino&quot;&gt;Question of the week: How do I remove nulls from an array in Trino?&lt;/h2&gt;

&lt;p&gt;A &lt;a href=&quot;https://stackoverflow.com/questions/66162776&quot;&gt;question posted to StackOverflow&lt;/a&gt; 
asked the following question:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;I’m extracting data from a json column in Trino and getting the output in an 
array like this &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[&apos;AL&apos;, NULL, &apos;NEW&apos;]&lt;/code&gt;. The problem is I need to remove the null since
the array has to be mapped to another array. I tried several options but no luck.
How can I remove the null and get only &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[&apos;AL&apos;, &apos;NEW&apos;]&lt;/code&gt; without unnesting?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href=&quot;https://twitter.com/findepi&quot;&gt;Piotr Findeisen&lt;/a&gt; replied:&lt;/p&gt;

&lt;p&gt;You can use &lt;a href=&quot;https://trino.io/docs/current/functions/array.html#filter&quot;&gt;filter()&lt;/a&gt;
for this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;trino&amp;gt; SELECT filter(ARRAY[&apos;AL&apos;, NULL,&apos;NEW&apos;], e -&amp;gt; e IS NOT NULL);
   _col0
-----------
 [AL, NEW]
(1 row)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;News&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The “frog” book has been &lt;a href=&quot;https://item.jd.com/10028492426649.html&quot;&gt;translated to Chinese&lt;/a&gt;!
 Keep your eyes peeled for the rebrand into Trino for the translation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/15/training-advanced-sql.html&quot;&gt;Advanced SQL Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/30/training-query-tuning.html&quot;&gt;Query Tuning Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/13/training-security.html&quot;&gt;Security Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/27/training-performance.html&quot;&gt;Performance and Tuning Training&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary></summary>

      
      
    </entry>
  
    <entry>
      <title>19: Data Ingestion to Iceberg and Trino</title>
      <link href="https://trino.io/episodes/19.html" rel="alternate" type="text/html" title="19: Data Ingestion to Iceberg and Trino" />
      <published>2021-06-10T00:00:00+00:00</published>
      <updated>2021-06-10T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/19</id>
      <content type="html" xml:base="https://trino.io/episodes/19.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;Cory Darby, Principal Software Developer at &lt;a href=&quot;https://bluecatnetworks.com/&quot;&gt;BlueCat&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/ckdarby&quot;&gt;@ckdarby&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;release-358&quot;&gt;Release 358&lt;/h2&gt;

&lt;p&gt;Martin:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW STATS&lt;/code&gt; support for arbitrary queries.&lt;/li&gt;
  &lt;li&gt;Performance improvements for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY ... LIMIT&lt;/code&gt; queries on sorted data.&lt;/li&gt;
  &lt;li&gt;Support for Hive views containing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LATERAL VIEW&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Reduced graceful shutdown time&lt;/li&gt;
  &lt;li&gt;A bunch of performance and correctness fixes&lt;/li&gt;
  &lt;li&gt;Removed support for the legacy JDBC URL prefix &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jdbc:presto:&lt;/code&gt; in the driver&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More info at &lt;a href=&quot;https://trino.io/docs/current/release/release-358.html&quot;&gt;https://trino.io/docs/current/release/release-358.html&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;release-357&quot;&gt;Release 357&lt;/h2&gt;

&lt;p&gt;Martin:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Support for subquery expressions that produce multiple columns.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CURRENT_CATALOG&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CURRENT_SCHEMA&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Aggregation pushdown for ClickHouse connector.&lt;/li&gt;
  &lt;li&gt;Rule support for identifier mapping in various connectors.&lt;/li&gt;
  &lt;li&gt;New &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;format_number&lt;/code&gt; function.&lt;/li&gt;
  &lt;li&gt;Cast row types as JSON objects.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Print dynamic filters summary in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXPLAIN ANALYZE&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Fix trusted cert usage for OAuth&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;clear&lt;/code&gt; command in CLI&lt;/li&gt;
  &lt;li&gt;Numerous smaller connector changes - check your favourite connector&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More at &lt;a href=&quot;https://trino.io/docs/current/release/release-357.html&quot;&gt;https://trino.io/docs/current/release/release-357.html&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-week-ingesting-into-iceberg-with-pulsar-and-flink-at-bluecat&quot;&gt;Concept of the week: Ingesting into Iceberg with Pulsar and Flink at BlueCat&lt;/h2&gt;

&lt;p&gt;Here are Cory’s slides that you can use to follow along while listening to the 
podcast.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;iframe src=&quot;//www.slideshare.net/slideshow/embed_code/key/5KsmZMJtSOoxFx&quot; width=&quot;800&quot; height=&quot;650&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; 
margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; 
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-1905-add-format_number-function&quot;&gt;PR of the week: PR 1905 Add format_number function&lt;/h2&gt;

&lt;p&gt;The
&lt;a href=&quot;https://github.com/trinodb/trino/pull/1905&quot;&gt;PR of the week&lt;/a&gt; is a simple but
useful PR by maintainer &lt;a href=&quot;https://twitter.com/ebyhr&quot;&gt;Yuya Ebihara&lt;/a&gt;.
It fixes &lt;a href=&quot;https://github.com/trinodb/trino/issues/1878&quot;&gt;issue 1878&lt;/a&gt; by adding a
function that formats very large numbers returned from a query in a truncated
form with a magnitude suffix (B for billion, M for million, K for thousand, 
and so on). Rather than reuse the CLI’s 
&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/client/trino-cli/src/main/java/io/trino/cli/FormatUtils.java&quot;&gt;FormatUtils&lt;/a&gt;
class, which missed various cases, he created 
&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/core/trino-main/src/main/java/io/trino/operator/scalar/FormatNumberFunction.java&quot;&gt;his own implementation&lt;/a&gt; 
that covers those cases. Thanks Yuya!&lt;/p&gt;

&lt;h2 id=&quot;demo-showing-the-format_number-functionality&quot;&gt;Demo: Showing the format_number functionality&lt;/h2&gt;

&lt;p&gt;Here are the examples we ran in the show.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT format_number(DOUBLE &apos;1234.5&apos;);

SELECT format_number(DOUBLE &apos;-9223372036854775808&apos;);

SELECT format_number(DOUBLE &apos;9223372036854775807&apos;);

SELECT format_number(REAL &apos;-999&apos;);

SELECT format_number(REAL &apos;999&apos;);

SELECT format_number(DECIMAL &apos;-1000&apos;);

SELECT format_number(DECIMAL &apos;1000&apos;);

SELECT format_number(999999999);

SELECT format_number(1000000000);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;question-of-the-week-how-do-i-search-nested-objects-in-elasticsearch-from-trino&quot;&gt;Question of the week: How do I search nested objects in Elasticsearch from Trino?&lt;/h2&gt;

&lt;p&gt;A &lt;a href=&quot;https://stackoverflow.com/questions/67667313&quot;&gt;question posted to StackOverflow&lt;/a&gt; 
asked how to search nested objects using the Elasticsearch connector.&lt;/p&gt;

&lt;p&gt;Trino maps a &lt;a href=&quot;https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nested&lt;/code&gt;&lt;/a&gt; 
object type to a &lt;a href=&quot;https://trino.io/docs/current/language/types.html#row&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROW&lt;/code&gt;&lt;/a&gt;
the same way that it maps a standard 
&lt;a href=&quot;https://www.elastic.co/guide/en/elasticsearch/reference/current/object.html&quot;&gt;object&lt;/a&gt; 
type during a read. The nested designation itself serves no purpose to Trino 
since it only determines how the object is stored in Elasticsearch.&lt;/p&gt;
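&lt;p&gt;For example, assuming a hypothetical index with a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nested&lt;/code&gt; field 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;address&lt;/code&gt; that has a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;city&lt;/code&gt; property (the names are made up for this
sketch), its fields can be dereferenced like any other &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROW&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT address.city
FROM elasticsearch.default.users;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;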

&lt;p&gt;Check out &lt;a href=&quot;https://stackoverflow.com/a/67843697/2023810&quot;&gt;Brian’s full answer to this question&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;News&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The “frog” book has been &lt;a href=&quot;https://item.jd.com/10028492426649.html&quot;&gt;translated to Chinese&lt;/a&gt;!
 Keep your eyes peeled for the rebrand into Trino for the translation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Blogs&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/adobetech/iceberg-at-adobe-88cf1950e866&quot;&gt;Iceberg at Adobe&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/05/03/a-gentle-introduction-to-iceberg.html&quot;&gt;Trino on ice I: A gentle introduction to Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/12/in-place-table-evolution-and-cloud-compatibility-with-iceberg.html&quot;&gt;Trino on ice II: In-place table evolution and cloud compatibility with Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/30/iceberg-concurrency-snapshots-spec.html&quot;&gt;Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/08/12/deep-dive-into-iceberg-internals.html&quot;&gt;Trino on ice IV: Deep dive into Iceberg internals&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Videos&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Trino Meetup: &lt;a href=&quot;https://www.youtube.com/watch?v=ifXpOn0NJWk&quot;&gt;Apache Iceberg: A table format for data lakes with unforeseen use cases&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/15/training-advanced-sql.html&quot;&gt;Advanced SQL Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/30/training-query-tuning.html&quot;&gt;Query Tuning Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/13/training-security.html&quot;&gt;Security Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/27/training-performance.html&quot;&gt;Performance and Tuning Training&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Mega Man 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests Cory Darby, Principal Software Developer at BlueCat (@ckdarby) Release 358</summary>

      
      
    </entry>
  
    <entry>
      <title>18: Trino enjoying the view</title>
      <link href="https://trino.io/episodes/18.html" rel="alternate" type="text/html" title="18: Trino enjoying the view" />
      <published>2021-05-20T00:00:00+00:00</published>
      <updated>2021-05-20T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/18</id>
      <content type="html" xml:base="https://trino.io/episodes/18.html">&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/18/trino-view.png&quot; /&gt;&lt;br /&gt;
Commander Bun Bun enjoying the views...
&lt;/p&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;Anjali Norwood, Senior Open Source Software Engineer at Netflix 
 (&lt;a href=&quot;https://www.linkedin.com/in/anjali-norwood-9521a16/&quot;&gt;@AnjaliNorwood&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-week-trino-views-hive-views-and-materialized-views&quot;&gt;Concept of the week: Trino views, Hive views, and materialized views&lt;/h2&gt;

&lt;p&gt;Before diving into views, it can be helpful to take a step back and consider a 
well-understood abstraction, like tables, to understand the purpose of a view.
Tables organize data in a vertical orientation, referred to as columns, and
represent instances of the data in a horizontal orientation, referred to as rows.
See the following &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;customer&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders&lt;/code&gt; tables from the TPC-H dataset.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;customer table&lt;/strong&gt;&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;custkey&lt;/th&gt;
      &lt;th&gt;name&lt;/th&gt;
      &lt;th&gt;nationkey&lt;/th&gt;
      &lt;th&gt;acctbal&lt;/th&gt;
      &lt;th&gt;mktsegment&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;376&lt;/td&gt;
      &lt;td&gt;Customer#000000376&lt;/td&gt;
      &lt;td&gt;16&lt;/td&gt;
      &lt;td&gt;4231.45&lt;/td&gt;
      &lt;td&gt;AUTOMOBILE&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;377&lt;/td&gt;
      &lt;td&gt;Customer#000000377&lt;/td&gt;
      &lt;td&gt;23&lt;/td&gt;
      &lt;td&gt;1043.72&lt;/td&gt;
      &lt;td&gt;MACHINERY&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;378&lt;/td&gt;
      &lt;td&gt;Customer#000000378&lt;/td&gt;
      &lt;td&gt;22&lt;/td&gt;
      &lt;td&gt;5718.05&lt;/td&gt;
      &lt;td&gt;BUILDING&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;strong&gt;orders table&lt;/strong&gt;&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;orderkey&lt;/th&gt;
      &lt;th&gt;custkey&lt;/th&gt;
      &lt;th&gt;orderstatus&lt;/th&gt;
      &lt;th&gt;totalprice&lt;/th&gt;
      &lt;th&gt;orderdate&lt;/th&gt;
      &lt;th&gt;orderpriority&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;1&lt;/td&gt;
      &lt;td&gt;376&lt;/td&gt;
      &lt;td&gt;O&lt;/td&gt;
      &lt;td&gt;172799.49&lt;/td&gt;
      &lt;td&gt;1996-01-02&lt;/td&gt;
      &lt;td&gt;5-LOW&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;376&lt;/td&gt;
      &lt;td&gt;O&lt;/td&gt;
      &lt;td&gt;38426.09&lt;/td&gt;
      &lt;td&gt;1996-12-01&lt;/td&gt;
      &lt;td&gt;1-URGENT&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;3&lt;/td&gt;
      &lt;td&gt;377&lt;/td&gt;
      &lt;td&gt;F&lt;/td&gt;
      &lt;td&gt;205654.3&lt;/td&gt;
      &lt;td&gt;1993-10-14&lt;/td&gt;
      &lt;td&gt;5-LOW&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The columns have a schema that enforces particular data types in particular 
columns and prevents insertion of invalid data into the table by throwing
an exception. This becomes extremely useful when reading and processing the data,
as there is a clear set of operations that can run on certain columns based on 
their type. This information is also useful when deserializing result sets into
various in-memory abstractions. Here is an example of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;customer&lt;/code&gt; table 
schema:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;customer table schema&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE TABLE customer (
   custkey bigint,
   name varchar(25),
   address varchar(40),
   nationkey bigint,
   phone varchar(15),
   acctbal double,
   mktsegment varchar(10),
   comment varchar(117)
)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;views-and-materialized-views&quot;&gt;Views and materialized views&lt;/h3&gt;

&lt;p&gt;A view is structured much like a table: it has columns, rows, and a schema.
What then do views offer over tables? Views offer a way to encapsulate complex
SQL statements. For example, take this SQL query that runs over the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;customer&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders&lt;/code&gt; tables
defined before.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT 
 c.custkey, 
 name, 
 nationkey, 
 mktsegment, 
 sumtotalprice, 
 openstatuscount, 
 failedstatuscount, 
 partialstatuscount
FROM 
 customer c 
 JOIN (
  SELECT 
   custkey, 
   SUM(totalprice) AS sumtotalprice, 
   COUNT_IF(orderstatus = &apos;O&apos;) AS openstatuscount,
   COUNT_IF(orderstatus = &apos;F&apos;) AS failedstatuscount, 
   COUNT_IF(orderstatus = &apos;P&apos;) AS partialstatuscount
  FROM orders
  GROUP BY custkey
 ) o
 ON c.custkey = o.custkey;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This query aggregates the orders table grouped by customer, and then joins the
aggregated result with the customer table on
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;custkey&lt;/code&gt;.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;custkey&lt;/th&gt;
      &lt;th&gt;name&lt;/th&gt;
      &lt;th&gt;nationkey&lt;/th&gt;
      &lt;th&gt;mktsegment&lt;/th&gt;
      &lt;th&gt;sumtotalprice&lt;/th&gt;
      &lt;th&gt;openstatuscount&lt;/th&gt;
      &lt;th&gt;failedstatuscount&lt;/th&gt;
      &lt;th&gt;partialstatuscount&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;376&lt;/td&gt;
      &lt;td&gt;Customer#000000376&lt;/td&gt;
      &lt;td&gt;16&lt;/td&gt;
      &lt;td&gt;AUTOMOBILE&lt;/td&gt;
      &lt;td&gt;1600696.4700000002&lt;/td&gt;
      &lt;td&gt;3&lt;/td&gt;
      &lt;td&gt;6&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;377&lt;/td&gt;
      &lt;td&gt;Customer#000000377&lt;/td&gt;
      &lt;td&gt;23&lt;/td&gt;
      &lt;td&gt;MACHINERY&lt;/td&gt;
      &lt;td&gt;803271.9400000001&lt;/td&gt;
      &lt;td&gt;3&lt;/td&gt;
      &lt;td&gt;6&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;379&lt;/td&gt;
      &lt;td&gt;Customer#000000379&lt;/td&gt;
      &lt;td&gt;7&lt;/td&gt;
      &lt;td&gt;AUTOMOBILE&lt;/td&gt;
      &lt;td&gt;3155009.54&lt;/td&gt;
      &lt;td&gt;7&lt;/td&gt;
      &lt;td&gt;11&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;From here, there are many ways you could further evaluate the resulting data. 
You could filter to see which market segment is spending the most on your
products. You could also look at which nations have the most failed orders to
evaluate where shipping lines may need to be improved. The table above, which 
results from the example query, is a good intermediate state of the data that 
can be reused for many future evaluations. Instead of defining a new table, you
can create a view on this data that encapsulates the complex SQL used to
calculate it. This is done using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE VIEW&lt;/code&gt; 
statement.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE VIEW customer_orders_view AS 
&amp;lt;complex SQL query above&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now, when you want to run any further analysis on this intermediate dataset, you
simply refer to the view instead of rewriting the full statement. As
mentioned, this view also has a schema and is treated much like a table when the
query engine does its planning. In this way it is also easier to map the data to
the application logic by enabling different shapes of the same data. It should
be made clear that these views are read-only and do not allow inserts, updates,
or deletes.&lt;/p&gt;
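
&lt;p&gt;As a minimal sketch, assuming the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;customer_orders_view&lt;/code&gt; defined above, any further analysis can reference the view like a table:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/* total spend per market segment, using the view instead of the full query */
SELECT mktsegment, SUM(sumtotalprice) AS segmentspend
FROM customer_orders_view
GROUP BY mktsegment
ORDER BY segmentspend DESC;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;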

&lt;p&gt;Another reason why you would want to create a view is to control read access to the
data. The view definition determines which columns and rows are filtered out and
which are returned when users query the view. The authorization 
of a user is tied to the view and its content, and that can significantly 
differ from the complete data in the underlying tables. For example, views 
can exclude sensitive data like social security numbers, birth dates, credit 
card numbers, and many other facts.&lt;/p&gt;

&lt;p&gt;When creating a view, there are two security modes that determine which user 
runs the queries defined in the view at query runtime. You can either run the
view query as the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DEFINER&lt;/code&gt;, which runs
it as the user that created the view, or as the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INVOKER&lt;/code&gt;, which
runs it as the user that is running the outer query of
the view. The default mode is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DEFINER&lt;/code&gt;. See more 
&lt;a href=&quot;https://trino.io/docs/current/sql/create-view.html#security&quot;&gt;in the security section of the create view documentation&lt;/a&gt;.&lt;/p&gt;
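
&lt;p&gt;As a brief sketch of the syntax, the mode can be set explicitly in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE VIEW&lt;/code&gt; statement (the view name here is illustrative):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/* run the view query with the permissions of the querying user */
CREATE VIEW customer_names
SECURITY INVOKER
AS SELECT custkey, name FROM customer;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;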

&lt;p&gt;There are two types of views: materialized and logical views. The view defined
above is the standard logical view that gets expanded into its definition. 
Logical views do not provide any performance benefit, since the data is not 
stored and is instead computed at query time. Materialized views persist the 
view data upon view creation by storing the query results.&lt;/p&gt;

&lt;p&gt;Materialized views make overall queries much faster to run, as part of the query
has already been computed. One issue with materialized views is that the data 
may become outdated and out of sync with the underlying table data. To keep the 
data between the tables and materialized view in sync, you have to refresh the 
view. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;REFRESH MATERIALIZED VIEW&lt;/code&gt; command handles 
this operation, and can be called manually or scheduled to run 
periodically.&lt;/p&gt;
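
&lt;p&gt;As a minimal sketch, assuming a connector that supports materialized views, such as the Iceberg connector, and using a hypothetical view name:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/* persist the aggregated results once */
CREATE MATERIALIZED VIEW customer_order_totals AS
SELECT custkey, SUM(totalprice) AS sumtotalprice
FROM orders
GROUP BY custkey;

/* later, after the base table changes, bring the stored data back in sync */
REFRESH MATERIALIZED VIEW customer_order_totals;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;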

&lt;h3 id=&quot;trino-views-so-many-views-so-little-time&quot;&gt;Trino views: So many views, so little time&lt;/h3&gt;

&lt;p&gt;View handling in Trino depends on the connector. In general, most connectors
expose views to Trino as if they are another set of tables available for Trino
to query. The main exceptions are the Hive and Iceberg connectors. The 
table below lists the view support currently possible with the Hive and 
Iceberg connectors.&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
  &lt;tr&gt;
    &lt;th colspan=&quot;2&quot;&gt;&lt;/th&gt;
    &lt;th&gt;Logical&lt;/th&gt;
    &lt;th&gt;Materialized&lt;/th&gt;
  &lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
  &lt;tr&gt;
    &lt;td rowspan=&quot;2&quot;&gt;Trino Created View&lt;/td&gt;
    &lt;td&gt;Hive Connector&lt;/td&gt;
    &lt;td&gt;✅&lt;/td&gt;
    &lt;td&gt;❌&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;Iceberg Connector&lt;/td&gt;
    &lt;td&gt;✅ (Edit: &lt;a href=&quot;https://github.com/trinodb/trino/pull/8540&quot;&gt;PR 8540&lt;/a&gt;)&lt;/td&gt;
    &lt;td&gt;✅&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td colspan=&quot;2&quot;&gt;Hive Created View&lt;/td&gt;
    &lt;td&gt;✅ (read-only)&lt;/td&gt;
    &lt;td&gt;✅ (read-only)&lt;/td&gt;
  &lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;You’ll notice that materialized views cannot be created through the Hive
connector in Trino. You will get the following exception:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Caused by: java.sql.SQLException: Query failed (#...): 
This connector does not support creating materialized views.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Also, you cannot create logical views in Iceberg, and you will get the following
exception:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Caused by: java.sql.SQLException: Query failed (#...): 
This connector does not support creating views.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 id=&quot;trino-reads-hive-views&quot;&gt;Trino reads Hive views&lt;/h4&gt;

&lt;p&gt;Before Trino there was Hive. Trino is a replacement for the Hive runtime for 
many users, and it is very useful for these users to also be able to read data 
from Hive views in Trino. Trino always aims to be compatible with as many Hive abstractions
as possible, to make migrating from Hive to Trino as painless as possible. 
So Trino supports reading data from Hive views, though it doesn’t support 
updates on these views. You have to update these views through Hive, and ideally
you will gradually migrate these views to Trino over time. Trino also supports
reading Hive materialized views, though Trino reads these views as just another 
Hive table, since they are stored similarly to standard Hive tables. Since
Hive views are defined in HiveQL, the view definitions need to be translated to
Trino SQL syntax. This is done using LinkedIn’s Coral library.&lt;/p&gt;

&lt;h4 id=&quot;coral-the-unifier-of-the-bee-and-the-bunny&quot;&gt;Coral: the unifier of the bee and the bunny&lt;/h4&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/linkedin/coral&quot;&gt;Coral&lt;/a&gt; is a project that translates 
views between different SQL dialects. It can process HiveQL 
statements and convert them to an internal representation using
&lt;a href=&quot;https://calcite.apache.org/&quot;&gt;Apache Calcite&lt;/a&gt;. It then converts the internal
representation to Trino SQL.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;100%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/18/coral.png&quot; /&gt;
&lt;/p&gt;

&lt;h4 id=&quot;trino-reading-hive-view-sequence-diagrams&quot;&gt;Trino reading Hive view sequence diagrams&lt;/h4&gt;

&lt;p&gt;In both of these sequence diagrams, notice that the first actions are to create
a Hive view. The view is created and maintained by the Hive system, and it is 
impossible to create or update a similar view in Trino.&lt;/p&gt;

&lt;p&gt;This diagram shows the creation of a Hive view, then shows the sequence of events 
when Trino reads that view.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/18/hive-view-sequence.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;This diagram shows the creation of a Hive materialized view, then shows the 
sequence of events when Trino reads the materialized view.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/18/hive-materialized-view-sequence.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h4 id=&quot;trino-native-view-sequence-diagrams&quot;&gt;Trino native view sequence diagrams&lt;/h4&gt;

&lt;p&gt;This diagram shows the sequence diagram for a Trino view that is created using 
the Hive Connector.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/18/trino-view-hive-connector-sequence.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;This diagram shows the sequence diagram for a materialized Trino view that is 
created using the Iceberg Connector.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/18/trino-materialized-view-iceberg-connector-sequence.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h3 id=&quot;iceberg-materialized-view-refresh-currently-only-full-refresh-in-iceberg-connector&quot;&gt;Iceberg materialized view refresh (currently only full refresh in Iceberg connector)&lt;/h3&gt;

&lt;p&gt;Ideally, as the tables underlying a materialized view change, the materialized
view should be automatically and incrementally updated so that its results 
stay in sync with the latest data.&lt;/p&gt;

&lt;p&gt;Automatically keeping materialized views fresh can be tricky from a resource 
management point of view, since the computation to materialize the 
view can be expensive. Trino currently does not support automatic refresh of 
materialized views. It instead supports the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;REFRESH MATERIALIZED VIEW&lt;/code&gt; command 
that the user can issue to ensure that the materialized view is fresh.&lt;/p&gt;

&lt;p&gt;As part of executing the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;REFRESH MATERIALIZED VIEW&lt;/code&gt; command in Trino, existing
data in the materialized view is dropped and new data is inserted if there are 
any changes to the base data. If the base data has not changed at all, the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;REFRESH MATERIALIZED VIEW&lt;/code&gt; command is a no-op.&lt;/p&gt;

&lt;p&gt;What happens if the user issues a query against the materialized view, and the 
materialized view is not fresh? Trino detects that the materialized view is 
stale, expands the materialized view definition much like a logical view, 
and executes that SQL statement against the base tables.&lt;/p&gt;

&lt;p&gt;Incremental or delta refresh of materialized views is a more efficient way of
keeping the materialized view in sync with the base data. An incremental refresh 
means only the parts of the data that need to be updated in a materialized view 
are updated. The rest of the data is left untouched. For example, say you have a base
table, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sales&lt;/code&gt;, partitioned on a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date&lt;/code&gt; column, where the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sales&lt;/code&gt; table only gets 
data inserted for the current day. If the materialized view is also partitioned on 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date&lt;/code&gt;, a new partition can be added and data inserted for that day. 
Data for previous days and months is still fresh and can be left untouched. 
This is something on Netflix’s roadmap. One form of incremental refresh of the 
materialized view is a partition-level refresh; another is a more 
granular row-level refresh using functionality similar to the SQL &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; 
statement.&lt;/p&gt;
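
&lt;p&gt;As a hypothetical sketch of partition alignment with the Iceberg connector, which supports a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;partitioning&lt;/code&gt; property on materialized views (the table and column names here are illustrative):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/* partition the stored view data the same way as the base table,
   so a refresh could in principle touch only new partitions */
CREATE MATERIALIZED VIEW daily_sales
WITH (partitioning = ARRAY[&apos;sale_date&apos;]) AS
SELECT sale_date, SUM(amount) AS total_amount
FROM sales
GROUP BY sale_date;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;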

&lt;h3 id=&quot;support-in-trino-and-at-netflix&quot;&gt;Support in Trino and at Netflix&lt;/h3&gt;

&lt;h4 id=&quot;netflix-materialized-views&quot;&gt;Netflix materialized views&lt;/h4&gt;

&lt;p&gt;The main reason Netflix is interested in materialized views is to give analysts 
an easy way to compute and materialize their frequently used queries and keep 
the results refreshed without relying on an ETL pipeline to create and maintain 
those result sets. Some materialized views are as simple as queries that project
columns and apply filters, selecting data for a time range or for a test-id. 
Others are more complex, performing multi-level joins and aggregations.&lt;/p&gt;

&lt;h4 id=&quot;netflix-materialized-view-cross-compatibility-extension&quot;&gt;Netflix materialized view cross compatibility extension&lt;/h4&gt;

&lt;p&gt;Materialized views, much like logical views, are compatible across Trino and 
Spark, the two main engines used at Netflix. Spark is used at Netflix for ETL, 
and for creating and populating tables. Trino is the most popular engine with 
analysts and developers for ad hoc and experimental queries as well as audits.&lt;/p&gt;

&lt;p&gt;Trino is also used for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE TABLE AS SELECT&lt;/code&gt; (CTAS) in some use cases. Both 
the engines access data from tables using Iceberg and Hive connectors where data
is stored in S3. Netflix built upon the Trino logical views to create common 
views that are accessible from both Spark and Trino. The difference between the 
Trino logical views and Netflix common views is that the metadata is stored in 
the Hive metastore for Trino logical views, while common views store their 
metadata in JSON format in S3.&lt;/p&gt;

&lt;p&gt;A view object in the Hive metastore points to the S3 location of the metadata. It 
tracks the evolution of the view definition in the form of versions, so that you 
can potentially revert a view to an older version. The main benefit of common 
views is interoperability between Spark and Trino: views can be created, replaced, 
queried, and dropped from either engine, and support can be expanded to other 
engines. Netflix supports common views through both the Hive and Iceberg 
connectors.&lt;/p&gt;

&lt;p&gt;Currently, common views support SQL syntax common to both Spark and Trino. This 
support can be expanded in the future using LinkedIn’s Coral project, such that 
engine-specific syntax and semantics can be translated and interpreted by 
another engine. Netflix materialized views are an extension of Trino 
materialized views to make them interoperable between Spark and Trino. The only
difference between Trino and Netflix materialized views is where the metadata is
stored, very similar to Trino and Netflix logical views.&lt;/p&gt;

&lt;h3 id=&quot;roadmap&quot;&gt;Roadmap&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Netflix is looking into caching query results using materialized views and 
  the memory connector.&lt;/li&gt;
  &lt;li&gt;Incremental refresh ideas.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-4832-add-iceberg-support-for-materialized-views&quot;&gt;PR of the week: PR 4832 Add Iceberg support for materialized views&lt;/h2&gt;

&lt;p&gt;Our guest, Anjali, is the author of this week’s 
&lt;a href=&quot;https://github.com/trinodb/trino/pull/4832&quot;&gt;PR of the week&lt;/a&gt;, which adds Iceberg
support for materialized views. Thanks Anjali!&lt;/p&gt;

&lt;p&gt;Honorable PR mentions:&lt;/p&gt;

&lt;p&gt;In order for the PR of the week to work, Anjali 
&lt;a href=&quot;https://github.com/trinodb/trino/pull/3283&quot;&gt;added syntax support&lt;/a&gt; for Trino 
materialized views with the commands &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE MATERIALIZED VIEW&lt;/code&gt;, 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;REFRESH MATERIALIZED VIEW&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DROP MATERIALIZED VIEW&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Before any of this was done, user &lt;a href=&quot;https://github.com/laurachenyu&quot;&gt;laurachenyu&lt;/a&gt; 
&lt;a href=&quot;https://github.com/trinodb/trino/pull/4661&quot;&gt;integrated Coral with Trino to enable querying Hive views&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;demo-showing-the-different-views-in-trino&quot;&gt;Demo: Showing the different views in Trino&lt;/h2&gt;

&lt;p&gt;In Trino, create some Hive tables in a Hive catalog named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hdfs&lt;/code&gt; that represents
the underlying storage Trino writes to.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE SCHEMA hdfs.tiny
WITH (location = &apos;/tiny/&apos;);

CREATE TABLE hdfs.tiny.customer
WITH (
  format = &apos;ORC&apos;,
  external_location = &apos;/tiny/customer/&apos;
) 
AS SELECT * FROM tpch.tiny.customer;

CREATE TABLE hdfs.tiny.orders
WITH (
  format = &apos;ORC&apos;,
  external_location = &apos;/tiny/orders/&apos;
) 
AS SELECT * FROM tpch.tiny.orders;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now, create a logical Hive view (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive_view&lt;/code&gt;), and a materialized Hive view
(&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive_materialized_view&lt;/code&gt;) from the Hive CLI.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;USE tiny;

CREATE VIEW hive_view AS 
SELECT c.custkey, c.name, nationkey, mktsegment, orderstatus, totalprice, orderpriority, orderdate 
FROM customer c JOIN orders o ON c.custkey = o.custkey;

CREATE MATERIALIZED VIEW hive_materialized_view AS
SELECT c.custkey, c.name, nationkey, mktsegment, orderstatus, totalprice, orderpriority, orderdate 
FROM customer c JOIN orders o ON c.custkey = o.custkey;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As you create the views, you can check their state in the Hive metastore.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT t.TBL_NAME, t.TBL_TYPE, t.VIEW_EXPANDED_TEXT, t.VIEW_ORIGINAL_TEXT 
FROM DBS d
 JOIN TBLS t ON d.DB_ID = t.DB_ID
WHERE d.NAME = &apos;tiny&apos;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once the Hive views exist, switch back to Trino to create Trino views and 
query everything.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE VIEW hdfs.tiny.trino_view AS 
SELECT c.custkey, c.name, nationkey, mktsegment, orderstatus, totalprice, orderpriority, orderdate 
FROM hdfs.tiny.customer c JOIN hdfs.tiny.orders o ON c.custkey = o.custkey;

/* Fails: Caused by: java.sql.SQLException: Query failed (#20210516_032433_00002_6syuw): 
This connector does not support creating materialized views */
CREATE MATERIALIZED VIEW hdfs.tiny.trino_materialized_view AS 
SELECT c.custkey, c.name, nationkey, mktsegment, orderstatus, totalprice, orderpriority, orderdate 
FROM hdfs.tiny.customer c JOIN hdfs.tiny.orders o ON c.custkey = o.custkey;

/* Fails: Caused by: java.sql.SQLException: Query failed (#20210516_101856_00009_ihjur): 
This connector does not support creating views */
CREATE VIEW iceberg.tiny.iceberg_view AS 
SELECT c.custkey, c.name, nationkey, mktsegment, orderstatus, totalprice, orderpriority, orderdate 
FROM hdfs.tiny.customer c JOIN hdfs.tiny.orders o ON c.custkey = o.custkey;

CREATE MATERIALIZED VIEW iceberg.tiny.iceberg_materialized_view AS 
SELECT c.custkey, c.name, nationkey, mktsegment, orderstatus, totalprice, orderpriority, orderdate 
FROM hdfs.tiny.customer c JOIN hdfs.tiny.orders o ON c.custkey = o.custkey;

/* 
This REFRESH call failed during the show due to the fact that I created the 
materialized Trino view in the Iceberg (`iceberg`) catalog using tables from the
Hive (`hdfs`) catalog. I should have created the materialized view using the
iceberg catalog:

CREATE MATERIALIZED VIEW iceberg.tiny.iceberg_materialized_view AS 
SELECT c.custkey, c.name, nationkey, mktsegment, orderstatus, totalprice, orderpriority, orderdate 
FROM iceberg.tiny.customer c JOIN iceberg.tiny.orders o ON c.custkey = o.custkey;
*/
REFRESH MATERIALIZED VIEW iceberg.tiny.iceberg_materialized_view;

/* query tables */

SELECT * FROM hdfs.tiny.customer LIMIT 3;

SELECT * FROM hdfs.tiny.orders LIMIT 3;

/* query views */

SELECT * FROM hdfs.tiny.trino_view LIMIT 3;

SELECT * FROM hdfs.tiny.hive_view LIMIT 3;

SELECT * FROM hdfs.tiny.hive_materialized_view LIMIT 3;

SELECT * FROM iceberg.tiny.iceberg_materialized_view LIMIT 3;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;question-of-the-week-are-jdbc-drivers-backwards-compatible-with-older-trino-versions&quot;&gt;Question of the week: Are JDBC drivers backwards compatible with older Trino versions?&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Full question:&lt;/strong&gt; Are JDBC drivers backwards compatible with older Trino 
versions? I’m trying to install the 354 driver on a multi-tenanted Tableau 
server where there might be older Trino versions in play. Do I need to upgrade 
my Trino clients right away when upgrading my server to Trino version from 
&amp;lt;=350 to &amp;gt;350?&lt;/p&gt;

&lt;p&gt;For this particular user’s case, the answer is that they won’t need to upgrade 
their clients, assuming they are on Trino servers. If their server versions are
PrestoSQL version &amp;lt;= 350, then they will need to hold off on upgrading to a Trino
client.&lt;/p&gt;

&lt;p&gt;Trino’s JDBC drivers typically maintain compatibility with older server versions
(and vice versa). However, the project was renamed from PrestoSQL to Trino 
starting with version 351, and as a consequence, JDBC drivers with version &amp;gt;= 351 are
not compatible with servers with version &amp;lt;= 350. More details at:
&lt;a href=&quot;https://trino.io/blog/2021/01/04/migrating-from-prestosql-to-trino.html&quot;&gt;https://trino.io/blog/2021/01/04/migrating-from-prestosql-to-trino.html&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In short, you can have a PrestoSQL client with a Trino server, but you can’t 
have a Trino client with a PrestoSQL server.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Events&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Join us for an awesome event on May 26th as Iceberg creator Ryan Blue dives 
 into some interesting and less conventional use cases of Apache Iceberg.
 &lt;a href=&quot;https://www.meetup.com/trino-americas/events/278103777/&quot;&gt;Trino Americas meetup&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Blogs&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://engineering.linkedin.com/blog/2020/coral&quot;&gt;https://engineering.linkedin.com/blog/2020/coral&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Videos&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.arcadiadata.com/lp/tech-talk-on-join-optimization/&quot;&gt;https://www.arcadiadata.com/lp/tech-talk-on-join-optimization/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/15/training-advanced-sql.html&quot;&gt;Advanced SQL Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/30/training-query-tuning.html&quot;&gt;Query Tuning Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/13/training-security.html&quot;&gt;Security Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/27/training-performance.html&quot;&gt;Performance and Tuning Training&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Commander Bun Bun enjoying the views...</summary>

      
      
    </entry>
  
    <entry>
      <title>Row pattern recognition with MATCH_RECOGNIZE</title>
      <link href="https://trino.io/blog/2021/05/19/row_pattern_matching.html" rel="alternate" type="text/html" title="Row pattern recognition with MATCH_RECOGNIZE" />
      <published>2021-05-19T00:00:00+00:00</published>
      <updated>2021-05-19T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/05/19/row_pattern_matching</id>
      <content type="html" xml:base="https://trino.io/blog/2021/05/19/row_pattern_matching.html">&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; syntax was introduced in the SQL:2016
specification. It is a super powerful tool for analyzing trends in your data. We are
proud to announce that Trino supports this great feature as of
&lt;a href=&quot;https://trino.io/docs/current/release/release-356.html&quot;&gt;version 356&lt;/a&gt;. With
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt;, you can define a pattern using the well-known regular
expression syntax, and match it to a set of rows. Upon finding a matching row
sequence, you can retrieve all kinds of detailed or summary information about
the match, and pass it on to be processed by the subsequent parts of your
query. This is a new level of what a pure SQL statement can do.&lt;/p&gt;

&lt;p&gt;This blog post gives you a taste of row pattern matching capabilities, and a
quick overview of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; syntax.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;a-regular-expression-and-a-table-a-fruitful-relationship&quot;&gt;A regular expression and a table: a fruitful relationship&lt;/h2&gt;

&lt;p&gt;The regex matching we all know is about searching for patterns in character
strings. But how does a regex match a sequence of rows? Certainly, a row of
data is a more complex structure than a character. And so, row pattern matching
is more expressive than regex matching in text. Unlike characters, which have
fixed positions in a string, rows aren’t assigned up-front to
pattern components. This is where the additional level of complexity comes
from: whether the row is an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;B&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C&lt;/code&gt;, is conditional. It is revealed as
the pattern matching goes forward. It depends on the data in the row, but also
on the context of the current match and even on the match number. Also, the same
row can be mapped to different labels in different matches.&lt;/p&gt;

&lt;p&gt;Consider this simple example:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;PATTERN: A B+ C D?
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;First, let’s match it to the string &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&quot;ABBCEE&quot;&lt;/code&gt;. There is exactly one way to
match it: the prefix &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&quot;ABBC&quot;&lt;/code&gt; is a match.&lt;/p&gt;

&lt;p&gt;Now, let’s see what it takes to match a pattern to rows of a table.
Consider the table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;numbers&lt;/code&gt; with a single column &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;number&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/match-recognize/table-numbers.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;You need &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;defining conditions&lt;/code&gt; to specify how the rows of the table can be
mapped to pattern components &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;B&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;D&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;DEFINE:
    A &amp;lt;- true (matches every row)
    B &amp;lt;- number is greater than previous number
    C &amp;lt;- number is lower or equal to A
    D &amp;lt;- matches every row, but only in the first match;
         otherwise doesn&apos;t match any row
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As you can see, the conditions can refer to other pattern components (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C&lt;/code&gt;
 depends on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A&lt;/code&gt;), or the sequential match number (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;D&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;When searching for a match, the engine goes row by row, and assigns labels
according to the pattern. Every time the pattern shows the next component
(label) to be matched, the defining condition of that component is evaluated
for the current row in the context of the partial match.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/match-recognize/first-match.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;After finding a match, you can step one row forward and search for another one.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/match-recognize/second-match.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;So far, two matches have been found in the same set of rows. Interestingly, a row
that was labeled as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;B&lt;/code&gt; in the first match, became &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A&lt;/code&gt; in the second match.
Let’s try to find another match.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/match-recognize/third-match.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;time-to-get-more-technical&quot;&gt;Time to get more technical&lt;/h2&gt;

&lt;p&gt;…and use some real &lt;s&gt;life&lt;/s&gt; money examples.&lt;/p&gt;

&lt;p&gt;In the preceding examples, the pattern consisted of components &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;B&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C&lt;/code&gt;
and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;D&lt;/code&gt;. They were chosen this way to capture the analogy between pattern
matching in a string and pattern matching in a set of rows. According to the
SQL specification, row pattern components can be named with arbitrary
identifiers, as long as they are compliant with the SQL identifier semantics,
so you don’t need to limit yourself to single-letter names, and instead you can
use more verbose labels.&lt;/p&gt;

&lt;p&gt;Officially, the pattern components, or labels, are called the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;primary pattern
variables&lt;/code&gt;. They are the basic components of the row pattern. Consider the
following example:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;PATTERN( START DOWN+ UP+ )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There are three primary pattern variables: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;START&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DOWN&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UP&lt;/code&gt;. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;+&lt;/code&gt; is
the “one or more” quantifier you know from the regex syntax. Intuitively, this
pattern should match a sequence of rows which are first “decreasing”, and then
“increasing”. You need to inform the engine how it should map rows to the
variables. In other words, you need to define what the “decreasing” and
“increasing” rows are:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;DEFINE DOWN AS price &amp;lt; PREV(price),
       UP AS price &amp;gt; PREV(price)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now it’s clear that “decreasing” and “increasing” is about the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;price&lt;/code&gt; values.
There is no defining condition for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;START&lt;/code&gt; variable, which informs the
engine that the match can start anywhere.&lt;/p&gt;

&lt;p&gt;The preceding example shows the two key clauses of row pattern recognition:
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PATTERN&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DEFINE&lt;/code&gt;. Let’s see what other keywords there are in the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; clause.&lt;/p&gt;

&lt;h2 id=&quot;syntax-overview&quot;&gt;Syntax overview&lt;/h2&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; syntax is long and rich enough to capture everything that
a pattern matching tool needs, and all the options which let you easily toggle
your matching strategies.&lt;/p&gt;

&lt;p&gt;Technically, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; is part of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FROM&lt;/code&gt; clause:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT ...
    FROM some_table
        MATCH_RECOGNIZE (
          [ PARTITION BY column [, ...] ]
          [ ORDER BY column [, ...] ]
          [ MEASURES measure_definition [, ...] ]
          [ rows_per_match ]
          [ AFTER MATCH skip_to ]
          PATTERN ( row_pattern )
          [ SUBSET subset_definition [, ...] ]
          DEFINE variable_definition [, ...]
          )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; can be used in the query as one of the stages of processing
data. You can &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; from its results or even stream them into another
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PATTERN&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DEFINE&lt;/code&gt; clauses are the heart of row pattern recognition.
They are also the only two required subclauses of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt;. They were
touched upon in the previous section.&lt;/p&gt;

&lt;p&gt;The pattern syntax is close to regular expression syntax. It also supports some
extensions specific to row pattern recognition. They are explained in
&lt;a href=&quot;#pattern-syntax&quot;&gt;Row pattern syntax&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PARTITION BY&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; clauses are similar to those in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WINDOW&lt;/code&gt;
syntax. They help you structure the input data. You can use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PARTITION BY&lt;/code&gt; to
break up your data into independent chunks. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; is useful to establish 
the order of rows before searching for the pattern. Typically, you want to
analyze series of events over time, so ordering by date is a good choice.&lt;/p&gt;
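
&lt;p&gt;For example, with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders&lt;/code&gt; data used later in this post, the two
clauses could look like this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;PARTITION BY customer_id
ORDER BY order_date
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Each customer’s orders are then searched for the pattern independently, in date
order.&lt;/p&gt;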

&lt;p&gt;&lt;img src=&quot;/assets/blog/match-recognize/partition-by-order-by.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;In the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MEASURES&lt;/code&gt; clause, you can specify what information you need about every
match that is found. For example, if you’re interested in the order date,
the lowest value of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;price&lt;/code&gt;, and the sequential number of the match, this is the
way to retrieve them:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;MEASURES order_date AS date,
         LAST(DOWN.price) AS bottom_price,
         MATCH_NUMBER() AS match_no
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bottom_price&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;match_no&lt;/code&gt; are exposed by the pattern recognition
clause as output columns.&lt;/p&gt;
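
&lt;p&gt;Since the measures become output columns, the enclosing query can select them
like any other column. Here is a minimal sketch, with the remaining
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; subclauses elided:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT date, bottom_price, match_no
    FROM orders
        MATCH_RECOGNIZE (
          ...
          MEASURES order_date AS date,
                   LAST(DOWN.price) AS bottom_price,
                   MATCH_NUMBER() AS match_no
          ...
          )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;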

&lt;p&gt;The expressions in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MEASURES&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DEFINE&lt;/code&gt; clauses allow you to combine the
input data with the information about the matched pattern. They support many
extensions and special constructs to help you get the most out of your data, both
when defining the pattern, and retrieving useful information after a successful
match. The special keyword &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LAST&lt;/code&gt; is one example. For the full list of the
magic spells, check &lt;a href=&quot;#expressions&quot;&gt;Expressions for special tasks&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; clause has two useful toggles. The first of them lets you
choose whether the output includes all rows of the match, or a single-row
summary. For all rows, specify &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALL ROWS PER MATCH&lt;/code&gt;. For a single row, choose
the default &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ONE ROW PER MATCH&lt;/code&gt;. There are also sub-options available, enabling
different handling of empty matches and unmatched rows.&lt;/p&gt;
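
&lt;p&gt;For example, to output every row of every match, and additionally pass
unmatched rows through to the output, you can combine the option with one of its
sub-options:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ALL ROWS PER MATCH WITH UNMATCHED ROWS
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;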

&lt;p&gt;&lt;img src=&quot;/assets/blog/match-recognize/rows-per-match.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Another toggle is the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AFTER MATCH SKIP&lt;/code&gt; clause. It allows you to specify where
the row pattern matching resumes after finding a match. The default option is
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AFTER MATCH SKIP PAST LAST ROW&lt;/code&gt;, but you can also skip to the next row or to a
specific position in the match based on the matched pattern variables.&lt;/p&gt;
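
&lt;p&gt;A few sketches of the available options:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;AFTER MATCH SKIP PAST LAST ROW   -- the default
AFTER MATCH SKIP TO NEXT ROW     -- resume at the second row of the match
AFTER MATCH SKIP TO LAST DOWN    -- resume at the last row labeled DOWN
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;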

&lt;p&gt;&lt;img src=&quot;/assets/blog/match-recognize/after-match-skip.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SUBSET&lt;/code&gt; clause is where the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;union pattern variables&lt;/code&gt; are defined. They
are a concise way to refer to a group of primary pattern variables:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SUBSET U = (DOWN, UP)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The following expression returns the value of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;price&lt;/code&gt; from the last row
matched either to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DOWN&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UP&lt;/code&gt; primary variable:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;LAST(U.price)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;-row-pattern-syntax&quot;&gt;&lt;a name=&quot;pattern-syntax&quot;&gt;&lt;/a&gt; Row pattern syntax&lt;/h2&gt;

&lt;p&gt;The basic element of a row pattern is the primary pattern variable. Other syntax
components include:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Concatenation&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;A B C
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Alternation&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;A | B | C
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Permutation&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;PERMUTE(A, B, C)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Grouping&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;(A B C)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Partition start anchor&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;^
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Partition end anchor&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Empty pattern&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;()
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Exclusion syntax&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;{- row_pattern -}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Exclusion syntax is useful in combination with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALL ROWS PER MATCH&lt;/code&gt; option.
If you find some sections of the match uninteresting, you can wrap them in the
exclusion, and they are dropped from the output.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/match-recognize/exclusion.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quantifiers&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Row pattern syntax supports all kinds of quantifiers: the basic ones &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;*&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;+&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;?&lt;/code&gt;, and others, which let you specify the exact number of repetitions, or the
accepted range: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;{n}&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;{n, m}&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;{n,}&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;{,n}&lt;/code&gt;. Make sure you don’t confuse
those:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;{n}&lt;/code&gt; is for exactly n repetitions,&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;{n,}&lt;/code&gt; is equal to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;{n, ∞}&lt;/code&gt;,&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;{,n}&lt;/code&gt; is equal to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;{0, n}&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
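
&lt;p&gt;For example, the following pattern matches exactly three &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A&lt;/code&gt; rows,
two or more &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;B&lt;/code&gt; rows, and up to five &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C&lt;/code&gt; rows:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;PATTERN ( A{3} B{2,} C{,5} )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;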

&lt;p&gt;Quantifiers are greedy by default. This means that they prefer a higher number
of repetitions over a lower one. If you want it the other way, you can change a
quantifier to reluctant by appending &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;?&lt;/code&gt; immediately after it. So, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(pattern)?&lt;/code&gt;
prefers a single match of the pattern, while &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(pattern)??&lt;/code&gt; would rather omit
the pattern altogether.&lt;/p&gt;
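
&lt;p&gt;For example, in the following pattern the reluctant quantifier makes the
engine prefer matching as few &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;B&lt;/code&gt; rows as possible before moving on to
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;PATTERN ( A B+? C )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;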

&lt;h3 id=&quot;match-preference&quot;&gt;Match preference&lt;/h3&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; is supposed to produce at most one match starting from a
specific row. If there are more matches available, the winner is chosen based
on the order of preference. The greedy and reluctant quantifiers are one
example of preference. Other pattern components have their own rules:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;pattern alternation prefers the left-hand components to the right-hand ones.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;pattern permutation is equivalent to alternation of all permutations of its
components. If multiple matches are possible, the match is chosen based on the
lexicographical order established by the order of components in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PERMUTE&lt;/code&gt;
list. For &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PERMUTE(A, B, C)&lt;/code&gt;, the preference of options goes as follows:
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A B C&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A C B&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;B A C&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;B C A&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C A B&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C B A&lt;/code&gt;.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
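
&lt;p&gt;For example, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PERMUTE(A, B)&lt;/code&gt; behaves like an alternation of both
orderings, with the left-to-right order preferred:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;PERMUTE(A, B)
-- behaves like:
(A B | B A)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;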

&lt;h2 id=&quot;-expressions-for-special-tasks&quot;&gt;&lt;a name=&quot;expressions&quot;&gt;&lt;/a&gt; Expressions for special tasks&lt;/h2&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; clause provides special expression syntax, available in
the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MEASURES&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DEFINE&lt;/code&gt; clauses. Its purpose is to combine the input data
with the information about the match. The syntax includes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Pattern variable references&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They allow referring to certain components of the match, for example
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DOWN.price&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UP.order_date&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Logical navigation operations: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LAST&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FIRST&lt;/code&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They allow you to navigate over the rows of a match based on the pattern
variables assigned to them. For example, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LAST(DOWN.price, 3)&lt;/code&gt; navigates to the
last row labeled as “DOWN”, goes three occurrences of the “DOWN” label
backwards, and gets the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;price&lt;/code&gt; value from that row. The default offset is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0&lt;/code&gt;:
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LAST(DOWN.price)&lt;/code&gt; gets the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;price&lt;/code&gt; value from the last row labeled as “DOWN”.
If the logical navigation goes beyond the match bounds, the operation returns
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;null&lt;/code&gt;.&lt;/p&gt;
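
&lt;p&gt;To recap, here are a few sketches of logical navigation:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;FIRST(DOWN.price)      -- price from the first row labeled DOWN
LAST(DOWN.price)       -- price from the last row labeled DOWN
LAST(DOWN.price, 3)    -- price from three DOWN rows before the last one
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;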

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Physical navigation operations: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PREV&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NEXT&lt;/code&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They let you navigate over the rows of the partition by a specified offset.
Physical navigations use logical navigations as the starting point. For
example, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NEXT(DOWN.price, 5)&lt;/code&gt; first navigates to the last row labeled as
“DOWN”. Starting from there, it goes five rows forward and gets the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;price&lt;/code&gt;
value from that row. In the preceding example, the logical navigation &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LAST&lt;/code&gt; is
implicit, but you can specify the nested logical navigation explicitly, for
example &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NEXT(FIRST(DOWN.price, 4), 5)&lt;/code&gt;. The default offset is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1&lt;/code&gt;, which means
that the physical navigations by default go one row backwards, or one row
forward.&lt;/p&gt;
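
&lt;p&gt;A few sketches of physical navigation:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;PREV(price)                      -- price from one row back
NEXT(DOWN.price, 5)              -- price from five rows after the last DOWN row
NEXT(FIRST(DOWN.price, 4), 5)    -- the nested logical navigation made explicit
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;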

&lt;p&gt;The physical navigation can retrieve values beyond the match bounds. It gives
you great flexibility. For example, the defining conditions of pattern
variables can peek at the values ahead. Also, when computing row pattern
measures, you can refer to the wider context of the match.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CLASSIFIER&lt;/code&gt; function&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It returns the primary pattern variable associated with the row.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_NUMBER&lt;/code&gt; function&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It returns the sequential number of the match within the partition.&lt;/p&gt;
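
&lt;p&gt;Both functions are typically used in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MEASURES&lt;/code&gt; clause, for
example:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;MEASURES CLASSIFIER() AS matched_label,
         MATCH_NUMBER() AS match_no
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;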

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RUNNING&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FINAL&lt;/code&gt; keywords&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The expressions in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DEFINE&lt;/code&gt; clause are evaluated when the pattern matching
is in progress. At each step, the engine only knows a part of the match. This
is the &lt;em&gt;running semantics&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The expressions of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MEASURES&lt;/code&gt; clause are evaluated when the match is
complete. The engine can see the whole match from the position of the final
row. This is the &lt;em&gt;final semantics&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;However, with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALL ROWS PER MATCH&lt;/code&gt; option, when the match result is
processed row by row, you can choose either approach to compute the measures.
To do that, you can specify the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RUNNING&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FINAL&lt;/code&gt; keyword before the logical
navigation operation, for example &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RUNNING LAST(DOWN.price)&lt;/code&gt; or
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FINAL LAST(DOWN.price)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;em&gt;running semantics&lt;/em&gt; is the default both in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DEFINE&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MEASURES&lt;/code&gt;
clauses. Note that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FINAL&lt;/code&gt; only applies to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MEASURES&lt;/code&gt; clause.&lt;/p&gt;
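&lt;p&gt;As a sketch, reusing the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DOWN&lt;/code&gt; variable from the earlier examples, both
semantics can be requested side by side in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MEASURES&lt;/code&gt; clause:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;MEASURES
    -- running semantics: price at the last DOWN row seen so far
    RUNNING LAST(DOWN.price) AS bottom_so_far,
    -- final semantics: price at the last DOWN row of the complete match
    FINAL LAST(DOWN.price) AS bottom_of_match
ALL ROWS PER MATCH
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;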

&lt;p&gt;To sum up, here’s one complex measure expression combining different elements
of the special syntax:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/match-recognize/measure-example.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;trino-cli-show-off-time&quot;&gt;Trino CLI show-off time!&lt;/h2&gt;

&lt;p&gt;Now, let’s see the whole machinery come to life. This is the same example data
that we used before, and the same goal: detect a “V”-shape of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;price&lt;/code&gt;
values over time for different customers.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;trino&amp;gt; WITH orders(customer_id, order_date, price) AS (VALUES
    (&apos;cust_1&apos;, DATE &apos;2020-05-11&apos;, 100),
    (&apos;cust_1&apos;, DATE &apos;2020-05-12&apos;, 200),
    (&apos;cust_2&apos;, DATE &apos;2020-05-13&apos;,   8),
    (&apos;cust_1&apos;, DATE &apos;2020-05-14&apos;, 100),
    (&apos;cust_2&apos;, DATE &apos;2020-05-15&apos;,   4),
    (&apos;cust_1&apos;, DATE &apos;2020-05-16&apos;,  50),
    (&apos;cust_1&apos;, DATE &apos;2020-05-17&apos;, 100),
    (&apos;cust_2&apos;, DATE &apos;2020-05-18&apos;,   6))
SELECT customer_id, start_price, bottom_price, final_price, start_date, final_date
    FROM orders
        MATCH_RECOGNIZE (
            PARTITION BY customer_id
            ORDER BY order_date
            MEASURES
                START.price AS start_price,
                LAST(DOWN.price) AS bottom_price,
                LAST(UP.price) AS final_price,
                START.order_date AS start_date,
                LAST(UP.order_date) AS final_date
            ONE ROW PER MATCH
            AFTER MATCH SKIP PAST LAST ROW
            PATTERN (START DOWN+ UP+)
            DEFINE
                DOWN AS price &amp;lt; PREV(price),
                UP AS price &amp;gt; PREV(price)
            );

 customer_id | start_price | bottom_price | final_price | start_date | final_date
-------------+-------------+--------------+-------------+------------+------------
 cust_1      |         200 |           50 |         100 | 2020-05-12 | 2020-05-17
 cust_2      |           8 |            4 |           6 | 2020-05-13 | 2020-05-18
(2 rows)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Two matches are detected, one for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cust_1&lt;/code&gt;, and one for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cust_2&lt;/code&gt;.&lt;/p&gt;

&lt;h2 id=&quot;empty-matches-explained&quot;&gt;Empty matches explained&lt;/h2&gt;

&lt;p&gt;An empty match is a legitimate result of row pattern recognition. There are
different pattern constructs that can result in an empty match. The empty
pattern syntax &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;()&lt;/code&gt; is the trivial one. An empty match can also result from
quantification, such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A*&lt;/code&gt;, or from alternation, such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A | ()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;An empty match does not consume any input rows, but like every match, it is
associated with a row, called the &lt;em&gt;starting row&lt;/em&gt;. That is the row at which the
pattern matching started. Note that if the pattern allows an empty match, it
guarantees that no rows remain unmatched. Also, an empty match, like every
non-empty match, gets a sequential number, which can be retrieved by the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_NUMBER&lt;/code&gt; function.&lt;/p&gt;
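&lt;p&gt;For instance, here is a sketch of a pattern that allows empty matches, again
reusing the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DOWN&lt;/code&gt; variable from the earlier examples:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;MEASURES MATCH_NUMBER() AS match_no  -- empty matches are numbered too
ONE ROW PER MATCH
PATTERN (DOWN*)                      -- DOWN* can match zero rows
DEFINE DOWN AS price &amp;lt; PREV(price)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;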

&lt;p&gt;Depending on your use case, you can consider empty matches informative or just
see them as a leftover of the algorithm.&lt;/p&gt;

&lt;p&gt;There’s one more thing linked to empty matches. Some patterns have the
dangerous potential of looping endlessly over a piece that doesn’t consume any
rows. It doesn’t have to be as explicit as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;()*&lt;/code&gt;. There are complex patterns
that don’t show their looping potential at first glance. We handled them
carefully so that you never have to waste your time on looping queries.&lt;/p&gt;

&lt;h2 id=&quot;in-a-few-words-whats-so-cool-about-row-pattern-matching&quot;&gt;In a few words, what’s so cool about row pattern matching?&lt;/h2&gt;

&lt;p&gt;From the SQL viewpoint, you can think of row pattern matching as extended
window functions. Window functions allow you to capture some dependencies in
rows of data based on their relative position or value. Row pattern matching
allows you to detect arbitrarily complicated dependencies, based not only on
the input values but also on the details of the actual match and on the match
number.&lt;/p&gt;

&lt;p&gt;Before the introduction of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt;, you had to feed your data to
external tools to reason about trends and patterns. Now, you can achieve it
directly in your query, and even build your query upon the pattern recognition
clause to further process the match results.&lt;/p&gt;

&lt;p&gt;Row pattern matching is typically used:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;in trade applications for tracking trends or identifying customers with
specific behavioral patterns,&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;in shipping applications for tracking packages through all possible valid
paths,&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;in financial applications for detecting unusual incidents, which might signal
fraud.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What’s your use case?&lt;/p&gt;

&lt;p&gt;I hope you enjoy Trino’s new feature. Refer to
&lt;a href=&quot;https://trino.io/docs/current/sql/match-recognize.html&quot;&gt;Trino docs&lt;/a&gt; for even
more details, examples and usage tips. &lt;a href=&quot;/slack.html&quot;&gt;Please &lt;strong&gt;do&lt;/strong&gt; reach out to us with any
questions or issues&lt;/a&gt;. We plan to support row pattern matching in
the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WINDOW&lt;/code&gt; clause soon, so stay tuned!&lt;/p&gt;</content>

      
        <author>
          <name>Kasia Findeisen (kasiafi)</name>
        </author>
      

      <summary>The MATCH_RECOGNIZE syntax was introduced in the latest SQL specification of 2016. It is a super powerful tool for analyzing trends in your data. We are proud to announce that Trino supports this great feature since version 356. With MATCH_RECOGNIZE, you can define a pattern using the well-known regular expression syntax, and match it to a set of rows. Upon finding a matching row sequence, you can retrieve all kinds of detailed or summary information about the match, and pass it on to be processed by the subsequent parts of your query. This is a new level of what a pure SQL statement can do. This blog post gives you a taste of row pattern matching capabilities, and a quick overview of the MATCH_RECOGNIZE syntax.</summary>

      
      
    </entry>
  
    <entry>
      <title>17: Trino connector resurfaces API calls</title>
      <link href="https://trino.io/episodes/17.html" rel="alternate" type="text/html" title="17: Trino connector resurfaces API calls" />
      <published>2021-05-13T00:00:00+00:00</published>
      <updated>2021-05-13T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/17</id>
      <content type="html" xml:base="https://trino.io/episodes/17.html">&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/17/trino-resurface.png&quot; /&gt;&lt;br /&gt;
Commander Bun Bun is diving deep to find anomalies!
&lt;/p&gt;

&lt;h2 id=&quot;resurface-links&quot;&gt;Resurface links&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://resurface.io/&quot;&gt;Resurface site&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/resurfaceio&quot;&gt;Resurface GitHub&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://resurface.io/slack&quot;&gt;Resurface Slack&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;Rob Dickinson, Co-founder and CEO of &lt;a href=&quot;https://resurface.io/&quot;&gt;Resurface&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/robfromboulder&quot;&gt;@robfromboulder&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Martin Traverso, creator of Trino/Presto, and CTO at 
 &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt; (&lt;a href=&quot;https://twitter.com/mtraverso&quot;&gt;@mtraverso&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-week-resurface-and-the-resurface-connector&quot;&gt;Concept of the week: Resurface and the Resurface connector&lt;/h2&gt;

&lt;h3 id=&quot;what-is-resurface&quot;&gt;What is Resurface?&lt;/h3&gt;
&lt;p&gt;Resurface is an API system of record, which is a fancy way of saying that 
Resurface is a purpose-built database for API requests and responses. Like a 
weblog or access log, but on steroids because Resurface runs on Trino.&lt;/p&gt;

&lt;p&gt;Why do you need a system of record for your APIs? Because otherwise you’re 
guessing about how your APIs are used and attacked, and guessing doesn’t feel 
good. Resurface helps your DevOps and security teams instantly find API 
failures, slowdowns, and attacks – easily, responsibly, and at scale.&lt;/p&gt;

&lt;h3 id=&quot;how-resurface-differs-from-logs--metrics&quot;&gt;How Resurface differs from logs &amp;amp; metrics&lt;/h3&gt;
&lt;p&gt;You probably use system monitoring tools, which tell you about what’s happening 
on your systems. What code is running, what code is slow, and what error codes 
are returned. That’s all great — but it still leaves a big gap between the 
system-level events you can see, and what your API consumers actually see.&lt;/p&gt;

&lt;p&gt;Resurface helps you fill this gap with your own API system of record. Now your 
customers, your DevOps team, and your security team all have the same view of 
every transaction, because there is a record of the requests and responses.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/17/resurface-tcb1.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;The other obvious way to compare Resurface against other tools is to look at the
data model. System monitoring gives you time-series metrics, or timestamped log
messages with a severity and detail string. Resurface gives you all the request
and response data fields, including headers and payloads, in a schema where all
of those fields are discrete and searchable. Plus it adds a bunch of helpful
virtual and computed columns.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/17/resurface-tcb2.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h3 id=&quot;the-indexing-problem&quot;&gt;The indexing problem&lt;/h3&gt;

&lt;p&gt;Resurface has a very descriptive data model, but there’s a problem here – how
to partition and index this data for efficient searching. Partitioning based on
time is the obvious starting point, but within a time range, what then? Index
everything?&lt;/p&gt;

&lt;p&gt;Most databases work best when only a subset of the columns is constrained at
once – but in their case, they have strong reasons for wanting to use all
columns at once. A system monitoring tool might give you a count of “500 codes”
– but they want to detect silent failures, like malformed JSON payloads or
airline tickets selling for less than twenty dollars. That means looking at the
URL, content type, other headers, and payloads, all at the same time.&lt;/p&gt;

&lt;p&gt;They also want to classify kinds of API consumers by their behaviors – are they
using or attacking your API? To classify those behaviors, they again look at
the URL, content type, and payloads. If they can query for the yellow region
below, they find lost revenue that they can recover.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/17/resurface-tcb4.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;Now you might be thinking – maybe the best solution is to do all this 
processing when the API calls are captured, but then how would you identify a 
new zero-day failure or attack? The definition of “responses failed” and 
“threats” needs to be changeable without having to reprocess any data, which 
really favors query-time processing.&lt;/p&gt;

&lt;p&gt;The example below is pretty much as simple as this gets. I struggled to find 
one of these queries that actually fits in a reasonable amount of space.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/17/resurface-tcb5.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;So how to build a database that does these kinds of queries in reasonable time?&lt;/p&gt;

&lt;h3 id=&quot;the-resurface-connector&quot;&gt;The Resurface connector&lt;/h3&gt;

&lt;p&gt;The first prototype actually used the Trino memory connector, which gave them 
the kind of query performance that they were looking for, but wasn’t shippable 
(for obvious reasons).&lt;/p&gt;

&lt;p&gt;Then they tried Redis as a replacement in-memory database, but the problem is
that every query would pull all the data in Redis over the network. Not cool.&lt;/p&gt;

&lt;p&gt;Trino allows you to move the queries closer to the data, and so that’s what they
did. They took inspiration from the “local file” connector, where the connector
reads directly from the filesystem instead of over the network.&lt;/p&gt;

&lt;p&gt;Then the question was, what file format to use?  They tried JSON, CSV, Protocol
Buffers, and ultimately found the fastest and simplest approach was just to
write a simple binary file format that requires no real parsing. When these
files fit in memory, their connector can process SQL queries at 4GB/sec per core. 
The connector was easy to write because they’re just mapping between fields in 
the binary files and the columns exposed to Trino. They built the first version
of their connector in a weekend!&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/17/resurface-tcb3.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h3 id=&quot;why-not-just-use-avro&quot;&gt;Why not just use Avro?&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;Simple requirements – basic versioning, no secondary objects, limited data 
 types&lt;/li&gt;
  &lt;li&gt;Zero-allocation reader for fast linear scan – one memcpy per physical column&lt;/li&gt;
  &lt;li&gt;Connector can report null/not-null without type conversion&lt;/li&gt;
  &lt;li&gt;Connector defers type conversion until getXXX() method&lt;/li&gt;
  &lt;li&gt;getSlice() just wraps an existing buffer (zero allocation)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most of these optimizations were realized by working backwards from the Trino 
connector API to get the best linear scan performance imaginable.&lt;/p&gt;

&lt;h3 id=&quot;combining-api-calls-with-other-data&quot;&gt;Combining API calls with other data&lt;/h3&gt;

&lt;p&gt;Now they can deliver API call data out to all the different kinds of SQL clients 
out there, and they’re also able to combine API call data with data stored in 
other databases.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/17/resurface-tcb6.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;This is really exciting because your Resurface database plays nicely with all 
your other databases that are bridged together with Trino. That means that 
actual API traffic can be brought into your customer data mart, or combined 
with data from any other systems, in real time!&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-4022-add-soundex-function&quot;&gt;PR of the week: PR 4022 Add Soundex function&lt;/h2&gt;

&lt;p&gt;A big shoutout to &lt;a href=&quot;https://github.com/tooptoop4&quot;&gt;tooptoop4&lt;/a&gt; for their contribution to this week’s
&lt;a href=&quot;https://github.com/trinodb/trino/pull/4022&quot;&gt;PR of the week&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This PR adds the &lt;a href=&quot;https://en.wikipedia.org/wiki/Soundex&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;soundex()&lt;/code&gt; function&lt;/a&gt;, 
which is a phonetic function. These functions show up in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WHERE&lt;/code&gt; clause of a
query to find words that sound similar. There are a few examples in the demo
below.&lt;/p&gt;

&lt;p&gt;Thanks for this awesome contribution!&lt;/p&gt;

&lt;h2 id=&quot;demo-using-the-soundex-function&quot;&gt;Demo: Using the soundex function&lt;/h2&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;
SELECT * 
FROM (
  VALUES 
  (1, &apos;Bri&apos;), 
  (2, &apos;Bree&apos;), 
  (3, &apos;Bryan&apos;), 
  (4, &apos;Brian&apos;), 
  (5, &apos;Briann&apos;), 
  (6, &apos;Brianna&apos;), 
  (7, &apos;Briannas&apos;),
  (8, &apos;Bri Jan&apos;),  
  (9, &apos;Bri Yan&apos;),  
  (10, &apos;Bob&apos;)
) names(id, name)
WHERE soundex(name) = soundex(&apos;Brian&apos;);

# Results:
# |id |name   |
# |---|-------|
# |3  |Bryan  |
# |4  |Brian  |
# |5  |Briann |
# |6  |Brianna|
# |9  |Bri Yan|

SELECT * 
FROM (
  VALUES 
  (1, &apos;Man&apos;), 
  (2, &apos;Fred&apos;), 
  (3, &apos;Manfred&apos;), 
  (4, &apos;Can fed&apos;), 
  (5, &apos;Tan bed&apos;), 
  (6, &apos;Man Fred&apos;), 
  (7, &apos;Man dread&apos;), 
  (8, &apos;Bob&apos;)
) names(id, name)
WHERE soundex(name) = soundex(&apos;Manfred&apos;);

# Results:
# |id |name    |
# |---|--------|
# |3  |Manfred |
# |6  |Man Fred|

SELECT * 
FROM (
  VALUES 
  (1, &apos;Martin&apos;), 
  (2, &apos;Mar teen&apos;), 
  (3, &apos;Mar tin&apos;), 
  (4, &apos;Marteen&apos;), 
  (5, &apos;Mart in&apos;)
) names(id, name)
WHERE soundex(name) = soundex(&apos;Martin&apos;);

# Results:
# |id |name    |
# |---|--------|
# |1  |Martin  |
# |2  |Mar teen|
# |3  |Mar tin |
# |4  |Marteen |
# |5  |Mart in |

SELECT * 
FROM (
  VALUES 
  (1, &apos;Robert&apos;), 
  (2, &apos;Rob&apos;), 
  (3, &apos;Bob&apos;), 
  (4, &apos;Bobert&apos;), 
  (5, &apos;Bobby&apos;)
) names(id, name)
WHERE soundex(name) = soundex(&apos;Rob&apos;);

# Results:
# |id |name|
# |---|----|
# |2  |Rob |


SELECT * 
FROM (
  VALUES 
  (1, &apos;Christ&apos;), 
  (2, &apos;Christeen&apos;), 
  (3, &apos;Christian&apos;), 
  (4, &apos;Christine&apos;), 
  (5, &apos;Chris&apos;), 
  (6, &apos;Kristine&apos;)
) names(id, name)
WHERE soundex(name) = soundex(&apos;Christine&apos;);

# Results:
# |id |name     |
# |---|---------|
# |1  |Christ   |
# |2  |Christeen|
# |3  |Christian|
# |4  |Christine|

# What the results actually return

SELECT name, soundex(name)
FROM (
  VALUES 
  (1, &apos;Christ&apos;), 
  (2, &apos;Christeen&apos;), 
  (3, &apos;Christian&apos;), 
  (4, &apos;Christine&apos;), 
  (5, &apos;Chris&apos;), 
  (6, &apos;Kristine&apos;), 
  (6, &apos;Christine&apos;)
) names(id, name);

# Results:
# |name     |_col1|
# |---------|-----|
# |Christ   |C623 |
# |Christeen|C623 |
# |Christian|C623 |
# |Christine|C623 |
# |Chris    |C620 |
# |Kristine |K623 |

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;question-of-the-week-how-to-export-query-results-into-a-file-eg-ctas-but-into-a-single-file&quot;&gt;Question of the week: How to export query results into a file (e.g. CTAS, but into a single file)?&lt;/h2&gt;

&lt;p&gt;This is possible using the &lt;a href=&quot;https://trino.io/docs/current/client/cli.html&quot;&gt;Trino CLI&lt;/a&gt;’s
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--execute&lt;/code&gt; option in conjunction with the redirect operator (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;gt;&lt;/code&gt;). You may also
use other options, such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--output-format&lt;/code&gt;, to specify the format of the data
going to the file (e.g. if you want CSV, TSV, JSON, headers, etc.).&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Output format for batch mode [ALIGNED, VERTICAL, TSV, TSV_HEADER, CSV, 
CSV_HEADER, CSV_UNQUOTED, CSV_HEADER_UNQUOTED, JSON, NULL] (default: CSV)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here is an example of the command you would run using the CLI executable &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;trino --execute &quot;select * from tpch.sf1.customer limit 5&quot; \
--server http://localhost:8080 \
--output-format CSV_HEADER &amp;gt; customer.csv
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If you’re running Trino in Docker, here is an example command to run this in a
temporary Trino container.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;docker run --rm -ti \
    --network=trino-hdfs3_trino-network \
    --name export-trino-data \
    trinodb/trino:latest \
    trino --execute &quot;select * from tpch.sf1.customer limit 5&quot; \
    --server http://trino-coordinator:8080 \
    --output-format CSV_HEADER &amp;gt; customer.csv
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If you have a very complex query that takes up multiple lines, or you don’t 
want to spend half of your day escaping quotations, you can put your SQL into a
file and reference the query using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-f&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--file&lt;/code&gt; option. The command
above could then be written as:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;trino --file query.sql \
--server http://localhost:8080 \
--output-format CSV_HEADER &amp;gt; customer.csv
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This command, along with the following &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.sql&lt;/code&gt; file, produces an equivalent result:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;select * 
from tpch.sf1.customer 
limit 5;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Finally, one last trick is to stage the data using the memory connector and
then export it. The &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;Trino Definitive Guide&lt;/a&gt; 
has an example of adding the Iris data set into memory connector storage with the CLI.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Apache Iceberg: A table format for data lakes with unforeseen use cases
    &lt;ul&gt;
      &lt;li&gt;Americas meetup&lt;/li&gt;
      &lt;li&gt;May 26th, 2021 @ 5:30p EDT&lt;/li&gt;
      &lt;li&gt;Link: &lt;a href=&quot;https://www.meetup.com/trino-americas/events/278103777/&quot;&gt;https://www.meetup.com/trino-americas/events/278103777/&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Trino Summit
    &lt;ul&gt;
      &lt;li&gt;Hybrid event&lt;/li&gt;
      &lt;li&gt;September 15th, 2021&lt;/li&gt;
      &lt;li&gt;Link: &lt;a href=&quot;http://starburst.io/trinosummit2021&quot;&gt;http://starburst.io/trinosummit2021&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Blogs&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://resurface.io/blog/why-we-love-trino&quot;&gt;https://resurface.io/blog/why-we-love-trino&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://resurface.io/blog/what-is-api-observability&quot;&gt;https://resurface.io/blog/what-is-api-observability&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://resurface.io/blog/forking-open-source&quot;&gt;https://resurface.io/blog/forking-open-source&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/15/training-advanced-sql.html&quot;&gt;Advanced SQL Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/30/training-query-tuning.html&quot;&gt;Query Tuning Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/13/training-security.html&quot;&gt;Security Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/27/training-performance.html&quot;&gt;Performance and Tuning Training&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Commander Bun Bun is diving deep to find anomalies!</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino on ice I: A gentle introduction To Iceberg</title>
      <link href="https://trino.io/blog/2021/05/03/a-gentle-introduction-to-iceberg.html" rel="alternate" type="text/html" title="Trino on ice I: A gentle introduction To Iceberg" />
      <published>2021-05-03T00:00:00+00:00</published>
      <updated>2021-05-03T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/05/03/a-gentle-introduction-to-iceberg</id>
      <content type="html" xml:base="https://trino.io/blog/2021/05/03/a-gentle-introduction-to-iceberg.html">&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;100%&quot; height=&quot;100%&quot; src=&quot;/assets/blog/trino-on-ice/trino-iceberg.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Welcome to the Trino on ice series, covering the details around how the Iceberg
table format works with the Trino query engine. The examples build on each
previous post, so it’s recommended to read the posts sequentially and reference
them as needed later. Here are links to the posts in this series:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/05/03/a-gentle-introduction-to-iceberg.html&quot;&gt;Trino on ice I: A gentle introduction to Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/12/in-place-table-evolution-and-cloud-compatibility-with-iceberg.html&quot;&gt;Trino on ice II: In-place table evolution and cloud compatibility with Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/30/iceberg-concurrency-snapshots-spec.html&quot;&gt;Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/08/12/deep-dive-into-iceberg-internals.html&quot;&gt;Trino on ice IV: Deep dive into Iceberg internals&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Back in the &lt;a href=&quot;/blog/2020/10/20/intro-to-hive-connector.html&quot;&gt;Gentle introduction to the Hive connector&lt;/a&gt; 
blog post, I discussed a commonly misunderstood architecture and uses of the 
Trino Hive connector. In short, while some may think the name indicates Trino 
makes a call to a running Hive instance, the Hive connector does not use the 
Hive runtime to answer queries. Instead, the connector is named Hive connector 
because it relies on Hive conventions and implementation details from the Hadoop
ecosystem - the invisible Hive specification.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;I call this specification invisible because it doesn’t exist. It lives in the 
Hive code and the minds of those who developed it. This makes it very 
difficult for anybody else who has to integrate with any distributed object 
storage that uses Hive, since they have to rely on reverse engineering and 
keeping up with the changes. The way you interact with Hive changes based on 
&lt;a href=&quot;https://medium.com/hashmapinc/four-steps-for-migrating-from-hive-2-x-to-3-x-e85a8363a18&quot;&gt;which version of Hive or Hadoop&lt;/a&gt; 
you are running. It also varies if you are running in the cloud or over an object store.
Spark has even &lt;a href=&quot;https://spark.apache.org/docs/2.4.4/sql-migration-guide-hive-compatibility.html&quot;&gt;modified the Hive spec&lt;/a&gt;
in some ways to fit the Hive model to their use cases. It’s a big mess that data 
engineers have put up with for years. Yet despite the confusion and the 
disorganization caused by Hive’s many unwritten assumptions, the Hive connector 
is the most popular connector in use for Trino. Virtually every big data query 
engine uses the Hive model today in some form. As a result, it is used by 
numerous companies to store and access data in their data lakes.&lt;/p&gt;

&lt;p&gt;So how did something with no specification become so ubiquitous in data lakes? 
Hive was first in the large object storage and big data world as part of Hadoop.
Hadoop became popular through strong marketing as the answer to the flood of 
data that came with the Web 2.0 boom. Of course, Hive didn’t
get everything wrong. In fact, without Hive, and the fact that it is open 
source, there may not have been a unified specification at all. Despite the many
hours data engineers have spent bashing their heads against the wall with all 
the unintended consequences of Hive, it still served a very useful purpose.&lt;/p&gt;

&lt;p&gt;So why did I just rant about Hive for so long if I’m here to tell you about 
&lt;a href=&quot;https://iceberg.apache.org/&quot;&gt;Apache Iceberg&lt;/a&gt;? It’s impossible for a teenager 
growing up today to truly appreciate music streaming services without knowing 
what it was like to have an iPod with limited storage, or listening to a 
scratched burnt CD that skips, or flipping your tape or record to side-B. Just 
as anyone born before the turn of the millennium really appreciates streaming 
services, you too will appreciate Iceberg once you’ve learned the intricacies 
of managing a data lake built on Hive and Hadoop.&lt;/p&gt;

&lt;p&gt;If you haven’t used Hive before, this blog post outlines just a few of the 
pain points that come from this data warehousing software to give you proper 
context. If you have already lived through these headaches, this post acts as a 
guide to moving from Hive to Iceberg. 
This post is the first in a series of blog posts discussing Apache Iceberg in 
great detail, through the lens of the Trino query engine user. If you’re not 
aware of Trino (formerly PrestoSQL) yet, it is the project that houses the 
founding Presto community after the 
&lt;a href=&quot;https://trino.io/blog/2020/12/27/announcing-trino.html&quot;&gt;founders of Presto left Facebook&lt;/a&gt;.
This and the next couple of posts discuss the Iceberg specification and all
the features Iceberg has to offer, many times in comparison with Hive.&lt;/p&gt;

&lt;p&gt;Before jumping into the comparisons, what is Iceberg exactly? The first thing to
understand is that Iceberg is not a file format, but a table format. That 
statement alone may not make the distinction clear, but the function of a table 
format becomes clearer as you see the improvements Iceberg brings over the Hive 
table standard. Iceberg doesn’t replace file formats like ORC and Parquet,
but is the layer between the query engine and the data. Iceberg maps and indexes
the files in order to provide a higher level abstraction that handles the 
relational table format for data lakes. You will understand more about table 
formats through examples in this series.&lt;/p&gt;
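&lt;p&gt;To make the layering concrete, here is an illustrative sketch of what an 
Iceberg table can look like on object storage. The bucket name and file names 
are made up; what matters is the split between a metadata layer and a data 
layer:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;s3://example-bucket/logging/events/
  metadata/                (the table format: schema, partition spec, snapshots)
    v1.metadata.json
    snap-123.avro
  data/                    (the file format: ORC or Parquet files with the rows)
    data-file-1.orc
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;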

&lt;h2 id=&quot;hidden-partitions&quot;&gt;Hidden Partitions&lt;/h2&gt;

&lt;h3 id=&quot;hive-partitions&quot;&gt;Hive Partitions&lt;/h3&gt;

&lt;p&gt;Since most developers and users interact with the table format via the query 
language, a noticeable difference is the flexibility you have while creating a 
partitioned table. Assume you are trying to create a table for tracking events 
occurring in your system. You run both sets of SQL commands from Trino, using 
the Hive and Iceberg connectors, which are designated by the catalog name 
(i.e. a catalog name starting with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive.&lt;/code&gt; uses the Hive connector, while the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iceberg.&lt;/code&gt; catalog uses the Iceberg connector). To begin with, the first DDL 
statement attempts to create an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;events&lt;/code&gt; table in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;logging&lt;/code&gt; schema in the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive&lt;/code&gt; catalog, which is configured to use the Hive connector. The statement also 
partitions the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;events&lt;/code&gt; table on the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time&lt;/code&gt; field, which is a
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TIMESTAMP&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE TABLE hive.logging.events (
  level VARCHAR,
  event_time TIMESTAMP,
  message VARCHAR,
  call_stack ARRAY(VARCHAR)
) WITH (
  format = &apos;ORC&apos;,
  partitioned_by = ARRAY[&apos;event_time&apos;]
);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Running this in Trino using the Hive connector produces the following error message.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Partition keys must be the last columns in the table and in the same order as the table properties: [event_time]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The Hive DDL is very dependent on ordering for columns and specifically 
partition columns. Partition fields must be located in the final column 
positions and in the order of partitioning in the DDL statement. The next 
statement attempts to create the same table, but now with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time&lt;/code&gt; field 
moved to the last column position.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE TABLE hive.logging.events (
  level VARCHAR,
  message VARCHAR,
  call_stack ARRAY(VARCHAR),
  event_time TIMESTAMP
) WITH (
  format = &apos;ORC&apos;,
  partitioned_by = ARRAY[&apos;event_time&apos;]
);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This time, the DDL command works successfully, but you likely don’t want to
partition your data on the plain timestamp. This results in a separate file for 
each distinct timestamp value in your table (likely almost one file per event). 
In Hive, there’s no native way to indicate the time granularity at which you 
want to partition. The workaround in Hive is to create a new 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VARCHAR&lt;/code&gt; column, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time_day&lt;/code&gt;, derived from the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time&lt;/code&gt; column, to hold the date partition value.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE TABLE hive.logging.events (
  level VARCHAR,
  event_time TIMESTAMP,
  message VARCHAR,
  call_stack ARRAY(VARCHAR),
  event_time_day VARCHAR
) WITH (
  format = &apos;ORC&apos;,
  partitioned_by = ARRAY[&apos;event_time_day&apos;]
);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This method wastes space by adding a new column to your table. Even worse,
it puts the burden of knowledge on the user to include this new column for 
writing data. It is then necessary to use that separate column for any read 
access to take advantage of the performance gains from the partitioning.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;INSERT INTO hive.logging.events
VALUES
(
  &apos;ERROR&apos;,
  timestamp &apos;2021-04-01 12:00:00.000001&apos;,
  &apos;Oh noes&apos;, 
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;], 
  &apos;2021-04-01&apos;
),
(
  &apos;ERROR&apos;,
  timestamp &apos;2021-04-02 15:55:55.555555&apos;,
  &apos;Double oh noes&apos;,
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;],
  &apos;2021-04-02&apos;
),
(
  &apos;WARN&apos;, 
  timestamp &apos;2021-04-02 00:00:11.1122222&apos;,
  &apos;Maybeh oh noes?&apos;,
  ARRAY [&apos;Bad things could be happening??&apos;], 
  &apos;2021-04-02&apos;
);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Notice that the partition value in the last column, such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&apos;2021-04-01&apos;&lt;/code&gt;, has to match the date of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TIMESTAMP&lt;/code&gt; 
during insertion. Hive performs no validation to make sure this happens, 
because it only requires a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VARCHAR&lt;/code&gt; and partitions on whatever 
distinct values it receives.&lt;/p&gt;
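&lt;p&gt;For example, Hive happily accepts the following insert even though the 
partition value contradicts the timestamp. This is a hypothetical statement for 
illustration only; the values are made up:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;INSERT INTO hive.logging.events
VALUES
(
  &apos;INFO&apos;,
  timestamp &apos;2021-04-03 09:00:00.000000&apos;,
  &apos;Wrong day&apos;,
  ARRAY [&apos;No stack trace&apos;],
  &apos;2021-04-01&apos; -- does not match the timestamp, but Hive accepts it
);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Queries that filter on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time_day&lt;/code&gt; would then silently return the wrong rows.&lt;/p&gt;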

&lt;p&gt;On the other hand, if a user runs the following query:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT *
FROM hive.logging.events
WHERE event_time &amp;lt; timestamp &apos;2021-04-02&apos;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;they get the correct results back, but have to scan all the data in the table:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;level&lt;/th&gt;
      &lt;th&gt;event_time&lt;/th&gt;
      &lt;th&gt;message&lt;/th&gt;
      &lt;th&gt;call_stack&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;2021-04-01 12:00:00&lt;/td&gt;
      &lt;td&gt;Oh noes&lt;/td&gt;
      &lt;td&gt;Exception in thread “main” java.lang.NullPointerException&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;This happens because the user forgot to include the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time_day &amp;lt; &apos;2021-04-02&apos;&lt;/code&gt; predicate in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WHERE&lt;/code&gt; 
clause. Omitting it eliminates all the benefits that led us to create the 
partition in the first place, and yet users of these tables frequently miss it.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT *
FROM hive.logging.events
WHERE event_time &amp;lt; timestamp &apos;2021-04-02&apos; 
AND event_time_day &amp;lt; &apos;2021-04-02&apos;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;level&lt;/th&gt;
      &lt;th&gt;event_time&lt;/th&gt;
      &lt;th&gt;message&lt;/th&gt;
      &lt;th&gt;call_stack&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;2021-04-01 12:00:00&lt;/td&gt;
      &lt;td&gt;Oh noes&lt;/td&gt;
      &lt;td&gt;Exception in thread “main” java.lang.NullPointerException&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h3 id=&quot;iceberg-partitions&quot;&gt;Iceberg Partitions&lt;/h3&gt;

&lt;p&gt;The following DDL statement illustrates how these issues are handled in Iceberg
via the Trino Iceberg connector.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE TABLE iceberg.logging.events (
  level VARCHAR,
  event_time TIMESTAMP(6),
  message VARCHAR,
  call_stack ARRAY(VARCHAR)
) WITH (
  partitioning = ARRAY[&apos;day(event_time)&apos;]
);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Take note of a few things. First, the partition on the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time&lt;/code&gt; 
column is defined without having to move the column to the last position. There 
is also no need to create a separate field to handle the daily partition on the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time&lt;/code&gt; field. The &lt;em&gt;&lt;strong&gt;partition specification&lt;/strong&gt;&lt;/em&gt; is maintained internally
by Iceberg, and neither the user nor the reader of this table needs to know 
anything about the partition specification to take advantage of it. This concept
is called &lt;em&gt;&lt;strong&gt;hidden partitioning&lt;/strong&gt;&lt;/em&gt;, where only the table creator/maintainer 
has to know the &lt;em&gt;&lt;strong&gt;partitioning specification&lt;/strong&gt;&lt;/em&gt;. Here is what the insert 
statements look like now:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;INSERT INTO iceberg.logging.events
VALUES
(
  &apos;ERROR&apos;,
  timestamp &apos;2021-04-01 12:00:00.000001&apos;,
  &apos;Oh noes&apos;, 
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;]
),
(
  &apos;ERROR&apos;,
  timestamp &apos;2021-04-02 15:55:55.555555&apos;,
  &apos;Double oh noes&apos;,
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;]
),
(
  &apos;WARN&apos;, 
  timestamp &apos;2021-04-02 00:00:11.1122222&apos;,
  &apos;Maybeh oh noes?&apos;,
  ARRAY [&apos;Bad things could be happening??&apos;]
);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VARCHAR&lt;/code&gt; dates are no longer needed. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time&lt;/code&gt; field is 
internally converted to the proper partition value for each row. Also,
notice that the same query that ran in Hive returns the same results. The big 
difference is that it doesn’t require an extra clause to filter on the 
partition in addition to filtering the results.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT *
FROM iceberg.logging.events
WHERE event_time &amp;lt; timestamp &apos;2021-04-02&apos;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;level&lt;/th&gt;
      &lt;th&gt;event_time&lt;/th&gt;
      &lt;th&gt;message&lt;/th&gt;
      &lt;th&gt;call_stack&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;2021-04-01 12:00:00&lt;/td&gt;
      &lt;td&gt;Oh noes&lt;/td&gt;
      &lt;td&gt;Exception in thread “main” java.lang.NullPointerException&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
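&lt;p&gt;If you want to peek at the partitions Iceberg maintains behind the scenes, 
the Trino Iceberg connector exposes them through the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$partitions&lt;/code&gt; 
metadata table. The exact output columns depend on your Trino version, so treat 
this as a sketch:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT *
FROM iceberg.logging.&quot;events$partitions&quot;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;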

&lt;p&gt;So hopefully that gives you a glimpse into what a table format and specification
are, and why Iceberg is such a wonderful improvement over the existing and 
outdated method of storing your data in your data lake. While this post covers
a lot of aspects of Iceberg’s capabilities, this is just the tip of the Iceberg…&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/blog/trino-on-ice/see_myself_out.gif&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;If you want to play around with Iceberg using Trino, check out the 
&lt;a href=&quot;https://trino.io/docs/current/connector/iceberg.html&quot;&gt;Trino Iceberg docs&lt;/a&gt;.
The next post covers how table evolution works in Iceberg, as well as how 
Iceberg is an improved storage format for cloud storage.&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>Welcome to the Trino on ice series, covering the details around how the Iceberg table format works with the Trino query engine. The examples build on each previous post, so it’s recommended to read the posts sequentially and reference them as needed later. Here are links to the posts in this series: Trino on ice I: A gentle introduction to Iceberg Trino on ice II: In-place table evolution and cloud compatibility with Iceberg Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec Trino on ice IV: Deep dive into Iceberg internals Back in the Gentle introduction to the Hive connector blog post, I discussed a commonly misunderstood architecture and uses of the Trino Hive connector. In short, while some may think the name indicates Trino makes a call to a running Hive instance, the Hive connector does not use the Hive runtime to answer queries. Instead, the connector is named Hive connector because it relies on Hive conventions and implementation details from the Hadoop ecosystem - the invisible Hive specification.</summary>

      
      
    </entry>
  
    <entry>
      <title>16: Make data fluid with Apache Druid</title>
      <link href="https://trino.io/episodes/16.html" rel="alternate" type="text/html" title="16: Make data fluid with Apache Druid" />
      <published>2021-04-29T00:00:00+00:00</published>
      <updated>2021-04-29T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/16</id>
      <content type="html" xml:base="https://trino.io/episodes/16.html">&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/16/trino-druid.png&quot; /&gt;&lt;br /&gt;
Commander Bun Bun the speedy druid!
&lt;/p&gt;

&lt;h2 id=&quot;druid-links&quot;&gt;Druid links&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://druid.apache.org/&quot;&gt;Apache Druid&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://druid.apache.org/community/&quot;&gt;Apache Druid Community&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.druidforum.org/&quot;&gt;Druid Forum&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;Samarth Jain, Software Engineer at Netflix 
 (&lt;a href=&quot;https://www.linkedin.com/in/samarthjain11/&quot;&gt;@samarthjain11&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Parth Brahmbhatt, Senior Software Engineer at Netflix 
 (&lt;a href=&quot;https://twitter.com/brahmbhattparth/&quot;&gt;@brahmbhattparth&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Rachel Pedreschi, VP Community and Developer Relations at 
 &lt;a href=&quot;https://imply.io/&quot;&gt;Imply&lt;/a&gt; (&lt;a href=&quot;https://twitter.com/rachelpedreschi&quot;&gt;@rachelpedreschi&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;release-356&quot;&gt;Release 356&lt;/h2&gt;

&lt;p&gt;Release notes discussed: &lt;a href=&quot;https://trino.io/docs/current/release/release-356.html&quot;&gt;https://trino.io/docs/current/release/release-356.html&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;General:
    &lt;ul&gt;
      &lt;li&gt;MATCH_RECOGNIZE clause support, used to detect patterns in a set of rows 
within a single query&lt;/li&gt;
      &lt;li&gt;soundex function&lt;/li&gt;
      &lt;li&gt;Property to limit planning time (and improved behavior about cancel during 
planning)&lt;/li&gt;
      &lt;li&gt;A bunch of performance improvements around pushdown (and start of docs for 
pushdowns)&lt;/li&gt;
      &lt;li&gt;Misc improvements around materialized views support&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;JDBC driver - OAuth2 token caching in memory&lt;/li&gt;
  &lt;li&gt;BigQuery - create and drop schema&lt;/li&gt;
  &lt;li&gt;Hive - Parquet, ORC and Azure ADL improvements&lt;/li&gt;
  &lt;li&gt;Iceberg - SHOW TABLES even when tables created elsewhere&lt;/li&gt;
  &lt;li&gt;Kafka - SSL support&lt;/li&gt;
  &lt;li&gt;Metadata caching improvements for a bunch of connectors&lt;/li&gt;
  &lt;li&gt;SPI: couple of changes&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-week-apache-druid-and-realtime-analytics&quot;&gt;Concept of the week: Apache Druid and realtime analytics&lt;/h2&gt;

&lt;p&gt;This week covers Apache Druid, a modern, real-time OLAP database. Joining us is 
the head of developer relations at Imply, the company that creates an enterprise
version of Druid, to cover what Druid is and the use cases it solves.&lt;/p&gt;

&lt;p&gt;Here are the slides that Rachel uses in the show:&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;iframe src=&quot;//www.slideshare.net/slideshow/embed_code/key/1fKHCGSRJwUjB7&quot; width=&quot;800&quot; height=&quot;650&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; 
margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; 
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h3 id=&quot;druid-architecture&quot;&gt;Druid Architecture&lt;/h3&gt;

&lt;p&gt;Druid has several process types:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Coordinator&lt;/strong&gt; processes manage data availability on the cluster.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Overlord&lt;/strong&gt; processes control the assignment of data ingestion workloads.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Broker&lt;/strong&gt; processes handle queries from external clients.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Router&lt;/strong&gt; processes are optional processes that can route requests to Brokers, Coordinators, and Overlords.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Historical&lt;/strong&gt; processes store queryable data.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;MiddleManager&lt;/strong&gt; processes are responsible for ingesting data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/16/druid-architecture.png&quot; /&gt;&lt;br /&gt;
The Druid architecture.
&lt;/p&gt;

&lt;p&gt;Druid processes can be deployed any way you like, but for ease of deployment we 
suggest organizing them into three server types: Master, Query, and Data.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Master: Runs Coordinator and Overlord processes, manages data availability and ingestion.&lt;/li&gt;
  &lt;li&gt;Query: Runs Broker and optional Router processes, handles queries from external clients.&lt;/li&gt;
  &lt;li&gt;Data: Runs Historical and MiddleManager processes, executes ingestion workloads and stores all queryable data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Source: &lt;a href=&quot;https://druid.apache.org/docs/latest/design/architecture.html&quot;&gt;https://druid.apache.org/docs/latest/design/architecture.html&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-3522-add-druid-connector&quot;&gt;PR of the week: PR 3522 Add Druid connector&lt;/h2&gt;

&lt;p&gt;Our guest, Samarth, is the author of this week’s 
&lt;a href=&quot;https://github.com/trinodb/trino/pull/3522&quot;&gt;PR of the week&lt;/a&gt;. 
&lt;a href=&quot;https://twitter.com/puneetjaiswal&quot;&gt;Puneet Jaiswal&lt;/a&gt; is the engineer who 
first started work on adding a Druid connector. Later, Samarth picked up the torch and 
the Trino Druid connector became available in 
&lt;a href=&quot;/docs/current/release/release-337.html&quot;&gt;release 337&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;An honorable mention goes to our other guest, Parth, for doing some 
&lt;a href=&quot;https://github.com/trinodb/trino/pull/3697&quot;&gt;preliminary work&lt;/a&gt; that enabled 
aggregation pushdown in the SPI. This enabled the use of the Druid connector to
actually scale well with the completion of PR 4313 (see future work below).&lt;/p&gt;

&lt;p&gt;A &lt;a href=&quot;https://github.com/trinodb/trino/pull/3881&quot;&gt;third honorable PR&lt;/a&gt;, 
completed by &lt;a href=&quot;https://twitter.com/findepi&quot;&gt;@findepi&lt;/a&gt;, added 
pushdown to the JDBC client, which appeared in release 337 along with the Druid 
connector.&lt;/p&gt;

&lt;p&gt;It is incredible to see the number of hands that various features and connectors
pass through to get to the final release.&lt;/p&gt;

&lt;h3 id=&quot;future-work&quot;&gt;Future work:&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/4249&quot;&gt;SPI and optimizer rule for connectors that can support complete topN (PR 4249)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/4313&quot;&gt;Implement aggregate pushdown for Druid (PR 4313)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/4554&quot;&gt;Optimizer rule to support aggregate pushdown with grouping sets (PR 4554)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;demo-using-the-druid-web-ui-to-create-an-ingestion-spec-querying-via-trino&quot;&gt;Demo: Using the Druid Web UI to create an ingestion spec querying via Trino&lt;/h2&gt;

&lt;p&gt;Let’s start up the Druid cluster along with the required Zookeeper and 
PostgreSQL instance. Clone this repository and navigate to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino-druid&lt;/code&gt;
directory.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone git@github.com:bitsondatadev/trino-getting-started.git

cd community_tutorials/druid/trino-druid

docker-compose up -d
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To run a batch ingestion, navigate to the Druid Web UI at 
&lt;a href=&quot;http://localhost:8888&quot;&gt;http://localhost:8888&lt;/a&gt; once it has finished 
starting up. Click the “Load data” button, choose “Example data”, and follow the 
prompts to create the native batch ingestion spec. Once the spec is created, run 
the job and ingest the data. More information can be found here: 
&lt;a href=&quot;https://druid.apache.org/docs/latest/tutorials/index.html&quot;&gt;https://druid.apache.org/docs/latest/tutorials/index.html&lt;/a&gt;&lt;/p&gt;
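&lt;p&gt;For reference, the Trino side of this setup is just a catalog properties file 
that points the Druid connector at the broker’s Avatica JDBC endpoint. The file 
location, hostname, and port below are assumptions based on a typical 
docker-compose setup; check the repository for the actual values:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# etc/catalog/druid.properties (assumed location)
connector.name=druid
connection-url=jdbc:avatica:remote:url=http://broker:8082/druid/v2/sql/avatica/
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;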

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/16/druid-console.png&quot; /&gt;&lt;br /&gt;
The Druid console.
&lt;/p&gt;

&lt;p&gt;Once Druid completes the task, open up a Trino connection and validate that the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;druid&lt;/code&gt; catalog exists.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;docker exec -it trino-druid_trino-coordinator_1 trino

trino&amp;gt; SHOW CATALOGS;

 Catalog 
---------
 druid   
 system  
 tpcds   
 tpch    
(4 rows)

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now show the tables under the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;druid.druid&lt;/code&gt; schema.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;trino&amp;gt; SHOW TABLES IN druid.druid;
   Table   
-----------
 wikipedia 
(1 row)

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Run a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW CREATE TABLE&lt;/code&gt;  to see the column definitions.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;trino&amp;gt; SHOW CREATE TABLE druid.druid.wikipedia;
             Create Table             
--------------------------------------
 CREATE TABLE druid.druid.wikipedia ( 
    __time timestamp(3) NOT NULL,     
    added bigint NOT NULL,            
    channel varchar,                  
    cityname varchar,                 
    comment varchar,                  
    commentlength bigint NOT NULL,    
    countryisocode varchar,           
    countryname varchar,              
    deleted bigint NOT NULL,          
    delta bigint NOT NULL,            
    deltabucket bigint NOT NULL,      
    diffurl varchar,                  
    flags varchar,                    
    isanonymous varchar,              
    isminor varchar,                  
    isnew varchar,                    
    isrobot varchar,                  
    isunpatrolled varchar,            
    metrocode varchar,                
    namespace varchar,                
    page varchar,                     
    regionisocode varchar,            
    regionname varchar,               
    user varchar                      
 )                                    
(1 row)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Finally, query the first 5 rows of data showing the user and how much they added.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;trino&amp;gt; SELECT user, added FROM druid.druid.wikipedia LIMIT 5;
      user       | added 
-----------------+-------
 Lsjbot          |    31 
 ワーナー成増    |   125 
 181.230.118.178 |     2 
 JasonAQuest     |     0 
 Kolega2357      |     0 
(5 rows)

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
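&lt;p&gt;Druid is built for aggregations, so a more representative query groups and 
sums rather than scanning raw rows. This is an illustrative query; the results 
depend on the sample data you ingested:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT channel, sum(added) AS total_added
FROM druid.druid.wikipedia
GROUP BY channel
ORDER BY total_added DESC
LIMIT 5;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;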

&lt;h2 id=&quot;question-of-the-week-why-doesnt-the-druid-connector-use-the-native-json-over-http-calls&quot;&gt;Question of the week: Why doesn’t the Druid connector use the native json over http calls?&lt;/h2&gt;

&lt;p&gt;To answer this question, I’m going to quote Samarth and Parth from 
&lt;a href=&quot;https://trinodb.slack.com/archives/CHD6386E4/p1589311502029000?thread_ts=1586167749.002500&amp;amp;cid=CHD6386E4&quot;&gt;this super long but enlightening thread&lt;/a&gt;
on the subject.&lt;/p&gt;

&lt;h3 id=&quot;samarths-take&quot;&gt;Samarth’s take:&lt;/h3&gt;

&lt;p&gt;Pro JDBC:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;Going forward, Druid SQL is going to be the de-facto way of accessing Druid 
 data, with native JSON queries being more of an advanced use case. A 
 benefit of going down the SQL route is that we can take advantage of all the 
 changes made in the Druid SQL optimizer land, like using vectorized query 
 processing when possible, when to use a TopN vs group by query type, etc. If we 
 were to hit historicals directly, which don’t support SQL querying, we 
 potentially won’t take advantage of such optimizations unless we keep
 porting/applying them to the trino-druid connector, which may not always be 
 possible.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;If we end up letting a Trino node act as a Druid broker (which is what 
 would happen, I assume, when you let a Trino node do the final merging), then 
 you would need to allocate similar kinds of resources (direct memory buffers, 
 etc.) to all the Trino worker nodes as a Druid broker, which may not be ideal.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;This is not necessarily a limitation but adds complexity - with your proposed 
 implementation, the Trino cluster will need to maintain state about what Druid
 segments are hosted on what data nodes (middle managers and historicals). The 
 Druid broker already maintains that state and having to replicate and store all
 that state on the Trino coordinator will demand more resources out of it.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;To your point on the SCAN query overwhelming the broker - that shouldn’t be the 
 case, as Druid’s scan query type streams results through the broker instead of 
 materializing all of them in memory. See: &lt;a href=&quot;https://druid.apache.org/docs/latest/querying/scan-query.html&quot;&gt;https://druid.apache.org/docs/latest/querying/scan-query.html&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Pro HTTP:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;One use case where directly hitting the historicals may help is when the 
 group by key space is large (like a group by on a UUID-like column). For a very 
 large data set, a Druid broker can get overwhelmed when performing the giant 
 merge. By hitting historicals directly, we can let historicals do the first-level 
 merge, followed by multiple Trino workers doing the second-level merge. I am 
 not sure if solving for this limited use case is worth going the HTTP native
 query route, though. IMHO, Druid generally isn’t built for pulling lots of 
 data out of it. You can do it, but whether you want to push that work down to 
 the Druid cluster or let Trino directly pull it down for you is debatable.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I would advocate for going the Druid SQL route, at least for the initial version 
of the connector. This would provide a solution for the majority of the use 
cases that Druid is generally used for (OLAP-style queries over pre-aggregated 
data). In the next version of the connector, we could possibly focus on adding a
new mode that makes native JSON queries directly to the Druid historicals and 
middle managers instead of submitting SQL queries to the broker.&lt;/p&gt;

&lt;h3 id=&quot;parths-take&quot;&gt;Parth’s take:&lt;/h3&gt;

&lt;p&gt;Our general take is that Druid is designed as an OLAP cube, so it is really fast
when it comes to aggregate queries over reasonable-cardinality dimensions, and 
it will not work well for use cases that treat it like a regular data 
warehouse and try to do pure select scans with filters. The primary reasons 
most of our users would look to Trino’s Druid connector are:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;To be able to join already aggregated data in Druid to some other datastore 
 in our warehouse.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;To gain access to Druid through tooling that doesn’t inherently have good 
 support for it, for dashboarding use cases (think Tableau).&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Even if we wanted to support the use cases that Druid is not designed for in a 
more efficient manner by going through historicals directly, it has other 
implications. We are now talking about partial aggregation pushdown, which is 
more complicated IMO than our current approach of complete pushdown. We could 
take the approach that others have taken and incrementally add a mode to the 
Druid connector to either use JDBC or go directly to historicals, 
but I really don’t think it’s a good idea to block the current development in 
hopes of a more efficient future version, especially when this is just an 
implementation detail that we can switch anytime without breaking any user 
queries.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Trino Summit:
&lt;a href=&quot;http://starburst.io/trinosummit2021&quot;&gt;http://starburst.io/trinosummit2021&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Blogs&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://netflixtechblog.com/how-netflix-uses-druid-for-real-time-insights-to-ensure-a-high-quality-experience-19e1e8568d06&quot;&gt;https://netflixtechblog.com/how-netflix-uses-druid-for-real-time-insights-to-ensure-a-high-quality-experience-19e1e8568d06&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://imply.io/post/apache-druid-joins&quot;&gt;https://imply.io/post/apache-druid-joins&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/gumgum-tech/optimized-real-time-analytics-using-spark-streaming-and-apache-druid-d872a86ed99d&quot;&gt;https://medium.com/gumgum-tech/optimized-real-time-analytics-using-spark-streaming-and-apache-druid-d872a86ed99d&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.inovex.de/blog/a-close-look-at-the-workings-of-apache-druid/&quot;&gt;https://www.inovex.de/blog/a-close-look-at-the-workings-of-apache-druid/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://leventov.medium.com/comparison-of-the-open-source-olap-systems-for-big-data-clickhouse-druid-and-pinot-8e042a5ed1c7&quot;&gt;https://leventov.medium.com/comparison-of-the-open-source-olap-systems-for-big-data-clickhouse-druid-and-pinot-8e042a5ed1c7&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Videos&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=mf8Hb0coI6o&quot;&gt;https://www.youtube.com/watch?v=mf8Hb0coI6o&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=Kxbzr7UP1dI&amp;amp;t=1274s&quot;&gt;https://www.youtube.com/watch?v=Kxbzr7UP1dI&amp;amp;t=1274s&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=QNmSXMQ-gY4&quot;&gt;https://www.youtube.com/watch?v=QNmSXMQ-gY4&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/15/training-advanced-sql.html&quot;&gt;Advanced SQL Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/30/training-query-tuning.html&quot;&gt;Query Tuning Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/13/training-security.html&quot;&gt;Security Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/27/training-performance.html&quot;&gt;Performance and Tuning Training&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Commander Bun Bun the speedy druid!</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino: The Definitive Guide</title>
      <link href="https://trino.io/blog/2021/04/21/the-definitive-guide.html" rel="alternate" type="text/html" title="Trino: The Definitive Guide" />
      <published>2021-04-21T00:00:00+00:00</published>
      <updated>2021-04-21T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/04/21/the-definitive-guide</id>
      <content type="html" xml:base="https://trino.io/blog/2021/04/21/the-definitive-guide.html">&lt;p&gt;Just over a year ago we &lt;a href=&quot;https://trino.io/blog/2020/04/11/the-definitive-guide.html&quot;&gt;announced the availability of the first book about
Trino&lt;/a&gt; - our
definitive guide. Back then the project was still called Presto, and the rename
with the end of 2020 was a good reason for us to give the book a refresh.&lt;/p&gt;

&lt;p&gt;Today, we are happy to announce that a new edition now titled &lt;strong&gt;Trino: The
Definitive Guide&lt;/strong&gt; is available.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h2 id=&quot;get-a-free-copy-of-trino-the-definitive-guide-from-starburst-now&quot;&gt;&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;Get a free copy of Trino: The Definitive Guide&lt;/a&gt; from &lt;a href=&quot;https://www.starburst.io&quot;&gt;Starburst&lt;/a&gt; now!&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;!--more--&gt;

&lt;p&gt;&lt;img src=&quot;/assets/ttdg-cover.png&quot; align=&quot;right&quot; style=&quot;float: right; margin-left: 20px; margin-bottom: 20px; width: 100%; max-width: 350px;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The new edition of the book from O’Reilly is available in digital formats
as well as physical copies. You can find more information about the book on &lt;a href=&quot;/trino-the-definitive-guide.html&quot;&gt;our
permanent page about it&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The book is now updated to Trino release 354 for all filenames, installation
methods, commands, names, and properties. We also addressed all problems found
by our readers and reported to us.&lt;/p&gt;

&lt;p&gt;Our major supporter, &lt;a href=&quot;https://www.starburst.io&quot;&gt;Starburst&lt;/a&gt;, allowed us to work
on the book and bring it across the finish line again. You can get a
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;free digital copy from Starburst&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;So what are you waiting for? Go get a copy, check out the &lt;a href=&quot;https://github.com/trinodb/trino-the-definitive-guide&quot;&gt;updated example code
repository&lt;/a&gt;,
provide feedback and contact us on &lt;a href=&quot;/slack.html&quot;&gt;Slack&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Looking forward to it all!&lt;/p&gt;

&lt;p&gt;Matt, Manfred and Martin&lt;/p&gt;</content>

      
        <author>
          <name>Matt Fuller, Manfred Moser and Martin Traverso</name>
        </author>
      

      <summary>Just over a year ago we announced the availability of the first book about Trino - our definitive guide. Back then the project was still called Presto, and the rename with the end of 2020 was a good reason for us to give the book a refresh. Today, we are happy to announce that a new edition now titled Trino: The Definitive Guide is available. Get a free copy of Trino: The Definitive Guide from Starburst now!</summary>

      
      
    </entry>
  
    <entry>
      <title>15: Iceberg right ahead!</title>
      <link href="https://trino.io/episodes/15.html" rel="alternate" type="text/html" title="15: Iceberg right ahead!" />
      <published>2021-04-15T00:00:00+00:00</published>
      <updated>2021-04-15T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/15</id>
      <content type="html" xml:base="https://trino.io/episodes/15.html">&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/15/trino-iceberg.png&quot; /&gt;&lt;br /&gt;
Looks like Commander Bun Bun is safe on this Iceberg&lt;br /&gt;
&lt;a href=&quot;https://joshdata.me/iceberger.html&quot;&gt;https://joshdata.me/iceberger.html&lt;/a&gt;
&lt;/p&gt;

&lt;h2 id=&quot;iceberg-links&quot;&gt;Iceberg links&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://iceberg.apache.org/&quot;&gt;Apache Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://iceberg.apache.org/community/&quot;&gt;Apache Iceberg Community&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;Ryan Blue, creator of Iceberg, and Senior Software Engineer at 
 Netflix (&lt;a href=&quot;https://github.com/rdblue&quot;&gt;@rdblue&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;David Phillips, creator of Trino/Presto, and CTO at 
 &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt; (&lt;a href=&quot;https://twitter.com/electrum32&quot;&gt;@electrum32&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;release-355&quot;&gt;Release 355&lt;/h2&gt;

&lt;p&gt;Release notes discussed: &lt;a href=&quot;https://trino.io/docs/current/release/release-355.html&quot;&gt;https://trino.io/docs/current/release/release-355.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Martin’s list:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Multiple password authentication plugins&lt;/li&gt;
  &lt;li&gt;Column and table lineage reporting in query events&lt;/li&gt;
  &lt;li&gt;Improved planning performance for queries against Phoenix or SQL Server&lt;/li&gt;
  &lt;li&gt;Improved performance for ORDER BY … LIMIT queries against Phoenix&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred’s notes:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Security overview and TLS pages and authentication types&lt;/li&gt;
  &lt;li&gt;Reiterate multiple authentication providers (ldap1, ldap2, password)&lt;/li&gt;
  &lt;li&gt;Improved parallelism when the table bucket count is small compared to the number of nodes.&lt;/li&gt;
  &lt;li&gt;Include information about Spill to disk in EXPLAIN ANALYZE&lt;/li&gt;
  &lt;li&gt;Unixtime function changes&lt;/li&gt;
  &lt;li&gt;Hive view support improvements&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-week-apache-iceberg-and-the-iceberg-spec&quot;&gt;Concept of the week: Apache Iceberg and the Iceberg spec&lt;/h2&gt;

&lt;h3 id=&quot;interview-with-ryan-blue&quot;&gt;Interview with Ryan Blue&lt;/h3&gt;

&lt;p&gt;In &lt;a href=&quot;/episodes/14.html&quot;&gt;the previous episode&lt;/a&gt;, we covered the 
differences between the Iceberg table format, and the Hive table format from a 
technical standpoint in the context of Trino. We highly recommend watching it
before this episode. In this episode we ask Ryan about the origins of Apache 
Iceberg and why he started the project. We cover some details of the 
&lt;a href=&quot;https://iceberg.apache.org/spec/&quot;&gt;Iceberg specification&lt;/a&gt; which is a nice change
from the ad-hoc specification that people adhere to when using Hive tables. Then
Ryan dives into several amazing use cases showing how Netflix and others use Iceberg.&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-7233-fix-queries-on-tables-without-snapshot-id&quot;&gt;PR of the week: PR 7233 Fix queries on tables without snapshot id&lt;/h2&gt;

&lt;p&gt;This week’s &lt;a href=&quot;https://github.com/trinodb/trino/pull/7233&quot;&gt;PR of the week&lt;/a&gt; was 
submitted by one of the Trino maintainers,
&lt;a href=&quot;https://twitter.com/desai_pratham&quot;&gt;Pratham Desai&lt;/a&gt;. Pratham is a Software 
Engineer at LinkedIn who commits a lot of time to the Trino community, helping
out on the Slack channel, contributing code, and doing PR reviews. Thank you for
all you do, Pratham!&lt;/p&gt;

&lt;p&gt;Had Brian known about this PR, he wouldn’t have had the issue he did with 
reading the empty snapshot created with the Iceberg Java API and would have been 
able to read and insert into the table just fine. If you come across this issue,
we introduced this feature in 
&lt;a href=&quot;/docs/current/release/release-344.html&quot;&gt;release 344&lt;/a&gt;!&lt;/p&gt;

&lt;h3 id=&quot;another-future-development-for-the-trino-iceberg-connector&quot;&gt;Another future development for the Trino Iceberg connector&lt;/h3&gt;

&lt;p&gt;Along with the future developments we discussed in the previous episode, another
core Iceberg functionality that we want to add in Trino is support for
&lt;a href=&quot;https://github.com/trinodb/trino/issues/7580&quot;&gt;partition migration&lt;/a&gt;. We also 
discussed future support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; capabilities for the Iceberg 
connector.&lt;/p&gt;

&lt;h2 id=&quot;demo-creating-tables-with-iceberg-and-reading-the-data-in-trino&quot;&gt;Demo: Creating tables with Iceberg and reading the data in Trino&lt;/h2&gt;

&lt;p&gt;For this week’s demo, we continue to use the Iceberg Java API to create a table.
You also have the option to use Trino, Spark, or other engines to ingest and query the
data, but I wanted to use the vanilla Iceberg APIs to experience the API and
hopefully solidify my learning of Iceberg concepts in the process. Make sure you
follow the instructions in the repository if you don’t have Docker or Java
installed.&lt;/p&gt;

&lt;p&gt;Let’s start up a local Trino coordinator and Hive metastore. Clone this 
repository and navigate to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iceberg/trino-iceberg-minio&lt;/code&gt; directory. Then
start up the containers using Docker Compose.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone git@github.com:bitsondatadev/trino-getting-started.git

cd iceberg/trino-iceberg-minio

docker-compose up -d
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In your favorite IDE, open the files under &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iceberg/iceberg-java&lt;/code&gt; into your
project and run the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IcebergMain&lt;/code&gt; class.&lt;/p&gt;

&lt;p&gt;This class creates the logging schema and a logging table if they don’t already
exist. Once you run this code, you can verify that the table exists in the 
metastore under &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TABLE_PARAMS&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Now we transition from the Java API to running queries over Iceberg using Trino.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/**
 * This is the equivalent of running IcebergMain in the iceberg-java project.
 * Go ahead and inspect the java code you can use to interact with Iceberg
 * tables and metadata.
 */
CREATE TABLE iceberg.logging.logs (
   level varchar NOT NULL,
   event_time timestamp(6) with time zone NOT NULL,
   message varchar NOT NULL,
   call_stack array(varchar)
)
WITH (
   format = &apos;ORC&apos;,
   partitioning = ARRAY[&apos;hour(event_time)&apos;,&apos;level&apos;]
)

/**
 * Read From Trino
 */

SELECT * FROM iceberg.logging.logs;

/**
 * Write data from Trino and check data and snapshots
 */

INSERT INTO iceberg.logging.logs VALUES 
(
  &apos;ERROR&apos;, 
  timestamp &apos;2021-04-01&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;, 
  &apos;Oh noes&apos;,
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;]
);

SELECT * FROM iceberg.logging.logs;

SELECT * FROM iceberg.logging.&quot;logs$snapshots&quot;;

/**
 * Write more data from Trino and check data and snapshots
 */
INSERT INTO iceberg.logging.logs 
VALUES 
(
  &apos;ERROR&apos;, 
  timestamp &apos;2021-04-01&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;, 
  &apos;Oh noes&apos;, 
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;]
), 
(
  &apos;ERROR&apos;, 
  timestamp &apos;2021-04-01 15:55:23.383345&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;, 
  &apos;Double oh noes&apos;, 
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;]
), 
(
  &apos;WARN&apos;, 
  timestamp &apos;2021-04-01 15:55:23.383345&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;, 
  &apos;Maybeh oh noes?&apos;, 
  ARRAY [&apos;bad things could be happening&apos;]
);

 
SELECT * FROM iceberg.logging.logs;

SELECT * FROM iceberg.logging.&quot;logs$snapshots&quot;;

/**
 * Read data from an old snapshot (Time travel)
 */

SELECT * FROM iceberg.logging.&quot;logs@2806470637437034115&quot;;

/**
 * Add new column, notice there is no snapshots of the metadata
 */

ALTER TABLE iceberg.logging.logs ADD COLUMN severity INTEGER;

SHOW CREATE TABLE iceberg.logging.logs;

SELECT * FROM iceberg.logging.logs;

SELECT * FROM iceberg.logging.&quot;logs$snapshots&quot;;

/**
 * Insert new data with new column
 */

INSERT INTO iceberg.logging.logs VALUES 
(
  &apos;INFO&apos;, 
  timestamp &apos;2021-04-01 19:59:59.999999&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;, 
  &apos;es muy bueno&apos;, 
  ARRAY [&apos;It is all normal&apos;], 
  1
);

SELECT * FROM iceberg.logging.logs;

SELECT * FROM iceberg.logging.&quot;logs$snapshots&quot;;

/**
 * Rename column and drop column
 */

ALTER TABLE iceberg.logging.logs RENAME COLUMN severity TO priority;

SHOW CREATE TABLE iceberg.logging.logs;

SELECT * FROM iceberg.logging.logs;

ALTER TABLE iceberg.logging.logs DROP COLUMN priority;

SHOW CREATE TABLE iceberg.logging.logs;

SELECT * FROM iceberg.logging.logs;

SELECT * FROM iceberg.logging.&quot;logs$snapshots&quot;;

/**
 * Travel back to previous snapshots
 */

SELECT * FROM iceberg.logging.logs;

SELECT * FROM iceberg.logging.&quot;logs$snapshots&quot;;

SELECT * FROM iceberg.logging.&quot;logs@&amp;lt;insert-earlier-snapshot&amp;gt;&quot;;

CALL system.rollback_to_snapshot(&apos;logging&apos;, &apos;logs&apos;, &amp;lt;insert-earlier-snapshot&amp;gt;)

/**
 * Back to the future snapshot
 */

SELECT * FROM iceberg.logging.&quot;logs$snapshots&quot;;

SELECT * FROM iceberg.logging.&quot;logs@&amp;lt;insert-latest-snapshot&amp;gt;&quot;;

CALL system.rollback_to_snapshot(&apos;logging&apos;, &apos;logs&apos;, &amp;lt;insert-latest-snapshot&amp;gt;)

SELECT * FROM iceberg.logging.logs;

SELECT * FROM iceberg.logging.&quot;logs$partitions&quot;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;question-of-the-week-what-do-i-do-to-restart-the-test-pipeline-if-it-fails-on-me&quot;&gt;Question of the week: What do I do to restart the test pipeline if it fails on me?&lt;/h2&gt;

&lt;p&gt;When developing with Trino, there is an automated build that acts as 
verification of any PR. It is powered by a GitHub actions definition and runs 
all the tests in Trino when developers add new code. Sometimes tests unrelated to
the changes in your PR fail, which makes your PR appear unmergeable even though 
the failure is actually unrelated to your changes.&lt;/p&gt;

&lt;p&gt;Developers are aware of these flaky tests, and need a mechanism to resubmit 
their PR and rerun the tests. There is unfortunately no way to enable users to 
rerun tests through GitHub without write permissions to the Trino repository, so
you have to do a dummy commit.&lt;/p&gt;

&lt;p&gt;This can easily be done using this one-line hack 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;git commit --amend --no-edit &amp;amp;&amp;amp; git push -f&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The good news is, we have gone through some extensive lengths to identify flaky
tests in the last year. These test failures are much rarer now, and we are 
constantly improving the build stability as an ongoing effort.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;h3 id=&quot;wtd-portland&quot;&gt;WTD Portland&lt;/h3&gt;

&lt;p&gt;Interested in supporting the Trino project, but don’t know where to start? A 
good place to start, with a lower barrier to entry, is adding to the 
documentation. We will be supporting the 
&lt;a href=&quot;https://trino.io/blog/2021/04/14/wtd-writing-day.html&quot;&gt;writing day&lt;/a&gt; at the
Write the Docs (WTD) Portland conference this April! Join us to learn how to get involved!&lt;/p&gt;

&lt;h3 id=&quot;virtual-trino-meetups&quot;&gt;Virtual Trino meetups&lt;/h3&gt;

&lt;p&gt;Come join us for the inaugural Virtual Trino meetup on April 21st in the virtual
meetup group in your region! See &lt;a href=&quot;./community.html&quot;&gt;the community page&lt;/a&gt; for more
details.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/events/277246268/&quot;&gt;Trino Americas meetup&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/events/277246173/&quot;&gt;Trino EMEA meetup&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/events/277246078/&quot;&gt;Trino APAC meetup&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At these meetups, the four Trino/Presto founders will be updating everyone on 
the state of Trino. We’ll discuss the rebrand, talk about the recent features, 
and discuss the trajectory of the project. Then we will host a hangout and an
ask me anything (AMA) session. Hope to see you all there!&lt;/p&gt;

&lt;p&gt;Blogs&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/05/03/a-gentle-introduction-to-iceberg.html&quot;&gt;Trino on ice I: A gentle introduction to Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/12/in-place-table-evolution-and-cloud-compatibility-with-iceberg.html&quot;&gt;Trino on ice II: In-place table evolution and cloud compatibility with Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/30/iceberg-concurrency-snapshots-spec.html&quot;&gt;Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/08/12/deep-dive-into-iceberg-internals.html&quot;&gt;Trino on ice IV: Deep dive into Iceberg internals&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/expedia-group-tech/a-short-introduction-to-apache-iceberg-d34f628b6799&quot;&gt;https://medium.com/expedia-group-tech/a-short-introduction-to-apache-iceberg-d34f628b6799&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://engineering.linkedin.com/blog/2021/fastingest-low-latency-gobblin&quot;&gt;https://engineering.linkedin.com/blog/2021/fastingest-low-latency-gobblin&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/adobetech/iceberg-at-adobe-88cf1950e866&quot;&gt;https://medium.com/adobetech/iceberg-at-adobe-88cf1950e866&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/adobetech/high-throughput-ingestion-with-iceberg-ccf7877a413f&quot;&gt;https://medium.com/adobetech/high-throughput-ingestion-with-iceberg-ccf7877a413f&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/adobetech/taking-query-optimizations-to-the-next-level-with-iceberg-6c968b83cd6f&quot;&gt;https://medium.com/adobetech/taking-query-optimizations-to-the-next-level-with-iceberg-6c968b83cd6f&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://thenewstack.io/apache-iceberg-a-different-table-design-for-big-data/&quot;&gt;https://thenewstack.io/apache-iceberg-a-different-table-design-for-big-data/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Videos&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=mf8Hb0coI6o&quot;&gt;https://www.youtube.com/watch?v=mf8Hb0coI6o&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=Kxbzr7UP1dI&amp;amp;t=1274s&quot;&gt;https://www.youtube.com/watch?v=Kxbzr7UP1dI&amp;amp;t=1274s&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=QNmSXMQ-gY4&quot;&gt;https://www.youtube.com/watch?v=QNmSXMQ-gY4&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/15/training-advanced-sql.html&quot;&gt;Advanced SQL Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/30/training-query-tuning.html&quot;&gt;Query Tuning Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/13/training-security.html&quot;&gt;Security Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/27/training-performance.html&quot;&gt;Performance and Tuning Training&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Looks like Commander Bun Bun is safe on this Iceberg https://joshdata.me/iceberger.html</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino at Writing Day</title>
      <link href="https://trino.io/blog/2021/04/14/wtd-writing-day.html" rel="alternate" type="text/html" title="Trino at Writing Day" />
      <published>2021-04-14T00:00:00+00:00</published>
      <updated>2021-04-14T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/04/14/wtd-writing-day</id>
      <content type="html" xml:base="https://trino.io/blog/2021/04/14/wtd-writing-day.html">&lt;p&gt;First time Trino blogger, long time lurker on the Trino slack. My name is 
&lt;a href=&quot;https://twitter.com/ZelWms&quot;&gt;Rose Williams&lt;/a&gt; and I’m an open source docs enthusiast! 
I’ve had the pleasure of contributing to this community for the past few months. 
Recently I’ve been working with &lt;a href=&quot;https://twitter.com/bitsondatadev&quot;&gt;Brian Olsen&lt;/a&gt;, our fearless 
developer advocate, as well as some of our other Trino doc contributors, to get 
Trino ready for the Write the Docs &lt;a href=&quot;https://www.writethedocs.org/conf/portland/2021/writing-day/&quot;&gt;Writing Day&lt;/a&gt; open source event!&lt;/p&gt;

&lt;p&gt;If you’re not familiar with &lt;a href=&quot;https://www.writethedocs.org&quot;&gt;Write the Docs&lt;/a&gt;, it’s
a global community of people who care about documentation.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“We consider everyone who cares about communication, documentation, and their
users to be a member of our community. This can be programmers, tech writers,
developer advocates, customer support, marketers, and anyone else who wants
people to have great experiences with software.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href=&quot;https://www.writethedocs.org/conf/portland/2021/writing-day/&quot;&gt;Writing Day&lt;/a&gt; is
the first day of their upcoming virtual documentation conference, &lt;a href=&quot;https://www.writethedocs.org/conf/portland/2021/&quot;&gt;Write the
Docs Portland (PST)&lt;/a&gt; April
25-27, 2021. The goal of Writing Day is to get a bunch of interesting people in
a room together and introduce them to cool open source projects that they can
onboard and contribute to.&lt;/p&gt;

&lt;p&gt;Writing Day is open to all conference attendees and several Trino enthusiasts are
attending as mentors. Leading up to the conference, we’re focused on identifying
docs issues that are ideal for first time contributors. If you’re a regular
Trino contributor, you might notice that we’re going through and tagging items
as “good first issue” and “docs” - we’ll be using those tags to create an 
&lt;a href=&quot;https://github.com/trinodb/trino/issues?q=is%3Aopen+label%3Adocs+label%3A%22good+first+issue%22&quot;&gt;issues filter&lt;/a&gt; 
for the event. We’re also doing some work on the Trino docs readme to
help folks onboard faster.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.writethedocs.org/conf/portland/2021/tickets/&quot;&gt;Snag a ticket&lt;/a&gt; if
you’re interested in participating, we hope to see you there! Our goal is to
continue curating good first issues for future writers and developers.&lt;/p&gt;

&lt;p&gt;Join the new &lt;a href=&quot;https://trinodb.slack.com/messages/C01TEP0HJTH&quot;&gt;#documentation channel&lt;/a&gt; 
on the &lt;a href=&quot;./slack.html&quot;&gt;Trino slack&lt;/a&gt; and 
&lt;a href=&quot;https://github.com/trinodb/trino/stargazers&quot;&gt;favorite the Trino project&lt;/a&gt; on GitHub.&lt;/p&gt;

&lt;p&gt;If you’re interested in learning more about &lt;a href=&quot;https://www.writethedocs.org&quot;&gt;Write the Docs&lt;/a&gt; 
or &lt;a href=&quot;https://www.writethedocs.org/conf/portland/2021/writing-day/&quot;&gt;Writing Day&lt;/a&gt;, 
feel free to reach out to me (&lt;a href=&quot;https://twitter.com/ZelWms&quot;&gt;Rose Williams&lt;/a&gt;), 
&lt;a href=&quot;https://twitter.com/bitsondatadev&quot;&gt;Brian Olsen&lt;/a&gt;, or 
&lt;a href=&quot;https://twitter.com/mosabua&quot;&gt;Manfred Moser&lt;/a&gt; on twitter or the &lt;a href=&quot;./slack.html&quot;&gt;Trino slack&lt;/a&gt;. You 
can also check out the Write the Docs &lt;a href=&quot;https://www.writethedocs.org/slack/&quot;&gt;slack community&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you have an open source project that you’re interested in bringing to Writing
Day, chat with me, &lt;a href=&quot;https://twitter.com/ZelWms&quot;&gt;Rose Williams&lt;/a&gt;, on twitter or 
on the Trino or Write the Doc slack communities.&lt;/p&gt;</content>

      
        <author>
          <name>Rose Williams (she/her)</name>
        </author>
      

      <summary>First time Trino blogger, long time lurker on the Trino slack. My name is Rose Williams and I’m an open source docs enthusiast! I’ve had the pleasure of contributing to this community for the past few months. Recently I’ve been working with Brian Olsen, our fearless developer advocate, as well as some of our other Trino doc contributors, to get Trino ready for the Write the Docs Writing Day open source event!</summary>

      
      
    </entry>
  
    <entry>
      <title>14: Iceberg: March of the Trinos</title>
      <link href="https://trino.io/episodes/14.html" rel="alternate" type="text/html" title="14: Iceberg: March of the Trinos" />
      <published>2021-04-01T00:00:00+00:00</published>
      <updated>2021-04-01T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/14</id>
      <content type="html" xml:base="https://trino.io/episodes/14.html">&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/14/trino-penguin.png&quot; /&gt;&lt;br /&gt;
March of the Trinos! Be careful Commander Bun Bun! That Iceberg doesn&apos;t look stable!&lt;br /&gt;
&lt;a href=&quot;https://joshdata.me/iceberger.html&quot;&gt;https://joshdata.me/iceberger.html&lt;/a&gt;
&lt;/p&gt;

&lt;h2 id=&quot;iceberg-links&quot;&gt;Iceberg links&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://iceberg.apache.org/&quot;&gt;Apache Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://iceberg.apache.org/community/&quot;&gt;Apache Iceberg Community&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;David Phillips, creator of Trino/Presto, and CTO at 
 &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt; (&lt;a href=&quot;https://twitter.com/electrum32&quot;&gt;@electrum32&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;release-354&quot;&gt;Release 354&lt;/h2&gt;

&lt;p&gt;Release notes discussed: &lt;a href=&quot;https://trino.io/docs/current/release/release-354.html&quot;&gt;https://trino.io/docs/current/release/release-354.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Martin’s list:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Support for OAuth 2.0 in CLI&lt;/li&gt;
  &lt;li&gt;Support for MemSQL 3.2&lt;/li&gt;
  &lt;li&gt;Pushdown of ORDER BY … LIMIT for MemSQL, MySQL and SQL Server connectors&lt;/li&gt;
  &lt;li&gt;Support for time(p) in SQL Server&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred’s notes:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;LEFT, RIGHT and FULL JOIN&lt;/li&gt;
  &lt;li&gt;Preferred write partitioning on by default (needs statistics)&lt;/li&gt;
  &lt;li&gt;Small but useful fix on Elasticsearch (single value array)&lt;/li&gt;
  &lt;li&gt;Hive connector&lt;/li&gt;
  &lt;li&gt;Fix ACID table DELETE and UPDATE - critical fix is in! Boom!&lt;/li&gt;
  &lt;li&gt;Avro format improvement&lt;/li&gt;
  &lt;li&gt;CSV and Glue metadata improvement&lt;/li&gt;
  &lt;li&gt;Iceberg - date and timestamp improvement&lt;/li&gt;
  &lt;li&gt;CREATE SCHEMA fixes  in MySQL, PostgreSQL, Redshift and SQL Server&lt;/li&gt;
  &lt;li&gt;Bunch of other fixes in those connectors&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-week-apache-iceberg-and-the-table-format&quot;&gt;Concept of the week: Apache Iceberg and the table format&lt;/h2&gt;

&lt;h3 id=&quot;the-hive-table-format&quot;&gt;The Hive table format&lt;/h3&gt;

&lt;p&gt;For the last decade or so, the only option big data professionals had for
querying their data was, in some shape or form, the Hive model. The Hive model
is very simple, but it enabled running queries over files in a distributed file
system.&lt;/p&gt;

&lt;p&gt;To accomplish this, Hive uses a metastore service which stores and manages
metadata. For Hive and Trino, this metadata acts as a pointer to the files
containing the data, contains the file format, and has the column structure and
types. This enabled Hive to query the correct files and data within those files
for a SQL query. For more information on Hive’s architecture, read the
&lt;a href=&quot;/blog/2020/10/20/intro-to-hive-connector.html&quot;&gt;Gentle Introduction to Hive&lt;/a&gt;
blog. After the initial model gained adoption, Hive added other features such as
partitioning. It uses the directory structures of the filesystems to split the 
files of data partitioned on a special column into different directories. We 
talk about this in more depth &lt;a href=&quot;/episodes/5.html&quot;&gt;a few episodes back&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The Hive model solved some of the initial issues facing engineers in big data,
but it has quite a few problems of its own. It is very rigid and cannot adapt
as your requirements change. For example, if you start partitioning your data
by date, segmented by month, that table is stuck with that partitioning forever.
The only way to change it is to create a new table with the new partition
values, and migrate all of your data from the old table to the new one. At
common data sizes, such a migration is often a long process, sometimes even
impossible. Another issue stems from the separation between data stored in the
metastore and data stored in the file system: many problems in Hive are caused
by the metastore getting out of sync with the files. A third, but not final,
issue is that metastore operations like listing files are time-consuming on
more modern object storage.&lt;/p&gt;

&lt;p&gt;As all these problems amassed over the years, it became clear that something
needed to be done. In the last few years, a few candidate table formats have
come to the forefront of data engineering trends: Apache Iceberg, Apache Hudi,
and Databricks’ proprietary Delta Lake. The goal of these systems is to
modernize the old Hive data structure. For Trino, Iceberg is particularly
promising thanks to features like schema versioning support and hidden
partitioning. Let’s talk about some of these features in detail.&lt;/p&gt;

&lt;h3 id=&quot;the-iceberg-table-format&quot;&gt;The Iceberg table format&lt;/h3&gt;

&lt;p&gt;Iceberg is a new table format developed at Netflix that aims to replace older
table formats like Hive, adding better flexibility as the schema evolves, atomic
operations, speed, and overall dependability. To be clear, it’s not a new file
format, as it still uses ORC, Parquet, and Avro, but a table format. Netflix
donated Iceberg to the Apache Software Foundation and it is now a top-level
project!&lt;/p&gt;

&lt;p&gt;Iceberg stores the data on disk just like Hive, but instead of a central
metastore it stores the metadata in manifest files on disk along with the data
itself. These &lt;em&gt;manifest files&lt;/em&gt; are Avro files that contain table metadata and
list a subset of data files. &lt;em&gt;Manifest lists&lt;/em&gt; are a special type of manifest
file that point to other manifest files. &lt;em&gt;Snapshots&lt;/em&gt; contain a manifest list
that points to all the manifest files that belong to the snapshot. Another huge
difference from Hive is that the manifest files track table data at the file
level, as opposed to the directory level that Hive uses. By doing so, Iceberg
avoids having to list all files in a directory, which is a very common and
expensive operation.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
  &lt;img align=&quot;center&quot; width=&quot;60%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/14/iceberg-metadata.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;By tracking files this way, we not only get better performance from object
storage, it also enables serializable isolation. This addresses the lack of
consistency between the metadata and file state experienced in Hive.&lt;/p&gt;
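&lt;p&gt;In Trino, you can inspect this metadata yourself through the hidden metadata
tables that the Iceberg connector exposes. As a rough sketch, assuming a
hypothetical &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;events&lt;/code&gt; table in an
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iceberg.logging&lt;/code&gt; schema (the exact
columns vary by Trino version):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- List the manifest files tracked by the current snapshot
SELECT * FROM iceberg.logging.&quot;events$manifests&quot;;

-- List the individual data files, with per-file record counts
SELECT file_path, record_count FROM iceberg.logging.&quot;events$files&quot;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;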

&lt;p&gt;One of the greater advantages of Iceberg over Hive is in-place table
evolution. You can add, drop, rename, reorder, or update a column without any
expensive refactoring of tables or moving data around, and with no adverse
effects on your data or metadata.&lt;/p&gt;

&lt;p&gt;Partition evolution and hidden partitions are particularly invaluable. In
Iceberg, the &lt;em&gt;partition spec&lt;/em&gt; describes how to partition data in a table, and
consists of a list of source columns and transforms. Once the spec is created,
it generates a partition tuple that is applied uniformly to the files created
with that spec. Unlike Hive, which requires you to compute and write a special
column that acts as the partition value, Iceberg stores partition values
unmodified. Here’s an example partition spec generated with the Java API.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;PartitionSpec spec = PartitionSpec.builderFor(schema)
        .hour(&quot;event_time&quot;)
        .identity(&quot;level&quot;)
        .build();
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This example creates an hourly partition on the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time&lt;/code&gt; field and
uses the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;identity()&lt;/code&gt; transform to generate another level of partitioning
on the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;level&lt;/code&gt; field. If, at a later time, you decide you are getting too many
small files because your partitions are too small, you can update the
partition spec and Iceberg starts writing new files according to the updated
spec. Again, this is all without creating a new table or moving data around,
and all queries continue to return correct results. This kind of evolution is a
problem with Hive.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
  &lt;img align=&quot;center&quot; width=&quot;60%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/14/partition-spec-evolution.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;
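&lt;p&gt;For comparison, a similar layout can be declared from Trino at table creation
time with the Iceberg connector’s partitioning table property. This is only a
sketch with a hypothetical &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;events&lt;/code&gt; table; check your Trino version for the
supported partitioning transforms:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Hourly partitions on event_time, plus identity partitioning on level
CREATE TABLE iceberg.logging.events (
    event_time timestamp(6),
    level varchar,
    message varchar
)
WITH (partitioning = ARRAY['hour(event_time)', 'level']);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;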

&lt;p&gt;If all that isn’t enough, you can also do time travel and version rollback with
Iceberg. As we mentioned above, Iceberg keeps track of various snapshots of
your data over time through manifest files. As long as you keep those older
snapshots around, the files associated with them stick around as well. This
allows you to move back to previous views of the data, which is useful for
testing, recovery, and many other purposes. Just as you can time travel, you
can make the trip permanent by rolling back any unintended changes and deleting
the undesired snapshot.&lt;/p&gt;
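&lt;p&gt;From Trino, time travel starts with finding a snapshot to go back to. Here is
a hedged sketch against a hypothetical &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;events&lt;/code&gt; table; the snapshot ID is made
up, and the rollback procedure’s availability depends on your Trino version:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Inspect the snapshots Iceberg has retained for the table
SELECT snapshot_id, committed_at
FROM iceberg.logging.&quot;events$snapshots&quot;;

-- Roll the table back to an earlier snapshot
CALL iceberg.system.rollback_to_snapshot('logging', 'events', 2873264784572345678);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;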

&lt;p&gt;Iceberg is also able to offer fast scan planning by filtering out the metadata
files that are irrelevant to the scan. Using the partition spec, Iceberg
compares the partition value ranges recorded in the metadata against the query
to skip manifest files that cannot contain matching data. Then, while
processing the remaining manifest files, Iceberg filters data files by the
query predicates on partition values, and applies column statistics to help
prune out files that don’t match. Iceberg can also process manifests in
parallel to speed things up as a final measure.&lt;/p&gt;

&lt;p&gt;Saving the best for last: Iceberg is a community standard and has
&lt;a href=&quot;https://iceberg.apache.org/spec/&quot;&gt;a full written specification&lt;/a&gt;, which is a nice
change from Hive, whose ad-hoc, unwritten specification people adhere to in
varying ways. There have been many issues over the years due to the different
ways that unwritten specification gets interpreted. A written spec not only
enables people to understand how to use Iceberg, but documents how others can
implement the same features in entirely different systems. Let’s save a deep
dive on the spec for the next episode, when we bring on Ryan Blue, creator of
Iceberg, to dig into these details.&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-1067-add-iceberg-connector&quot;&gt;PR of the week: PR 1067 Add Iceberg connector&lt;/h2&gt;

&lt;p&gt;A huge shoutout goes to &lt;a href=&quot;https://github.com/Parth-Brahmbhatt&quot;&gt;Parth Brahmbhatt&lt;/a&gt;,
a Senior Software Engineer at Netflix, who created this week’s
&lt;a href=&quot;https://github.com/trinodb/trino/pull/1067&quot;&gt;PR of the week&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/docs/current/release/release-318.html&quot;&gt;Release 318&lt;/a&gt;
introduced this code, which supported querying Apache Iceberg tables from
Trino. While the code existed, the Iceberg connector wasn’t officially
released or documented until a little over a year later, in
&lt;a href=&quot;/docs/current/release/release-341.html&quot;&gt;release 341&lt;/a&gt;, once the connector reached
maturity.&lt;/p&gt;

&lt;h3 id=&quot;future-development-for-the-trino-iceberg-connector&quot;&gt;Future development for the Trino Iceberg connector&lt;/h3&gt;

&lt;p&gt;Still, there are some strange artifacts that we’re facing in the connector
today. For example, if you create a table with the Iceberg Java API, &lt;a href=&quot;https://github.com/apache/iceberg/blob/996ed979f396f2c7cc12ca824a3fe758f2c486ce/hive/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java#L222&quot;&gt;it creates
Iceberg tables with &amp;lt;table_type, ICEBERG&amp;gt;&lt;/a&gt;
but Trino &lt;a href=&quot;https://github.com/prestosql/presto/blob/master/presto-iceberg/src/main/java/io/prestosql/plugin/iceberg/HiveTableOperations.java#L190&quot;&gt;creates and reads tables with &amp;lt;table_type, iceberg&amp;gt;&lt;/a&gt;.
See &lt;a href=&quot;https://github.com/trinodb/trino/issues/1592&quot;&gt;Issue 1592&lt;/a&gt; for status and
details. In general, we can track some of the broader changes being made
to &lt;a href=&quot;https://github.com/trinodb/trino/issues/1324&quot;&gt;the Iceberg connector here&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;demo-creating-tables-with-iceberg-and-reading-the-data-in-trino&quot;&gt;Demo: Creating tables with Iceberg and reading the data in Trino&lt;/h2&gt;

&lt;p&gt;For this week’s demo, I wanted to play around with the Iceberg Java API directly.
You also have the option to use Trino, Spark, or other engines to ingest and
query the data, but I wanted to use the vanilla Iceberg APIs to experience them
and hopefully solidify my learning of Iceberg concepts in the process. Make sure
you follow the instructions in the repository if you don’t have Docker or Java
installed.&lt;/p&gt;

&lt;p&gt;Let’s start up a local Trino coordinator and Hive metastore. Clone this 
repository and navigate to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iceberg/trino-iceberg-minio&lt;/code&gt; directory. Then
start up the containers using Docker Compose.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone git@github.com:bitsondatadev/trino-getting-started.git

cd iceberg/trino-iceberg-minio

docker-compose up -d
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In your favorite IDE, import the files under &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iceberg/iceberg-java&lt;/code&gt; into your
project and run the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IcebergMain&lt;/code&gt; class.&lt;/p&gt;

&lt;p&gt;This class creates a logging table if it doesn’t exist. Once you run this code,
you can check that the table exists in the metastore under TABLE_PARAMS. But if
you run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW TABLES IN iceberg.logging;&lt;/code&gt; you’ll notice that
the table doesn’t show up, due to &lt;a href=&quot;https://github.com/trinodb/trino/issues/1592&quot;&gt;the issue we discussed above&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Let’s update the TABLE_PARAMS entry in the metastore db and then query the table
again.&lt;/p&gt;
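&lt;p&gt;As a sketch of that workaround, this statement runs directly against the
metastore database, not through Trino. The column names follow the standard
Hive metastore schema; in practice you would also scope the update to the
affected table’s TBL_ID:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Lowercase the table_type parameter so Trino recognizes the table
UPDATE TABLE_PARAMS
SET PARAM_VALUE = 'iceberg'
WHERE PARAM_KEY = 'table_type'
  AND PARAM_VALUE = 'ICEBERG';
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;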

&lt;h2 id=&quot;question-of-the-week-why-does-trino-still-depend-on-the-hive-metastore-if-metadata-for-iceberg-saves-to-the-filesystem&quot;&gt;Question of the week: Why does Trino still depend on the Hive metastore if metadata for Iceberg saves to the filesystem?&lt;/h2&gt;

&lt;p&gt;We kept the metastore because many existing tests for the Hive connector are
built around it, and we want to give the Iceberg connector ample time to
mature before we migrate entirely away from the metastore. We also made the
metastore the initial method of use in Iceberg because most developers would
initially be migrating from an existing Hive catalog, and we wanted this
transition to use existing, tested components.&lt;/p&gt;

&lt;p&gt;Currently, the metastore isn’t used the same way as in Hive. Trino stores a
pointer to the top-level metadata manifest file location, along with other
statistics about the table, in the TABLE_PARAMS table of the metastore. There
is a &lt;a href=&quot;https://github.com/trinodb/trino/pull/6977&quot;&gt;pull request created by Jack Ye&lt;/a&gt;
to remove the requirement to use the Hive metastore when using
Iceberg with Trino.&lt;/p&gt;

&lt;h2 id=&quot;tip-of-the-iceberg&quot;&gt;Tip of the Iceberg&lt;/h2&gt;

&lt;p&gt;One last bit of fun with Iceberg. Let’s do a little experiment called “Will
the iceberg tip?”:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Go to &lt;a href=&quot;https://iceberg.apache.org/&quot;&gt;https://iceberg.apache.org/&lt;/a&gt; and take a look at the logo.&lt;/li&gt;
  &lt;li&gt;Now go to &lt;a href=&quot;https://joshdata.me/iceberger.html&quot;&gt;https://joshdata.me/iceberger.html&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Draw the Apache Iceberg logo and see what happens.&lt;/li&gt;
  &lt;li&gt;Now draw the iceberg in the image above that Commander Bun Bun is on.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;When drawing the iceberg like the image with Commander Bun Bun, the iceberg tips
over. Careful Commander Bun Bun! It looks like the Apache logo wins! Shout out 
to &lt;a href=&quot;https://twitter.com/JoshData&quot;&gt;Joshua Tauberer&lt;/a&gt; for the web page. Shout out 
to &lt;a href=&quot;https://twitter.com/GlacialMeg&quot;&gt;Megan Thompson-Munson&lt;/a&gt; for the tweet that 
started the page. Shout out to 
&lt;a href=&quot;https://www.linkedin.com/in/bartonwright/&quot;&gt;Barton Wright&lt;/a&gt; from Manfred’s team 
of writers for being the geek to find this. Shout out to 
&lt;a href=&quot;https://twitter.com/aliLoney&quot;&gt;Ali&lt;/a&gt; for being a good sport and setting Commander
Bun Bun on the iceberg.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Come join us for the inaugural Virtual Trino meetup on April 21st in the virtual
meetup group in your region!&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/events/277246268/&quot;&gt;Americas meetup&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/events/277246173/&quot;&gt;EMEA meetup&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/events/277246078/&quot;&gt;APAC meetup&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At this meetup, the four Trino/Presto founders will be updating everyone on the
state of Trino. We’ll discuss the rebrand, talk about the recent features, and 
discuss the trajectory of the project. Then we will host a hangout and AMA. Hope
to see you all there!&lt;/p&gt;

&lt;p&gt;Blogs&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/05/03/a-gentle-introduction-to-iceberg.html&quot;&gt;Trino on ice I: A gentle introduction to Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/12/in-place-table-evolution-and-cloud-compatibility-with-iceberg.html&quot;&gt;Trino on ice II: In-place table evolution and cloud compatibility with Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/30/iceberg-concurrency-snapshots-spec.html&quot;&gt;Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/08/12/deep-dive-into-iceberg-internals.html&quot;&gt;Trino on ice IV: Deep dive into Iceberg internals&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/expedia-group-tech/a-short-introduction-to-apache-iceberg-d34f628b6799&quot;&gt;https://medium.com/expedia-group-tech/a-short-introduction-to-apache-iceberg-d34f628b6799&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://engineering.linkedin.com/blog/2021/fastingest-low-latency-gobblin&quot;&gt;https://engineering.linkedin.com/blog/2021/fastingest-low-latency-gobblin&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/adobetech/iceberg-at-adobe-88cf1950e866&quot;&gt;https://medium.com/adobetech/iceberg-at-adobe-88cf1950e866&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/adobetech/high-throughput-ingestion-with-iceberg-ccf7877a413f&quot;&gt;https://medium.com/adobetech/high-throughput-ingestion-with-iceberg-ccf7877a413f&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/adobetech/taking-query-optimizations-to-the-next-level-with-iceberg-6c968b83cd6f&quot;&gt;https://medium.com/adobetech/taking-query-optimizations-to-the-next-level-with-iceberg-6c968b83cd6f&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://thenewstack.io/apache-iceberg-a-different-table-design-for-big-data/&quot;&gt;https://thenewstack.io/apache-iceberg-a-different-table-design-for-big-data/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Videos&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=mf8Hb0coI6o&quot;&gt;https://www.youtube.com/watch?v=mf8Hb0coI6o&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=Kxbzr7UP1dI&amp;amp;t=1274s&quot;&gt;https://www.youtube.com/watch?v=Kxbzr7UP1dI&amp;amp;t=1274s&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=QNmSXMQ-gY4&quot;&gt;https://www.youtube.com/watch?v=QNmSXMQ-gY4&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup Groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;https://www.meetup.com/trino-americas/&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;https://www.meetup.com/trino-emea/&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;https://www.meetup.com/trino-apac/&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;https://www.meetup.com/trino-boston/&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;https://www.meetup.com/trino-nyc/&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;https://www.meetup.com/trino-san-francisco/&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;https://www.meetup.com/trino-los-angeles/&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;https://www.meetup.com/trino-chicago/&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://trino.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://trino.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/13/training-security.html&quot;&gt;https://trino.io/blog/2020/08/13/training-security.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/27/training-performance.html&quot;&gt;https://trino.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>March of the Trinos! Be careful Commander Bun Bun! That Iceberg doesn&apos;t look stable! https://joshdata.me/iceberger.html</summary>

      
      
    </entry>
  
    <entry>
      <title>13: Trino takes a sip of Pinot!</title>
      <link href="https://trino.io/episodes/13.html" rel="alternate" type="text/html" title="13: Trino takes a sip of Pinot!" />
      <published>2021-03-18T00:00:00+00:00</published>
      <updated>2021-03-18T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/13</id>
      <content type="html" xml:base="https://trino.io/episodes/13.html">&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/13/trinot.png&quot; /&gt;&lt;br /&gt;
Commander Bun Bun loves sippin&apos; on Pinot after a hard day of data exploration!
&lt;/p&gt;

&lt;h2 id=&quot;pinot-links&quot;&gt;Pinot links&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://communityinviter.com/apps/apache-pinot/apache-pinot&quot;&gt;Apache Pinot Slack&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/apache-pinot/events/275991991/&quot;&gt;Pinot Meetup&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Xiang Fu, project management chair and committer at &lt;a href=&quot;https://pinot.apache.org/&quot;&gt;Apache Pinot&lt;/a&gt;
  and co-founder of stealth mode startup (&lt;a href=&quot;https://twitter.com/xiangfu0&quot;&gt;@xiangfu0&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Elon Azoulay, software engineer at stealth mode startup (&lt;a href=&quot;https://twitter.com/ElonAzoulay&quot;&gt;@ElonAzoulay&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;release-353&quot;&gt;Release 353&lt;/h2&gt;

&lt;p&gt;Release notes discussed: &lt;a href=&quot;https://trino.io/docs/current/release/release-353.html&quot;&gt;https://trino.io/docs/current/release/release-353.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Martin’s list:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;New ClickHouse connector&lt;/li&gt;
  &lt;li&gt;Support for correlated subqueries involving UNNEST&lt;/li&gt;
  &lt;li&gt;CREATE/DROP TABLE in BigQuery connector&lt;/li&gt;
  &lt;li&gt;Reading and writing column stats in Glue Metastore&lt;/li&gt;
  &lt;li&gt;Support for Apache Phoenix 5.1&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred’s notes:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;New geometry functions&lt;/li&gt;
  &lt;li&gt;A whole bunch of correctness and performance improvements&lt;/li&gt;
  &lt;li&gt;Env var (and hence secrets) support for RPM-based installs&lt;/li&gt;
  &lt;li&gt;Hive - performance for bucketed table inserts&lt;/li&gt;
  &lt;li&gt;Kafka - schema registry improvements&lt;/li&gt;
  &lt;li&gt;Experimental join pushdown in a bunch of JDBC connectors&lt;/li&gt;
  &lt;li&gt;Also a bunch of fixes on JDBC connectors&lt;/li&gt;
  &lt;li&gt;Quite a list of changes on the SPI - ensure to check if you have a plugin&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-week-data-cubes-and-apache-pinot&quot;&gt;Concept of the week: Data cubes and Apache Pinot&lt;/h2&gt;

&lt;p&gt;Before diving into Pinot, I think it’s worthwhile to discuss some theoretical
background to motivate some of the use cases Pinot solves for. We cover the 
concept of data cubes and how they are used in traditional data warehousing to 
speed up queries and minimize unnecessary work on your OLAP system.&lt;/p&gt;

&lt;h3 id=&quot;data-cubes-and-molap-multi-dimensional-online-analytics-processing&quot;&gt;Data cubes and MOLAP (Multi-dimensional online analytics processing)&lt;/h3&gt;

&lt;p&gt;In data analytics, there are many access patterns that tend to repeat themselves
over and over again. It is very common to need to split and merge data based on 
the date and time values. Or perhaps you ask a lot of questions based on a 
specific customer, or even a specific product. Answering these questions 
typically involves aggregation of data like sums, averages, counts, etc… 
Wouldn’t it make sense to cache some of these intermediary results?&lt;/p&gt;

&lt;p&gt;A common way to visualize the columns that are frequently bucketed into values
or ranges of values is to show them as a cube that is sliced up into smaller
dimensions. This idea derives from the traditional form of OLAP,
multi-dimensional OLAP (MOLAP).&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/13/data_cube.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;This cube represents cached data aggregations grouped by commonly used
dimensions. For example, the displayed cube would be the pre-aggregation of
the following query:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT part, store, customer, COUNT(*)
FROM cube_table
GROUP BY part, store, customer
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If we want to get the data for a particular customer, we can take a “slice” of
that cube by specifying a particular customer. The following query returns the
green square above from our cube.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT part, store, COUNT(*)
FROM cube_table
WHERE customer = &apos;Bob&apos;
GROUP BY part, store
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now what if we want to flatten one of the dimensions? While this can be managed
with a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt; as before, depending on the system this may ignore any cached
data and scan over all the rows. For this, SQL reserves a special set of
keywords around cubes. We won’t dive into that in depth now, but for our current
goal of flattening a dimension, we can use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROLLUP&lt;/code&gt;. Using the keyword &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROLLUP&lt;/code&gt;
indicates to the underlying system that you intend to aggregate over the 
pre-materialized data rather than scan over all rows to compute again. This
gives you the total count of parts per store using the counts of the data cube.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT part, store, COUNT(*)
FROM cube_table
GROUP BY ROLLUP (part, store)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
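
&lt;p&gt;To see what &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROLLUP&lt;/code&gt; adds to the result, note that it also emits subtotal
rows where the rolled-up column is NULL. For a small, made-up data set of two
parts across two stores, the query above would return something like:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; part  | store | _col2
-------+-------+-------
 bolt  | s1    |     2
 bolt  | s2    |     1
 screw | s1    |     3
 bolt  | NULL  |     3
 screw | NULL  |     3
 NULL  | NULL  |     6
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The rows with NULL in the store column are the per-part totals across all
stores, and the final all-NULL row is the grand total.&lt;/p&gt;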

&lt;p&gt;Now, although we used simple counts, you can precompute a lot of other aggregate
data like sums, min, max, percentile, etc… These can service commonly run
queries without requiring a new computation every time. That is the goal of
MOLAP and data cubes.&lt;/p&gt;

&lt;h3 id=&quot;apache-pinot&quot;&gt;Apache Pinot&lt;/h3&gt;

&lt;p&gt;Now let’s move on to Apache Pinot. It is a realtime distributed OLAP datastore, 
designed to answer OLAP queries with low latency. Although many of those words
overlap with the Trino description, the key differentiators are realtime and
low latency. Trino performs batch processing and is not a realtime system,
whereas Pinot is great for ingesting data in batch or as a stream. The other
key phrase, low latency, could technically apply to both Pinot and Trino, but
in the context of realtime subsecond latency, Trino is slow compared to Pinot.
This is due to the specialized indexes that Pinot uses to store the data, which
we cover shortly. Another big distinction is that Trino does not store any data
itself: it is purely a query engine. Xiang has a really great summary slide
that shows the strengths of each system and why they work so well
together.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/13/latency_flexibility.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;While Trino is not as fast as Pinot, it is able to handle a broader set of
use cases like performing broad joins over open data formats in data lakes. 
This is what motivated work on the Trino Pinot connector. You can have the speed
of Pinot, while having the flexibility of Trino.&lt;/p&gt;

&lt;p&gt;Now that you understand the common use case for Pinot, it’s important to know 
the main goals of Pinot.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;One primary goal is to keep the response times of aggregation queries
  predictable, regardless of how many requests Pinot handles. As it scales
  you won’t see a degradation of performance. This is achieved by Pinot’s
  custom indices and storage formats.
    &lt;p align=&quot;center&quot;&gt;
    &lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/13/data_value.jpeg&quot; /&gt;&lt;br /&gt;
 &lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Another goal of Pinot is to revive the value of data from a historical
  context. Data reaches a point in its lifecycle where it becomes less
  valuable as it ages. While all data can add some value no matter its age,
  there’s a tradeoff in scanning many rows to glean information from
  antiquated data. Pinot aims to remove this tradeoff: most questions around
  historical data are asked in aggregate, and those aggregates can be
  summarized and queried at a low cost.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;The final goal is to manage dimension explosion. One of the difficulties
  with managing a system that caches all this historic data is handling
  dimension explosion that occurs when you cache every possible combination of
  data. Above we showed a three-dimensional cube, but Pinot can handle a much
  larger number of dimensions. However, just because you can, doesn’t mean you
  should. Pinot has a lot of smarts around using the data, and some good
  defaults to determine the maximum number of buckets per dimension. This helps
  contain an exploding cache while maintaining fast results.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3 id=&quot;pinot-architecture&quot;&gt;Pinot architecture&lt;/h3&gt;

&lt;p&gt;Now that we’ve covered Pinot theory and goals, let’s take a quick look at the
architecture.&lt;/p&gt;

&lt;p&gt;A &lt;a href=&quot;https://docs.pinot.apache.org/basics/components/cluster&quot;&gt;Pinot cluster&lt;/a&gt; 
consists of a &lt;a href=&quot;https://docs.pinot.apache.org/basics/components/controller&quot;&gt;controller&lt;/a&gt;, 
&lt;a href=&quot;https://docs.pinot.apache.org/basics/components/broker&quot;&gt;broker&lt;/a&gt;, 
&lt;a href=&quot;https://docs.pinot.apache.org/basics/components/server&quot;&gt;server&lt;/a&gt;, and
optionally a &lt;a href=&quot;https://docs.pinot.apache.org/basics/components/minion&quot;&gt;minion&lt;/a&gt;
to purge data.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/13/pinot_architecture.svg&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-2028-add-pinot-connector&quot;&gt;PR of the week: PR 2028 Add Pinot connector&lt;/h2&gt;

&lt;p&gt;Our guest on the show today, Elon Azoulay, is the author of 
&lt;a href=&quot;https://github.com/trinodb/trino/pull/2028&quot;&gt;this PR&lt;/a&gt;, so we can ask him all
about it now.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/13/trino_pinot_connector.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/connector/pinot.html#configuration&quot;&gt;Basic configuration (Pinot controller url, Pinot segment limit)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;2 ways to connect to Pinot - broker and server, and their tradeoffs 
 (i.e. segment limit for server)&lt;/li&gt;
  &lt;li&gt;Talk about broker passthrough queries, i.e. select * from “select … from
  pinot_table …”&lt;/li&gt;
  &lt;li&gt;The server segment limit that we eventually want to eliminate, and broker query parsing
    &lt;ul&gt;
      &lt;li&gt;How to crash the Pinot server.&lt;/li&gt;
      &lt;li&gt;Streaming server alternative&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;future-pinot-features-in-trino&quot;&gt;Future Pinot features in Trino&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/6069&quot;&gt;Aggregation pushdown (PR 6069)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;60%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/13/aggregation_pushdown.png&quot; /&gt;&lt;br /&gt;
 &lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/7162&quot;&gt;Pinot insert (PR 7162)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/7164&quot;&gt;Pinot create table (PR 7164)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/7160&quot;&gt;Pinot drop table (PR 7160)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/7163&quot;&gt;Pinot 6 (PR 7163)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Pinot filter clause parsing (see question of the week below)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;demo-pinot-batch-insertion-and-query-using-trino-pinot-connector&quot;&gt;Demo: Pinot batch insertion and query using Trino Pinot connector&lt;/h2&gt;

&lt;p&gt;To put this PR to the test, we set up a Pinot cluster using Docker Compose.&lt;/p&gt;

&lt;p&gt;To load the data, we’re going to use a simple batch import, but you can also 
&lt;a href=&quot;https://docs.pinot.apache.org/basics/data-import/upsert&quot;&gt;insert the data in a stream&lt;/a&gt;
using &lt;a href=&quot;https://kafka.apache.org/&quot;&gt;Kafka&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Let’s start up the Pinot cluster along with the required Zookeeper and Kafka
broker. Clone this repository and navigate to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;community_tutorials/pinot/trino-pinot&lt;/code&gt; directory.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone git@github.com:bitsondatadev/trino-getting-started.git

cd trino-getting-started/community_tutorials/pinot/trino-pinot

docker-compose up -d
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To do a batch insert, we stage a CSV file for Pinot to read the data from.
Create a directory under a local temp folder, write the file there, and then
submit it to Pinot.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;mkdir -p /tmp/pinot-quick-start/rawdata

echo &quot;studentID,firstName,lastName,gender,subject,score,timestampInEpoch
200,Lucy,Smith,Female,Maths,3.8,1570863600000
200,Lucy,Smith,Female,English,3.5,1571036400000
201,Bob,King,Male,Maths,3.2,1571900400000
202,Nick,Young,Male,Physics,3.6,1572418800000&quot; &amp;gt; /tmp/pinot-quick-start/rawdata/transcript.csv
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In order for Pinot to understand the CSV data, we must provide it with a 
&lt;a href=&quot;https://docs.pinot.apache.org/configuration-reference/schema&quot;&gt;schema&lt;/a&gt;.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;echo &quot;{
    \&quot;schemaName\&quot;: \&quot;transcript\&quot;,
    \&quot;dimensionFieldSpecs\&quot;: [
      {
        \&quot;name\&quot;: \&quot;studentID\&quot;,
        \&quot;dataType\&quot;: \&quot;INT\&quot;
      },
      {
        \&quot;name\&quot;: \&quot;firstName\&quot;,
        \&quot;dataType\&quot;: \&quot;STRING\&quot;
      },
      {
        \&quot;name\&quot;: \&quot;lastName\&quot;,
        \&quot;dataType\&quot;: \&quot;STRING\&quot;
      },
      {
        \&quot;name\&quot;: \&quot;gender\&quot;,
        \&quot;dataType\&quot;: \&quot;STRING\&quot;
      },
      {
        \&quot;name\&quot;: \&quot;subject\&quot;,
        \&quot;dataType\&quot;: \&quot;STRING\&quot;
      }
    ],
    \&quot;metricFieldSpecs\&quot;: [
      {
        \&quot;name\&quot;: \&quot;score\&quot;,
        \&quot;dataType\&quot;: \&quot;FLOAT\&quot;
      }
    ],
    \&quot;dateTimeFieldSpecs\&quot;: [{
      \&quot;name\&quot;: \&quot;timestampInEpoch\&quot;,
      \&quot;dataType\&quot;: \&quot;LONG\&quot;,
      \&quot;format\&quot; : \&quot;1:MILLISECONDS:EPOCH\&quot;,
      \&quot;granularity\&quot;: \&quot;1:MILLISECONDS\&quot;
    }]
}&quot; &amp;gt; /tmp/pinot-quick-start/transcript-schema.json
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now we are almost ready to create the &lt;a href=&quot;https://docs.pinot.apache.org/basics/components/table&quot;&gt;table&lt;/a&gt;. 
Instead of adding table configurations as part of the SQL command, Pinot enables
you to store table configurations as a file. This is a nice option that
decouples the DDL, which makes for simpler scripting in batch setups.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;echo &quot;{
    \&quot;tableName\&quot;: \&quot;transcript\&quot;,
    \&quot;segmentsConfig\&quot; : {
      \&quot;timeColumnName\&quot;: \&quot;timestampInEpoch\&quot;,
      \&quot;timeType\&quot;: \&quot;MILLISECONDS\&quot;,
      \&quot;replication\&quot; : \&quot;1\&quot;,
      \&quot;schemaName\&quot; : \&quot;transcript\&quot;
    },
    \&quot;tableIndexConfig\&quot; : {
      \&quot;invertedIndexColumns\&quot; : [],
      \&quot;loadMode\&quot;  : \&quot;MMAP\&quot;
    },
    \&quot;tenants\&quot; : {
      \&quot;broker\&quot;:\&quot;DefaultTenant\&quot;,
      \&quot;server\&quot;:\&quot;DefaultTenant\&quot;
    },
    \&quot;tableType\&quot;:\&quot;OFFLINE\&quot;,
    \&quot;metadata\&quot;: {}
}&quot; &amp;gt; /tmp/pinot-quick-start/transcript-table-offline.json
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once you have created these three files and verified that the Docker containers
are running, you can run the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Add Table&lt;/code&gt; command:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;docker run --rm -ti \
    --network=trino-pinot_trino-network \
    -v /tmp/pinot-quick-start:/tmp/pinot-quick-start \
    --name pinot-batch-table-creation \
    apachepinot/pinot:latest AddTable \
    -schemaFile /tmp/pinot-quick-start/transcript-schema.json \
    -tableConfigFile /tmp/pinot-quick-start/transcript-table-offline.json \
    -controllerHost pinot-controller \
    -controllerPort 9000 -exec
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now that the table exists, we can see it in the 
&lt;a href=&quot;http://localhost:9000/#/tables&quot;&gt;Pinot web UI&lt;/a&gt;. Let’s insert some data using a 
batch job specification:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;echo &quot;executionFrameworkSpec:
  name: &apos;standalone&apos;
  segmentGenerationJobRunnerClassName: &apos;org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner&apos;
  segmentTarPushJobRunnerClassName: &apos;org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner&apos;
  segmentUriPushJobRunnerClassName: &apos;org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner&apos;
jobType: SegmentCreationAndTarPush
inputDirURI: &apos;/tmp/pinot-quick-start/rawdata/&apos;
includeFileNamePattern: &apos;glob:**/*.csv&apos;
outputDirURI: &apos;/tmp/pinot-quick-start/segments/&apos;
overwriteOutput: true
pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: &apos;csv&apos;
  className: &apos;org.apache.pinot.plugin.inputformat.csv.CSVRecordReader&apos;
  configClassName: &apos;org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig&apos;
tableSpec:
  tableName: &apos;transcript&apos;
  schemaURI: &apos;http://pinot-controller:9000/tables/transcript/schema&apos;
  tableConfigURI: &apos;http://pinot-controller:9000/tables/transcript&apos;
pinotClusterSpecs:
  - controllerURI: &apos;http://pinot-controller:9000&apos;&quot; &amp;gt; /tmp/pinot-quick-start/docker-job-spec.yml
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now launch the batch job with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LaunchDataIngestionJob&lt;/code&gt; task.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;docker run --rm -ti \
    --network=trino-pinot_trino-network \
    -v /tmp/pinot-quick-start:/tmp/pinot-quick-start \
    --name pinot-data-ingestion-job \
    apachepinot/pinot:latest LaunchDataIngestionJob \
    -jobSpecFile /tmp/pinot-quick-start/docker-job-spec.yml
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
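
&lt;p&gt;With the segment pushed, we can complete the round trip by querying the table
from Trino. The catalog name below assumes you configured the Pinot connector
in a catalog file named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pinot.properties&lt;/code&gt;; adjust it to match your setup.
The connector exposes Pinot tables under the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;default&lt;/code&gt; schema.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT firstName, lastName, AVG(score) AS avg_score
FROM pinot.default.transcript
GROUP BY firstName, lastName;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;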

&lt;p&gt;We modified this demo from the tutorials available on the Pinot website:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://docs.pinot.apache.org/basics/getting-started/pushing-your-data-to-pinot&quot;&gt;https://docs.pinot.apache.org/basics/getting-started/pushing-your-data-to-pinot&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://docs.pinot.apache.org/basics/getting-started/running-pinot-in-docker&quot;&gt;https://docs.pinot.apache.org/basics/getting-started/running-pinot-in-docker&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;question-of-the-week-why-does-my-passthrough-query-not-work-in-the-pinot-connector&quot;&gt;Question of the week: Why does my passthrough query not work in the Pinot connector?&lt;/h2&gt;

&lt;p&gt;The passthrough queries may be failing due to uppercase constants that need to
be wrapped in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPPER()&lt;/code&gt;. For example, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&apos;FOO&apos;&lt;/code&gt; in this query would be 
rendered as all lowercase once it is passed to Pinot:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT * 
FROM &quot;SELECT col1, col2, COUNT(*) FROM pinot_table WHERE col2 = &apos;FOO&apos; GROUP BY col1, col2&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The fix is to pass &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&apos;FOO&apos;&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPPER()&lt;/code&gt; in the passthrough query.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT * 
FROM &quot;SELECT col1, col2, COUNT(*) FROM pinot_table WHERE col2 = UPPER(&apos;FOO&apos;) GROUP BY col1, col2&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It could also be due to parsing of functions in filters. A workaround is to put
the filter outside of the double quotes, which can work in some cases. Column
and table names can be mixed case, as the connector will auto resolve them, but
mixed-case constants would not work with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;upper()&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT * 
FROM &quot;SELECT col1, col2, COUNT(*) FROM pinot_table WHERE col2 = &apos;Foo&apos; GROUP BY col1, col2&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The filter can be hoisted into the outer query:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT * 
FROM &quot;SELECT col1, col2, COUNT(*) FROM pinot_table GROUP BY col1, col2&quot; WHERE col2 = &apos;Foo&apos;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There is ongoing work to improve this parsing: 
&lt;a href=&quot;https://github.com/trinodb/trino/pull/7161&quot;&gt;Pinot filter clause parsing (PR 7161)&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/apache-pinot-developer-blog/real-time-analytics-with-presto-and-apache-pinot-part-i-cc672caea307&quot;&gt;https://medium.com/apache-pinot-developer-blog/real-time-analytics-with-presto-and-apache-pinot-part-i-cc672caea307&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/apache-pinot-developer-blog/real-time-analytics-with-presto-and-apache-pinot-part-ii-3d09ff937713&quot;&gt;https://medium.com/apache-pinot-developer-blog/real-time-analytics-with-presto-and-apache-pinot-part-ii-3d09ff937713&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/apache-pinot-developer-blog/exploring-olap-on-kubernetes-with-apache-pinot-32f12233dc0b&quot;&gt;https://medium.com/apache-pinot-developer-blog/exploring-olap-on-kubernetes-with-apache-pinot-32f12233dc0b&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/apache-pinot-developer-blog/building-a-climate-dashboard-with-apache-pinot-and-superset-d3ee8cb7941d&quot;&gt;https://medium.com/apache-pinot-developer-blog/building-a-climate-dashboard-with-apache-pinot-and-superset-d3ee8cb7941d&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/apache-pinot-developer-blog&quot;&gt;https://medium.com/apache-pinot-developer-blog&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://leventov.medium.com/comparison-of-the-open-source-olap-systems-for-big-data-clickhouse-druid-and-pinot-8e042a5ed1c7&quot;&gt;https://leventov.medium.com/comparison-of-the-open-source-olap-systems-for-big-data-clickhouse-druid-and-pinot-8e042a5ed1c7&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup Groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;https://www.meetup.com/trino-americas/&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;https://www.meetup.com/trino-emea/&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;Trino APAC - Coming Soon&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;https://www.meetup.com/trino-boston/&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;https://www.meetup.com/trino-nyc/&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;https://www.meetup.com/trino-san-francisco/&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;https://www.meetup.com/trino-los-angeles/&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;https://www.meetup.com/trino-chicago/&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://trino.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://trino.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/13/training-security.html&quot;&gt;https://trino.io/blog/2020/08/13/training-security.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/27/training-performance.html&quot;&gt;https://trino.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Commander Bun Bun loves sippin&apos; on Pinot after a hard day of data exploration!</summary>

      
      
    </entry>
  
    <entry>
      <title>Introducing new window features</title>
      <link href="https://trino.io/blog/2021/03/10/introducing-new-window-features.html" rel="alternate" type="text/html" title="Introducing new window features" />
      <published>2021-03-10T00:00:00+00:00</published>
      <updated>2021-03-10T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/03/10/introducing-new-window-features</id>
      <content type="html" xml:base="https://trino.io/blog/2021/03/10/introducing-new-window-features.html">&lt;p&gt;In Trino, we are thrilled to get feedback and feature requests from our
fantastic community, and we’re tirelessly motivated to meet the expectations!
The SQL specification is another source of inspiration. From time to time, we
go through those encrypted scrolls to give you a new feature that you didn’t
even know you needed!&lt;/p&gt;

&lt;p&gt;Recently, there was a push in Trino to extend support for window functions.
In this post, we explain the complexities of window functions and describe a
couple of our recent additions. If “window” doesn’t sound familiar, read on.
Already a window expert? Skip to &lt;a href=&quot;#new features&quot;&gt;what’s new&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A window is the structure you run your window function &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OVER&lt;/code&gt;. It has three
components:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;partitioning&lt;/li&gt;
  &lt;li&gt;ordering&lt;/li&gt;
  &lt;li&gt;frame&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You use partitioning to break your input data into independent chunks. Ordering
sorts the rows within each partition. And the frame is a kind of “sliding
window”. For every processed row, the frame encloses a certain portion of the
sorted partition. Your window function processes this portion and yields the
result for the row.&lt;/p&gt;

&lt;p&gt;A “running average” is one simple example:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT avg(totalprice) OVER (
    PARTITION BY custkey
    ORDER BY orderdate
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM orders
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For a particular customer identified by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;custkey&lt;/code&gt;, it sorts their orders by
date and computes a sequence of average prices since the beginning up to each
consecutive entry. The window frame for a row includes all rows from the start
up to and including that row.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/window-features/running-average.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;According to standard SQL, there are three ways to specify the frame. The first way
is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROWS&lt;/code&gt; (like in the example). With &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROWS&lt;/code&gt;, you can specify frame bounds by a
physical offset from the current row. While &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW&lt;/code&gt; means “between the beginning of the partition and the current
row”, you can also specify precisely where the frame starts and ends, for
example with: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROWS BETWEEN 10 PRECEDING AND 5 FOLLOWING&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RANGE&lt;/code&gt; is a more complicated way of defining frame on ordered data. It does
not rely on physical offset (in rows), but on logical offset (in value). That
is, the frame includes rows where the value is within a certain range from the
value in the current row.&lt;/p&gt;

&lt;p&gt;Until recently, Trino only supported &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RANGE&lt;/code&gt; in limited cases.
You could use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RANGE UNBOUNDED PRECEDING&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CURRENT ROW&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNBOUNDED
FOLLOWING&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNBOUNDED PRECEDING&lt;/code&gt; includes all rows since the partition start,&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNBOUNDED FOLLOWING&lt;/code&gt; includes all rows until the partition end,&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CURRENT ROW&lt;/code&gt; is trickier. It includes all rows where values of the sort key
are the same as in the current row. We call them a &lt;em&gt;peer group&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;
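
&lt;p&gt;The peer group behavior is easy to see with a frame that is exactly &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CURRENT ROW&lt;/code&gt;
on both ends: with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RANGE&lt;/code&gt;, it pulls in every row that ties with the current
row on the sort key. A small sketch with made-up data:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT score, count(*) OVER (
    ORDER BY score
    RANGE BETWEEN CURRENT ROW AND CURRENT ROW) AS peers
FROM (VALUES 10, 16, 16, 18) AS t(score);

 score | peers
-------+-------
    10 |     1
    16 |     2
    16 |     2
    18 |     1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;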

&lt;p&gt;It’s time to introduce the first new feature:&lt;/p&gt;

&lt;h2 id=&quot;-full-support-for-frame-type-range&quot;&gt;&lt;a name=&quot;new features&quot;&gt;&lt;/a&gt; Full support for frame type RANGE&lt;/h2&gt;

&lt;p&gt;Since &lt;a href=&quot;https://trino.io/docs/current/release/release-346.html&quot;&gt;version 346&lt;/a&gt;, it is
possible to specify &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RANGE&lt;/code&gt; with an offset value. The frame includes all rows
whose value is within this range from the current row.&lt;/p&gt;

&lt;p&gt;Let’s modify our example:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT avg(totalprice) OVER (
    PARTITION BY custkey
    ORDER BY orderdate
    RANGE BETWEEN interval &apos;1&apos; month PRECEDING AND CURRENT ROW)
FROM orders
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now, for every row, we get the average price from the preceding month. Note that
the offset &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;interval &apos;1&apos; month&lt;/code&gt; applies to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orderdate&lt;/code&gt;, which is the sorting
column.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/window-features/running-average-range.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Of course, we don’t have to order by date. The sorting column can be of any
numeric or date/time type, and the offset must be compatible. Also, the offset
doesn’t have to be a literal. It can come from another column of the table or,
more generally, be any expression, as long as the type matches.&lt;/p&gt;
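
&lt;p&gt;As a sketch of that last point, the offset can come from a column, so each row
defines its own frame size. The table and its &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;max_delay&lt;/code&gt; interval column here
are hypothetical:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT avg(totalprice) OVER (
    PARTITION BY custkey
    ORDER BY orderdate
    RANGE BETWEEN max_delay PRECEDING AND CURRENT ROW)
FROM orders_with_delay
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;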

&lt;p&gt;A frame of type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RANGE&lt;/code&gt; does not quite fit in the abstraction of a “sliding
window”. Frames can be bigger or smaller depending not only on the offset
values but also on the actual input data. A long series of similar entries can
produce a huge frame, while a gap in input values can result in an empty frame.&lt;/p&gt;

&lt;p&gt;For illustration, imagine a group of students, and the results of some test they
took. Our table has two columns: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;student_id&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;result&lt;/code&gt;, which is the number
of points. For each student, let’s find how many students did better by 1 to 2
points:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;WITH students_results(student_id, result) AS (VALUES
    (&apos;student_1&apos;, 17),
    (&apos;student_2&apos;, 16),
    (&apos;student_3&apos;, 18),
    (&apos;student_4&apos;, 18),
    (&apos;student_5&apos;, 10),
    (&apos;student_6&apos;, 20),
    (&apos;student_7&apos;, 16))
SELECT
    student_id,
    result,
    count(*) OVER (
        ORDER BY result
        RANGE BETWEEN 1 FOLLOWING AND 2 FOLLOWING) AS close_better_scores_count
FROM students_results;

 student_id | result | close_better_scores_count
------------+--------+---------------------------
 student_5  |     10 |                         0
 student_7  |     16 |                         3
 student_2  |     16 |                         3
 student_1  |     17 |                         2
 student_3  |     18 |                         1
 student_4  |     18 |                         1
 student_6  |     20 |                         0
(7 rows)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Note that the frame does not contain the current row. For a particular student,
it only includes students with better results, and not themselves. For the
unfortunate &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;student_5&lt;/code&gt;, there are no students with similar test results. The
frame is also empty for the lucky &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;student_6&lt;/code&gt; who scored the most points.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/window-features/students-range.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Besides &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROWS&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RANGE&lt;/code&gt;, there is another way to specify the frame on
ordered data. And yes, Trino supports this mechanism! Let me introduce the
second of our recent additions:&lt;/p&gt;

&lt;h2 id=&quot;support-for-frame-type-groups&quot;&gt;Support for frame type GROUPS&lt;/h2&gt;

&lt;p&gt;This feature, added in
&lt;a href=&quot;https://trino.io/docs/current/release/release-346.html&quot;&gt;version 346&lt;/a&gt;, allows you to
include or exclude the whole &lt;em&gt;peer groups&lt;/em&gt; of rows in ordered data.&lt;/p&gt;

&lt;p&gt;For illustration, let’s consider again the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;students_results&lt;/code&gt; table. For each
student, let’s find the gap between their result and the result of a student (or
students) who did slightly better.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;WITH students_results(student_id, result) AS (VALUES
    (&apos;student_1&apos;, 17),
    (&apos;student_2&apos;, 16),
    (&apos;student_3&apos;, 18),
    (&apos;student_4&apos;, 18),
    (&apos;student_5&apos;, 10),
    (&apos;student_6&apos;, 20),
    (&apos;student_7&apos;, 16))
SELECT
    student_id,
    result,
    max(result) OVER (
        ORDER BY result
        GROUPS BETWEEN CURRENT ROW AND 1 FOLLOWING) - result AS gap_till_better_score
FROM students_results;

 student_id | result | gap_till_better_score
------------+--------+-----------------------
 student_5  |     10 |                     6
 student_7  |     16 |                     1
 student_2  |     16 |                     1
 student_1  |     17 |                     1
 student_3  |     18 |                     2
 student_4  |     18 |                     2
 student_6  |     20 |                     0
(7 rows)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The window function for each student returns the closest better result. The
frame of type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUPS&lt;/code&gt; used here includes all entries equal to the current
entry in terms of points (that is, the student’s &lt;em&gt;peer group&lt;/em&gt;), and the next
group.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/window-features/students-groups.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;In frames of type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUPS&lt;/code&gt;, like in other frame types, the offset doesn’t have
to be constant. It can be any expression, as long as its type is exact numeric
with scale 0. Simply put, we can skip any integer number of groups.&lt;/p&gt;
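&lt;p&gt;For instance, the number of following peer groups could itself come from a
column. A minimal sketch, with a made-up table and a column &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;n&lt;/code&gt; used as the offset:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;WITH t(id, result, n) AS (VALUES
    (&apos;a&apos;, 10, 1),
    (&apos;b&apos;, 16, 2),
    (&apos;c&apos;, 16, 1))
SELECT
    id,
    count(*) OVER (
        ORDER BY result
        GROUPS BETWEEN CURRENT ROW AND n FOLLOWING) AS cnt
FROM t
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;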

&lt;h3 id=&quot;under-the-covers&quot;&gt;Under the covers&lt;/h3&gt;

&lt;p&gt;How do we find the frame bounds efficiently? With &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROWS&lt;/code&gt; it’s easy:
we only need to skip a determined number of rows forward or backward.&lt;/p&gt;

&lt;p&gt;With &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RANGE&lt;/code&gt;, we need to examine the actual values to see if they fall within
the given range. Our approach is optimized for the case where the offset values
are constant for all rows. Our solution involves caching frame bounds computed
for the preceding row, and using them as the starting point to find frame
bounds for the current row. Ideally, we never have to move the frame bounds
back as we process subsequent rows. In such a case, the amortized cost of frame
bound calculations per row is constant.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/window-features/sliding-frame-range.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Our strategy for determining frame bounds for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUPS&lt;/code&gt; is similar. We cache the
frame bounds computed for the preceding row and use them as the starting point
for the current row. If the frame offset is constant, frame bounds slide from
one peer group to another every time the processed row leaves one peer group and
enters the next one.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/window-features/sliding-frame-groups.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;support-for-window-clause&quot;&gt;Support for WINDOW clause&lt;/h2&gt;

&lt;p&gt;As all the preceding examples show, a window function is a big chunk of syntax.
What if we wanted to use several window functions over the same window? Say, we
need an average price and a total price from the preceding month. And the top
price. Does it have to look like the query below?&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT
    avg(totalprice) OVER (
        PARTITION BY custkey 
        ORDER BY orderdate
        RANGE BETWEEN interval &apos;1&apos; month PRECEDING AND CURRENT ROW),
    sum(totalprice) OVER (
        PARTITION BY custkey 
        ORDER BY orderdate
        RANGE BETWEEN interval &apos;1&apos; month PRECEDING AND CURRENT ROW),
    max(totalprice) OVER (
        PARTITION BY custkey 
        ORDER BY orderdate
        RANGE BETWEEN interval &apos;1&apos; month PRECEDING AND CURRENT ROW)
FROM orders
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Well, no more. Starting with
&lt;a href=&quot;https://trino.io/docs/current/release/release-352.html&quot;&gt;Trino 352&lt;/a&gt;, you can
predefine a window specification, and then use it or redefine it wherever you
need. This is thanks to the third of our new additions: support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WINDOW&lt;/code&gt;
clause.&lt;/p&gt;

&lt;p&gt;Technically speaking, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WINDOW&lt;/code&gt; clause is part of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FROM&lt;/code&gt; clause:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT …
    FROM …
        WHERE …
        GROUP BY …
        HAVING …
        WINDOW …
ORDER BY …
OFFSET …
LIMIT / FETCH …
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WINDOW&lt;/code&gt; clause, you can define any number of named windows. Then you
can simply refer to them by their names in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; list or an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt;
clause.&lt;/p&gt;

&lt;p&gt;Let’s check how the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WINDOW&lt;/code&gt; clause helps with our example query:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT 
	avg(totalprice) OVER w,
	sum(totalprice) OVER w,
	max(totalprice) OVER w
FROM orders
WINDOW w AS (
    PARTITION BY custkey
    ORDER BY orderdate
    RANGE BETWEEN interval &apos;1&apos; month PRECEDING AND CURRENT ROW)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To be even more concise, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WINDOW&lt;/code&gt; clause allows you to define more
specialized windows from existing window definitions:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;WINDOW 
	w1 AS (PARTITION BY custkey),
	w2 AS (w1 ORDER BY orderdate),
	w3 AS (w2 RANGE BETWEEN interval &apos;1&apos; month PRECEDING AND CURRENT ROW)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Alternatively, you can define the window only partially and then complete it
where it’s used:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT 
	avg(totalprice) OVER (w ROWS BETWEEN 10 PRECEDING AND CURRENT ROW) AS recent_average,
	sum(totalprice) OVER (w ROWS BETWEEN CURRENT ROW AND 10 FOLLOWING) AS next_buys
FROM orders
    WINDOW w AS (PARTITION BY custkey ORDER BY orderdate)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There are some ANSI rules, though, that you need to follow when redefining windows:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PARTITION BY&lt;/code&gt; is only allowed in the base definition,&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; can only be specified once in the named windows reference chain,&lt;/li&gt;
  &lt;li&gt;the frame can only be specified in the final definition.&lt;/li&gt;
&lt;/ul&gt;
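&lt;p&gt;Put together, a chain that follows all three rules looks like this (an
illustrative sketch against the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders&lt;/code&gt; table):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT max(totalprice) OVER w3
FROM orders
WINDOW
    w1 AS (PARTITION BY custkey),
    w2 AS (w1 ORDER BY orderdate),
    w3 AS (w2 RANGE BETWEEN interval &apos;1&apos; month PRECEDING AND CURRENT ROW)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;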

&lt;p&gt;In case you’re wondering, there’s no need to worry if some predefined windows are
eventually unused. Unused windows do not affect the efficiency of your query
execution. Partitioning, sorting and frame bound computations are costly
operations. That’s why we made sure that unused window parts do not appear in
the query plan.&lt;/p&gt;

&lt;p&gt;There’s one last detail about the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WINDOW&lt;/code&gt; clause that needs clarification. The
columns referenced in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WINDOW&lt;/code&gt; clause are columns of the input table. In the
following example, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;country_code&lt;/code&gt; is clearly a column of the table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;countries&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;... FROM countries WINDOW w AS (ORDER BY country_code)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Obvious enough. So why am I telling you this?&lt;/p&gt;

&lt;p&gt;Window functions can be used in two different clauses of a query, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt;. With the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; clause, there is a rule that column references
used there refer to the output table rather than the input table. Consider this
query:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;WITH countries(country_code) AS (VALUES &apos;pol&apos;, &apos;CAN&apos;, &apos;USA&apos;)
SELECT upper(country_code) AS country_code
    FROM countries
    WINDOW w AS (ORDER BY country_code)
ORDER BY row_number() OVER w
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Window &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;w&lt;/code&gt; is used in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; clause. So, does the window’s ordering use
the original &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;country_code&lt;/code&gt; column from the input table, or does it “see” the
uppercased &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;country_code&lt;/code&gt; from the output table?&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/window-features/country-code.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The SQL spec is clear about it: a column reference in the named window always
refers to the original column, no matter where you use this window. In the
example, the result is ordered according to the original values: lowercase &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pol&lt;/code&gt;
after uppercase &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;USA&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/window-features/country-code-result.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;As expected:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; country_code
--------------
 CAN
 USA
 POL
(3 rows)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And here the story ends. Thanks for your attention! I hope you enjoy Trino’s
new superpowers. In case of questions or issues — &lt;a href=&quot;/slack.html&quot;&gt;you
know where to find us&lt;/a&gt;. More goodies are on the way, so stay tuned! How
about regex matching on tables?&lt;/p&gt;</content>

      
        <author>
          <name>Kasia Findeisen (kasiafi)</name>
        </author>
      

      <summary>In Trino, we are thrilled to get feedback and feature requests from our fantastic community, and we’re tirelessly motivated to meet the expectations! The SQL specification is another source of inspiration. From time to time, we go through those encrypted scrolls to give you a new feature that you didn’t even know you needed!</summary>

      
      
    </entry>
  
    <entry>
      <title>12: Trino gets super visual with Apache Superset!</title>
      <link href="https://trino.io/episodes/12.html" rel="alternate" type="text/html" title="12: Trino gets super visual with Apache Superset!" />
      <published>2021-03-04T00:00:00+00:00</published>
      <updated>2021-03-04T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/12</id>
      <content type="html" xml:base="https://trino.io/episodes/12.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Srini Kadamati, Developer Advocate at &lt;a href=&quot;https://preset.io/&quot;&gt;Preset&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/SriniKadamati&quot;&gt;@SriniKadamati&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Dr. Beto Dealmeida, Staff Engineer at &lt;a href=&quot;https://preset.io/&quot;&gt;Preset&lt;/a&gt; (&lt;a href=&quot;https://twitter.com/dealmeida&quot;&gt;@dealmeida&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;release-353--almost&quot;&gt;Release 353 – Almost&lt;/h2&gt;

&lt;p&gt;353 is right around the corner. Last show we said this would be a small release.
While there was a correctness issue we resolved, there didn’t seem to be as much
demand to get it out as quickly as we initially thought, so we decided to
continue adding more features to 353. It should be coming out shortly!&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-week-trino-clients-python-and-apache-superset&quot;&gt;Concept of the week: Trino clients, Python, and Apache Superset&lt;/h2&gt;

&lt;p&gt;What is the general data flow from a connected data source?&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trino workers request data from the data source with a specific connector&lt;/li&gt;
  &lt;li&gt;Workers process data and send it to the coordinator&lt;/li&gt;
  &lt;li&gt;Coordinator does final processing&lt;/li&gt;
  &lt;li&gt;Supplies the data via an HTTP / REST stream to the requestor&lt;/li&gt;
  &lt;li&gt;Requestor is a “client” such as the JDBC driver or the Trino CLI&lt;/li&gt;
  &lt;li&gt;Client translates data further and provides to application (Java application
using JDBC driver) or user interface/directly to user (output in CLI)&lt;/li&gt;
  &lt;li&gt;User views part of data and scrolls down&lt;/li&gt;
  &lt;li&gt;Client requests more data from coordinator via HTTP / REST (and see above)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What clients are provided by Trino project?&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/client/jdbc.html&quot;&gt;JDBC driver&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/client/cli.html&quot;&gt;Trino CLI&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino-python-client&quot;&gt;Trino Python client&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino-go-client&quot;&gt;Trino Go client&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What other clients are there?&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://docs.starburst.io/data-consumer/clients/odbc.html&quot;&gt;ODBC driver from Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/ecosystem/client.html&quot;&gt;Various other clients&lt;/a&gt; from the open source community
    &lt;ul&gt;
      &lt;li&gt;R&lt;/li&gt;
      &lt;li&gt;NodeJS/Javascript&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What happens in the Python world?&lt;/p&gt;

&lt;p&gt;Disclaimer: I am not a Pythonista or Pythoneer.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;DB-API 2.0
    &lt;ul&gt;
      &lt;li&gt;PEP 249 &lt;a href=&quot;https://www.python.org/dev/peps/pep-0249/&quot;&gt;https://www.python.org/dev/peps/pep-0249/&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;Python standard library&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;trino-python-client
    &lt;ul&gt;
      &lt;li&gt;Wraps complexity of Trino HTTP / REST&lt;/li&gt;
      &lt;li&gt;Supports authentication and such&lt;/li&gt;
      &lt;li&gt;Provides DB API endpoints / implementation&lt;/li&gt;
      &lt;li&gt;Preferred method to query Trino&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.sqlalchemy.org/&quot;&gt;SQLAlchemy&lt;/a&gt;
    &lt;ul&gt;
      &lt;li&gt;SQL toolkit&lt;/li&gt;
      &lt;li&gt;ORM mapper&lt;/li&gt;
      &lt;li&gt;Widely used, e.g. in Apache Superset&lt;/li&gt;
      &lt;li&gt;Supports dialects&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;PyHive
    &lt;ul&gt;
      &lt;li&gt;Not really a SQL wrapper&lt;/li&gt;
      &lt;li&gt;Aimed at Hive QL&lt;/li&gt;
      &lt;li&gt;Only kind of useful for Trino, limited compatibility&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;JDBC driver (Java !) and PySpark
    &lt;ul&gt;
      &lt;li&gt;Possible, but a hack really&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;PyJDBC
    &lt;ul&gt;
      &lt;li&gt;Wraps DB API around any JDBC driver&lt;/li&gt;
      &lt;li&gt;Kind of a hack since it goes through JDBC to HTTP, while the Trino Python
client does the same more directly&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;PyODBC
    &lt;ul&gt;
      &lt;li&gt;Similar hack to PyJDBC&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Potentially also possible to talk to via HTTP directly
    &lt;ul&gt;
      &lt;li&gt;That’s like reimplementing the trino-python-client&lt;/li&gt;
      &lt;li&gt;Also see question of the week later&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Beyond that, it will vary from application to application.&lt;/p&gt;

&lt;p&gt;Let’s find out from our guests how this hangs together in Apache Superset, since
it is using Python.&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-superset-pr-13105-feat-first-step-native-support-trino&quot;&gt;PR of the week: Superset PR 13105 feat: first step native support Trino&lt;/h2&gt;

&lt;p&gt;This week’s pull request, &lt;a href=&quot;https://github.com/apache/superset/pull/13105&quot;&gt;https://github.com/apache/superset/pull/13105&lt;/a&gt;, was
graciously added by &lt;a href=&quot;https://github.com/dungdm93&quot;&gt;dungdm93&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The first thing we need to understand about this addition is the concept of a
database engine in Superset. A database engine handles a lot of the custom
interactions between various databases and maps them to the interface that 
Superset understands. If certain concepts are missing in a certain database, 
like time granularity or SQL syntax, the database engine for that database
indicates to Superset that this is not available. As a result, the option does 
not show in Superset, or a concise error message is reported. By default, 
database engines use the &lt;a href=&quot;https://github.com/apache/superset/blob/master/superset/db_engine_specs/base.py&quot;&gt;base.py&lt;/a&gt;
methods, but each engine, like Trino, adds its custom mappings with a specific
engine implementation,
&lt;a href=&quot;https://github.com/apache/superset/blob/master/superset/db_engine_specs/trino.py&quot;&gt;trino.py&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The pull request adds a few basic custom changes to enable Trino usage with 
Superset. One change ensures that complex timestamps from Trino are truncated to
a format that Superset is able to support during time aggregation operations.&lt;/p&gt;

&lt;p&gt;This opens a vast amount of functionality for using Trino and Superset. We 
wanted to feature this because it goes to show how a small code change, even
one that is not in the Trino repository, can have a vast effect on those
using Superset and Trino.&lt;/p&gt;

&lt;p&gt;Thank you so much to &lt;a href=&quot;https://github.com/dungdm93&quot;&gt;dungdm93&lt;/a&gt; for making this
change and further linking Trino into a fantastic project like &lt;a href=&quot;https://superset.apache.org/&quot;&gt;Apache
 Superset&lt;/a&gt;!&lt;/p&gt;

&lt;h2 id=&quot;demo-superset-querying-trino-to-create-visualization-dashboard&quot;&gt;Demo: Superset querying Trino to create visualization dashboard&lt;/h2&gt;

&lt;p&gt;To put this PR to the test, we need to connect Apache Superset to Trino as our
datasource.&lt;/p&gt;

&lt;p&gt;First, you need to follow &lt;a href=&quot;https://superset.apache.org/docs/installation/installing-superset-using-docker-compose&quot;&gt;these instructions&lt;/a&gt;
to install Docker (if you don’t already have it installed), and then clone the 
Superset repository:&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$ git clone https://github.com/apache/superset.git&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Next, you need to set up the database driver for Trino. Navigate to the root
directory of the local Superset repository you just downloaded and run the
following.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;echo &quot;sqlalchemy-trino&quot; &amp;gt;&amp;gt; ./docker/requirements-local.txt&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This tells the Superset scripts to install the sqlalchemy-trino library upon
startup. We found the name on &lt;a href=&quot;https://superset.apache.org/docs/databases/trino&quot;&gt;the Trino driver page&lt;/a&gt;,
which documents the driver and the connection string format. If you were
to install these directly on a Superset node, you would refer to &lt;a href=&quot;https://superset.apache.org/docs/databases/installing-database-drivers&quot;&gt;this database
 drivers page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Now make sure you’re in the root folder of the repo, and run the following
command to start up Superset.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;docker-compose -f docker-compose-non-dev.yml up&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;After Superset is running, you need to start Trino as well. We did so using a
separate docker-compose app.&lt;/p&gt;

&lt;p&gt;As soon as this is done, you can navigate to Superset’s homepage &lt;a href=&quot;http://localhost:8088&quot;&gt;http://localhost:8088&lt;/a&gt;
and scroll to the &lt;strong&gt;Data&lt;/strong&gt; &amp;gt; &lt;strong&gt;Databases&lt;/strong&gt; menu.&lt;/p&gt;

&lt;p&gt;Click the &lt;strong&gt;+Database&lt;/strong&gt; button.&lt;/p&gt;

&lt;p&gt;Set Name to “Trino” and URI to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino://trino@host.docker.internal:8080&lt;/code&gt;
and click &lt;strong&gt;Add&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you want to allow CTAS, CVAS, or DML operations, you’ll want to edit
the Database you just created, click on the &lt;strong&gt;SQL LAB SETTINGS&lt;/strong&gt; tab,
and select the operations you want to allow.&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/12/connection_settings.png&quot; /&gt;&lt;br /&gt;
 Connection settings that allows for creation/manipulation of tables.
&lt;/p&gt;

&lt;p&gt;You should be able to verify the connection under &lt;strong&gt;SQL Lab&lt;/strong&gt; &amp;gt; &lt;strong&gt;SQL Editor&lt;/strong&gt; by running a SELECT
query.&lt;/p&gt;

&lt;p&gt;We cover adding charts and creating a dashboard in the show. Some blogs from
&lt;a href=&quot;https://preset.io/&quot;&gt;Preset&lt;/a&gt; cover a lot of this workflow
in great detail; find them linked below! Here’s a taste of what we
created in Superset with some &lt;a href=&quot;https://transtats.bts.gov/Fields.asp?gnoyr_VQ=FGJ&quot;&gt;BTS On-Time : Reporting Carrier On-Time
 Performance (1987-present)&lt;/a&gt;
and &lt;a href=&quot;https://data.cdc.gov/Case-Surveillance/COVID-19-Case-Surveillance-Public-Use-Data/vbim-akqf&quot;&gt;Covid Cases&lt;/a&gt; 
reported by the CDC.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/12/covid_flights_data.png&quot; /&gt;&lt;br /&gt;
 COVID-19 and flights data dashboard!
&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-week-how-do-i-use-the-trino-rest-api&quot;&gt;Question of the week: How do I use the Trino REST api?&lt;/h2&gt;

&lt;p&gt;I want to just use the REST API of Trino. Where is the documentation? How do I do that?&lt;/p&gt;

&lt;h3 id=&quot;the-short-answer&quot;&gt;The short answer:&lt;/h3&gt;

&lt;p&gt;Don’t do that. Use a Trino client instead.&lt;/p&gt;

&lt;h3 id=&quot;the-long-answer&quot;&gt;The long answer:&lt;/h3&gt;

&lt;p&gt;The typical desired use case for the REST API is to run a query and get 
the result. However, that part of the API is not really a traditional REST API 
(HTTP POST, HTTP GET); that model just doesn’t work for returning large datasets.
Instead, it is a long-lived connection with a constant stream of data and interaction
between the client and Trino.&lt;/p&gt;

&lt;p&gt;The clients take care of all this complexity and provide it through a standard API for
the various platforms (JDBC, …). Use the clients!&lt;/p&gt;

&lt;p&gt;And if there is no client, or the existing client is not good enough, create an
open source one or contribute improvements.&lt;/p&gt;

&lt;h3 id=&quot;the-exception&quot;&gt;The exception:&lt;/h3&gt;

&lt;p&gt;There are other simple, pure REST API endpoints that you can use just straight
out of the box. Try &lt;a href=&quot;http://localhost:8080/v1/info&quot;&gt;http://localhost:8080/v1/info&lt;/a&gt; or
&lt;a href=&quot;http://localhost:8080/v1/status&quot;&gt;http://localhost:8080/v1/status&lt;/a&gt;.
You could use those for a liveness/readiness probe in k8s or for a cluster status
display. By the way, the Web UI uses those and others.&lt;/p&gt;

&lt;h3 id=&quot;last-note&quot;&gt;Last note&lt;/h3&gt;

&lt;p&gt;If you really can’t help yourself, here are some docs.
&lt;a href=&quot;https://github.com/trinodb/trino/wiki/HTTP-Protocol&quot;&gt;https://github.com/trinodb/trino/wiki/HTTP-Protocol&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://preset.io/blog/2021-03-03-druid-prophet-pt1/&quot;&gt;https://preset.io/blog/2021-03-03-druid-prophet-pt1/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://preset.io/blog/2021-02-11-superset-geodata/&quot;&gt;https://preset.io/blog/2021-02-11-superset-geodata/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://preset.io/blog/2021-01-18-superset-1-0/&quot;&gt;https://preset.io/blog/2021-01-18-superset-1-0/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://preset.io/blog/2021-1-18-recap-2020/&quot;&gt;https://preset.io/blog/2021-1-18-recap-2020/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://preset.io/blog/2020-09-22-slack-dashboard/&quot;&gt;https://preset.io/blog/2020-09-22-slack-dashboard/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://preset.io/blog/2020-10-02-slack-dashboard-part-2/&quot;&gt;https://preset.io/blog/2020-10-02-slack-dashboard-part-2/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://preset.io/blog/2020-10-08-bigquery-superset-part-2/&quot;&gt;https://preset.io/blog/2020-10-08-bigquery-superset-part-2/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://trino.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://trino.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/13/training-security.html&quot;&gt;https://trino.io/blog/2020/08/13/training-security.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/27/training-performance.html&quot;&gt;https://trino.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests Srini Kadamati, Developer Advocate at Preset (@SriniKadamati) Dr. Beto Dealmeida, Staff Engineer at Preset (@dealmeida)</summary>

      
      
    </entry>
  
    <entry>
      <title>11: Dynamic filtering and dynamic partition pruning</title>
      <link href="https://trino.io/episodes/11.html" rel="alternate" type="text/html" title="11: Dynamic filtering and dynamic partition pruning" />
      <published>2021-02-18T00:00:00+00:00</published>
      <updated>2021-02-18T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/11</id>
      <content type="html" xml:base="https://trino.io/episodes/11.html">&lt;h2 id=&quot;release-352&quot;&gt;Release 352&lt;/h2&gt;

&lt;p&gt;Release notes discussed: &lt;a href=&quot;https://trino.io/docs/current/release/release-352.html&quot;&gt;https://trino.io/docs/current/release/release-352.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;No new release to discuss yet, except that 353 is around the corner to fix
a low-impact correctness issue that surfaced in 352:
&lt;a href=&quot;https://github.com/trinodb/trino/pull/6895&quot;&gt;https://github.com/trinodb/trino/pull/6895&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-week-dynamic-filtering&quot;&gt;Concept of the week: Dynamic filtering&lt;/h2&gt;

&lt;p&gt;We’ve covered a lot on the Trino Community Broadcast to build our way up
to this pretty big subject called dynamic filtering. If
you haven’t seen episodes five through nine, you may want to go back and watch
those for some context for this episode. Episode eight diverted to the
Trino rebrand, so we won’t discuss that one. To recap:&lt;/p&gt;

&lt;p&gt;In &lt;a href=&quot;/episodes/5.html&quot;&gt;episode five&lt;/a&gt;, we spoke about Hive partitions.
To save you time when you run a query, Hive stores data under
directories named by the values of the data written underneath that directory.
Take this directory structure for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders&lt;/code&gt; table partitioned by the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orderdate&lt;/code&gt; field:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;orders
├── orderdate=1992-01-01
│   ├── orders_1992-01-01_1.orc
│   ├── orders_1992-01-01_2.orc
│   ├── orders_1992-01-01_3.orc
│   └── ...
├── orderdate=1992-01-02
│   └── ...
├── orderdate=1992-01-03
│   └── ...
└── ...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;When querying for data under January 1st, 1992, according to the Hive model,
query engines like Hive and Trino will only scan ORC files under the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders/orderdate=1992-01-01&lt;/code&gt; directory. The idea is to avoid scanning
unnecessary data by grouping rows based on a field commonly used in a query.&lt;/p&gt;
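<p>The pruning idea can be sketched in a few lines of Java. This is a toy illustration, not Trino code: <code>listPartitions</code> is a hypothetical stand-in for a real metastore listing, and the directory names mirror the example above. Only directories whose partition value matches the predicate are kept, so only their files are ever scanned.</p>

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionPruning {
    // Hypothetical stand-in for a metastore listing of partition directories.
    static List<String> listPartitions() {
        return List.of(
                "orders/orderdate=1992-01-01",
                "orders/orderdate=1992-01-02",
                "orders/orderdate=1992-01-03");
    }

    // Keep only directories whose partition value matches the predicate,
    // so the files under all other directories are never read.
    static List<String> prune(String orderdate) {
        List<String> kept = new ArrayList<>();
        for (String dir : listPartitions()) {
            if (dir.endsWith("orderdate=" + orderdate)) {
                kept.add(dir);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(prune("1992-01-01"));
    }
}
```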

&lt;p&gt;In episodes &lt;a href=&quot;/episodes/6.html&quot;&gt;six&lt;/a&gt; and &lt;a href=&quot;/episodes/7.html&quot;&gt;seven&lt;/a&gt;,
we discussed how a query is represented internally in Trino once
you submit your SQL query. First, the parser converts SQL to an abstract syntax
tree (AST) format. Then the planner generates a different tree structure called
the intermediate representation (IR) that contains nodes representing the steps
that need to be performed to answer the query. The leaves of the tree
execute first, and each parent node depends on its children
completing before it can start. Finally, the planner and
cost-based optimizer (CBO) run various transformations on the IR to optimize the query
plan until it is ready to be executed. In short, the planner and CBO
generate and optimize the plan by running optimization rules. Refer to chapter 
four in 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;
pg. 50 for more information.&lt;/p&gt;

&lt;p&gt;In episode &lt;a href=&quot;/episodes/9.html&quot;&gt;nine&lt;/a&gt;, we discussed how hash-joins work,
first drawing a nested-loop analogy to how joins work. We then discussed why
it is advantageous to read the inner table into memory to avoid a lot of extra
disk reads. Since the goal is to fit an entire table into memory, you want
to make sure the table that is built in memory is the smaller of the
two tables. This smaller table is called the build table. The table that gets
streamed is called the probe table. Hash-joins are a common mechanism for
executing joins in a distributed and parallel fashion.&lt;/p&gt;

&lt;p&gt;Another pair of terms akin to build and probe tables is dimension and
fact table, respectively. This nomenclature comes from the &lt;a href=&quot;https://en.wikipedia.org/wiki/Star_schema&quot;&gt;star schema&lt;/a&gt;
used in data warehousing. Typically, large tables called fact tables
live at the center of the schema. These tables have many foreign
keys and a number of quantitative or measurable columns describing an event or instance.
The foreign keys connect these big fact tables to smaller dimension tables that,
when joined, provide human-readable context to enrich the records in the fact
table. The schema ends up looking like a star with the fact table at the center.
In essence, when someone describes a fact table,
they are saying it is a bigger table that is likely to end up on the probe
side of a join, whereas a dimension table is more likely a candidate to fit into memory
on the build side of a join.&lt;/p&gt;

&lt;p&gt;So let’s get on to dynamic filtering, shall we? First, let’s cover a few
concepts behind dynamic filtering, then compare some variations of this concept.&lt;/p&gt;

&lt;p&gt;Dynamic filtering takes advantage of joins between big fact tables and smaller
dimension tables. What makes this filtering different from other types of
filtering is that the smaller build table, loaded at query
time, is used to generate a list of values that exist in the join column between the
build table and probe table. We know that only values that match these criteria
are going to be returned from the probe side, so we can use this dynamically
generated list as a pushdown predicate on the join column of the probe side.
This means we are still scanning this data, but only sending the subset that
answers the query. We can look at &lt;a href=&quot;/blog/2019/06/30/dynamic-filtering.html&quot;&gt;the blog written for the original local
 dynamic filtering implementation&lt;/a&gt;
by Roman Zeyde for more insights on the original implementation for dynamic
filtering before Raunaq’s changes.&lt;/p&gt;
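<p>A minimal toy sketch of the local dynamic filtering idea, not the actual Trino implementation: collect the distinct join keys from the already-filtered build side, then use that set as an extra predicate while scanning the probe side, so non-matching rows are dropped before the join runs.</p>

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LocalDynamicFilter {
    // Collect the distinct join keys seen on the (small) build side.
    static Set<Integer> buildFilter(int[] buildKeys) {
        Set<Integer> filter = new HashSet<>();
        for (int key : buildKeys) {
            filter.add(key);
        }
        return filter;
    }

    // Apply the dynamically built filter while scanning the probe side:
    // only rows whose join key appears on the build side survive.
    static List<Integer> scanProbe(int[] probeKeys, Set<Integer> filter) {
        List<Integer> matched = new ArrayList<>();
        for (int key : probeKeys) {
            if (filter.contains(key)) {
                matched.add(key);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        Set<Integer> filter = buildFilter(new int[] {1, 2, 3, 4});
        // Only 2 and 4 from the probe side can possibly join.
        System.out.println(scanProbe(new int[] {2, 4, 6, 8, 10, 12}, filter));
    }
}
```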

&lt;p&gt;Local dynamic filtering is definitely beneficial as it allows skipping 
unnecessary stripes or row-groups in the ORC or Parquet reader. However, it
works only for broadcast joins, and its effectiveness depends upon the 
selectivity of the min and max indices maintained in ORC or Parquet files. What
if we could prune entire partitions from the query execution based on dynamic
filters? In the next iteration of dynamic filtering, called dynamic partition 
pruning, we do just that. We take advantage of the partitioned layout of Hive 
tables to avoid generating splits on partitions that won’t exist in the final
query result. The coordinator can identify partitions for pruning based on the
dynamic filters sent to it from the workers processing the build side of the join.
This only works if the query contains a join condition on a column that the
probe table is partitioned on.&lt;/p&gt;
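<p>A toy sketch of the coordinator-side step, again not the real implementation: given the probe table’s partition layout and the dynamic filter collected from the build side, partitions whose keys are absent from the filter produce no splits at all.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DynamicPartitionPruning {
    // The coordinator receives the dynamic filter (distinct build-side join
    // keys) and skips whole partitions of the probe table, so no splits are
    // ever generated for them.
    static List<String> pruneSplits(Map<Integer, List<String>> partitions, Set<Integer> dynamicFilter) {
        List<String> splits = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> partition : partitions.entrySet()) {
            if (dynamicFilter.contains(partition.getKey())) {
                splits.addAll(partition.getValue());
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        // Probe table partitioned on the join column: partition key -> files.
        Map<Integer, List<String>> partitions = Map.of(
                1, List.of("part=1/a.orc"),
                2, List.of("part=2/b.orc"),
                3, List.of("part=3/c.orc"));
        // Dynamic filter says only key 2 exists on the build side.
        System.out.println(pruneSplits(partitions, Set.of(2)));
    }
}
```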

&lt;p&gt;With that basic understanding, let’s move on to the PR that implements dynamic
partition pruning!&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-1072-implement-dynamic-partition-pruning&quot;&gt;PR of the week: PR 1072 Implement dynamic partition pruning&lt;/h2&gt;

&lt;p&gt;In this week’s pull request &lt;a href=&quot;https://github.com/trinodb/trino/pull/1072&quot;&gt;https://github.com/trinodb/trino/pull/1072&lt;/a&gt; we
return with Raunaq Morarka and Karol Sobczak. This PR effectively brings in the 
second iteration of dynamic filtering, dynamic partition pruning, where instead
of relying on local dynamic filtering we collect dynamic filters from the
workers in the coordinator and prune out extra splits that aren’t needed with
the partition layout of the probe side table. A query like the following,
seen in &lt;a href=&quot;/blog/2020/06/14/dynamic-partition-pruning.html&quot;&gt;Raunaq’s blog about dynamic partition pruning&lt;/a&gt;,
shows that if we partition &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sales&lt;/code&gt; on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;item_id&lt;/code&gt; we can take
advantage of this information by sending it to the coordinator.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT COUNT(*) FROM 
sales JOIN items ON sales.item_id = items.id
WHERE items.price &amp;gt; 1000;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Below we show how the execution of this would look in a distributed manner if
you partitioned the sales table on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;item_id&lt;/code&gt;. This is a visual reference for
those listening in on the podcast:&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
1:
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/11/Dynamic
 Filtering1.png&quot; /&gt;&lt;br /&gt;
 Query is sent to the coordinator to be parsed, analyzed, and planned.
&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
2:
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/11/Dynamic
 Filtering2.png&quot; /&gt;&lt;br /&gt;
 All workers get a subset of the items (build) table and each worker filters
 out items with price &amp;gt; 1000.
&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
3:
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/11/Dynamic
 Filtering3.png&quot; /&gt;&lt;br /&gt;
 All workers create dynamic filter for their item subset and send it to the 
 coordinator.
&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
4:
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/11/Dynamic
 Filtering4.png&quot; /&gt;&lt;br /&gt;
 Coordinator uses dynamic filter list to prune out splits and partitions that
 do not overlap with the DF and submits splits to run on workers.
&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
5:
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/11/Dynamic
 Filtering5.png&quot; /&gt;&lt;br /&gt;
 Workers run splits over the sales (probe) table.
&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
6:
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/11/Dynamic
 Filtering6.png&quot; /&gt;&lt;br /&gt;
 Workers return final rows to be assembled into the final result on the
 coordinator.
&lt;/p&gt;

&lt;h2 id=&quot;pr-demo-pr-1072-implement-dynamic-partition-pruning&quot;&gt;PR Demo: PR 1072 Implement dynamic partition pruning&lt;/h2&gt;

&lt;p&gt;For this PR demo, we have set up one r5.4xlarge coordinator and four r5.4xlarge
workers in a cluster, with an sf100-scale TPC-DS dataset. We will run some of
the TPC-DS queries and perhaps a few others.&lt;/p&gt;

&lt;p&gt;The first of the TPC-DS queries we run through is &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/testing/trino-product-tests/src/main/resources/sql-tests/testcases/tpcds/q54.sql&quot;&gt;query 54&lt;/a&gt;.
For this query, we use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive&lt;/code&gt; catalog pointing to AWS S3, with AWS Glue
as our metastore. We initially disable dynamic filtering, then compare against
the times when dynamic filtering is enabled. Without dynamic filtering the
query runs in about 92 seconds, while with dynamic filtering it runs in 42
seconds. We see similar findings for the semijoin we execute below and discuss
some implications of how the planner actually optimizes the semijoin into an
inner join.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/* turn dynamic filtering on or off to compare */
SET SESSION enable_dynamic_filtering=false;

SELECT ss_sold_date_sk, COUNT(*) FROM store_sales WHERE ss_sold_date_sk IN (
  SELECT ws_sold_date_sk FROM (
    SELECT ws_sold_date_sk, COUNT(*) FROM web_sales GROUP BY 1 ORDER BY 2 LIMIT 100
  )
)
GROUP BY 1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/codex/how-to-build-a-modern-data-lake-with-minio-db0455eec053&quot;&gt;https://medium.com/codex/how-to-build-a-modern-data-lake-with-minio-db0455eec053&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/quintoandar-tech-blog/building-a-sql-engine-infrastructure-at-quintoandar-73540e136c4e&quot;&gt;https://medium.com/quintoandar-tech-blog/building-a-sql-engine-infrastructure-at-quintoandar-73540e136c4e&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/codex/modern-data-platform-using-open-source-technologies-212ba8273eab&quot;&gt;https://medium.com/codex/modern-data-platform-using-open-source-technologies-212ba8273eab&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Upcoming events&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Big Data Technology Warsaw Summit - Workshop Feb 23 - 24 &lt;a href=&quot;https://bigdatatechwarsaw.eu/agenda/&quot;&gt;https://bigdatatechwarsaw.eu/agenda/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Big Data Technology Warsaw Summit - Conference Feb 25 - 26 &lt;a href=&quot;https://bigdatatechwarsaw.eu/agenda/&quot;&gt;https://bigdatatechwarsaw.eu/agenda/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Past Events&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Starburst Datanova - on demand &lt;a href=&quot;https://www.starburst.io/info/datanova/&quot;&gt;https://www.starburst.io/info/datanova/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://trino.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://trino.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/13/training-security.html&quot;&gt;https://trino.io/blog/2020/08/13/training-security.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/27/training-performance.html&quot;&gt;https://trino.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Release 352</summary>

      
      
    </entry>
  
    <entry>
      <title>10: Naming the bunny!</title>
      <link href="https://trino.io/episodes/10.html" rel="alternate" type="text/html" title="10: Naming the bunny!" />
      <published>2021-02-04T00:00:00+00:00</published>
      <updated>2021-02-04T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/10</id>
      <content type="html" xml:base="https://trino.io/episodes/10.html">&lt;h2 id=&quot;release-352&quot;&gt;Release 352&lt;/h2&gt;
&lt;p&gt;Release notes discussed: &lt;a href=&quot;https://trino.io/docs/current/release/release-352.html&quot;&gt;https://trino.io/docs/current/release/release-352.html&lt;/a&gt;.
At the time of recording, 352 was not out yet. We will discuss a few of the
changes coming down the pipeline to look forward to!&lt;/p&gt;

&lt;h2 id=&quot;naming-our-new-bunny&quot;&gt;Naming our new bunny!&lt;/h2&gt;
&lt;p&gt;That’s right, you submitted your names, and we are happy to announce the top
candidates. The final name will be chosen by a community poll.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/trino-og.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;The running names are:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Lepi: short for Lepus, the constellation under Orion that is in the shape of
 a bunny and said to be chased by Orion or Orion’s dogs. They cannot catch it
  because the bunny is fast &lt;a href=&quot;https://en.wikipedia.org/wiki/Lepus_(constellation)&quot;&gt;https://en.wikipedia.org/wiki/Lepus_(constellation)&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Neut: an early name used informally by community members to refer to
 the bunny before it had a real name. This name, a portmanteau when
  combined with Trino (Neut-Trino), became popular among a few members.&lt;/li&gt;
  &lt;li&gt;Nu: a math symbol, with a similar prefix use of Nu + Trino to refer to
  the neutrino origins. In particle physics, nu also represents any of the three
  kinds of neutrino.&lt;/li&gt;
  &lt;li&gt;Commander Bun Bun: a name suggested by a community member’s child who loves 
 the bunny!&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2019/05/21/optimizing-the-casts-away.html&quot;&gt;https://trino.io/blog/2019/05/21/optimizing-the-casts-away.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://towardsdatascience.com/statistics-in-spark-sql-explained-22ec389bf71b&quot;&gt;https://towardsdatascience.com/statistics-in-spark-sql-explained-22ec389bf71b&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.infoworld.com/article/3597971/on-premises-data-warehouses-are-dead.html&quot;&gt;https://www.infoworld.com/article/3597971/on-premises-data-warehouses-are-dead.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&quot;&gt;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Upcoming events&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Feb 9 - Feb 10 &lt;a href=&quot;http://starburstdata.com/datanova&quot;&gt;http://starburstdata.com/datanova&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://trino.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://trino.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/13/training-security.html&quot;&gt;https://trino.io/blog/2020/08/13/training-security.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/27/training-performance.html&quot;&gt;https://trino.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Presto® Summit Series - Real world usage&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/05/15/state-of-presto.html&quot;&gt;https://trino.io/blog/2020/05/15/state-of-presto.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&quot;&gt;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&quot;&gt;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&quot;&gt;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Podcasts:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.contributor.fyi/presto&quot;&gt;Presto with Martin Traverso, Dain Sundstrom and David Phillips&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&quot;&gt;Simplify Your Data Architecture With The Presto Distributed SQL Engine&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/5980279&quot;&gt;How Open Source Presto Unlocks a Single Point of Access to Data&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/5656471&quot;&gt;The Data Access Struggle is Real&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://softwareengineeringdaily.com/2020/02/07/presto-with-justin-borgman/&quot;&gt;Presto with Justin Borgman&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/3923864&quot;&gt;The infrastructure renaissance and how it will power the modernization of analytics platforms&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://softwareengineeringdaily.com/2018/05/24/ubers-data-platform-with-zhenxiao-luo/&quot;&gt;Uber’s Data Platform with Zhenxiao Luo&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Release 352 Release Notes discussed: https://trino.io/docs/current/release/release-352.html At the time of recording 352 was not out yet. We will discuss a few of the changes coming down the pipeline to look forward to!</summary>

      
      
    </entry>
  
    <entry>
      <title>9: Distributed hash-joins, and how to migrate to Trino</title>
      <link href="https://trino.io/episodes/9.html" rel="alternate" type="text/html" title="9: Distributed hash-joins, and how to migrate to Trino" />
      <published>2021-01-21T00:00:00+00:00</published>
      <updated>2021-01-21T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/9</id>
      <content type="html" xml:base="https://trino.io/episodes/9.html">&lt;script type=&quot;text/x-mathjax-config&quot;&gt;
  MathJax.Hub.Config({
    tex2jax: {
      inlineMath: [ [&apos;$&apos;,&apos;$&apos;], [&quot;\\(&quot;,&quot;\\)&quot;] ],
      processEscapes: true
    }
  });
&lt;/script&gt;

&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML&quot;&gt;
&lt;/script&gt;

&lt;h2 id=&quot;release-351&quot;&gt;Release 351&lt;/h2&gt;
&lt;p&gt;Release Notes discussed: &lt;a href=&quot;https://trino.io/docs/current/release/release-351.html&quot;&gt;https://trino.io/docs/current/release/release-351.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This release was really all about renaming everything from a client perspective
to use Trino instead of Presto. Manfred covers all the work that was done
for the release.&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-week-how-do-i-migrate-from-presto-releases-earlier-than-350-to-trino-releases-351&quot;&gt;Question of the week: How do I migrate from Presto releases earlier than 350 to Trino release 351?&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/blog/2021/01/04/migrating-from-prestosql-to-trino.html&quot;&gt;https://trino.io/blog/2021/01/04/migrating-from-prestosql-to-trino.html&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-week-distributed-hash-join&quot;&gt;Concept of the week: Distributed Hash-join&lt;/h2&gt;
&lt;p&gt;Joins are one of the most useful and powerful operations performed by databases.
There are many approaches to joining data. Various types of indices can
facilitate joins. The order in which a join gets executed can vary depending
on the geographic distribution of the data, the selectivity of the query (the
fewer rows a query returns, the higher its selectivity), and the
information available from indexes and table statistics, all of which inform an
execution engine how to plan a query. One thing that stays consistent across
virtually every query engine in the world is that joins occur over two tables
at a time, no matter how many tables exist in the query. Some joins may occur
in parallel, but any given join only involves two tables.&lt;/p&gt;

&lt;p&gt;If you wrote a simple program that did what a join does, it might look something
like a nested loop:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;public class CartesianProductNestedLoop {
    public static void main(String[] args) {
        int[] outerTable = {2, 4, 6, 8, 10, 12};
        int[] innerTable = {1, 2, 3, 4};

        for (int o : outerTable) {
            for (int i : innerTable) {
                System.out.println(o + &quot;, &quot; + i);
            }
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Since there is no predicate, such as something you would see in a WHERE clause,
the join returns the cartesian product of these two tables. It is also useful
to portray these joins in relational algebra. For example, the join above is
written as $O \times I$ where $O$ is the outer table and $I$ is the inner table.
$\times$ indicates that the join we are using is the cartesian product, as we
see below. Another useful way to view this is to visualize the join as a graph.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;33%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/9/cartesian.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NOTE: When using relational algebra or using a graph to represent a join, it
is convention that the table in the outer loop of this join is always shown on
the left. This distinction becomes important as you will see below.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here is the output from the cartesian product join above.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;2, 1
2, 2
2, 3
2, 4
4, 1
4, 2
4, 3
4, 4
6, 1
6, 2
6, 3
6, 4
8, 1
8, 2
8, 3
8, 4
10, 1
10, 2
10, 3
10, 4
12, 1
12, 2
12, 3
12, 4
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Notice also that we are treating these tables the same: since we have to read
each of the values to print out the cartesian product, it doesn’t yet make a
difference which table is the inner table and which is the outer. We could
swap the inner and outer tables and still get the same performance of
$O(n^2)$.&lt;/p&gt;

&lt;p&gt;Now, what if you did have some criteria that filtered out some of the rows
returned from this product? Since it is quite common to join tables by an id,
the most common criterion for a join is that the values are equal, since values
in rows with matching ids are related. Initially, we can get away with just
adding an if statement, print when true, and be done with it. Let’s
do that.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;public class NaturalJoinNestedLoop {
    public static void main(String[] args) {
        int[] outerTable = {2, 4, 6, 8, 10, 12};
        int[] innerTable = {1, 2, 3, 4};

        for (int o : outerTable) {
            for (int i : innerTable) {
                if(o == i){
                    System.out.println(o + &quot;, &quot; + i);
                }
            }
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Let’s assume that the integers in these tables are values of a column called id 
in both tables that uniquely identifies a row in each table. When you have a
commonly named column like this, the operation of joining based on columns that
share the same name is a natural join. In relational algebra it is denoted with
a little bowtie, for example, $O \bowtie I$. We could also use the equi-join
notation that specifies the exact join columns: $O \bowtie_{O.id = I.id} I$. The
graph looks about the same as before; only the operation we
are performing changes.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;33%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/9/natural_join.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Now we only get the output of two rows as we should expect.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;2, 2
4, 4
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;One important aspect that gets glossed over in this simple example is that
the data is small and in memory, whereas a database initially has to retrieve the
data from disk. Random access to memory is roughly
&lt;a href=&quot;https://queue.acm.org/detail.cfm?id=1563874&quot;&gt;100,000 times faster than random access to disk&lt;/a&gt;.
That being said, it’s really important to consider that reading the
values over and over again is a quadratic exercise, with each read multiplied by
that 100,000-fold penalty when it comes from disk.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/9/disk_vs_mem.jpg&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;It would be better if we could read one table into memory once, and reuse those 
values as we scan over the data of the other table. There is a common name for 
each of these tables. Trino first reads the inner table into memory, to avoid having
to read it once per row of the outer table. We call this the build
table, as the first scan builds the table in memory. Trino then streams 
the rows from the outer table and performs the join against the build table. We
call this the probe table.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;import java.util.*;

public class BuildProbeLoops {
    public static void main(String[] args) {
        int[] probeTable = {2, 4, 6, 8, 10, 12};
        int[] buildTable = {1, 2, 3, 4};
        Map&amp;lt;Integer, Integer&amp;gt; buildTableCache = new HashMap&amp;lt;&amp;gt;();

        // build phase: scan the build table once and cache it by hash
        for (int row : buildTable) {
            // in this example the row is just the join column
            int hash = row;

            buildTableCache.put(hash, row);
        }

        // probe phase: stream the probe table and look up matches in the cache
        for (int row : probeTable) {
            // in this example the row is just the join column
            int hash = row;

            Integer buildRow = buildTableCache.get(hash);
            if (buildRow != null) {
                System.out.println(buildRow + &quot;, &quot; + row);
            }
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;While it may seem redundant to do all of this extra work for such a simple
example, this approach saves minutes to hours when you are reading from disk and
the data is big enough. The runtime complexity has now dropped from $O(n^2)$ to 
just a linear runtime of $O(n)$. The relational algebra for this join is still
$P \bowtie B$, where $P$ is the probe table and $B$ is the build table. Notice 
that the relational algebra hasn’t changed; we just now specify that we build on
the inner table and probe with the outer table.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;33%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/9/natural_join2.png&quot; /&gt;
&lt;/p&gt;
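&lt;p&gt;To make that concrete, here is a small sketch comparing how many row reads
each approach performs. The table sizes are made-up numbers for the arithmetic,
not from the example above:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;public class JoinReadCost {
    public static void main(String[] args) {
        long outerRows = 1_000_000;
        long innerRows = 100_000;

        // naive nested loops: reread the inner table for every outer row
        long nestedLoopReads = outerRows * innerRows; // 100,000,000,000 reads

        // build once, then probe: read each table exactly once
        long hashJoinReads = outerRows + innerRows;   // 1,100,000 reads

        System.out.println(nestedLoopReads / hashJoinReads); // roughly 90,909 times fewer reads
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;When each of those avoided reads is a slow random access from disk, the
savings dominate the total query time.&lt;/p&gt;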

&lt;p&gt;One thing to consider is the size of each table: if we are fitting one of the
tables into memory, it’s probably best to choose the smaller table as
the build table. Hopefully this helps you understand why we distinguish
between a build and a probe table. This will help in our discussions about
query optimization and dynamic filtering on the next
show.&lt;/p&gt;
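&lt;p&gt;A minimal sketch of that choice follows. The table sizes here are
hypothetical; in practice Trino makes this decision using table statistics and
the cost-based optimizer:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;public class BuildSideChoice {
    public static void main(String[] args) {
        int[] left = new int[1_000_000]; // large table
        int[] right = new int[1_000];    // small table

        // build on the smaller side so the in-memory hash table stays small
        int[] build = left.length &amp;lt;= right.length ? left : right;
        int[] probe = (build == left) ? right : left;

        System.out.println(&quot;build rows: &quot; + build.length);
        System.out.println(&quot;probe rows: &quot; + probe.length);
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Picking the wrong side means a larger hash table in memory, which is why
accurate statistics matter so much for join planning.&lt;/p&gt;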

&lt;p&gt;Another interesting subtopic that we won’t get into today is &lt;a href=&quot;http://www.oaktable.net/content/right-deep-left-deep-and-bushy-joins&quot;&gt;left-deep and right-deep plans&lt;/a&gt;.
Since we now know that the probe table is always on the left and the build table
is on the right, the shape of our query plan matters. Consider the difference between
these two trees.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img width=&quot;33%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/9/left_deep.png&quot; /&gt;
&lt;img width=&quot;33%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/9/right_deep.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Left-deep versus right-deep trees have big implications for the speed of a
query, but that is a bit tangential for our talk today. Let’s finally move on to
hash-joins!&lt;/p&gt;

&lt;p&gt;In Trino, a hash-join is the common algorithm that is used to join tables. In
fact, the last snippet of code is really all that is involved in implementing a 
hash-join. So in explaining probe and build, we have already covered how the
algorithm works conceptually.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/9/tables.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;The big difference is that Trino implements a distributed hash-join using two
types of parallelism.&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Joined tables are distributed over the worker nodes to achieve inter-node
  parallelism. Instead of the hash value simply being used to match with other
  rows, it is also used to route rows to specific Trino worker nodes. Each
  worker then processes the rows that meet the equijoin criteria for its set of
  ids.&lt;/li&gt;
  &lt;li&gt;Within a node, workers can use the hash to further distribute the rows
  across multiple threads. This intra-node parallelism allows for a single
  thread per hash partition.&lt;/li&gt;
  &lt;li&gt;Finally, once all of these threads have determined which rows pass
  the join criteria, the probe side begins to emit rows in larger batches,
  which can quickly be thrown out or kept based on which partitions exist on a 
  given worker.&lt;/li&gt;
&lt;/ol&gt;
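&lt;p&gt;The routing in the first two steps can be sketched as follows. The worker
and thread counts are made up for illustration, and this is not Trino’s actual
partitioning code:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;public class HashPartitioning {
    public static void main(String[] args) {
        int workers = 3;
        int threadsPerWorker = 4;
        int[] joinKeys = {2, 4, 6, 8, 10, 12};

        for (int key : joinKeys) {
            int hash = Integer.hashCode(key);
            // inter-node parallelism: the hash routes the row to a worker node
            int worker = Math.floorMod(hash, workers);
            // intra-node parallelism: the hash also picks a thread on that worker
            int thread = Math.floorMod(hash / workers, threadsPerWorker);
            System.out.println(&quot;key &quot; + key + &quot; goes to worker &quot; + worker + &quot;, thread &quot; + thread);
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The key property is that equal join keys always hash to the same worker and
thread, so matching build and probe rows end up in the same place.&lt;/p&gt;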

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/9/parallelism.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Great resources on this topic, from which some of the examples above derive:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Hash_join&quot;&gt;https://en.wikipedia.org/wiki/Hash_join&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.mathcs.emory.edu/~cheung/Courses/554/Syllabus/5-query-opt/join-order2.html&quot;&gt;http://www.mathcs.emory.edu/~cheung/Courses/554/Syllabus/5-query-opt/join-order2.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&quot;&gt;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;how-to-contribute-documentation-and-testimonials&quot;&gt;How to contribute documentation and testimonials&lt;/h2&gt;
&lt;p&gt;Instead of a PR of the week, Manfred discusses some notes on how to contribute 
documentation and testimonials.&lt;/p&gt;

&lt;p&gt;If you want to show us some 💕, please &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/.github/star.png&quot;&gt;give us a ⭐ on GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2019/05/21/optimizing-the-casts-away.html&quot;&gt;https://trino.io/blog/2019/05/21/optimizing-the-casts-away.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://towardsdatascience.com/statistics-in-spark-sql-explained-22ec389bf71b&quot;&gt;https://towardsdatascience.com/statistics-in-spark-sql-explained-22ec389bf71b&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.infoworld.com/article/3597971/on-premises-data-warehouses-are-dead.html&quot;&gt;https://www.infoworld.com/article/3597971/on-premises-data-warehouses-are-dead.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&quot;&gt;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Upcoming events&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Feb 9 - Feb 10 &lt;a href=&quot;http://starburstdata.com/datanova&quot;&gt;http://starburstdata.com/datanova&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://trino.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://trino.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/13/training-security.html&quot;&gt;https://trino.io/blog/2020/08/13/training-security.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/27/training-performance.html&quot;&gt;https://trino.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Presto® Summit Series - Real world usage&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/05/15/state-of-presto.html&quot;&gt;https://trino.io/blog/2020/05/15/state-of-presto.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&quot;&gt;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&quot;&gt;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&quot;&gt;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Podcasts:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.contributor.fyi/presto&quot;&gt;Presto with Martin Traverso, Dain Sundstrom and David Phillips&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&quot;&gt;Simplify Your Data Architecture With The Presto Distributed SQL Engine&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/5980279&quot;&gt;How Open Source Presto Unlocks a Single Point of Access to Data&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/5656471&quot;&gt;The Data Access Struggle is Real&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://softwareengineeringdaily.com/2020/02/07/presto-with-justin-borgman/&quot;&gt;Presto with Justin Borgman&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/3923864&quot;&gt;The infrastructure renaissance and how it will power the modernization of analytics platforms&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://softwareengineeringdaily.com/2018/05/24/ubers-data-platform-with-zhenxiao-luo/&quot;&gt;Uber’s Data Platform with Zhenxiao Luo&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary></summary>

      
      
    </entry>
  
    <entry>
      <title>8: Trino: A ludicrously fast query engine: past, present, and future</title>
      <link href="https://trino.io/episodes/8.html" rel="alternate" type="text/html" title="8: Trino: A ludicrously fast query engine: past, present, and future" />
      <published>2021-01-11T00:00:00+00:00</published>
      <updated>2021-01-11T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/8</id>
      <content type="html" xml:base="https://trino.io/episodes/8.html">&lt;h2 id=&quot;in-this-episode&quot;&gt;In this episode…&lt;/h2&gt;

&lt;p&gt;Well, we’re back, and no longer waving the Presto® flag like we did before. If 
you haven’t heard, Presto® SQL is now Trino
(&lt;a href=&quot;https://trino.io/blog/2020/12/27/announcing-trino.html&quot;&gt;read more about that here&lt;/a&gt;).
In this episode, we sit down with the four original creators of Presto® and
discuss in more detail the journey that led us to our current trajectory with
the Presto® SQL project, and why it is now being renamed to Trino. We also
discuss how this affects those that are using Trino. If you are developing on
Trino and still use the old namespace, check out the 
&lt;a href=&quot;https://trino.io/blog/2021/01/04/migrating-from-prestosql-to-trino.html&quot;&gt;guide to migrate here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We also discuss the differences between the two projects. There are actually a
lot of them two years after the split, and we recommend looking at the
&lt;a href=&quot;https://trino.io/blog/2020/01/01/2019-summary.html&quot;&gt;blog we wrote at the end of 2019&lt;/a&gt;.
Keep your eyes peeled for the blog we are writing to summarize the changes
in 2020!&lt;/p&gt;

&lt;p&gt;Finally, we cover some sneak peeks at the roadmap for Trino in 2021.&lt;/p&gt;

&lt;p&gt;If you want to show us some 💕, please &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/.github/star.png&quot;&gt;give us a ⭐ on GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2019/05/21/optimizing-the-casts-away.html&quot;&gt;https://trino.io/blog/2019/05/21/optimizing-the-casts-away.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://towardsdatascience.com/statistics-in-spark-sql-explained-22ec389bf71b&quot;&gt;https://towardsdatascience.com/statistics-in-spark-sql-explained-22ec389bf71b&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.infoworld.com/article/3597971/on-premises-data-warehouses-are-dead.html&quot;&gt;https://www.infoworld.com/article/3597971/on-premises-data-warehouses-are-dead.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&quot;&gt;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Upcoming events&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Feb 9 - Feb 10 &lt;a href=&quot;http://starburstdata.com/datanova&quot;&gt;http://starburstdata.com/datanova&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://trino.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://trino.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/13/training-security.html&quot;&gt;https://trino.io/blog/2020/08/13/training-security.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/27/training-performance.html&quot;&gt;https://trino.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Presto® Summit Series - Real world usage&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/05/15/state-of-presto.html&quot;&gt;https://trino.io/blog/2020/05/15/state-of-presto.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&quot;&gt;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&quot;&gt;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&quot;&gt;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Podcasts:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.contributor.fyi/presto&quot;&gt;Presto with Martin Traverso, Dain Sundstrom and David Phillips&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&quot;&gt;Simplify Your Data Architecture With The Presto Distributed SQL Engine&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/5980279&quot;&gt;How Open Source Presto Unlocks a Single Point of Access to Data&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/5656471&quot;&gt;The Data Access Struggle is Real&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://softwareengineeringdaily.com/2020/02/07/presto-with-justin-borgman/&quot;&gt;Presto with Justin Borgman&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/3923864&quot;&gt;The infrastructure renaissance and how it will power the modernization of analytics platforms&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://softwareengineeringdaily.com/2018/05/24/ubers-data-platform-with-zhenxiao-luo/&quot;&gt;Uber’s Data Platform with Zhenxiao Luo&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>In this episode…</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino in 2020 - An amazing year in review</title>
      <link href="https://trino.io/blog/2021/01/08/2020-review.html" rel="alternate" type="text/html" title="Trino in 2020 - An amazing year in review" />
      <published>2021-01-08T00:00:00+00:00</published>
      <updated>2021-01-08T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/01/08/2020-review</id>
      <content type="html" xml:base="https://trino.io/blog/2021/01/08/2020-review.html">&lt;p&gt;&lt;strong&gt;Wow!&lt;/strong&gt; If you had to sum up what happened in the last year in this
great community, &lt;strong&gt;wow&lt;/strong&gt; would be it. It is truly awe-inspiring to be part of
this incredible journey of Trino. Oh yeah, on that note. Our community and
project &lt;a href=&quot;/blog/2020/12/27/announcing-trino.html&quot;&gt;chose the new name Trino&lt;/a&gt;,
to be able to continue to innovate and develop freely as a community of peers.
Presto® and Presto® SQL are a thing of the past.&lt;/p&gt;

&lt;p&gt;Now that is out of the way, let’s dive right in and see what all our community
members across the globe have created with us!&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;&lt;a href=&quot;/blog/2020/01/01/2019-summary.html&quot;&gt;2019 was a big year for us&lt;/a&gt;, but check
out how 2020 eclipsed even that!&lt;/p&gt;
&lt;h2 id=&quot;by-the-numbers&quot;&gt;By the numbers&lt;/h2&gt;

&lt;p&gt;Even the size and growth of &lt;a href=&quot;/slack.html&quot;&gt;our community on Slack&lt;/a&gt; is impressive:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Started in January 2020 with ~1600 members and 280 weekly active&lt;/li&gt;
  &lt;li&gt;Over 3200 members by December 2020&lt;/li&gt;
  &lt;li&gt;560 members active weekly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The innovation and change of &lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;the source code on GitHub&lt;/a&gt; is a result of the hard work of the community:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Over 4000 commits merged&lt;/li&gt;
  &lt;li&gt;More than 2800 pull requests received&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/release.html#releases-2020&quot;&gt;23 releases&lt;/a&gt;, basically one every two
weeks!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As you can see, much of the excitement around the name change has quickly
increased the number of stars we have on GitHub. While some of this certainly
stems from an initial buzz around a shiny new name, we also believe that this
name change has brought clarity to the community. Trino is an improved version,
supported by the founders and creators of Presto®, along with the major
contributors.&lt;/p&gt;

&lt;p&gt;And if you have not done so already, make sure to &lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;star the
repository&lt;/a&gt; and &lt;a href=&quot;/slack.html&quot;&gt;join us on slack&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;features-and-code&quot;&gt;Features and code&lt;/h2&gt;

&lt;p&gt;While everything mentioned is already exciting, the true work is visible in the
new features and improvements in Trino. It is a long list, but read on. You
won’t want to miss anything.&lt;/p&gt;

&lt;h3 id=&quot;improvements-to-ansi-sql-support&quot;&gt;Improvements to ANSI SQL support&lt;/h3&gt;

&lt;p&gt;A core feature of Trino is the ability to use the same standard SQL for any
connected data source. These improvements empower all users.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Variable-precision temporal types, with precision down to picoseconds
(10&lt;sup&gt;−12&lt;/sup&gt;s). This is a very important feature for time-critical
systems such as financial transaction processing&lt;/li&gt;
  &lt;li&gt;Correct, and now SQL specification compliant timestamp semantics, making
migration of SQL statements from other compliant systems such as many RDBMSs
easier&lt;/li&gt;
  &lt;li&gt;Implicit coercions for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; clause&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RANGE&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUPS&lt;/code&gt;-based window frames&lt;/li&gt;
  &lt;li&gt;More support for various shapes of correlated subqueries&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INTERSECT ALL&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXCEPT ALL&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Parameter support in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH FIRST&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OFFSET&lt;/code&gt; clause&lt;/li&gt;
  &lt;li&gt;Experimental support for &lt;a href=&quot;/docs/current/sql/select.html?highlight=recursive#with-recursive-clause&quot;&gt;recursive queries&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Enforcement of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NOT NULL&lt;/code&gt; constraints when inserting data&lt;/li&gt;
  &lt;li&gt;Quantified comparisons (e.g., &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;gt; ALL (...)&lt;/code&gt;) in aggregation queries&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;other-query-improvements&quot;&gt;Other query improvements&lt;/h3&gt;

&lt;p&gt;A number of other features were added to make querying your data sources with
Trino even more powerful:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/language/types.html#t-digest&quot;&gt;T-digest data type&lt;/a&gt; and functions
for approximate quantile computations&lt;/li&gt;
  &lt;li&gt;Support for setting and reading column comments&lt;/li&gt;
  &lt;li&gt;Numerous new functions including &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;concat_ws()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;regexp_count()&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;regexp_position()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;contains_sequence()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;murmur3()&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;from_unixtime_nanos()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;from_iso8601_timestamp_nanos()&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;human_readable_seconds()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bitwise&lt;/code&gt; operations, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;luhn_check()&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;approx_most_frequent()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;translate()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;starts_with()&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;performance&quot;&gt;Performance&lt;/h2&gt;

&lt;p&gt;Trino is already &lt;a href=&quot;/index.html&quot;&gt;ludicrously fast&lt;/a&gt;. But then again, even faster is
better, so we worked on that:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improved pushdown of complex operations into connectors, including
&lt;a href=&quot;/docs/current/optimizer/pushdown.html&quot;&gt;aggregation pushdown&lt;/a&gt; and TopN
pushdown.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/06/14/dynamic-partition-pruning.html&quot;&gt;Dynamic filtering and partition pruning&lt;/a&gt;, which can improve performance of
highly selective joins manyfold.&lt;/li&gt;
  &lt;li&gt;Cost-based decisions for queries containing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IN &amp;lt;subquery&amp;gt;&lt;/code&gt; in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WHERE&lt;/code&gt; clause.&lt;/li&gt;
  &lt;li&gt;Performance improvements for information_schema queries, which benefit third-party BI
tools that need to inspect table metadata, for example DBeaver, DataGrip,
Power BI, Tableau, Looker, and others.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/14/dereference-pushdown.html&quot;&gt;Faster queries on nested data in Parquet and ORC&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Faster and more accurate &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;approx_percentile&lt;/code&gt;, based on t-digest data structure.&lt;/li&gt;
  &lt;li&gt;Support of Bloom filters in ORC.&lt;/li&gt;
  &lt;li&gt;Experimental, optimized Parquet writer.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;security&quot;&gt;Security&lt;/h2&gt;

&lt;p&gt;The more data you access with Trino, the more it becomes critical to secure it.
With that in mind we added a lot of improvements:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The &lt;a href=&quot;/docs/current/admin/web-interface.html&quot;&gt;Web UI&lt;/a&gt; now requires
authentication. Various actions such as viewing query details, killing
queries, etc., are protected with authorization checks based on the identity
of the user. Additionally, the UI now supports OAuth2 for user identification.&lt;/li&gt;
  &lt;li&gt;External and internal APIs are now properly secured with authentication and
authorization checks. Importantly, this fixes a &lt;a href=&quot;https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-15087&quot;&gt;CVE reported
vulnerability&lt;/a&gt;
that affects all older versions of Presto®.&lt;/li&gt;
  &lt;li&gt;A &lt;a href=&quot;/docs/current/security/secrets.html&quot;&gt;new mechanism to externalize secrets in configuration
 files&lt;/a&gt; that makes it easier to integrate
 with third-party secret managers and deployment tools.&lt;/li&gt;
  &lt;li&gt;Support for JSON Web Key (JWK) authentication and &lt;a href=&quot;/docs/current/develop/certificate-authenticator.html&quot;&gt;pluggable certificate
authenticators&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;A new &lt;a href=&quot;/docs/current/security/salesforce.html&quot;&gt;Salesforce authenticator&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;The query engine and access control SPIs now support injecting row filters and
column masks.&lt;/li&gt;
  &lt;li&gt;New syntax for managing permissions (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GRANT/REVOKE&lt;/code&gt; on schema,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER TABLE/SCHEMA/VIEW ... SET AUTHORIZATION&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;data-sources&quot;&gt;Data sources&lt;/h2&gt;

&lt;p&gt;Trino empowers you to use one platform to access all data sources. Connectors
enable this and we added numerous new connectors:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/iceberg.html&quot;&gt;Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/prometheus.html&quot;&gt;Prometheus&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/oracle.html&quot;&gt;Oracle&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/pinot.html&quot;&gt;Pinot&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/druid.html&quot;&gt;Druid&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/bigquery.html&quot;&gt;BigQuery&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/memsql.html&quot;&gt;MemSQL&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All other connectors received a large host of improvements. Let’s just look at
two popular connectors:&lt;/p&gt;

&lt;h3 id=&quot;hive-connector-for-hdfs-s3-azure-and-cloud-object-storage-systems&quot;&gt;Hive connector for HDFS, S3, Azure and cloud object storage systems&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Support for complex Hive views, which allows integration with Hive and simplifies
migration from Hive&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/06/01/hive-acid.html&quot;&gt;ACID transactional tables&lt;/a&gt; with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt;
and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt; support&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/hive-caching.html&quot;&gt;Built-in storage caching&lt;/a&gt; and
support for &lt;a href=&quot;/docs/current/connector/hive-alluxio.html&quot;&gt;external caching with
Alluxio&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;New procedures: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;system.drop_stats()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;register_partition()&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unregister_partition()&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Support for &lt;a href=&quot;/docs/current/connector/hive-azure.html&quot;&gt;Azure object storage&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Support for &lt;a href=&quot;/docs/current/connector/hive-s3.html&quot;&gt;S3 encrypted files, flexible S3 security mappings and
Intelligent-Tiering S3 storage&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;elasticsearch-connector&quot;&gt;Elasticsearch connector&lt;/h3&gt;

&lt;p&gt;The &lt;a href=&quot;/docs/current/connector/elasticsearch.html&quot;&gt;Elasticsearch connector&lt;/a&gt;
received numerous powerful improvements:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Password authentication&lt;/li&gt;
  &lt;li&gt;Support for index aliases&lt;/li&gt;
  &lt;li&gt;Support for array types, Nested, and IP type&lt;/li&gt;
  &lt;li&gt;Support for Elasticsearch 7.x&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;runtime-improvements&quot;&gt;Runtime improvements&lt;/h2&gt;

&lt;p&gt;Operating and maintaining a Trino cluster takes a significant amount of
resources, so any work to improve the runtime has a significant positive
impact:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/installation/deployment.html#java-runtime-environment&quot;&gt;Requirement to use Java
11&lt;/a&gt;, with
better GC performance, overall performance, and improved container
support&lt;/li&gt;
  &lt;li&gt;Support for ARM64-based processors to run Trino&lt;/li&gt;
  &lt;li&gt;Support for minimum number of workers before query starts, useful for
implementing autoscaling&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/06/25/data-integrity-protection.html&quot;&gt;Data integrity checks for network transfers&lt;/a&gt; to prevent data corruption during
processing&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;everything-else&quot;&gt;Everything else&lt;/h2&gt;

&lt;p&gt;There is so much more to capture, and you really would have to read all the
&lt;a href=&quot;/docs/current/release.html#releases-2020&quot;&gt;release notes&lt;/a&gt; in detail to know it
all. To save you from that, here are a few more noteworthy changes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Experimental support for materialized views in Iceberg connector&lt;/li&gt;
  &lt;li&gt;JDBC driver backward compatibility tests&lt;/li&gt;
  &lt;li&gt;Support for multiple event listeners&lt;/li&gt;
  &lt;li&gt;Python client support for executing queries with parameters&lt;/li&gt;
  &lt;li&gt;New look and navigation for the &lt;a href=&quot;/docs/current/index.html&quot;&gt;documentation&lt;/a&gt;, and
lots of new content&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;community-resources-and-events&quot;&gt;Community resources and events&lt;/h2&gt;

&lt;p&gt;Beyond the raw code and helping each other, the community collaborated on other
helpful resources like books and in-depth video tutorials.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/mattsfuller&quot;&gt;Matt&lt;/a&gt;, &lt;a href=&quot;https://github.com/mosabua&quot;&gt;Manfred&lt;/a&gt;,
and &lt;a href=&quot;https://github.com/martint&quot;&gt;Martin&lt;/a&gt; published the book &lt;a href=&quot;/trino-the-definitive-guide.html&quot;&gt;Trino: The
Definitive Guide&lt;/a&gt; with O’Reilly. Over 5000
readers took advantage of the &lt;a href=&quot;/blog/2020/04/11/the-definitive-guide.html&quot;&gt;free digital copy&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Brian and Manfred launched the live streaming event &lt;a href=&quot;/broadcast/index.html&quot;&gt;Trino Community
Broadcast&lt;/a&gt;, and grew their audience and back catalog to
include some very useful material. If you have not seen it yet, go and &lt;a href=&quot;/broadcast/episodes.html&quot;&gt;watch
some old episodes&lt;/a&gt; and join us in the next ones.&lt;/p&gt;

&lt;p&gt;We also had a number of other online events and presentations, with direct
participation of our community members:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;A &lt;a href=&quot;/blog/2020/11/21/a-report-about-presto-conference-tokyo-2020.html&quot;&gt;dedicated conference event&lt;/a&gt;
for the community in Japan was very successful.&lt;/li&gt;
  &lt;li&gt;The &lt;a href=&quot;/blog/2020/09/28/argentina-big-data-meetup.html&quot;&gt;Argentina Big Data Meetup&lt;/a&gt; had a large audience from the
community in South America.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A series of virtual events around the project started with a roadmap and
overview meeting and included a number of real-world use case examples at scale:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/05/15/state-of-presto.html&quot;&gt;State of Trino&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/22/presto-summit-pinterest.html&quot;&gt;Trino at Pinterest&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/06/presto-summit-arm-td.html&quot;&gt;Trino Migration at ARM Treasure Data&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/06/16/presto-summit-zuora.html&quot;&gt;Trino at Zuora&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Another series of training classes with the project founders was hugely
successful. It includes very valuable content for any Trino user, from beginners
to experts, that you should not miss:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/15/training-advanced-sql.html&quot;&gt;Advanced SQL in Trino with David&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/30/training-query-tuning.html&quot;&gt;Understanding and Tuning Trino Query Processing with Martin&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/13/training-security.html&quot;&gt;Securing Trino with Dain&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/27/training-performance.html&quot;&gt;Configuring and Tuning Trino with Dain&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;2020 was a wild ride for us all. Trino and the Trino community definitely
emerged as a winner, and we are looking forward to a very bright future with you
all.&lt;/p&gt;

&lt;p&gt;Several efforts are already underway and very promising:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Optimized Parquet reader, on par with ORC reader support&lt;/li&gt;
  &lt;li&gt;Support for SQL &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; statements&lt;/li&gt;
  &lt;li&gt;OAuth2 support for JDBC&lt;/li&gt;
  &lt;li&gt;Support for SQL &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WINDOW&lt;/code&gt; clause and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; usage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We’re starting the new year with a shiny new name, a cute little bunny, and a
very vibrant community. The future is looking great for Trino!&lt;/p&gt;

&lt;p&gt;Don’t miss out on all the benefits of Trino. Join us &lt;a href=&quot;/slack.html&quot;&gt;on
Slack&lt;/a&gt; to get started!&lt;/p&gt;</content>

      
        <author>
          <name>Martin Traverso, Manfred Moser, Brian Olsen</name>
        </author>
      

      <summary>Wow! If you would have to sum up what happened in the last year in this great community, wow would be it. It is truly awe-inspiring to be part of this incredible journey of Trino. Oh yeah, on that note. Our community and project chose the new name Trino, to be able to continue to innovate and develop freely as a community of peers. Presto® and Presto® SQL are a thing of the past. Now that is out of the way, let’s dive right in and see what all our community members across the globe have created with us!</summary>

      
      
    </entry>
  
    <entry>
      <title>Migrating from PrestoSQL to Trino</title>
      <link href="https://trino.io/blog/2021/01/04/migrating-from-prestosql-to-trino.html" rel="alternate" type="text/html" title="Migrating from PrestoSQL to Trino" />
      <published>2021-01-04T00:00:00+00:00</published>
      <updated>2021-01-04T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/01/04/migrating-from-prestosql-to-trino</id>
      <content type="html" xml:base="https://trino.io/blog/2021/01/04/migrating-from-prestosql-to-trino.html">&lt;p&gt;As we previously announced, we’re
&lt;a href=&quot;/blog/2020/12/27/announcing-trino.html&quot;&gt;rebranding Presto SQL as Trino&lt;/a&gt;.
Now comes the hard part: migrating to the new version of the software.
We just released the first version,
&lt;a href=&quot;/docs/current/release/release-351.html&quot;&gt;Trino 351&lt;/a&gt;,
which uses the name Trino everywhere, both internally and externally.
Unfortunately, there are some unavoidable compatibility aspects that
administrators of Trino need to know about. We hope this post makes the
transition as smooth as possible.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;things-that-havent-changed&quot;&gt;Things that haven’t changed&lt;/h1&gt;

&lt;p&gt;Let’s start with the good news. For end users running queries against Trino,
everything should be the same. There are no changes to the SQL language,
SQL functions, session properties, etc.&lt;/p&gt;

&lt;p&gt;Users now see &lt;em&gt;Trino&lt;/em&gt; in error messages and a different logo in the web UI,
and error stack traces have a different package name, but otherwise they
won’t know that anything has changed. All of their views, reports,
or other stored queries will work as before.&lt;/p&gt;

&lt;p&gt;Similarly for administrators, except for a few things noted in the
&lt;a href=&quot;/docs/current/release/release-351.html&quot;&gt;Trino 351 release notes&lt;/a&gt;,
all the configuration properties are the same.&lt;/p&gt;

&lt;h1 id=&quot;client-protocol-compatiblity&quot;&gt;Client protocol compatibility&lt;/h1&gt;

&lt;p&gt;The client protocol is how clients, such as the
&lt;a href=&quot;/docs/current/client/cli.html&quot;&gt;CLI&lt;/a&gt; or
&lt;a href=&quot;/docs/current/client/jdbc.html&quot;&gt;JDBC driver&lt;/a&gt;,
talk to Trino. It uses standard HTTP as the underlying communications
protocol, with some custom HTTP headers to communicate values
to and from Trino. Unfortunately, those header names started with
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;X-Presto-&lt;/code&gt; and thus had to be changed to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;X-Trino-&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The Trino CLI and JDBC driver send the new headers, so they are
&lt;strong&gt;only compatible with Trino versions 351 and newer&lt;/strong&gt;. Users should
wait to upgrade the CLI or JDBC driver until the Trino servers they
talk to have been upgraded.&lt;/p&gt;

&lt;p&gt;Out of the box, the Trino server does not work with older clients.
However, in order to support a graceful transition, you can allow the
server to support older clients by adding a configuration property:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;protocol.v1.alternate-header-name=Presto
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;We recommend using version 350 of the CLI and JDBC driver as the transition version&lt;/strong&gt;.
It has all the newest features such as variable precision timestamps,
has been tested with a range of older server versions, and is the last
version to support older servers.&lt;/p&gt;

&lt;h1 id=&quot;jdbc-driver&quot;&gt;JDBC driver&lt;/h1&gt;

&lt;p&gt;The URL prefix for the JDBC driver now starts with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jdbc:trino:&lt;/code&gt; instead
of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jdbc:presto:&lt;/code&gt;. This means that any client applications using the
JDBC driver need to update their connection configuration. The old
prefix is still supported, but will be removed in a future release.&lt;/p&gt;
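
&lt;p&gt;For example, a connection URL using the new prefix looks like this (the
host, port, catalog, and schema are placeholders):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;jdbc:trino://example.net:8080/hive/default
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;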

&lt;p&gt;The class name of the driver is now &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;io.trino.jdbc.TrinoDriver&lt;/code&gt;. This is
of no concern to most users, as the driver is normally accessed via the
standard JDBC auto-discovery mechanism based on the URL. As with the URL prefix,
the old name is still supported, but will be removed in a future release.&lt;/p&gt;

&lt;h1 id=&quot;server-rpm&quot;&gt;Server RPM&lt;/h1&gt;

&lt;p&gt;The name of the RPM has changed, so it is treated as a different RPM, and
thus you cannot simply upgrade from the old version to the new version.
All of the directories for the RPM that contained the name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;presto&lt;/code&gt; now
use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino&lt;/code&gt; instead. You likely want to uninstall the old RPM, rename
the config and log directories, then install the new RPM.&lt;/p&gt;

&lt;h1 id=&quot;docker-image&quot;&gt;Docker image&lt;/h1&gt;

&lt;p&gt;The &lt;a href=&quot;https://hub.docker.com/r/trinodb/trino&quot;&gt;Trino Docker image&lt;/a&gt; is now
published as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trinodb/trino&lt;/code&gt;. The supported configuration directory is
now &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/etc/trino&lt;/code&gt;. The CLI is now named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino&lt;/code&gt; instead of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;presto&lt;/code&gt;.&lt;/p&gt;
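
&lt;p&gt;As a minimal sketch, starting the new image and mounting a local
configuration directory looks like this (the local path is a placeholder):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;docker run --name trino -d -p 8080:8080 \
  --volume $PWD/etc:/etc/trino trinodb/trino
docker exec -it trino trino
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The second command opens the renamed &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino&lt;/code&gt; CLI inside the container.&lt;/p&gt;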

&lt;h1 id=&quot;jmx-mbean-naming&quot;&gt;JMX MBean naming&lt;/h1&gt;

&lt;p&gt;Trino runs on the JVM, which has the JMX framework as a standard way to expose
system and application metrics. Trino exposes a huge number of JMX metrics for
administrators to monitor their clusters. You might be using these metrics
via your monitoring system, or perhaps you are accessing them in SQL via the
Trino &lt;a href=&quot;/docs/current/connector/jmx.html&quot;&gt;JMX connector&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The metrics for Trino server now start with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino&lt;/code&gt; instead of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;presto&lt;/code&gt;. You
might need to update this name in your monitoring system, or you can revert
to the old name:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;jmx.base-name=presto
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Similarly, the metrics for the Elasticsearch, Hive, Iceberg, Raptor, and Thrift
connectors now start with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino.plugin&lt;/code&gt; instead of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;presto.plugin&lt;/code&gt;. Again,
you might need to update these names in your monitoring system, or you can
revert to the old name. For example, for the Hive connector:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;jmx.base-name=presto.plugin.hive
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h1 id=&quot;thrift-connector&quot;&gt;Thrift connector&lt;/h1&gt;

&lt;p&gt;The &lt;a href=&quot;/docs/current/connector/thrift.html&quot;&gt;Thrift connector&lt;/a&gt; had many
&lt;a href=&quot;/docs/current/release/release-351.html#thrift-connector-changes&quot;&gt;backwards incompatible changes&lt;/a&gt;
to both the Thrift service interface and the configuration properties. You need
to update all of your implementations of the Thrift service used by the connector.&lt;/p&gt;

&lt;h1 id=&quot;spi&quot;&gt;SPI&lt;/h1&gt;

&lt;p&gt;If you have any custom plugins for Trino, such as connectors or functions,
these need to be updated. The package name is now &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;io.trino.spi&lt;/code&gt;, and a
few classes were renamed:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PrestoException&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TrinoException&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PrestoPrincipal&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TrinoPrincipal&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PrestoWarning&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TrinoWarning&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are no functional changes, so all you should need to do is update
your imports and rename the references to the above class names.&lt;/p&gt;

&lt;h1 id=&quot;migration-guide&quot;&gt;Migration guide&lt;/h1&gt;

&lt;p&gt;Now that you understand what is different and what you need to change,
you can start thinking about the list of steps needed to perform the
migration. The following is a rough plan for upgrading your environment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Prepare to deploy the new version&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Let users know the name is changing, so they are not surprised by the logo changes in the UI.&lt;/li&gt;
  &lt;li&gt;Make sure that users are using recent client versions. Ideally, upgrade them all to
version 350, as mentioned above. You can check the HTTP request logs for the coordinator
to see what client versions are in use.&lt;/li&gt;
  &lt;li&gt;Update your server configuration with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;protocol.v1.alternate-header-name=Presto&lt;/code&gt;
to allow supporting all of your existing Presto clients.&lt;/li&gt;
  &lt;li&gt;If you are using the RPM, have a plan to deal with the new RPM name
and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino&lt;/code&gt; directory names.&lt;/li&gt;
  &lt;li&gt;If you are using Docker, use the new image name, make sure your configuration will
be mounted using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino&lt;/code&gt; path name, and remember that the CLI is now named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Update any custom plugins to use the new SPI.&lt;/li&gt;
  &lt;li&gt;Check if you have anything using JMX to monitor your clusters, and decide if you will
update them to the new names or set a Trino config to revert to the old names.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Upgrade your servers to Trino 351+&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Upgrade development and staging servers.&lt;/li&gt;
  &lt;li&gt;Upgrade production servers. If you have multiple clusters, you can do them one
at a time, and verify everything is working before moving on to the next one.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Upgrade clients&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Upgrade all clients, including the CLI, JDBC driver, Python client, etc., to their Trino versions.&lt;/li&gt;
  &lt;li&gt;Update any applications using JDBC to use the new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jdbc:trino:&lt;/code&gt; connection URL prefix.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step 4: Cleanup&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Remove the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;protocol.v1.alternate-header-name&lt;/code&gt; configuration property.&lt;/li&gt;
  &lt;li&gt;If you configured Trino to use the old JMX names, convert your monitoring system
to use the new JMX names and remove the fallback configs.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;getting-help&quot;&gt;Getting help&lt;/h1&gt;

&lt;p&gt;We’re here to help! If you run into any issues while upgrading, or have any
questions or concerns, &lt;a href=&quot;/slack.html&quot;&gt;ask on Slack&lt;/a&gt;.&lt;/p&gt;</content>

      
        <author>
          <name>David Phillips, Dain Sundstrom</name>
        </author>
      

      <summary>As we previously announced, we’re rebranding Presto SQL as Trino. Now comes the hard part: migrating to the new version of the software. We just released the first version, Trino 351, which uses the name Trino everywhere, both internally and externally. Unfortunately, there are some unavoidable compatibility aspects that administrators of Trino need to know about. We hope this post makes the transition as smooth as possible.</summary>

      
      
    </entry>
  
    <entry>
      <title>We’re rebranding PrestoSQL as Trino</title>
      <link href="https://trino.io/blog/2020/12/27/announcing-trino.html" rel="alternate" type="text/html" title="We’re rebranding PrestoSQL as Trino" />
      <published>2020-12-27T00:00:00+00:00</published>
      <updated>2020-12-27T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/12/27/announcing-trino</id>
      <content type="html" xml:base="https://trino.io/blog/2020/12/27/announcing-trino.html">&lt;p&gt;We’re rebranding PrestoSQL as Trino. The software and the community you have come to love and depend on aren’t 
going anywhere; we are simply renaming. &lt;strong&gt;Trino is the new name for PrestoSQL&lt;/strong&gt;, the project supported by the founders 
and creators of Presto® along with the major contributors – just under a shiny new name. And now you can find us here:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;GitHub: &lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;https://github.com/trinodb/trino&lt;/a&gt;. Please give it a &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/.github/star.png&quot;&gt;star&lt;/a&gt;!&lt;/li&gt;
  &lt;li&gt;Twitter: &lt;a href=&quot;https://twitter.com/trinodb&quot;&gt;@trinodb&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Slack: &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;https://trino.io/slack.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn why we’re doing this, read on…&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;In 2012, Dain, David and Martin joined the Facebook data infrastructure team. Together with Eric Hwang, we created 
Presto® to address the problems of low latency interactive analytics over Facebook’s massive Hadoop data warehouse. 
One of our non-negotiable conditions was for Presto® to be an open source project. Open source is in our DNA - we had 
all used and participated in open source projects to various degrees in the past, and we recognized the power of open 
communities and developers coming together to build successful software that can stand the test of time.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-announcement/team.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Over the next six years, we worked hard to build a healthy open source community and ecosystem around the project. We 
worked with developers and users all over the world and welcomed them into the Presto® community. Presto® was on a path 
of increasing growth and success, in large part because of the contributions from developers across many fields and all 
over the world.&lt;/p&gt;

&lt;p&gt;Unfortunately in 2018, it became clear that Facebook management wanted to have tighter control over the project and its 
future. This culminated with their decision to grant Facebook developers commit rights on the project without any prior 
experience in Presto®. We strongly believe that this kind of decision is not compatible with having a healthy, open 
community. Moreover, they made this decision by fiat without engaging the Presto® community. As a matter of principle, 
we had no choice but to leave Facebook in order to focus on making sure Presto® continued to be a successful project 
with an open, collaborative and independent community. In reality, the choice was easy.&lt;/p&gt;

&lt;p&gt;We started the Presto Software Foundation in January 2019 as an independent entity to oversee the development of the 
software and community, continuing the meritocratic system that had been in place over the previous 6 years. The community 
quickly consolidated under this new home. We intentionally stayed unemployed over the next 10 months to focus on expanding 
and strengthening the community by working directly with major users and contributors, as well as reaching out to a wider 
group of users and developers across the globe. This resulted in new use cases and an injection of energy, making the 
project more vibrant than ever before as even more new users and developers became engaged. But, don’t take our word for 
it, let the data speak for itself:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-announcement/commits.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Months after this consolidation, Facebook decided to create a competing community using The Linux Foundation®. As a first 
action, Facebook applied for a trademark on Presto®. This was a surprising, norm-breaking move because up until that point, 
the Presto® name had been used without constraints by commercial and non-commercial products for over 6 years. In September 
of 2019, Facebook established the Presto Foundation at The Linux Foundation®, and immediately began working to enforce this 
new trademark. We spent the better part of the last year trying to agree to terms with Facebook and The Linux Foundation 
that would not negatively impact the community, but unfortunately we were unable to do so. The end result is that we must 
now change the name in a short period of time, with little ability to minimize user disruption.&lt;/p&gt;

&lt;p&gt;On a personal note, and as the founders who named the project Presto® in the first place, this is an incredibly sad and 
disappointing turn of events. And while we will always have fondness for the name Presto®, we have come to accept that a 
name is just a name. To be frank, we’re tired of this endless distraction, and we intend to focus on what matters most 
and what we are best at doing – building high quality software everyone can rely on and fostering a healthy community 
of users and developers that build it and support it. We’re not going anywhere – we’re the same people, the same amazing 
software, under a new name: Trino.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you love this project, you already love Trino. ❤️&lt;/strong&gt;&lt;/p&gt;

&lt;html&gt;
&lt;p style=&quot;font-size:0.8em&quot;&gt;Facebook is a registered trademark of Facebook Inc.  The Linux Foundation and Presto are trademarks of The Linux Foundation.&lt;/p&gt;
&lt;/html&gt;</content>

      
        <author>
          <name>Martin Traverso, Dain Sundstrom, David Phillips</name>
        </author>
      

      <summary>We’re rebranding PrestoSQL as Trino. The software and the community you have come to love and depend on aren’t going anywhere, we are simply renaming. Trino is the new name for PrestoSQL, the project supported by the founders and creators of Presto® along with the major contributors – just under a shiny new name. And now you can find us here: GitHub: https://github.com/trinodb/trino. Please give it a star! Twitter: @trinodb Slack: https://trino.io/slack.html If you want to learn why we’re doing this, read on…</summary>

      
      
    </entry>
  
    <entry>
      <title>7: Cost Based Optimizer, Decorrelate subqueries, and does Presto make my RDBMS faster?</title>
      <link href="https://trino.io/episodes/7.html" rel="alternate" type="text/html" title="7: Cost Based Optimizer, Decorrelate subqueries, and does Presto make my RDBMS faster?" />
      <published>2020-11-30T00:00:00+00:00</published>
      <updated>2020-11-30T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/7</id>
      <content type="html" xml:base="https://trino.io/episodes/7.html">&lt;h2 id=&quot;release-348&quot;&gt;Release 348&lt;/h2&gt;
&lt;p&gt;Release Notes discussed: &lt;a href=&quot;https://prestosql.io/docs/current/release/release-348.html&quot;&gt;https://prestosql.io/docs/current/release/release-348.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Martin’s announcement:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Support for OAuth2 authorization in Web UI&lt;/li&gt;
  &lt;li&gt;Support for S3 streaming uploads&lt;/li&gt;
  &lt;li&gt;Support for DISTINCT aggregations in correlated subqueries&lt;/li&gt;
  &lt;li&gt;Performance improvement for ORDER BY … LIMIT queries&lt;/li&gt;
  &lt;li&gt;Many improvements and bug fixes to JDBC driver&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred’s observations:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;SHOW STATS to play around with&lt;/li&gt;
  &lt;li&gt;Switch for Hive view translation: off, legacy, or the new Coral system&lt;/li&gt;
  &lt;li&gt;A bunch of other Hive connector improvements&lt;/li&gt;
  &lt;li&gt;Iceberg on GCP and Azure&lt;/li&gt;
  &lt;li&gt;Small SPI changes&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-week-cost-based-optimizer&quot;&gt;Concept of the week: Cost Based Optimizer&lt;/h2&gt;
&lt;p&gt;We’re continuing our series covering some fundamental topics that build up to
dynamic filtering! This week we’re discussing the cost-based optimizer with
Presto co-creator &lt;a href=&quot;https://twitter.com/mtraverso&quot;&gt;Martin Traverso&lt;/a&gt;!&lt;/p&gt;

&lt;h3 id=&quot;parseranalyzer&quot;&gt;Parser/Analyzer&lt;/h3&gt;

&lt;p&gt;To recap, in &lt;a href=&quot;6.html&quot;&gt;episode 6&lt;/a&gt; we discussed a little bit about the various
forms a query takes from submission to the coordinator, to actually being
executed. We discussed how the parser generates an abstract syntax tree (AST)
and how the analyzer validates the SQL, checking function signatures and making
sure the tables and columns being referenced actually exist.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/episode/7/ast.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Here’s an example of an abstract syntax tree from last week’s episode for the query
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT * FROM (VALUES 1) t(a) WHERE a = 1 OR 1 = a OR a = 1;&lt;/code&gt;.&lt;/p&gt;

&lt;h3 id=&quot;planner&quot;&gt;Planner&lt;/h3&gt;

&lt;p&gt;The next phase we discussed was the planner. Internally, the planner and
optimizer overlap substantially, but you can think of the planner as the early
part of the planning phase that generates the logical plan, which over several
optimization iterations becomes an optimized distributed plan. The planner
generates a new tree data structure called the plan IR (intermediate
representation) that contains nodes representing the steps that need to be
performed in order to answer the query. The leaves of the tree get executed
first, and each parent node depends on its children completing before it can
start.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/episode/7/logical.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Here’s an example of a logical plan tree using the same query from the AST
above. Since this query isn’t pulling from a data source, the distributed
plan is equivalent to the logical plan.&lt;/p&gt;

&lt;h3 id=&quot;cost-based-optimizer-cbo&quot;&gt;Cost-Based Optimizer (CBO)&lt;/h3&gt;

&lt;p&gt;In the cost-based optimizer phase, various rules are applied to the plan IR
that gradually optimize the structure into the final distributed plan that is
then executed. To do this, the optimizer retrieves statistical metadata about
the tables and their data. This information includes table row counts, column
data sizes, column low/high values, distinct column value counts, and the
percentage of null values in each column. Using rules that leverage these
statistics, the optimizer improves the query structure, for example by choosing
a degree of parallelism that matches the number of workers and data sources.&lt;/p&gt;
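
&lt;p&gt;You can inspect these statistics yourself. As a sketch, for a table in your
catalog (the table name here is a placeholder):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SHOW STATS FOR nation;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The output includes a row per column with data size, distinct value count,
null fraction, and low/high values, plus a summary row with the table row
count.&lt;/p&gt;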

&lt;p&gt;If you want to jump into the code, start at 
&lt;a href=&quot;https://github.com/prestosql/presto/blob/348/presto-main/src/main/java/io/prestosql/sql/planner/LogicalPlanner.java#L188&quot;&gt;the entry point&lt;/a&gt;
for the planner/optimizer and the initial planning starts on 
&lt;a href=&quot;https://github.com/prestosql/presto/blob/348/presto-main/src/main/java/io/prestosql/sql/planner/LogicalPlanner.java#L200&quot;&gt;this line&lt;/a&gt;. 
This loop is where the &lt;a href=&quot;https://github.com/prestosql/presto/blob/348/presto-main/src/main/java/io/prestosql/sql/planner/LogicalPlanner.java#L205&quot;&gt;actual optimization&lt;/a&gt;
occurs. So if you are interested, maybe grab a brandy 🥃 and take some time to
set your debugger at these points and watch the optimizer do its thing!&lt;/p&gt;

&lt;p&gt;Refer to chapter 4 in 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;
pg. 50.&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-1415-decorrelate-subqueries-with-limit-or-topn&quot;&gt;PR of the week: PR 1415 Decorrelate subqueries with Limit or TopN&lt;/h2&gt;

&lt;p&gt;This week’s pull request, &lt;a href=&quot;https://github.com/prestosql/presto/pull/1415&quot;&gt;https://github.com/prestosql/presto/pull/1415&lt;/a&gt;,
was done by Presto contributor and Starburst engineer &lt;a href=&quot;https://github.com/kasiafi&quot;&gt;kasiafi&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Before we can jump into this PR, let’s discuss what a subquery is, and further,
what a correlated subquery is. A subquery is a nested query that runs within
another query, typically embedded within a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WHERE&lt;/code&gt; clause or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; statement.
Take this query for example:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;table&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;table2&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;60&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;In this example, we have a standard non-correlated subquery that runs on
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;table2&lt;/code&gt;. It is not correlated because it has no dependencies
on the parent query running on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;table&lt;/code&gt;. This enables the
SQL engine to run the subquery first and then use its results when running the
parent query. A correlated query, by contrast, has at least one criterion in the
nested query that depends on the parent, which requires the nested query to be
executed for each row of the parent query.
Take a look at this correlated query:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;table&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;table2&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;60&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;In this example, the subquery runs in the context of each row in order to
evaluate the value of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t1.b&lt;/code&gt;. Running the subquery for every row of the
parent query is certainly not ideal when it is not required, which is why
subquery decorrelation is a common optimization technique whenever an
equivalent non-correlated subquery exists for a given correlated subquery.&lt;/p&gt;
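&lt;p&gt;To make decorrelation concrete, here is a rough sketch (using a hypothetical
orders table, not an example from the show) of how a correlated aggregate
subquery can be rewritten into an equivalent join against a pre-aggregated
inner query:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- Correlated: the subquery conceptually runs once per row of o
SELECT o.id
FROM orders o
WHERE o.total &amp;gt; (SELECT avg(o2.total) FROM orders o2 WHERE o2.customer = o.customer);

-- Decorrelated equivalent: aggregate once, then join
SELECT o.id
FROM orders o
JOIN (
   SELECT customer, avg(total) AS avg_total
   FROM orders
   GROUP BY customer
) a ON o.customer = a.customer
WHERE o.total &amp;gt; a.avg_total;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;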

&lt;p&gt;This pull request adds a rule that enables Presto to decorrelate subqueries
containing a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt; clause or an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; + &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt; combination (i.e.
TopN). The common trick during decorrelation is to turn the query into one that
can process the results from the inner table in one shot: the results of
executing the subquery for every row are flattened into a single stream of rows
before execution.&lt;/p&gt;

&lt;p&gt;This change also applies to a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LATERAL&lt;/code&gt; join, which behaves much like a nested
subquery, except that it acts as a table and can return multiple rows instead of
just a single one.&lt;/p&gt;
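&lt;p&gt;To get a feel for what flattening a correlated &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt; means, here is an
illustrative rewrite in plain SQL (a sketch only; the actual plan the optimizer
produces is an internal detail). A window function numbers the inner rows per
correlation key, and the join keeps only the first N rows per key:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- Correlated form: scalar subquery with LIMIT 2
SELECT (
   SELECT t.a
   FROM (VALUES 1, 2, 3) t(a)
   WHERE t.a = t2.b
   LIMIT 2
) FROM (VALUES 1) t2(b);

-- Flattened form: row_number() bounds the rows per correlation key
SELECT s.a
FROM (VALUES 1) t2(b)
LEFT JOIN (
   SELECT a, row_number() OVER (PARTITION BY a) AS rn
   FROM (VALUES 1, 2, 3) t(a)
) s ON s.a = t2.b AND s.rn &amp;lt;= 2;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;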

&lt;h2 id=&quot;pr-demo-pr-1415-decorrelate-subqueries-with-limit-or-topn&quot;&gt;PR Demo: PR 1415 Decorrelate subqueries with Limit or TopN&lt;/h2&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;o&quot;&gt;###&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Fails&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;#&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Returns&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;more&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;than&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;one&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;row&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;subquery&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;#&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;This&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;query&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;actually&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fails&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;during&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;execution&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;during&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;planning&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;optimizing&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;where&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;#&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;the&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;other&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;two&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;below&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fail&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;o&quot;&gt;#&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Limit&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;correlated&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;non&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;equality&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;predicate&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;the&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;subquery&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;o&quot;&gt;#&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TopN&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;correlated&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;non&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;equality&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;predicate&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;the&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;subquery&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;After the show Kasia pointed out that the failing queries were not all failing
for the same reason. The first failing query above actually gets planned and
executed, but the exception occurs during execution. The others fail during the
planning and optimization phase because they could not be decorrelated, due to
the issues I outline in the comments above.&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-week&quot;&gt;Question of the week:&lt;/h2&gt;

&lt;p&gt;In this week’s question, we answer: Will running Presto on top of my relational
database make processing faster?&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;I have been going over the docs of PrestoSQL and it seems to fit some of my 
requirements. I am little concerned about the resources needed to run Presto 
in production. Because the size of my prod data is between 3-5GB and there
will be very minimal data growth. Is Presto suitable for such a small 
data size?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The idea that Presto is fast often gets conflated with the idea that
Presto is a good fit for all use cases. It is important to understand that
Presto is a) not a database, b) not designed for OLTP workloads, and c) built
to handle data at the scale of terabytes to petabytes with distributed queries.
Since Presto uses a connector framework, it also has the added benefit of running
federated queries against any data source that can return data in a
columnar representation.&lt;/p&gt;

&lt;p&gt;For relatively small data sets, you should first try using your relational
database directly. Database indexes are very effective when you are not in the
big data world, and if you give your SQL server, say, 10 GB of memory, the
workload should run fully in memory and thus be fast.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;
&lt;p&gt;Blogs&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://prestosql.io/blog/2019/05/21/optimizing-the-casts-away.html&quot;&gt;https://prestosql.io/blog/2019/05/21/optimizing-the-casts-away.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://towardsdatascience.com/statistics-in-spark-sql-explained-22ec389bf71b&quot;&gt;https://towardsdatascience.com/statistics-in-spark-sql-explained-22ec389bf71b&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.infoworld.com/article/3597971/on-premises-data-warehouses-are-dead.html&quot;&gt;https://www.infoworld.com/article/3597971/on-premises-data-warehouses-are-dead.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&quot;&gt;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Upcoming events&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Dec 16 &lt;a href=&quot;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Feb 9 - Feb 10 &lt;a href=&quot;http://starburstdata.com/datanova&quot;&gt;http://starburstdata.com/datanova&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://prestosql.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://prestosql.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://prestosql.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://prestosql.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://prestosql.io/blog/2020/08/13/training-security.html&quot;&gt;https://prestosql.io/blog/2020/08/13/training-security.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://prestosql.io/blog/2020/08/27/training-performance.html&quot;&gt;https://prestosql.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Presto Summit Series - Real world usage&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://prestosql.io/blog/2020/05/15/state-of-presto.html&quot;&gt;https://prestosql.io/blog/2020/05/15/state-of-presto.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://prestosql.io/blog/2020/06/16/presto-summit-zuora.html&quot;&gt;https://prestosql.io/blog/2020/06/16/presto-summit-zuora.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://prestosql.io/blog/2020/07/06/presto-summit-arm-td.html&quot;&gt;https://prestosql.io/blog/2020/07/06/presto-summit-arm-td.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://prestosql.io/blog/2020/07/22/presto-summit-pinterest.html&quot;&gt;https://prestosql.io/blog/2020/07/22/presto-summit-pinterest.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Podcasts:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.contributor.fyi/presto&quot;&gt;Presto with Martin Traverso, Dain Sundstrom and David Phillips&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&quot;&gt;Simplify Your Data Architecture With The Presto Distributed SQL Engine&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/5980279&quot;&gt;How Open Source Presto Unlocks a Single Point of Access to Data&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/5656471&quot;&gt;The Data Access Struggle is Real&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://softwareengineeringdaily.com/2020/02/07/presto-with-justin-borgman/&quot;&gt;Presto with Justin Borgman&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/3923864&quot;&gt;The infrastructure renaissance and how it will power the modernization of analytics platforms&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://softwareengineeringdaily.com/2018/05/24/ubers-data-platform-with-zhenxiao-luo/&quot;&gt;Uber’s Data Platform with Zhenxiao Luo&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Presto yourself, you should check out the
O’Reilly book Trino: The Definitive Guide. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Release 348 Release Notes discussed: https://prestosql.io/docs/current/release/release-348.html</summary>

      
      
    </entry>
  
    <entry>
      <title>6: Query Planning, Remove duplicate predicates, and Memory settings</title>
      <link href="https://trino.io/episodes/6.html" rel="alternate" type="text/html" title="6: Query Planning, Remove duplicate predicates, and Memory settings" />
      <published>2020-11-30T00:00:00+00:00</published>
      <updated>2020-11-30T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/6</id>
      <content type="html" xml:base="https://trino.io/episodes/6.html">&lt;h2 id=&quot;release-347&quot;&gt;Release 347&lt;/h2&gt;

&lt;p&gt;We discuss the Trino 347 release notes:
&lt;a href=&quot;https://trino.io/docs/current/release/release-347.html&quot;&gt;https://trino.io/docs/current/release/release-347.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Official release announcement from Martin Traverso:&lt;/p&gt;

&lt;p&gt;We’re happy to announce the release of Presto 347! This version includes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for EXCEPT ALL and INTERSECT ALL&lt;/li&gt;
  &lt;li&gt;New syntax for changing the owner of a view&lt;/li&gt;
  &lt;li&gt;Performance improvements when inserting data into Hive tables&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Notes from Manfred:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;contains_sequence function for arrays.&lt;/li&gt;
  &lt;li&gt;CentOS 8 in the Docker image.&lt;/li&gt;
  &lt;li&gt;Kudu gets dynamic filtering.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-week-query-planning&quot;&gt;Concept of the week: Query planning&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;All happening on coordinator in cluster.&lt;/li&gt;
  &lt;li&gt;Before a query can be planned, the coordinator receives a SQL query and
passes it to a parser.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Parser/Analyzer&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The parser parses the SQL query into an AST (abstract syntax tree).&lt;/li&gt;
  &lt;li&gt;Then the analyzer checks that the SQL is valid, including functions and such.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Planner/Optimizer&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Request metadata about structure from catalogs.
    &lt;ul&gt;
      &lt;li&gt;Do the tables and columns exist?&lt;/li&gt;
      &lt;li&gt;What data types are used?&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Request metadata about content (table stats, data location).&lt;/li&gt;
  &lt;li&gt;Create logical plan
    &lt;ul&gt;
      &lt;li&gt;Are function parameters using right data types?&lt;/li&gt;
      &lt;li&gt;What catalogs/schema/tables/columns need to be accessed?&lt;/li&gt;
      &lt;li&gt;Are joins using compatible field data types?&lt;/li&gt;
      &lt;li&gt;Optimize
        &lt;ul&gt;
          &lt;li&gt;Eliminate redundant conditions.&lt;/li&gt;
          &lt;li&gt;Figure out the best order of operations.&lt;/li&gt;
          &lt;li&gt;Decide on filtering early.&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Create distributed plan (More on this in the next episode!)
    &lt;ul&gt;
      &lt;li&gt;Break logical plan up.&lt;/li&gt;
      &lt;li&gt;Adapt to parallel access by multiple workers to data source.&lt;/li&gt;
      &lt;li&gt;Break up operations so workers aggregate and process data from other workers.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXPLAIN&lt;/code&gt; to learn what is planned.
Also refer to chapter 4 in 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;
pg. 50.&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-730-remove-duplicate-predicates&quot;&gt;PR of the week: PR 730 Remove duplicate predicates&lt;/h2&gt;

&lt;p&gt;This week’s pull request, &lt;a href=&quot;https://github.com/trinodb/trino/pull/730&quot;&gt;https://github.com/trinodb/trino/pull/730&lt;/a&gt;, 
came from one of the co-creators, &lt;a href=&quot;https://github.com/martint&quot;&gt;Martin Traverso&lt;/a&gt;.
It removes duplicate predicates in logical binary expressions
(AND, OR) and canonicalizes commutative arithmetic expressions and comparisons
to handle a larger number of variants. Canonicalize is a big word, but all it
means is that if there are multiple representations of the same logic or
data, they are simplified to an agreed-upon normal form.&lt;/p&gt;

&lt;p&gt;For example, the statement &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;COALESCE(a * (2 * 3), 1 - 1)&lt;/code&gt; is 
equivalent to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;COALESCE(6 * a, 0)&lt;/code&gt;, as the expression 2 * 3 can
be simplified to a constant integer.&lt;/p&gt;

&lt;p&gt;This is an example of a logical plan optimization, because we are optimizing
the query itself. It differs from the distributed plan in that we are not yet
determining how or where the plan will be distributed and run, and it does not
apply further optimizations handled by the cost-based optimizer, such as
predicate pushdown. We’ll talk about that step more in the next episode. For
now, let’s cover a few examples.&lt;/p&gt;

&lt;h2 id=&quot;demo-pr-730-remove-duplicate-predicates&quot;&gt;Demo: PR 730 Remove duplicate predicates&lt;/h2&gt;
&lt;p&gt;The EXPLAIN format used is &lt;a href=&quot;https://graphviz.org/&quot;&gt;graphviz&lt;/a&gt;. The
online tool used during the show is &lt;a href=&quot;http://viz-js.com/&quot;&gt;Viz.js&lt;/a&gt;. You can paste
the output of your EXPLAIN queries into it to visualize the query as a tree.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;EXPLAIN&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;FORMAT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GRAPHVIZ&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;TYPE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;LOGICAL&lt;/span&gt;
 &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;OR&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;OR&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;EXPLAIN&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;FORMAT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GRAPHVIZ&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;TYPE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;LOGICAL&lt;/span&gt;
 &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;OR&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;OR&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; 

&lt;span class=&quot;k&quot;&gt;EXPLAIN&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;FORMAT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GRAPHVIZ&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;TYPE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DISTRIBUTED&lt;/span&gt;
 &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tpch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tiny&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;  

&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tpch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tiny&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt; 
  &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tpch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tiny&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customer&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt; 
  &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;h2 id=&quot;question-of-the-week-how-should-i-allocate-memory-properties&quot;&gt;Question of the week: How should I allocate memory properties?&lt;/h2&gt;

&lt;p&gt;In this week’s question, we answer:&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;How should I allocate memory properties? CPU : 16Core  MEM:64GB&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Before answering this, we should clarify a few things about how memory works.&lt;/p&gt;

&lt;h3 id=&quot;user-memory&quot;&gt;User memory&lt;/h3&gt;
&lt;p&gt;Memory whose usage the user can reason about from the query itself:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Input Data&lt;/li&gt;
  &lt;li&gt;Hash tables used during execution&lt;/li&gt;
  &lt;li&gt;Sorting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Settings&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.max-memory-per-node&lt;/code&gt;&lt;/strong&gt; - maximum amount of user memory that a query
is allowed to use on a given worker.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.max-memory&lt;/code&gt;&lt;/strong&gt; (without the -per-node at the end) - This config caps 
the amount of user memory used by a single query over all worker nodes in your 
cluster.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;system-memory&quot;&gt;System memory&lt;/h3&gt;
&lt;p&gt;Memory needed for Presto’s internal operations:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Shuffle buffers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;NOTE: There are no dedicated settings for this memory, as it is implicitly
determined by the user and total memory settings. Use these formulas to calculate system memory:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;max system memory per node = &lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.max-total-memory-per-node&lt;/code&gt;&lt;/strong&gt; - 
 &lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.max-memory-per-node&lt;/code&gt;&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;max system memory = &lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.max-total-memory&lt;/code&gt;&lt;/strong&gt; - &lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.max-memory&lt;/code&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
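
&lt;p&gt;As a purely illustrative sketch (these property values are made up, not
recommendations), the subtraction works like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# config.properties (hypothetical values)
query.max-memory-per-node=10GB
query.max-total-memory-per-node=12GB

# implied maximum system memory per node:
# 12GB - 10GB = 2GB
&lt;/code&gt;&lt;/pre&gt;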

&lt;h3 id=&quot;total-memory&quot;&gt;Total memory&lt;/h3&gt;
&lt;p&gt;Total Memory = System + User, but there are only properties for total and
user memory.&lt;/p&gt;

&lt;p&gt;Settings&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.max-total-memory-per-node&lt;/code&gt;&lt;/strong&gt; - maximum amount of total memory that a
  query is allowed to use on a given worker.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.max-total-memory&lt;/code&gt;&lt;/strong&gt; (without the -per-node at the end) - This config 
caps the total memory used by a single query over all worker nodes in your
cluster.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;heap-headroom&quot;&gt;Heap headroom&lt;/h3&gt;
&lt;p&gt;The final setting I would like to cover is the 
&lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;memory.heap-headroom-per-node&lt;/code&gt;&lt;/strong&gt;. This config sets aside memory for the
JVM heap for allocations that are not tracked by Presto. You can typically go
with the default on this setting which is 30% of the JVM’s max heap size 
(-Xmx setting).&lt;/p&gt;

&lt;h3 id=&quot;jvm-heap-memory--xmx-setting&quot;&gt;JVM heap memory (-Xmx setting)&lt;/h3&gt;
&lt;p&gt;Presto is a Java application, which means it runs on the JVM. None of
these memory settings mean anything until the JVM that Presto runs on has
sufficient memory set aside. So how do I know I am allocating sufficient
memory based on my settings?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.max-total-memory-per-node&lt;/code&gt;&lt;/strong&gt; + &lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;memory.heap-headroom-per-node&lt;/code&gt;&lt;/strong&gt; &amp;lt; 
 &lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-Xmx&lt;/code&gt; setting (Java heap)&lt;/strong&gt;&lt;/p&gt;
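
&lt;p&gt;As a rough, hypothetical sketch only (the numbers are illustrative, not a
recommendation), a configuration satisfying this inequality on a 64GB node
might look like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# jvm.config (hypothetical)
-Xmx48G

# config.properties (hypothetical)
query.max-memory-per-node=24GB
query.max-total-memory-per-node=32GB

# heap headroom defaults to 30% of -Xmx, about 14.4GB here
# check: 32GB + 14.4GB = 46.4GB &amp;lt; 48GB heap
&lt;/code&gt;&lt;/pre&gt;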

&lt;p&gt;&lt;img src=&quot;/assets/episode/6/memory_pools.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Dain covers the proportions well and in detail in the recent training videos. 
Here’s a snippet of what he recommends.&lt;/p&gt;

&lt;iframe width=&quot;1058&quot; height=&quot;595&quot; src=&quot;https://www.youtube.com/embed/Pu80FkBRP-k?start=2569&amp;amp;end=2674&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; 
encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;
&lt;/iframe&gt;

&lt;p&gt;All in all, try to estimate the amount of memory needed by your max anticipated
query load, and if possible try to get even more than your estimate. Once Presto
is discovered by users, they will start to use it even more and demands on the
system will grow.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;
&lt;p&gt;Blogs&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2019/05/21/optimizing-the-casts-away.html&quot;&gt;https://trino.io/blog/2019/05/21/optimizing-the-casts-away.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://towardsdatascience.com/statistics-in-spark-sql-explained-22ec389bf71b&quot;&gt;https://towardsdatascience.com/statistics-in-spark-sql-explained-22ec389bf71b&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.infoworld.com/article/3597971/on-premises-data-warehouses-are-dead.html&quot;&gt;https://www.infoworld.com/article/3597971/on-premises-data-warehouses-are-dead.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&quot;&gt;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Upcoming events&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Dec 8 &lt;a href=&quot;https://www.meetup.com/Warsaw-Data-Engineering/events/274939817/&quot;&gt;https://www.meetup.com/Warsaw-Data-Engineering/events/274939817/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 9 &lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-providence-ri/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-providence-ri/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 10 &lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-denver-co/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-denver-co/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 10 &lt;a href=&quot;https://www.evanta.com/cdo/san-francisco/2020-san-francisco-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/san-francisco/2020-san-francisco-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 16 &lt;a href=&quot;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://trino.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://trino.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/13/training-security.html&quot;&gt;https://trino.io/blog/2020/08/13/training-security.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/27/training-performance.html&quot;&gt;https://trino.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Presto Summit Series - Real world usage&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/05/15/state-of-presto.html&quot;&gt;https://trino.io/blog/2020/05/15/state-of-presto.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&quot;&gt;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&quot;&gt;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&quot;&gt;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Podcasts:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.contributor.fyi/presto&quot;&gt;Presto with Martin Traverso, Dain Sundstrom and David Phillips&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&quot;&gt;Simplify Your Data Architecture With The Presto Distributed SQL Engine&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/5980279&quot;&gt;How Open Source Presto Unlocks a Single Point of Access to Data&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/5656471&quot;&gt;The Data Access Struggle is Real&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://softwareengineeringdaily.com/2020/02/07/presto-with-justin-borgman/&quot;&gt;Presto with Justin Borgman&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/3923864&quot;&gt;The infrastructure renaissance and how it will power the modernization of analytics platforms&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://softwareengineeringdaily.com/2018/05/24/ubers-data-platform-with-zhenxiao-luo/&quot;&gt;Uber’s Data Platform with Zhenxiao Luo&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Presto yourself, check out the 
O’Reilly book Trino: The Definitive Guide. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Release 347</summary>

      
      
    </entry>
  
    <entry>
      <title>A Report about Presto Conference Tokyo 2020 Online</title>
      <link href="https://trino.io/blog/2020/11/21/a-report-about-presto-conference-tokyo-2020.html" rel="alternate" type="text/html" title="A Report about Presto Conference Tokyo 2020 Online" />
      <published>2020-11-21T00:00:00+00:00</published>
      <updated>2020-11-21T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/11/21/a-report-about-presto-conference-tokyo-2020</id>
<content type="html" xml:base="https://trino.io/blog/2020/11/21/a-report-about-presto-conference-tokyo-2020.html">&lt;p&gt;On Nov 11th, 2020, the Japan Presto Community held its second Presto Conference, 
welcoming Martin Traverso and Brian Olsen.
The conference was hosted on YouTube Live.
This article summarizes the conference, aiming to share the great talks.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;presto-community-updates&quot;&gt;Presto Community Updates&lt;/h1&gt;

&lt;p&gt;First, Martin introduced the most recent Presto updates, 
covering changes and enhancements achieved through community activity.
Attendees also learned about several new features that will be available soon.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Update / Merge (https://github.com/prestosql/presto/issues/3325)&lt;/li&gt;
  &lt;li&gt;Materialized Views (https://github.com/prestosql/presto/pull/3283)&lt;/li&gt;
  &lt;li&gt;Dynamically resolved functions&lt;/li&gt;
  &lt;li&gt;Optimized Parquet reader&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In addition, during the Q&amp;amp;A, he suggested that new developers who want to contribute to PrestoSQL 
check the “good first issue” tag on GitHub. The tag is a good first step for a newcomer to contribute. 
Ref. &lt;a href=&quot;https://github.com/prestosql/presto/labels/good%20first%20issue&quot;&gt;link&lt;/a&gt;&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/NxDBBEA67Ws&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h1 id=&quot;presto-community---how-to-get-involved&quot;&gt;Presto Community - How to get involved&lt;/h1&gt;

&lt;p&gt;To help attendees get familiar with the Presto community, Martin provided a guide to getting involved. 
He shared his team’s principles for the Presto community and talked about their education strategy for new Presto users.
I would like to quote the principles here.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;We are passionate about open source&lt;/li&gt;
  &lt;li&gt;We help others be successful with what we create&lt;/li&gt;
  &lt;li&gt;We create robust long-lasting software&lt;/li&gt;
  &lt;li&gt;We are egalitarian (nobody is more important than the other)&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;support-presto-as-a-feature-of-saas&quot;&gt;Support Presto as a feature of SaaS&lt;/h1&gt;

&lt;p&gt;Then, Satoru Kamikaseda, Technical Support Engineer at Treasure Data, provided an overview of how Treasure Data supports Presto in their service. 
Presto is heavily used to support many enterprise use cases for their customer data platform, 
and it is becoming the hub component processing high-throughput workloads from many kinds of clients, such as Spark, ODBC, and JDBC.&lt;/p&gt;

&lt;p&gt;He described statistics about Presto queries on their platform and how they support each case. 
In those stats, one third of the support work is investigating job failures and query results, one third is helping clients with their SQL, 
and the rest is notifications to clients and performance investigations. 
His talk should be useful for any SaaS company that provides a query engine to its clients, to learn how difficult it is to support a distributed query engine.&lt;/p&gt;

&lt;iframe src=&quot;//www.slideshare.net/slideshow/embed_code/key/GR6e3dfKKJ8w4c&quot; width=&quot;595&quot; height=&quot;485&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; &lt;/iframe&gt;
&lt;div style=&quot;margin-bottom:5px&quot;&gt; &lt;strong&gt; &lt;a href=&quot;//www.slideshare.net/SatoruKamikaseda/support-presto-as-a-feature-of-saas&quot; title=&quot;Support Presto as a feature of SaaS&quot; target=&quot;_blank&quot;&gt;Support Presto as a feature of SaaS&lt;/a&gt; &lt;/strong&gt; from &lt;strong&gt;&lt;a href=&quot;https://www.slideshare.net/SatoruKamikaseda&quot; target=&quot;_blank&quot;&gt;SatoruKamikaseda&lt;/a&gt;&lt;/strong&gt; &lt;/div&gt;

&lt;h1 id=&quot;how-to-use-presto-with-aws-efficiently&quot;&gt;How to use Presto with AWS efficiently&lt;/h1&gt;

&lt;p&gt;We could learn how to use Presto with AWS, including Presto on EMR, Presto on EC2, Presto via Athena, and AWS Glue.
Noritaka Sekiyama, Sr. Big Data Architect at Amazon Web Services Japan, also shared a comparison of Presto on AWS (EC2, EMR, Athena). 
If you are new to Presto, his talk gives you insight into choosing your first Presto environment.&lt;/p&gt;

&lt;iframe src=&quot;//www.slideshare.net/slideshow/embed_code/key/kWzJ1XqR96A9di&quot; width=&quot;595&quot; height=&quot;485&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; &lt;/iframe&gt;
&lt;div style=&quot;margin-bottom:5px&quot;&gt; &lt;strong&gt; &lt;a href=&quot;//www.slideshare.net/ssuserca76a5/aws-presto&quot; title=&quot;AWS で Presto を徹底的に使いこなすワザ&quot; target=&quot;_blank&quot;&gt;AWS で Presto を徹底的に使いこなすワザ&lt;/a&gt; &lt;/strong&gt; from &lt;strong&gt;&lt;a href=&quot;https://www.slideshare.net/ssuserca76a5&quot; target=&quot;_blank&quot;&gt;Noritaka Sekiyama&lt;/a&gt;&lt;/strong&gt; &lt;/div&gt;

&lt;h1 id=&quot;presto--line-2020&quot;&gt;Presto @ LINE 2020&lt;/h1&gt;

&lt;p&gt;LINE is the biggest company providing a mobile communication tool in Japan (think WhatsApp in Japan). Yuya Ebihara, one of the Presto maintainers, 
showed us how they have improved Presto on their platform since they presented at &lt;a href=&quot;/blog/2019/07/11/report-for-presto-conference-tokyo.html&quot;&gt;the previous conference&lt;/a&gt;. 
Their Presto usage has increased significantly since 2019: the number of Presto workers grew from 100 to 300, and daily queries rose from 20,000 to 50,000. 
We could learn how they upgraded Presto from 314 to 339 and how they resolved issues during the upgrade.&lt;/p&gt;

&lt;iframe src=&quot;https://docs.google.com/presentation/d/e/2PACX-1vS2QdQjhLsiSuVdWlEmT23ixqoZXkRrKKMRGa1hrZHg65OpcH18RpzARotOMYvIBSwP57lPPAHkUQOx/embed&quot; frameborder=&quot;0&quot; width=&quot;595&quot; height=&quot;485&quot; allowfullscreen=&quot;true&quot; mozallowfullscreen=&quot;true&quot; webkitallowfullscreen=&quot;true&quot;&gt;&lt;/iframe&gt;

&lt;h1 id=&quot;dive-into-amazon-athena---serverless-presto-2020&quot;&gt;Dive into Amazon Athena - Serverless Presto, 2020&lt;/h1&gt;

&lt;p&gt;Makoto Kawamura, Solution Architect at Amazon Web Services Japan, 
introduced the latest features of Amazon Athena along with performance tuning tips. It should be helpful for developers tied to AWS who want to explore Amazon Athena.&lt;/p&gt;

&lt;div style=&quot;width: 90%&quot;&gt;&lt;script async=&quot;&quot; class=&quot;speakerdeck-embed&quot; data-id=&quot;92a399aad5344df197279cd4195d9464&quot; data-ratio=&quot;1.77777777777778&quot; src=&quot;//speakerdeck.com/assets/embed.js&quot;&gt;&lt;/script&gt;&lt;/div&gt;

&lt;h1 id=&quot;presto-cassandra-connector-hack-at-repro&quot;&gt;Presto Cassandra Connector Hack at Repro&lt;/h1&gt;

&lt;p&gt;Repro provides a customer engagement platform that enables companies to personalize their communications with the right message at the right time, driving better retention and lifetime value. 
They use Presto as the segmentation backend in their service, building lists of audiences that match certain conditions.&lt;/p&gt;

&lt;p&gt;Takeshi Arabiki gave us an in-depth presentation on the modifications they made to the Presto Cassandra connector to stabilize and improve Presto’s performance, 
in addition to covering how Repro uses Presto.
His talk covers a wide range of topics, from investigating the bottleneck to resolving it.&lt;/p&gt;

&lt;script async=&quot;&quot; class=&quot;speakerdeck-embed&quot; data-id=&quot;9289d942805a4bf2be908cf42a122a29&quot; data-ratio=&quot;1.77777777777778&quot; src=&quot;//speakerdeck.com/assets/embed.js&quot;&gt;&lt;/script&gt;

&lt;h1 id=&quot;testing-distributed-query-engine-as-a-service&quot;&gt;Testing Distributed Query Engine as a Service&lt;/h1&gt;

&lt;p&gt;Finally, Naoki Takezoe from Treasure Data talked about their challenges with upgrading Presto and 
how hard it is to migrate a variety of workloads while keeping performance stable. 
In a production-scale environment running multiple clients, testing is one of the big challenges. 
He showed how they simulate their client workloads with a query simulator they developed, to cover various corner cases and to verify data correctness.&lt;/p&gt;

&lt;iframe src=&quot;//www.slideshare.net/slideshow/embed_code/key/yCrep8qbYUzNzh&quot; width=&quot;595&quot; height=&quot;485&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; &lt;/iframe&gt;
&lt;div style=&quot;margin-bottom:5px&quot;&gt; &lt;strong&gt; &lt;a href=&quot;//www.slideshare.net/takezoe/testing-distributed-query-engine-as-a-service&quot; title=&quot;Testing Distributed Query Engine as a Service&quot; target=&quot;_blank&quot;&gt;Testing Distributed Query Engine as a Service&lt;/a&gt; &lt;/strong&gt; from &lt;strong&gt;&lt;a href=&quot;https://www.slideshare.net/takezoe&quot; target=&quot;_blank&quot;&gt;takezoe&lt;/a&gt;&lt;/strong&gt; &lt;/div&gt;

&lt;h1 id=&quot;wrap-up&quot;&gt;Wrap Up&lt;/h1&gt;

&lt;p&gt;This conference was the first online Presto conference in Tokyo. 
Unfortunately, we couldn’t discuss things with the community developers and creators face-to-face. We hope we’ll get such a great opportunity in the near future.
Even so, it was a great time, with many presentations from community members and a lot to learn from their wonderful experience.
During the conference, the average number of YouTube Live viewers was over 100, 
and total attendance was around 180 people. 
The previous conference had 89 attendees, so I think the number of Presto developers and users in Japan is gradually increasing. 
We really appreciate the developers and creators in the community. Thank you so much for coming to the conference, and see you next time!&lt;/p&gt;

&lt;h1 id=&quot;youtube-live-link&quot;&gt;Youtube Live link&lt;/h1&gt;

&lt;p&gt;The event was mainly conducted in Japanese.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://youtu.be/NxDBBEA67Ws&quot;&gt;Presto Conference Tokyo 2020 Online&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content>

      
        <author>
          <name>Toru Takahashi, Treasure Data</name>
        </author>
      

      <summary>On Nov 11th, 2020, the Japan Presto Community held the 2nd Presto Conference, welcoming Martin Traverso and Brian Olsen. The conference was hosted on YouTube Live. This article is a summary of the conference, aiming to share their great talks.</summary>

      
      
    </entry>
  
    <entry>
      <title>5: Hive Partitions, sync_partition_metadata, and Query Exceeded Max Columns!</title>
      <link href="https://trino.io/episodes/5.html" rel="alternate" type="text/html" title="5: Hive Partitions, sync_partition_metadata, and Query Exceeded Max Columns!" />
      <published>2020-11-19T00:00:00+00:00</published>
      <updated>2020-11-19T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/5</id>
      <content type="html" xml:base="https://trino.io/episodes/5.html">&lt;p&gt;In this week’s concept, Manfred discusses Hive Partitioning.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Concept from RDBMS systems implemented in HDFS&lt;/li&gt;
  &lt;li&gt;Normally just multiple files in a directory per table&lt;/li&gt;
  &lt;li&gt;Lots of different file formats, but always one directory&lt;/li&gt;
  &lt;li&gt;Partitioning creates nested directories&lt;/li&gt;
  &lt;li&gt;Needs to be set up at start of table creation&lt;/li&gt;
  &lt;li&gt;CTAS query&lt;/li&gt;
  &lt;li&gt;Uses WITH (partitioned_by = ARRAY[&apos;date&apos;])&lt;/li&gt;
  &lt;li&gt;Results in tablename/date=2020-11-19&lt;/li&gt;
  &lt;li&gt;Can also nest deeper with WITH (partitioned_by = ARRAY[&apos;date&apos;, &apos;countrycode&apos;])&lt;/li&gt;
  &lt;li&gt;Can greatly enhance performance&lt;/li&gt;
  &lt;li&gt;Optimizer can determine what directories to read based on field&lt;/li&gt;
  &lt;li&gt;Especially useful when fields are used in WHERE clauses&lt;/li&gt;
  &lt;li&gt;Also useful for historic data management over time, such as moving data out
to archive, deleting data, replacing data with aggregates, or just
  running compaction on subsets&lt;/li&gt;
  &lt;li&gt;Presto can use DELETE on partitions using DELETE FROM table WHERE date=value&lt;/li&gt;
  &lt;li&gt;Also possible to create empty partitions upfront with CALL system.create_empty_partition&lt;/li&gt;
&lt;/ul&gt;
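
&lt;p&gt;As a quick sketch of the CTAS form mentioned above, a partitioned table can also be created and populated in a single statement. The source table name here is only a hypothetical placeholder:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- Hypothetical example: partition columns must come last in the SELECT list
CREATE TABLE minio.part.orders_by_date
WITH (
  format = &apos;ORC&apos;,
  partitioned_by = ARRAY[&apos;dt&apos;]
)
AS SELECT id, name, dt
FROM minio.part.source_orders;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;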

&lt;p&gt;See here for more details: &lt;a href=&quot;https://www.educba.com/partitioning-in-hive/&quot;&gt;https://www.educba.com/partitioning-in-hive/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This week’s pull request, &lt;a href=&quot;https://github.com/trinodb/trino/pull/223&quot;&gt;https://github.com/trinodb/trino/pull/223&lt;/a&gt;, 
came from contributor &lt;a href=&quot;https://github.com/luohao&quot;&gt;Hao Luo&lt;/a&gt;. This procedure
is similar to Hive’s &lt;a href=&quot;https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RecoverPartitions(MSCKREPAIRTABLE)&quot;&gt;MSCK REPAIR TABLE&lt;/a&gt;:
if it finds a Hive partition directory that exists in the filesystem but has
no partition entry in the metastore, it adds the entry to the
metastore. If there is an entry in the metastore but the partition was deleted
from the filesystem, it removes the metastore entry. You can find
more information about &lt;a href=&quot;https://trino.io/docs/current/connector/hive.html#procedures&quot;&gt;this procedure in the documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Here are the commands and SQL I ran on Presto during the show:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;SHOW&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CATALOGS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;SHOW&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SCHEMAS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SHOW&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TABLES&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;IN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;part&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SCHEMA&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;part&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;location&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;s3a://part/&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- Create a table with no partitions&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;part&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;no_part&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dt&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;format&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;ORC&apos;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;part&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;no_part&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-1&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-18&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-2&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-18&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-3&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-19&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-4&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-19&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-5&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-20&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-6&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-20&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;part&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dt&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;format&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;ORC&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;partitioned_by&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ARRAY&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;dt&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;part&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-1&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-18&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-2&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-18&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-3&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-19&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-4&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-19&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-5&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-20&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-6&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-20&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;part&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;no_part&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dt&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-20&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
 
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;part&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dt&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-20&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;DELETE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;part&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dt&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-18&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;


&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;part&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- Make sure you are using the minio catalog (a renamed Hive catalog)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CALL&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;system&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sync_partition_metadata&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;part&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;orders&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;ADD&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CALL&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;system&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sync_partition_metadata&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;part&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;orders&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;DROP&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CALL&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;system&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sync_partition_metadata&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;part&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;orders&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;FULL&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

 &lt;span class=&quot;c1&quot;&gt;-- Create a table with multi partitions&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;part&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;multi_part&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;year&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;month&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;day&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;format&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;ORC&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;partitioned_by&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ARRAY&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;year&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;month&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;day&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;part&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;multi_part&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-1&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;11&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;18&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-2&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;11&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;18&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-3&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;11&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;19&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-4&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;11&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;19&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-5&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;11&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;20&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-6&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;11&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;20&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-7&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2019&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;11&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;18&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-8&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2019&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;01&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;18&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;9&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-9&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2019&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;11&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;19&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-10&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2019&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;01&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;19&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;11&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-11&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2019&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;11&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;20&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;12&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-12&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2019&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;01&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;20&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;We ran some queries against the metastore database. It’s a complicated model, so 
here is a database diagram showing the different tables and their relations in
the metastore.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/episode/5/hive_metastore_database_diagram.png&quot; alt=&quot;&quot; /&gt;
This diagram was generated by niftimusmaximus on 
&lt;a href=&quot;https://analyticsanvil.wordpress.com/2016/08/21/useful-queries-for-the-hive-metastore/&quot;&gt;The Analytics Anvil&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;MariaDB (metastore database)&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;n&quot;&gt;USE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;metastore_db&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- show database&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DBS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- show tables given a database&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DBS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TBLS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DB_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DB_ID&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;NAME&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- show location and input format of the table given database/table names&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SD_ID&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;INPUT_FORMAT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;LOCATION&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SERDE_ID&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DBS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TBLS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DB_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DB_ID&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SDS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SD_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SD_ID&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TBL_NAME&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;orders&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;NAME&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;part&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- show (de)serializer format of the table given database/table names&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SERDE_ID&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;NAME&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SLIB&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DBS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TBLS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DB_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DB_ID&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SDS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SD_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SD_ID&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SERDES&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sd&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SERDE_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SERDE_ID&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TBL_NAME&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;orders&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;NAME&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;part&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- show columns of the table given database/table names&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DBS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TBLS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DB_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DB_ID&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SDS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SD_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SD_ID&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;COLUMNS_V2&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CD_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CD_ID&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TBL_NAME&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;orders&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;NAME&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;part&apos;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;by&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CD_ID&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;INTEGER_IDX&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- show partitions of the table given database/table names&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;LOCATION&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DBS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TBLS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DB_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DB_ID&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PARTITIONS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;p&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TBL_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TBL_ID&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SDS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SD_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SD_ID&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TBL_NAME&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;orders&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;NAME&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;part&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
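
&lt;p&gt;Along the same lines, the partition key values behind each partition can be
listed as well. This is a sketch assuming the standard metastore schema
(&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PARTITION_KEY_VALS&lt;/code&gt;); exact table names can vary between Hive versions:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- show partition key values of the table given database/table names
SELECT p.PART_NAME, pkv.INTEGER_IDX, pkv.PART_KEY_VAL
FROM DBS d
 JOIN TBLS t ON d.DB_ID = t.DB_ID
 JOIN PARTITIONS p ON t.TBL_ID = p.TBL_ID
 JOIN PARTITION_KEY_VALS pkv ON p.PART_ID = pkv.PART_ID
WHERE t.TBL_NAME = &apos;orders&apos; AND d.NAME=&apos;part&apos;
ORDER BY p.PART_NAME, pkv.INTEGER_IDX;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;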

&lt;p&gt;In this week’s question, we answer:&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;Why am I getting, “Query exceeded maximum columns. Please reduce the number 
of columns referenced and re-run the query.”?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;I’m running this query to check for duplicates. My table has approx. 650
columns and I get this error.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT *, COUNT(1) 
FROM tbl 
GROUP BY * 
HAVING COUNT(1) &amp;gt; 1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;and getting a stack trace like this:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;io.prestosql.spi.PrestoException: Compiler failed
	at io.prestosql.sql.planner.LocalExecutionPlanner$Visitor.visitScanFilterAndProject(LocalExecutionPlanner.java:1306)
	at io.prestosql.sql.planner.LocalExecutionPlanner$Visitor.visitProject(LocalExecutionPlanner.java:1185)
	at io.prestosql.sql.planner.LocalExecutionPlanner$Visitor.visitProject(LocalExecutionPlanner.java:705)
	at io.prestosql.sql.planner.plan.ProjectNode.accept(ProjectNode.java:82)
	at io.prestosql.sql.planner.LocalExecutionPlanner$Visitor.visitAggregation(LocalExecutionPlanner.java:1119)
	at io.prestosql.sql.planner.LocalExecutionPlanner$Visitor.visitAggregation(LocalExecutionPlanner.java:705)
	at io.prestosql.sql.planner.plan.AggregationNode.accept(AggregationNode.java:204)
	at io.prestosql.sql.planner.LocalExecutionPlanner.plan(LocalExecutionPlanner.java:461)
	at io.prestosql.sql.planner.LocalExecutionPlanner.plan(LocalExecutionPlanner.java:432)
	at io.prestosql.execution.SqlTaskExecutionFactory.create(SqlTaskExecutionFactory.java:75)
	at io.prestosql.execution.SqlTask.updateTask(SqlTask.java:382)
	at io.prestosql.execution.SqlTaskManager.updateTask(SqlTaskManager.java:383)
	at io.prestosql.server.TaskResource.createOrUpdateTask(TaskResource.java:128)
	at jdk.internal.reflect.GeneratedMethodAccessor480.invoke(Unknown Source)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The throwable that causes this error, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MethodTooLargeException&lt;/code&gt;, comes from the ASM
library (&lt;a href=&quot;https://asm.ow2.io/&quot;&gt;https://asm.ow2.io/&lt;/a&gt;), which throws it when asked to create a
method with more bytecode than the JVM specification allows.&lt;/p&gt;

&lt;p&gt;Presto generates code to handle the given query, and here the generated code
is too large. Since the amount of generated code is proportional to the number
of columns referenced, we rewrap the exception in something more meaningful to
the user.&lt;/p&gt;

&lt;p&gt;The general strategy is to reduce the number of columns that you reference.&lt;/p&gt;

&lt;p&gt;The trade-off is that removing columns removes information from the query. In
the duplicate-check example above, you won’t be able to discard false-positive
duplicate matches, but the narrower query may still be good enough to help
narrow the search space. As always, it depends…&lt;/p&gt;
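
&lt;p&gt;As a sketch of that strategy (the column names here are hypothetical),
grouping on a handful of likely-identifying columns keeps the generated
bytecode small while still narrowing the search for duplicates:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- group on a few likely-identifying columns instead of all ~650
SELECT id, created_at, customer_id, COUNT(1) AS copies
FROM tbl
GROUP BY id, created_at, customer_id
HAVING COUNT(1) &amp;gt; 1;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;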

&lt;p&gt;To learn more about the JVM limit (a single method’s bytecode is capped at
65535 bytes), search for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;code_length&lt;/code&gt; in the Java Virtual Machine
specification:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.7.3&quot;&gt;SE8&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.7.3&quot;&gt;SE11&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Special thanks to &lt;a href=&quot;https://github.com/hashhar&quot;&gt;Ashhar Hasan&lt;/a&gt; for asking this 
question and providing some useful context!&lt;/p&gt;

&lt;p&gt;Release Notes discussed:
&lt;a href=&quot;https://trino.io/docs/current/release/release-346.html&quot;&gt;https://trino.io/docs/current/release/release-346.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Manfred’s Training - SQL at any scale
&lt;a href=&quot;https://www.simpligility.com/2020/10/join-me-for-presto-first-steps/&quot;&gt;https://www.simpligility.com/2020/10/join-me-for-presto-first-steps/&lt;/a&gt;
&lt;a href=&quot;https://learning.oreilly.com/live-training/courses/presto-first-steps/0636920462859/&quot;&gt;https://learning.oreilly.com/live-training/courses/presto-first-steps/0636920462859/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Blogs&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.javahelps.com/2020/05/presto-sql-for-newbies.html&quot;&gt;https://www.javahelps.com/2020/05/presto-sql-for-newbies.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.javahelps.com/2020/04/setup-presto-sql-development-environment.html&quot;&gt;https://www.javahelps.com/2020/04/setup-presto-sql-development-environment.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.javahelps.com/2019/11/presto-sql-types-of-joins.html&quot;&gt;https://www.javahelps.com/2019/11/presto-sql-types-of-joins.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&quot;&gt;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/analytics-vidhya/deploying-starburst-enterprise-presto-on-googles-kubernetes-engine-with-storage-and-postgres-72483b10ab62&quot;&gt;https://medium.com/analytics-vidhya/deploying-starburst-enterprise-presto-on-googles-kubernetes-engine-with-storage-and-postgres-72483b10ab62&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Upcoming events&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Nov 19 Presto Tokyo Conference - Japanese &lt;a href=&quot;https://techplay.jp/event/795265&quot;&gt;https://techplay.jp/event/795265&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Nov 24 EMEA - Polish &lt;a href=&quot;https://www.meetup.com/Warsaw-Data-Engineering/events/274666392/&quot;&gt;https://www.meetup.com/Warsaw-Data-Engineering/events/274666392/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 2 &lt;a href=&quot;https://www.evanta.com/cdo/atlanta/2020-atlanta-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/atlanta/2020-atlanta-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 3 EMEA &lt;a href=&quot;https://www.starburstdata.com/introduction-to-presto/&quot;&gt;https://www.starburstdata.com/introduction-to-presto/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 9 &lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-providence-ri/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-providence-ri/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 10 &lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-denver-co/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-denver-co/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 10 &lt;a href=&quot;https://www.evanta.com/cdo/san-francisco/2020-san-francisco-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/san-francisco/2020-san-francisco-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 16 &lt;a href=&quot;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://trino.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://trino.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/13/training-security.html&quot;&gt;https://trino.io/blog/2020/08/13/training-security.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/27/training-performance.html&quot;&gt;https://trino.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Presto Summit Series - Real world usage&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/05/15/state-of-presto.html&quot;&gt;https://trino.io/blog/2020/05/15/state-of-presto.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&quot;&gt;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&quot;&gt;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&quot;&gt;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Recent Podcasts:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.contributor.fyi/presto&quot;&gt;https://www.contributor.fyi/presto&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&quot;&gt;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Presto yourself, you should check out the 
O’Reilly book Trino: The Definitive Guide. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>In this week’s concept, Manfred discusses Hive Partitioning. Concept from RDBMS systems implemented in HDFS Normally just multiple files in a directory per table Lots of different file formats, but always one directory Partitioning creates nested directories Needs to be set up at start of table creation CTAS query Uses WITH ( partitioned_by = ARRAY[‘date’]) Results in tablename/date=2020-11-19 Can also nest deeper WITH ( partitioned_by = ARRAY[‘date’, ‘countrycode’]) Can greatly enhance performance Optimizer can determine what directories to read based on field Especially useful when fields are used in WHERE clauses Also useful for historic data management over time such as moving data out to archive, deleting data, or replacing data with aggregates, or even just running compaction on subsets Presto can use DELETE on partitions using DELETE FROM table WHERE date=value Also possible to create empty partitions upfront CALL system.create_empty_partition See here for more details: https://www.educba.com/partitioning-in-hive/</summary>

      
      
    </entry>
  
    <entry>
      <title>4: Presto on ACID, row-level INSERT/DELETE, and why JDK11?</title>
      <link href="https://trino.io/episodes/4.html" rel="alternate" type="text/html" title="4: Presto on ACID, row-level INSERT/DELETE, and why JDK11?" />
      <published>2020-11-04T00:00:00+00:00</published>
      <updated>2020-11-04T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/4</id>
      <content type="html" xml:base="https://trino.io/episodes/4.html">&lt;p&gt;In this week’s concept, Manfred discusses ACID in general, CAP theorem, 
HDFS and Hive before ACID, and now ORC ACID and similar support.&lt;/p&gt;

&lt;p&gt;ACID &lt;a href=&quot;https://en.wikipedia.org/wiki/ACID&quot;&gt;https://en.wikipedia.org/wiki/ACID&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Atomicity - a transaction completely succeeds or completely fails; there
 are no partial results, so no inconsistent relationships are left dangling.
 The database remains in a consistent state.&lt;/li&gt;
  &lt;li&gt;Consistency - database content always adheres to defined rules (key
 constraints).&lt;/li&gt;
  &lt;li&gt;Isolation - transactions are isolated from each other and can run in
  parallel with the same result as if they ran sequentially.&lt;/li&gt;
  &lt;li&gt;Durability - no data is lost after transaction completion.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;ACID used to be a crucial criterion for a “serious” relational database system.&lt;/p&gt;

&lt;p&gt;Then came big data and the CAP theorem. &lt;a href=&quot;https://en.wikipedia.org/wiki/CAP_theorem&quot;&gt;https://en.wikipedia.org/wiki/CAP_theorem&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Consistency&lt;/li&gt;
  &lt;li&gt;Availability&lt;/li&gt;
  &lt;li&gt;Partition tolerance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This week’s pull request, &lt;a href=&quot;https://github.com/trinodb/trino/pull/5402&quot;&gt;https://github.com/trinodb/trino/pull/5402&lt;/a&gt;,
comes from contributor &lt;a href=&quot;https://github.com/djsstarburst&quot;&gt;David Stryker&lt;/a&gt;. David
covers some interesting aspects of working on this pull request. The commit
adds support for row-level INSERT and DELETE for Hive ACID tables, along with
product tests that verify row-level INSERT and DELETE work where they are allowed.&lt;/p&gt;

&lt;p&gt;Here is the SQL that we ran in the INSERT/DELETE demo:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/*
  Ran against Presto
*/
SHOW SCHEMAS IN minio;
SHOW TABLES IN minio.acid;

CREATE SCHEMA minio.acid
WITH (location = &apos;s3a://acid/&apos;);


CREATE TABLE minio.acid.test (a int, b int)
WITH (
   format=&apos;ORC&apos;,
   transactional=true
);

INSERT INTO minio.acid.test VALUES (10, 10), (20, 20);

SELECT * FROM minio.acid.test;

DELETE FROM minio.acid.test WHERE a = 10;

/*
  Ran against Hive
*/

SHOW DATABASES;

SELECT * FROM acid.test;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;David also mentioned &lt;a href=&quot;http://shzhangji.com/blog/2019/06/10/understanding-hive-acid-transactional-table/&quot;&gt;this blog&lt;/a&gt;
to better understand the Hive ACID model.&lt;/p&gt;

&lt;p&gt;In this week’s question we answer, “Why is Java 11 needed in the newer
versions of Presto, and how do I get an older version of Presto? I need
release 328, the latest on Java 8, since Java 11 isn’t available for me to use.”&lt;/p&gt;

&lt;p&gt;Presto uses Java 11 because it is the next LTS version of Java after 8.
Java 11 provides significant performance and stability improvements, so we
believe everyone should be running that version to get the best experience out
of Presto. Moving to Java 11 also allows us to take advantage of the many
improvements to the JDK and the Java language introduced since Java 8.&lt;/p&gt;

&lt;p&gt;For older versions, you can download the server from Maven Central and read the matching older documentation:
&lt;a href=&quot;https://repo.maven.apache.org/maven2/io/prestosql/presto-server/&quot;&gt;https://repo.maven.apache.org/maven2/io/prestosql/presto-server/&lt;/a&gt;
&lt;a href=&quot;https://trino.io/docs/328/&quot;&gt;https://trino.io/docs/328/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One thing to point out is that only the server requires JDK 11; the client
can run on JDK 8. One reason you might be forced to run Presto on JDK 8 is that
the server shares a machine with another service that runs JDK 8. We do not
recommend that setup, as it degrades the performance of your cluster and can
cause other issues when Presto is fighting for resources.&lt;/p&gt;

&lt;p&gt;Another possibility is that a company policy requires specific JDKs to be
installed on all servers. You can have side-by-side installs of multiple JDK
versions and use the appropriate one; you just need to launch Presto with the
correct java command. If your company is against using a newer JDK, you can
point to the arguments above to get the policy updated to at least include
JDK 11.&lt;/p&gt;
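
&lt;p&gt;As a minimal sketch of such a side-by-side setup (the install paths below are
illustrative and vary by environment), you can point the Presto launcher at a
JDK 11 install while the system default stays on JDK 8:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# the system default java can stay on JDK 8
java -version

# point Presto at a JDK 11 install before starting the launcher
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk
export PATH=$JAVA_HOME/bin:$PATH
/opt/presto/bin/launcher start
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;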

&lt;p&gt;Release Notes discussed:
&lt;a href=&quot;https://trino.io/docs/current/release/release-345.html&quot;&gt;https://trino.io/docs/current/release/release-345.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Manfred’s Training - SQL at any scale
&lt;a href=&quot;https://www.simpligility.com/2020/10/join-me-for-presto-first-steps/&quot;&gt;https://www.simpligility.com/2020/10/join-me-for-presto-first-steps/&lt;/a&gt;
&lt;a href=&quot;https://learning.oreilly.com/live-training/courses/presto-first-steps/0636920462859/&quot;&gt;https://learning.oreilly.com/live-training/courses/presto-first-steps/0636920462859/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Blogs&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://postgresconf.org/conferences/postgres-webinar-series/program/proposals/live-demo-creating-a-single-point-of-access-to-multiple-postgres-servers-using-starburst-presto&quot;&gt;https://postgresconf.org/conferences/postgres-webinar-series/program/proposals/live-demo-creating-a-single-point-of-access-to-multiple-postgres-servers-using-starburst-presto&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://postgresconf.org/conferences/postgres-webinar-series/program/proposals/live-demo-unlock-data-in-postgres-servers-to-query-it-with-other-data-sources-like-hive-kafka-other-dbmss-and-more&quot;&gt;https://postgresconf.org/conferences/postgres-webinar-series/program/proposals/live-demo-unlock-data-in-postgres-servers-to-query-it-with-other-data-sources-like-hive-kafka-other-dbmss-and-more&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://blog.bigdataboutique.com/2020/09/presto-meets-elasticsearch-our-elasticsearch-connector-for-presto-video-mbywtm&quot;&gt;https://blog.bigdataboutique.com/2020/09/presto-meets-elasticsearch-our-elasticsearch-connector-for-presto-video-mbywtm&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Upcoming events&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Nov 12 Webinar: &lt;a href=&quot;https://www.starburstdata.com/webinar-lower-cdw-costs-starburst&quot;&gt;https://www.starburstdata.com/webinar-lower-cdw-costs-starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Nov 17 &lt;a href=&quot;https://databricks.com/session_eu20/presto-fast-sql-on-anything-including-delta-lake-snowflake-elasticsearch-and-more&quot;&gt;https://databricks.com/session_eu20/presto-fast-sql-on-anything-including-delta-lake-snowflake-elasticsearch-and-more&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Nov 19 &lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-detroit-mi/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-detroit-mi/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 2 &lt;a href=&quot;https://www.evanta.com/cdo/atlanta/2020-atlanta-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/atlanta/2020-atlanta-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 9 &lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-providence-ri/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-providence-ri/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 10 &lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-denver-co/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-denver-co/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 10 &lt;a href=&quot;https://www.evanta.com/cdo/san-francisco/2020-san-francisco-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/san-francisco/2020-san-francisco-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 16 &lt;a href=&quot;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://trino.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://trino.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/13/training-security.html&quot;&gt;https://trino.io/blog/2020/08/13/training-security.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/27/training-performance.html&quot;&gt;https://trino.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Presto Summit Series - Real world usage&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/05/15/state-of-presto.html&quot;&gt;https://trino.io/blog/2020/05/15/state-of-presto.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&quot;&gt;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&quot;&gt;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&quot;&gt;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Recent Podcasts:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.contributor.fyi/presto&quot;&gt;https://www.contributor.fyi/presto&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&quot;&gt;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Presto yourself, you should check out the 
O’Reilly book Trino: The Definitive Guide. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>In this week’s concept, Manfred discusses ACID in general, CAP theorem, HDFS and Hive before ACID, and now ORC ACID and similar support.</summary>

      
      
    </entry>
  
    <entry>
      <title>3: Running two Presto distributions and Kafka headers as Presto columns</title>
      <link href="https://trino.io/episodes/3.html" rel="alternate" type="text/html" title="3: Running two Presto distributions and Kafka headers as Presto columns" />
      <published>2020-10-22T00:00:00+00:00</published>
      <updated>2020-10-22T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/3</id>
      <content type="html" xml:base="https://trino.io/episodes/3.html">&lt;p&gt;In this week’s concept, Manfred discusses what an SPI (service provider
interface) is, and covers the connector architecture for Presto connectors,
Starburst connectors, and custom connectors.&lt;/p&gt;

&lt;p&gt;This week’s pull request, &lt;a href=&quot;https://github.com/trinodb/trino/pull/4462&quot;&gt;https://github.com/trinodb/trino/pull/4462&lt;/a&gt;,
comes from user &lt;a href=&quot;https://github.com/0xE282B0&quot;&gt;Sven Pfennig&lt;/a&gt;. Sven works for
&lt;a href=&quot;https://syncier.com&quot;&gt;Syncier GmbH&lt;/a&gt;, and as part of his role there he gets to contribute
to open source projects such as Presto. Thanks Sven! We jump into a quick setup
of a Kafka broker using the
&lt;a href=&quot;https://kafka.apache.org/quickstart&quot;&gt;Kafka quickstart tutorial&lt;/a&gt;, and I use the
&lt;a href=&quot;https://github.com/edenhill/kafkacat&quot;&gt;kafkacat tool&lt;/a&gt; to show off the Kafka header
support that Sven has provided us and discuss why this is beneficial.&lt;/p&gt;

&lt;p&gt;Here’s the crazy SELECT statement I used to decode the binary values to UTF-8
text of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;foo&lt;/code&gt; column:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT 
   _message, 
   reduce(element_at(_headers,&apos;foo&apos;), &apos;&apos;, (s, c) -&amp;gt; s || from_utf8(c), s -&amp;gt; s) AS foo 
FROM kafka.default.pcb 
WHERE contains(map_keys(_headers), &apos;foo&apos;);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;An alternative tutorial that uses the TPC dataset is available on the website:
&lt;a href=&quot;https://trino.io/docs/current/connector/kafka-tutorial.html&quot;&gt;https://trino.io/docs/current/connector/kafka-tutorial.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This week’s question was accidentally cut off, as I had mapped my Shift + R key
to toggle streaming/recording, which cut the broadcast when I typed the R in
FROM.&lt;/p&gt;

&lt;p&gt;Release Notes discussed:
&lt;a href=&quot;https://trino.io/docs/current/release/release-344.html&quot;&gt;https://trino.io/docs/current/release/release-344.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Manfred’s Training - SQL at any scale
&lt;a href=&quot;https://www.simpligility.com/2020/10/join-me-for-presto-first-steps/&quot;&gt;https://www.simpligility.com/2020/10/join-me-for-presto-first-steps/&lt;/a&gt;
&lt;a href=&quot;https://learning.oreilly.com/live-training/courses/presto-first-steps/0636920462859/&quot;&gt;https://learning.oreilly.com/live-training/courses/presto-first-steps/0636920462859/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Blogs&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://postgresconf.org/conferences/postgres-webinar-series/program/proposals/live-demo-creating-a-single-point-of-access-to-multiple-postgres-servers-using-starburst-presto&quot;&gt;https://postgresconf.org/conferences/postgres-webinar-series/program/proposals/live-demo-creating-a-single-point-of-access-to-multiple-postgres-servers-using-starburst-presto&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://postgresconf.org/conferences/postgres-webinar-series/program/proposals/live-demo-unlock-data-in-postgres-servers-to-query-it-with-other-data-sources-like-hive-kafka-other-dbmss-and-more&quot;&gt;https://postgresconf.org/conferences/postgres-webinar-series/program/proposals/live-demo-unlock-data-in-postgres-servers-to-query-it-with-other-data-sources-like-hive-kafka-other-dbmss-and-more&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://blog.bigdataboutique.com/2020/09/presto-meets-elasticsearch-our-elasticsearch-connector-for-presto-video-mbywtm&quot;&gt;https://blog.bigdataboutique.com/2020/09/presto-meets-elasticsearch-our-elasticsearch-connector-for-presto-video-mbywtm&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Upcoming events&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/BigDataATL/events/273435961/&quot;&gt;https://www.meetup.com/BigDataATL/events/273435961/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://edw2020chicago.dataversity.net/&quot;&gt;https://edw2020chicago.dataversity.net/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-portland-or-2/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-portland-or-2/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-minneapolis-mn-2/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-minneapolis-mn-2/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-jacksonville-fl/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-jacksonville-fl/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.evanta.com/cdo/san-francisco/2020-san-francisco-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/san-francisco/2020-san-francisco-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-detroit-mi/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-detroit-mi/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.evanta.com/cdo/atlanta/2020-atlanta-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/atlanta/2020-atlanta-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-providence-ri/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-providence-ri/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-denver-co/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-denver-co/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin:
&lt;a href=&quot;https://trino.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://trino.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://trino.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/08/13/training-security.html&quot;&gt;https://trino.io/blog/2020/08/13/training-security.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/08/27/training-performance.html&quot;&gt;https://trino.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Presto Summit Series - Real world usage
&lt;a href=&quot;https://trino.io/blog/2020/05/15/state-of-presto.html&quot;&gt;https://trino.io/blog/2020/05/15/state-of-presto.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&quot;&gt;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&quot;&gt;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&quot;&gt;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Recent Podcasts:
&lt;a href=&quot;https://www.contributor.fyi/presto&quot;&gt;https://www.contributor.fyi/presto&lt;/a&gt;
&lt;a href=&quot;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&quot;&gt;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you want to learn more about Presto yourself, you should check out the 
O’Reilly book Trino: The Definitive Guide. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>In this week’s concept, Manfred discusses what an SPI (service provider interface) is and covers the connector architecture of Presto, Starburst, and Custom.</summary>

      
      
    </entry>
  
    <entry>
      <title>Announcing Presto Conference Tokyo 2020</title>
      <link href="https://trino.io/blog/2020/10/21/announcing-presto-conference-tokyo-2020.html" rel="alternate" type="text/html" title="Announcing Presto Conference Tokyo 2020" />
      <published>2020-10-21T00:00:00+00:00</published>
      <updated>2020-10-21T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/10/21/announcing-presto-conference-tokyo-2020</id>
      <content type="html" xml:base="https://trino.io/blog/2020/10/21/announcing-presto-conference-tokyo-2020.html">&lt;p&gt;Last year, &lt;a href=&quot;/blog/2019/07/11/report-for-presto-conference-tokyo.html&quot;&gt;Presto Conference Tokyo 2019&lt;/a&gt; 
was held in Japan with Martin Traverso, Dain Sundstrom and David Phillips, 
the founders of the Presto Software Foundation.&lt;/p&gt;

&lt;p&gt;This year, the event is online-only. Presto Conference 
Tokyo 2020 is happening on the 20th of November. 
You can &lt;a href=&quot;https://techplay.jp/event/795265&quot;&gt;find out details and register right now&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;The event includes six sessions from Treasure Data, Amazon Web Services 
Japan, Repro and LINE, as well as open sessions with Martin and Brian Olsen, 
a Developer Advocate at Starburst Data.
This is a valuable opportunity to hear from engineers who are actually using 
Presto. It has something for those who are using Presto for data engineering
and those who don’t use Presto yet but are interested in it.&lt;/p&gt;

&lt;!--more--&gt;</content>

      
        <author>
          <name>Yuya Ebihara, LINE</name>
        </author>
      

      <summary>Last year, Presto Conference Tokyo 2019 was held in Japan with Martin Traverso, Dain Sundstrom and David Phillips, the founders of the Presto Software Foundation. This year, the event is online-only. Presto Conference Tokyo 2020 is happening on the 20th of November. You can find out details and register right now! The event includes six sessions from Treasure Data, Amazon Web Services Japan, Repro and LINE, as well as open sessions with Martin and Brian Olsen, a Developer Advocate at Starburst Data. This is a valuable opportunity to hear from engineers who are actually using Presto. It has something for those who are using Presto for data engineering and those who don’t use Presto yet but are interested in it.</summary>

      
      
    </entry>
  
    <entry>
      <title>A gentle introduction to the Hive connector</title>
      <link href="https://trino.io/blog/2020/10/20/intro-to-hive-connector.html" rel="alternate" type="text/html" title="A gentle introduction to the Hive connector" />
      <published>2020-10-20T00:00:00+00:00</published>
      <updated>2020-10-20T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/10/20/intro-to-hive-connector</id>
      <content type="html" xml:base="https://trino.io/blog/2020/10/20/intro-to-hive-connector.html">&lt;p&gt;TL;DR: The Hive connector is what you use in Trino for reading data from object
storage that is organized according to the rules laid out by Hive, without using
the Hive runtime code.&lt;/p&gt;

&lt;p&gt;One of the most confusing aspects when starting Trino is the Hive connector. 
Typically, you seek out the use of Trino when you experience an intensely slow
query turnaround from your existing Hadoop, Spark, or Hive infrastructure. In
fact, the genesis of Trino, formerly known as Presto, came about due to these 
slow Hive query conditions at Facebook back in 2012.&lt;/p&gt;

&lt;p&gt;So when you learn that Trino has a Hive connector,
it can be rather confusing since you moved to Trino to circumvent the slowness
of your current Hive cluster. Another common source of confusion is when you
want to query your data from your cloud object storage, such as AWS S3, MinIO, 
and Google Cloud Storage. This too uses the Hive connector. If that 
confuses you, don’t worry, you are not alone. This blog aims to explain this
commonly confusing nomenclature.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;hive-architecture&quot;&gt;Hive architecture&lt;/h1&gt;

&lt;p&gt;To understand the origins and inner workings of Trino’s Hive connector, you
first need to know a few high level components of the Hive architecture.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/intro-to-hive-connector/hive.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;You can simplify the Hive architecture to four components:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The runtime&lt;/em&gt; contains the logic of the query engine that translates the
SQL-like Hive Query Language (HQL) into MapReduce jobs that run over files stored 
in the filesystem.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The storage&lt;/em&gt; component is simply that: it stores files in various formats,
along with index structures to recall those files. The file formats range from
simple ones such as JSON and CSV to columnar formats like ORC
and Parquet. Traditionally, Hive runs on top of the Hadoop Distributed
Filesystem (HDFS). As cloud-based options became more prevalent, object storage
services like Amazon S3, Azure Blob Storage, Google Cloud Storage, and others
increasingly replaced HDFS as the storage component.&lt;/p&gt;

&lt;p&gt;In order for Hive to process these files, it must have a mapping
from SQL tables in &lt;em&gt;the runtime&lt;/em&gt; to files and directories in &lt;em&gt;the storage&lt;/em&gt;
component. To accomplish this, Hive uses the Hive Metastore Service (HMS), 
often shortened to &lt;em&gt;the metastore&lt;/em&gt;, to manage metadata about the files, such
as table columns, file locations, and file formats.&lt;/p&gt;
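
&lt;p&gt;To make that mapping concrete, here is a sketch of a Hive table declared with an
explicit storage location and file format; the metastore records exactly this kind
of information. The table and bucket names are made up for illustration:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- The metastore tracks the columns, the file format, and the
-- storage location for this table
CREATE EXTERNAL TABLE orders (
    orderkey BIGINT,
    orderdate DATE
)
STORED AS ORC
LOCATION &apos;s3a://example-bucket/orders/&apos;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;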

&lt;p&gt;The last component not included in the image is Hive’s &lt;em&gt;data organization
specification&lt;/em&gt;. This specification is documented only in the Hive code itself,
and has been reverse engineered by other systems, like Trino, that need to
remain compatible with it.&lt;/p&gt;

&lt;p&gt;Trino reuses all of these components except for &lt;em&gt;the runtime&lt;/em&gt;. This is the same
approach most compute engines, such as Spark, Drill, and Impala, take when
dealing with data in object stores. When you think of the Hive
connector, you should think of a connector that is capable of reading data
organized by the unwritten Hive specification.&lt;/p&gt;

&lt;h3 id=&quot;trino-runtime-replaces-hive-runtime&quot;&gt;Trino runtime replaces Hive runtime&lt;/h3&gt;

&lt;p&gt;In the early days of big data systems, many expected query turnaround to take a 
long time due to the high volume of unstructured data in ETL workloads. The
primary goal in early iterations of these systems was simply throughput over
large volumes of data while maintaining fault-tolerance. Now, more businesses
want to run fast interactive queries over their big data instead of running jobs
that take hours and produce possibly undesirable results. Many companies have
petabytes of data and metadata in their data warehouse. Data in storage is
cumbersome to move and the data in the metastore takes a long time to repopulate
in other formats. Since only the runtime that executes Hive queries needs
replacement, Trino utilizes the existing metastore metadata and the
files residing in storage, and the Trino runtime simply takes over the role the
Hive runtime played in analyzing the data.&lt;/p&gt;

&lt;h1 id=&quot;trino-architecture&quot;&gt;Trino Architecture&lt;/h1&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/intro-to-hive-connector/trino.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;the-hive-connector-nomenclature&quot;&gt;The Hive connector nomenclature&lt;/h3&gt;

&lt;p&gt;Notice that the only change in the Trino architecture is &lt;em&gt;the runtime&lt;/em&gt;. The
HMS still exists, along with &lt;em&gt;the storage&lt;/em&gt;. This is not by accident. This design
addresses a common problem faced by many companies: it simplifies the
migration from Hive to Trino. Regardless of &lt;em&gt;the storage&lt;/em&gt; component
used, &lt;em&gt;the runtime&lt;/em&gt; makes use of the HMS, and that is why this connector is
called the Hive connector.&lt;/p&gt;

&lt;p&gt;Where the confusion tends to come from is when you search for a connector
from the context of the storage system you want to query. You may not even be 
aware that &lt;em&gt;the metastore&lt;/em&gt; is a necessity, or that it exists at all. Typically, you
look for an S3 connector, a GCS connector, or a MinIO connector. All you need is
the Hive connector and the HMS to manage the metadata of the objects in your storage.&lt;/p&gt;
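
&lt;p&gt;As a sketch, a catalog properties file wiring the Hive connector to an HMS and
S3-compatible storage looks something like the following. The host names and
credentials here are placeholders, and the exact connector name and property set
depend on your version:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;connector.name=hive-hadoop2
hive.metastore.uri=thrift://hive-metastore:9083
hive.s3.endpoint=http://minio:9000
hive.s3.path-style-access=true
hive.s3.aws-access-key=minio
hive.s3.aws-secret-key=minio123
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;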

&lt;h3 id=&quot;the-hive-metastore-service&quot;&gt;The Hive Metastore Service&lt;/h3&gt;

&lt;p&gt;The HMS is the only Hive process used in the entire Trino ecosystem when using
the Hive connector. The HMS is actually a simple service with a binary API using
&lt;a href=&quot;https://thrift.apache.org/&quot;&gt;the Thrift protocol&lt;/a&gt;. This service makes updates to
the metadata, which is stored in an RDBMS such as PostgreSQL, MySQL, or MariaDB. There
are also compatible replacements for the HMS, such as AWS Glue, that act as
drop-in substitutes.&lt;/p&gt;

&lt;h3 id=&quot;getting-started-with-the-hive-connector-on-trino&quot;&gt;Getting started with the Hive Connector on Trino&lt;/h3&gt;

&lt;p&gt;To drive this point home, I created a tutorial that showcases using Trino and
looking at the metadata it produces. In the following scenario, the Docker 
environment contains four containers:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino&lt;/code&gt; - &lt;em&gt;the runtime&lt;/em&gt; in this scenario that replaces Hive.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;minio&lt;/code&gt; - &lt;em&gt;the storage&lt;/em&gt; is an open-source cloud object storage.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive-metastore&lt;/code&gt; -  &lt;em&gt;the metastore&lt;/em&gt; service instance.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mariadb&lt;/code&gt; - the database that &lt;em&gt;the metastore&lt;/em&gt; uses to store the metadata.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can play around with the system and optionally view the configurations. The
scenario asks you to run a query to populate data in MinIO and then see the
resulting metadata populated in MariaDB by the HMS. The next step asks you to
run queries over the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mariadb&lt;/code&gt; database, which holds the generated
metadata from &lt;em&gt;the metastore&lt;/em&gt;.&lt;/p&gt;
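
&lt;p&gt;If you want to poke at that metadata directly, a query along these lines against
the metastore database lists each table together with its storage location. The
database name is whatever your HMS was configured with; the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TBLS&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SDS&lt;/code&gt; tables are part of the
standard HMS schema:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- TBLS holds one row per table; SDS holds the storage descriptor,
-- including the file location and input format
SELECT t.TBL_NAME, s.LOCATION, s.INPUT_FORMAT
FROM TBLS t
JOIN SDS s ON t.SD_ID = s.SD_ID;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;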

&lt;p&gt;If you have any questions or run into any issues with the example, you can find
us on &lt;a href=&quot;/slack.html&quot;&gt;slack&lt;/a&gt; on the #dev or #general channels.&lt;/p&gt;

&lt;p&gt;Have fun!&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/bitsondatadev/trino-getting-started/tree/main/hive/trino-minio&quot; target=&quot;_blank&quot;&gt;
&lt;img src=&quot;/assets/blog/intro-to-hive-connector/intro-to-hive.jpeg&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>TL;DR: The Hive connector is what you use in Trino for reading data from object storage that is organized according to the rules laid out by Hive, without using the Hive runtime code. One of the most confusing aspects when starting Trino is the Hive connector. Typically, you seek out the use of Trino when you experience an intensely slow query turnaround from your existing Hadoop, Spark, or Hive infrastructure. In fact, the genesis of Trino, formerly known as Presto, came about due to these slow Hive query conditions at Facebook back in 2012. So when you learn that Trino has a Hive connector, it can be rather confusing since you moved to Trino to circumvent the slowness of your current Hive cluster. Another common source of confusion is when you want to query your data from your cloud object storage, such as AWS S3, MinIO, and Google Cloud Storage. This too uses the Hive connector. If that confuses you, don’t worry, you are not alone. This blog aims to explain this commonly confusing nomenclature.</summary>

      
      
    </entry>
  
    <entry>
      <title>2: Kubernetes, arrays on Elasticsearch, and security breaks the UI</title>
      <link href="https://trino.io/episodes/2.html" rel="alternate" type="text/html" title="2: Kubernetes, arrays on Elasticsearch, and security breaks the UI" />
      <published>2020-10-07T00:00:00+00:00</published>
      <updated>2020-10-07T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/2</id>
      <content type="html" xml:base="https://trino.io/episodes/2.html">&lt;p&gt;This week we had a bit of a technical issue between Zoom and OBS, so some 
editing was done to remove a portion of the broadcast, which mainly cuts out us 
covering the releases. We circle back and give a small summary, but unfortunately
we lost the majority of that part of the conversation.&lt;/p&gt;

&lt;p&gt;In this week’s concept, we cover a general overview of Kubernetes and how
Kubernetes is used when deploying and scaling up Presto. We
also dive into how this is being used at our guest Cory Darby’s company,
BlueCat.&lt;/p&gt;

&lt;p&gt;This week’s pull request segment covers
&lt;a href=&quot;https://github.com/trinodb/trino/pull/2462&quot;&gt;https://github.com/trinodb/trino/pull/2462&lt;/a&gt;, which closes ticket
&lt;a href=&quot;https://github.com/trinodb/trino/issues/2441&quot;&gt;https://github.com/trinodb/trino/issues/2441&lt;/a&gt;. This was actually a PR Brian
submitted some months ago. He dives a bit into 
&lt;a href=&quot;https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html&quot;&gt;Elasticsearch mappings&lt;/a&gt; 
and how Elasticsearch models its data. He then covers how this motivated the 
pull request, which addresses the need for explicit mappings declaring which 
Elasticsearch fields are array types versus scalars.&lt;/p&gt;

&lt;p&gt;In this week’s question, we answer, “Why does the web UI say ‘disabled’?” This 
typically comes from a security setup issue, and there’s another similar issue
when you are using a proxy that we cover as a bonus.&lt;/p&gt;

&lt;p&gt;Release Notes discussed:
&lt;a href=&quot;https://trino.io/docs/current/release/release-342.html&quot;&gt;https://trino.io/docs/current/release/release-342.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/docs/current/release/release-343.html&quot;&gt;https://trino.io/docs/current/release/release-343.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Manfred’s Training - SQL at any scale
&lt;a href=&quot;https://www.simpligility.com/2020/10/join-me-for-presto-first-steps/&quot;&gt;https://www.simpligility.com/2020/10/join-me-for-presto-first-steps/&lt;/a&gt;
&lt;a href=&quot;https://learning.oreilly.com/live-training/courses/presto-first-steps/0636920462859/&quot;&gt;https://learning.oreilly.com/live-training/courses/presto-first-steps/0636920462859/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Blogs&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://postgresconf.org/conferences/postgres-webinar-series/program/proposals/live-demo-creating-a-single-point-of-access-to-multiple-postgres-servers-using-starburst-presto&quot;&gt;https://postgresconf.org/conferences/postgres-webinar-series/program/proposals/live-demo-creating-a-single-point-of-access-to-multiple-postgres-servers-using-starburst-presto&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://postgresconf.org/conferences/postgres-webinar-series/program/proposals/live-demo-unlock-data-in-postgres-servers-to-query-it-with-other-data-sources-like-hive-kafka-other-dbmss-and-more&quot;&gt;https://postgresconf.org/conferences/postgres-webinar-series/program/proposals/live-demo-unlock-data-in-postgres-servers-to-query-it-with-other-data-sources-like-hive-kafka-other-dbmss-and-more&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://medium.com/@joshua_robinson/presto-and-fast-object-putting-backups-to-use-for-devops-and-machine-learning-s3-46876eef4ffa&quot;&gt;https://medium.com/@joshua_robinson/presto-and-fast-object-putting-backups-to-use-for-devops-and-machine-learning-s3-46876eef4ffa&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Upcoming events&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/BigDataATL/events/273435961/&quot;&gt;https://www.meetup.com/BigDataATL/events/273435961/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://edw2020chicago.dataversity.net/&quot;&gt;https://edw2020chicago.dataversity.net/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-portland-or-2/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-portland-or-2/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-minneapolis-mn-2/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-minneapolis-mn-2/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-jacksonville-fl/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-jacksonville-fl/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.evanta.com/cdo/san-francisco/2020-san-francisco-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/san-francisco/2020-san-francisco-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-detroit-mi/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-detroit-mi/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.evanta.com/cdo/atlanta/2020-atlanta-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/atlanta/2020-atlanta-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-providence-ri/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-providence-ri/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-denver-co/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-denver-co/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin:
&lt;a href=&quot;https://trino.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://trino.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://trino.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/08/13/training-security.html&quot;&gt;https://trino.io/blog/2020/08/13/training-security.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/08/27/training-performance.html&quot;&gt;https://trino.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Presto Summit Series - Real world usage
&lt;a href=&quot;https://trino.io/blog/2020/05/15/state-of-presto.html&quot;&gt;https://trino.io/blog/2020/05/15/state-of-presto.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&quot;&gt;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&quot;&gt;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&quot;&gt;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Recent Podcasts:
&lt;a href=&quot;https://www.contributor.fyi/presto&quot;&gt;https://www.contributor.fyi/presto&lt;/a&gt;
&lt;a href=&quot;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&quot;&gt;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you want to learn more about Presto yourself, you should check out the 
O’Reilly book Trino: The Definitive Guide. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>This week we had a bit of a technical issue between Zoom and OBS, so some editing was done to remove a portion of the broadcast, which mainly cuts out us covering the releases. We circle back and give a small summary, but unfortunately we lost the majority of that part of the conversation.</summary>

      
      
    </entry>
  
    <entry>
      <title>Launching Presto First Steps training</title>
      <link href="https://trino.io/blog/2020/10/07/presto-first-steps.html" rel="alternate" type="text/html" title="Launching Presto First Steps training" />
      <published>2020-10-07T00:00:00+00:00</published>
      <updated>2020-10-07T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/10/07/presto-first-steps</id>
      <content type="html" xml:base="https://trino.io/blog/2020/10/07/presto-first-steps.html">&lt;p&gt;Writing the book &lt;a href=&quot;/trino-the-definitive-guide.html&quot;&gt;Trino: The Definitive
Guide&lt;/a&gt; with Matt and Martin earlier this
year, and then publishing it with &lt;a href=&quot;https://www.oreilly.com/&quot;&gt;O’Reilly&lt;/a&gt; was a
great experience and has been a great success. Lots of readers took advantage of
getting a &lt;a href=&quot;/blog/2020/04/11/the-definitive-guide.html&quot;&gt;free digital copy of the book from Starburst&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Now it is time to follow up with a training class. I am pleased to let you know
that you can join me for three hours of
&lt;a href=&quot;https://learning.oreilly.com/live-training/courses/presto-first-steps/0636920462859/&quot;&gt;Presto First Steps&lt;/a&gt;
in November.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;The new course is aimed at beginners with Presto who want to accelerate their
initial understanding and adoption. You quickly ramp up to install and configure
Presto, use the CLI, and learn how to query connected data sources with SQL. The
class is completely interactive, and I look forward to many of you joining me
and bringing lots of great questions.&lt;/p&gt;

&lt;p&gt;The class includes three interactive training exercises on
&lt;a href=&quot;https://katacoda.com/&quot;&gt;Katacoda&lt;/a&gt;. They allow you to get hands-on experience
with Presto immediately. Lots of useful tips and tricks are covered in my
material, and of course I plan to run a bunch of additional demos. You can find
more details about the content of the class on &lt;a href=&quot;https://learning.oreilly.com/live-training/courses/presto-first-steps/0636920462859/&quot;&gt;the registration
page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Don’t miss out and make sure you &lt;a href=&quot;https://learning.oreilly.com/live-training/courses/presto-first-steps/0636920462859/&quot;&gt;reserve your ticket
now&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>Writing the book Trino: The Definitive Guide with Matt and Martin earlier this year, and then publishing it with O’Reilly was a great experience and has been a great success. Lots of readers took advantage of getting a free digital copy of the book from Starburst. Now it is time to follow up with a training class. I am pleased to let you know that you can join me for three hours of Presto First Steps in November.</summary>

      
      
    </entry>
  
    <entry>
      <title>Hello I&apos;m Brian, Presto Developer Advocate</title>
      <link href="https://trino.io/blog/2020/10/01/intro-developer-advocate.html" rel="alternate" type="text/html" title="Hello I&apos;m Brian, Presto Developer Advocate" />
      <published>2020-10-01T00:00:00+00:00</published>
      <updated>2020-10-01T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/10/01/intro-developer-advocate</id>
      <content type="html" xml:base="https://trino.io/blog/2020/10/01/intro-developer-advocate.html">&lt;p&gt;Hello, Presto nation!&lt;/p&gt;

&lt;p&gt;My name is Brian, and I’m a new developer advocate working at Starburst. Let me 
give you a little background on how I got here, and cover how my role can help
the Presto community.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/developer-advocate/brian.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;My career in computation and databases started in the military. As luck would
have it, I worked on a big data team as my first job out of college! I was in a
Hive shop that dealt with the typical outdated runtime and slow query
turnaround. Eventually, our architect introduced us to Presto as an alternative.
I worked with him to start testing and moving our existing use cases built on
Hive to use Presto. We also used Elasticsearch and had a few cases that needed
to perform joins and unions over the datasets in both Elasticsearch and Hive.
There were a few use cases that were not going to immediately be transferable
without some modification to the Presto Elasticsearch connector.&lt;/p&gt;

&lt;h2 id=&quot;joining-the-presto-community&quot;&gt;Joining the Presto community&lt;/h2&gt;

&lt;p&gt;The first modification was &lt;a href=&quot;https://github.com/trinodb/trino/issues/2441&quot;&gt;adding support for Elasticsearch array 
types&lt;/a&gt;, and the second was, 
&lt;a href=&quot;https://github.com/trinodb/trino/issues/754&quot;&gt;support for nested types&lt;/a&gt;. My 
first interaction with the Presto community was incredible! As a serial
open-source attempter, I always wanted to get invested in an open-source
project. I had started pull requests in various projects. Sometimes I ran into 
unpleasant maintainers; in other cases the rules were daunting or too confusing
to start with. I created a pull request only to have it sit there with no
communication as to why it wasn’t accepted or even looked at. However, when I
first joined &lt;a href=&quot;/slack.html&quot;&gt;Slack&lt;/a&gt;, I searched to see if there was already a
discussion about array types in the history. I ran into &lt;a href=&quot;https://trinodb.slack.com/archives/CP1MUNEUX/p1570064139005900&quot;&gt;a discussion between 
Dain and Martin about this 
issue&lt;/a&gt;. I
conversed with Martin, who was incredibly polite and willing to take time to 
discuss how this should be implemented.&lt;/p&gt;

&lt;h2 id=&quot;contributing&quot;&gt;Contributing&lt;/h2&gt;

&lt;p&gt;When I actually pulled the code, I saw how well written and maintained it was
compared to many open-source projects I had seen in the past. I made a few
changes, wrote a test around my use case, and signed a CLA agreement. After a
couple of weeks, my pull request was merged and I had finally contributed to an
open-source project. After that interaction, and seeing the code, I wanted to do
more. I really saw something special with this community.&lt;/p&gt;

&lt;p&gt;While many Presto contributors are doing amazing work contributing code, I
noticed there were some holes in other areas of the community that needed to be
filled. I started answering questions on Slack, LinkedIn, and Twitter, and I
planned out a Udemy course for Presto. The &lt;a href=&quot;https://youtu.be/RPaG0Gu2I6c&quot;&gt;initial 
video&lt;/a&gt; I piloted is about tuning the memory
configuration of Presto.&lt;/p&gt;

&lt;h2 id=&quot;becoming-a-developer-advocate&quot;&gt;Becoming a developer advocate&lt;/h2&gt;

&lt;p&gt;Around this time I got into contact with some folks at Starburst about joining 
them to work with the community and Presto full-time! As I joined, we hadn’t
figured out what my exact role was at Starburst. Eventually, we decided I would
best serve as a developer advocate. What I’ve come to find is that this role
aims to do exactly what I set out to do before I joined. As a developer
advocate, I serve the community and act as a liaison between Starburst and the
Presto community. Up until this time, that responsibility has been unofficially
shared by many of the maintainers of Presto. I am here to simply take some of
that responsibility from them and focus all of my efforts on community growth
and health.&lt;/p&gt;

&lt;p&gt;The health of a community is difficult to define and is generally
subject to various signals that we can observe. These signals include an
increase in helpful interactions within the community, new members joining the
community, members who are actively engaging in the community, diversity of the
community, and more. If we start by focusing on making the community successful,
the success of the project will follow. We keep in mind the goal that co-creator
David Phillips mentions:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;This is the type of project that we look at Postgres as the inspiration. 
Postgres started in the eighties, it became a SQL system in the nineties, and
it’s still in active use and active development today. We say we want Presto
to have the same kind of history. - David Phillips&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2 id=&quot;next-steps&quot;&gt;Next Steps&lt;/h2&gt;

&lt;p&gt;My first goal is to create a larger set of free learning materials that expand
upon my initial goals when planning for my Udemy course. I recently started a
show with Manfred Moser called the Presto Community Broadcast. The show landing 
page is &lt;a href=&quot;/broadcast.html&quot;&gt;here&lt;/a&gt; and contains all the information about the show
schedule and where to find new and old episodes. This helps as we can use any
relevant material we create on this show for future teaching or blogs. We want
these live sessions to be interactive, and look forward to your feedback to
understand if our efforts are actually helping, or if you have ideas to improve
the show. This show, along with blogs, documentation, and interactive tutorials,
is how I initially intend to answer some common questions that are received
through our &lt;a href=&quot;/slack.html&quot;&gt;Slack&lt;/a&gt; and &lt;a href=&quot;https://stackoverflow.com/questions/tagged/presto&quot;&gt;Stack 
Overflow&lt;/a&gt; channels. Another
goal of adding these materials is to attract new members to the community. Not
all the material may be super relevant to the existing members of the community,
but this makes the community much more viable for newer members.&lt;/p&gt;

&lt;p&gt;Outside of providing new learning materials, your feedback helps us to
understand common problems and allows us to fix them. This feedback will aid us
in focusing on issues that are commonly voiced within the community but somehow
get lost in translation. This could mean improving the Presto code itself,
making the documentation better, or addressing common confusion, even if the
confusion comes from a force outside of the Presto community.&lt;/p&gt;

&lt;p&gt;For example, I recently &lt;a href=&quot;https://bitsondata.dev/what-is-benchmarketing-and-why-is-it-bad/&quot;&gt;wrote a 
blog&lt;/a&gt; about
some shady benchmarketing practices that were painting Presto in a bad light. 
The goal here was to make fun of the wildly bogus claims brought against Presto 
and the community. What better way to do that than to write a nerdy Justin
Bieber parody?&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/FSy8V-R0_Zw&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;While I have hopefully convinced you all of my mission here, I can’t accomplish
any of this in a vacuum. The whole point of my work starts and ends with all of
you. I look forward to speaking with you all, and one day, post COVID-19, meeting
you at meetups and conferences. For now, virtual meetups and the Presto Community
Broadcast are a great start. If you have ideas or want to reach out to introduce
yourself, you can find me on 
&lt;a href=&quot;/slack.html&quot;&gt;Slack&lt;/a&gt; or &lt;a href=&quot;https://twitter.com/bitsondatadev&quot;&gt;Twitter&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Thanks for reading this and being a part of this community. One last thing to
tell you about myself, I’m a sucker for cheesy sign-offs so…&lt;/p&gt;

&lt;p&gt;&lt;em&gt;For fast data at resto, Presto is the besto!&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>Hello, Presto nation! My name is Brian, and I’m a new developer advocate working at Starburst. Let me give you a little background on how I got here, and cover how my role can help the Presto community.</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto at Argentina Big Data Meetup 2020-09-23</title>
      <link href="https://trino.io/blog/2020/09/28/argentina-big-data-meetup.html" rel="alternate" type="text/html" title="Presto at Argentina Big Data Meetup 2020-09-23" />
      <published>2020-09-28T00:00:00+00:00</published>
      <updated>2020-09-28T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/09/28/argentina-big-data-meetup</id>
      <content type="html" xml:base="https://trino.io/blog/2020/09/28/argentina-big-data-meetup.html">&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/IkjNcW7cS2w&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;Martin made a guest appearance at the 
&lt;a href=&quot;https://www.meetup.com/Argentina-Big-Data-Meetup/&quot;&gt;Argentina Big Data Meetup&lt;/a&gt;
(online). In the first hour, Martin talks about Presto’s past, present, and
future. This includes the history from Facebook to Starburst, some context for
early architectural decisions, as well as why Presto was open-sourced.
Finally, Martin covers recent changes along with some upcoming changes on the
roadmap.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/blog/argentina-big-data-meetup/Presto%20-%20Big%20Data%20Meetup%20Argentina%202020-09-23.pdf&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The next hour is an interesting talk given by Federico Palladoro covering his
company, Jampp’s, migration strategy from EMR Presto to Docker using Nomad vs
Kubernetes.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/blog/argentina-big-data-meetup/Big%20Data%20Meetup_%20Presto%20on%20Docker.pdf&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These presentations are in Spanish.&lt;/p&gt;

&lt;!--more--&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>Martin made a guest appearance at the Argentina Big Data Meetup (online). In the first hour, Martin talks about Presto’s past, present, and future. This includes the history from Facebook to Starburst, some context for early architectural decisions, as well as why Presto was open-sourced. Finally, Martin covers recent changes along with some upcoming changes on the roadmap. Slides The next hour is an interesting talk given by Federico Palladoro covering his company, Jampp’s, migration strategy from EMR Presto to Docker using Nomad vs Kubernetes. Slides These presentations are in Spanish.</summary>

      
      
    </entry>
  
    <entry>
      <title>1: What is Presto, WITH RECURSIVE, and Hive connector</title>
      <link href="https://trino.io/episodes/1.html" rel="alternate" type="text/html" title="1: What is Presto, WITH RECURSIVE, and Hive connector" />
      <published>2020-09-24T00:00:00+00:00</published>
      <updated>2020-09-24T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/1</id>
      <content type="html" xml:base="https://trino.io/episodes/1.html">&lt;p&gt;Today’s concept covers a big overview of what Presto is for those that are new
to Presto. For mor information about Presto, check out the following resources:
&lt;a href=&quot;/&quot;&gt;Website&lt;/a&gt;
&lt;a href=&quot;https://trino.io/docs/current/&quot;&gt;Documentation&lt;/a&gt;
Download the &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;Free Presto O’Reilly Book&lt;/a&gt;
Learn &lt;a href=&quot;/development/&quot;&gt;how to contribute&lt;/a&gt;
Join our community on the &lt;a href=&quot;/slack.html&quot;&gt;Slack channel&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this episode we covered &lt;a href=&quot;https://github.com/trinodb/trino/pull/5163&quot;&gt;pull request 5163&lt;/a&gt;,
which is actually just a documentation update for the existing experimental
WITH RECURSIVE feature. The extended development of
this feature is still being tracked and documented in 
&lt;a href=&quot;https://github.com/trinodb/trino/issues/1122&quot;&gt;issue 1122&lt;/a&gt;. As with many 
problems in recursion, the solution space typically increases exponentially, so
the feature can easily be misused and cause problems. We run the 
query and discuss it, as well as some of the things that can go wrong. Check out
the pull request to see more of the documentation that was added around it.&lt;/p&gt;

&lt;p&gt;In the question of the week, we covered a lot of the confusion around the
&lt;a href=&quot;https://trino.io/docs/current/connector/hive.html&quot;&gt;Hive connector&lt;/a&gt;. Feel free to 
try out the Katacoda example I created, which will be nested within an 
&lt;a href=&quot;blog/2020/10/20/intro-to-hive-connector.html&quot;&gt;intro to the Hive connector blog&lt;/a&gt;.
This is running on a non-paid Katacoda account, so resources are scarce at times
and it may take a while to load. Nevertheless, the information written around it
will help you quickly have a Presto environment to play with.&lt;/p&gt;

&lt;p&gt;Release Notes discussed:
&lt;a href=&quot;https://trino.io/docs/current/release/release-341.html&quot;&gt;https://trino.io/docs/current/release/release-341.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Upcoming events&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/BigDataATL/events/273435961/&quot;&gt;https://www.meetup.com/BigDataATL/events/273435961/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://edw2020chicago.dataversity.net/&quot;&gt;https://edw2020chicago.dataversity.net/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-portland-or-2/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-portland-or-2/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-minneapolis-mn-2/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-minneapolis-mn-2/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-jacksonville-fl/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-jacksonville-fl/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.evanta.com/cdo/san-francisco/2020-san-francisco-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/san-francisco/2020-san-francisco-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-detroit-mi/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-detroit-mi/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.evanta.com/cdo/atlanta/2020-atlanta-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/atlanta/2020-atlanta-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-providence-ri/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-providence-ri/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-denver-co/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-denver-co/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://trino.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://trino.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/13/training-security.html&quot;&gt;https://trino.io/blog/2020/08/13/training-security.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/27/training-performance.html&quot;&gt;https://trino.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Presto Summit Series - Real world usage:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/05/15/state-of-presto.html&quot;&gt;https://trino.io/blog/2020/05/15/state-of-presto.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&quot;&gt;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&quot;&gt;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&quot;&gt;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Recent podcasts:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.contributor.fyi/presto&quot;&gt;https://www.contributor.fyi/presto&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&quot;&gt;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Presto yourself, you should check out the 
O’Reilly book Trino: The Definitive Guide. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Today’s concept covers a big overview of what Presto is for those who are new to Presto. For more information about Presto, check out the following resources: Website Documentation Download the Free Presto O’Reilly Book Learn how to contribute Join our community on the Slack channel</summary>

      
      
    </entry>
  
    <entry>
      <title>Read support for original files of Hive transactional tables in Presto</title>
      <link href="https://trino.io/blog/2020/09/23/hive-acid-original-files.html" rel="alternate" type="text/html" title="Read support for original files of Hive transactional tables in Presto" />
      <published>2020-09-23T00:00:00+00:00</published>
      <updated>2020-09-23T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/09/23/hive-acid-original-files</id>
      <content type="html" xml:base="https://trino.io/blog/2020/09/23/hive-acid-original-files.html">&lt;p&gt;In &lt;a href=&quot;https://trino.io/docs/current/release/release-331.html&quot;&gt;Presto 331&lt;/a&gt;,
read support for Hive transactional tables was introduced. It works well if a
user creates a new Hive transactional table and reads it from Presto. However,
if an existing table is converted to a Hive transactional table, Presto would
fail to read data from such a table because read support for original files was
missing. Original files are those files in a Hive transactional table that
existed before the table was converted into a Hive transactional table.
Until version 340, Presto expected all files in a Hive transactional table to be
in Hive ACID format. Users would have to perform a major compaction to convert
original files into ACID files (i.e. base files) in such tables. This is not
always possible as the original flat table (table in non-ACID format) could be
huge and converting all the existing data into ACID format can be very
expensive.&lt;/p&gt;

&lt;p&gt;This blog is an extension of the blog &lt;a href=&quot;/blog/2020/06/01/hive-acid.html&quot;&gt;Hive ACID and transactional tables’
support in Presto&lt;/a&gt;. It first describes
original files and then goes into details of read support for such files that
was added in Presto 340.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;what-are-the-original-files&quot;&gt;What are the original files?&lt;/h1&gt;

&lt;p&gt;Files present in non-transactional ORC tables have the standard ORC schema. When
a flat table is converted into a transactional table, existing files are not
converted into Hive ACID format. Files in a transactional table that are
not in Hive ACID format are called original files. These files are named
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;000000_X&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;000000_X_copy_Y&lt;/code&gt;. They don’t have ACID columns, and their
schema differs as follows:&lt;/p&gt;

&lt;p&gt;Table Schema&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;n_nationkey : int,
n_name : string,
n_regionkey : int,
n_comment : string
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Original File Schema&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;struct {
    n_nationkey : int,
    n_name : string,
    n_regionkey : int,
    n_comment : string
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Delta File Schema&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;struct {
    operation : int,
    originalTransaction : bigint,
    bucket : int,
    rowId : bigint,
    currentTransaction : bigint,
    row : struct {
        n_nationkey : int,
        n_name : string,
        n_regionkey : int,
        n_comment : string
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Before Presto 340, Presto used to fail the query if it reads from a Hive
transactional table having original files.&lt;/p&gt;

&lt;h1 id=&quot;update-and-delete-support-on-original-files&quot;&gt;Update and delete support on original files&lt;/h1&gt;

&lt;p&gt;Hive achieves updates/deletes on a row in original files by synthetically
generating ACID columns for those files. Presto follows the same mechanism
of synthetically generating ACID columns, as discussed below.&lt;/p&gt;

&lt;h2 id=&quot;acid-column-generation-on-original-files&quot;&gt;ACID column generation on original files&lt;/h2&gt;

&lt;p&gt;Files in Hive ACID format have 5 ACID columns, but we need only 3 of them, i.e.
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;originalTransactionId&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bucketId&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rowId&lt;/code&gt;, to uniquely identify a row. In
this section, we will see how these 3 columns are synthetically generated for
original files.&lt;/p&gt;

&lt;h3 id=&quot;original-transaction-id&quot;&gt;Original transaction ID&lt;/h3&gt;

&lt;p&gt;An original transaction ID is the write ID when a record is first created. For
original files, the original transaction ID is always 0.&lt;/p&gt;

&lt;h3 id=&quot;bucket-id&quot;&gt;Bucket ID&lt;/h3&gt;

&lt;p&gt;Bucket ID is retrieved from the original file name. For the original file
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0000ABC_DEF&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0000ABC_DEF_copy_G&lt;/code&gt;, the bucket ID will be &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ABC&lt;/code&gt;.&lt;/p&gt;
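&lt;p&gt;As an illustration, the bucket ID extraction can be sketched like this. This is a hypothetical helper for exposition, not the actual Presto code:&lt;/p&gt;

```python
import re

def bucket_id(file_name):
    """Extract the synthetic bucket ID from an original file name.

    Original file names look like "000000_0" or "000000_0_copy_1";
    the leading digits before the first underscore encode the bucket ID.
    """
    match = re.match(r"(\d+)_\d+(?:_copy_\d+)?$", file_name)
    if match is None:
        raise ValueError(f"not an original file name: {file_name}")
    return int(match.group(1))

# bucket_id("000000_0") -> 0
# bucket_id("000123_0_copy_2") -> 123
```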

&lt;h3 id=&quot;row-id&quot;&gt;Row ID&lt;/h3&gt;

&lt;p&gt;To calculate the row ID, Presto first computes the total row count of all the
original files that come before the current one in lexicographical order.
The global row ID is then the sum of that value and the local row ID within
the current original file.&lt;/p&gt;

&lt;p&gt;Here is an example calculating the global row ID of the 3rd row of an original
file &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;000000_0_copy_2&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;000000_0            -&amp;gt; 	X1 Rows (returned by ORC footer field numberOfRows)

000000_0_copy_1     -&amp;gt; 	X2 Rows (returned by ORC footer field numberOfRows)

000000_0_copy_2     -&amp;gt;	[ Row 0 ]
                        [ Row 1 ]
                        [ Row 2 ]   &amp;lt;- Local Row ID (returned by filePosition in OrcRecordReader) = 2
                                       Global Row ID = (X1+X2+2)
                        [ Row 3 ]

000000_0_copy_3     -&amp;gt;  X4 Rows
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
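&lt;p&gt;The same calculation can be sketched in a few lines. The sketch assumes the per-file row counts have already been read from the ORC footers; the function and variable names are illustrative, not Presto’s:&lt;/p&gt;

```python
def global_row_id(row_counts, current_file, local_row_id):
    """Compute the synthetic global row ID for a row in an original file.

    row_counts maps each original file name in the same bucket to its row
    count (the ORC footer field numberOfRows).
    """
    # Sum the row counts of all original files that come before the
    # current one in lexicographical order ...
    preceding = sum(count for name, count in row_counts.items()
                    if name < current_file)
    # ... then add the local row ID within the current file.
    return preceding + local_row_id

counts = {"000000_0": 4, "000000_0_copy_1": 3, "000000_0_copy_2": 5}
# 3rd row (local row ID 2) of 000000_0_copy_2: 4 + 3 + 2 = 9
```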

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Additional computations are required to generate row IDs
while reading original files; therefore, reading them is slower than reading
ACID-format files in a transactional table.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Once Presto has the 3 ACID columns for a row, it can check for an update/delete
on it. Delete deltas, written by Hive for original files, have row IDs generated
by the same strategy as discussed above. Hence, the same logic of filtering out
deleted rows as discussed in &lt;a href=&quot;/blog/2020/06/01/hive-acid.html&quot;&gt;Hive ACID and transactional tables’ support in Presto
&lt;/a&gt; works with the original files too.&lt;/p&gt;
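&lt;p&gt;Conceptually, filtering out deleted rows boils down to a key lookup on the three synthetic ACID columns. A minimal sketch, with rows represented as plain tuples rather than Presto’s internal pages:&lt;/p&gt;

```python
def filter_deleted(rows, delete_deltas):
    """Drop rows whose ACID key appears in the delete deltas.

    rows: iterable of (originalTransaction, bucketId, rowId, data) tuples.
    delete_deltas: iterable of (originalTransaction, bucketId, rowId) keys
    read from the delete_delta files.
    """
    deleted = set(delete_deltas)
    return [row for row in rows if row[:3] not in deleted]

rows = [(0, 0, 0, "ALGERIA"), (0, 0, 1, "ARGENTINA"), (0, 0, 2, "BRAZIL")]
deletes = [(0, 0, 1)]
# filter_deleted(rows, deletes) keeps the ALGERIA and BRAZIL rows
```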

&lt;h1 id=&quot;changes-in-presto-to-support-reading-original-files&quot;&gt;Changes in Presto to support reading original files&lt;/h1&gt;

&lt;p&gt;Presto’s split generation logic and ORC reader were modified to add read
support for original files. The following changes were made at the coordinator
and worker level:&lt;/p&gt;

&lt;h2 id=&quot;split-generation&quot;&gt;Split generation&lt;/h2&gt;

&lt;p&gt;We use a new class named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AcidInfo&lt;/code&gt; to store the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OriginalFiles&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DeleteDeltaFiles&lt;/code&gt; for a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;HiveSplit&lt;/code&gt;.
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BackgroundSplitLoader.loadPartitions&lt;/code&gt; is called in an executor to create
splits for each partition. In addition to the steps mentioned in the blog
&lt;a href=&quot;/blog/2020/06/01/hive-acid.html&quot;&gt;Hive ACID and transactional tables’ support in
Presto&lt;/a&gt;, Presto does the following:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Original files and the ACID subdirectories (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;base&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delta&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt;) are
discovered by listing the partition location with the Hive &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AcidUtils&lt;/code&gt; helper class.&lt;/li&gt;
  &lt;li&gt;A registry for delete deltas, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DeleteDeltaInfo&lt;/code&gt;, is created with minimal
information from which the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt; path can be constructed by the workers.&lt;/li&gt;
  &lt;li&gt;A registry for original files, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OriginalFileInfo&lt;/code&gt;, is created with
information such as the file name, size, and bucket ID.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AcidInfo.Builder&lt;/code&gt; keeps a map
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AcidInfo.Builder.bucketIdToOriginalFileInfoMap&lt;/code&gt; of bucket ID to the list of
original files belonging to the same bucket.&lt;/li&gt;
  &lt;li&gt;Hive splits are created for each original file and for the base and delta
directories. Each Hive split carries an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AcidInfo&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;For an original file split, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AcidInfo&lt;/code&gt; has:&lt;/p&gt;

    &lt;ol&gt;
      &lt;li&gt;&lt;strong&gt;Bucket ID:&lt;/strong&gt; Bucket ID of the original file.&lt;/li&gt;
      &lt;li&gt;&lt;strong&gt;OriginalFilesList:&lt;/strong&gt; List of all the original files belonging to
 the same bucket, calculated from
 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AcidInfo.Builder.bucketIdToOriginalFileInfoMap&lt;/code&gt;.&lt;/li&gt;
      &lt;li&gt;&lt;strong&gt;DeleteDeltaFilesList:&lt;/strong&gt; List of delete deltas.&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;For a base/delta file split, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AcidInfo&lt;/code&gt; has:&lt;/p&gt;

    &lt;ol&gt;
      &lt;li&gt;&lt;strong&gt;DeleteDeltaFilesList:&lt;/strong&gt; List of delete deltas.&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
&lt;/ol&gt;
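&lt;p&gt;The bucket-to-files map from step 4 can be pictured with a small sketch. The grouping mirrors the description above, but the code itself is illustrative, not Presto’s:&lt;/p&gt;

```python
from collections import defaultdict

def group_original_files_by_bucket(original_files):
    """Group original files by the bucket ID parsed from their names,
    mimicking AcidInfo.Builder.bucketIdToOriginalFileInfoMap.

    original_files: iterable of (file_name, file_size) pairs.
    """
    by_bucket = defaultdict(list)
    for name, size in original_files:
        bucket = int(name.split("_")[0])  # leading digits encode the bucket
        by_bucket[bucket].append((name, size))
    return dict(by_bucket)

files = [("000000_0", 1024), ("000000_0_copy_1", 512), ("000001_0", 2048)]
# group_original_files_by_bucket(files) groups the first two files under
# bucket 0 and the third under bucket 1
```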

&lt;h2 id=&quot;reading-hive-original-files-data-in-workers&quot;&gt;Reading Hive original files data in workers&lt;/h2&gt;

&lt;p&gt;Hive splits generated during the split generation phase make their way to worker
nodes where &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcPageSourceFactory&lt;/code&gt; is used to create &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PageSource&lt;/code&gt; for
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TableScan&lt;/code&gt; operator. In addition to the steps mentioned in blog &lt;a href=&quot;/blog/2020/06/01/hive-acid.html&quot;&gt;Hive ACID and
transactional tables’ support in Presto&lt;/a&gt;
, Presto does the following:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcDeletedRows&lt;/code&gt; is created for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt; locations, if any.&lt;/li&gt;
  &lt;li&gt;For an original file split, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcPageSourceFactory&lt;/code&gt; fetches &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;originalFilesList&lt;/code&gt;
from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AcidInfo&lt;/code&gt; and calculates &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;originalFileRowId&lt;/code&gt; by calling
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OriginalFilesUtils.getPrecedingRowCount&lt;/code&gt; and sends this information to
 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcPageSource&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcPageSource&lt;/code&gt; returns rows from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcRecordReader&lt;/code&gt; which are not present in
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcDeletedRows&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
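&lt;p&gt;The row numbering described above can be sketched in a few lines. The following is a simplified, hypothetical Python model, not Presto’s actual Java implementation: it assumes each original file exposes its row count, and it computes the starting row ID of a file as the sum of the row counts of the original files that precede it, mirroring the role of OriginalFilesUtils.getPrecedingRowCount. OrcDeletedRows is modeled as a plain set of deleted row IDs.&lt;/p&gt;

```python
# Simplified, hypothetical model of global row IDs for "original files".
# Presto's real implementation reads row counts from the ORC footers of the
# original files; here each file name simply maps to its row count.

def preceding_row_count(original_files, target):
    """Sum of row counts of the original files that sort before target
    (stand-in for OriginalFilesUtils.getPrecedingRowCount)."""
    total = 0
    for name in sorted(original_files):
        if name == target:
            break
        total += original_files[name]
    return total

def visible_rows(rows, start_row_id, deleted_row_ids):
    """Yield (global_row_id, row) pairs, skipping deleted rows
    (mirrors OrcPageSource consulting OrcDeletedRows)."""
    for offset, row in enumerate(rows):
        row_id = start_row_id + offset
        if row_id not in deleted_row_ids:
            yield row_id, row

# Three original files in one bucket, with their row counts.
files = {"000000_0": 3, "000000_0_copy_1": 2, "000000_0_copy_2": 4}
start = preceding_row_count(files, "000000_0_copy_1")
assert start == 3  # the 3 rows of 000000_0 come first

# Rows of 000000_0_copy_1 get global IDs 3 and 4; ID 4 was deleted.
assert list(visible_rows(["r1", "r2"], start, {4})) == [(3, "r1")]
```

&lt;p&gt;Caching the per-file row counts at the query level would avoid recomputing these sums for every split, which is the essence of the optimization discussed in the follow-up section.&lt;/p&gt;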

&lt;h1 id=&quot;follow-up&quot;&gt;Follow up&lt;/h1&gt;

&lt;p&gt;For an original file split, the current implementation may take quadratic time
in the worst case to calculate the global row ID, because it reads row counts from
the original files’ footers. It could be optimized by keeping a query-level cache
on worker nodes, or by precomputing global row IDs in the coordinator during split
computation.&lt;/p&gt;

&lt;h1 id=&quot;acknowledgements&quot;&gt;Acknowledgements&lt;/h1&gt;

&lt;p&gt;I would like to express my gratitude to everyone who helped me throughout
the development of this feature. Thank you
&lt;a href=&quot;https://in.linkedin.com/in/shubham-tagra-267a5838&quot;&gt;Shubham Singh&lt;/a&gt; for
the brainstorming sessions and for providing continuous guidance on Presto Hive ACID.
Thank you &lt;a href=&quot;https://www.linkedin.com/in/piotrfindeisen/&quot;&gt;Piotr Findeisen&lt;/a&gt; for
helping me further refine the code with insightful code reviews.&lt;/p&gt;</content>

      
        <author>
          <name>Harmandeep Singh, Qubole</name>
        </author>
      

      <summary>In Presto 331, read support for Hive transactional tables was introduced. It works well if a user creates a new Hive transactional table and reads it from Presto. However, if an existing table is converted to a Hive transactional table, Presto would fail to read data from such a table because read support for original files was missing. Original files are those files in a Hive transactional table that existed before the table was converted into a Hive transactional table. Until version 340, Presto expected all files in a Hive transactional table to be in Hive ACID format. Users would have to perform a major compaction to convert original files into ACID files (i.e. base files) in such tables. This is not always possible as the original flat table (table in non-ACID format) could be huge and converting all the existing data into ACID format can be very expensive. This blog is an extension of the blog Hive ACID and transactional tables’ support in Presto. It first describes original files and then goes into the details of read support for such files that was added in Presto 340.</summary>

      
      
    </entry>
  
    <entry>
      <title>Configuring and Tuning Presto Performance with Dain</title>
      <link href="https://trino.io/blog/2020/08/27/training-performance.html" rel="alternate" type="text/html" title="Configuring and Tuning Presto Performance with Dain" />
      <published>2020-08-27T00:00:00+00:00</published>
      <updated>2020-08-27T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/08/27/training-performance</id>
      <content type="html" xml:base="https://trino.io/blog/2020/08/27/training-performance.html">&lt;p&gt;With the help of &lt;a href=&quot;/blog/2020/07/15/training-advanced-sql.html&quot;&gt;David’s training about advanced SQL&lt;/a&gt;, you composed a number of useful queries.
You gained valuable insights from the resulting data. However, these complex
queries take time to run. If only you could make them run faster. I think we
have just what you need:&lt;/p&gt;

&lt;p&gt;Join us for a free webinar &lt;strong&gt;Understanding and Tuning Presto Query Processing&lt;/strong&gt;
with Dain Sundstrom.&lt;/p&gt;

&lt;p&gt;Update:&lt;/p&gt;

&lt;p&gt;We did it again! Joined by over 120 eager students, we discussed all sorts of
aspects of sizing and tuning your Presto cluster. Yet again we received so many
questions that we went over our planned time budget. The material covered is
crucial to run a Presto deployment successfully in production, so make sure you
check out the recording and the slide deck:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.starburst.io/wp-content/uploads/2020/09/Presto-Training-Series-Configuring-Tuning-Presto-Performance.pdf&quot;&gt;Download the slides&lt;/a&gt;&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/Pu80FkBRP-k&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;!--more--&gt;

&lt;p&gt;In our new &lt;a href=&quot;https://bit.ly/2NO26Cm&quot;&gt;Presto Training Series&lt;/a&gt; we give Presto users
an opportunity to learn advanced skills from the co-creators of Presto –
&lt;a href=&quot;https://github.com/electrum&quot;&gt;David Phillips&lt;/a&gt;, 
&lt;a href=&quot;https://github.com/martint&quot;&gt;Martin Traverso&lt;/a&gt; and 
&lt;a href=&quot;https://github.com/dain&quot;&gt;Dain Sundstrom&lt;/a&gt;. Beyond the basics, each of the four 
training sessions covers critical topics for scaling Presto to more users and
use cases.&lt;/p&gt;

&lt;p&gt;This training session is geared towards helping users tune and size their Presto
deployment for optimal performance. Delivered by Dain Sundstrom, this session
covers the following topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Cluster configuration and node sizing&lt;/li&gt;
  &lt;li&gt;Memory configuration and management&lt;/li&gt;
  &lt;li&gt;Improving task concurrency and worker scheduling&lt;/li&gt;
  &lt;li&gt;Tuning your JVM configuration&lt;/li&gt;
  &lt;li&gt;Investigating queries for join order and other criteria&lt;/li&gt;
  &lt;li&gt;Tuning the cost-based optimizer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Date: Wednesday, 9 September 2020&lt;/p&gt;

&lt;p&gt;Time: 10am PDT (San Francisco), 1pm EDT (New York), 6pm BST (London), 5pm UTC&lt;/p&gt;

&lt;p&gt;Duration: 2h&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h2 id=&quot;register-now&quot;&gt;&lt;a href=&quot;https://bit.ly/38kt5ih&quot;&gt;Register now!&lt;/a&gt;&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;p&gt;We look forward to many Presto users joining us.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>With the help of David’s training about advanced SQL, you composed a number of useful queries. You gained valuable insights from the resulting data. However, these complex queries take time to run. If only you could make them run faster. I think we have just what you need: Join us for a free webinar Understanding and Tuning Presto Query Processing with Dain Sundstrom. Update: We did it again! Joined by over 120 eager students, we discussed all sorts of aspects of sizing and tuning your Presto cluster. Yet again we received so many questions that we went over our planned time budget. The material covered is crucial to run a Presto deployment successfully in production, so make sure you check out the recording and the slide deck: Download the slides</summary>

      
      
    </entry>
  
    <entry>
      <title>Faster Queries on Nested Data</title>
      <link href="https://trino.io/blog/2020/08/14/dereference-pushdown.html" rel="alternate" type="text/html" title="Faster Queries on Nested Data" />
      <published>2020-08-14T00:00:00+00:00</published>
      <updated>2020-08-14T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/08/14/dereference-pushdown</id>
      <content type="html" xml:base="https://trino.io/blog/2020/08/14/dereference-pushdown.html">&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-334.html&quot;&gt;Presto 334&lt;/a&gt;
adds significant performance improvements for queries
accessing nested fields inside struct columns. They have been optimized through
the pushdown of dereference expressions. With this feature, the query execution
prunes structural data eagerly, extracting the necessary fields.&lt;/p&gt;

&lt;h1 id=&quot;motivation&quot;&gt;Motivation&lt;/h1&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RowType&lt;/code&gt; is a built-in data type of Presto, storing the in-memory
representation of commonly used nested data types of the connectors, e.g. the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;STRUCT&lt;/code&gt; type in Hive. Datasets often contain wide and deeply nested structural
columns, i.e. a struct column may have hundreds of fields, with the fields
themselves being nested.&lt;/p&gt;

&lt;p&gt;Although such &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RowType&lt;/code&gt; columns can contain plenty of data, most
analytical queries access just a few fields from them. Without dereference
pushdown, Presto scans the whole column and shuffles all that data around
before projecting the necessary fields. This suboptimal execution causes higher
CPU usage, memory usage, and query latencies than necessary. The
unnecessary operations get even more expensive with wider or deeper structs and
more complex query plans.&lt;/p&gt;

&lt;p&gt;LinkedIn’s data ecosystem makes heavy use of nested columns. It is common to
have 2-3 levels of nesting, and up to 50 fields in most of our tracking tables.
Because of the query execution inefficiency for nested fields, ETL pipelines
were set up at LinkedIn to copy the nested columns as a set of top-level columns
corresponding to subfields. This step added overhead in our ingestion process
and delayed data availability for analytics. It also caused ORC schemas to be
inconsistent with the rest of the infrastructure, making it harder to migrate
from existing flows on row-oriented formats.&lt;/p&gt;

&lt;p&gt;Similarly, Lyft’s schemas make heavy use of nested data to decompose a ride
into its routes, riders, segments, modes, and geo-coordinates. Prior to the
performance improvements, analytical queries would either need to be run on
clusters with very long timeouts, or the data would have to be flattened before
being analyzed, adding an extra ETL step. Not only would this be costly, it
would also cause the original schema to diverge in our data warehouse, making it
more difficult for data scientists to understand.&lt;/p&gt;

&lt;p&gt;The dereference pushdown optimization in Presto is having a massive impact on
the ingestion story at both LinkedIn and Lyft. Nested data is now being made
available faster for consumption with a consistency of structure across all
stores, while maintaining performance parity for analytical queries.&lt;/p&gt;

&lt;h1 id=&quot;example&quot;&gt;Example&lt;/h1&gt;

&lt;p&gt;Say we have a Hive table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jobs&lt;/code&gt;, with a struct-typed column &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;job_info&lt;/code&gt; in the
schema. The column &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;job_info&lt;/code&gt; is wide and deeply nested, i.e. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROW(company
varchar, requirements ROW(skills array(...), education ROW(...), salary ...) ,
...)&lt;/code&gt;. Most queries would access a small percentage of data from this struct
using the dereference projection (the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.&lt;/code&gt; operation). Consider the query &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Q&lt;/code&gt;
below.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;A&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;appid&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;J&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;job_info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;company&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;applications&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;A&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;jobs&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;J&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;A&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;jobid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;J&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;jobid&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It should suffice to scan only the single field &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;company&lt;/code&gt; from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;J.job_info&lt;/code&gt; to
execute this query. But without dereference pushdown, Presto scans and
shuffles everything from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;job_info&lt;/code&gt;, only to project a single field at the end.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/dereference-pushdown/original_plan.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h1 id=&quot;solution-pushdown-of-dereference-expressions&quot;&gt;Solution: Pushdown of Dereference Expressions&lt;/h1&gt;

&lt;p&gt;With dereference pushdown, Presto optimizes queries by extracting only the required
fields from a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROW&lt;/code&gt; as early as possible. This is achieved by modifying the
query plan through a set of optimizers, and can be broadly divided into two
parts.&lt;/p&gt;

&lt;p&gt;First, dereference projections are extracted in the query plan and pushed as
close to the table scan as possible. This happens independently of the
connector. Second, there is a further improvement for Hive tables. The
Hive connector and ORC/Parquet readers have been optimized to scan only the
required subfield columns.&lt;/p&gt;

&lt;p&gt;Pushdown of predicates on the subfields is also a crucial optimization. For
example, if a query has filters on subfields (e.g. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a.b &amp;gt; 5&lt;/code&gt;), they should be
utilized by the ORC/Parquet readers while scanning files. The pushdown helps with
the pruning of files, stripes, and row groups based on column-level statistics.
This optimization is achieved as a byproduct of the two optimizations above.&lt;/p&gt;

&lt;p&gt;With the dereference pushdown, queries observe significant performance gains in
terms of CPU/memory usage and query runtime, roughly proportional to the
relative size of nested columns compared to the accessed fields.&lt;/p&gt;

&lt;h2 id=&quot;pushdown-in-query-plan&quot;&gt;Pushdown in Query Plan&lt;/h2&gt;

&lt;p&gt;The goal here is to execute dereference projections as early as possible. This
usually means performing them right after the table scans.&lt;/p&gt;

&lt;p&gt;A projection operation that performs dereferencing on input symbols (i.e.
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;job_info.company&lt;/code&gt;) reduces the amount of data going up the plan tree. Pushing
dereference projections down means that we are pruning data early. It reduces
the amount of data being processed and shuffled in query execution. For the
example query &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Q&lt;/code&gt;, the query plan looks like the following when dereference
pushdown is enabled.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/dereference-pushdown/transformed_plan.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The projection &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;job_info.company&lt;/code&gt; now directly follows the scan of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jobs&lt;/code&gt; table,
avoiding the propagation of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;job_info&lt;/code&gt; through the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Limit&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Join&lt;/code&gt; nodes. Note
that all of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;job_info&lt;/code&gt; is still being scanned, and pruning it in the reader
requires connector-dependent optimizations.&lt;/p&gt;
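&lt;p&gt;This plan rewrite can be illustrated with a toy model. The following is a hypothetical Python sketch, not Presto’s actual optimizer API: a projection that only dereferences a field is swapped below the Limit node, so that only the needed subfield flows up the plan tree.&lt;/p&gt;

```python
# Toy plan rewrite illustrating dereference pushdown (a sketch of the idea,
# not Presto's real optimizer API). A Project that only dereferences a field
# is pushed below the Limit, so only the subfield flows up the plan tree.

class Node:
    def __init__(self, kind, child=None, **attrs):
        self.kind, self.child, self.attrs = kind, child, attrs

def push_dereference_below_limit(plan):
    """If plan is Project(Limit(x)), rewrite it to Limit(Project(x))."""
    if plan.kind == "Project" and plan.child and plan.child.kind == "Limit":
        limit = plan.child
        plan.child = limit.child     # Project now wraps the table scan
        limit.child = plan           # Limit sits on top of the projection
        return limit
    return plan

scan = Node("TableScan", table="jobs")
plan = Node("Project", Node("Limit", scan, count=100), expr="job_info.company")

optimized = push_dereference_below_limit(plan)
assert optimized.kind == "Limit"
assert optimized.child.kind == "Project"      # projection moved down
assert optimized.child.child is scan          # it now sits on the scan
```

&lt;p&gt;Pushing a pure projection below a Limit (or a Join) is always safe, since it changes only which columns flow upward, not which rows.&lt;/p&gt;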

&lt;h2 id=&quot;pushdown-in-the-hive-connector&quot;&gt;Pushdown in the Hive Connector&lt;/h2&gt;

&lt;p&gt;In columnar formats like ORC and Parquet, the data is laid out in a columnar
fashion even for subfields. If we have a column &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;STRUCT(f1, f2, f3)&lt;/code&gt;, the
subfields &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;f1&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;f2&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;f3&lt;/code&gt; are stored as independent columns. An optimized
query engine should only scan the required fields through its ORC reader,
skipping the rest. This optimization has been added for the Hive connector.&lt;/p&gt;
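&lt;p&gt;The columnar layout of struct subfields can be modeled in a few lines of Python. This is a conceptual sketch, not the ORC or Parquet format itself: each subfield lives in its own column of values, so reading one subfield never touches the data of the others.&lt;/p&gt;

```python
# Conceptual model of columnar struct storage: a STRUCT(f1, f2, f3) column is
# physically stored as three independent columns, one per subfield.

struct_column = {
    "f1": [1, 2, 3],
    "f2": ["a", "b", "c"],
    "f3": [1.5, 2.5, 3.5],
}

def scan_subfields(column, wanted):
    """Read only the requested subfield columns, skipping the rest."""
    return {name: values for name, values in column.items() if name in wanted}

result = scan_subfields(struct_column, {"f1"})
assert result == {"f1": [1, 2, 3]}  # f2 and f3 were never read
```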

&lt;p&gt;Dereference projections above a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TableScanNode&lt;/code&gt; are pushed down in the Hive
connector as “virtual” (or “projected”) columns. The query plan is modified to
refer to these new columns. For the query &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Q&lt;/code&gt;, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jobs&lt;/code&gt; table would be scanned
differently with this optimization, as shown below. The projection is now
embedded in the Hive connector. Here, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;job_info#company&lt;/code&gt; can be thought of as
a virtual column representing the subfield &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;job_info.company&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/dereference-pushdown/connector_pushdown.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The Hive connector handles the projections before returning columns to Presto’s
engine. It provides the required virtual columns to the format-specific readers.
The ORC and Parquet readers optimize their scans based on the required subfields,
increasing their read throughput. Subfield pruning is not possible for
row-oriented format readers (e.g. AVRO). For those, the Hive connector adapts
the output to project the required fields.&lt;/p&gt;

&lt;h2 id=&quot;pushdown-of-predicates-on-subfields&quot;&gt;Pushdown of Predicates on Subfields&lt;/h2&gt;

&lt;p&gt;Columnar formats store per-column statistics in the data files, which can be
used by the readers for filtering. For example, if a query contains the filter &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y = 5&lt;/code&gt; for a
top-level column &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt;, Presto’s ORC reader can skip ORC stripes and files by
looking at the upper and lower bounds for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt; in the statistics.&lt;/p&gt;
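&lt;p&gt;Stripe pruning with min/max statistics can be sketched as follows. This is a simplified, hypothetical Python model of the idea, not Presto’s ORC reader: a stripe is skipped whenever the filter value falls outside that stripe’s recorded minimum and maximum for the column.&lt;/p&gt;

```python
# Simplified stripe pruning with per-stripe min/max statistics (a sketch of
# the idea, not Presto's ORC reader).

def stripes_to_read(stripe_stats, value):
    """Keep the stripes whose min/max range for the column may contain value."""
    return [i for i, (lo, hi) in enumerate(stripe_stats)
            if value >= lo and hi >= value]

# Per-stripe (min, max) statistics for column y, with the filter y = 5.
stats = [(0, 3), (4, 9), (10, 20)]
assert stripes_to_read(stats, 5) == [1]  # only the middle stripe can match
```

&lt;p&gt;The same check applied to the statistics of a virtual subfield column is what makes reader-level pruning on a constraint like x#f1 = 5 possible.&lt;/p&gt;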

&lt;p&gt;The same concept of predicate-based pruning can work for filters involving
subfields, since statistics are also stored for subfield columns, i.e.
Presto’s ORC/Parquet reader should be able to filter based on a constraint like
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x.f1 = 5&lt;/code&gt; for more optimal scans. Good news! In the final optimized plan,
predicates on a subfield are pushed down to the Hive connector as a constraint
on the corresponding virtual column, and later used for optimizing the scan.
The complete logic is too involved to explain here, but it can be illustrated
with the following example.&lt;/p&gt;

&lt;p&gt;Given an initial plan with a predicate on a dereferenced field (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x.f1 = 5&lt;/code&gt;), a
chain of optimizers transforms it into a more optimal plan with reader-level
predicates. In the future, the same optimization will be added to the Parquet
reader.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/dereference-pushdown/predicate_pushdown.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;In the final plan, the Hive connector knows to scan the column &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt; and the subfield
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x.f1&lt;/code&gt;. It also takes advantage of the “virtual” column constraint &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x#f1 = 5&lt;/code&gt;
for reader-level pruning.&lt;/p&gt;

&lt;h2 id=&quot;performance-improvement&quot;&gt;Performance Improvement&lt;/h2&gt;

&lt;p&gt;Dereference pushdown improves performance for queries accessing nested fields
in multiple ways. First, it increases the read throughput for table scans,
reducing CPU time. The pruning of fields during the scan also means less
data to process for all downstream operators and tasks. So the early
projections result in more efficient execution for any operations that involve
a shuffle or copy of data. Moreover, for ORC/Parquet, the read performance
improves in the case of selective filters on subfields.&lt;/p&gt;

&lt;p&gt;Below are some experimental results on a production dataset at LinkedIn which
contains 3 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;STRUCT&lt;/code&gt; columns, having ~20-30 small subfields in each. The
example queries used in the analysis access only a few subfields. The queries
have been listed by their approximate query shape for the sake of brevity. The
plots compare CPU usage, peak memory usage and averaged query wall time.&lt;/p&gt;

&lt;table&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;img src=&quot;/assets/blog/dereference-pushdown/cpu_perf.png&quot; alt=&quot;&quot; /&gt;&lt;/td&gt;
      &lt;td&gt;&lt;img src=&quot;/assets/blog/dereference-pushdown/memory_perf.png&quot; alt=&quot;&quot; /&gt;&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/dereference-pushdown/runtime_perf.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;CPU usage and peak memory usage show orders-of-magnitude improvement in the
presence of dereference pushdown. Query wall times also reduce considerably,
and this improvement is more drastic for the relatively complex &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JOIN&lt;/code&gt; query,
as expected.&lt;/p&gt;

&lt;p&gt;Please note that these are not benchmarks! The performance improvement you’ll
see will vary depending on how many columns are contained in your nested data
versus how many you’ve referenced. At Lyft we saw improvements of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;50x&lt;/code&gt; for some
queries!&lt;/p&gt;

&lt;h2 id=&quot;future-work&quot;&gt;Future Work&lt;/h2&gt;

&lt;p&gt;The pushdown of dereference expressions can be extended to arrays, i.e.
dereference operations applied after unnesting an array should also get pushed
down to the readers. For example, using our jobs table from before, our
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jobs.job_info&lt;/code&gt; structure may contain a repeating structure such as
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;required_skills&lt;/code&gt;. With the following query, the entire &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;required_skills&lt;/code&gt;
structure would be read even though only a small part of it is being referenced.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;S&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;description&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;jobs&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;J&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CROSS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;UNNEST&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;job_info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;required_skills&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;S&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;S&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;years_of_experience&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The work for this improvement is being tracked in &lt;a href=&quot;https://github.com/trinodb/trino/issues/3925&quot;&gt;this issue&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Similar to the Hive connector, connector-level dereference pushdown can be
extended to other connectors supporting nested types.&lt;/p&gt;

&lt;p&gt;Another future improvement will be the pushdown of predicates on subfields for
data stored in Parquet format. Although the pruning of nested fields occurs
with Parquet, the predicates are not yet pushed down into the reader.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Pushing down dereference operations in the query provides massive performance
gains, especially while operating on large structs. At LinkedIn and Lyft, this
feature has shown great impact for analytical queries on nested datasets.&lt;/p&gt;

&lt;p&gt;We’re excited for the Presto community to try it out. Feel free to dig into
&lt;a href=&quot;https://github.com/trinodb/trino/issues/1953&quot;&gt;this github issue&lt;/a&gt; for
technical details. Please reach out to us on &lt;a href=&quot;/slack.html&quot;&gt;Slack&lt;/a&gt; for further
discussions or reporting issues.&lt;/p&gt;</content>

      
        <author>
          <name>Pratham Desai (LinkedIn), James Taylor (Lyft)</name>
        </author>
      

      <summary>Presto 334 adds significant performance improvements for queries accessing nested fields inside struct columns. They have been optimized through the pushdown of dereference expressions. With this feature, the query execution prunes structural data eagerly, extracting the necessary fields.</summary>

      
      
    </entry>
  
    <entry>
      <title>Securing Presto with Dain</title>
      <link href="https://trino.io/blog/2020/08/13/training-security.html" rel="alternate" type="text/html" title="Securing Presto with Dain" />
      <published>2020-08-13T00:00:00+00:00</published>
      <updated>2020-08-13T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/08/13/training-security</id>
      <content type="html" xml:base="https://trino.io/blog/2020/08/13/training-security.html">&lt;p&gt;All the useful and fast-running queries you created with the knowledge from
&lt;a href=&quot;/blog/2020/07/15/training-advanced-sql.html&quot;&gt;David’s training about advanced SQL&lt;/a&gt; and &lt;a href=&quot;/blog/2020/07/30/training-query-tuning.html&quot;&gt;Martin’s training about query
tuning&lt;/a&gt; have created a problem. You
now have lots of users on your Presto cluster who want to access all sorts of
different data sources and have different privileges, and corporate security has
asked about your plans. How about you tap into some help from Dain:&lt;/p&gt;

&lt;p&gt;Join us for a free webinar &lt;strong&gt;Securing Presto&lt;/strong&gt; with Dain Sundstrom.&lt;/p&gt;

&lt;p&gt;Update:&lt;/p&gt;

&lt;p&gt;What a great training session! Dain captured the audience and lots of questions
were covered beyond all the great material from the slides. Everything is now
available for your convenience:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.starburst.io/wp-content/uploads/2020/08/Presto-Training-Securing-Presto.pdf&quot;&gt;Download the slides&lt;/a&gt;&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/KiMyRc3PSh0&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;!--more--&gt;

&lt;p&gt;In our new &lt;a href=&quot;https://bit.ly/2NO26Cm&quot;&gt;Presto Training Series&lt;/a&gt; we give Presto users
an opportunity to learn advanced skills from the co-creators of Presto –
&lt;a href=&quot;https://github.com/electrum&quot;&gt;David Phillips&lt;/a&gt;, 
&lt;a href=&quot;https://github.com/martint&quot;&gt;Martin Traverso&lt;/a&gt; and 
&lt;a href=&quot;https://github.com/dain&quot;&gt;Dain Sundstrom&lt;/a&gt;. Beyond the basics, each of the four 
training sessions covers critical topics for scaling Presto to more users and
use cases.&lt;/p&gt;

&lt;p&gt;In this training session Dain teaches you how to securely deploy Presto at
scale. We cover how to secure Presto itself, access to Presto, and access to
your underlying data. This session covers the following topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Presto authentication, including password &amp;amp; LDAP Authentication&lt;/li&gt;
  &lt;li&gt;Authorization to access your data sources&lt;/li&gt;
  &lt;li&gt;Encryption including Presto client-to-coordinator communication&lt;/li&gt;
  &lt;li&gt;Secure communication in the cluster&lt;/li&gt;
  &lt;li&gt;Support for Kerberos&lt;/li&gt;
  &lt;li&gt;Secrets usage for configuration files including catalogs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Date: Wednesday, 26 August 2020&lt;/p&gt;

&lt;p&gt;Time: 10am PDT (San Francisco), 1pm EDT (New York), 6pm BST (London), 5pm UTC&lt;/p&gt;

&lt;p&gt;Duration: 2h&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h2 id=&quot;register-now&quot;&gt;&lt;a href=&quot;https://bit.ly/3ioQu7c&quot;&gt;Register now!&lt;/a&gt;&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;p&gt;We look forward to many Presto users joining us.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>All the useful and fast-running queries you created with the knowledge from David’s training about advanced SQL and Martin’s training about query tuning have created a problem. You now have lots of users on your Presto cluster who want to access all sorts of different data sources and have different privileges, and corporate security has asked about your plans. How about you tap into some help from Dain: Join us for a free webinar Securing Presto with Dain Sundstrom. Update: What a great training session! Dain captured the audience and lots of questions were covered beyond all the great material from the slides. Everything is now available for your convenience: Download the slides</summary>

      
      
    </entry>
  
    <entry>
      <title>Happy Eighth Birthday Presto!</title>
      <link href="https://trino.io/blog/2020/08/08/presto-eighth-birthday.html" rel="alternate" type="text/html" title="Happy Eighth Birthday Presto!" />
      <published>2020-08-08T00:00:00+00:00</published>
      <updated>2020-08-08T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/08/08/presto-eighth-birthday</id>
      <content type="html" xml:base="https://trino.io/blog/2020/08/08/presto-eighth-birthday.html">&lt;p&gt;Today, Presto turned eight years old! As Presto co-creator
Dain Sundstrom points out, there’s a reason why the eighth birthday is a
little special:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://twitter.com/daindumb/status/1292296395219595264&quot; target=&quot;_blank&quot;&gt;
&lt;img src=&quot;/assets/blog/presto-eighth-birthday/dain-tweet.png&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Even though Presto is a relatively young project, countless consumers,
developers, and business personnel have felt its impact. It’s pretty clear
that a lot has happened with this project since its inception eight years
ago. Recently, the Presto project hit a stunning twenty thousand commits:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://twitter.com/mtraverso/status/1289036458670448641&quot; target=&quot;_blank&quot;&gt;
&lt;img src=&quot;/assets/blog/presto-eighth-birthday/martin-tweet.png&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It makes you ponder how Presto became so successful in such a short amount of
time. Should the credit be given to the four founders who brought Presto to
life? Perhaps the supporting companies that provided the conditions that
called for such innovation? Or was it the community built around Presto since
its inception that has enabled this radical success?&lt;/p&gt;

&lt;p&gt;In my mind, it’s a combination of these conditions but with a special
emphasis on the latter. Without the founders’ dedication to designing Presto
for speed and extensibility and putting emphasis on a welcoming and
inclusive open-source community we wouldn’t have seen Presto outside the
walls of Facebook. Without companies like Facebook, Teradata, Netflix, and
Treasure Data that acted as a catalyst to this change, we wouldn’t have the initial
use cases that tested Presto’s scalable design and shined a light on Presto
to bring the awareness to the masses. Finally, without the passionate community
of developers who took an interest in giving back their time and efforts, 
Presto wouldn’t be anywhere near as robust or flexible as it is today. Now 
Presto has reached an unprecedented level of maturity and helped many
developers, scientists, and analysts find the answers they were looking for. 
It speaks volumes about just how special the project really is.&lt;/p&gt;

&lt;p&gt;This community of developers is really special in that the barrier to entry
for developers new to OSS (open source software) is really low. Speaking from
personal experience as a serial OSS attempter, when I joined I noticed everyone
treating each other with respect, a willingness to teach, and a deliberate
openness to new ideas. I interfaced with engineers working at Starburst, the
founders of Presto, and many passionate developers like myself who knew a thing
or two about the project and were so helpful to me. This was unlike other
experiences I had in the past, where joining an open source community felt like
an elite club that only existing members had access to. To me, this
inclusiveness is why the Presto community is thriving.&lt;/p&gt;

&lt;p&gt;The Presto community is most vibrant in &lt;a href=&quot;/slack.html&quot;&gt;the Slack channel&lt;/a&gt;. Here users and
developers ask questions about installing and using Presto, discuss
bug fixes or design changes, or sometimes just share great experiences or
news related to Presto. This Slack channel has recently grown to 2300 users
with around 500 active users at any given time.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://twitter.com/prestosql/status/1278393800092643328&quot; target=&quot;_blank&quot;&gt;
&lt;img src=&quot;/assets/blog/presto-eighth-birthday/presto-tweet.png&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To celebrate Presto really means to celebrate this community, and while we
can’t thank every individual who has contributed, we want to thank just a
handful of you for your hard work. Thanks to these engineers for their
contributions to the Presto project!&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/ebyhr&quot;&gt;ebyhr&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/kasiafi&quot;&gt;kasiafi&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/Praveen2112&quot;&gt;Praveen2112&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/phd3&quot;&gt;phd3&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/lxynov&quot;&gt;lxynov&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/pettyjamesm&quot;&gt;pettyjamesm&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/Lewuathe&quot;&gt;Lewuathe&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/raunaqmorarka&quot;&gt;raunaqmorarka&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/elonazoulay&quot;&gt;elonazoulay&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/luohao&quot;&gt;luohao&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While linking you to a blog post may not be a satisfactory thank you, the
gratitude is perhaps best &lt;a href=&quot;https://groups.google.com/g/presto-users/c/647v2ckRyGA&quot;&gt;stated on the presto-users&lt;/a&gt; Google group by co-creator Martin Traverso:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;When Dain, David, Eric and I started the project that many
years ago, we had the goal to make it open source and build a community
around it. What we never imagined was how far it would go, how widely it
would be adopted across the entire world, and how many amazing people we
would meet and get a chance to work with along the way.&lt;/p&gt;

  &lt;p&gt;Congratulations to everyone who played a part in that journey. It’s been a
great ride so far. Here’s to another 8 years!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Thanks to everyone who has contributed to Presto, and congratulations to the
founders for starting such an amazing project. Together let’s make Presto the
most useful analytics tool yet!&lt;/p&gt;

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>Today, Presto turned eight years old! As Presto co-creator Dain Sundstrom points out, there’s a reason why the eighth birthday is a little special:</summary>

      
      
    </entry>
  
    <entry>
      <title>Understanding and Tuning Presto Query Processing with Martin</title>
      <link href="https://trino.io/blog/2020/07/30/training-query-tuning.html" rel="alternate" type="text/html" title="Understanding and Tuning Presto Query Processing with Martin" />
      <published>2020-07-30T00:00:00+00:00</published>
      <updated>2020-07-30T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/07/30/training-query-tuning</id>
      <content type="html" xml:base="https://trino.io/blog/2020/07/30/training-query-tuning.html">&lt;p&gt;With the help of &lt;a href=&quot;/blog/2020/07/15/training-advanced-sql.html&quot;&gt;David’s training about advanced SQL&lt;/a&gt; you composed a number of useful queries.
You gained valuable insights from the resulting data. However, these complex
queries take time to run. If only you could make them run faster. I think we
have just what you need coming up.&lt;/p&gt;

&lt;p&gt;Join us for a free webinar &lt;strong&gt;Understanding and Tuning Presto Query Processing&lt;/strong&gt;
with Martin Traverso.&lt;/p&gt;

&lt;p&gt;Update:&lt;/p&gt;

&lt;p&gt;We are delighted that such an advanced topic attracted close to 150 attendees.
Everyone learned a lot and many additional questions came up during class and in
the Q&amp;amp;A overtime. Take advantage of the slides and recording to recap, or if
you could not attend:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.starburst.io/wp-content/uploads/2020/08/Presto-Training-Understanding-and-Tuning-Presto-Query-Processing.pdf&quot;&gt;Download the slides&lt;/a&gt;&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/GcS02yTNwC0&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;!--more--&gt;

&lt;p&gt;In our new &lt;a href=&quot;https://bit.ly/2NO26Cm&quot;&gt;Presto Training Series&lt;/a&gt; we give Presto users
an opportunity to learn advanced skills from the co-creators of Presto –
&lt;a href=&quot;https://github.com/electrum&quot;&gt;David Phillips&lt;/a&gt;, 
&lt;a href=&quot;https://github.com/martint&quot;&gt;Martin Traverso&lt;/a&gt; and 
&lt;a href=&quot;https://github.com/dain&quot;&gt;Dain Sundstrom&lt;/a&gt;. Beyond the basics, each of the four 
training sessions covers critical topics for scaling Presto to more users and
use cases.&lt;/p&gt;

&lt;p&gt;In this training session Martin helps you understand how Presto executes queries.
That knowledge can help you improve query performance. For example, the explain
plan is a powerful tool, but reading the plans and making sense of them can be
overwhelming. We explore how to create an explain plan for your query and how to
read it. We look at the work the cost-based optimizer performs and how you can
potentially help Presto run your queries even faster. This session covers the
following topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Explain the EXPLAIN&lt;/li&gt;
  &lt;li&gt;Learn how queries are analyzed and executed&lt;/li&gt;
  &lt;li&gt;Understand what the optimizer does, including some of its limitations&lt;/li&gt;
  &lt;li&gt;Showcase the cost-based optimizer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Date: Wednesday, 12 August 2020&lt;/p&gt;

&lt;p&gt;Time: 10am PDT (San Francisco), 1pm EDT (New York), 6pm BST (London), 5pm UTC&lt;/p&gt;

&lt;p&gt;Duration: 2h&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h2 id=&quot;register-now&quot;&gt;&lt;a href=&quot;https://bit.ly/2VB9DZP&quot;&gt;Register now!&lt;/a&gt;&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;p&gt;We look forward to many Presto users joining us.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>With the help of David’s training about advanced SQL you composed a number of useful queries. You gained valuable insights from the resulting data. However, these complex queries take time to run. If only you could make them run faster. I think we have just what you need coming up. Join us for a free webinar Understanding and Tuning Presto Query Processing with Martin Traverso. Update: We are delighted that such an advanced topic attracted close to 150 attendees. Everyone learned a lot and many additional questions came up during class and in the Q&amp;amp;A overtime. Take advantage of the slides and recording to recap, or if you could not attend: Download the slides</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto for Analytics at Pinterest</title>
      <link href="https://trino.io/blog/2020/07/22/presto-summit-pinterest.html" rel="alternate" type="text/html" title="Presto for Analytics at Pinterest" />
      <published>2020-07-22T00:00:00+00:00</published>
      <updated>2020-07-22T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/07/22/presto-summit-pinterest</id>
      <content type="html" xml:base="https://trino.io/blog/2020/07/22/presto-summit-pinterest.html">&lt;p&gt;After &lt;a href=&quot;/blog/2020/05/15/state-of-presto.html&quot;&gt;State of Presto&lt;/a&gt; and the two
real world examples from &lt;a href=&quot;/blog/2020/06/16/presto-summit-zuora.html&quot;&gt;Zuora&lt;/a&gt;
and &lt;a href=&quot;/blog/2020/07/06/presto-summit-arm-td.html&quot;&gt;Arm Treasure Data&lt;/a&gt;, I hope
you are ready to hear from a well known brand using Presto in their analytics
ecosystem – &lt;a href=&quot;https://www.pinterest.com&quot;&gt;Pinterest&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Presto: A key component for analytics at Pinterest&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Update:&lt;/p&gt;

&lt;p&gt;Our webinar was well received and prompted a whole bunch of questions. Check out
the slides and video recording:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.starburst.io/wp-content/uploads/2020/08/Presto-Summit-Webinar-Series-Presto-at-Pinterest.pdf&quot;&gt;Download the slides&lt;/a&gt;&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/mZ59CTOPkl8&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;!--more--&gt;

&lt;p&gt;Join us to learn how Pinterest uses Presto to meet the company’s rapidly
increasing analytics needs, while keeping costs low.&lt;/p&gt;

&lt;p&gt;Presto plays an important role in Pinterest’s analytics ecosystem. Find out how
Presto runs at the company, how Pinterest leverages warning systems to guide
users to write better queries, and how Pinterest scales up their clusters to
meet their rapidly growing and complex workloads.&lt;/p&gt;

&lt;p&gt;The following topics are discussed:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Presto integrated with Pinterest infrastructure&lt;/li&gt;
  &lt;li&gt;Setup of warning systems to guide users to write better queries&lt;/li&gt;
  &lt;li&gt;Management of complex workloads&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Speakers:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/puchengy/&quot;&gt;Pucheng Yang&lt;/a&gt; is a software engineer
at Pinterest working on the Presto, SparkSQL and Hive query engines. He joined
the company two years ago as a new grad.&lt;/li&gt;
  &lt;li&gt;Yi He is a software engineer at Pinterest. Prior to Pinterest, he worked at
Facebook on Presto OLAP and query federation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Date: Wednesday, 19 August 2020&lt;/p&gt;

&lt;p&gt;Time: 10am PDT (San Francisco), 1pm EDT (New York), 6pm BST (London), 5pm UTC&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h2 id=&quot;register-now&quot;&gt;&lt;a href=&quot;https://bit.ly/32FfRfm&quot;&gt;Register now!&lt;/a&gt;&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;p&gt;We look forward to many Presto users joining us and participating in the webinar
with their questions.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>After State of Presto and the two real world examples from Zuora and Arm Treasure Data, I hope you are ready to hear from a well known brand using Presto in their analytics ecosystem – Pinterest: Presto: A key component for analytics at Pinterest Update: Our webinar was well received and caused a whole bunch of questions. Check out the slides and video recording: Download the slides</summary>

      
      
    </entry>
  
    <entry>
      <title>Advanced SQL in Presto with David</title>
      <link href="https://trino.io/blog/2020/07/15/training-advanced-sql.html" rel="alternate" type="text/html" title="Advanced SQL in Presto with David" />
      <published>2020-07-15T00:00:00+00:00</published>
      <updated>2020-07-15T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/07/15/training-advanced-sql</id>
      <content type="html" xml:base="https://trino.io/blog/2020/07/15/training-advanced-sql.html">&lt;p&gt;You have read our book &lt;a href=&quot;/blog/2020/04/11/the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;, practiced with various SQL examples, and
consulted our &lt;a href=&quot;https://trino.io/docs&quot;&gt;Presto documentation&lt;/a&gt;. Great steps to
become a Presto and SQL expert. However, learning efficient and advanced SQL can
take years of experience. Luckily we have some help from an expert coming your
way.&lt;/p&gt;

&lt;p&gt;Join us for a free webinar &lt;strong&gt;Advanced SQL in Presto&lt;/strong&gt; with David Phillips.&lt;/p&gt;

&lt;p&gt;Update:&lt;/p&gt;

&lt;p&gt;With nearly 200 live attendees and a two hour session we ended with lots of
questions from the engaged audience. After 20 minutes overtime we wrapped up the
successful event. Check out the presentation slides and the recording:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.starburst.io/wp-content/uploads/2020/07/Presto-Training-Series-Advanced-SQL-Features-in-Presto.pdf&quot;&gt;Download the slides&lt;/a&gt;&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/HN_95ObHAiw&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;!--more--&gt;

&lt;p&gt;In our new &lt;a href=&quot;https://bit.ly/2NO26Cm&quot;&gt;Presto Training Series&lt;/a&gt; we give Presto users
an opportunity to learn advanced skills from the co-creators of Presto –
&lt;a href=&quot;https://github.com/electrum&quot;&gt;David Phillips&lt;/a&gt;, 
&lt;a href=&quot;https://github.com/martint&quot;&gt;Martin Traverso&lt;/a&gt; and 
&lt;a href=&quot;https://github.com/dain&quot;&gt;Dain Sundstrom&lt;/a&gt;. Beyond the basics, each of the four 
training sessions covers critical topics for scaling Presto to more users and
use cases.&lt;/p&gt;

&lt;p&gt;Our first session with David is geared towards helping users understand how to
run more complex and comprehensive SQL queries with Presto. Delivered by David
Phillips, this session covers the following topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Using JSON and other complex data types&lt;/li&gt;
  &lt;li&gt;Advanced aggregation techniques&lt;/li&gt;
  &lt;li&gt;Window functions&lt;/li&gt;
  &lt;li&gt;Array and map functions&lt;/li&gt;
  &lt;li&gt;Lambda expressions&lt;/li&gt;
  &lt;li&gt;Many other SQL functions and features&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Date: Wednesday, 29 July 2020&lt;/p&gt;

&lt;p&gt;Time: 10am PDT (San Francisco), 1pm EDT (New York), 6pm BST (London), 5pm UTC&lt;/p&gt;

&lt;p&gt;Duration: 2h&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h2 id=&quot;register-now&quot;&gt;&lt;a href=&quot;https://bit.ly/2YOtx5f&quot;&gt;Register now!&lt;/a&gt;&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;p&gt;We look forward to many Presto users joining us.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>You have read our book Trino: The Definitive Guide, practiced with various SQL examples, and consulted our Presto documentation. Great steps to become a Presto and SQL expert. However, learning efficient and advanced SQL can take years of experience. Luckily we have some help from an expert coming your way. Join us for a free webinar Advanced SQL in Presto with David Phillips. Update: With nearly 200 live attendees and a two hour session we ended with lots of questions from the engaged audience. After 20 minutes overtime we wrapped up the successful event. Check out the presentation slides and the recording: Download the slides</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto Migration at Arm Treasure Data</title>
      <link href="https://trino.io/blog/2020/07/06/presto-summit-arm-td.html" rel="alternate" type="text/html" title="Presto Migration at Arm Treasure Data" />
      <published>2020-07-06T00:00:00+00:00</published>
      <updated>2020-07-06T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/07/06/presto-summit-arm-td</id>
      <content type="html" xml:base="https://trino.io/blog/2020/07/06/presto-summit-arm-td.html">&lt;p&gt;Both events of our virtual Presto Summit tour,
&lt;a href=&quot;/blog/2020/05/15/state-of-presto.html&quot;&gt;State of Presto&lt;/a&gt; and the
&lt;a href=&quot;/blog/2020/06/16/presto-summit-zuora.html&quot;&gt;Zuora presentation&lt;/a&gt;,
were well received and recordings are available for you to watch. Your next
chance to learn more about Presto in the real world comes from Arm Treasure
Data and is presented by Taro L. Saito:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Presto at Arm Treasure Data: A Journey of Migrating 1 Million Presto Queries&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Update:&lt;/p&gt;

&lt;p&gt;We had a great event with some in-depth, detailed questions from the audience.
Check out the recording to learn more:&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/NGMugRsNraE&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;!--more--&gt;

&lt;p&gt;Join us to discover how, as part of their customer data platform, Arm Treasure
Data utilizes Presto as the query engine, processing over 1 million queries per
day. This system supports the data business of over 500 companies in three
regions - US, EU, and Asia.&lt;/p&gt;

&lt;p&gt;Arm Treasure Data has been using Presto 0.205, and in 2019 started a big
migration project to Presto 317. Although they performed extensive query
simulations to check for any incompatibilities, the team faced many unexpected
challenges. In this session you learn more about their migration of the production system:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Technical details on many challenges&lt;/li&gt;
  &lt;li&gt;Key lessons learned&lt;/li&gt;
  &lt;li&gt;Latest updates on AWS Graviton2, the next generation of 64-bit Arm instance
types that can be used for running Presto&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Our speaker, Taro L. Saito, is a principal software engineer at Arm Treasure
Data and holds a Ph.D. in computer science from the University of Tokyo. He has built a
cloud database service at Arm Treasure Data, which processes millions
of queries every day. Previously, he worked as an assistant professor at the
University of Tokyo, studying distributed database systems and their
applications to genome sciences. He has created several open-source projects,
including Airframe, MessagePack, and various sbt plugins (sbt-sonatype,
sbt-pack) for Scala that help to publish thousands of OSS projects.&lt;/p&gt;

&lt;p&gt;Date: Thursday, 16 July 2020&lt;/p&gt;

&lt;p&gt;Time: 10am PDT (San Francisco), 1pm EDT (New York), 6pm BST (London), 5pm UTC&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h2 id=&quot;register-now&quot;&gt;&lt;a href=&quot;https://bit.ly/38wrS80&quot;&gt;Register now!&lt;/a&gt;&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;p&gt;We look forward to many Presto users joining us.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>Both events of our virtual Presto Summit tour, State of Presto and the Zuora presentation, were well received and recordings are available for you to watch. Your next chance to learn more about Presto in the real world comes from Arm Treasure Data and is presented by Taro L. Saito: Presto at Arm Treasure Data: A Journey of Migrating 1 Million Presto Queries Update: We had a great event with some in-depth, detailed questions from the audience. Check out the recording to learn more:</summary>

      
      
    </entry>
  
    <entry>
      <title>Data Integrity Protection in Presto</title>
      <link href="https://trino.io/blog/2020/06/25/data-integrity-protection.html" rel="alternate" type="text/html" title="Data Integrity Protection in Presto" />
      <published>2020-06-25T00:00:00+00:00</published>
      <updated>2020-06-25T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/06/25/data-integrity-protection</id>
      <content type="html" xml:base="https://trino.io/blog/2020/06/25/data-integrity-protection.html">&lt;p&gt;It all started on a Thursday afternoon in March, when &lt;a href=&quot;https://github.com/sopel39&quot;&gt;Karol Sobczak&lt;/a&gt;
was grilling Presto with heavy rounds of benchmarks, as we were ramping up to the Starburst Enterprise
Presto 332-e release. Karol discovered what seemed to be a serious regression, and turned out to be an even more
serious cloud environment issue.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;presto-benchmarks&quot;&gt;Presto Benchmarks&lt;/h1&gt;

&lt;p&gt;At the Presto project, we take stability and efficiency seriously, so releases undergo
rigorous performance benchmarks. The intention is to safeguard against any performance regressions
or stability problems. Usually, the performance improvements are benchmarked separately when they
are being added to the codebase. At Starburst, those benchmarks are even more important, especially
for the Starburst Enterprise Presto LTS releases.&lt;/p&gt;

&lt;p&gt;On a side note, we use &lt;a href=&quot;https://github.com/trinodb/benchto&quot;&gt;Benchto&lt;/a&gt; for organizing
&lt;a href=&quot;https://github.com/trinodb/trino/tree/master/presto-benchto-benchmarks&quot;&gt;Presto benchmark suites&lt;/a&gt;,
executing them and collecting the results. We use managed &lt;a href=&quot;https://kubernetes.io/&quot;&gt;Kubernetes&lt;/a&gt; in a public
cloud for provisioning Presto clusters, along with &lt;a href=&quot;https://www.starburst.io/platform/deployment-options/starburst-on-kubernetes/&quot;&gt;Starburst Enterprise Presto Kubernetes&lt;/a&gt;.
We use &lt;a href=&quot;https://jupyter.org/&quot;&gt;Jupyter&lt;/a&gt; for producing result reports in HTML and PDF formats.&lt;/p&gt;

&lt;h1 id=&quot;alleged-regression&quot;&gt;Alleged Regression&lt;/h1&gt;

&lt;p&gt;It all started in March, when &lt;a href=&quot;https://github.com/sopel39&quot;&gt;Karol Sobczak&lt;/a&gt;
was grilling Presto with heavy rounds of benchmarks for the Starburst Enterprise Presto 332-e release.
On one Thursday afternoon he reported stability problems, with a few benchmark runs failing with
exceptions similar to:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Query failed (#20200326_150852_00338_dj225): Unknown block encoding:
LONG_ARRAY� � �� � @@@���� �@  @ � �@@@ @@� @�@D�� @@��@ `� @@� @#�@ � 0�
... (9550 more bytes)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In Presto, a block encoding is a way of encoding a particular Block type (here, a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LongArrayBlock&lt;/code&gt;).
Block encodings are used when exchanging blocks of data between Presto nodes, or in spill to disk.
Blocks form a polymorphic class hierarchy, so every time a block is encoded, we need
to also store the encoding identifier. The encoding identifier (here, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LONG_ARRAY&lt;/code&gt; string)
is written as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;string length&amp;gt;&lt;/code&gt; (a 4-byte, signed integer in little-endian) followed by
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;string bytes&amp;gt;&lt;/code&gt; containing the UTF-8 representation of the encoding id. Clearly, in the case above,
the receiver read the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;encoding id length&amp;gt;&lt;/code&gt; as 9623 instead of 10! How could that ever be possible?&lt;/p&gt;
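&lt;p&gt;To make that failure mode concrete, here is a minimal Python sketch of the length-prefixed framing (a standalone illustration, not Presto’s actual Java code): a 4-byte signed little-endian length followed by the UTF-8 bytes of the encoding id. Corrupt the length prefix and the receiver reads a bogus length such as 9623 instead of 10, then tries to interpret kilobytes of block payload as the encoding name.&lt;/p&gt;

```python
def write_encoding_id(encoding_id: str) -> bytes:
    # Frame the identifier as a 4-byte signed little-endian length
    # followed by the UTF-8 bytes of the encoding id.
    data = encoding_id.encode("utf-8")
    return len(data).to_bytes(4, "little", signed=True) + data

def read_encoding_id(buf: bytes) -> str:
    length = int.from_bytes(buf[:4], "little", signed=True)
    return buf[4:4 + length].decode("utf-8")

wire = write_encoding_id("LONG_ARRAY")
assert wire[:4] == b"\x0a\x00\x00\x00"  # length 10, little-endian
assert read_encoding_id(wire) == "LONG_ARRAY"

# Corrupt the length prefix: the receiver now sees 9623 (0x2597)
# instead of 10, and consumes block payload bytes as if they were
# part of the encoding name.
corrupted = b"\x97\x25\x00\x00" + wire[4:]
assert int.from_bytes(corrupted[:4], "little", signed=True) == 9623
```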

&lt;p&gt;Presto 332 brought a lot of good changes, and the upgrade to Java 11 was one of them.
Therefore, Starburst Enterprise Presto 332-e was the first Starburst release using Java 11 by default.
For earlier releases, we ran benchmarks using AWS EC2 machines orchestrated with &lt;a href=&quot;https://www.starburst.io/platform/deployment-options/aws/&quot;&gt;Starburst’s Presto
CloudFormation Template (CFT)&lt;/a&gt;. This was also the first time we ran
Presto release benchmarks on Kubernetes clusters, with AWS EKS. We could suspect many different factors
as being the cause. We started to sift through the code, search the team’s “collective brain” and
the Internet for any ideas. One of the important sources was Vijay Pandurangan’s writeup on the &lt;a href=&quot;https://tech.vijayp.ca/linux-kernel-bug-delivers-corrupt-tcp-ip-data-to-mesos-kubernetes-docker-containers-4986f88f7a19&quot;&gt;data
corruption bug discovered by Twitter in 2015&lt;/a&gt;. Of course, we also repeated benchmark runs. Seeing is believing.&lt;/p&gt;

&lt;h1 id=&quot;production-issues&quot;&gt;Production issues&lt;/h1&gt;

&lt;p&gt;On the next day, a customer reported similar problems with their Presto cluster. Of course, they
were not running the yet-to-be-released version that we were still benchmarking. They ran into what seemed to
be a very serious regression in the Starburst Enterprise Presto 323-e release line. The customer was also using
the AWS cloud, but not the Kubernetes deployment. They were using the &lt;a href=&quot;https://www.starburst.io/platform/deployment-options/aws/&quot;&gt;CFT-based deployment&lt;/a&gt;
– the same stack we were using for all our release benchmarks so far – and we had never run into issues like this before.
As the customer was using a fresh-off-the-press latest minor release, we decided (in the spirit of the global healthcare trend)
to “quarantine” that release and roll back the customer installation to the previous version.&lt;/p&gt;

&lt;p&gt;However, the fact that a small bug fix release triggered data problems was unnerving. The fact that we
did not discover any of these problems before was even more unnerving.&lt;/p&gt;

&lt;h1 id=&quot;more-testing--the-data-corruption&quot;&gt;More testing – the data corruption&lt;/h1&gt;

&lt;p&gt;As we were running more and more, and even more test runs, we discovered new failure modes.
For example:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Query failed (#20200327_001931_00020_8di4r): Cannot cast DECIMAL(7, 2) &apos;18734974449861284.67&apos; to DECIMAL(12, 2)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Well, this message is not &lt;em&gt;wrong&lt;/em&gt;. It’s not possible to cast &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;18734974449861284.67&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DECIMAL(12, 2)&lt;/code&gt;.
Except that it is &lt;em&gt;also&lt;/em&gt; not possible to have a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DECIMAL(7, 2)&lt;/code&gt; with such value. Something wrong happened to the
data. At that moment, we realized the problem was very serious, because data could become corrupted.
This corrupted data could lead to a failure (like above), but it could also lead to incorrect query results,
or incorrect data being persisted (in case of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE TABLE AS&lt;/code&gt; queries). We created
a virtual War Room (that is, a Slack channel), got together all Presto experts and our experienced field team
to discuss potential causes, further diagnostics and mitigation strategies.&lt;/p&gt;
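&lt;p&gt;Why is that value impossible for a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DECIMAL(7, 2)&lt;/code&gt;? A &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DECIMAL(p, s)&lt;/code&gt; holds at most &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;p&lt;/code&gt; digits in total, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s&lt;/code&gt; of them after the decimal point, so the largest &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DECIMAL(7, 2)&lt;/code&gt; value is 99999.99. A quick illustrative check in Python (not Presto code):&lt;/p&gt;

```python
from decimal import Decimal

def fits(value: Decimal, precision: int, scale: int) -> bool:
    # A DECIMAL(p, s) holds at most p digits in total,
    # s of them after the decimal point.
    sign, digits, exponent = value.as_tuple()
    integral_digits = len(digits) + exponent  # digits before the point
    return precision - scale >= integral_digits and scale >= -exponent

assert fits(Decimal("99999.99"), 7, 2)                   # largest DECIMAL(7, 2)
assert not fits(Decimal("18734974449861284.67"), 7, 2)   # cannot exist at all
assert not fits(Decimal("18734974449861284.67"), 12, 2)  # and the cast fails too
```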

&lt;p&gt;Since the problem was affecting data exchanges between Presto nodes, we listed the following strategies
to try to dissect the problem:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;determining which query (queries) is (are) causing failures,&lt;/li&gt;
  &lt;li&gt;running with HTTP/2,&lt;/li&gt;
  &lt;li&gt;reverting to running on Java 8,&lt;/li&gt;
  &lt;li&gt;enabling exchange compression (as decompression is very sensitive to data corruption),&lt;/li&gt;
  &lt;li&gt;trying to upgrade Jetty,&lt;/li&gt;
  &lt;li&gt;determining whether failures correlate with JVM GC activity,&lt;/li&gt;
  &lt;li&gt;inspecting the source code.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;different-configuration&quot;&gt;Different configuration&lt;/h1&gt;

&lt;p&gt;We were able to quickly prototype and verify some of the ideas. Switching to HTTP/2 or
upgrading Jetty to the latest version did not help. Nor did downgrading to a Jetty version
that we had been using for a long time. We also verified that the problem was reproducible with Java 8,
so we concluded Java 11 was not the cause.&lt;/p&gt;

&lt;h1 id=&quot;checksums&quot;&gt;Checksums&lt;/h1&gt;

&lt;p&gt;We identified that the problem occurs somewhere within exchanges, between one Presto worker
node serializing a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Page&lt;/code&gt; object (the basic unit of data processing in Presto) and another node
deserializing it.&lt;/p&gt;

&lt;p&gt;While the decimal cast failure didn’t directly point at a data corruption problem (there could
be many other reasons for it), there was no other explanation for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Unknown block encoding&lt;/code&gt; exceptions.
The serialization is done in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PagesSerde.serialize&lt;/code&gt; (used by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TaskOutputOperator&lt;/code&gt;, the data sender) and
deserialization is done in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PagesSerde.deserialize&lt;/code&gt; (used by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ExchangeOperator&lt;/code&gt;, the
receiver of the data). As the logic is nicely encapsulated in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PagesSerde&lt;/code&gt; class, we
added checksums to the serialized data: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;checksum&amp;gt; &amp;lt;serialized page&amp;gt;&lt;/code&gt;.
This felt like a smart move – except that it gave us nothing more than confirmation that
there was a problem (“checksum failure”), which we already knew.&lt;/p&gt;
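&lt;p&gt;The framing can be sketched in a few lines of Java. This is an illustrative, stand-alone example rather than the actual Presto code, and it uses the JDK’s CRC32 in place of the checksum Presto used, purely to stay self-contained:&lt;/p&gt;

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical sketch of the "checksum followed by serialized page" framing.
public class ChecksumFrame {
    // Prepend an 8-byte checksum to the serialized page bytes.
    static byte[] frame(byte[] page) {
        CRC32 crc = new CRC32();
        crc.update(page, 0, page.length);
        ByteBuffer out = ByteBuffer.allocate(8 + page.length);
        out.putLong(crc.getValue());
        out.put(page);
        return out.array();
    }

    // Verify the checksum on the receiving side and return the payload,
    // or fail loudly if the bytes changed in transit.
    static byte[] unframe(byte[] framed) {
        ByteBuffer in = ByteBuffer.wrap(framed);
        long expected = in.getLong();
        byte[] page = new byte[in.remaining()];
        in.get(page);
        CRC32 crc = new CRC32();
        crc.update(page, 0, page.length);
        if (crc.getValue() != expected) {
            throw new RuntimeException("checksum failure");
        }
        return page;
    }
}
```

&lt;p&gt;A corrupted payload then fails in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unframe&lt;/code&gt; with a checksum error – exactly the kind of signal described above, and no more.&lt;/p&gt;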

&lt;p&gt;We considered adding logging to capture data going out from one node and coming in on
another node, but that would produce a huge amount of logs. A single benchmark run transfers
hundreds of terabytes of data between the nodes.&lt;/p&gt;

&lt;p&gt;We went ahead and created a Presto build that added data redundancy to be able to reconstruct
the data on the receiving side.
There are many &lt;a href=&quot;https://en.wikipedia.org/wiki/Erasure_code&quot;&gt;well-known error-correction codes&lt;/a&gt;
(e.g. &lt;a href=&quot;https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction&quot;&gt;Reed–Solomon error correction&lt;/a&gt;
available in Hadoop 3). In our case, speed of &lt;em&gt;implementation&lt;/em&gt; (a.k.a. simplicity) was a deciding factor,
so we added data mirroring: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;checksum&amp;gt; &amp;lt;serialized page&amp;gt; &amp;lt;serialized page&amp;gt;&lt;/code&gt;.
To avoid logging all the data exchanges, we added the received pages (both copies)
to the exceptions being raised.&lt;/p&gt;
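&lt;p&gt;The shape of this diagnostic build can be sketched as follows. The class and message names here are illustrative (the real logic lives in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PagesSerde&lt;/code&gt;), and CRC32 again stands in for the real checksum:&lt;/p&gt;

```java
import java.util.HexFormat;
import java.util.zip.CRC32;

// Sketch of the mirroring approach: the wire format carries two copies of the
// serialized page, and on checksum failure both copies are attached to the
// exception as hex, so they can be extracted from logs and compared offline.
public class MirroredPage {
    static byte[] deserialize(long expectedChecksum, byte[] first, byte[] second) {
        if (checksum(first) == expectedChecksum) {
            return first;
        }
        RuntimeException failure = new RuntimeException("Hash mismatch");
        failure.addSuppressed(new RuntimeException(
                "Slice, first half: " + HexFormat.of().formatHex(first)));
        failure.addSuppressed(new RuntimeException(
                "Slice, secnd half: " + HexFormat.of().formatHex(second)));
        throw failure;
    }

    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }
}
```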

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;java.sql.SQLException: Query failed (#20200401_113622_00676_p7qp7): Hash mismatch, read: 1251072184702746109, calculated: 7591448164918409110
    Suppressed: java.lang.RuntimeException: Slice, first half: 040000000A0000004C4F4E475F415252.... (945 kilobytes)
    Suppressed: java.lang.RuntimeException: Slice, secnd half: 040000000A0000004C4F4E475F415252.... (945 kilobytes)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The exception told us the first part was changed, since the read checksum did not match the calculated
checksum (the latter was computed from the first copy of the data and differed from the checksum
calculated on the sending side).
With the encoded data embedded in the exception like that, it was easy to extract the actual data and compare the copies,
so now we could see &lt;em&gt;how&lt;/em&gt; the data was changed.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;cat failure.txt | grep &apos;Slice, first half&apos; | cut -d: -f4- | sed &apos;s/^ *//&apos; | xxd -r -p &amp;gt; changed
cat failure.txt | grep &apos;Slice, secnd half&apos; | cut -d: -f4- | sed &apos;s/^ *//&apos; | xxd -r -p &amp;gt; original
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Comparing binary files is fun, but in practice it can be more convenient to compare &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hexdump&lt;/code&gt; output.
The output below was created with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;vimdiff &amp;lt;(hexdump -Cv original) &amp;lt;(hexdump -Cv changed)&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;++--6064 lines: 00000000  04 00 00 00 0a 00 00 00  4c 4f 4...|+ +--6064 lines: 00000000  04 00 00 00 0a 00 00...
 00017b00  00 cb 6a 25 00 00 00 00  00 cb 6a 25 00 00 00 00  |  00 cb 6a 25 00 00 00 00  00 cb 6a 25 00 00 00 00
 00017b10  00 cb 6a 25 00 00 00 00  00 cb 6a 25 00 00 00 00  |  00 cb 6a 25 00 00 00 00  00 cb 6a 25 00 00 00 00
 00017b20  00 cb 6a 25 00 00 00 00  00 e1 67 25 00 00 00 00  |  00 cb 6a 25 00 00 00 00  00 e1 67 25 00 00 00 00
 00017b30  00 e1 67 25 00 00 00 00  00 e1 67 25 00 00 00 00  |  00 e1 67 25 00 00 00 00  00 e1 67 25 00 00 00 00
 00017b40  00 e1 67 25 00 00 00 00  00 e1 67 25 00 00 00 00  |  00 e1 67 25 00 00 00 00  00 e1 67 25 00 00 00 00
 00017b50  00 e1 67 25 00 00 00 00  00 e1 67 25 00 00 00 00  |  00 e1 67 25 00 00 00 00  00 e1 67 25 00 00 00 00
 00017b60  00 e1 67 25 00 00 00 00  00 e1 67 25 00 00 00 00  |  00 e1 67 25 00 00 00 00  e1 67 25 00 00 00 00 00
 00017b70  00 e1 67 25 00 00 00 00  00 fb 69 25 00 00 00 00  |  e1 67 25 00 00 00 00 00  fb 69 25 00 00 00 00 00
 00017b80  00 fb 69 25 00 00 00 00  00 fb 69 25 00 00 00 00  |  fb 69 25 00 00 00 00 00  fb 69 25 00 00 00 00 00
 00017b90  00 fb 69 25 00 00 00 00  00 fb 69 25 00 00 00 00  |  fb 69 25 00 00 00 00 00  fb 69 25 00 00 00 00 00
 00017ba0  00 fb 69 25 00 00 00 00  00 fb 69 25 00 00 00 00  |  fb 69 25 00 00 00 00 00  fb 69 25 00 00 00 00 00
 00017bb0  00 fb 69 25 00 00 00 00  00 fb 69 25 00 00 00 00  |  fb 69 25 00 00 00 00 00  fb 69 25 00 00 00 00 00
 00017bc0  00 fb 69 25 00 00 00 00  00 fb 69 25 00 00 00 00  |  fb 69 25 00 00 00 00 00  fb 69 25 00 00 00 00 00
 00017bd0  00 fb 69 25 00 00 00 00  00 fb 69 25 00 00 00 00  |  fb 69 25 00 00 00 00 00  fb 69 25 00 00 00 00 00
 00017be0  00 fb 69 25 00 00 00 00  00 5e 6a 25 00 00 00 00  |  fb 69 25 00 00 00 00 00  5e 6a 25 00 00 00 00 00
 00017bf0  00 5e 6a 25 00 00 00 00  00 5e 6a 25 00 00 00 00  |  5e 6a 25 00 00 00 00 00  5e 6a 25 00 00 00 00 00
 00017c00  00 5e 6a 25 00 00 00 00  00 5e 6a 25 00 00 00 00  |  5e 6a 25 00 00 00 00 00  5e 6a 25 00 00 00 00 00
 00017c10  00 5e 6a 25 00 00 00 00  00 5e 6a 25 00 00 00 00  |  5e 6a 25 00 00 00 00 00  5e 6a 25 00 00 00 00 00
 00017c20  00 5e 6a 25 00 00 00 00  00 5e 6a 25 00 00 00 00  |  5e 6a 25 00 00 00 00 00  5e 6a 25 00 00 00 00 00
 00017c30  00 5e 6a 25 00 00 00 00  00 5e 6a 25 00 00 00 00  |  5e 6a 25 00 00 00 00 00  5e 6a 25 00 00 00 00 00
 00017c40  00 5e 6a 25 00 00 00 00  00 5e 6a 25 00 00 00 00  |  5e 6a 25 00 00 00 00 00  5e 6a 25 00 00 00 00 00
 00017c50  00 5e 6a 25 00 00 00 00  00 5e 6a 25 00 00 00 00  |  5e 6a 25 00 00 00 00 00  5e 6a 25 00 00 00 00 00
 00017c60  00 34 68 25 00 00 00 00  00 34 68 25 00 00 00 00  |  34 68 25 00 00 00 00 00  34 68 25 00 00 00 00 00
 00017c70  00 34 68 25 00 00 00 00  00 34 68 25 00 00 00 00  |  34 68 25 00 00 00 00 00  34 68 25 00 00 00 00 00
 00017c80  00 34 68 25 00 00 00 00  00 34 68 25 00 00 00 00  |  34 68 25 00 00 00 00 00  34 68 25 00 00 00 00 00
 00017c90  00 34 68 25 00 00 00 00  00 34 68 25 00 00 00 00  |  34 68 25 00 00 00 00 00  34 68 25 00 00 00 00 00
 00017ca0  00 34 68 25 00 00 00 00  00 2e 6b 25 00 00 00 00  |  34 68 25 00 00 00 00 00  2e 6b 25 00 00 00 00 00
 00017cb0  00 2e 6b 25 00 00 00 00  00 2e 6b 25 00 00 00 00  |  2e 6b 25 00 00 00 00 00  2e 6b 25 00 00 00 00 00
 00017cc0  00 2e 6b 25 00 00 00 00  00 2e 6b 25 00 00 00 00  |  2e 6b 25 00 00 00 00 00  2e 6b 25 00 00 00 00 00
 00017cd0  00 2e 6b 25 00 00 00 00  00 2e 6b 25 00 00 00 00  |  2e 6b 25 00 00 00 00 00  2e 6b 25 00 00 00 00 00
 00017ce0  00 2e 6b 25 00 00 00 00  00 2e 6b 25 00 00 00 00  |  2e 6b 25 00 00 00 00 00  2e 6b 25 00 00 00 00 00
 00017cf0  00 2e 6b 25 00 00 00 00  00 2e 6b 25 00 00 00 00  |  2e 6b 25 00 00 00 00 00  2e 6b 25 00 00 00 00 00
 00017d00  00 2e 6b 25 00 00 00 00  00 2e 6b 25 00 00 00 00  |  2e 6b 25 00 00 00 00 00  2e 6b 25 00 00 00 00 00
 00017d10  00 2e 6b 25 00 00 00 00  00 cf 68 25 00 00 00 00  |  2e 6b 25 00 00 00 00 00  cf 68 25 00 00 00 00 00
 00017d20  00 cf 68 25 00 00 00 00  00 cf 68 25 00 00 00 00  |  cf 68 25 00 00 00 00 00  cf 68 25 00 00 00 00 00
 00017d30  00 cf 68 25 00 00 00 00  00 cf 68 25 00 00 00 00  |  cf 68 25 00 00 00 00 00  cf 68 25 00 00 00 00 00
 00017d40  00 cf 68 25 00 00 00 00  00 cf 68 25 00 00 00 00  |  cf 68 25 00 00 00 00 00  cf 68 25 00 00 00 00 00
 00017d50  00 cf 68 25 00 00 00 00  00 cf 68 25 00 00 00 00  |  cf 68 25 00 00 00 00 00  cf 68 25 00 00 00 00 00
 00017d60  00 cf 68 25 00 00 00 00  00 cf 68 25 00 00 00 00  |  cf 68 25 00 00 00 00 00  cf 68 25 00 00 00 00 00
 00017d70  00 cf 68 25 00 00 00 00  00 cf 68 25 00 00 00 00  |  cf 68 25 00 00 00 00 00  cf 68 25 00 00 00 00 00
 00017d80  00 cf 68 25 00 00 00 00  00 6b 69 25 00 00 00 00  |  cf 68 25 00 00 00 00 00  6b 69 25 00 00 00 00 00
 00017d90  00 6b 69 25 00 00 00 00  00 6b 69 25 00 00 00 00  |  6b 69 25 00 00 00 00 00  6b 69 25 00 00 00 00 00
 00017da0  00 6b 69 25 00 00 00 00  00 6b 69 25 00 00 00 00  |  6b 69 25 00 00 00 00 00  6b 69 25 00 00 00 00 00
 00017db0  00 6b 69 25 00 00 00 00  00 6b 69 25 00 00 00 00  |  6b 69 25 00 00 00 00 00  6b 69 25 00 00 00 00 00
 00017dc0  00 6b 69 25 00 00 00 00  00 7e 66 25 00 00 00 00  |  6b 69 25 00 00 00 00 00  7e 66 25 00 00 00 00 00
 00017dd0  00 7e 66 25 00 00 00 00  00 7e 66 25 00 00 00 00  |  7e 66 25 00 00 00 00 00  7e 66 25 00 00 00 00 00
 00017de0  00 7e 66 25 00 00 00 00  00 7e 66 25 00 00 00 00  |  7e 66 25 00 00 00 00 00  7e 66 25 00 00 00 00 00
 00017df0  00 7e 66 25 00 00 00 00  00 7e 66 25 00 00 00 00  |  7e 66 25 00 00 00 00 00  7e 66 25 00 00 00 00 00
 00017e00  00 7e 66 25 00 00 00 00  00 7e 66 25 00 00 00 00  |  7e 66 25 00 00 00 00 00  7e 66 25 00 00 00 00 00
 00017e10  00 7e 66 25 00 00 00 00  00 7e 66 25 00 00 00 00  |  7e 66 25 00 00 00 00 00  7e 66 25 00 00 00 00 00
 00017e20  00 7e 66 25 00 00 00 00  00 7e 66 25 00 00 00 00  |  7e 66 25 00 00 00 00 00  7e 66 25 00 00 00 00 00
 00017e30  00 a9 66 25 00 00 00 00  00 a9 66 25 00 00 00 00  |  a9 66 25 00 00 00 00 00  a9 66 25 00 00 00 00 00
 00017e40  00 a9 66 25 00 00 00 00  00 a9 66 25 00 00 00 00  |  a9 66 25 00 00 00 00 00  a9 66 25 00 00 00 00 00
 00017e50  00 a9 66 25 00 00 00 00  00 a9 66 25 00 00 00 00  |  a9 66 25 00 00 00 00 00  a9 66 25 00 00 00 00 00
 00017e60  00 a9 66 25 00 00 00 00  00 a9 66 25 00 00 00 00  |  a9 66 25 00 00 00 00 00  a9 66 25 00 00 00 00 00
 00017e70  00 a9 66 25 00 00 00 00  00 fb 67 25 00 00 00 00  |  a9 66 25 00 00 00 00 00  fb 67 25 00 00 00 00 00
 00017e80  00 fb 67 25 00 00 00 00  00 fb 67 25 00 00 00 00  |  fb 67 25 00 00 00 00 00  fb 67 25 00 00 00 00 00
 00017e90  00 fb 67 25 00 00 00 00  00 fb 67 25 00 00 00 00  |  fb 67 25 00 00 00 00 00  fb 67 25 00 00 00 00 00
 00017ea0  00 fb 67 25 00 00 00 00  00 fb 67 25 00 00 00 00  |  00 fb 67 25 00 00 00 00  00 fb 67 25 00 00 00 00
 00017eb0  00 fb 67 25 00 00 00 00  00 fb 67 25 00 00 00 00  |  00 fb 67 25 00 00 00 00  00 fb 67 25 00 00 00 00
 00017ec0  00 fb 67 25 00 00 00 00  00 fb 67 25 00 00 00 00  |  00 fb 67 25 00 00 00 00  00 fb 67 25 00 00 00 00
 00017ed0  00 fb 67 25 00 00 00 00  00 fb 67 25 00 00 00 00  |  00 fb 67 25 00 00 00 00  00 fb 67 25 00 00 00 00
 00017ee0  00 fb 67 25 00 00 00 00  00 fb 67 25 00 00 00 00  |  00 fb 67 25 00 00 00 00  00 fb 67 25 00 00 00 00
 00017ef0  00 fb 67 25 00 00 00 00  00 5e 6b 25 00 00 00 00  |  00 fb 67 25 00 00 00 00  00 5e 6b 25 00 00 00 00
++--23429 lines: 00017f00  00 5e 6b 25 00 00 00 00  00 5e ...|+ +--23429 lines: 00017f00  00 5e 6b 25 00 00 0...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It is perhaps no surprise that zero bytes made up a lot of the transferred data. For performance reasons,
Presto uses a fixed-length representation for fixed-length data types, such as integers or decimals.
Compressing data for network exchanges makes sense if your network is saturated and your CPU is not,
and it is off by default. If we replace zero bytes with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;__&lt;/code&gt;, we see that the difference
between the original (left) and the changed data (right) is pretty interesting: it looks like one zero byte was
shifted from offset &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0x00017b60+5&lt;/code&gt; (approximately) to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0x00017e90+12&lt;/code&gt; (approximately).
This is a very unusual data change. We got other failure samples showing similar changes,
with varying offsets.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;++--6064 lines: 00000000  04 00 00 00 0a 00 00 00  4c 4f 4...|+ +--6064 lines: 00000000  04 00 00 00 0a 00 00...
 00017b00  __ cb 6a 25 __ __ __ __  __ cb 6a 25 __ __ __ __  |  __ cb 6a 25 __ __ __ __  __ cb 6a 25 __ __ __ __
 00017b10  __ cb 6a 25 __ __ __ __  __ cb 6a 25 __ __ __ __  |  __ cb 6a 25 __ __ __ __  __ cb 6a 25 __ __ __ __
 00017b20  __ cb 6a 25 __ __ __ __  __ e1 67 25 __ __ __ __  |  __ cb 6a 25 __ __ __ __  __ e1 67 25 __ __ __ __
 00017b30  __ e1 67 25 __ __ __ __  __ e1 67 25 __ __ __ __  |  __ e1 67 25 __ __ __ __  __ e1 67 25 __ __ __ __
 00017b40  __ e1 67 25 __ __ __ __  __ e1 67 25 __ __ __ __  |  __ e1 67 25 __ __ __ __  __ e1 67 25 __ __ __ __
 00017b50  __ e1 67 25 __ __ __ __  __ e1 67 25 __ __ __ __  |  __ e1 67 25 __ __ __ __  __ e1 67 25 __ __ __ __
 00017b60  __ e1 67 25 __ __ __ __  __ e1 67 25 __ __ __ __  |  __ e1 67 25 __ __ __ __  e1 67 25 __ __ __ __ __
 00017b70  __ e1 67 25 __ __ __ __  __ fb 69 25 __ __ __ __  |  e1 67 25 __ __ __ __ __  fb 69 25 __ __ __ __ __
 00017b80  __ fb 69 25 __ __ __ __  __ fb 69 25 __ __ __ __  |  fb 69 25 __ __ __ __ __  fb 69 25 __ __ __ __ __
 00017b90  __ fb 69 25 __ __ __ __  __ fb 69 25 __ __ __ __  |  fb 69 25 __ __ __ __ __  fb 69 25 __ __ __ __ __
 00017ba0  __ fb 69 25 __ __ __ __  __ fb 69 25 __ __ __ __  |  fb 69 25 __ __ __ __ __  fb 69 25 __ __ __ __ __
 00017bb0  __ fb 69 25 __ __ __ __  __ fb 69 25 __ __ __ __  |  fb 69 25 __ __ __ __ __  fb 69 25 __ __ __ __ __
 00017bc0  __ fb 69 25 __ __ __ __  __ fb 69 25 __ __ __ __  |  fb 69 25 __ __ __ __ __  fb 69 25 __ __ __ __ __
 00017bd0  __ fb 69 25 __ __ __ __  __ fb 69 25 __ __ __ __  |  fb 69 25 __ __ __ __ __  fb 69 25 __ __ __ __ __
 00017be0  __ fb 69 25 __ __ __ __  __ 5e 6a 25 __ __ __ __  |  fb 69 25 __ __ __ __ __  5e 6a 25 __ __ __ __ __
 00017bf0  __ 5e 6a 25 __ __ __ __  __ 5e 6a 25 __ __ __ __  |  5e 6a 25 __ __ __ __ __  5e 6a 25 __ __ __ __ __
 00017c00  __ 5e 6a 25 __ __ __ __  __ 5e 6a 25 __ __ __ __  |  5e 6a 25 __ __ __ __ __  5e 6a 25 __ __ __ __ __
 00017c10  __ 5e 6a 25 __ __ __ __  __ 5e 6a 25 __ __ __ __  |  5e 6a 25 __ __ __ __ __  5e 6a 25 __ __ __ __ __
 00017c20  __ 5e 6a 25 __ __ __ __  __ 5e 6a 25 __ __ __ __  |  5e 6a 25 __ __ __ __ __  5e 6a 25 __ __ __ __ __
 00017c30  __ 5e 6a 25 __ __ __ __  __ 5e 6a 25 __ __ __ __  |  5e 6a 25 __ __ __ __ __  5e 6a 25 __ __ __ __ __
 00017c40  __ 5e 6a 25 __ __ __ __  __ 5e 6a 25 __ __ __ __  |  5e 6a 25 __ __ __ __ __  5e 6a 25 __ __ __ __ __
 00017c50  __ 5e 6a 25 __ __ __ __  __ 5e 6a 25 __ __ __ __  |  5e 6a 25 __ __ __ __ __  5e 6a 25 __ __ __ __ __
 00017c60  __ 34 68 25 __ __ __ __  __ 34 68 25 __ __ __ __  |  34 68 25 __ __ __ __ __  34 68 25 __ __ __ __ __
 00017c70  __ 34 68 25 __ __ __ __  __ 34 68 25 __ __ __ __  |  34 68 25 __ __ __ __ __  34 68 25 __ __ __ __ __
 00017c80  __ 34 68 25 __ __ __ __  __ 34 68 25 __ __ __ __  |  34 68 25 __ __ __ __ __  34 68 25 __ __ __ __ __
 00017c90  __ 34 68 25 __ __ __ __  __ 34 68 25 __ __ __ __  |  34 68 25 __ __ __ __ __  34 68 25 __ __ __ __ __
 00017ca0  __ 34 68 25 __ __ __ __  __ 2e 6b 25 __ __ __ __  |  34 68 25 __ __ __ __ __  2e 6b 25 __ __ __ __ __
 00017cb0  __ 2e 6b 25 __ __ __ __  __ 2e 6b 25 __ __ __ __  |  2e 6b 25 __ __ __ __ __  2e 6b 25 __ __ __ __ __
 00017cc0  __ 2e 6b 25 __ __ __ __  __ 2e 6b 25 __ __ __ __  |  2e 6b 25 __ __ __ __ __  2e 6b 25 __ __ __ __ __
 00017cd0  __ 2e 6b 25 __ __ __ __  __ 2e 6b 25 __ __ __ __  |  2e 6b 25 __ __ __ __ __  2e 6b 25 __ __ __ __ __
 00017ce0  __ 2e 6b 25 __ __ __ __  __ 2e 6b 25 __ __ __ __  |  2e 6b 25 __ __ __ __ __  2e 6b 25 __ __ __ __ __
 00017cf0  __ 2e 6b 25 __ __ __ __  __ 2e 6b 25 __ __ __ __  |  2e 6b 25 __ __ __ __ __  2e 6b 25 __ __ __ __ __
 00017d00  __ 2e 6b 25 __ __ __ __  __ 2e 6b 25 __ __ __ __  |  2e 6b 25 __ __ __ __ __  2e 6b 25 __ __ __ __ __
 00017d10  __ 2e 6b 25 __ __ __ __  __ cf 68 25 __ __ __ __  |  2e 6b 25 __ __ __ __ __  cf 68 25 __ __ __ __ __
 00017d20  __ cf 68 25 __ __ __ __  __ cf 68 25 __ __ __ __  |  cf 68 25 __ __ __ __ __  cf 68 25 __ __ __ __ __
 00017d30  __ cf 68 25 __ __ __ __  __ cf 68 25 __ __ __ __  |  cf 68 25 __ __ __ __ __  cf 68 25 __ __ __ __ __
 00017d40  __ cf 68 25 __ __ __ __  __ cf 68 25 __ __ __ __  |  cf 68 25 __ __ __ __ __  cf 68 25 __ __ __ __ __
 00017d50  __ cf 68 25 __ __ __ __  __ cf 68 25 __ __ __ __  |  cf 68 25 __ __ __ __ __  cf 68 25 __ __ __ __ __
 00017d60  __ cf 68 25 __ __ __ __  __ cf 68 25 __ __ __ __  |  cf 68 25 __ __ __ __ __  cf 68 25 __ __ __ __ __
 00017d70  __ cf 68 25 __ __ __ __  __ cf 68 25 __ __ __ __  |  cf 68 25 __ __ __ __ __  cf 68 25 __ __ __ __ __
 00017d80  __ cf 68 25 __ __ __ __  __ 6b 69 25 __ __ __ __  |  cf 68 25 __ __ __ __ __  6b 69 25 __ __ __ __ __
 00017d90  __ 6b 69 25 __ __ __ __  __ 6b 69 25 __ __ __ __  |  6b 69 25 __ __ __ __ __  6b 69 25 __ __ __ __ __
 00017da0  __ 6b 69 25 __ __ __ __  __ 6b 69 25 __ __ __ __  |  6b 69 25 __ __ __ __ __  6b 69 25 __ __ __ __ __
 00017db0  __ 6b 69 25 __ __ __ __  __ 6b 69 25 __ __ __ __  |  6b 69 25 __ __ __ __ __  6b 69 25 __ __ __ __ __
 00017dc0  __ 6b 69 25 __ __ __ __  __ 7e 66 25 __ __ __ __  |  6b 69 25 __ __ __ __ __  7e 66 25 __ __ __ __ __
 00017dd0  __ 7e 66 25 __ __ __ __  __ 7e 66 25 __ __ __ __  |  7e 66 25 __ __ __ __ __  7e 66 25 __ __ __ __ __
 00017de0  __ 7e 66 25 __ __ __ __  __ 7e 66 25 __ __ __ __  |  7e 66 25 __ __ __ __ __  7e 66 25 __ __ __ __ __
 00017df0  __ 7e 66 25 __ __ __ __  __ 7e 66 25 __ __ __ __  |  7e 66 25 __ __ __ __ __  7e 66 25 __ __ __ __ __
 00017e00  __ 7e 66 25 __ __ __ __  __ 7e 66 25 __ __ __ __  |  7e 66 25 __ __ __ __ __  7e 66 25 __ __ __ __ __
 00017e10  __ 7e 66 25 __ __ __ __  __ 7e 66 25 __ __ __ __  |  7e 66 25 __ __ __ __ __  7e 66 25 __ __ __ __ __
 00017e20  __ 7e 66 25 __ __ __ __  __ 7e 66 25 __ __ __ __  |  7e 66 25 __ __ __ __ __  7e 66 25 __ __ __ __ __
 00017e30  __ a9 66 25 __ __ __ __  __ a9 66 25 __ __ __ __  |  a9 66 25 __ __ __ __ __  a9 66 25 __ __ __ __ __
 00017e40  __ a9 66 25 __ __ __ __  __ a9 66 25 __ __ __ __  |  a9 66 25 __ __ __ __ __  a9 66 25 __ __ __ __ __
 00017e50  __ a9 66 25 __ __ __ __  __ a9 66 25 __ __ __ __  |  a9 66 25 __ __ __ __ __  a9 66 25 __ __ __ __ __
 00017e60  __ a9 66 25 __ __ __ __  __ a9 66 25 __ __ __ __  |  a9 66 25 __ __ __ __ __  a9 66 25 __ __ __ __ __
 00017e70  __ a9 66 25 __ __ __ __  __ fb 67 25 __ __ __ __  |  a9 66 25 __ __ __ __ __  fb 67 25 __ __ __ __ __
 00017e80  __ fb 67 25 __ __ __ __  __ fb 67 25 __ __ __ __  |  fb 67 25 __ __ __ __ __  fb 67 25 __ __ __ __ __
 00017e90  __ fb 67 25 __ __ __ __  __ fb 67 25 __ __ __ __  |  fb 67 25 __ __ __ __ __  fb 67 25 __ __ __ __ __
 00017ea0  __ fb 67 25 __ __ __ __  __ fb 67 25 __ __ __ __  |  __ fb 67 25 __ __ __ __  __ fb 67 25 __ __ __ __
 00017eb0  __ fb 67 25 __ __ __ __  __ fb 67 25 __ __ __ __  |  __ fb 67 25 __ __ __ __  __ fb 67 25 __ __ __ __
 00017ec0  __ fb 67 25 __ __ __ __  __ fb 67 25 __ __ __ __  |  __ fb 67 25 __ __ __ __  __ fb 67 25 __ __ __ __
 00017ed0  __ fb 67 25 __ __ __ __  __ fb 67 25 __ __ __ __  |  __ fb 67 25 __ __ __ __  __ fb 67 25 __ __ __ __
 00017ee0  __ fb 67 25 __ __ __ __  __ fb 67 25 __ __ __ __  |  __ fb 67 25 __ __ __ __  __ fb 67 25 __ __ __ __
 00017ef0  __ fb 67 25 __ __ __ __  __ 5e 6b 25 __ __ __ __  |  __ fb 67 25 __ __ __ __  __ 5e 6b 25 __ __ __ __
++--23429 lines: 00017f00  00 5e 6b 25 00 00 00 00  00 5e ...|+ +--23429 lines: 00017f00  00 5e 6b 25 00 00 00...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
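&lt;p&gt;The displacement can also be checked mechanically. The following stand-alone sketch (not code from the investigation) tests whether a changed buffer looks like the original with one byte removed at the start of the differing window and re-inserted at its end:&lt;/p&gt;

```java
// Detect the "one byte shifted" corruption pattern: find the first and last
// positions where the buffers differ, then check that inside that window the
// changed copy equals the original advanced by one byte, with the displaced
// byte reappearing at the end of the window.
public class ShiftDetector {
    static boolean isOneByteShift(byte[] original, byte[] changed) {
        int n = original.length;
        if (changed.length != n) {
            return false;
        }
        // first mismatching index
        int first = 0;
        while (first != n && original[first] == changed[first]) {
            first++;
        }
        if (first == n) {
            return false; // buffers are identical
        }
        // last mismatching index (scanning from the end)
        int last = n - 1;
        while (original[last] == changed[last]) {
            last--;
        }
        // inside the window, changed bytes match the original shifted by one
        for (int i = first; i != last; i++) {
            if (changed[i] != original[i + 1]) {
                return false;
            }
        }
        // the displaced byte lands at the end of the window
        return changed[last] == original[first];
    }
}
```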

&lt;h1 id=&quot;outside-of-presto&quot;&gt;Outside of Presto&lt;/h1&gt;

&lt;p&gt;We captured a cluster of 10 nodes manifesting the problem and held on to it for further investigation.
Our testing showed that TPC-DS query 72 was significantly more likely to fail than other queries.
On the isolated cluster, a loop running TPC-DS query 72 would reproduce a failure within 2 hours.
We added information to the exception reporting the checksum failure, to identify on which
node the failure happened and which node sent the data. All the failures on the isolated
10-node cluster happened with one worker node (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;10.83.28.124&lt;/code&gt;, the Receiver) reading data
from one particular other worker node (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;10.142.0.84&lt;/code&gt;, the Sender). We stopped all other workers and attempted to
reproduce the problem outside of Presto.&lt;/p&gt;

&lt;p&gt;One of the things we tried was checking the network reliability with netcat.
On the Sender node, we ran the following:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;dd if=/dev/urandom of=/tmp/small-data bs=$[1024*1024] count=1
ncat -l 20165 --keep-open --max-conns 100 --sh-exec &quot;cat /tmp/small-data&quot; -v
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;On the Receiver node, we ran the following in a loop:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ncat --recv-only 10.142.0.84 20165 &amp;gt; &quot;/tmp/received&quot;
sha1sum &quot;/tmp/received&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Running this loop for just a few dozen seconds resulted in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/tmp/received&lt;/code&gt; differing
from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/tmp/small-data&lt;/code&gt;. Sometimes &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/tmp/received&lt;/code&gt; would be “just” a prefix of the original data,
and sometimes there would be data displacements within the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/tmp/received&lt;/code&gt; file. We cross-checked these
observations on a different pair of nodes and also on a different public cloud, using the same netcat version.
We observed the same behavior everywhere we checked, with a varying but high error rate, over 1%. This high
error rate is what led us to discard this evidence – either there was something wrong with the way we
used netcat, we violated netcat’s assumptions, or netcat was not the right tool for this task.&lt;/p&gt;

&lt;p&gt;We searched for other tools that we could use. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iperf&lt;/code&gt; is a well-known tool for stressing the network.
Sadly, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iperf&lt;/code&gt; &lt;a href=&quot;https://github.com/esnet/iperf/issues/157&quot;&gt;does not yet have the ability to verify the integrity of exchanged data&lt;/a&gt;.
We deployed a &lt;a href=&quot;https://github.com/findepi/netsum&quot;&gt;home-made, Java-based tool&lt;/a&gt; instead. Using this tool,
we were able to reproduce the data corruption problem between the Sender and Receiver nodes. The error rate
was very low; to reproduce the problem, we had to saturate the network and use multiple concurrent TCP connections
(which is very similar to how Presto uses the network). This validated our
observation that the data corruption problem was happening outside of Presto. Interestingly, we were unable
to reproduce the problem when stressing the network with a single TCP connection.&lt;/p&gt;
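&lt;p&gt;The core idea behind such a tool is easy to sketch: both sides derive the same pseudo-random byte stream from a shared seed, so the receiver can verify what it read without the sender transmitting a reference copy. The sketch below is a single-connection illustration over loopback, not the actual tool, which stressed the network with many concurrent connections:&lt;/p&gt;

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.util.Random;
import java.util.zip.CRC32;

// Illustrative seed-based network verification: sender streams bytes derived
// from a seed; receiver checksums what it actually read, and that checksum is
// compared against the checksum of what should have been sent.
public class NetVerify {
    // Checksum of the byte stream a given seed produces.
    static long streamChecksum(long seed, int length) {
        Random random = new Random(seed);
        byte[] data = new byte[length];
        random.nextBytes(data);
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // Send 'length' seeded pseudo-random bytes over the socket.
    static void send(Socket socket, long seed, int length) throws Exception {
        Random random = new Random(seed);
        byte[] data = new byte[length];
        random.nextBytes(data);
        OutputStream out = socket.getOutputStream();
        out.write(data);
        out.flush();
    }

    // Read everything from the socket until EOF and return its checksum.
    static long receive(Socket socket) throws Exception {
        CRC32 crc = new CRC32();
        InputStream in = socket.getInputStream();
        byte[] buffer = new byte[8192];
        int read = in.read(buffer);
        while (read != -1) {
            crc.update(buffer, 0, read);
            read = in.read(buffer);
        }
        return crc.getValue();
    }
}
```

&lt;p&gt;Any mismatch between the two checksums means the bytes changed in transit, without either side having to log or retransmit the data itself.&lt;/p&gt;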

&lt;h1 id=&quot;mystery-unsolved&quot;&gt;Mystery unsolved&lt;/h1&gt;

&lt;p&gt;Obviously, with such strong evidence gathered, we opened a support ticket with AWS.
The support team was great and did a lot of investigation on their own. Unfortunately, the problem went
away before the support team was able to get to the bottom of it. It was April already.
Perhaps one day someone will find the smoking gun and write the rest of this story.&lt;/p&gt;

&lt;h1 id=&quot;conclusions&quot;&gt;Conclusions&lt;/h1&gt;

&lt;p&gt;We implemented a data integrity protection measure in Presto, using &lt;a href=&quot;https://github.com/martint&quot;&gt;Martin Traverso’s&lt;/a&gt;
Java implementation of the &lt;a href=&quot;https://github.com/Cyan4973/xxHash&quot;&gt;XXHash64&lt;/a&gt; algorithm. Thanks to its
speed, we could enable it by default, with negligible impact on overall query performance.
By default, a data integrity violation results in query failure, but Presto can also be configured to retry,
by setting the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;exchange.data-integrity-verification&lt;/code&gt; configuration property.&lt;/p&gt;

&lt;p&gt;This chapter of Presto history should have remained closed, letting us forget about all this.
However, a couple of days ago, a customer running Presto on Azure Kubernetes Service (AKS) reported an exception like
the one below. The next day, we bumped into it as well, while running &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE TABLE AS SELECT&lt;/code&gt;
to prepare a new benchmark dataset on Azure Storage.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Query failed (#20200622_124803_00000_abcde): Checksum verification failure on 10.12.3.47
    when reading from http://10.12.3.53:8080/v1/task/20200622_124803_00000_abcde.2.6/results/5/8:
    Data corruption, read checksum: 0xe17e6eaeb665dc6e, calculated checksum: 0xb3540697373195f1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It is no fun when a query fails like this. However, what a joy and a source of pride that it did not silently
return incorrect query results. Rest assured, Presto will not return incorrect results, wherever you
run it.&lt;/p&gt;

&lt;h1 id=&quot;credits&quot;&gt;Credits&lt;/h1&gt;

&lt;p&gt;Special thanks go to our customers, for your understanding and the trust you have in us.
Without you, Starburst wouldn’t be as fun a place as it is!
Thanks to &lt;a href=&quot;https://github.com/lukasz-walkiewicz&quot;&gt;Łukasz Walkiewicz&lt;/a&gt; and &lt;a href=&quot;https://github.com/sopel39&quot;&gt;Karol Sobczak&lt;/a&gt;
for the fantastic benchmark and experimentation automation and for your help with running the experiments!
Thanks to &lt;a href=&quot;https://github.com/willmostly&quot;&gt;Will Morrison&lt;/a&gt; for finding the Sender and Receiver machines
that reproduced the problem so nicely!
Thanks to &lt;a href=&quot;https://github.com/martint&quot;&gt;Martin Traverso&lt;/a&gt;, &lt;a href=&quot;https://github.com/dain&quot;&gt;Dain Sundstrom&lt;/a&gt;
and &lt;a href=&quot;https://github.com/electrum&quot;&gt;David Phillips&lt;/a&gt; for guidance, ideas, clever tips and code pointers!
Thanks to &lt;a href=&quot;https://github.com/losipiuk&quot;&gt;Łukasz Osipiuk&lt;/a&gt; for running experiments, cross-checking
the results and helping us keep our sanity. Shout out to the whole Starburst team – it was truly a team effort!&lt;/p&gt;

&lt;p&gt;□&lt;/p&gt;</content>

      
        <author>
          <name>Piotr Findeisen, Starburst Data</name>
        </author>
      

      <summary>It all started on a Thursday afternoon in March, when Karol Sobczak was grilling Presto with heavy rounds of benchmarks, as we were ramping up for the Starburst Enterprise Presto 332-e release. Karol discovered what seemed to be a serious regression, and it turned out to be an even more serious cloud environment issue.</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto at Zuora</title>
      <link href="https://trino.io/blog/2020/06/16/presto-summit-zuora.html" rel="alternate" type="text/html" title="Presto at Zuora" />
      <published>2020-06-16T00:00:00+00:00</published>
      <updated>2020-06-16T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/06/16/presto-summit-zuora</id>
      <content type="html" xml:base="https://trino.io/blog/2020/06/16/presto-summit-zuora.html">&lt;p&gt;The Presto Summit is morphing into a series of virtual events, and we already
started with the &lt;a href=&quot;/blog/2020/05/15/state-of-presto.html&quot;&gt;State of Presto webinar&lt;/a&gt; recently. Next up is a talk about Presto with
lots of practical insights at &lt;a href=&quot;https://zuora.com/&quot;&gt;Zuora&lt;/a&gt; presented by Henning
Schmiedehausen:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Using Presto as Query Layer in a Distributed Microservices Architecture&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Update:&lt;/p&gt;

&lt;p&gt;We had a great event with lots of questions from the audience, taking us beyond
the planned time frame. Check out the recording to learn more:&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/ICAPZksjP0k&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;!--more--&gt;

&lt;p&gt;Presto has found its place as a SQL-based query engine for big data in the new
stack, but it does not have to be limited to big data and large scale analytics
applications.&lt;/p&gt;

&lt;p&gt;In this presentation, Henning highlights how Presto helped Zuora transform
its monolithic data architecture for an online transactional system into a
loosely coupled, services-based architecture. In doing so, it helped to solve the
most pressing problem when splitting up data: providing direct access to
production data across many services and enabling complex data queries across
live data. Zuora Data Query was an instant success when it was launched.&lt;/p&gt;

&lt;p&gt;In this webinar you will discover:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The technical architecture that embedded Presto in the Zuora service stack&lt;/li&gt;
  &lt;li&gt;The pieces of Presto that could be used directly off the shelf&lt;/li&gt;
  &lt;li&gt;How we productized it into a system that now serves huge numbers of small
queries against live data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Our speaker, Henning Schmiedehausen, Chief Architect at Zuora, is a thought
leader in the open source Java community with more than 25 years of experience
contributing to successful open source projects. At Zuora he serves as the chief
architect and is responsible for the technical aspects of transforming the Zuora
system to a new, scalable, and flexible Microservices Architecture. Prior to
Zuora he worked at Facebook and Groupon as a principal engineer. Henning also
served as a board member at the Apache Software Foundation.&lt;/p&gt;

&lt;p&gt;Date: Tuesday, 30 June 2020&lt;/p&gt;

&lt;p&gt;Time: 10am PDT (San Francisco), 1pm EDT (New York), 6pm BST (London), 5pm UTC&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h2 id=&quot;register-now&quot;&gt;&lt;a href=&quot;https://bit.ly/2YfPNne&quot;&gt;Register now!&lt;/a&gt;&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;p&gt;We look forward to many Presto users joining us.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>The Presto Summit is morphing into a series of virtual events, and we already started with the State of Presto webinar recently. Next up is a talk about Presto with lots of practical insights at Zuora presented by Henning Schmiedehausen: Using Presto as Query Layer in a Distributed Microservices Architecture Update: We had a great event with lots of questions from the audience, taking us beyond the planned time frame. Check out the recording to learn more:</summary>

      
      
    </entry>
  
    <entry>
      <title>Dynamic partition pruning</title>
      <link href="https://trino.io/blog/2020/06/14/dynamic-partition-pruning.html" rel="alternate" type="text/html" title="Dynamic partition pruning" />
      <published>2020-06-14T00:00:00+00:00</published>
      <updated>2020-06-14T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/06/14/dynamic-partition-pruning</id>
      <content type="html" xml:base="https://trino.io/blog/2020/06/14/dynamic-partition-pruning.html">&lt;p&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Star_schema&quot;&gt;Star-schema&lt;/a&gt; is one of the most widely used data mart patterns. 
The star schema consists of fact tables (usually partitioned) and dimension tables, 
which are used to filter rows from fact tables.
Consider the following query which captures a common pattern of a fact table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;store_sales&lt;/code&gt; partitioned by the column 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ss_sold_date_sk&lt;/code&gt; joined with a filtered dimension table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date_dim&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT COUNT(*) FROM 
store_sales JOIN date_dim ON store_sales.ss_sold_date_sk = date_dim.d_date_sk
WHERE d_following_holiday=&apos;Y&apos; AND d_year = 2000;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Without dynamic filtering, Presto pushes predicates for the dimension table down to the table scan on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date_dim&lt;/code&gt;, but 
it scans all the data in the fact table, since there are no filters on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;store_sales&lt;/code&gt; in the query.
The join operator ends up throwing away most of the probe-side rows, as the join criterion is highly selective. 
The current implementation of &lt;a href=&quot;https://trino.io/blog/2019/06/30/dynamic-filtering.html&quot;&gt;dynamic filtering&lt;/a&gt; improves
on this; however, it is limited to broadcast joins on tables stored in ORC or Parquet format. 
Additionally, it does not take advantage of the layout of partitioned Hive tables.&lt;/p&gt;

&lt;p&gt;With dynamic partition pruning, which extends the current implementation of dynamic filtering, every worker node collects 
values eligible for the join from the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date_dim.d_date_sk&lt;/code&gt; column and passes them to the coordinator. 
The coordinator can then skip processing the partitions of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;store_sales&lt;/code&gt; which don’t meet the join criterion. 
This greatly reduces the amount of data scanned from the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;store_sales&lt;/code&gt; table by worker nodes. 
This optimization is applicable to any storage format and to both broadcast and partitioned joins.&lt;/p&gt;

&lt;!--more--&gt;
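&lt;p&gt;To make the effect concrete: assuming &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;d_date_sk&lt;/code&gt; is unique in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date_dim&lt;/code&gt; (as it is in TPC-DS), dynamic partition pruning makes the example query behave roughly as if it had been written with an explicit filter on the partition key:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT COUNT(*) FROM store_sales
WHERE ss_sold_date_sk IN (
    SELECT d_date_sk FROM date_dim
    WHERE d_following_holiday=&apos;Y&apos; AND d_year = 2000);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The difference is that the set of matching &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;d_date_sk&lt;/code&gt; values is discovered at runtime, so only the corresponding partitions of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;store_sales&lt;/code&gt; are scanned.&lt;/p&gt;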

&lt;h1 id=&quot;design-considerations&quot;&gt;Design considerations&lt;/h1&gt;

&lt;p&gt;This optimization requires dynamic filters collected by worker nodes to be communicated to the coordinator over the network.
We needed to ensure that this additional communication overhead does not overload the coordinator.
This was achieved by packing dynamic filters into Presto’s existing framework for sending status updates from worker to coordinator.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-main/src/main/java/io/prestosql/server/DynamicFilterService.java&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DynamicFilterService&lt;/code&gt;&lt;/a&gt; 
was added on the coordinator node to perform dynamic filter collection asynchronously.
Queries registered with this service can request dynamic filters while scheduling splits without blocking any operations.
This service is also responsible for ensuring that all the build-side tasks of a join stage have completed execution before 
constructing dynamic filters to be used in the scheduling of probe-side table scans by the coordinator.&lt;/p&gt;

&lt;h1 id=&quot;implementation&quot;&gt;Implementation&lt;/h1&gt;

&lt;p&gt;For identifying opportunities for dynamic filtering in the logical plan, we rely on the implementation added in
&lt;a href=&quot;https://github.com/trinodb/trino/pull/91&quot;&gt;#91&lt;/a&gt;. Dynamic filters are modeled as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FunctionCall&lt;/code&gt; expressions which 
evaluate to a boolean value. They are created in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PredicatePushDown&lt;/code&gt; optimizer rule from the equi-join clauses of inner join 
nodes and pushed down in the plan along with other predicates. Dynamic filters are added to the plan after the cost-based 
optimization rules. This ensures that dynamic filters do not interfere with cost estimation and join reordering.
The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PredicatePushDown&lt;/code&gt; rule can end up pushing dynamic filters to unsupported places in the plan through predicate inference. 
This was solved by adding the 
&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-main/src/main/java/io/prestosql/sql/planner/iterative/rule/RemoveUnsupportedDynamicFilters.java&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RemoveUnsupportedDynamicFilters&lt;/code&gt;&lt;/a&gt;
optimizer rule, which ensures that:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Dynamic filters are present only directly above a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TableScan&lt;/code&gt; node and only if the subtree is on the probe side of some downstream &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JoinNode&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Dynamic filters are removed from a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JoinNode&lt;/code&gt; if there are no consumers for them in its probe-side subtree.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We also run &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-main/src/main/java/io/prestosql/sql/planner/sanity/DynamicFiltersChecker.java&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DynamicFiltersChecker&lt;/code&gt;&lt;/a&gt;
at the end of the planning phase to ensure that the above conditions have been satisfied by the optimized plan.&lt;/p&gt;

&lt;p&gt;We reuse the existing &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-main/src/main/java/io/prestosql/operator/DynamicFilterSourceOperator.java&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DynamicFilterSourceOperator&lt;/code&gt;&lt;/a&gt;
in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LocalExecutionPlanner&lt;/code&gt; to collect build-side values from each inner join on each worker node. In addition to passing the collected &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TupleDomain&lt;/code&gt;
to &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-main/src/main/java/io/prestosql/sql/planner/LocalDynamicFiltersCollector.java&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LocalDynamicFiltersCollector&lt;/code&gt;&lt;/a&gt; 
within the same worker node for use in broadcast join probe-side scans, we also pass them to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TaskContext&lt;/code&gt; to populate task 
status updates for the coordinator.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ContinuousTaskStatusFetcher&lt;/code&gt; on the coordinator node pulls task status updates from all worker nodes at intervals of up to
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;task.status-refresh-max-wait&lt;/code&gt; (default is 1 second), or sooner if the task status changes. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DynamicFilterService&lt;/code&gt; 
on the coordinator regularly polls for dynamic filters from task status updates through &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SqlQueryExecution&lt;/code&gt; and provides
an interface to supply dynamic filters when they are ready. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ConnectorSplitManager#getSplits&lt;/code&gt; API has been updated to
optionally utilize dynamic filters supplied by the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DynamicFilterService&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In the Hive connector, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BackgroundHiveSplitLoader&lt;/code&gt; can apply dynamic filtering by either completely skipping the listing
of files within a partition, or, thanks to the lazy enumeration of splits, by avoiding the creation of splits within an already loaded partition 
when the dynamic filters become available in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;InternalHiveSplitFactory#createInternalHiveSplit&lt;/code&gt;.&lt;/p&gt;

&lt;h1 id=&quot;benchmarks&quot;&gt;Benchmarks&lt;/h1&gt;

&lt;p&gt;We ran TPC-DS queries on a cluster of 5 r4.8xlarge worker nodes, using data stored in ORC format.
The TPC-DS tables were partitioned as follows:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;catalog_returns&lt;/code&gt; on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cr_returned_date_sk&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;catalog_sales&lt;/code&gt; on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cs_sold_date_sk&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;store_returns&lt;/code&gt; on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sr_returned_date_sk&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;store_sales&lt;/code&gt; on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ss_sold_date_sk&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;web_returns&lt;/code&gt; on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;wr_returned_date_sk&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;web_sales&lt;/code&gt; on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ws_sold_date_sk&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The table definitions are available in &lt;a href=&quot;https://github.com/hdinsight/tpcds-hdinsight/blob/master/ddl/createAllORCTables.hql&quot;&gt;createAllORCTables.hql&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The following queries ran more than 20% faster with dynamic partition pruning. Elapsed time is measured in seconds,
CPU time in minutes, and data read in MB.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Query&lt;/th&gt;
      &lt;th&gt;Baseline elapsed&lt;/th&gt;
      &lt;th&gt;Dynamic partition pruning elapsed&lt;/th&gt;
      &lt;th&gt;Baseline CPU&lt;/th&gt;
      &lt;th&gt;Dynamic partition pruning CPU&lt;/th&gt;
      &lt;th&gt;Baseline data read&lt;/th&gt;
      &lt;th&gt;Dynamic partition pruning data read&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;q01&lt;/td&gt;
      &lt;td&gt;10.96&lt;/td&gt;
      &lt;td&gt;8.50&lt;/td&gt;
      &lt;td&gt;10.2&lt;/td&gt;
      &lt;td&gt;8.9&lt;/td&gt;
      &lt;td&gt;17.91&lt;/td&gt;
      &lt;td&gt;14.53&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q04&lt;/td&gt;
      &lt;td&gt;21.63&lt;/td&gt;
      &lt;td&gt;10.80&lt;/td&gt;
      &lt;td&gt;23.6&lt;/td&gt;
      &lt;td&gt;16.1&lt;/td&gt;
      &lt;td&gt;34.81&lt;/td&gt;
      &lt;td&gt;12.99&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q05&lt;/td&gt;
      &lt;td&gt;41.38&lt;/td&gt;
      &lt;td&gt;14.94&lt;/td&gt;
      &lt;td&gt;57.1&lt;/td&gt;
      &lt;td&gt;16.8&lt;/td&gt;
      &lt;td&gt;54.81&lt;/td&gt;
      &lt;td&gt;11.45&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q07&lt;/td&gt;
      &lt;td&gt;12.35&lt;/td&gt;
      &lt;td&gt;9.26&lt;/td&gt;
      &lt;td&gt;26.4&lt;/td&gt;
      &lt;td&gt;14.6&lt;/td&gt;
      &lt;td&gt;30.28&lt;/td&gt;
      &lt;td&gt;17.31&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q08&lt;/td&gt;
      &lt;td&gt;10.48&lt;/td&gt;
      &lt;td&gt;6.43&lt;/td&gt;
      &lt;td&gt;11.0&lt;/td&gt;
      &lt;td&gt;4.7&lt;/td&gt;
      &lt;td&gt;10.19&lt;/td&gt;
      &lt;td&gt;3.52&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q11&lt;/td&gt;
      &lt;td&gt;20.04&lt;/td&gt;
      &lt;td&gt;14.82&lt;/td&gt;
      &lt;td&gt;35.6&lt;/td&gt;
      &lt;td&gt;27.8&lt;/td&gt;
      &lt;td&gt;25.37&lt;/td&gt;
      &lt;td&gt;9.72&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q17&lt;/td&gt;
      &lt;td&gt;24.05&lt;/td&gt;
      &lt;td&gt;9.87&lt;/td&gt;
      &lt;td&gt;26.4&lt;/td&gt;
      &lt;td&gt;12.0&lt;/td&gt;
      &lt;td&gt;30.18&lt;/td&gt;
      &lt;td&gt;9.75&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q18&lt;/td&gt;
      &lt;td&gt;13.98&lt;/td&gt;
      &lt;td&gt;6.00&lt;/td&gt;
      &lt;td&gt;17.5&lt;/td&gt;
      &lt;td&gt;7.7&lt;/td&gt;
      &lt;td&gt;20.29&lt;/td&gt;
      &lt;td&gt;8.81&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q25&lt;/td&gt;
      &lt;td&gt;18.91&lt;/td&gt;
      &lt;td&gt;8.04&lt;/td&gt;
      &lt;td&gt;26.9&lt;/td&gt;
      &lt;td&gt;9.1&lt;/td&gt;
      &lt;td&gt;37.54&lt;/td&gt;
      &lt;td&gt;11.12&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q27&lt;/td&gt;
      &lt;td&gt;11.98&lt;/td&gt;
      &lt;td&gt;5.58&lt;/td&gt;
      &lt;td&gt;25.1&lt;/td&gt;
      &lt;td&gt;8.6&lt;/td&gt;
      &lt;td&gt;26.69&lt;/td&gt;
      &lt;td&gt;10.12&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q29&lt;/td&gt;
      &lt;td&gt;24.11&lt;/td&gt;
      &lt;td&gt;15.46&lt;/td&gt;
      &lt;td&gt;30.5&lt;/td&gt;
      &lt;td&gt;18.5&lt;/td&gt;
      &lt;td&gt;30.18&lt;/td&gt;
      &lt;td&gt;13.50&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q31&lt;/td&gt;
      &lt;td&gt;27.81&lt;/td&gt;
      &lt;td&gt;12.77&lt;/td&gt;
      &lt;td&gt;48.2&lt;/td&gt;
      &lt;td&gt;21.3&lt;/td&gt;
      &lt;td&gt;39.53&lt;/td&gt;
      &lt;td&gt;13.73&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q32&lt;/td&gt;
      &lt;td&gt;11.51&lt;/td&gt;
      &lt;td&gt;8.15&lt;/td&gt;
      &lt;td&gt;12.7&lt;/td&gt;
      &lt;td&gt;10.3&lt;/td&gt;
      &lt;td&gt;15.05&lt;/td&gt;
      &lt;td&gt;12.76&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q33&lt;/td&gt;
      &lt;td&gt;15.95&lt;/td&gt;
      &lt;td&gt;4.31&lt;/td&gt;
      &lt;td&gt;24.3&lt;/td&gt;
      &lt;td&gt;5.4&lt;/td&gt;
      &lt;td&gt;31.26&lt;/td&gt;
      &lt;td&gt;6.67&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q35&lt;/td&gt;
      &lt;td&gt;15.10&lt;/td&gt;
      &lt;td&gt;5.22&lt;/td&gt;
      &lt;td&gt;13.8&lt;/td&gt;
      &lt;td&gt;6.2&lt;/td&gt;
      &lt;td&gt;4.83&lt;/td&gt;
      &lt;td&gt;1.70&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q36&lt;/td&gt;
      &lt;td&gt;11.68&lt;/td&gt;
      &lt;td&gt;6.43&lt;/td&gt;
      &lt;td&gt;22.4&lt;/td&gt;
      &lt;td&gt;11.4&lt;/td&gt;
      &lt;td&gt;24.28&lt;/td&gt;
      &lt;td&gt;12.78&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q38&lt;/td&gt;
      &lt;td&gt;21.08&lt;/td&gt;
      &lt;td&gt;16.20&lt;/td&gt;
      &lt;td&gt;39.4&lt;/td&gt;
      &lt;td&gt;31.6&lt;/td&gt;
      &lt;td&gt;5.65&lt;/td&gt;
      &lt;td&gt;3.15&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q40&lt;/td&gt;
      &lt;td&gt;37.40&lt;/td&gt;
      &lt;td&gt;11.98&lt;/td&gt;
      &lt;td&gt;37.7&lt;/td&gt;
      &lt;td&gt;8.4&lt;/td&gt;
      &lt;td&gt;17.02&lt;/td&gt;
      &lt;td&gt;9.20&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q46&lt;/td&gt;
      &lt;td&gt;11.57&lt;/td&gt;
      &lt;td&gt;9.06&lt;/td&gt;
      &lt;td&gt;24.4&lt;/td&gt;
      &lt;td&gt;17.3&lt;/td&gt;
      &lt;td&gt;18.51&lt;/td&gt;
      &lt;td&gt;14.19&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q48&lt;/td&gt;
      &lt;td&gt;20.48&lt;/td&gt;
      &lt;td&gt;12.65&lt;/td&gt;
      &lt;td&gt;42.3&lt;/td&gt;
      &lt;td&gt;22.5&lt;/td&gt;
      &lt;td&gt;20.71&lt;/td&gt;
      &lt;td&gt;11.54&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q49&lt;/td&gt;
      &lt;td&gt;26.69&lt;/td&gt;
      &lt;td&gt;16.01&lt;/td&gt;
      &lt;td&gt;38.8&lt;/td&gt;
      &lt;td&gt;12.0&lt;/td&gt;
      &lt;td&gt;68.67&lt;/td&gt;
      &lt;td&gt;30.57&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q50&lt;/td&gt;
      &lt;td&gt;46.90&lt;/td&gt;
      &lt;td&gt;33.22&lt;/td&gt;
      &lt;td&gt;43.4&lt;/td&gt;
      &lt;td&gt;42.5&lt;/td&gt;
      &lt;td&gt;21.30&lt;/td&gt;
      &lt;td&gt;16.77&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q54&lt;/td&gt;
      &lt;td&gt;43.05&lt;/td&gt;
      &lt;td&gt;11.39&lt;/td&gt;
      &lt;td&gt;27.5&lt;/td&gt;
      &lt;td&gt;14.8&lt;/td&gt;
      &lt;td&gt;17.71&lt;/td&gt;
      &lt;td&gt;11.52&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q56&lt;/td&gt;
      &lt;td&gt;16.23&lt;/td&gt;
      &lt;td&gt;4.12&lt;/td&gt;
      &lt;td&gt;23.8&lt;/td&gt;
      &lt;td&gt;5.5&lt;/td&gt;
      &lt;td&gt;31.26&lt;/td&gt;
      &lt;td&gt;6.72&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q60&lt;/td&gt;
      &lt;td&gt;16.39&lt;/td&gt;
      &lt;td&gt;6.02&lt;/td&gt;
      &lt;td&gt;25.1&lt;/td&gt;
      &lt;td&gt;6.6&lt;/td&gt;
      &lt;td&gt;31.26&lt;/td&gt;
      &lt;td&gt;7.42&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q61&lt;/td&gt;
      &lt;td&gt;17.18&lt;/td&gt;
      &lt;td&gt;5.50&lt;/td&gt;
      &lt;td&gt;33.4&lt;/td&gt;
      &lt;td&gt;7.1&lt;/td&gt;
      &lt;td&gt;42.63&lt;/td&gt;
      &lt;td&gt;9.37&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q66&lt;/td&gt;
      &lt;td&gt;13.67&lt;/td&gt;
      &lt;td&gt;6.59&lt;/td&gt;
      &lt;td&gt;19.1&lt;/td&gt;
      &lt;td&gt;8.9&lt;/td&gt;
      &lt;td&gt;19.63&lt;/td&gt;
      &lt;td&gt;8.34&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q69&lt;/td&gt;
      &lt;td&gt;9.89&lt;/td&gt;
      &lt;td&gt;7.46&lt;/td&gt;
      &lt;td&gt;10.5&lt;/td&gt;
      &lt;td&gt;6.1&lt;/td&gt;
      &lt;td&gt;4.83&lt;/td&gt;
      &lt;td&gt;3.16&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q71&lt;/td&gt;
      &lt;td&gt;17.32&lt;/td&gt;
      &lt;td&gt;6.11&lt;/td&gt;
      &lt;td&gt;23.3&lt;/td&gt;
      &lt;td&gt;6.6&lt;/td&gt;
      &lt;td&gt;31.26&lt;/td&gt;
      &lt;td&gt;8.06&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q74&lt;/td&gt;
      &lt;td&gt;16.86&lt;/td&gt;
      &lt;td&gt;9.44&lt;/td&gt;
      &lt;td&gt;24.1&lt;/td&gt;
      &lt;td&gt;17.6&lt;/td&gt;
      &lt;td&gt;22.59&lt;/td&gt;
      &lt;td&gt;8.08&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q75&lt;/td&gt;
      &lt;td&gt;122.04&lt;/td&gt;
      &lt;td&gt;69.45&lt;/td&gt;
      &lt;td&gt;102.7&lt;/td&gt;
      &lt;td&gt;62.9&lt;/td&gt;
      &lt;td&gt;110.86&lt;/td&gt;
      &lt;td&gt;63.91&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q77&lt;/td&gt;
      &lt;td&gt;23.94&lt;/td&gt;
      &lt;td&gt;7.51&lt;/td&gt;
      &lt;td&gt;29.3&lt;/td&gt;
      &lt;td&gt;6.8&lt;/td&gt;
      &lt;td&gt;49.95&lt;/td&gt;
      &lt;td&gt;12.20&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q80&lt;/td&gt;
      &lt;td&gt;43.46&lt;/td&gt;
      &lt;td&gt;18.57&lt;/td&gt;
      &lt;td&gt;45.8&lt;/td&gt;
      &lt;td&gt;11.5&lt;/td&gt;
      &lt;td&gt;37.25&lt;/td&gt;
      &lt;td&gt;11.78&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q85&lt;/td&gt;
      &lt;td&gt;20.97&lt;/td&gt;
      &lt;td&gt;16.54&lt;/td&gt;
      &lt;td&gt;16.9&lt;/td&gt;
      &lt;td&gt;14.7&lt;/td&gt;
      &lt;td&gt;14.65&lt;/td&gt;
      &lt;td&gt;10.52&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/dynamic-partition-pruning/benchmark.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;18 TPC-DS queries improved runtime by over 50% while decreasing CPU usage by an average of 64%.
Data read was decreased by 66%.&lt;/li&gt;
  &lt;li&gt;7 TPC-DS queries improved by 30% to 50% while decreasing CPU usage by an average of 47%.
Data read was decreased by 54%.&lt;/li&gt;
  &lt;li&gt;29 TPC-DS queries improved by 10% to 30% while decreasing CPU by an average of 20%.
Data read was decreased by 27%.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note that the baseline here includes the improvements from the existing 
&lt;a href=&quot;https://github.com/trinodb/trino/pull/1686&quot;&gt;node local dynamic filtering&lt;/a&gt; implementation.&lt;/p&gt;

&lt;h1 id=&quot;discussion&quot;&gt;Discussion&lt;/h1&gt;

&lt;p&gt;In order for dynamic filtering to work, the smaller dimension table needs to be chosen as a join’s build side.
The cost-based optimizer can do this automatically using table statistics from the metastore.
Therefore, we generated table statistics prior to running this benchmark and relied on the CBO to correctly place
the smaller table on the build side of the join.&lt;/p&gt;
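&lt;p&gt;Table statistics can be collected with Presto’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ANALYZE&lt;/code&gt; statement, for example (the catalog and schema names here are illustrative):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ANALYZE hive.tpcds.store_sales;
ANALYZE hive.tpcds.date_dim;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;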

&lt;p&gt;It is quite common for large fact tables to be partitioned by dimensions like time.
Queries joining such tables with filtered dimension tables benefit significantly from dynamic partition pruning. 
This optimization is applicable to partitioned Hive tables stored in any data format.
It also works with both broadcast and partitioned joins. Other connectors can easily take advantage of dynamic filters 
by implementing the new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ConnectorSplitManager#getSplits&lt;/code&gt; API which supplies dynamic filters to the connector.&lt;/p&gt;

&lt;h1 id=&quot;future-work&quot;&gt;Future work&lt;/h1&gt;

&lt;ul&gt;
  &lt;li&gt;Support for using &lt;a href=&quot;https://github.com/trinodb/trino/pull/3871&quot;&gt;min-max range&lt;/a&gt; in DynamicFilterSourceOperator when 
the build-side contains too many values.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/3972&quot;&gt;Passing dynamic filters back to the worker nodes&lt;/a&gt; from coordinator 
to allow ORC and Parquet readers to use dynamic filters with partitioned joins.&lt;/li&gt;
  &lt;li&gt;Allow connectors to &lt;a href=&quot;https://github.com/trinodb/trino/pull/3414&quot;&gt;block probe-side scan&lt;/a&gt; until dynamic filters are ready.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/2674&quot;&gt;Support dynamic filtering with inequality operators&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/2190&quot;&gt;Support for semi-joins&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Take advantage of dynamic filters in connectors other than Hive.&lt;/li&gt;
&lt;/ul&gt;</content>

      
        <author>
          <name>Raunaq Morarka, Qubole and Karol Sobczak, Starburst Data</name>
        </author>
      

      <summary>Star-schema is one of the most widely used data mart patterns. The star schema consists of fact tables (usually partitioned) and dimension tables, which are used to filter rows from fact tables. Consider the following query which captures a common pattern of a fact table store_sales partitioned by the column ss_sold_date_sk joined with a filtered dimension table date_dim: SELECT COUNT(*) FROM store_sales JOIN date_dim ON store_sales.ss_sold_date_sk = date_dim.d_date_sk WHERE d_following_holiday=&apos;Y&apos; AND d_year = 2000; Without dynamic filtering, Presto will push predicates for the dimension table to the table scan on date_dim but it will scan all the data in the fact table since there are no filters on store_sales in the query. The join operator will end up throwing away most of the probe-side rows as the join criteria is highly selective. The current implementation of dynamic filtering improves on this, however it is limited only to broadcast joins on tables stored in ORC or Parquet format. Additionally, it does not take advantage of the layout of partitioned Hive tables. With dynamic partition pruning, which extends the current implementation of dynamic filtering, every worker node collects values eligible for the join from date_dim.d_date_sk column and passes it to the coordinator. Coordinator can then skip processing of the partitions of store_sales which don’t meet the join criteria. This greatly reduces the amount of data scanned from store_sales table by worker nodes. This optimization is applicable to any storage format and to both broadcast and partitioned join.</summary>

      
      
    </entry>
  
    <entry>
      <title>Hive ACID and transactional tables&apos; support in Presto</title>
      <link href="https://trino.io/blog/2020/06/01/hive-acid.html" rel="alternate" type="text/html" title="Hive ACID and transactional tables&apos; support in Presto" />
      <published>2020-06-01T00:00:00+00:00</published>
      <updated>2020-06-01T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/06/01/hive-acid</id>
<content type="html" xml:base="https://trino.io/blog/2020/06/01/hive-acid.html">&lt;p&gt;Hive ACID and transactional tables have been supported in Presto since the 331
release. Hive ACID support is an important step towards GDPR/CCPA compliance,
and also towards Hive 3 support as &lt;a href=&quot;https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.0/hive-overview/content/hive_upgrade_changes.html&quot;&gt;certain distributions&lt;/a&gt;
of Hive 3 create transactional tables by default.&lt;/p&gt;

&lt;p&gt;In this blog post we cover the concepts of Hive ACID and transactional
tables along with the changes done in Presto to support them. We also cover the
performance tests on this integration and look at the future plans for this
feature.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;how-to-use-hive-acid-and-transactional-tables-in-presto&quot;&gt;How to use Hive ACID and transactional tables in Presto&lt;/h1&gt;

&lt;p&gt;Hive transactional tables are readable in Presto without any need to tweak
configs; you only need to meet these requirements:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Use Presto version 331 or higher&lt;/li&gt;
  &lt;li&gt;Use Hive 3 Metastore Server. Presto does not support Hive transactional
tables created with Hive before version 3.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Note that Presto cannot create or write to Hive transactional tables yet. You
can create and write to Hive transactional tables via
&lt;a href=&quot;https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions&quot;&gt;Hive&lt;/a&gt;
or via Spark with &lt;a href=&quot;https://github.com/qubole/spark-acid&quot;&gt;Hive ACID Data Source plugin&lt;/a&gt; and
use Presto to read these tables.&lt;/p&gt;
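&lt;p&gt;As a sketch of this workflow, a CRUD transactional table can be created and populated from Hive, and then queried from Presto like any other table (the table, schema, and catalog names below are illustrative):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- In Hive, backed by a Hive 3 Metastore Server:
CREATE TABLE events (id BIGINT, status STRING)
STORED AS ORC
TBLPROPERTIES (&apos;transactional&apos;=&apos;true&apos;);

INSERT INTO events VALUES (1, &apos;open&apos;);

-- In Presto 331 or higher:
SELECT * FROM hive.default.events;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;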

&lt;h1 id=&quot;what-is-hive-acid-and-hive-transactional-tables&quot;&gt;What is Hive ACID and Hive transactional tables&lt;/h1&gt;
&lt;p&gt;Hive transactional tables are the tables in Hive that provide ACID semantics.
This excerpt from
&lt;a href=&quot;https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions&quot;&gt;Hive documentation&lt;/a&gt;
covers ACID traits well:&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;“ACID stands for four traits of database transactions:
Atomicity (an operation either succeeds completely or fails,
it does not leave partial data), Consistency (once an application performs an
operation the results of that operation are visible to it in every subsequent
operation), Isolation (an incomplete operation by one user does not cause
unexpected side effects for other users), and Durability (once an operation is
complete it will be preserved even in the face of machine or system failure).
These traits have long been expected of database systems as part of their
transaction functionality.“&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1 id=&quot;need-for-hive-acid-and-transactional-tables&quot;&gt;Need for Hive ACID and transactional tables&lt;/h1&gt;
&lt;p&gt;In any organisation, there is always a need to update or delete existing entries
in tables: for example, a user writes or updates the review for an item purchased a
week ago, or a transaction status changes after a day.
With regulations like GDPR/CCPA, updates and deletes become even more frequent, as
users can ask the organisation to delete their data, and organisations are
obligated to fulfill these requests.&lt;/p&gt;

&lt;p&gt;The standard practice to update data has been to overwrite the partition or
table with the updated data but this is inefficient and unreliable. It takes a
lot of resources to overwrite all of the existing data to update a few entries,
but more importantly there are issues around isolation when reads on old data
are going on and the overwrite starts deleting that data. To solve these issues
several solutions have been developed, many of them are covered
&lt;a href=&quot;https://www.qubole.com/blog/qubole-open-sources-multi-engine-support-for-updates-and-deletes-in-data-lakes/&quot;&gt;in this blog post&lt;/a&gt;,
and Hive ACID is one of them.&lt;/p&gt;

&lt;h1 id=&quot;concepts-of-hive-acid-and-transactional-tables&quot;&gt;Concepts of Hive ACID and transactional tables&lt;/h1&gt;

&lt;p&gt;Several concepts like transactions, WriteIds, deltas, locks, etc. are added in
Hive to achieve ACID semantics. To understand the changes done in Presto to
support Hive ACID and transactional tables, covered in the next section, it is
important to understand these concepts first. So let’s look at them in detail.&lt;/p&gt;

&lt;h2 id=&quot;types-of-hive-transactional-tables&quot;&gt;Types of Hive transactional tables&lt;/h2&gt;
&lt;p&gt;There are two types of Hive transactional tables: insert-only transactional
tables and CRUD transactional tables.
The following table compares the two:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th style=&quot;text-align: center&quot;&gt;Type of transactional table&lt;/th&gt;
      &lt;th style=&quot;text-align: center&quot;&gt;Hive DML Operations Supported&lt;/th&gt;
      &lt;th style=&quot;text-align: center&quot;&gt;Input Formats supported&lt;/th&gt;
      &lt;th style=&quot;text-align: center&quot;&gt;Synthetic columns in file?&lt;/th&gt;
      &lt;th style=&quot;text-align: center&quot;&gt;Additional Table Properties&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Insert-Only Transactional Tables&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;INSERT&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;All input formats&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;No&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&apos;transactional&apos;=&apos;true&apos;&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&apos;transactional_properties&apos;=&apos;insert_only&apos;&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;CRUD Transactional Tables&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;INSERT, UPDATE, DELETE&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;ORC&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Yes&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&apos;transactional&apos;=&apos;true&apos;&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h2 id=&quot;hive-transactions&quot;&gt;Hive Transactions&lt;/h2&gt;
&lt;p&gt;Hive transactional tables should only be accessed within Hive transactions.
Note that these transactions are different from Presto transactions and are
managed by Hive. Running each DML query under its own transaction provides
atomicity: each transaction is independent, and rolling it back has no impact on
the state of the table.&lt;/p&gt;

&lt;h2 id=&quot;writeids&quot;&gt;WriteIds&lt;/h2&gt;
&lt;p&gt;DML queries under a transaction write to a unique location under the
partition or table, described in detail in the “New Sub-Directories” section
below. This location is derived from the WriteId allocated to the transaction.
This provides isolation for DML queries, which can therefore run in parallel,
whenever possible, without interfering with each other.&lt;/p&gt;

&lt;h2 id=&quot;valid-writeids&quot;&gt;Valid WriteIds&lt;/h2&gt;
&lt;p&gt;Read queries under a transaction get a list of valid WriteIds, that is, the
WriteIds of transactions that were successfully committed. This ensures
consistency by making the results of committed transactions available to all
future transactions. It also provides isolation: DML and read queries can run in
parallel, and read queries never see partial data written by in-flight DML
queries.&lt;/p&gt;
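&lt;p&gt;The effect of valid WriteIds on reads can be pictured with a short sketch.
This is an illustration only, not Hive or Presto code: the directory names
follow the scheme covered in the “New Sub-Directories” section, and the function
name is hypothetical:&lt;/p&gt;

```python
# Illustrative sketch only: how a reader could select delta directories whose
# WriteIds all belong to committed transactions. Not actual Hive/Presto code.
def visible_dirs(directories, valid_write_ids):
    visible = []
    for name in directories:
        parts = name.split("_")
        start, end = int(parts[-2]), int(parts[-1])
        # Keep the directory only if every WriteId in its range was committed.
        if all(w in valid_write_ids for w in range(start, end + 1)):
            visible.append(name)
    return visible
```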

&lt;h2 id=&quot;new-sub-directories&quot;&gt;New Sub-Directories&lt;/h2&gt;
&lt;p&gt;Results of DML queries are written to a unique location derived from the
WriteId of the transaction. These unique locations are delta directories under
the partition or table location. Apart from the WriteId, the directory name
encodes the DML operation, and depending on the operation type there are two
kinds of delta directories:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Delete Delta Directory: This delta directory is created for results of
DELETE statements and is named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta_&amp;lt;writeId&amp;gt;_&amp;lt;writeId&amp;gt;&lt;/code&gt; under
partition/table location.&lt;/li&gt;
  &lt;li&gt;Delta Directory: This type is created for the results of INSERT statements
and is named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delta_&amp;lt;writeId&amp;gt;_&amp;lt;writeId&amp;gt;&lt;/code&gt; under partition/table location.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Apart from delta directories, there is one more kind of sub-directory, the
“base directory”, named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;base_&amp;lt;writeId&amp;gt;&lt;/code&gt; under the partition or table
location. This type of directory is created by an INSERT OVERWRITE TABLE query
or by major compaction, which is described later.&lt;/p&gt;
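&lt;p&gt;The naming scheme for these sub-directories can be summarized in a short
sketch. The helper functions are hypothetical and only illustrate how a name is
derived from a WriteId; real Hive additionally zero-pads the WriteIds, a detail
omitted here for clarity:&lt;/p&gt;

```python
# Illustrative sketch of the ACID sub-directory naming scheme described above.
def delta_dir(write_id):
    # Holds rows written by an INSERT under this WriteId.
    return "delta_{0}_{0}".format(write_id)

def delete_delta_dir(write_id):
    # Holds the rowIds removed by a DELETE under this WriteId.
    return "delete_delta_{0}_{0}".format(write_id)

def base_dir(write_id):
    # Created by INSERT OVERWRITE TABLE or by major compaction.
    return "base_{0}".format(write_id)
```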

&lt;p&gt;The following animation shows how these new sub-directories are created in the
filesystem along with transaction management at metastore with different
queries:
&lt;img src=&quot;/assets/blog/hive-acid/directories.gif&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;rowid&quot;&gt;RowID&lt;/h2&gt;
&lt;p&gt;To uniquely identify each row in the table, a synthetic rowId is created and
added to each row. RowIds are added to CRUD transactional tables only, because
they are needed only for DELETE statements. When a DELETE is performed, the
rowIds of the rows it deletes are written into the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt;
directory, and subsequent reads return all but these rows.&lt;/p&gt;

&lt;p&gt;RowId currently consists of five fields: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;operation&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;originalTransaction&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bucket&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rowId&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;currentTransaction&lt;/code&gt;, but the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;operation&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;currentTransaction&lt;/code&gt; fields
are now redundant.
RowId is added to the root STRUCT of the ORC file, hence the schema of ORC files
differs from the schema defined on the table, e.g.:&lt;/p&gt;

&lt;p&gt;Schema of CRUD transactional Hive Table:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;n_nationkey : int,
n_name : string,
n_regionkey : int,
n_comment : string
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Schema of ORC file for this table:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;struct {
    operation : int,
    originalTransaction : bigint,
    bucket : int,
    rowId : bigint,
    currentTransaction : bigint,
    row : struct {
        n_nationkey : int,
        n_name : string,
        n_regionkey : int,
        n_comment : string
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Note that one level of nesting of the table schema, like the inner struct
above, applies to flat Hive tables too. The two-level nesting of data columns is
added to the ORC files of CRUD transactional tables to keep the rowId columns
isolated from the data columns.&lt;/p&gt;

&lt;h2 id=&quot;compactions&quot;&gt;Compactions&lt;/h2&gt;
&lt;p&gt;The design described above, with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delta&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt; directories for each
transaction, makes DML queries execute fast but has
the following impact on read queries:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Many delta directories, each holding only a little data, slow down the
execution of read queries. This is the well-known small-files problem, where
engines end up spending more time opening files than actually processing the
data.&lt;/li&gt;
  &lt;li&gt;Cross-referencing all &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt; directories to remove all deleted rows
slows down reads.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To solve these problems, Hive compacts delta directories asynchronously at two
levels:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Minor Compaction: This compaction combines active &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delta&lt;/code&gt; directories into one
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delta&lt;/code&gt; directory and active &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt; directories into one &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt;
directory, thereby decreasing the number of small files. Limiting the scope of
this compaction to combining only &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delta&lt;/code&gt; directories keeps it fast. Minor
compaction is triggered automatically as soon as the count of active delta
directories reaches 10 (configurable). It creates new delta directories like
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delta_&amp;lt;start_write_id&amp;gt;_&amp;lt;end_write_id&amp;gt;&lt;/code&gt;, where [start_write_id, end_write_id]
is the range of existing delta directories that were compacted. A similar naming
convention is used for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt; directories.&lt;/li&gt;
  &lt;li&gt;Major Compaction: Minor compaction does not merge base, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delta&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt; directories together, as that requires rewriting the data with
only the non-deleted rows, which is time consuming. This work is handled by a
separate, less frequent, and longer-running compaction called Major compaction.
Major compaction is triggered when the total size of delta directories reaches
10% (configurable) of the base directory size, and it creates a new base
directory.&lt;/li&gt;
&lt;/ol&gt;
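&lt;p&gt;The naming of a minor-compacted directory can be sketched as follows. This
illustrates the naming convention only and is not Hive’s implementation; the
function name is hypothetical:&lt;/p&gt;

```python
# Illustrative sketch: minor compaction merges a run of delta directories
# into one directory named for the whole WriteId range it covers.
def compacted_name(delta_dirs):
    ids = []
    for name in delta_dirs:
        parts = name.split("_")
        ids.append(int(parts[-2]))
        ids.append(int(parts[-1]))
    # Preserve the delta vs. delete_delta prefix of the inputs.
    prefix = "delete_delta" if delta_dirs[0].startswith("delete_delta") else "delta"
    return "{0}_{1}_{2}".format(prefix, min(ids), max(ids))
```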

&lt;h2 id=&quot;locks&quot;&gt;Locks&lt;/h2&gt;
&lt;p&gt;Hive uses locks to control which operations can run in parallel on a
partition or table. For example, DML queries take a write lock on the partitions
they modify, while read queries take a read lock on the partitions they read.
The read locks taken by read queries prevent Hive from cleaning up delta
directories that have already been compacted but are still being read by a
query.&lt;/p&gt;

&lt;h1 id=&quot;changes-in-presto-to-support-hive-acid-and-transactional-ables&quot;&gt;Changes in Presto to support Hive ACID and transactional tables&lt;/h1&gt;

&lt;p&gt;At a high level, Presto changes in two places to support Hive ACID and
transactional tables: in the split generation logic that runs on the coordinator,
and in the ORC reader used on the workers.&lt;/p&gt;

&lt;h2 id=&quot;split-generation&quot;&gt;Split generation&lt;/h2&gt;

&lt;ol&gt;
  &lt;li&gt;Hive ACID state is set up in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SemiTransactionalHiveMetastore.beginQuery&lt;/code&gt;,
only for Hive transactional tables:
    &lt;ol&gt;
      &lt;li&gt;A new Hive transaction is opened per Query&lt;/li&gt;
      &lt;li&gt;A shared read-lock is obtained from Metastore server for the partitions
 read in the query&lt;/li&gt;
      &lt;li&gt;A heartbeat mechanism is set up to periodically inform the Metastore
 server that the query is still alive. The heartbeat frequency is obtained from
 the Metastore server but can be overridden with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive.transaction-heartbeat-interval&lt;/code&gt;
 property.&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BackgroundSplitLoader&lt;/code&gt; is set up with valid WriteIds for the partitions as
provided by Metastore server&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BackgroundSplitLoader.loadPartitions&lt;/code&gt; is called in an Executor to create
splits for each partition:
    &lt;ol&gt;
      &lt;li&gt;ACID sub-directories: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;base&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delta&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt; directories are
 discovered by listing the partition location&lt;/li&gt;
      &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DeleteDeltaLocations&lt;/code&gt;, a registry of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt; directories, is
 created. It contains minimal information through which &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt;
 directory paths can be recreated at workers.&lt;/li&gt;
      &lt;li&gt;HiveSplits are created for each base and delta directory location.
 Each HiveSplit contains the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DeleteDeltaLocations&lt;/code&gt;&lt;/li&gt;
      &lt;li&gt;If the table is an Insert-Only transactional table, then
 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DeleteDeltaLocations&lt;/code&gt; is empty and the HiveSplit is the same as a HiveSplit
 for a flat, non-transactional Hive table&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id=&quot;reading-hive-transactional-data-in-workers&quot;&gt;Reading Hive transactional data in workers&lt;/h2&gt;

&lt;p&gt;The HiveSplits generated during the split generation phase make their way to
worker nodes, where OrcPageSourceFactory is used to create the PageSource for
the TableScan operator.&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Insert-Only transactional tables are read the same way non-transactional
tables are read: an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcPageSource&lt;/code&gt; is created for their splits, which reads the
data for the split and makes it available to TableScanOperator&lt;/li&gt;
  &lt;li&gt;CRUD transactional tables need special handling during reads because their
file schema does not match the table schema, due to the synthetic RowId column,
which introduces the additional struct nesting mentioned earlier:
    &lt;ol&gt;
      &lt;li&gt;RowId columns are added to the list of columns to be read from file&lt;/li&gt;
      &lt;li&gt;The ORC reader is set up to access columns by name from the file instead
 of using column indexes from the table schema, equivalent to forcing
 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive.orc.use-column-names=true&lt;/code&gt; for CRUD transactional tables&lt;/li&gt;
      &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcRecordReader&lt;/code&gt; is created for the ORC file of the split&lt;/li&gt;
      &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcDeletedRows&lt;/code&gt; is created for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt; locations, if any.&lt;/li&gt;
      &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcPageSource&lt;/code&gt; is created, which returns the rows from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcRecordReader&lt;/code&gt;
 that are not present in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcDeletedRows&lt;/code&gt;. This cross-referencing of deleted
 rows is done lazily, for each &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Block&lt;/code&gt; of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Page&lt;/code&gt;, only when that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Block&lt;/code&gt;
 needs to be read from the PageSource. This works well with Presto’s lazy
 materialization logic, which skips over Blocks when a predicate does not
 apply to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Page&lt;/code&gt; at all.&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
&lt;/ol&gt;
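&lt;p&gt;Conceptually, the cross-referencing of deleted rows is a filter on the RowId
fields. This is a simplified sketch, not the actual &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcDeletedRows&lt;/code&gt; implementation, which works lazily on
Blocks; the function name is hypothetical:&lt;/p&gt;

```python
# Conceptual sketch of deleted-row filtering: a row survives if its
# (originalTransaction, bucket, rowId) triple is absent from the set
# collected from the delete_delta directories.
def filter_deleted(rows, deleted_keys):
    return [
        row for row in rows
        if (row["originalTransaction"], row["bucket"], row["rowId"]) not in deleted_keys
    ]
```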

&lt;h1 id=&quot;performance-numbers&quot;&gt;Performance numbers&lt;/h1&gt;
&lt;p&gt;Each INSERT on a Hive transactional table can create additional splits for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delta&lt;/code&gt;
directories, and each DELETE can create &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt; directories that add
the extra work of cross-referencing deleted rows while reading a split. To
measure the impact of these operations on reads from Presto, we ran the
following performance tests, where multiple Hive transactional tables were
created with a varying number of INSERT and DELETE operations, and the runtime
of different read-focused Presto queries was recorded:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th style=&quot;text-align: center&quot;&gt;Table Type&lt;/th&gt;
      &lt;th style=&quot;text-align: center&quot;&gt;Description&lt;/th&gt;
      &lt;th style=&quot;text-align: center&quot;&gt;delta directories&lt;/th&gt;
      &lt;th style=&quot;text-align: center&quot;&gt;delete_delta directories&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Flat&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;TPCDS store_sales scale 3000 table, 8.6B rows&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;0&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Only Base&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Hive transactional store_sales scale 3000 table: 8.6B rows&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;0&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Base + 1-Delete&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Derived from “Only Base” with rows having customer_id=100 deleted by 1 DELETE query: 347 deleted entries&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;0&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;1&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Base + 1-Delete + 1-Insert&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Derived from “Base + 1 Delete” with deleted rows added back by 1 INSERT query: 347 deleted entries + 347 inserted entries&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;1&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;1&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Base + 5-Deletes&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Derived from “Only Base” with rows for 5 customer_ids deleted by 5 DELETE queries: 1355 rows deleted&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;0&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;5&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Base + 5-Deletes + 5-Inserts&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Derived from “Base + 1 Delete” with deleted rows added back by 5 INSERT queries: 1355 deleted entries + 1355 inserted entries&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;5&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;5&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Following are the results of these tests, run on a cluster of 5 c3.4xlarge
machines on AWS:
&lt;img src=&quot;/assets/blog/hive-acid/perf.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The results show that deleted rows have an impact on read performance, which
is expected, as the work for the reader increases in this case. With predicates
in place, however, this impact was reduced, because the amount of data to be
read goes down.&lt;/p&gt;

&lt;h1 id=&quot;ongoing-and-future-work&quot;&gt;Ongoing and Future work&lt;/h1&gt;
&lt;p&gt;There has been ongoing work on the Hive ACID integration, and some
improvements are planned for the future, notably:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Bucketed Hive transactional table support has been added (&lt;a href=&quot;https://github.com/trinodb/trino/pull/1591&quot;&gt;#1591&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Support for original files is in progress (&lt;a href=&quot;https://github.com/trinodb/trino/pull/2930&quot;&gt;#2930&lt;/a&gt;),
this will allow Presto to read the Hive tables that were converted to
transactional table at some point after having non-transactional data&lt;/li&gt;
  &lt;li&gt;Write support will be taken up in the future (&lt;a href=&quot;https://github.com/trinodb/trino/issues/1956&quot;&gt;#1956&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;There is ongoing work on Hive side for ACID on Parquet format. Once that
lands, Presto’s implementation will be extended to support Parquet too.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;acknowledgements-and-conclusion&quot;&gt;Acknowledgements and Conclusion&lt;/h1&gt;
&lt;p&gt;Thanks to the folks who helped out in the development of this feature:
&lt;a href=&quot;https://www.linkedin.com/in/abhishek-somani-a946aa1b&quot;&gt;Abhishek Somani&lt;/a&gt; provided
continuous guidance on internals of Hive ACID,
&lt;a href=&quot;https://www.linkedin.com/in/dainsundstrom&quot;&gt;Dain&lt;/a&gt; helped out with simplifying
ORC reader and along with &lt;a href=&quot;https://www.linkedin.com/in/piotrfindeisen/&quot;&gt;Piotr&lt;/a&gt;
helped in code refinement and with multiple rounds of reviews.&lt;/p&gt;

&lt;p&gt;While we continue development on this feature toward full-fledged support,
including writes, you can start using it on Hive transactional tables that do
not have files in flat format. If you have such tables and want to use Presto
with them, you can apply &lt;a href=&quot;https://github.com/trinodb/trino/pull/2930&quot;&gt;this fix&lt;/a&gt;
to your Presto installation, or you can trigger a major compaction on all
partitions to migrate the full table into the CRUD transactional table format.&lt;/p&gt;</content>

      
        <author>
          <name>Shubham Tagra, Qubole</name>
        </author>
      

      <summary>Hive ACID and transactional tables are supported in Presto since the 331 release. Hive ACID support is an important step towards GDPR/CCPA compliance, and also towards Hive 3 support as certain distributions of Hive 3 create transactional tables by default. In this blog post we cover the concepts of Hive ACID and transactional tables along with the changes done in Presto to support them. We also cover the performance tests on this integration and look at the future plans for this feature.</summary>

      
      
    </entry>
  
    <entry>
      <title>Apache Pinot Connector</title>
      <link href="https://trino.io/blog/2020/05/25/pinot-connector.html" rel="alternate" type="text/html" title="Apache Pinot Connector" />
      <published>2020-05-25T00:00:00+00:00</published>
      <updated>2020-05-25T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/05/25/pinot-connector</id>
      <content type="html" xml:base="https://trino.io/blog/2020/05/25/pinot-connector.html">&lt;p&gt;Presto 334 introduces the new &lt;a href=&quot;https://trino.io/docs/current/connector/pinot.html&quot;&gt;Pinot Connector&lt;/a&gt;
which allows Presto to query data stored in &lt;a href=&quot;https://pinot.apache.org/&quot;&gt;Apache Pinot™&lt;/a&gt;.
Not only does this allow access to Pinot tables but gives users the ability to do things they could not do with Pinot
alone such as join Pinot tables to other tables and use Presto’s scalar functions, window functions and complex aggregations.&lt;/p&gt;

&lt;p&gt;Pinot UDFs can be used directly by including the Pinot SQL query in quotes, as explained below in the &lt;em&gt;Pinot SQL Passthrough&lt;/em&gt; section.
This enables aggregations and other complex query types to be executed directly in Pinot.&lt;/p&gt;

&lt;p&gt;This connector supports Pinot 0.3.0 and newer.&lt;/p&gt;

&lt;h1 id=&quot;setup&quot;&gt;Setup&lt;/h1&gt;

&lt;p&gt;Create a properties file in the catalog directory, such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;etc/catalog/pinot.properties&lt;/code&gt; which includes at least the
following to get started:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;connector.name=pinot
pinot.controller-urls=host1:9000,host2:9000
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pinot.controller-urls&lt;/code&gt; property is a comma-separated list of controller hosts. If Pinot is deployed via &lt;a href=&quot;https://kubernetes.io/&quot;&gt;Kubernetes&lt;/a&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pinot.controller-urls&lt;/code&gt; needs to point to the controller Service endpoint. The Pinot brokers and servers must be accessible
via DNS, as Pinot returns hostnames and not IP addresses.&lt;/p&gt;

&lt;p&gt;If you have fewer Pinot servers than Presto workers, or a relatively small number of rows per Pinot segment,
you can minimize the number of requests to Pinot by increasing the number of Pinot segments per split (the default is 1 segment per split):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;pinot.segments-per-split=15
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
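&lt;p&gt;The effect of this setting is easy to estimate: the number of splits, and
therefore requests to Pinot servers, is roughly the segment count divided by
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pinot.segments-per-split&lt;/code&gt;. A rough back-of-the-envelope sketch, ignoring per-server grouping
details; the function name is hypothetical:&lt;/p&gt;

```python
import math

# Rough estimate only: with N segments per split, a table with S segments
# produces about ceil(S / N) splits.
def estimated_splits(num_segments, segments_per_split=1):
    return math.ceil(num_segments / segments_per_split)
```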

&lt;p&gt;If DNS resolution is slow or you get &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Request timed out&lt;/code&gt; errors, you can increase the request timeout as follows:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;pinot.request-timeout=3m
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h1 id=&quot;schema&quot;&gt;Schema&lt;/h1&gt;

&lt;p&gt;Pinot supports the following data types. Null values are currently not supported. The corresponding Presto data types are:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Pinot Datatype&lt;/th&gt;
      &lt;th&gt;Presto Datatype&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;boolean&lt;/td&gt;
      &lt;td&gt;boolean&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;integer&lt;/td&gt;
      &lt;td&gt;integer&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;float, double&lt;/td&gt;
      &lt;td&gt;double&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;string, bytes*&lt;/td&gt;
      &lt;td&gt;varchar&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;integer_array&lt;/td&gt;
      &lt;td&gt;array(integer)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;float_array, double_array&lt;/td&gt;
      &lt;td&gt;array(double)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;long_array&lt;/td&gt;
      &lt;td&gt;array(bigint)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;string_array&lt;/td&gt;
      &lt;td&gt;array(varchar)&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;ul&gt;
  &lt;li&gt;The Pinot &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bytes&lt;/code&gt; type is converted to a hex-encoded varchar. See the &lt;a href=&quot;https://pinot.apache.org/&quot;&gt;Pinot docs&lt;/a&gt; for more information.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;pinot-sql-passthrough&quot;&gt;Pinot SQL Passthrough&lt;/h1&gt;

&lt;p&gt;If you would like to leverage Pinot’s fast aggregations, you can use a “dynamic” table, where you specify the Pinot SQL
query as the table name and it is passed directly to Pinot:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pinot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;default&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;&quot;SELECT col3, col4, MAX(col1), COUNT(col2) FROM pinot_table GROUP BY col3, col4&quot;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col3&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;IN&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;FOO&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;BAR&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col4&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;30000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The filter in the outer Presto query is pushed down into the Pinot query via Presto’s
&lt;a href=&quot;https://github.com/trinodb/trino/blob/334/presto-spi/src/main/java/io/prestosql/spi/connector/ConnectorMetadata.java#L746&quot;&gt;applyFilter()&lt;/a&gt;.
These queries are routed to the broker and should not return huge amounts of data, as broker queries currently return a
single response with all the results. This approach is better suited to aggregate queries.&lt;/p&gt;

&lt;p&gt;Limits are pushed into the “dynamic” Pinot query via Presto’s
&lt;a href=&quot;https://github.com/trinodb/trino/blob/334/presto-spi/src/main/java/io/prestosql/spi/connector/ConnectorMetadata.java#L727&quot;&gt;applyLimit()&lt;/a&gt;.
Pinot functions such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PERCENTILEEST&lt;/code&gt; can be used in the quoted SQL.
The query above would yield the following Pinot PQL query:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;MAX&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;col1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;COUNT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;col2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pinot_table&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col3&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;IN&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;FOO&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;BAR&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col4&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;30000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If you are returning a larger dataset, you can issue a normal Presto query, which is routed to the Pinot servers that
store the Pinot segments. Filters and limits are pushed down to Pinot for regular queries as well.&lt;/p&gt;

&lt;h1 id=&quot;future-work&quot;&gt;Future Work&lt;/h1&gt;

&lt;p&gt;As Presto and Pinot continue to evolve, the Pinot connector will leverage new features such as aggregation pushdown and more.&lt;/p&gt;</content>

      
        <author>
          <name>Elon Azoulay</name>
        </author>
      

      <summary>Presto 334 introduces the new Pinot Connector which allows Presto to query data stored in Apache Pinot™. Not only does this allow access to Pinot tables but gives users the ability to do things they could not do with Pinot alone such as join Pinot tables to other tables and use Presto’s scalar functions, window functions and complex aggregations.</summary>

      
      
    </entry>
  
    <entry>
      <title>State of Presto</title>
      <link href="https://trino.io/blog/2020/05/15/state-of-presto.html" rel="alternate" type="text/html" title="State of Presto" />
      <published>2020-05-15T00:00:00+00:00</published>
      <updated>2020-05-15T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/05/15/state-of-presto</id>
      <content type="html" xml:base="https://trino.io/blog/2020/05/15/state-of-presto.html">&lt;p&gt;Presto is continuing to gain adoption across many industries and use cases. Our
community is growing rapidly and there is a lot going on, so we are taking the
Presto Summit online. And we are starting with a State of Presto webinar with
the founders of the project.&lt;/p&gt;

&lt;p&gt;Update:&lt;/p&gt;

&lt;p&gt;We had a great event with lots of questions from the audience, taking us beyond
the planned time frame. Check out the recording to learn more:&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/epdgIsAT3EA&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;!--more--&gt;

&lt;p&gt;Join us virtually to hear Presto co-creators 
&lt;a href=&quot;https://github.com/martint&quot;&gt;Martin Traverso&lt;/a&gt;,
&lt;a href=&quot;https://github.com/dain&quot;&gt;Dain Sundstrom&lt;/a&gt;, and 
&lt;a href=&quot;https://github.com/electrum&quot;&gt;David Phillips&lt;/a&gt; talk about the state of Presto,
followed by a live Q&amp;amp;A moderated by Presto maintainer
&lt;a href=&quot;https://github.com/findepi&quot;&gt;Piotr Findeisen&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Agenda:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;2020 project milestones&lt;/li&gt;
  &lt;li&gt;Community and technical growth&lt;/li&gt;
  &lt;li&gt;Recent Presto updates&lt;/li&gt;
  &lt;li&gt;Project roadmap&lt;/li&gt;
  &lt;li&gt;Live Q&amp;amp;A&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Date: Thursday, 21 May 2020&lt;/p&gt;

&lt;p&gt;Time: 11am PDT (San Francisco), 2pm EDT (New York), 7pm BST (London), 6pm UTC&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h2 id=&quot;register-now&quot;&gt;&lt;a href=&quot;https://www.starburstdata.com/webinar-state-of-presto/?utm_campaign=Webinar%20-%20State%20of%20Presto%20-%202020%20-%20May&amp;amp;utm_source=trino.io&amp;amp;utm_medium=blog&quot;&gt;Register now!&lt;/a&gt;&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;p&gt;We look forward to many questions and a lively webinar.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>Presto is continuing to gain adoption across many industries and use cases. Our community is growing rapidly and there is a lot going on, so we are taking the Presto Summit online. And we are starting with a State of Presto webinar with the founders of the project. Update: We had a great event with lots of questions from the audience, taking us beyond the planned time frame. Check out the recording to learn more:</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto on FLOSS Weekly</title>
      <link href="https://trino.io/blog/2020/05/06/floss-weekly.html" rel="alternate" type="text/html" title="Presto on FLOSS Weekly" />
      <published>2020-05-06T00:00:00+00:00</published>
      <updated>2020-05-06T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/05/06/floss-weekly</id>
<content type="html" xml:base="https://trino.io/blog/2020/05/06/floss-weekly.html">&lt;p&gt;Spreading the word about our project is an important task to grow the community
around Presto. With a large, lively community we can ensure the success of
Presto. Today we had the opportunity to talk about Presto on the long-running
open source podcast &lt;a href=&quot;https://twit.tv/shows/floss-weekly&quot;&gt;FLOSS Weekly&lt;/a&gt;.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;&lt;a href=&quot;http://www.stonehenge.com/merlyn/&quot;&gt;Randal Schwartz&lt;/a&gt; was joined by his co-host
&lt;a href=&quot;https://webmink.com/about/&quot;&gt;Simon Phipps&lt;/a&gt;. We introduced Presto overall and
talked about use cases of Presto and the problems it can solve. Both hosts, as
well as the live audience, had some great questions and we did our best to
answer them.&lt;/p&gt;

&lt;p&gt;We moved through the history of Presto, current users and usage, the community
around the project, and Dain talked about some of the upcoming improvements. In
the end it seemed like we just scratched the surface and all wanted to keep
talking about the project.&lt;/p&gt;

&lt;p&gt;It was a great conversation and you should check it out!&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h2 id=&quot;watch-a-recording-of-the-presto-episode-of-floss-weekly-now&quot;&gt;&lt;a href=&quot;https://twit.tv/shows/floss-weekly/episodes/577?autostart=false&quot;&gt;Watch a recording of the Presto episode of FLOSS Weekly now!&lt;/a&gt;&lt;/h2&gt;
&lt;/blockquote&gt;</content>

      
        <author>
          <name>Dain Sundstrom and Manfred Moser</name>
        </author>
      

      <summary>Spreading the word about our project is an important task to grow the community around Presto. With a large, lively community we can ensure the success of Presto. Today we had the opportunity to talk about Presto on the long running open source podcast FLOSS Weekly.</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto: The Definitive Guide</title>
      <link href="https://trino.io/blog/2020/04/11/the-definitive-guide.html" rel="alternate" type="text/html" title="Presto: The Definitive Guide" />
      <published>2020-04-11T00:00:00+00:00</published>
      <updated>2020-04-11T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/04/11/the-definitive-guide</id>
      <content type="html" xml:base="https://trino.io/blog/2020/04/11/the-definitive-guide.html">&lt;p&gt;Nearly two years ago Matt and Martin got the ball rolling on getting a book
about Presto happening. A thriving project and community like everyone around
Dain, David and Martin, the founders and creators of Presto, just needs a book.
Even in this digital age of online documentation, communities on chat and other
platforms, and videos everywhere, there is great value in a well structured and
written book. Today, we are happy to announce that our book &lt;strong&gt;Presto: The
Definitive Guide&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h2 id=&quot;get-a-free-copy-of-trino-the-definitive-guide-from-starburst-now&quot;&gt;&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;Get a free copy of Trino: The Definitive Guide&lt;/a&gt; from &lt;a href=&quot;https://www.starburst.io&quot;&gt;Starburst&lt;/a&gt; now!&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;p&gt;This first book about Presto is finally available for you all to get, read and
hopefully learn from.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Update April 2021&lt;/strong&gt;: The project has moved to the
&lt;a href=&quot;/blog/2020/12/27/announcing-trino.html&quot;&gt;new name Trino&lt;/a&gt;, and the content
of our book
&lt;a href=&quot;/blog/2021/04/21/the-definitive-guide.html&quot;&gt;has been updated&lt;/a&gt; to
&lt;a href=&quot;/trino-the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;!--more--&gt;

&lt;p&gt;&lt;img src=&quot;/assets/ttdg-cover.png&quot; align=&quot;right&quot; style=&quot;float: right; margin-left: 20px; margin-bottom: 20px; width: 100%; max-width: 350px;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;With the help of O’Reilly, the book is now available in digital form, and paper
copies are just around the corner as well. You can find more information about
the book on &lt;a href=&quot;/trino-the-definitive-guide.html&quot;&gt;our permanent page about
it&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It is based on the very recent 330 release of Presto, but applicable to any
Presto version. The book is broken up into three separate parts. No matter if
you are a beginner keen to learn, maybe with just a bit of command line and SQL
knowledge, or an advanced or even expert Presto user, we are certain that you
can learn something from the book and encourage you to check it out.&lt;/p&gt;

&lt;p&gt;The first part of the book establishes what Presto is, and gets you quick wins
to install a minimal setup, run it, connect to it with the CLI and an
application using the JDBC driver and run some SQL queries.&lt;/p&gt;

&lt;p&gt;The second part dives into the details of the Presto architecture, query
planning, connectors for all sorts of data sources and SQL usage. There is a lot
to learn and digest in these main sections.&lt;/p&gt;

&lt;p&gt;In the third part we round things out with tuning tips, a good overview
of the Web UI, usage of other tools, security configuration and more tips to get
Presto into production.&lt;/p&gt;

&lt;p&gt;Of course, putting all this information together requires work from many people.
And in fact we did get lots of help from members of the Presto community and
O’Reilly.&lt;/p&gt;

&lt;p&gt;Specifically, we have some great news from our major supporter, Starburst!
Starburst allowed us to work on the book and bring it across the finish line.&lt;/p&gt;

&lt;p&gt;And that turns out to be great news for you all as well. Not only is the book
finished now, you can also get a
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;free digital copy of Trino: The Definitive Guide&lt;/a&gt;
from &lt;a href=&quot;https://www.starburst.io&quot;&gt;Starburst&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;So what are you waiting for? Go get a copy, check out the &lt;a href=&quot;https://github.com/trinodb/trino-the-definitive-guide&quot;&gt;code repository for
the book&lt;/a&gt;, provide
feedback and contact us on &lt;a href=&quot;/slack.html&quot;&gt;Slack&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Looking forward to it all!&lt;/p&gt;

&lt;p&gt;Matt, Manfred and Martin&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Exhausted, but happy authors&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Matt Fuller, Manfred Moser and Martin Traverso</name>
        </author>
      

      <summary>Nearly two years ago Matt and Martin got the ball rolling on a book about Presto. A thriving project and community like the one around Dain, David and Martin, the founders and creators of Presto, simply needs a book. Even in this digital age of online documentation, communities on chat and other platforms, and videos everywhere, there is great value in a well structured and written book. Today, we are happy to announce our book, Presto: The Definitive Guide. Get a free copy of Trino: The Definitive Guide from Starburst now! This first book about Presto is finally available for you all to get, read and hopefully learn from. Update April 2021: The project has moved to the new name Trino, and the content of our book has been updated to Trino: The Definitive Guide.</summary>

      
      
    </entry>
  
    <entry>
      <title>Beyond LIMIT, Presto meets OFFSET and TIES</title>
      <link href="https://trino.io/blog/2020/02/03/beyond-limit-presto-meets-offset-and-ties.html" rel="alternate" type="text/html" title="Beyond LIMIT, Presto meets OFFSET and TIES" />
      <published>2020-02-03T00:00:00+00:00</published>
      <updated>2020-02-03T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/02/03/beyond-limit-presto-meets-offset-and-ties</id>
      <content type="html" xml:base="https://trino.io/blog/2020/02/03/beyond-limit-presto-meets-offset-and-ties.html">&lt;p&gt;Presto follows the SQL Standard faithfully. We extend it only when it is well justified,
we strive to never break it and we always prefer the standard way of doing things.
There was one situation where we stumbled, though. We had a non-standard way of limiting
query results with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT n&lt;/code&gt; without implementing the standard way of doing that first.
We have corrected that, adding the ANSI SQL way of limiting query results, discarding initial
results and – a hidden gem – retaining initial results in case of ties.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;limiting-query-results&quot;&gt;Limiting query results&lt;/h1&gt;

&lt;p&gt;Probably everyone using relational databases knows the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT n&lt;/code&gt; syntax for limiting query
results. It is supported by e.g. MySQL, PostgreSQL and many more SQL engines following
their example. It is so common that one could think that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT n&lt;/code&gt; is the standard way
of limiting the query results.  Let’s have a look at how various popular SQL engines
provide this feature.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;DB2, MySQL, MariaDB, PostgreSQL, Redshift, MemSQL, SQLite and many others provide the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;... LIMIT n&lt;/code&gt; syntax.&lt;/li&gt;
  &lt;li&gt;SQL Server provides &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT TOP n ...&lt;/code&gt; syntax.&lt;/li&gt;
  &lt;li&gt;Oracle provides &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;... WHERE ROWNUM &amp;lt;= n&lt;/code&gt; syntax.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And what does the SQL Standard say?&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;my_table&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FETCH&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FIRST&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ROWS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ONLY&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If we look again at the database systems mentioned above, it turns out many of them support the standard
syntax too: Oracle, DB2, SQL Server and PostgreSQL (although that’s not documented currently).&lt;/p&gt;

&lt;p&gt;And Presto? Presto has supported &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT n&lt;/code&gt; since 2012. In &lt;a href=&quot;https://trino.io/docs/current/release/release-310.html&quot;&gt;Presto 310&lt;/a&gt;,
we also added support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH FIRST n ROWS ONLY&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Let’s have a look beyond the limits.&lt;/p&gt;

&lt;h1 id=&quot;tie-break&quot;&gt;Tie break&lt;/h1&gt;

&lt;p&gt;Admittedly, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH FIRST n ROWS ONLY&lt;/code&gt; syntax is far more verbose than the short &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT n&lt;/code&gt; syntax Presto
has always supported (and still does). However, it is also more powerful: it allows selecting the “top n
rows, ties included”. Consider a case where you want to list the top 3 students with the highest score on an exam.
What happens if the 3&lt;sup&gt;rd&lt;/sup&gt;, 4&lt;sup&gt;th&lt;/sup&gt; and 5&lt;sup&gt;th&lt;/sup&gt; students have an equal score? Which
one should be returned? Instead of getting an arbitrary (and non-deterministic) result, you can use
the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH FIRST n ROWS WITH TIES&lt;/code&gt; syntax:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;student_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;score&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;student&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;exam_result&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;student_id&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;score&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FETCH&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FIRST&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ROWS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TIES&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH FIRST n ROWS WITH TIES&lt;/code&gt; clause retains every row whose values of the ordering keys (the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; clause) equal those of
the last row that the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH FIRST n ROWS ONLY&lt;/code&gt; clause would return.&lt;/p&gt;
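&lt;p&gt;As a self-contained illustration, consider an inline table of hypothetical exam scores
(the names and values below are made up):&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT name, score
FROM (VALUES ('Alice', 90), ('Bob', 85), ('Carol', 85), ('Dave', 80)) AS t(name, score)
ORDER BY score DESC
FETCH FIRST 2 ROWS WITH TIES
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Although only 2 rows are requested, the query returns Alice, Bob and Carol, because Carol
ties with Bob on the ordering key.&lt;/p&gt;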

&lt;h1 id=&quot;offset&quot;&gt;Offset&lt;/h1&gt;

&lt;p&gt;Per the SQL Standard, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH FIRST n ROWS ONLY&lt;/code&gt; clause can be prepended with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OFFSET m&lt;/code&gt; to skip the first &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;m&lt;/code&gt; rows.
In that case, it makes sense to use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH NEXT ...&lt;/code&gt; variant of the clause – it is allowed both with and without &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OFFSET&lt;/code&gt;,
but reads better when &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OFFSET&lt;/code&gt; is present.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;student_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;score&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;student&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;exam_result&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;student_id&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;score&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;OFFSET&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FETCH&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NEXT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ROWS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TIES&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As an extension to the SQL Standard, and for brevity, we also allow &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OFFSET&lt;/code&gt; together with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;student_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;score&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;student&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;exam_result&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;student_id&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;score&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;OFFSET&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h1 id=&quot;concluding-notes&quot;&gt;Concluding notes&lt;/h1&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt; / &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH FIRST ... ROWS ONLY&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH FIRST ... WITH TIES&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OFFSET&lt;/code&gt; are powerful and very useful clauses
that come especially handy when writing ad-hoc queries over big data sets. They offer certain syntactic freedom beyond
what is described here, so check out documentation of &lt;a href=&quot;/docs/current/sql/select.html#offset-clause&quot;&gt;OFFSET Clause&lt;/a&gt; and
&lt;a href=&quot;/docs/current/sql/select.html#limit-or-fetch-first-clauses&quot;&gt;LIMIT or FETCH FIRST Clauses&lt;/a&gt; for all the options.
Since semantics of these clauses depend on query results being well ordered, they are best used with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; that
defines proper ordering. Without proper ordering the results are arbitrary (except for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WITH TIES&lt;/code&gt;) which may or may
not be a problem, depending on the use case.&lt;/p&gt;

&lt;p&gt;For scheduled queries, or queries that are part of some workflow (as opposed to ad-hoc), we recommend using query
predicates (where relevant) instead of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OFFSET&lt;/code&gt;. Read more at
&lt;a href=&quot;https://use-the-index-luke.com/sql/partial-results/fetch-next-page&quot;&gt;https://use-the-index-luke.com/sql/partial-results/fetch-next-page&lt;/a&gt;.&lt;/p&gt;
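&lt;p&gt;For example, instead of paging with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OFFSET&lt;/code&gt;, a query can carry the last ordering-key value seen on the
previous page as a predicate (the cutoff value 80 below is hypothetical, and ties across page boundaries need extra care):&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT student_name, score
FROM student s JOIN exam_result e ON s.id = e.student_id
WHERE score &amp;lt; 80
ORDER BY score DESC
FETCH FIRST 3 ROWS ONLY
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Unlike &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OFFSET&lt;/code&gt;-based paging, which must compute and discard all skipped rows, this form stays cheap and
returns stable pages even when rows are inserted between fetches.&lt;/p&gt;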

&lt;p&gt;□&lt;/p&gt;</content>

      
        <author>
          <name>Piotr Findeisen, Starburst Data</name>
        </author>
      

      <summary>Presto follows the SQL Standard faithfully. We extend it only when it is well justified, we strive to never break it and we always prefer the standard way of doing things. There was one situation where we stumbled, though. We had a non-standard way of limiting query results with LIMIT n without implementing the standard way of doing that first. We have corrected that, adding the ANSI SQL way of limiting query results, discarding initial results and – a hidden gem – retaining initial results in case of ties.</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto in 2019: Year in Review</title>
      <link href="https://trino.io/blog/2020/01/01/2019-summary.html" rel="alternate" type="text/html" title="Presto in 2019: Year in Review" />
      <published>2020-01-01T00:00:00+00:00</published>
      <updated>2020-01-01T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/01/01/2019-summary</id>
      <content type="html" xml:base="https://trino.io/blog/2020/01/01/2019-summary.html">&lt;p&gt;What a great year for the Presto community! We started with the year with the launch of the 
&lt;a href=&quot;/blog/2019/01/31/presto-software-foundation-launch.html&quot;&gt;Presto Software Foundation&lt;/a&gt;, 
with the long term goal of ensuring the project remains collaborative, open and independent from 
any corporate interest, for years to come.&lt;/p&gt;

&lt;p&gt;Since then, the community around Presto has grown and consolidated. We’ve seen contributions 
from more than 120 people across over 20 companies. Every week, 280 users and developers 
interact in the project’s &lt;a href=&quot;/slack.html&quot;&gt;Slack channel&lt;/a&gt;. We’d like to take this opportunity to thank 
everyone who contributed to the project in one way or another. Presto wouldn’t be what it is without your 
help.&lt;/p&gt;

&lt;p&gt;With the collaboration of companies such as &lt;a href=&quot;https://starburstdata.com&quot;&gt;Starburst&lt;/a&gt;, &lt;a href=&quot;https://qubole.com&quot;&gt;Qubole&lt;/a&gt;, 
&lt;a href=&quot;https://varada.io&quot;&gt;Varada&lt;/a&gt;, &lt;a href=&quot;https://twitter.com&quot;&gt;Twitter&lt;/a&gt;, &lt;a href=&quot;https://www.treasuredata.com&quot;&gt;ARM Treasure Data&lt;/a&gt;,
&lt;a href=&quot;https://wix.com&quot;&gt;Wix&lt;/a&gt;, &lt;a href=&quot;https://www.redhat.com&quot;&gt;Red Hat&lt;/a&gt;, and the &lt;a href=&quot;https://www.meetup.com/Big-things-are-happening-here/&quot;&gt;Big Things community&lt;/a&gt;,
we ran several Presto summits across the world:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2019/05/03/Presto-Conference-Israel.html&quot;&gt;Tel Aviv, Israel, April 2019&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2019/05/17/Presto-Summit.html&quot;&gt;San Francisco, USA, June 2019&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2019/07/11/report-for-presto-conference-tokyo.html&quot;&gt;Tokyo, Japan, July 2019&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2019/09/05/Presto-Summit-Bangalore.html&quot;&gt;Bangalore, India, September 2019&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.starburstdata.com/technical-blog/nyc-presto-summit-recap/&quot;&gt;New York, USA, December 2019&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All these events were a huge success and brought thousands of Presto users, contributors and other community members together to 
share their knowledge and experiences.&lt;/p&gt;

&lt;p&gt;The project has been more active than ever. We completed 28 releases comprising more than 2,850 
commits in over 1,500 pull requests. Of course, that alone is not a good measure of progress, so 
let’s take a closer look at everything that went in. And there is a lot to look at!&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;language-features&quot;&gt;Language Features&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;/docs/current/sql/select.html#limit-or-fetch-first-clauses&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH FIRST n ROWS [ONLY | WITH TIES]&lt;/code&gt;&lt;/a&gt; 
standard syntax. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WITH TIES&lt;/code&gt; clause is particularly useful when some of the rows have the same 
value for the columns used to order the results of a query. Consider a case where you want to 
list the top 5 students with the highest score on an exam. If the 6th person has the same score as the 5th, you 
want to know this as well, instead of getting an arbitrary and non-deterministic result:&lt;/p&gt;

    &lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;student_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;score&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;student&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;exam_result&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;USING&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;student_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;score&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FETCH&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FIRST&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ROWS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TIES&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/sql/select.html#offset-clause&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OFFSET&lt;/code&gt;&lt;/a&gt; syntax, which is especially useful in ad-hoc queries.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/sql/comment.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;COMMENT ON &amp;lt;table&amp;gt;&lt;/code&gt;&lt;/a&gt; syntax to 
set or remove table comments. Comments can be shown via &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DESCRIBE&lt;/code&gt;
or the new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;system.metadata.table_comments&lt;/code&gt; table.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LATERAL&lt;/code&gt; in the context of an outer join.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNNEST&lt;/code&gt; in the context of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LEFT JOIN&lt;/code&gt;. With this feature, it is now possible 
to preserve the outer row when the array contains zero elements or is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NULL&lt;/code&gt;. Most common usages
of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNNEST&lt;/code&gt; in a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CROSS JOIN&lt;/code&gt; should actually be using this form.&lt;/p&gt;

    &lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;LEFT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;UNNEST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;u&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;v&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IGNORE NULLS&lt;/code&gt; clause for window functions. This is useful when combined with 
functions such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lead&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lag&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;first_value&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;last_value&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nth_value&lt;/code&gt; if the dataset contains nulls.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROW&lt;/code&gt; expansion using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.*&lt;/code&gt; operator.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/sql/create-schema.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE SCHEMA&lt;/code&gt;&lt;/a&gt; syntax and support 
in various connectors (Hive, Iceberg, MySQL, PostgreSQL, Redshift, SQL Server, Phoenix).&lt;/li&gt;
  &lt;li&gt;Support for correlated subqueries containing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt;+&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Subscript operator to access &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROW&lt;/code&gt; type fields by index. This greatly improves usability 
and readability of queries when dealing with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROW&lt;/code&gt; types containing anonymous fields.&lt;/p&gt;

    &lt;p&gt;&lt;img src=&quot;/assets/blog/2019-review/row-ordinal.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
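
&lt;p&gt;To illustrate the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IGNORE NULLS&lt;/code&gt; clause mentioned above, the following sketch carries the last non-null value forward over a time series (the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;readings&lt;/code&gt; table and its columns are hypothetical):&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT ts,
       last_value(reading) IGNORE NULLS OVER (ORDER BY ts) AS last_known_reading
FROM readings
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;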

&lt;h2 id=&quot;query-engine&quot;&gt;Query Engine&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Generalize conditional, lazy loading and processing (a.k.a. Late Materialization) beyond 
Table Scan, Filter and Projection to support the Join, Window, TopN and SemiJoin operators. This can dramatically 
reduce latency, CPU and I/O for highly selective queries. This is one of the most important performance 
optimizations in recent times, and we will be blogging about it more in the coming weeks.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2019/05/21/optimizing-the-casts-away.html&quot;&gt;Unwrap cast/predicate pushdown&lt;/a&gt; optimizations.&lt;/li&gt;
  &lt;li&gt;Connector pushdown during planning for operations such as limit, table sample, or projections. This allows 
connectors to optimize how data is accessed before it’s provided to the Presto engine for further processing.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2019/06/30/dynamic-filtering.html&quot;&gt;Dynamic filtering&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Cost-Based Optimizer can now consider &lt;a href=&quot;https://github.com/trinodb/trino/pull/247&quot;&gt;estimated query peak memory&lt;/a&gt; 
footprint. This is especially useful for optimizing bigger queries, where not all parts of the query can 
be run concurrently.&lt;/li&gt;
  &lt;li&gt;Improved handling of &lt;a href=&quot;https://github.com/trinodb/trino/pull/1431&quot;&gt;projections&lt;/a&gt;, 
&lt;a href=&quot;https://github.com/trinodb/trino/pull/864&quot;&gt;aggregations&lt;/a&gt; and &lt;a href=&quot;https://github.com/trinodb/trino/pull/1359&quot;&gt;cross joins&lt;/a&gt; 
in the cost-based optimizer.&lt;/li&gt;
  &lt;li&gt;Improved accounting and reporting of physical and network data read or transmitted during query processing.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;performance&quot;&gt;Performance&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2019/08/23/unnest-operator-performance-enhancements.html&quot;&gt;10x performance improvement for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNNEST&lt;/code&gt;&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;2-7x improvement in performance of &lt;a href=&quot;/blog/2019/04/23/even-faster-orc.html&quot;&gt;ORC decoders&lt;/a&gt;, resulting in a 
10% global CPU improvement for the TPC-DS benchmark.&lt;/li&gt;
  &lt;li&gt;Improvements when reading small Parquet files, files with a large number of columns, or files with small row
groups. We found this very useful, for example, when working with data exported from Snowflake.&lt;/li&gt;
  &lt;li&gt;Support for new ORC bloom filters.&lt;/li&gt;
  &lt;li&gt;Remove &lt;a href=&quot;/blog/2019/06/03/redundant-order-by.html&quot;&gt;redundant &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt;&lt;/a&gt; clauses.&lt;/li&gt;
  &lt;li&gt;Improvements for &lt;a href=&quot;/blog/2019/06/03/redundant-order-by.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IN&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NOT-IN&lt;/code&gt;&lt;/a&gt; with subquery expressions (i.e., semijoin).&lt;/li&gt;
  &lt;li&gt;Huge performance improvements when &lt;a href=&quot;https://github.com/trinodb/trino/pull/1329&quot;&gt;reading from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;information_schema&lt;/code&gt;&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Reduce query latency and Hive metastore load for both &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; queries.&lt;/li&gt;
  &lt;li&gt;Improve metadata handling during planning. This can result in dramatic improvements in latency, 
especially for connectors such as MySQL, PostgreSQL, Redshift, SQL Server, etc. Some queries like 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW SCHEMAS&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW TABLES&lt;/code&gt; that could take several minutes to complete now finish in a few seconds.&lt;/li&gt;
  &lt;li&gt;Improved stability, performance, and security when spilling is enabled.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;functions&quot;&gt;Functions&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/functions/array.html#combinations&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;combinations&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/functions/conversion.html#format&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;format&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/functions/uuid.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UUID&lt;/code&gt; type&lt;/a&gt; and related functions.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/functions/array.html#all_match&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;all_match&lt;/code&gt;&lt;/a&gt;,
&lt;a href=&quot;/docs/current/functions/array.html#any_match&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;any_match&lt;/code&gt;&lt;/a&gt; and 
&lt;a href=&quot;/docs/current/functions/array.html#none_match&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;none_match&lt;/code&gt;&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Support flexible aggregation with lambda expressions using
  &lt;a href=&quot;/docs/current/functions/aggregate.html#reduce_agg&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;reduce_agg&lt;/code&gt;&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;New date and time functions: &lt;a href=&quot;/docs/current/functions/datetime.html#last_day_of_month&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;last_day_of_month&lt;/code&gt;&lt;/a&gt;,
&lt;a href=&quot;/docs/current/functions/datetime.html#at_timezone&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;at_timezone&lt;/code&gt;&lt;/a&gt; and 
&lt;a href=&quot;/docs/current/functions/datetime.html#with_timezone&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;with_timezone&lt;/code&gt;&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
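
&lt;p&gt;As a small sketch of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;reduce_agg&lt;/code&gt;, the following query computes a product aggregation, which has no dedicated built-in function:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT reduce_agg(value, 1, (a, b) -&amp;gt; a * b, (a, b) -&amp;gt; a * b) AS product
FROM (VALUES 1, 2, 3, 4) AS t(value)
-- product =&amp;gt; 24
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;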

&lt;h2 id=&quot;security&quot;&gt;Security&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/sql/create-role.html&quot;&gt;Role-based access control&lt;/a&gt; and related commands.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/sql/create-view.html#security&quot;&gt;INVOKER security mode&lt;/a&gt; for views, which allows views to be run using the permissions of the 
current user.&lt;/li&gt;
  &lt;li&gt;Prevent replay attacks and result hijacking in client APIs.&lt;/li&gt;
  &lt;li&gt;JWT-based &lt;a href=&quot;/docs/current/security/internal-communication.html#internal-authentication&quot;&gt;internal communication&lt;/a&gt; authentication,
which eliminates the need for Kerberos or certificates and greatly simplifies secure setups.&lt;/li&gt;
  &lt;li&gt;Credential passthrough, which allows Presto to authenticate with the underlying data source using 
credentials provided by the user running a query. This is especially useful when dealing with
Google Storage in GCP or SQL databases that manage user authentication and authorization on 
their own.&lt;/li&gt;
  &lt;li&gt;Impersonation for &lt;a href=&quot;/docs/current/connector/hive.html#hive-thrift-metastore-configuration-properties&quot;&gt;Hive metastore&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Support for reading and writing encrypted files in HDFS using Hadoop KMS.&lt;/li&gt;
  &lt;li&gt;Support for &lt;a href=&quot;https://trino.io/docs/current/admin/spill.html#spill-encryption&quot;&gt;encrypting spilled data&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
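
&lt;p&gt;Role-based access control is managed with standard SQL statements. A minimal sketch, with hypothetical role, table, and user names:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE ROLE reporting;
GRANT SELECT ON orders TO ROLE reporting;
GRANT reporting TO USER alice;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;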

&lt;h2 id=&quot;geospatial&quot;&gt;Geospatial&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;New geospatial functions: 
&lt;a href=&quot;/docs/current/functions/geospatial.html#ST_Points&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ST_Points&lt;/code&gt;&lt;/a&gt;, 
&lt;a href=&quot;/docs/current/functions/geospatial.html#ST_Length&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ST_Length&lt;/code&gt;&lt;/a&gt;, 
&lt;a href=&quot;/docs/current/functions/geospatial.html#ST_Area&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ST_Area&lt;/code&gt;&lt;/a&gt;, 
&lt;a href=&quot;/docs/current/functions/geospatial.html#line_interpolate_point&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;line_interpolate_point&lt;/code&gt;&lt;/a&gt; and 
&lt;a href=&quot;/docs/current/functions/geospatial.html#line_interpolate_points&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;line_interpolate_points&lt;/code&gt;&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SphericalGeography&lt;/code&gt; type and &lt;a href=&quot;/docs/current/functions/geospatial.html#to_spherical_geography&quot;&gt;related functions&lt;/a&gt; 
to support spatial features in geographic coordinates (latitude / longitude) using a spherical model of the earth.&lt;/li&gt;
  &lt;li&gt;Support for Google Maps Polyline format via &lt;a href=&quot;/docs/current/functions/geospatial.html#to_encoded_polyline&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;to_encoded_polyline&lt;/code&gt;&lt;/a&gt;
and &lt;a href=&quot;/docs/current/functions/geospatial.html#from_encoded_polyline&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;from_encoded_polyline&lt;/code&gt;&lt;/a&gt; functions.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/functions/geospatial.html#geometry_from_hadoop_shape&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;geometry_from_hadoop_shape&lt;/code&gt;&lt;/a&gt; to decode geometry objects in 
the Spatial Framework for Hadoop representation.&lt;/li&gt;
&lt;/ul&gt;
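
&lt;p&gt;For example, combining the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SphericalGeography&lt;/code&gt; type with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ST_Distance&lt;/code&gt; yields great-circle distances in meters (the coordinates here are illustrative):&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT ST_Distance(
    to_spherical_geography(ST_Point(-71.0882, 42.3607)),
    to_spherical_geography(ST_Point(-74.1197, 40.6976)))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;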

&lt;h2 id=&quot;cloud-integration&quot;&gt;Cloud Integration&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Support for Azure Data Lake Blob and ADLS Gen2 storage.&lt;/li&gt;
  &lt;li&gt;Support for &lt;a href=&quot;/docs/current/connector/hive-gcs-tutorial.html&quot;&gt;Google Cloud Storage&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Several &lt;a href=&quot;/blog/2019/05/06/faster-s3-reads.html&quot;&gt;performance improvements&lt;/a&gt; for AWS S3.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;cli-and-jdbc-driver&quot;&gt;CLI and JDBC Driver&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;JSON output format and improvements to CSV output format.&lt;/li&gt;
  &lt;li&gt;Support and stability improvements for running the CLI and JDBC driver with Java 11.&lt;/li&gt;
  &lt;li&gt;Improve compatibility of JDBC driver with third-party tools.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Syntax highlighting and multi-line editing.&lt;/p&gt;

    &lt;p&gt;&lt;img src=&quot;/assets/blog/2019-review/presto-cli.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;new-connectors&quot;&gt;New Connectors&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/elasticsearch.html&quot;&gt;Elasticsearch&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/googlesheets.html&quot;&gt;Google Sheets&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/kinesis.html&quot;&gt;Amazon Kinesis&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2019/06/04/phoenix-connector.html&quot;&gt;Apache Phoenix&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/memsql.html&quot;&gt;MemSQL&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Apache Iceberg (preview version still under development)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;other-improvements&quot;&gt;Other Improvements&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://hub.docker.com/r/prestosql/presto&quot;&gt;Presto Docker image&lt;/a&gt; that provides an out-of-the-box single node 
cluster with the JMX, memory, TPC-DS, and TPC-H catalogs. It can be deployed as a full cluster by 
mounting in configuration and can be used for Kubernetes deployments.&lt;/li&gt;
  &lt;li&gt;Support for LZ4 and Zstd compression in Parquet and ORC. LZ4 is currently the recommended algorithm for fast, lightweight
compression, while Zstd is recommended otherwise.&lt;/li&gt;
  &lt;li&gt;Support for insert-only Hive transactional tables and Hive bucketing v2 as part of 
&lt;a href=&quot;/blog/2019/12/28/hive-3.html&quot;&gt;making Presto compatible with Hive 3&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Improvements in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ANALYZE&lt;/code&gt; statement for Hive connector.&lt;/li&gt;
  &lt;li&gt;Support for &lt;a href=&quot;/blog/2019/05/29/improved-hive-bucketing.html&quot;&gt;multiple files per bucket&lt;/a&gt; 
for Hive tables. This allows inserting data into bucketed tables without having to rewrite entire partitions
and improves Presto compatibility with Hive and other tools.&lt;/li&gt;
  &lt;li&gt;Support for upper- and mixed-case table and column names in JDBC-based connectors.&lt;/li&gt;
  &lt;li&gt;New features and improvements in type mappings in PostgreSQL, MySQL, SQL Server and Redshift
connectors. This includes support for PostgreSQL arrays and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;timestamp with time zone&lt;/code&gt; type, and 
the ability to read columns of unsupported types.&lt;/li&gt;
  &lt;li&gt;Improvements in &lt;a href=&quot;https://github.com/trinodb/trino/pull/833&quot;&gt;Hive compatibility with Hive version 2.3&lt;/a&gt; 
and &lt;a href=&quot;https://github.com/trinodb/trino/pull/1937&quot;&gt;with Cloudera (CDH)’s Hive&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Connector provided view definitions, which allow connectors to generate the definition dynamically at query time. 
For example, the connector can provide a union of two tables filtered on a disjoint time range, with the cutoff 
time determined at resolution time.&lt;/li&gt;
  &lt;li&gt;Lots and lots of bug fixes!&lt;/li&gt;
&lt;/ul&gt;
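
&lt;p&gt;For instance, collecting table statistics with the Hive connector is a single statement (the catalog, schema, and table names are illustrative):&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ANALYZE hive.default.orders;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;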

&lt;h1 id=&quot;coming-up&quot;&gt;Coming Up…&lt;/h1&gt;

&lt;p&gt;These are some of the projects that are currently in progress and are likely to land in the short term.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for pushing down row dereference expressions into connectors. This will help reduce 
the amount of data and CPU needed to process highly nested columnar formats such as ORC and Parquet.&lt;/li&gt;
  &lt;li&gt;Extend dynamic filtering to support distributed joins and other operators. Use dynamic filters for 
pruning partitions at runtime when querying Hive.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/2418&quot;&gt;Extended Late Materialization&lt;/a&gt; support for queries involving 
complex correlated subqueries.&lt;/li&gt;
  &lt;li&gt;Finalize &lt;a href=&quot;/blog/2019/12/28/hive-3.html&quot;&gt;Hive 3 support&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Improved &lt;a href=&quot;https://github.com/trinodb/trino/pull/2358&quot;&gt;INSERT into partitioned tables&lt;/a&gt;, which will help with 
large ETL queries.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/1324&quot;&gt;Improvements and features&lt;/a&gt; in Iceberg connector.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/2028&quot;&gt;Pinot&lt;/a&gt; connector.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/1959&quot;&gt;Oracle&lt;/a&gt; connector.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/2397&quot;&gt;Influx&lt;/a&gt; connector.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/2321&quot;&gt;Prometheus&lt;/a&gt; connector.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trinodb.slack.com/archives/CQT2JH4KG/p1576038838027500&quot;&gt;Salesforce&lt;/a&gt; connector.&lt;/li&gt;
  &lt;li&gt;Support for &lt;a href=&quot;https://github.com/trinodb/trino/pull/2106&quot;&gt;Confluent registry in Kafka connector&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Revamp of the function registry and function resolution to support dynamically-resolved 
functions and SQL-defined functions.&lt;/li&gt;
  &lt;li&gt;A new &lt;a href=&quot;https://github.com/trinodb/trino/pull/2004&quot;&gt;Parquet writer&lt;/a&gt; optimized to work efficiently 
within Presto.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;… and many, many more.&lt;/p&gt;</content>

      
        <author>
          <name>Martin Traverso</name>
        </author>
      

      <summary>What a great year for the Presto community! We started the year with the launch of the Presto Software Foundation, with the long term goal of ensuring the project remains collaborative, open and independent from any corporate interest, for years to come. Since then, the community around Presto has grown and consolidated. We’ve seen contributions from more than 120 people across over 20 companies. Every week, 280 users and developers interact in the project’s Slack channel. We’d like to take the opportunity to thank everyone that contributed to the project in one way or another. Presto wouldn’t be what it is without your help. With the collaboration of companies such as Starburst, Qubole, Varada, Twitter, ARM Treasure Data, Wix, Red Hat, and the Big Things community, we ran several Presto summits across the world: Tel Aviv, Israel (April 2019); San Francisco, USA (June 2019); Tokyo, Japan (July 2019); Bangalore, India (September 2019); and New York, USA (December 2019). All these events were a huge success and brought thousands of Presto users, contributors and other community members together to share their knowledge and experiences. The project has been more active than ever. We completed 28 releases comprising more than 2850 commits in over 1500 pull requests. Of course, that alone is not a good measure of progress, so let’s take a closer look at everything that went in. And there is a lot to look at!</summary>

      
      
    </entry>
  
    <entry>
      <title>Hive 3 support in Presto</title>
      <link href="https://trino.io/blog/2019/12/28/hive-3.html" rel="alternate" type="text/html" title="Hive 3 support in Presto" />
      <published>2019-12-28T00:00:00+00:00</published>
      <updated>2019-12-28T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/12/28/hive-3</id>
      <content type="html" xml:base="https://trino.io/blog/2019/12/28/hive-3.html">&lt;p&gt;The Hive community is centered around a few different Hive distributions, one of them
being Hortonworks Data Platform (HDP). Even after the Cloudera-Hortonworks merger, there
is keen interest in HDP 3, featuring Hive 3. Presto is ready for the game.&lt;/p&gt;

&lt;p&gt;In this post, we summarize which Hive 3 features Presto already supports, covering
all the work that went into Presto to achieve that. We also outline the next steps
ahead.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;introduction&quot;&gt;Introduction&lt;/h1&gt;

&lt;p&gt;There are several Hive versions in active use by the Hive community: 0.x, 1.x, 2.x
and 3.x. The Hive 3 major release brings a number of interesting features, including:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;support for Hadoop Erasure Coding (EC), allowing &lt;a href=&quot;https://blog.cloudera.com/introduction-to-hdfs-erasure-coding-in-apache-hadoop/&quot;&gt;much better HDFS storage capacity
utilization&lt;/a&gt;
without reducing data availability,&lt;/li&gt;
  &lt;li&gt;update to ORC ACID transactional tables - they no longer need to be bucketed,&lt;/li&gt;
  &lt;li&gt;transactional tables for all file formats (“insert-only” except for ORC),&lt;/li&gt;
  &lt;li&gt;materialized views,&lt;/li&gt;
  &lt;li&gt;new bucketing function, offering a better data distribution and less data skew,&lt;/li&gt;
  &lt;li&gt;new timestamp semantics and timestamp-related changes in file formats,&lt;/li&gt;
  &lt;li&gt;and a lot more (let’s skip over features and changes that are not interesting from
the Presto perspective).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s no surprise that many people want to try out all these features and run Hive 3,
either the Apache project’s official release or HDP version 3.&lt;/p&gt;

&lt;h1 id=&quot;hive-3-in-presto&quot;&gt;Hive 3 in Presto&lt;/h1&gt;

&lt;p&gt;The Presto community expressed interest in using Presto with Hive 3, both in the project’s
&lt;a href=&quot;https://github.com/trinodb/trino/issues/576&quot;&gt;issues&lt;/a&gt; and on &lt;a href=&quot;/slack.html&quot;&gt;Slack&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You spoke, we listened. Actually – we, the community, spoke &lt;em&gt;and&lt;/em&gt; listened.&lt;/p&gt;

&lt;p&gt;Through collaboration between Starburst, Qubole and the wider Presto community, Presto has gradually
improved its compatibility with Hive 3:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Presto 319 &lt;a href=&quot;https://github.com/trinodb/trino/pull/1532&quot;&gt;fixed issues with backwards-incompatible changes in Hive metastore thrift API&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Presto 320 &lt;a href=&quot;https://github.com/trinodb/trino/pull/1614&quot;&gt;added continuous integration with Hive 3&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Presto 321 &lt;a href=&quot;https://github.com/trinodb/trino/pull/1697&quot;&gt;added support for Hive bucketing v2&lt;/a&gt;
(&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&quot;bucketing_version&quot;=&quot;2&quot;&lt;/code&gt;)&lt;/li&gt;
  &lt;li&gt;Presto 325 &lt;a href=&quot;https://github.com/trinodb/trino/pull/1958&quot;&gt;added continuous integration with HDP 3’s Hive 3&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Presto 327 &lt;a href=&quot;https://github.com/trinodb/trino/pull/1034&quot;&gt;added support for reading from insert-only transactional tables&lt;/a&gt;, and &lt;a href=&quot;https://github.com/trinodb/trino/pull/2099&quot;&gt;added compatibility with timestamp
values stored in ORC by Hive 3.1&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Upcoming improvements already being worked on include:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/2068&quot;&gt;Read support for ORC ACID tables&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/1591&quot;&gt;Read support for bucketed ORC ACID tables&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;try-it-out&quot;&gt;Try it out&lt;/h1&gt;

&lt;p&gt;The &lt;a href=&quot;https://twitter.com/findepi/status/1204783485094944768&quot;&gt;amazing Presto community&lt;/a&gt; is working hard on
getting Hive 3 support fully integrated into the Presto project, and a lot is already accomplished.
Chances are that all you need is already included in the latest release. If you need one of the upcoming
improvements, watch the pull requests linked above and the &lt;a href=&quot;https://github.com/trinodb/trino/issues/1218&quot;&gt;roadmap issue&lt;/a&gt;,
join &lt;a href=&quot;/slack.html&quot;&gt;Slack&lt;/a&gt; and stay tuned for upcoming release announcements. In the meantime, you
can try out the features today by running the &lt;a href=&quot;https://docs.starburstdata.com/latest/release/release-323-e.html&quot;&gt;323-e release&lt;/a&gt; of Starburst Presto.&lt;/p&gt;

&lt;p&gt;□&lt;/p&gt;</content>

      
        <author>
          <name>Piotr Findeisen, Starburst Data</name>
        </author>
      

      <summary>The Hive community is centered around a few different Hive distributions, one of them being Hortonworks Data Platform (HDP). Even after the Cloudera-Hortonworks merger there is vivid interest in HDP 3, featuring Hive 3. Presto is ready for the game. In this post, we summarize which Hive 3 features Presto already supports, covering all the work that went into Presto to achieve that. We also outline next steps lying ahead.</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto Experiment with Graviton Processor</title>
      <link href="https://trino.io/blog/2019/12/23/Presto-Experiment-with-Graivton-Processor.html" rel="alternate" type="text/html" title="Presto Experiment with Graviton Processor" />
      <published>2019-12-23T00:00:00+00:00</published>
      <updated>2019-12-23T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/12/23/Presto-Experiment-with-Graivton-Processor</id>
      <content type="html" xml:base="https://trino.io/blog/2019/12/23/Presto-Experiment-with-Graivton-Processor.html">&lt;p&gt;This December, AWS announced new instance types powered by the &lt;a href=&quot;https://aws.amazon.com/about-aws/whats-new/2019/12/announcing-new-amazon-ec2-m6g-c6g-and-r6g-instances-powered-by-next-generation-arm-based-aws-graviton2-processors/&quot;&gt;Arm-based AWS Graviton2 processor&lt;/a&gt;. M6g, C6g, and R6g instances are designed to deliver up to 40% better price/performance compared with the current generation of instance types. Presto is just a Java application, so we should be able to run our workloads on these cost-effective instance types without any modification.&lt;/p&gt;

&lt;p&gt;But is that true? Initially, we did not have a clear answer about how much effort is needed to bring Presto to the world of a different processor. Not having to care about the underlying platform is generally beneficial for development. But if using a different processor enables us to improve the performance and stability of Presto, we must care about it. Anything unclear must be proven by experiment.&lt;/p&gt;

&lt;p&gt;This article reports what we need to do to run Presto on the Arm platform, and how much benefit we can potentially obtain from the Graviton processor.&lt;/p&gt;

&lt;p&gt;As the Graviton2-based instance types are still in preview, we tried running Presto on an A1 instance, which contains the first-generation Graviton processor. It should still be a helpful anchor for understanding the potential benefit of the Graviton2 processor.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;how-to-make-presto-compatible-with-arm&quot;&gt;How to make Presto compatible with Arm&lt;/h1&gt;

&lt;p&gt;First, we build a Presto binary that supports the Arm platform. As it turns out, there is not much to do: as long as the JVM supports the Arm platform, Presto should work without any modification to the application code. However, Presto places some restrictions on the platform where it runs in order to protect its functionality, including plugins. For example, the latest Presto supports only the &lt;a href=&quot;https://github.com/trinodb/trino/blob/ee05ee5221690d66598039c6e397f7c7cb4c202b/presto-main/src/main/java/io/prestosql/server/PrestoSystemRequirements.java#L69&quot;&gt;x86 and PowerPC architectures&lt;/a&gt;. This limitation prevents us from using Presto on the Arm platform.&lt;/p&gt;

&lt;p&gt;To make Presto runnable on an Arm machine, we need to modify the &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-main/src/main/java/io/prestosql/server/PrestoSystemRequirements.java&quot;&gt;PrestoSystemRequirements&lt;/a&gt; class to allow the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;aarch64&lt;/code&gt; architecture. For experimental purposes, we can apply a patch like the following to remove the restriction altogether.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;diff --git a/presto-main/src/main/java/io/prestosql/server/PrestoSystemRequirements.java b/presto-main/src/main/java/io/prestosql/server/PrestoSystemRequirements.java
index 07b7d12c64..b6a1249681 100644
--- a/presto-main/src/main/java/io/prestosql/server/PrestoSystemRequirements.java
+++ b/presto-main/src/main/java/io/prestosql/server/PrestoSystemRequirements.java
@@ -71,9 +71,9 @@ final class PrestoSystemRequirements
         String osName = StandardSystemProperty.OS_NAME.value();
         String osArch = StandardSystemProperty.OS_ARCH.value();
         if (&quot;Linux&quot;.equals(osName)) {
-            if (!&quot;amd64&quot;.equals(osArch) &amp;amp;&amp;amp; !&quot;ppc64le&quot;.equals(osArch)) {
-                failRequirement(&quot;Presto requires amd64 or ppc64le on Linux (found %s)&quot;, osArch);
-            }
             if (&quot;ppc64le&quot;.equals(osArch)) {
                 warnRequirement(&quot;Support for the POWER architecture is experimental&quot;);
             }
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This patch is all we have to do to run Presto on the Arm platform. It should work for most cases, except for use with the &lt;a href=&quot;https://trino.io/docs/current/connector/hive.html&quot;&gt;Hive connector&lt;/a&gt;, because it includes native code not yet available for the Arm platform.&lt;/p&gt;

&lt;h1 id=&quot;prepare-docker-images&quot;&gt;Prepare Docker Images&lt;/h1&gt;

&lt;p&gt;Docker containers are a desirable option for running Presto experimentally due to their availability and ease of use. But there is one thing to do to build a Docker image that supports multiple platforms.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://docs.docker.com/buildx/working-with-buildx/&quot;&gt;Docker buildx&lt;/a&gt; is an experimental feature providing full support for the &lt;a href=&quot;https://github.com/moby/buildkit&quot;&gt;Moby BuildKit toolkit&lt;/a&gt;. It enables us to build a Docker image supporting multiple platforms, including Arm, with a one-line command. However, the feature is not generally available in a typical Docker installation; on macOS, it is necessary to enable the experimental flag as follows.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/presto-experiment-with-graviton-processor/docker-daemon.png&quot; alt=&quot;Docker Daemon Experimental Feature&quot; /&gt;&lt;/p&gt;

&lt;p&gt;And make sure to restart the Docker daemon. We can build the Docker image for Presto supporting the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;aarch64&lt;/code&gt; architecture with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;buildx&lt;/code&gt; command. We used the source code of &lt;a href=&quot;https://github.com/trinodb/trino/commit/b0c07249de5c70a70b3037875df4fd0477dec9fc&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;317-SNAPSHOT&lt;/code&gt;&lt;/a&gt; with the earlier patch to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PrestoSystemRequirements&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ docker buildx build \
 --build-arg VERSION=317-SNAPSHOT \
 --platform linux/arm64 \
 -f presto-base/Dockerfile-aarch64 \
 -t lewuathe/presto-base:317-SNAPSHOT-aarch64 \
 presto-base --push

$ docker buildx build \
 --build-arg VERSION=317-SNAPSHOT-aarch64 \
 --platform linux/arm64 \
 -t lewuathe/presto-coordinator:317-SNAPSHOT-aarch64 \
 presto-coordinator --push

$ docker buildx build \
 --build-arg VERSION=317-SNAPSHOT-aarch64 \
 --platform linux/arm64 \
 -t lewuathe/presto-worker:317-SNAPSHOT-aarch64 \
 presto-worker --push
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We should be able to specify multiple platform names for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--platform&lt;/code&gt; option. But unfortunately, the Docker image of OpenJDK for Arm is distributed under &lt;a href=&quot;https://hub.docker.com/r/arm64v8/openjdk/&quot;&gt;a separate organization&lt;/a&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;arm64v8/openjdk&lt;/code&gt;. Building an image supporting Arm therefore requires another &lt;a href=&quot;https://github.com/Lewuathe/docker-presto-cluster/blob/master/presto-base/Dockerfile-aarch64&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Dockerfile&lt;/code&gt;&lt;/a&gt;. Either way, Docker images containing Presto with Arm support are now available.&lt;/p&gt;

&lt;h1 id=&quot;setup-a1-instance&quot;&gt;Setup A1 Instance&lt;/h1&gt;

&lt;p&gt;The following setup prepares the environment to run docker-compose on the A1 instance. &lt;a href=&quot;https://github.com/docker/compose/issues/5342&quot;&gt;As no docker-compose binary for Arm&lt;/a&gt; is distributed officially, we need to build and install docker-compose with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pip&lt;/code&gt;. Make sure to run these commands after the instance initialization completes.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Install Docker
$ sudo yum update -y
$ sudo amazon-linux-extras install docker -y
$ sudo service docker start
$ sudo usermod -a -G docker ec2-user

# Install docker-compose
$ sudo yum install python2-pip gcc libffi-devel openssl-devel -y
$ sudo pip install -U docker-compose
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h1 id=&quot;performance-comparison&quot;&gt;Performance Comparison&lt;/h1&gt;

&lt;p&gt;Let’s briefly take a look at the performance provided by the Graviton processor. We are going to use &lt;a href=&quot;https://aws.amazon.com/ec2/instance-types/a1/&quot;&gt;a1.4xlarge&lt;/a&gt; as the benchmark instance for the Graviton processor.&lt;/p&gt;

&lt;p&gt;Here is our specification of the benchmark conditions.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;We use the commit &lt;a href=&quot;https://github.com/trinodb/trino/commit/b0c07249de5c70a70b3037875df4fd0477dec9fc&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;b0c07249de5c70a70b3037875df4fd0477dec9fc&lt;/code&gt;&lt;/a&gt; + the patch previously described.&lt;/li&gt;
  &lt;li&gt;1 coordinator + 2 worker processes run by &lt;a href=&quot;https://docs.docker.com/compose/&quot;&gt;docker-compose&lt;/a&gt; on a single instance.&lt;/li&gt;
  &lt;li&gt;We use a1.4xlarge and c5.4xlarge, which has the same CPU core count and memory as a1.4xlarge. We also compared with m5.2xlarge, whose on-demand cost is close to that of a1.4xlarge.&lt;/li&gt;
  &lt;li&gt;We use &lt;a href=&quot;https://github.com/trinodb/trino/tree/master/presto-benchto-benchmarks/src/main/resources/sql/presto/tpch&quot;&gt;q01, q10, q18, and q20&lt;/a&gt; run on the TPCH connector. Since the Presto TPCH connector does not access external storage, we can measure pure CPU performance without worrying about network variance.&lt;/li&gt;
  &lt;li&gt;We choose &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sf1&lt;/code&gt; as the scaling factors of the TPCH connector.&lt;/li&gt;
  &lt;li&gt;Our experiment measures the average runtime over 5 runs, after 5 warmup runs, for every query.&lt;/li&gt;
&lt;/ul&gt;
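&lt;p&gt;As a sketch, the warmup-and-average scheme above can be expressed as follows (hypothetical Java; the actual experiment submitted the TPCH queries to a running Presto cluster, and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Query&lt;/code&gt; interface here is illustrative only):&lt;/p&gt;

```java
public class BenchmarkHarness {
    // Illustrative stand-in for submitting one query to the cluster.
    interface Query {
        void run();
    }

    // Run the query `warmups` times to warm up the JIT and caches, then
    // average the wall-clock time of the next `runs` executions in milliseconds.
    static double averageMillis(Query query, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            query.run();
        }
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            query.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1_000_000.0 / runs;
    }

    public static void main(String[] args) {
        // 5 warmup runs + average of 5 timed runs, as in the experiment.
        double avg = averageMillis(() -> { /* submit TPCH q01 here */ }, 5, 5);
        System.out.println("average runtime: " + avg + " ms");
    }
}
```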

&lt;h4 id=&quot;openjdk-8&quot;&gt;OpenJDK 8&lt;/h4&gt;
&lt;p&gt;Here is the result of our experiment. The vertical axis represents the running time in milliseconds.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/presto-experiment-with-graviton-processor/openjdk8-performance.png&quot; alt=&quot;OpenJDK 8 Performance&quot; /&gt;&lt;/p&gt;

&lt;p&gt;It shows that c5.4xlarge consistently achieves the best performance in every case. Between a1.4xlarge and m5.2xlarge, the winner switches depending on the query type; the two instances are probably competitive with each other.&lt;/p&gt;

&lt;p&gt;Although we use OpenJDK 8 for this case, it might not be able to generate code fully optimized for the Arm architecture. In general, later versions, such as &lt;a href=&quot;https://medium.com/@carlosedp/java-benchmarks-on-arm64-17edd8b9ff79&quot;&gt;OpenJDK 9 or 11, give us better performance&lt;/a&gt;.&lt;/p&gt;

&lt;h4 id=&quot;openjdk-11&quot;&gt;OpenJDK 11&lt;/h4&gt;
&lt;p&gt;Let’s try to run Presto with OpenJDK 11. There is one thing to do first. Since JDK 9, the &lt;a href=&quot;https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8180425&quot;&gt;Attach API&lt;/a&gt; has been disabled by default. We found that we needed to allow usage of the Attach API by adding the following option to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jvm.config&lt;/code&gt; file; otherwise, we see an error message at the bootstrap phase.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-Djdk.attach.allowAttachSelf=true
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Here is the performance comparison with OpenJDK 11.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/presto-experiment-with-graviton-processor/openjdk11-performance.png&quot; alt=&quot;OpenJDK 11 Performance&quot; /&gt;&lt;/p&gt;

&lt;p&gt;a1.4xlarge and c5.4xlarge achieve even higher performance than with OpenJDK 8 in every case. In contrast, m5.2xlarge shows slower results in some cases.
While this result still demonstrates that c5.4xlarge is the best instance in terms of performance, the performance gaps between instances are smaller than in the OpenJDK 8 cases. In particular, a1.4xlarge shows relatively competitive performance with the smaller dataset (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny&lt;/code&gt;). How does the scaling factor influence performance? Let’s see.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/presto-experiment-with-graviton-processor/sf-comparison.png&quot; alt=&quot;Scaling Factor Comparison&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The above chart shows how performance is affected by the scaling factor. c5.4xlarge demonstrates the most stable running time, regardless of the scaling factor. If we want performance that is as stable as possible, c5.4xlarge is a good option from this list. a1.4xlarge and m5.2xlarge show similar volatility against the scaling factor this time.&lt;/p&gt;

&lt;p&gt;Considering that the a1.4xlarge instance is 40% cheaper than c5.4xlarge, it may make sense to use a1.4xlarge for specific cases. The on-demand cost of &lt;a href=&quot;https://aws.amazon.com/ec2/pricing/on-demand/&quot;&gt;a1.4xlarge is $9.8/day, versus $16.3/day for c5.4xlarge&lt;/a&gt;. The public announcement says &lt;a href=&quot;https://aws.amazon.com/ec2/graviton/&quot;&gt;Graviton 2 delivers 7x the performance of the first Graviton processor&lt;/a&gt;, so we may expect even better performance from the new generation. We cannot wait for the general availability of Graviton 2.&lt;/p&gt;

&lt;h4 id=&quot;amazon-corretto&quot;&gt;Amazon Corretto&lt;/h4&gt;
&lt;p&gt;How about other JVM distributions? We found that Amazon Corretto also supports the Arm architecture and distributes &lt;a href=&quot;https://hub.docker.com/layers/amazoncorretto/library/amazoncorretto/11/images/sha256-8f06c4a09e6a0784d6da3fb580bd57c4881df3fc8f56de1f3c0fd66dde20e43c&quot;&gt;a Docker image built for Arm&lt;/a&gt;. Let’s try Amazon Corretto similarly.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/presto-experiment-with-graviton-processor/a1-instance-performance.png&quot; alt=&quot;A1 Performance&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This chart illustrates the performance results for different JDK implementations: OpenJDK 8, OpenJDK 11, and Amazon Corretto 11. Overall, OpenJDK 11 seems to be the best. Interestingly, though, Amazon Corretto achieves even better performance in some of the sf1 cases. It indicates that Presto with Amazon Corretto may provide better performance for some query types.&lt;/p&gt;

&lt;h1 id=&quot;wrap-up&quot;&gt;Wrap Up&lt;/h1&gt;

&lt;p&gt;As Presto is just a Java application, there is not much to do to support the Arm platform. Applying only one patch and one JVM option gives us a Presto binary supporting this new platform. It is always exciting to see a new technology used in a complicated distributed system such as Presto. The combination of cutting-edge technologies surely takes us on a journey to new horizons of technological innovation.&lt;/p&gt;

&lt;p&gt;Last but not least, we used docker-compose and the TPCH connector to quickly execute queries against the Presto cluster on the Arm platform. Note that the performance of a distributed system such as Presto depends on many kinds of factors. Please be sure to run your own benchmark carefully when you try a new instance type in your production environment.&lt;/p&gt;

&lt;p&gt;We have uploaded the Docker images used for this experiment publicly. Feel free to use them if you are interested in running Presto on the Arm platform.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Image for Armv8 using OpenJDK 11
$ docker pull lewuathe/presto-coordinator:327-SNAPSHOT-aarch64
$ docker pull lewuathe/presto-worker:327-SNAPSHOT-aarch64


# Image for Armv8 using Amazon Corretto 11
$ docker pull lewuathe/presto-coordinator:327-SNAPSHOT-corretto
$ docker pull lewuathe/presto-worker:327-SNAPSHOT-corretto
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I have also raised &lt;a href=&quot;https://github.com/trinodb/trino/issues/2262&quot;&gt;an issue&lt;/a&gt; to start a discussion about supporting the Arm architecture in the community. It would be great to get feedback from anyone who is interested.&lt;/p&gt;

&lt;p&gt;Thanks!&lt;/p&gt;</content>

      
        <author>
          <name>Kai Sasaki, Arm Treasure Data</name>
        </author>
      

      <summary>This December, AWS announced new instance types powered by Arm-based AWS Graviton2 Processor. M6g, C6g, and R6g are designed to deliver up to 40% improved price/performance compared with the current generation instance types. We can achieve cost-effectiveness by using these instance type series. Presto is just a Java application, so that we should be able to run the workload with this type of cost-effective instance type without any modification. But is it true? Initially, we do not have a clear answer to how much effort we need to bring Presto into the world of the different processors. No care about the underlying platform is generally beneficial for development. But if using different processors enables us to accelerate the performance and stability of Presto, we must care about it. We must prove anything unclear by the experiment. This article is the report to clarify what we need to do to run Presto on the Arm-based platform and see how much benefit we can potentially obtain with Graviton Processor. As the Graviton 2 based instance types are preview state, we tried to run Presto on A1 instance that has the first generation of Graviton processor inside. It still would be a helpful anchor to understand the potential benefit of the Graviton 2 processor.</summary>

      
      
    </entry>
  
    <entry>
      <title>First Presto Summit in India, Bangalore, September 2019</title>
      <link href="https://trino.io/blog/2019/09/05/Presto-Summit-Bangalore.html" rel="alternate" type="text/html" title="First Presto Summit in India, Bangalore, September 2019" />
      <published>2019-09-05T00:00:00+00:00</published>
      <updated>2019-09-05T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/09/05/Presto-Summit-Bangalore</id>
      <content type="html" xml:base="https://trino.io/blog/2019/09/05/Presto-Summit-Bangalore.html">&lt;p&gt;&lt;img src=&quot;/assets/blog/Bangalore-2019/MyPost.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.qubole.com/developers/presto-on-qubole/&quot;&gt;Qubole&lt;/a&gt; organized the first ever Presto Summit in India on September 05, 2019. 
Bangalore, as the technology and startup hub of India, was the perfect venue for India’s first Presto Summit. Presto has seen a lot 
of interest and adoption in this region (South Asia and Asia Pacific), as was evident from the 
turnout at the last two Presto Meetups organized by Qubole over the past year. Courtyard By Marriott 
on Outer Ring Road (ORR) - a 17 KM stretch that hosts 10% of Bangalore’s working population (around 1 million people) - 
proved to be an ideal conference venue for Presto enthusiasts, several of whom work in its immediate vicinity.&lt;/p&gt;

&lt;p&gt;With 150 attendees from more than 75 companies, the Presto community in India was excited and 
eager to meet and interact with the Presto co-creators - &lt;a href=&quot;https://www.linkedin.com/in/traversomartin/&quot;&gt;Martin Traverso&lt;/a&gt;,
&lt;a href=&quot;https://www.linkedin.com/in/dainsundstrom/&quot;&gt;Dain Sundstrom&lt;/a&gt;, and
&lt;a href=&quot;https://www.linkedin.com/in/electrum/&quot;&gt;David Phillips&lt;/a&gt; - who flew down to Bangalore for the event.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;welcome-note-by-joydeep-sen-sarma&quot;&gt;Welcome Note by Joydeep Sen Sarma&lt;/h1&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Bangalore-2019/JE1A1895.JPG&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.linkedin.com/in/joydeeps/&quot;&gt;Joydeep Sen Sarma&lt;/a&gt;, co-creator of Hive and co-founder of Qubole, kicked off the event by welcoming 
the Presto co-creators, speakers, and all the attendees. He also provided a brief historical perspective 
on Qubole’s contributions to Presto and highlighted the importance of Presto to Qubole’s customer base.&lt;/p&gt;

&lt;h1 id=&quot;keynote-by-martin-dain-and-david&quot;&gt;Keynote by Martin, Dain and David&lt;/h1&gt;
&lt;p&gt;&lt;a href=&quot;https://go.qubole.com/rs/510-QPZ-296/images/Presto%20Summit%20India%20-%201.%20Keynote%20by%20Martin%2C%20David%2C%20Dain.pdf&quot;&gt;Slides&lt;/a&gt;
&lt;a href=&quot;https://youtu.be/viBY8Fa3OjI&quot;&gt;Video&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Bangalore-2019/JE1A1911.JPG&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This was followed by the most awaited presentation of the day - 
the keynote from Martin, Dain, and David. Martin took the audience through Presto’s journey - from its birth at Facebook, 
through its growth and adoption there, to the present, with the formation of the Presto Software Foundation 
for wider community involvement. He also highlighted some of their design choices and some missteps along the way.&lt;/p&gt;

&lt;h1 id=&quot;presto-at-grab&quot;&gt;Presto at Grab&lt;/h1&gt;
&lt;p&gt;&lt;a href=&quot;https://go.qubole.com/rs/510-QPZ-296/images/Presto%20Summit%20India%20-%202.%20Talk%20by%20Edwin%20Law%20Grab.pdf&quot;&gt;Slides&lt;/a&gt;
&lt;a href=&quot;https://youtu.be/0TR7Nzs8asc&quot;&gt;Video&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Bangalore-2019/grab-talk.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The first industry speaker of the day was &lt;a href=&quot;https://www.linkedin.com/in/edwinlawhh/&quot;&gt;Edwin Hui Hean Law&lt;/a&gt;, 
Data Engineering Lead at &lt;a href=&quot;https://www.grab.com/sg/&quot;&gt;Grab, Singapore&lt;/a&gt;. He and his team flew all the way 
from Singapore for the Presto Summit - a true testament to their passion for and interest in Presto. His talk 
covered Grab’s experience of using Presto on Amazon EMR, followed by their migration to Presto on Qubole, 
and provided his insights on the relative pros and cons of these platforms. The final part of his talk covered his 
team’s recent experimentation with Presto on Kubernetes.&lt;/p&gt;

&lt;h1 id=&quot;read-support-for-hive-acid-tables-in-presto&quot;&gt;Read Support for Hive ACID tables in Presto&lt;/h1&gt;
&lt;p&gt;&lt;a href=&quot;https://go.qubole.com/rs/510-QPZ-296/images/Presto%20Summit%20India%20-%203.%20Talk%20by%20Shubham%20Tagra%20Qubole.pdf&quot;&gt;Slides&lt;/a&gt;
&lt;a href=&quot;https://youtu.be/Q2Nv18ohegA&quot;&gt;Video&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Bangalore-2019/JE1A2023.JPG&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Next, &lt;a href=&quot;https://www.linkedin.com/in/shubham-tagra-267a5838/&quot;&gt;Shubham Tagra&lt;/a&gt;, Sr. Staff at &lt;a href=&quot;https://www.qubole.com/developers/presto-on-qubole/&quot;&gt;Qubole&lt;/a&gt;, 
presented his work on providing read support for Hive ACID tables in Presto. This has become increasingly important with the arrival of 
data privacy regulations like GDPR and CCPA that grant users the “right to erasure” and/or “right to rectification”. 
Under these regulations, organisations storing user data are obligated to delete or update that data at the user’s request. 
Hive ACID is an open source solution that addresses these problems around deletes and updates. 
Shubham’s talk covered why he picked Hive ACID over other options available in open source, as well as 
the details of the Hive ACID and Presto integration that he added.&lt;/p&gt;

&lt;h1 id=&quot;presto-optimizations-at-zoho-corporation&quot;&gt;Presto Optimizations at Zoho Corporation&lt;/h1&gt;
&lt;p&gt;&lt;a href=&quot;https://go.qubole.com/rs/510-QPZ-296/images/Presto%20Summit%20India%20-%204.%20Talk%20by%20Praveen%20Krishna%20Zoho.pdf&quot;&gt;Slides&lt;/a&gt;
&lt;a href=&quot;https://youtu.be/mffX12yZTaU&quot;&gt;Video&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Bangalore-2019/JE1A2072.JPG&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;After lunch, &lt;a href=&quot;https://www.linkedin.com/in/praveenkrishna2112/&quot;&gt;Praveen Krishna&lt;/a&gt; from &lt;a href=&quot;https://www.zohocorp.com/&quot;&gt;Zoho Corporation&lt;/a&gt; 
presented a summary of his team’s journey with Presto. In order to serve their teams with a fairly small cluster, 
they had to optimize Presto at various levels. Praveen’s team started by analyzing the various phases of query execution 
and their impact on performance. They optimized Presto’s planner and reduced the planning time by 
20-30% for queries involving multiple joins on wide tables. He also highlighted how they integrated 
Apache Lucene to speed up full-text search operations. After several iterations, his team came up with a model 
where they maintain the Lucene index for each row group in the ORC file itself. For columns with a high null ratio, 
replacing normal blocks with run-length-encoded blocks reduced memory consumption. With this logic implemented 
in the ORC reader and core Presto, they were able to reduce memory pressure in the cluster.&lt;/p&gt;

&lt;h1 id=&quot;presto-at-walmart-labs&quot;&gt;Presto at Walmart Labs&lt;/h1&gt;
&lt;p&gt;&lt;a href=&quot;https://go.qubole.com/rs/510-QPZ-296/images/Presto%20Summit%20India%20-%205.%20Talk%20by%20Ashish%20Tadose%20Walmart%20Labs.pdf&quot;&gt;Slides&lt;/a&gt;
&lt;a href=&quot;https://youtu.be/wap7Hr7P8Bo&quot;&gt;Video&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Bangalore-2019/JE1A2092.JPG&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The second presentation in this session was from &lt;a href=&quot;https://www.linkedin.com/in/ashish-tadose-78773b22/&quot;&gt;Ashish Kumar Tadose&lt;/a&gt;, 
Principal Engineer at &lt;a href=&quot;https://www.walmartlabs.com/&quot;&gt;Walmart Labs&lt;/a&gt;. He gave an overview of how his team is 
using Presto on Google Cloud Platform (GCP). 
He highlighted the challenges associated with querying diverse data sources at Walmart and how his team has 
tackled these challenges using Presto. His talk also described how his team has implemented monitoring, auto scaling, 
caching (via Alluxio), and security policies (via Ranger).&lt;/p&gt;

&lt;h1 id=&quot;presto-at-inmobi&quot;&gt;Presto at InMobi&lt;/h1&gt;
&lt;p&gt;&lt;a href=&quot;https://go.qubole.com/rs/510-QPZ-296/images/Presto%20Summit%20India%20-%206.%20Talk%20by%20Rohit%20Chatter%20InMobi.pdf&quot;&gt;Slides&lt;/a&gt;
&lt;a href=&quot;https://youtu.be/zEvqrAss7Iw&quot;&gt;Video&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Bangalore-2019/JE1A2222.JPG&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;After a coffee break, &lt;a href=&quot;https://www.linkedin.com/in/rohit-chatter-525b62/&quot;&gt;Rohit Chatter&lt;/a&gt;, CTO at &lt;a href=&quot;https://www.inmobi.com/&quot;&gt;InMobi&lt;/a&gt;, 
provided a historical perspective on how his team migrated from Hive in private data centers to Presto on the 
public cloud. His talk covered various aspects of how his team handles autoscaling and workload management in the cloud.&lt;/p&gt;

&lt;h1 id=&quot;presto-scheduler-changes-for-rubix&quot;&gt;Presto Scheduler Changes for Rubix&lt;/h1&gt;
&lt;p&gt;&lt;a href=&quot;https://go.qubole.com/rs/510-QPZ-296/images/Presto%20Summit%20India%20-%207.%20Talk%20by%20Garvit%20Gupta%2C%20Microsoft%20and%20Ankit%20Dixit%2C%20Qubole.pdf&quot;&gt;Slides&lt;/a&gt;
&lt;a href=&quot;https://youtu.be/x8xIWuQnEFs&quot;&gt;Video&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Bangalore-2019/JE1A2258.JPG&quot; alt=&quot;&quot; /&gt;
&lt;img src=&quot;/assets/blog/Bangalore-2019/JE1A2248.JPG&quot; alt=&quot;&quot; /&gt;
Next, &lt;a href=&quot;https://www.linkedin.com/in/garvitg/&quot;&gt;Garvit Gupta&lt;/a&gt; from &lt;a href=&quot;http://www.microsoft.com&quot;&gt;Microsoft&lt;/a&gt; presented his work on 
Presto scheduler changes for data locality and optimized scheduling for caching engines like &lt;a href=&quot;https://www.qubole.com/rubix/&quot;&gt;RubiX&lt;/a&gt;. 
This work was done primarily as part of his internship at Qubole. The talk was co-presented 
by &lt;a href=&quot;https://www.linkedin.com/in/ankit-dixit-a725545b/&quot;&gt;Ankit Dixit&lt;/a&gt; from &lt;a href=&quot;https://www.qubole.com/developers/presto-on-qubole/&quot;&gt;Qubole&lt;/a&gt;, 
who first gave an overview of the RubiX caching engine and its architecture. Garvit highlighted the need to consider locality as another dimension 
when assigning splits to nodes, and how this led to the implementation of a new Presto scheduler. 
The new scheduling model prioritizes locality while ensuring a uniform distribution of workload across nodes, and 
improves the efficacy of any data caching framework used with Presto. His talk covered the new scheduler 
changes in detail, and concluded with performance numbers showing up to 9x improvement in cached/local reads with RubiX.&lt;/p&gt;

&lt;h1 id=&quot;presto-at-miq-digital&quot;&gt;Presto at MiQ Digital&lt;/h1&gt;
&lt;p&gt;&lt;a href=&quot;https://go.qubole.com/rs/510-QPZ-296/images/Presto%20Summit%20India%20-%208.%20Talk%20by%20Rohit%20Srivastava%20MIQ.pdf&quot;&gt;Slides&lt;/a&gt;
&lt;a href=&quot;https://youtu.be/nOmI48iqlU4&quot;&gt;Video&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Bangalore-2019/JE1A2274.JPG&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The final presentation of the day was from &lt;a href=&quot;https://www.linkedin.com/in/rohitsrivastava20/&quot;&gt;Rohit Srivastava&lt;/a&gt;, 
Engineering Manager at &lt;a href=&quot;http://www.wearemiq.com/&quot;&gt;MiQ Digital&lt;/a&gt;, who presented an overview of the Unified Insights &amp;amp; Data 
Analytics platform at MiQ. He highlighted several challenges that his team had to overcome, such as scaling the 
team, infrastructure, and company; dealing with data copies; duplication of data pre-processing and the cost and 
effort that goes into it; and meeting strict SLAs. He gave an overview of how using Presto on Qubole for all 
dashboarding needs, along with additions like standardising most of their data to be stored in the Apache Parquet format 
on S3, has helped overcome some of these challenges.&lt;/p&gt;

&lt;p&gt;In summary, the first Presto Summit in India had a great mix of talks - some were about Presto usage and 
the experience of operating large Presto deployments across multiple clouds, while others focussed on niche 
technical contributions such as Presto scheduler changes for data locality, speeding up the ORC reader, and read support for 
Hive ACID tables in Presto. Participants had interesting and engaging questions for all the speakers and, in general, 
enjoyed interacting with the Presto founders and other Presto users and developers in the region.&lt;/p&gt;

&lt;p&gt;Videos and slides for all talks can be found &lt;a href=&quot;https://go.qubole.com/2019-09-05---FE---Presto-Summit-19-Bangalore_Post-Summit-Videos-LP-2.html&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We look forward to the next Presto Summit in this region soon!&lt;/p&gt;</content>

      
        <author>
          <name>Vijay Mann, Director of Engineering, Qubole</name>
        </author>
      

      <summary>Qubole organized the first ever Presto Summit in India on September 05, 2019. Bangalore, as the technology and startup hub of India was the perfect venue for India’s first Presto Summit. Presto has seen a lot of interest and adoption in this (south asia and asia pacific) region, as was evident with the turnout in the last two Presto Meetups organized by Qubole over the past year. Courtyard By Marriott, on Outer Ring Road (ORR) - a 17 KM stretch that hosts 10% of Bangalore’s working population (around 1 million people), as the conference venue proved to be an ideal destination for Presto enthusiasts, several of whom, work in its immediate vicinity. With 150 attendees from more than 75 companies, Presto community in India was super excited and eager to meet and interact with Presto co-creators - Martin Traverso, Dain Sundstrom and David Phillips, who flew down to Bangalore for this Event.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/Bangalore-2019/MyPost.png" />
      
    </entry>
  
    <entry>
      <title>Unnest Operator Performance Enhancement with Dictionary Blocks</title>
      <link href="https://trino.io/blog/2019/08/23/unnest-operator-performance-enhancements.html" rel="alternate" type="text/html" title="Unnest Operator Performance Enhancement with Dictionary Blocks" />
      <published>2019-08-23T00:00:00+00:00</published>
      <updated>2019-08-23T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/08/23/unnest-operator-performance-enhancements</id>
      <content type="html" xml:base="https://trino.io/blog/2019/08/23/unnest-operator-performance-enhancements.html">&lt;p&gt;Queries with a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CROSS JOIN UNNEST&lt;/code&gt; clause are expected to see a significant performance improvement starting with version 316.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;executive-summary&quot;&gt;Executive Summary&lt;/h1&gt;

&lt;p&gt;The execution plans for queries with a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CROSS JOIN UNNEST&lt;/code&gt; clause contain an Unnest Operator. The previous implementation of the Unnest Operator performed a deep copy of all input blocks to generate output blocks. This caused high CPU consumption and memory allocation in the operator, and impacted the performance of such queries. The impact was worse for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNNEST&lt;/code&gt; queries accessing a high number of columns, or even a few columns with deeply nested schemas.&lt;/p&gt;


&lt;p&gt;We realized that the implementation could be made more efficient by avoiding copies in the Unnest Operator where possible. Using dictionary blocks to create output blocks that point to input elements has given us significant CPU and memory benefits by avoiding those copies. The benchmark results for the new Unnest Operator implementation show a more than 10x gain in CPU time and a 3x-5x gain in memory allocation.&lt;/p&gt;


&lt;p&gt;Let’s try to understand this change with an example. At LinkedIn, the most common usage of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CROSS JOIN UNNEST&lt;/code&gt; clause is unnesting a single array or map column. A sample query with the clause looks like the following:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;U&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unnest_c1&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CROSS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;UNNEST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;U&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unnest_c1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
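&lt;p&gt;To make the semantics concrete, here is a minimal, hypothetical Java sketch (not Presto code) of what this query computes: each output row pairs the replicated &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;c0&lt;/code&gt; value with one element of the array &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;c1&lt;/code&gt;, and an empty array contributes no rows.&lt;/p&gt;

```java
import java.util.ArrayList;
import java.util.List;

public class UnnestSketch {
    // Flatten rows of (c0, c1) where c1 is an array: emit one output row
    // per array element, replicating c0 — the CROSS JOIN UNNEST semantics.
    static List<String[]> unnest(List<Object[]> rows) {
        List<String[]> output = new ArrayList<>();
        for (Object[] row : rows) {
            String c0 = (String) row[0];
            for (String element : (String[]) row[1]) {
                output.add(new String[] {c0, element});
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {"a", new String[] {"x", "y"}});
        rows.add(new Object[] {"b", new String[] {"z"}});
        for (String[] outputRow : unnest(rows)) {
            System.out.println(outputRow[0] + ", " + outputRow[1]);
        }
        // prints: "a, x", "a, y", "b, z"
    }
}
```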

&lt;p&gt;The plots below compare the performance of the Unnest Operator in the previous and current implementations for 3 different cases. Every case evaluates the Unnest Operator performance for a query like the above, on a table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;T&lt;/code&gt; with two columns &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;c0&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;c1&lt;/code&gt;. In all 3 cases, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;c0&lt;/code&gt; is a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VARCHAR&lt;/code&gt; column, while the nested column &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;c1&lt;/code&gt; is of type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ARRAY(VARCHAR)&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MAP(VARCHAR, VARCHAR)&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ARRAY(ROW(VARCHAR, VARCHAR, VARCHAR))&lt;/code&gt;, respectively. All the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VARCHAR&lt;/code&gt; elements in both columns have length 50, and the arrays in the second column have lengths distributed uniformly between 0 and 300.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;We used a JMH &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-main/src/test/java/io/prestosql/operator/BenchmarkUnnestOperator.java&quot;&gt;benchmark&lt;/a&gt; to measure the performance of these queries in terms of CPU time and memory allocations per operation. An “operation” (for the purposes of this measurement) is defined as the processing of 10,000 rows by an unnest operator.
These results reflect the speedup of the operator alone and may not extend to overall query execution.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/unnest-operator-dictionary-block/unnest-blogpost-cpu.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The figure above compares the CPU times before and after the enhancements. In all three cases, every operation finishes more than 10x faster. The new implementation removes the need for copying data when generating output blocks, giving us significant CPU time savings.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/unnest-operator-dictionary-block/unnest-blogpost-memory.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The figure above compares the memory allocation per operation before and after the enhancement. The new Unnest Operator implementation does not allocate large new memory chunks for output blocks. Instead, it uses arrays of integer indices pointing to input block elements, which results in far smaller allocations than creating new VARCHAR blocks. This brings the allocation rate down by 3x-5x in this example.&lt;/p&gt;

&lt;p&gt;Let’s dig into the design and implementation details.&lt;/p&gt;

&lt;h1 id=&quot;background&quot;&gt;Background&lt;/h1&gt;

&lt;p&gt;An Operator in Presto performs one step of computation on data. The local execution plan for a task consists of pipelines of operators. Each operator processes pages coming from the previous operator in the pipeline and produces output pages for the next one. Operator code has to be efficient, since it may be evaluated billions of times for a single query.&lt;/p&gt;

&lt;p&gt;A &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Page&lt;/code&gt; is made up of a set of blocks storing data for different columns. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DictionaryBlock&lt;/code&gt; is one of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Block&lt;/code&gt; implementations in Presto. The elements of a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DictionaryBlock&lt;/code&gt; are represented using an integer array (called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ids&lt;/code&gt;) and a reference to another block. The values in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ids&lt;/code&gt; array represent elements of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DictionaryBlock&lt;/code&gt; by pointing to element indices in the referenced block. DictionaryBlocks enable more efficient encoding of columns with duplicate values.&lt;/p&gt;

&lt;p&gt;The Unnest Operator was implemented before &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DictionaryBlock&lt;/code&gt; was added, and we saw an opportunity to enhance the performance of this Operator by using DictionaryBlocks. A &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DictionaryBlock&lt;/code&gt; enables the Unnest Operator to reuse already constructed input blocks, which eliminates the need for expensive copies and results in significant compute and memory savings.&lt;/p&gt;

&lt;h1 id=&quot;design&quot;&gt;Design&lt;/h1&gt;

&lt;p&gt;Consider the following &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CROSS JOIN UNNEST&lt;/code&gt; query on a table with one &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VARCHAR&lt;/code&gt; type column and one &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ARRAY(VARCHAR)&lt;/code&gt; type column.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/unnest-operator-dictionary-block/unnest-blogpost-input-data.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;U&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unnested_position&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CROSS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;UNNEST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;positions_held&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;U&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unnested_position&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/unnest-operator-dictionary-block/unnest-blogpost-output-data.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Elements of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;name&lt;/code&gt; column are replicated while we unnest elements of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;positions_held&lt;/code&gt; column. In this example, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;name&lt;/code&gt; is referred to as a “replicated column”, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;positions_held&lt;/code&gt; as an “unnested column”.&lt;/p&gt;

&lt;p&gt;Multiple unnest columns are also allowed (e.g. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNNEST(positions_held, company_name) AS U(unnested_position, unnested_company)&lt;/code&gt;), but that case is less common. It requires special handling, which we discuss &lt;a href=&quot;#dealing-with-multiple-unnest-columns&quot;&gt;later&lt;/a&gt; in the post.&lt;/p&gt;

&lt;p&gt;In the old design, an element from a replicated column would get copied over &lt;em&gt;n&lt;/em&gt; times when building the output, where &lt;em&gt;n&lt;/em&gt; is the cardinality of the corresponding element in the unnest column. For example, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Alice&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Bob&lt;/code&gt; would be copied 2 and 3 times respectively. In the new design, the output block contains &lt;em&gt;n&lt;/em&gt; pointers to the element in the input block, along with a reference to the input block itself, without any actual copying. The benefits here are proportional to the replicated column element sizes. &lt;em&gt;The bigger the element size, the greater the speedup.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/unnest-operator-dictionary-block/unnest-blogpost-replicate-name.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
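&lt;p&gt;The replication step can be sketched in plain Python (an illustration of the idea only; the names and data here are hypothetical, not Presto’s Java implementation). Instead of copying each value, we emit its row index once per unnested element:&lt;/p&gt;

```python
# Each input row is repeated once per element of its unnested array.
names = ["Alice", "Bob"]          # replicated column (input block)
cardinalities = [2, 3]            # lengths of the arrays being unnested

# Build the ids array for the output DictionaryBlock: row i appears
# cardinalities[i] times, as an index rather than a copied value.
ids = [row for row, n in enumerate(cardinalities) for _ in range(n)]
print(ids)                        # [0, 0, 1, 1, 1]

# The logical output is recovered by dereferencing, with no string copies.
print([names[i] for i in ids])    # ['Alice', 'Alice', 'Bob', 'Bob', 'Bob']
```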

&lt;p&gt;Unnest columns are handled the same way. The previous design would copy their elements over one by one. This is CPU intensive and requires new memory allocations, especially for deeply nested columns, since a deep copy is required. In the new design, we use pointers instead of copies in most cases. The following figure shows the output block structure of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unnested_position&lt;/code&gt; column in the query above, for the old and the new implementations.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/unnest-operator-dictionary-block/unnest-blogpost-unnest-positions.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The indices in the output block &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;B3&lt;/code&gt; shown above are strictly increasing starting from 0, but that is not always the case. The same input block can be used to generate multiple output blocks, with a different set of indices. Another interesting scenario is when multiple columns are being unnested. In that case, the output may require null appends because of the difference in cardinalities. We look for null elements in the input block and use their indices for handling the null-appends. If that is not possible, we have to fall back to copying data. We discuss this in more detail in the next section.&lt;/p&gt;

&lt;h1 id=&quot;implementation-challenges&quot;&gt;Implementation Challenges&lt;/h1&gt;

&lt;h4 id=&quot;extracting-input-from-nested-blocks&quot;&gt;Extracting Input from Nested Blocks&lt;/h4&gt;

&lt;p&gt;Data in the input unnest columns is represented using nested structures (e.g. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ArrayBlock&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MapBlock&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RowBlock&lt;/code&gt;), which create a layer of indirection on top of the actual element blocks. For the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;positions_held&lt;/code&gt; column from the example above, the input block is an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ArrayBlock&lt;/code&gt; which contains:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;offset information for representing arrays in every row&lt;/li&gt;
  &lt;li&gt;actual data in the form of an underlying element block storing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VARCHAR&lt;/code&gt;s.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For building an output &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DictionaryBlock&lt;/code&gt;, we create pointers to this underlying block. While processing entries from the input array block, array offsets are translated to indices of the underlying block. Similar translations have been implemented for unnest columns of array, map, and array-of-row types. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ColumnarMap&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ColumnarArray&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ColumnarRow&lt;/code&gt; structures enable this translation of indices.&lt;/p&gt;
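&lt;p&gt;The offset-to-index translation can be sketched as follows (a hypothetical Python model of an array block; Presto’s real code operates on the Java &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ColumnarArray&lt;/code&gt; structure):&lt;/p&gt;

```python
# Flat element block plus per-row offsets, as in an ArrayBlock.
elements = ["intern", "engineer", "sre", "manager", "director"]
offsets = [0, 2, 5]  # row 0 holds elements[0:2], row 1 holds elements[2:5]

# Unnesting: the output dictionary ids are simply the element positions
# covered by each row's [start, end) offset range.
ids = []
for row in range(len(offsets) - 1):
    start, end = offsets[row], offsets[row + 1]
    ids.extend(range(start, end))
print(ids)                         # [0, 1, 2, 3, 4]
print([elements[i] for i in ids])
```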

&lt;h4 id=&quot;dealing-with-multiple-unnest-columns&quot;&gt;Dealing with Multiple Unnest Columns&lt;/h4&gt;

&lt;p&gt;When a table has more than one nested column, a user may want to unnest multiple columns in the same query. Consider a table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;S&lt;/code&gt; with 3 columns: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;name&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;schools_attended&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;graduation_dates&lt;/code&gt;, of types &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VARCHAR&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ARRAY(VARCHAR)&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ARRAY(VARCHAR)&lt;/code&gt; respectively. Every row in this table indicates the schools attended and the corresponding graduation dates for a person. Let’s say a user wants to unnest the contents of the two array columns into &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unnested_school&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unnested_graduation_date&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;One naive way of doing that is to use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CROSS JOIN UNNEST&lt;/code&gt; clause twice, on the two different columns. This translates to two different &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNNEST&lt;/code&gt; operators (as shown in the query below), each with a single unnest column, producing two independent cross joins that execute the way we discussed earlier. This query structure is not what we want, since it produces a blown-up cross product of the two arrays.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;S&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;U1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unnested_school&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;U2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unnested_graduation_date&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;S&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CROSS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;UNNEST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;schools_attended&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;U1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unnested_school&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;CROSS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;UNNEST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;graduation_dates&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;U2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unnested_graduation_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The correct way to unnest the two columns is to use them in the same unnest clause, as shown below.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;S&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;U&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unnested_school&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;U&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unnested_graduation_date&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;S&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;CROSS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;UNNEST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;schools_attended&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;graduation_dates&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;U&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unnested_school&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;unnested_graduation_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The arrays/maps being unnested in multiple columns can have different cardinalities. In this example, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;graduation_date&lt;/code&gt; value for the last school may not be present, if the user has not yet graduated. Null elements need to be appended to the output unnest columns in such cases.&lt;/p&gt;

&lt;p&gt;In the example data shown below, a NULL element is appended in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unnested_graduation_date&lt;/code&gt; column since the array in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;graduation_dates&lt;/code&gt; column is shorter than that in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;schools_attended&lt;/code&gt; column.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/unnest-operator-dictionary-block/unnest-blogpost-corner-case.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Since we are using a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DictionaryBlock&lt;/code&gt; for building the unnest output column, appending a null gets slightly tricky: how do we create a pointer representing a NULL? The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DictionaryBlock&lt;/code&gt; implementation, as of now, does not have a way to represent null elements. In such cases, we first check for the existence of a null element in the input block. If we find one, we use its index when appending NULLs to the output. Otherwise, we fall back to copying elements from the input into a new output block, as in the previous implementation.&lt;/p&gt;
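&lt;p&gt;This null-append strategy can be sketched in Python (a simplified model; the function name and data shapes here are hypothetical, not Presto’s actual unnester). Each row’s ids are padded to the target cardinality with the index of an existing null in the input, or the whole approach falls back to copying when no null exists:&lt;/p&gt;

```python
def unnest_ids_with_padding(elements, offsets, target_lengths):
    """Build dictionary ids, padding each row to its target length with the
    index of a null element from the input block, if one exists."""
    null_index = elements.index(None) if None in elements else None
    ids = []
    for row, target in enumerate(target_lengths):
        start, end = offsets[row], offsets[row + 1]
        ids.extend(range(start, end))
        padding = target - (end - start)
        if padding > 0:
            if null_index is None:
                return None  # no null in input: must fall back to copying
            ids.extend([null_index] * padding)
    return ids

# graduation_dates block: row 0 holds three entries (one of them null),
# row 1 holds one entry but must align with two schools.
dates = ["2001", "2005", None, "2010"]
ids = unnest_ids_with_padding(dates, [0, 3, 4], [3, 2])
print([dates[i] for i in ids])    # ['2001', '2005', None, '2010', None]
```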

&lt;p&gt;In cases with multiple columns, the lengths of the arrays/maps are usually the same, and misalignments are infrequent. That said, misalignments can force copying of data while building output blocks if no NULL elements are present in the input. This may reduce the CPU and memory savings (and even increase the average memory allocation in some cases), but this specific case is not common.&lt;/p&gt;

&lt;h1 id=&quot;future-work&quot;&gt;Future Work&lt;/h1&gt;

&lt;p&gt;Performance of queries with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CROSS JOIN UNNEST&lt;/code&gt; clause can be further improved through the following optimizations.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;While unnesting a deeply nested column of type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;array(row(.....))&lt;/code&gt;, the user is often interested in only a small subset of fields from the row. Such cases can benefit from optimizing the logical plan through the pushdown of dereference projections. There are ongoing efforts in the community in this direction.&lt;/li&gt;
  &lt;li&gt;The dictionary blocks created in the discussed implementation use the input block as a reference. What happens if the input itself is a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DictionaryBlock&lt;/code&gt;? We end up with two levels of dereferencing. Such cases can be further optimized by collapsing the multiple indirections into a single one.&lt;/li&gt;
  &lt;li&gt;The common case for an unnest column does not involve any NULL appends. The unnested output &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DictionaryBlock&lt;/code&gt; in this case represents a range over the input block, so the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DictionaryBlock&lt;/code&gt; creation can be avoided by using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;getRegion&lt;/code&gt; method on the input block.&lt;/li&gt;
  &lt;li&gt;For variable-width and complex columns, using a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DictionaryBlock&lt;/code&gt; is beneficial in terms of CPU and memory, but it may be overkill for primitive types (booleans or integers), where copying can be cheaper than creating a dictionary block. Selectively choosing dictionary blocks based on the type could be helpful.&lt;/li&gt;
&lt;/ul&gt;
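&lt;p&gt;The indirection-collapsing idea can be sketched by composing the two index arrays (a plain Python illustration of the concept, not Presto’s Java code):&lt;/p&gt;

```python
# When the unnest input is itself dictionary-encoded, the output would
# reference a DictionaryBlock, giving two levels of indirection.
values = ["a", "b", "c"]
inner_ids = [2, 0, 1]      # input DictionaryBlock over values
outer_ids = [0, 0, 2, 1]   # unnest output over the input block

# Collapsing: compose the two index arrays into one flat ids array.
flat_ids = [inner_ids[i] for i in outer_ids]
print([values[i] for i in flat_ids])   # ['c', 'c', 'b', 'a']
```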

&lt;h1 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h1&gt;

&lt;p&gt;LinkedIn’s data ecosystem makes heavy use of tables with deeply nested columns, and this change is beneficial for handling Presto queries on such tables. In our internal experiments with production data, we have seen queries perform up to ~9x faster with as much as ~13x less CPU usage.&lt;/p&gt;

&lt;p&gt;We look forward to people in the community trying this out, starting with the 316 release, and would love to hear others’ observations of performance after this change. Feel free to reach out to me on &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;Slack&lt;/a&gt; (handle @padesai) or &lt;a href=&quot;https://www.linkedin.com/in/pratham-desai/&quot;&gt;LinkedIn&lt;/a&gt; with questions or feedback.&lt;/p&gt;</content>

      
        <author>
          <name>Pratham Desai, LinkedIn</name>
        </author>
      

      <summary>Queries with CROSS JOIN UNNEST clause are expected to have a significant performance improvement starting version 316.</summary>

      
      
    </entry>
  
    <entry>
      <title>A Report of First Ever Presto Conference Tokyo</title>
      <link href="https://trino.io/blog/2019/07/11/report-for-presto-conference-tokyo.html" rel="alternate" type="text/html" title="A Report of First Ever Presto Conference Tokyo" />
      <published>2019-07-11T00:00:00+00:00</published>
      <updated>2019-07-11T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/07/11/report-for-presto-conference-tokyo</id>
<content type="html" xml:base="https://trino.io/blog/2019/07/11/report-for-presto-conference-tokyo.html">&lt;p&gt;Nowadays, Presto is attracting attention from a wide variety of companies all around 
the world, and Japan is no exception. Many companies there use Presto as their primary data 
processing engine.&lt;/p&gt;

&lt;p&gt;To bring the community members in Japan together, we held the 
first ever Presto conference in Tokyo, welcoming the Presto creators &lt;a href=&quot;https://github.com/dain&quot;&gt;Dain Sundstrom&lt;/a&gt;, 
&lt;a href=&quot;https://github.com/martint&quot;&gt;Martin Traverso&lt;/a&gt;, and &lt;a href=&quot;https://github.com/electrum&quot;&gt;David Phillips&lt;/a&gt;. 
The conference was hosted at the Tokyo office of &lt;a href=&quot;https://www.treasuredata.com/&quot;&gt;Arm Treasure Data&lt;/a&gt;. 
This article summarizes the conference and aims to convey the excitement in the room.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/presto-conference-tokyo/overall-view.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;presto-current-and-future&quot;&gt;Presto: Current and Future&lt;/h1&gt;

&lt;p&gt;First, the Presto creators introduced their recent work and the software foundation 
launched last year. They covered the following changes and enhancements achieved by 
the community recently.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Presto Software Foundation&lt;/li&gt;
  &lt;li&gt;New Connectors
    &lt;ul&gt;
      &lt;li&gt;Phoenix&lt;/li&gt;
      &lt;li&gt;Elasticsearch&lt;/li&gt;
      &lt;li&gt;Apache Ranger&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Attendees also learned about several plans for the near future.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The plan to support more complex pushdown to connectors&lt;/li&gt;
  &lt;li&gt;Case-sensitive identifiers&lt;/li&gt;
  &lt;li&gt;Timestamp semantics&lt;/li&gt;
  &lt;li&gt;Dynamic filtering&lt;/li&gt;
  &lt;li&gt;Connectors such as Iceberg, Kinesis, Druid.&lt;/li&gt;
  &lt;li&gt;Coordinator high availability&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;reading-the-source-code-of-presto&quot;&gt;Reading The Source Code of Presto&lt;/h1&gt;

&lt;p&gt;To help attendees get used to the technical talks about Presto at the conference, 
&lt;a href=&quot;https://github.com/xerial&quot;&gt;Leo&lt;/a&gt; provided a guided walk through the Presto 
source code. Since the Presto source code repository is enormous, such a guide is surely 
helpful for developers exploring the forest of the codebase.&lt;/p&gt;

&lt;div style=&quot;text-align: center;&quot;&gt;
&lt;iframe src=&quot;//www.slideshare.net/slideshow/embed_code/key/vTpEZFzu03tVhv&quot; width=&quot;440&quot; height=&quot;330&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; &lt;/iframe&gt; &lt;div style=&quot;margin-bottom:5px&quot;&gt; &lt;strong&gt; &lt;a href=&quot;//www.slideshare.net/taroleo/reading-the-source-code-of-presto&quot; title=&quot;Reading The Source Code of Presto&quot; target=&quot;_blank&quot;&gt;Reading The Source Code of Presto&lt;/a&gt; &lt;/strong&gt; from &lt;strong&gt;&lt;a href=&quot;https://www.slideshare.net/taroleo&quot; target=&quot;_blank&quot;&gt;Taro L. Saito&lt;/a&gt;&lt;/strong&gt; &lt;/div&gt;
&lt;/div&gt;

&lt;h1 id=&quot;presto-at-arm-treasure-data&quot;&gt;Presto At Arm Treasure Data&lt;/h1&gt;

&lt;p&gt;Then &lt;a href=&quot;https://github.com/Lewuathe&quot;&gt;Kai&lt;/a&gt; (that’s me) provided an overview of how Arm Treasure 
Data uses Presto in its service. Presto is heavily used to support many enterprise use 
cases, including IoT data analysis, and it is becoming the hub component processing high-throughput 
workloads from many kinds of clients, such as Spark, ODBC, and JDBC.&lt;/p&gt;

&lt;div style=&quot;text-align: center;&quot;&gt;
&lt;iframe src=&quot;//www.slideshare.net/slideshow/embed_code/key/cVfDINF85hx0Vx&quot; width=&quot;440&quot; height=&quot;330&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; &lt;/iframe&gt; &lt;div style=&quot;margin-bottom:5px&quot;&gt; &lt;strong&gt; &lt;a href=&quot;//www.slideshare.net/taroleo/presto-at-arm-treasure-data-2019-updates&quot; title=&quot;Presto At Arm Treasure Data - 2019 Updates&quot; target=&quot;_blank&quot;&gt;Presto At Arm Treasure Data - 2019 Updates&lt;/a&gt; &lt;/strong&gt; from &lt;strong&gt;&lt;a href=&quot;https://www.slideshare.net/taroleo&quot; target=&quot;_blank&quot;&gt;Taro L. Saito&lt;/a&gt;&lt;/strong&gt; &lt;/div&gt;
&lt;/div&gt;

&lt;h1 id=&quot;large-scale-migration-from-hive-to-presto-in-yahoo-japan&quot;&gt;Large Scale Migration from Hive to Presto in Yahoo! JAPAN&lt;/h1&gt;

&lt;p&gt;We learned how hard it is to migrate large-scale workloads from Hive to Presto from the 
presentation given by &lt;a href=&quot;https://github.com/oneonestar&quot;&gt;Star&lt;/a&gt; from Yahoo! JAPAN. Quite a few attendees 
seemed interested in the tool they created to convert HiveQL into Presto SQL, presumably 
because they have faced the same type of challenges.&lt;/p&gt;

&lt;div style=&quot;text-align: center;&quot;&gt;
&lt;iframe src=&quot;//www.slideshare.net/slideshow/embed_code/key/ld3tI0uIzAQe1&quot; width=&quot;440&quot; height=&quot;330&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; &lt;/iframe&gt; &lt;div style=&quot;margin-bottom:5px&quot;&gt; &lt;strong&gt; &lt;a href=&quot;//www.slideshare.net/techblogyahoo/large-scale-migration-fromhive-to-presto-at-yahoo-japan&quot; title=&quot;Large scale migration fromHive to Presto at Yahoo! JAPAN&quot; target=&quot;_blank&quot;&gt;Large scale migration fromHive to Presto at Yahoo! JAPAN&lt;/a&gt; &lt;/strong&gt; from &lt;strong&gt;&lt;a href=&quot;https://www.slideshare.net/techblogyahoo&quot; target=&quot;_blank&quot;&gt;Yahoo!デベロッパーネットワーク&lt;/a&gt;&lt;/strong&gt; &lt;/div&gt;
&lt;/div&gt;

&lt;h1 id=&quot;presto-at-line&quot;&gt;Presto At LINE&lt;/h1&gt;

&lt;p&gt;LINE is the biggest provider of mobile communication tools in Japan (think of it as the WhatsApp of Japan). 
&lt;a href=&quot;https://github.com/wyukawa&quot;&gt;Wataru Yukawa&lt;/a&gt; and &lt;a href=&quot;https://github.com/ebyhr&quot;&gt;Yuya Ebihara&lt;/a&gt; showed us how 
they improve their platform by collaborating with the community. We learned about difficulties 
and challenges primarily caused by dependencies on other Hadoop ecosystem components such as HDFS and Spark.&lt;/p&gt;

&lt;div style=&quot;text-align: center;&quot;&gt;
&lt;iframe src=&quot;//www.slideshare.net/slideshow/embed_code/key/Hx9oz6Pi1su5rj&quot; width=&quot;440&quot; height=&quot;330&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; &lt;/iframe&gt; &lt;div style=&quot;margin-bottom:5px&quot;&gt; &lt;strong&gt; &lt;a href=&quot;//www.slideshare.net/wyukawa/presto-conferencetokyo2019&quot; title=&quot;Presto conferencetokyo2019&quot; target=&quot;_blank&quot;&gt;Presto conferencetokyo2019&lt;/a&gt; &lt;/strong&gt; from &lt;strong&gt;&lt;a href=&quot;https://www.slideshare.net/wyukawa&quot; target=&quot;_blank&quot;&gt;wyukawa &lt;/a&gt;&lt;/strong&gt; &lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;One notable moment in the session was the discussion of how to make the error messages 
provided by Presto excellent. David and the other creators genuinely care about the error messages 
shown by the system. Improving error messages is one of the best ways to reduce the time spent 
dealing with inquiries about errors, which is the primary reason to keep them easy to understand.&lt;/p&gt;

&lt;h1 id=&quot;qa-session&quot;&gt;Q&amp;amp;A Session&lt;/h1&gt;

&lt;p&gt;At the end of the conference, attendees got a chance to freely ask the Presto creators about a bunch of 
topics, covering not only Presto technicalities but also their working style and thoughts. Here is a 
selection of the Q&amp;amp;A discussed at the conference.&lt;/p&gt;

&lt;p&gt;Q: What do you expect most from Japan community?&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;Judging from our communication with the community in Israel, gaining diversity of use cases will make 
Presto better, and we expect that kind of diversity here. Japan surely has a unique community solving 
its own difficulties. Having a Japanese Slack channel might be a good idea to help each other :)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Q: How do you review the pull request code? How to keep the quality of the code review process?&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;The difficulty of a code review depends on the complexity of the PR itself. We use IntelliJ 
extensively to read the code base. There are mainly two things that keep code review quality high. One 
is that being involved in actual code reviews makes you a good reviewer. The other is automating minor 
checks such as code style. These things help keep the code review process functional.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
  &lt;p&gt;Making the code readable is the most important thing in the Presto codebase.&lt;/p&gt;
  &lt;ul&gt;
    &lt;li&gt;Do not use abbreviations or slang, because not everyone can understand those words at a glance&lt;/li&gt;
    &lt;li&gt;Write a comment -&amp;gt; write the code -&amp;gt; delete the comment. That is the process that makes the code readable by itself.&lt;/li&gt;
  &lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;Q: The SQL-on-everything approach vs. pursuing performance. In which direction should Presto move forward?&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;It depends on the community’s decision. However, in our discussions with several companies 
in the community, not a single company has expressed much concern about the performance of Presto.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1 id=&quot;wrap-up&quot;&gt;Wrap Up&lt;/h1&gt;

&lt;p&gt;This conference was the first ever Presto conference in Tokyo to bring in the Presto creators. We
were able to have exciting discussions with the community developers and the creators. One of the great 
things we discovered at the conference was the creators’ enthusiasm for making Presto usable 
by every developer. They genuinely care about the error messages seen by users and the quality of the 
code read by developers. Thanks to this attention to usability from the viewpoint of both 
users and developers, Presto keeps gaining traction in the community.&lt;/p&gt;

&lt;p&gt;It was a great time, with many conversations among community members. We really appreciate the 
developers in the community and the creators. Thank you so much for coming to the conference, and see 
you next time!&lt;/p&gt;

&lt;h1 id=&quot;reference&quot;&gt;Reference&lt;/h1&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://techplay.jp/event/733772&quot;&gt;Presto Conference Tokyo 2019&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.slideshare.net/taroleo/reading-the-source-code-of-presto&quot;&gt;Reading The Source Code of Presto&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.slideshare.net/taroleo/presto-at-arm-treasure-data-2019-updates&quot;&gt;Presto At Arm Treasure Data - 2019 Updates&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.slideshare.net/techblogyahoo/large-scale-migration-fromhive-to-presto-at-yahoo-japan&quot;&gt;Large Scale Migration from Hive to Presto in Yahoo! JAPAN&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.slideshare.net/wyukawa/presto-conferencetokyo2019&quot;&gt;Presto At LINE&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content>

      
        <author>
          <name>Kai Sasaki, Arm Treasure Data</name>
        </author>
      

      <summary>Nowadays, Presto is attracting much attention from a wide variety of companies around the world, and Japan is no exception. Many companies are using Presto as their primary data processing engine. To keep the community members in Japan in touch with each other, we have just held the first ever Presto conference in Tokyo, welcoming the Presto creators Dain Sundstrom, Martin Traverso, and David Phillips. The conference was hosted at the Tokyo office of Arm Treasure Data. This article is a summary of the conference, aiming to convey the excitement in the room.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/presto-conference-tokyo/overall-view.jpg" />
      
    </entry>
  
    <entry>
      <title>Introduction to Trino Cost-Based Optimizer</title>
      <link href="https://trino.io/blog/2019/07/04/cbo-introduction.html" rel="alternate" type="text/html" title="Introduction to Trino Cost-Based Optimizer" />
      <published>2019-07-04T00:00:00+00:00</published>
      <updated>2019-07-04T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/07/04/cbo-introduction</id>
      <content type="html" xml:base="https://trino.io/blog/2019/07/04/cbo-introduction.html">&lt;p&gt;Last edited 15 June 2022: Update to use the Trino project name.&lt;/p&gt;

&lt;p&gt;The Cost-Based Optimizer (CBO) in Trino achieves stunning results in industry
standard benchmarks (and not only in benchmarks)! The CBO makes decisions based
on several factors, including shape of the query, filters and table statistics.
I would like to tell you more about what the table statistics are in Trino and
what information can be derived from them.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;This post was originally published at &lt;a href=&quot;https://www.starburstdata.com/technical-blog/introduction-to-presto-cost-based-optimizer/&quot;&gt;Starburst Data Engineering
Blog&lt;/a&gt;.&lt;/p&gt;

&lt;h1 id=&quot;background&quot;&gt;Background&lt;/h1&gt;

&lt;p&gt;Before diving deep into how Trino analyzes statistics, let’s set the stage so
that our considerations are framed in some context. Let’s consider a Data
Scientist who wants to know which customers spend the most money with the
company, based on their history of orders (probably to offer them some discounts).
They would probably fire up a query like this:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;price&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;customer&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lineitem&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orderkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orderkey&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;GROUP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;price&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now, Trino needs to create an execution plan for this query. It does so by
first transforming a query to a plan in the simplest possible way — here it
will create CROSS JOINS for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FROM customer c, orders o, lineitem l&lt;/code&gt; part of the
query and FILTER for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WHERE c.custkey = o.custkey AND l.orderkey = o.orderkey&lt;/code&gt;.
The initial plan is very naïve — CROSS JOINS will produce humongous amounts of
intermediate data. There is no point in even trying to execute such a plan, and
Trino won’t do that. Instead, it applies transformations to bring the plan closer to
what the user probably wanted, as shown below. Note: for succinctness, only part of
the query plan is drawn, without aggregation (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt;) and sorting (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER
BY&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/cbo-introduction/trino-eliminate-cross-join.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Indeed, this is much better than the CROSS JOINS. But we can do even better, if
we consider &lt;em&gt;cost&lt;/em&gt;.&lt;/p&gt;

&lt;h1 id=&quot;cost-based-optimizer&quot;&gt;Cost-Based Optimizer&lt;/h1&gt;

&lt;p&gt;Without going into database internals on how a JOIN is implemented, let’s take
for granted that it makes a big difference which table is on the right and which is
on the left in the JOIN. (A simple explanation is that the table on the right
basically needs to be kept in memory while the JOIN result is calculated.)
Because of that, the following plans produce the same result, but may have
different execution times or memory requirements.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/cbo-introduction/trino-join-flip.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;CPU time, memory requirements and network bandwidth usage are the three
dimensions that contribute to query execution time, both in single query and
concurrent workloads. These dimensions are captured as the &lt;em&gt;cost&lt;/em&gt; in Trino.&lt;/p&gt;

&lt;p&gt;Our Data Scientist knows that most of the customers made at least one order and
every order had at least one item (and many orders had many items), so
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lineitem&lt;/code&gt; is the biggest table, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders&lt;/code&gt; is medium and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;customer&lt;/code&gt; is the
smallest. When joining &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;customer&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders&lt;/code&gt;, having &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders&lt;/code&gt; on the right
side of the JOIN is not a good idea! But how can the planner know that? In
the real world, the query planner cannot reliably deduce such information just from
table names. This is where table statistics kick in.&lt;/p&gt;

&lt;h2 id=&quot;table-statistics&quot;&gt;Table statistics&lt;/h2&gt;

&lt;p&gt;Trino has &lt;a href=&quot;https://trino.io/docs/current/develop/connectors.html&quot;&gt;connector-based
architecture&lt;/a&gt;. A
connector can provide &lt;a href=&quot;https://trino.io/docs/current/optimizer/statistics.html&quot;&gt;table and column
statistics&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;number of rows in a table,&lt;/li&gt;
  &lt;li&gt;number of distinct values in a column,&lt;/li&gt;
  &lt;li&gt;fraction of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NULL&lt;/code&gt; values in a column,&lt;/li&gt;
  &lt;li&gt;minimum/maximum value in a column,&lt;/li&gt;
  &lt;li&gt;average data size for a column.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Of course, if some information is missing — e.g. average text length in a
varchar column is unknown — a connector can still provide other information and
Cost-Based Optimizer will be able to use that.&lt;/p&gt;

&lt;p&gt;In our Data Scientist’s example, data sizes can look something like the
following:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/cbo-introduction/trino-data-table-statistics.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Having this knowledge, &lt;a href=&quot;https://trino.io/docs/current/optimizer/cost-based-optimizations.html&quot;&gt;Trino’s Cost-Based
Optimizer&lt;/a&gt;
will come up with completely different join ordering in the plan.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/cbo-introduction/trino-cbo-results.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;filter-statistics&quot;&gt;Filter statistics&lt;/h2&gt;

&lt;p&gt;As we saw, knowing the sizes of the tables involved in a query is fundamental
to properly reordering the joins in the query plan. However, knowing just the
sizes is not enough. Returning to our example, the Data Scientist might want to
drill down into results of their previous query, to know which customers
repeatedly bought and spent most money on a particular item (clearly, this must
be some consumable, or a mobile phone). For this, they will use an almost 
identical query to the original one, adding one more condition.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;price&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;customer&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lineitem&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orderkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orderkey&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;item&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;106170&lt;/span&gt;                              &lt;span class=&quot;c1&quot;&gt;--- additional condition&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;GROUP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;price&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The additional FILTER might be applied after the JOIN or before. Obviously,
filtering as early as possible is the best strategy, but this also means the
actual size of the data involved in the JOIN will be different now. In our Data
Scientist’s example, the join order will indeed be different.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/cbo-introduction/trino-cbo-results-with-filter.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h1 id=&quot;under-the-hood&quot;&gt;Under the Hood&lt;/h1&gt;

&lt;h2 id=&quot;execution-time-and-cost&quot;&gt;Execution Time and Cost&lt;/h2&gt;

&lt;p&gt;From an external perspective, only three things really matter:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;execution time,&lt;/li&gt;
  &lt;li&gt;execution cost (in dollars),&lt;/li&gt;
  &lt;li&gt;ability to run (sufficiently) many concurrent queries at a time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The execution time is often called “wall time” to emphasize that we’re not
really interested in “CPU time” or number of machines/nodes/threads involved.
Our Data Scientist’s clock on the wall is the ultimate judge. It would be nice
if they were not forced to get coffee/eat lunch during each query they run. On
the other hand, a CFO will be interested in keeping cluster costs at the lowest
possible level (without, of course, impeding employees’ effectiveness). Lastly,
a System Administrator needs to ensure that all cluster users can work at the
same time. That is, that the cluster can handle many queries at a time,
yielding enough throughput that “wall time” observed by each of the users is
satisfactory.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/cbo-introduction/under-the-hood.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;It is possible to optimize for only one of the above dimensions. For example,
we can have a single-node cluster, and the CFO will be happy (but employees will go
somewhere else). Conversely, we may have a thousand-node cluster even if the
company cannot afford it. Users will be (initially) happy, until the company
goes bankrupt. Ultimately, however, we need to balance these trade-offs, which
basically means that queries need to be executed as fast as possible, with as
few resources as possible.&lt;/p&gt;

&lt;p&gt;In Trino, this is modeled with the concept of the cost, which captures
properties like CPU cost, memory requirements and network bandwidth usage.
Different variants of a query execution plan are explored, assigned a cost and
compared. The variant with the least overall cost is selected for execution.
This approach neatly balances the needs of cluster users, administrators and
the CFO.&lt;/p&gt;
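
&lt;p&gt;The selection step can be sketched as follows. This is a minimal, hypothetical illustration, not
Trino’s actual cost model; the plan names, cost numbers, and weighting are invented:&lt;/p&gt;

```python
# Minimal sketch of cost-based plan selection (hypothetical numbers,
# not Trino's real cost model). Each plan variant is assigned a cost
# capturing CPU, memory, and network usage.
plans = {
    'customer on build side': (90.0, 40.0, 30.0),  # (cpu, memory, network)
    'orders on build side': (60.0, 5.0, 10.0),
}

def total_cost(cost):
    cpu, memory, network = cost
    # Equal weights are arbitrary here; a real optimizer combines the
    # components in a more principled way.
    return cpu + memory + network

# The variant with the least overall cost is selected for execution.
best = min(plans, key=lambda name: total_cost(plans[name]))
```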

&lt;p&gt;The cost of each operation in the query plan is calculated in a way appropriate
for the type of the operation, taking into account statistics of the data
involved in the operation. Now, let’s see where the statistics come from.&lt;/p&gt;

&lt;h2 id=&quot;statistics&quot;&gt;Statistics&lt;/h2&gt;

&lt;p&gt;In our Data Scientist’s example, the row counts for tables were taken directly
from table statistics, i.e. provided by a connector. But where did “~3K rows”
come from? Let’s dive into some nitty-gritty details.&lt;/p&gt;

&lt;p&gt;A query execution plan is made of “building block” operations, including:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;table scans (reading the table; at runtime this is actually combined with a
filter)&lt;/li&gt;
  &lt;li&gt;filters (SQL’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WHERE&lt;/code&gt; clause or any other conditions deduced by the query
planner)&lt;/li&gt;
  &lt;li&gt;projections (i.e. computing output expressions)&lt;/li&gt;
  &lt;li&gt;joins&lt;/li&gt;
  &lt;li&gt;aggregations (in fact there are a few different “building blocks” for
aggregations, but that’s a story for another time)&lt;/li&gt;
  &lt;li&gt;sorting (SQL’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt;)&lt;/li&gt;
  &lt;li&gt;limiting (SQL’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt;)&lt;/li&gt;
  &lt;li&gt;sorting and limiting combined (SQL’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY .. LIMIT ..&lt;/code&gt; deserves
specialized support)&lt;/li&gt;
  &lt;li&gt;and a lot more!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;How the statistics are computed for the most interesting “building blocks”
is discussed below.&lt;/p&gt;

&lt;h2 id=&quot;table-scan-statistics&quot;&gt;Table Scan statistics&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/cbo-introduction/table-scan-statistics.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;As explained in the “Table statistics” section, the connector which defines the
table is responsible for providing the table statistics. Furthermore, the
connector will be informed about any filtering conditions that are to be
applied to the data read from the table. This may be important, e.g. in the case
of a Hive partitioned table, where statistics are stored on a per-partition basis.
If the filtering condition excludes some (or many) partitions, the statistics
will cover a smaller data set (the remaining partitions) and will be more
accurate.&lt;/p&gt;
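
&lt;p&gt;The effect of partition pruning on the scan estimate can be sketched in a few lines; the
partition names and row counts below are made up for illustration:&lt;/p&gt;

```python
# Hypothetical per-partition row counts for a Hive-style
# partitioned table.
partition_rows = {
    'ds=2019-06-01': 1000000,
    'ds=2019-06-02': 1200000,
    'ds=2019-06-03': 800000,
}

def scan_estimate(partitions, keep):
    # Only partitions that survive the filtering condition
    # contribute to the table-scan row-count estimate.
    return sum(rows for name, rows in partitions.items() if keep(name))

all_rows = scan_estimate(partition_rows, lambda name: True)
pruned = scan_estimate(partition_rows, lambda name: name == 'ds=2019-06-03')
```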

&lt;p&gt;To recall, a connector can provide the following table and column statistics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;number of rows in a table,&lt;/li&gt;
  &lt;li&gt;number of distinct values in a column,&lt;/li&gt;
  &lt;li&gt;fraction of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NULL&lt;/code&gt; values in a column,&lt;/li&gt;
  &lt;li&gt;minimum/maximum value in a column,&lt;/li&gt;
  &lt;li&gt;average data size for a column.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;filter-statistics-1&quot;&gt;Filter statistics&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/cbo-introduction/filter-statistics.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;When considering a filtering operation, a filter’s condition is analyzed and
the following estimations are calculated:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;the probability that a data row will pass the filtering condition; from
this, the expected number of rows after the filter is derived,&lt;/li&gt;
  &lt;li&gt;fraction of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NULL&lt;/code&gt; values for columns involved in the filtering condition (for
most conditions, this will simply be 0%),&lt;/li&gt;
  &lt;li&gt;number of distinct values for columns involved in the filtering condition,&lt;/li&gt;
  &lt;li&gt;number of distinct values for columns that were not part of the filtering
condition, if their original number of distinct values was more than the
expected number of data rows that pass the filter.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, for a condition like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;l.item = 106170&lt;/code&gt; we can observe that:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;no rows with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;l.item&lt;/code&gt; being &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NULL&lt;/code&gt; will meet the condition,&lt;/li&gt;
  &lt;li&gt;there will be only one distinct value of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;l.item&lt;/code&gt; (106170) after the
filtering operation,&lt;/li&gt;
  &lt;li&gt;on average, the number of data rows expected to pass the filter will be equal to
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;number_of_input_rows * fraction_of_non_nulls / distinct_values&lt;/code&gt;. (This
assumes, of course, that users most often drill down into the data they really
have, which is quite a reasonable assumption and also a safe one to make).&lt;/li&gt;
&lt;/ul&gt;
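
&lt;p&gt;The last estimate can be sketched directly (the input numbers below are invented, not actual
TPC data):&lt;/p&gt;

```python
# Row-count estimate for an equality filter such as l.item = 106170.
def filtered_rows(input_rows, null_fraction, distinct_values):
    # Rows with NULL in the filtered column never match; the remaining
    # rows are assumed to be spread evenly across the distinct values.
    return input_rows * (1 - null_fraction) / distinct_values

estimate = filtered_rows(input_rows=6000000, null_fraction=0.0,
                         distinct_values=2000)
# estimate is 3000.0, i.e. the kind of "~3K rows" figure the planner shows
```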

&lt;h2 id=&quot;projection-statistics&quot;&gt;Projection statistics&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/cbo-introduction/projection-statistics.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Projections (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;l.item – 1 AS iid&lt;/code&gt;) are similar to filters, except that, of
course, they do not impact the expected number of rows after the operation.&lt;/p&gt;

&lt;p&gt;For a projection, the following types of column statistics are calculated (if
possible for given projection expression):&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;number of distinct values produced by the projection,&lt;/li&gt;
  &lt;li&gt;fraction of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NULL&lt;/code&gt; values produced by the projection,&lt;/li&gt;
  &lt;li&gt;minimum/maximum value produced by the projection.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Naturally, if &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iid&lt;/code&gt; is only returned to the user, then these statistics are not
useful. However, if it’s later used in a filter or join operation, these
statistics are important to correctly estimate the number of rows that meet the
filter condition or are returned from the join.&lt;/p&gt;
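
&lt;p&gt;For a constant-shift projection like the one above, the statistics propagation can be sketched
as follows (the input statistics are invented):&lt;/p&gt;

```python
# Propagating column statistics through the projection item - 1 AS iid
# (hypothetical input statistics).
item_stats = {'distinct_values': 2000, 'null_fraction': 0.0,
              'min': 1, 'max': 200000}

def shift_by_constant(stats, constant):
    # Subtracting a constant shifts min/max but preserves the number
    # of distinct values and the fraction of NULLs.
    return {'distinct_values': stats['distinct_values'],
            'null_fraction': stats['null_fraction'],
            'min': stats['min'] - constant,
            'max': stats['max'] - constant}

iid_stats = shift_by_constant(item_stats, 1)
```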

&lt;h1 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h1&gt;

&lt;p&gt;Summing up, Trino’s Cost-Based Optimizer is conceptually a very simple thing.
Alternative query plans are considered, and the best plan is chosen and executed.
The details are not so simple, though. Fortunately, to use
&lt;a href=&quot;https://trino.io/&quot;&gt;Trino&lt;/a&gt;, one doesn’t need to know all these details.
Of course, anyone with a technical inclination who likes to wander through database
internals is invited to study &lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;the Trino code&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;Enabling Trino CBO is really simple:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;set &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;optimizer.join-reordering-strategy=AUTOMATIC&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;join-distribution-type=AUTOMATIC&lt;/code&gt; in your &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;config.properties&lt;/code&gt;,&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/sql/analyze.html&quot;&gt;analyze&lt;/a&gt; your tables,&lt;/li&gt;
  &lt;li&gt;no, there is no third step. That’s it!&lt;/li&gt;
&lt;/ul&gt;
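
&lt;p&gt;For reference, the first step amounts to two lines in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;config.properties&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;optimizer.join-reordering-strategy=AUTOMATIC
join-distribution-type=AUTOMATIC
&lt;/code&gt;&lt;/pre&gt;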

&lt;p&gt;Take Trino CBO for a spin today and let us know about &lt;em&gt;your&lt;/em&gt; Trino
experience!&lt;/p&gt;

&lt;p&gt;□&lt;/p&gt;</content>

      
        <author>
          <name>Piotr Findeisen, Starburst Data</name>
        </author>
      

      <summary>Last edited 15 June 2022: Update to use the Trino project name. The Cost-Based Optimizer (CBO) in Trino achieves stunning results in industry standard benchmarks (and not only in benchmarks)! The CBO makes decisions based on several factors, including shape of the query, filters and table statistics. I would like to tell you more about what the table statistics are in Trino and what information can be derived from them.</summary>

      
      
    </entry>
  
    <entry>
      <title>Dynamic filtering for highly-selective join optimization</title>
      <link href="https://trino.io/blog/2019/06/30/dynamic-filtering.html" rel="alternate" type="text/html" title="Dynamic filtering for highly-selective join optimization" />
      <published>2019-06-30T00:00:00+00:00</published>
      <updated>2019-06-30T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/06/30/dynamic-filtering</id>
      <content type="html" xml:base="https://trino.io/blog/2019/06/30/dynamic-filtering.html">&lt;p&gt;By using dynamic filtering via run-time predicate pushdown, we can significantly optimize highly-selective inner-joins.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;introduction&quot;&gt;Introduction&lt;/h1&gt;

&lt;p&gt;In the highly-selective join scenario, most of the probe-side rows are dropped immediately after being read, since they 
don’t match the join criteria.&lt;/p&gt;

&lt;p&gt;Our idea was to extend Presto’s predicate pushdown support from the planning phase to run-time, in order to skip reading 
the non-relevant rows from &lt;a href=&quot;https://www.slideshare.net/OriReshef/presto-for-apps-deck-varada-prestoconf&quot;&gt;our connector&lt;/a&gt; 
into Presto&lt;sup id=&quot;fnref:1&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot; role=&quot;doc-noteref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. It should allow much faster joins, when the build-side scan results in a low-cardinality table:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/dynamic-filtering/dynamic-filtering.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The approach above is called “dynamic filtering”, and there is &lt;a href=&quot;https://github.com/trinodb/trino/issues/52&quot;&gt;an ongoing effort&lt;/a&gt; 
to integrate it into Presto.&lt;/p&gt;

&lt;p&gt;The main difficulty is the need to pass the build-side values from the inner-join operator to the probe-side scan operator, 
since the operators may run on different machines. A possible solution is to use the coordinator to facilitate the message 
passing. However, it requires multiple changes in the existing Presto codebase and careful design is needed to avoid overloading
the coordinator.&lt;/p&gt;

&lt;p&gt;Since it’s a complex feature with lots of moving parts, we suggest the approach below, which solves the problem in a simpler way 
for specific join use-cases. We note that parts of the implementation below will also help implement the general dynamic 
filtering solution.&lt;/p&gt;

&lt;h1 id=&quot;design&quot;&gt;Design&lt;/h1&gt;

&lt;p&gt;Our approach relies on the &lt;a href=&quot;https://www.starburst.io/wp-content/uploads/2018/09/Presto-Cost-Based-Query-Optimizer-WP.pdf&quot;&gt;cost-based optimizer&lt;/a&gt; 
(CBO) that allows using “broadcast” join, since in our case the build-side is much smaller than the probe-side. In this case, 
the probe-side scan and the inner-join operators are running in the same process - so the message passing between them becomes 
much simpler.&lt;/p&gt;

&lt;p&gt;Therefore, most of the required changes are in the 
&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-main/src/main/java/io/prestosql/sql/planner/LocalExecutionPlanner.java&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LocalExecutionPlanner&lt;/code&gt;&lt;/a&gt; 
class, and there are no dependencies on the planner or the coordinator.&lt;/p&gt;

&lt;h1 id=&quot;implementation&quot;&gt;Implementation&lt;/h1&gt;

&lt;p&gt;First, we make sure that a broadcast join is used and that the local stage query plan contains the probe-side 
&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-main/src/main/java/io/prestosql/sql/planner/plan/TableScanNode.java&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TableScan&lt;/code&gt;&lt;/a&gt; node.
Otherwise, we don’t apply the optimization, since we need access to the probe-side &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-main/src/main/java/io/prestosql/split/PageSourceProvider.java&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PageSourceProvider&lt;/code&gt;&lt;/a&gt; 
for predicate pushdown.&lt;/p&gt;

&lt;p&gt;Then, we add a new “collection” operator, just before the hash-builder operator as described below:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/dynamic-filtering/operators.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This operator collects the build-side values and, once its input is exhausted, exposes the resulting dynamic filter as a 
&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-spi/src/main/java/io/prestosql/spi/predicate/TupleDomain.java&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TupleDomain&lt;/code&gt;&lt;/a&gt; 
to the probe-side &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-main/src/main/java/io/prestosql/split/PageSourceProvider.java&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PageSourceProvider&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
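
&lt;p&gt;The collection step can be sketched as follows (a simplified Python illustration, not Trino’s actual Java implementation; the class name and the distinct-value cap are assumptions). It gathers the distinct build-side join-key values and gives up, exposing “no filter” (analogous to &lt;code&gt;TupleDomain.all()&lt;/code&gt;), if the value set grows too large to be worth pushing down:&lt;/p&gt;

```python
class DynamicFilterCollector:
    """Gather distinct build-side join-key values for one join column."""

    def __init__(self, max_distinct_values=10000):
        self.max_distinct_values = max_distinct_values
        self.values = set()
        self.overflowed = False

    def add(self, join_key):
        if self.overflowed:
            return
        self.values.add(join_key)
        if len(self.values) > self.max_distinct_values:
            # Too many distinct values: applying the filter would cost more
            # than it saves, so expose "no filter" instead.
            self.overflowed = True
            self.values = set()

    def finish(self):
        """Return the collected value set, or None meaning "accept everything"."""
        return None if self.overflowed else self.values
```

&lt;p&gt;Keeping a cap on the collected set matters because a dynamic filter over millions of distinct values would be expensive to apply and unlikely to prune much.&lt;/p&gt;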

&lt;p&gt;Since the probe-side scan operators run concurrently with the build-side collection, we don’t block the first probe-side 
splits, but allow them to be processed while dynamic filter collection is in progress.&lt;/p&gt;
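
&lt;p&gt;A minimal sketch of this non-blocking hand-off (illustrative Python; the names are assumptions, not Trino’s actual classes): each probe-side split checks whether the filter is ready and applies it only if so, never waiting on the build side:&lt;/p&gt;

```python
class DynamicFilter:
    """Holds the build-side value set once collection completes."""

    def __init__(self):
        self._values = None
        self._ready = False

    def complete(self, values):
        self._values = values
        self._ready = True

    def current(self):
        # Never block: return the filter if ready, None otherwise.
        return self._values if self._ready else None


def scan_split(rows, dynamic_filter):
    """Process one probe-side split, applying the filter only if available."""
    values = dynamic_filter.current()
    if values is None:
        return list(rows)  # filter not ready yet: process the split unfiltered
    return [row for row in rows if row in values]
```

&lt;p&gt;Early splits may scan unfiltered, but correctness is unaffected since the lookup-join still discards non-matching rows; later splits skip that work at the scan.&lt;/p&gt;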

&lt;p&gt;The lookup-join operator is not changed, but the optimization above allows it to process far fewer probe-side rows while 
producing the same result.&lt;/p&gt;

&lt;h1 id=&quot;benchmarks&quot;&gt;Benchmarks&lt;/h1&gt;

&lt;p&gt;We ran TPC-DS queries on a 3-node i3.metal Varada cluster using TPC-DS scale 1000 data.
The following queries benefit the most from our dynamic filtering implementation (measuring the elapsed time in seconds).&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Query&lt;/th&gt;
      &lt;th&gt;Dynamic filtering &amp;amp; CBO&lt;/th&gt;
      &lt;th&gt;Only CBO&lt;/th&gt;
      &lt;th&gt;No CBO&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q10.sql&quot;&gt;q10&lt;/a&gt;&lt;/td&gt;
      &lt;td&gt;2.5&lt;/td&gt;
      &lt;td&gt;8.9&lt;/td&gt;
      &lt;td&gt;10.0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q20.sql&quot;&gt;q20&lt;/a&gt;&lt;/td&gt;
      &lt;td&gt;3.9&lt;/td&gt;
      &lt;td&gt;12.6&lt;/td&gt;
      &lt;td&gt;26.7&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q31.sql&quot;&gt;q31&lt;/a&gt;&lt;/td&gt;
      &lt;td&gt;6.5&lt;/td&gt;
      &lt;td&gt;34.8&lt;/td&gt;
      &lt;td&gt;41.5&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q32.sql&quot;&gt;q32&lt;/a&gt;&lt;/td&gt;
      &lt;td&gt;6.9&lt;/td&gt;
      &lt;td&gt;23.0&lt;/td&gt;
      &lt;td&gt;29.7&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q34.sql&quot;&gt;q34&lt;/a&gt;&lt;/td&gt;
      &lt;td&gt;3.1&lt;/td&gt;
      &lt;td&gt;11.4&lt;/td&gt;
      &lt;td&gt;14.1&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q69.sql&quot;&gt;q69&lt;/a&gt;&lt;/td&gt;
      &lt;td&gt;2.7&lt;/td&gt;
      &lt;td&gt;8.9&lt;/td&gt;
      &lt;td&gt;9.9&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q71.sql&quot;&gt;q71&lt;/a&gt;&lt;/td&gt;
      &lt;td&gt;9.9&lt;/td&gt;
      &lt;td&gt;91.8&lt;/td&gt;
      &lt;td&gt;107.4&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q77.sql&quot;&gt;q77&lt;/a&gt;&lt;/td&gt;
      &lt;td&gt;3.5&lt;/td&gt;
      &lt;td&gt;17.9&lt;/td&gt;
      &lt;td&gt;18.1&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q96.sql&quot;&gt;q96&lt;/a&gt;&lt;/td&gt;
      &lt;td&gt;1.9&lt;/td&gt;
      &lt;td&gt;8.0&lt;/td&gt;
      &lt;td&gt;10.2&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q98.sql&quot;&gt;q98&lt;/a&gt;&lt;/td&gt;
      &lt;td&gt;5.8&lt;/td&gt;
      &lt;td&gt;26.5&lt;/td&gt;
      &lt;td&gt;57.1&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/dynamic-filtering/benchmark.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;For example, running the &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q71.sql&quot;&gt;TPC-DS q71 query&lt;/a&gt; 
results in ~9x performance improvement:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Dynamic filtering&lt;/th&gt;
      &lt;th&gt;Enabled&lt;/th&gt;
      &lt;th&gt;Disabled&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Elapsed (sec)&lt;/td&gt;
      &lt;td&gt;10&lt;/td&gt;
      &lt;td&gt;92&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;CPU (min)&lt;/td&gt;
      &lt;td&gt;14&lt;/td&gt;
      &lt;td&gt;127&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Data read (GB)&lt;/td&gt;
      &lt;td&gt;11&lt;/td&gt;
      &lt;td&gt;112&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h1 id=&quot;discussion&quot;&gt;Discussion&lt;/h1&gt;

&lt;p&gt;These queries join large “sales” fact tables with much smaller, filtered dimension tables (e.g. “items”, “customers”, “stores”), 
which makes dynamic filtering especially effective.&lt;/p&gt;

&lt;p&gt;Note that we rely on the fact that our connector allows efficient run-time filtering of the probe-side table, by using an inline index 
for every column of each split.&lt;/p&gt;

&lt;p&gt;We also rely on the CBO and its statistics estimation to correctly convert the join distribution type to a “broadcast” join. Since the current statistics 
estimation doesn’t support all query plans, this optimization cannot currently be applied for some types of 
&lt;a href=&quot;https://github.com/trinodb/trino/blob/58b86da0eda9d479d418d9752b8cdd4d2c44d9ae/presto-main/src/main/java/io/prestosql/cost/AggregationStatsRule.java&quot;&gt;aggregations&lt;/a&gt; 
(e.g. &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q19.sql&quot;&gt;TPC-DS q19 query&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;In addition, our current dynamic filtering doesn’t support multiple join operators in the same stage, so there are some TPC-DS queries 
(e.g. &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q13.sql&quot;&gt;q13&lt;/a&gt;) 
that may be optimized further.&lt;/p&gt;

&lt;h1 id=&quot;future-work&quot;&gt;Future work&lt;/h1&gt;

&lt;p&gt;The implementation above is currently in the process of being &lt;a href=&quot;https://github.com/trinodb/trino/pull/931&quot;&gt;reviewed&lt;/a&gt; and will be 
available in a release soon. In addition, we intend to improve the existing implementation to resolve the limitations described above, 
and to support more join patterns.&lt;/p&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot;&gt;
      &lt;p&gt;Initially we had experimented with adding &lt;a href=&quot;https://github.com/trinodb/trino/blob/1afbe98bb1eebfcf9050efa5c9a6bb6ccad80c8c/presto-spi/src/main/java/io/prestosql/spi/connector/ConnectorMetadata.java#L527-L533&quot;&gt;Index Join support&lt;/a&gt; to our connector, but since it requires a global index and efficient lookups for high performance, we switched to the dynamic filtering approach. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;</content>

      
        <author>
          <name>Roman Zeyde</name>
        </author>
      

      <summary>By using dynamic filtering via run-time predicate pushdown, we can significantly optimize highly-selective inner-joins.</summary>

      
      
    </entry>
  
    <entry>
      <title>Release 315</title>
      <link href="https://trino.io/blog/2019/06/15/release-315.html" rel="alternate" type="text/html" title="Release 315" />
      <published>2019-06-15T00:00:00+00:00</published>
      <updated>2019-06-15T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/06/15/release-315</id>
      <content type="html" xml:base="https://trino.io/blog/2019/06/15/release-315.html">&lt;p&gt;This version adds support for
&lt;a href=&quot;https://trino.io/docs/current/sql/select.html#limit-or-fetch-first-clauses&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH FIRST ... WITH TIES&lt;/code&gt;&lt;/a&gt;
syntax, locality awareness to the default scheduler for better workload balancing, the new
&lt;a href=&quot;https://trino.io/docs/current/functions/conversion.html#format&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;format()&lt;/code&gt;&lt;/a&gt; function,
and improved support for ORC bloom filters. Additionally, connectors can now provide
view definitions, which opens up several new use cases.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-315.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>This version adds support for FETCH FIRST ... WITH TIES syntax, locality-awareness to default scheduler for better workload balancing, the new format() function, and improved support for ORC bloom filters. Additionally, connectors can now provide view definitions, which opens up several new use cases. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Release 314</title>
      <link href="https://trino.io/blog/2019/06/08/release-314.html" rel="alternate" type="text/html" title="Release 314" />
      <published>2019-06-08T00:00:00+00:00</published>
      <updated>2019-06-08T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/06/08/release-314</id>
      <content type="html" xml:base="https://trino.io/blog/2019/06/08/release-314.html">&lt;p&gt;This version adds support for reading ZSTD and LZ4-compressed Parquet data
and writing ZSTD-compressed ORC data, improves compatibility with the Hive
2.3+ metastore, supports mixed-case field names in Elasticsearch, adds JSON
output format for the CLI, and improves the rendering of the plan structure
in &lt;a href=&quot;https://trino.io/docs/current/sql/explain.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXPLAIN&lt;/code&gt;&lt;/a&gt; output.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-314.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>This version adds support for reading ZSTD and LZ4-compressed Parquet data and writing ZSTD-compressed ORC data, improves compatibility with the Hive 2.3+ metastore, supports mixed-case field names in Elasticsearch, adds JSON output format for the CLI, and improves the rendering of the plan structure in EXPLAIN output. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Apache Phoenix Connector</title>
      <link href="https://trino.io/blog/2019/06/04/phoenix-connector.html" rel="alternate" type="text/html" title="Apache Phoenix Connector" />
      <published>2019-06-04T00:00:00+00:00</published>
      <updated>2019-06-04T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/06/04/phoenix-connector</id>
      <content type="html" xml:base="https://trino.io/blog/2019/06/04/phoenix-connector.html">&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-312.html&quot;&gt;Presto 312&lt;/a&gt;
introduces a new &lt;a href=&quot;https://trino.io/docs/current/connector/phoenix.html&quot;&gt;Apache Phoenix Connector&lt;/a&gt;, 
which allows Presto to query data stored in &lt;a href=&quot;https://hbase.apache.org/&quot;&gt;HBase&lt;/a&gt;
using &lt;a href=&quot;https://phoenix.apache.org/&quot;&gt;Apache Phoenix&lt;/a&gt;.  This unlocks new capabilities that previously
weren’t possible with Phoenix alone, such as federation (querying of multiple Phoenix clusters) and
joining Phoenix data with data from other Presto data sources.&lt;/p&gt;

&lt;h1 id=&quot;setup&quot;&gt;Setup&lt;/h1&gt;
&lt;p&gt;To get started, simply drop in a new catalog properties file, such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;etc/catalog/phoenix.properties&lt;/code&gt;,
which defines the following:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;connector.name=phoenix
phoenix.connection-url=jdbc:phoenix:host1,host2,host3:2181:/hbase
phoenix.config.resources=/path/to/hbase-site.xml
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;phoenix.connection-url&lt;/code&gt; is the standard Phoenix connection string, which contains the zookeeper
quorum host information and root zookeeper node.&lt;/p&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;phoenix.config.resources&lt;/code&gt; is a comma-separated list of configuration files, used to specify any
&lt;a href=&quot;https://phoenix.apache.org/tuning.html&quot;&gt;custom connection properties&lt;/a&gt;.&lt;/p&gt;

&lt;h1 id=&quot;schema&quot;&gt;Schema&lt;/h1&gt;
&lt;p&gt;For the most part, data types in Phoenix match up with those in Presto, with a few
&lt;a href=&quot;https://trino.io/docs/current/connector/phoenix.html#data-types&quot;&gt;minor exceptions&lt;/a&gt;.  One thing
to note, however, is that tables in Phoenix require a primary key, whereas Presto has no concept of
primary keys.  To handle this, the Phoenix connector uses a table property to specify the primary key. 
For example, consider the following statement in Phoenix:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;example&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;pk_part_1&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;pk_part_2&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;bigint&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;CONSTRAINT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pk&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;PRIMARY&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;KEY&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pk_part_1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pk_part_2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The equivalent statement in Presto would look something like:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;phoenix&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;default&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;example&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;pk_part_1&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;pk_part_2&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;bigint&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;rowkeys&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;pk_part_1,pk_part_2&apos;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Additional Phoenix and HBase table properties can be specified in a 
&lt;a href=&quot;https://trino.io/docs/current/connector/phoenix.html#table-properties-phoenix&quot;&gt;similar way&lt;/a&gt;. 
Note also that the default (empty) schema in Phoenix will always map to a Presto schema named “default”.&lt;/p&gt;

&lt;h1 id=&quot;beyond-mapreduce&quot;&gt;Beyond MapReduce&lt;/h1&gt;
&lt;p&gt;When Phoenix users want to run long-running queries that scan over all or most of the data in a table,
they have typically used the Phoenix &lt;a href=&quot;https://phoenix.apache.org/phoenix_mr.html&quot;&gt;MapReduce integration&lt;/a&gt;. 
However, this has limitations, as the document states:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Note: The SELECT query must not perform any aggregation or use DISTINCT as these are not supported by our map-reduce integration.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is because the framework only constructs simple Mappers which scan over each region.  To
do more complex operations like aggregations, the framework would need Reducers as well.
Someone could implement that, but then they would essentially be on the path towards rewriting
Hive from scratch.&lt;/p&gt;

&lt;p&gt;Presto now provides the ability to do these more complex operations.  The Phoenix connector
performs the same filtered scans as the MapReduce framework, but now the Presto engine does
the aggregations, joins, etc.&lt;/p&gt;

&lt;h1 id=&quot;federation&quot;&gt;Federation&lt;/h1&gt;
&lt;p&gt;With the Phoenix connector, querying multiple Phoenix clusters is as easy as querying the
respective catalogs.  As a simple example, suppose we have one cluster in region &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;us-west&lt;/code&gt; and
another cluster in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;us-east&lt;/code&gt;.  If we create two catalog files, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;phoenix_west.properties&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;phoenix_east.properties&lt;/code&gt;, then we can query both:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;us-west&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;region&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;phoenix_west&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;default&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;example&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;UNION&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;us-east&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;region&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;phoenix_east&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;default&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;example&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h1 id=&quot;joining-with-other-data-sources&quot;&gt;Joining with other data sources&lt;/h1&gt;
&lt;p&gt;Another nice feature of Presto is the ability to join data in Phoenix with other data sources.
Suppose we have the following tables:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;customer (
  custkey bigint,
  comment varchar,
  ...
)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;orders (
  orderkey bigint,
  custkey bigint,
  totalprice double,
  ...
)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Suppose further that:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Either table can hold large amounts of data&lt;/li&gt;
  &lt;li&gt;The customer &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;comment&lt;/code&gt; field can change frequently&lt;/li&gt;
  &lt;li&gt;We want to be able to query for orders with a certain &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;totalprice&lt;/code&gt; range, and join with the
customer table to get the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;comment&lt;/code&gt; for these orders&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Phoenix/HBase is a row-oriented storage solution with very fast lookup by primary key.  On the
other hand, ORC is a column-oriented file format that can filter results by column value very
efficiently.  So in this use case, it might make sense to store the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;customer&lt;/code&gt; table in Phoenix
with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;custkey&lt;/code&gt; as the primary key, and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders&lt;/code&gt; table in ORC, perhaps in an object store like
S3.  We can then use Presto to leverage the strengths of each of our data stores and combine OLTP
with OLAP:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;comment&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;totalprice&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;phoenix&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tpch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customer&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;INNER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;totalprice&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tpch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;totalprice&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h1 id=&quot;insertingupdating-data&quot;&gt;Inserting/Updating data&lt;/h1&gt;
&lt;p&gt;In the prior example, since our &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;customer&lt;/code&gt; data is coming from Phoenix, our OLTP store, we can
easily insert new data:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;phoenix&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tpch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customer&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;101&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;some comment&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Since Presto’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; translates to Phoenix’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPSERT&lt;/code&gt;, inserting is the same as updating - i.e.
if there’s already a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;custkey&lt;/code&gt; of 101, then the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;comment&lt;/code&gt; will get updated instead.&lt;/p&gt;
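
&lt;p&gt;The upsert semantics can be illustrated with a map keyed by the primary key (a Python sketch; the table and column names mirror the example above):&lt;/p&gt;

```python
# The customer "table" modeled as a dict keyed by the primary key (custkey).
customer = {}

def upsert(custkey, comment):
    """INSERT and UPDATE are the same operation: the last write per key wins."""
    customer[custkey] = comment

upsert(101, 'some comment')
upsert(101, 'a newer comment')  # same custkey: updates the existing row
```

&lt;p&gt;After both writes, the table still holds a single row for &lt;code&gt;custkey&lt;/code&gt; 101, now carrying the newer comment.&lt;/p&gt;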

&lt;h1 id=&quot;future-work&quot;&gt;Future work&lt;/h1&gt;
&lt;p&gt;With upcoming improvements to Presto, there will be opportunities to further optimize the performance
of the Phoenix connector.&lt;/p&gt;

&lt;p&gt;One of the biggest ways Phoenix optimizes performance is through the use of 
&lt;a href=&quot;https://www.3pillarglobal.com/insights/hbase-coprocessors&quot;&gt;HBase coprocessors&lt;/a&gt;, which allow custom
code to be run on each regionserver.  For example, to do aggregations, Phoenix runs a partial
aggregation in the coprocessor of each table region, and the result for each region is then passed
back to the client for a final aggregation.  That way, the table data itself doesn’t need to be
sent from each region to the client - just the partial aggregation result.  However, currently only
filters are pushed down to the Phoenix connector.  With the ongoing work in Presto to support more
&lt;a href=&quot;https://github.com/trinodb/trino/issues/18&quot;&gt;complex pushdown&lt;/a&gt; to connectors, we will be able to
push down operations like aggregations to the Phoenix connector, which in turn can push them further
down to the HBase coprocessors.&lt;/p&gt;
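
&lt;p&gt;The two-phase aggregation can be sketched as follows (illustrative Python, not Phoenix’s actual coprocessor API): each region computes a partial result over its own rows, and only those small partial results travel to the client for the final merge:&lt;/p&gt;

```python
def partial_sum(region_rows):
    """Partial step: runs inside each region's coprocessor over its own rows."""
    return sum(region_rows)

def final_sum(partials):
    """Final step: the client merges one small partial result per region."""
    return sum(partials)

# Only two numbers cross the wire instead of all six rows:
regions = [[1, 2, 3], [4, 5, 6]]
partials = [partial_sum(rows) for rows in regions]
total = final_sum(partials)
```

&lt;p&gt;The same split applies to any aggregate with a mergeable partial state (count, min/max, sum), which is why pushing aggregations down to the coprocessors saves so much data movement.&lt;/p&gt;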

&lt;p&gt;Another area of potential improvement is integration with Presto’s 
&lt;a href=&quot;https://www.starburstdata.com/technical-blog/introduction-to-presto-cost-based-optimizer/&quot;&gt;cost-based optimizer&lt;/a&gt;,
which can analyze table statistics to do things like join reordering. Phoenix already supports
&lt;a href=&quot;https://phoenix.apache.org/update_statistics.html&quot;&gt;statistics collection&lt;/a&gt;, with more improvements
underway, so this is just a matter of integrating with the Presto statistics framework.&lt;/p&gt;

&lt;h1 id=&quot;questions&quot;&gt;Questions?&lt;/h1&gt;
&lt;p&gt;If you have any questions about the connector, or Phoenix in general, feel free to ask on the
Phoenix dev mailing list: &lt;a href=&quot;mailto:dev@phoenix.apache.org&quot;&gt;dev@phoenix.apache.org&lt;/a&gt;.&lt;/p&gt;</content>

      
        <author>
          <name>Vincent Poon</name>
        </author>
      

      <summary>Presto 312 introduces a new Apache Phoenix Connector, which allows Presto to query data stored in HBase using Apache Phoenix. This unlocks new capabilities that previously weren’t possible with Phoenix alone, such as federation (querying of multiple Phoenix clusters) and joining Phoenix data with data from other Presto data sources.</summary>

      
      
    </entry>
  
    <entry>
      <title>Removing redundant ORDER BY</title>
      <link href="https://trino.io/blog/2019/06/03/redundant-order-by.html" rel="alternate" type="text/html" title="Removing redundant ORDER BY" />
      <published>2019-06-03T00:00:00+00:00</published>
      <updated>2019-06-03T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/06/03/redundant-order-by</id>
      <content type="html" xml:base="https://trino.io/blog/2019/06/03/redundant-order-by.html">&lt;p&gt;Optimizers are all about doing work in the most cost-effective manner and avoiding unnecessary work.
Some SQL constructs such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; do not affect query results in many situations, and can negatively
affect performance unless the optimizer is &lt;em&gt;smart enough&lt;/em&gt; to remove them.&lt;/p&gt;

&lt;p&gt;Until very recently, Presto would insert a sorting step for each &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; clause in a query. This, combined
with users and tools inadvertently using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; in places that have no effect, could result in severe
performance degradation and waste of resources. We finally fixed this in
&lt;a href=&quot;https://trino.io/docs/current/release/release-312.html&quot;&gt;Presto 312&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;Quoting from the SQL specification (ISO 9075 Part 2):&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;A &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;query expression&amp;gt;&lt;/code&gt; can contain an optional &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;order by clause&amp;gt;&lt;/code&gt;. The ordering of the rows of the table
 specified by the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;query expression&amp;gt;&lt;/code&gt; is guaranteed only for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;query expression&amp;gt;&lt;/code&gt; that immediately 
 contains the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;order by clause&amp;gt;&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This means that a query engine is free to ignore any &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; clause that doesn’t fit that context. Let’s consider
some examples where the clause is irrelevant.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;some_table&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;another_table&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;field&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;While this query has the semblance of creating a sorted table, that’s not so. Tables in SQL are inherently
unordered. Once the data is written, there’s no guarantee it will come out sorted when read. This is 
particularly true for a parallel, distributed query engine like Presto that reads and processes data using
many threads simultaneously. Note that some storage engines may store data sorted, but that is not controlled
during data insertion. Executing the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; just causes the query to perform poorly due to reduced 
parallelism in the merging step of a distributed sort, and consumes more CPU and memory to sort the data.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;some_table&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;another_table&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;field&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;u&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;some_table&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;key&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;u&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;key&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In this case, whether the tables involved in the join are sorted doesn’t matter, since Presto is going to 
build a hash lookup table out of one of them to execute the join operation. As in the previous example,
preserving the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; just causes the query to perform poorly.&lt;/p&gt;

&lt;p&gt;When &lt;em&gt;does&lt;/em&gt; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; matter? Since it is “guaranteed only for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;query expression&amp;gt;&lt;/code&gt; that immediately 
contains the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;order by clause&amp;gt;&lt;/code&gt;”, only operations that are part of the same &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;query expression&amp;gt;&lt;/code&gt; are 
sensitive to it.&lt;/p&gt;

&lt;p&gt;A query expression is a block with the following structure:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;lt;query expression&amp;gt; ::=
  [ &amp;lt;with clause&amp;gt; ] 
  &amp;lt;query expression body&amp;gt;
  [ &amp;lt;order by clause&amp;gt; ] 
  [ &amp;lt;result offset clause&amp;gt; ] 
  [ &amp;lt;fetch first clause&amp;gt; ]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;where &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;query expression body&amp;gt;&lt;/code&gt; devolves into one of the set operations (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNION&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INTERSECT&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXCEPT&lt;/code&gt;), 
a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; construct, or a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VALUES&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TABLE&lt;/code&gt; clause.&lt;/p&gt;

&lt;p&gt;The only operations that occur after an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; are &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH FIRST&lt;/code&gt; (a.k.a., &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt;) and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OFFSET&lt;/code&gt;. So, 
unless a subquery contains one of these two clauses, the query engine is free to remove the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; 
clause without breaking the semantics dictated by the specification.&lt;/p&gt;

&lt;p&gt;Here’s an example where the clause is meaningful:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;some_table&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;field&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; 
    &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;another_table&lt;/span&gt; 
    &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; 
    &lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Other databases tackle this in a variety of ways. &lt;a href=&quot;https://mariadb.com/kb/en/library/why-is-order-by-in-a-from-subquery-ignored/&quot;&gt;MariaDB&lt;/a&gt;
and &lt;a href=&quot;https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.remove.orderby.in.subquery&quot;&gt;Hive 3.0&lt;/a&gt;
will ignore redundant &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; clauses. SQL Server, on the other hand, will produce an error:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table
expressions, unless TOP or FOR XML is also specified.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2 id=&quot;whats-the-catch&quot;&gt;What’s the catch?&lt;/h2&gt;

&lt;p&gt;It is a common mistake for users to think the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; clause has a meaning in the language regardless of where it 
appears in a query. The fact that, for implementation reasons, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; is significant for Presto in some cases 
complicates matters. We often see users rely on this when formulating queries where aggregation or window functions 
are sensitive to the order of their inputs:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;array_agg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nation&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;row_number&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;OVER&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nation&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The Right Way™ of doing this in SQL is to use the aggregation or window-specific &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; clause. For the
examples above:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;array_agg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nation&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;row_number&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;OVER&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nation&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In order to ease the transition, the new behavior can be turned off globally via the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;optimizer.skip-redundant-sort&lt;/code&gt;
configuration option or on a per-session basis via the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;skip_redundant_sort&lt;/code&gt; session property. 
These options will be removed in a future version.&lt;/p&gt;
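&lt;p&gt;For example, the old behavior can be restored for a single session as follows (shown only to illustrate the
session property syntax; the property names are the ones listed above):&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SET SESSION skip_redundant_sort = false;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Setting &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;optimizer.skip-redundant-sort=false&lt;/code&gt; in the configuration properties has the same effect for all queries.&lt;/p&gt;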

&lt;p&gt;Additionally, any time Presto detects a redundant &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; clause, it will warn users about it:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/redundant-order-by/redundant-order-by.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Martin Traverso</name>
        </author>
      

      <summary>Optimizers are all about doing work in the most cost-effective manner and avoiding unnecessary work. Some SQL constructs such as ORDER BY do not affect query results in many situations, and can negatively affect performance unless the optimizer is smart enough to remove them.</summary>

      
      
    </entry>
  
    <entry>
      <title>Release 313</title>
      <link href="https://trino.io/blog/2019/06/01/release-313.html" rel="alternate" type="text/html" title="Release 313" />
      <published>2019-06-01T00:00:00+00:00</published>
      <updated>2019-06-01T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/06/01/release-313</id>
      <content type="html" xml:base="https://trino.io/blog/2019/06/01/release-313.html">&lt;p&gt;This version fixes incorrect results for queries involving &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUPING SETS&lt;/code&gt;
and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt;, fixes selecting the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UUID&lt;/code&gt; type from the CLI and JDBC driver,
and adds support for compression and encryption when using
&lt;a href=&quot;https://trino.io/docs/current/admin/spill.html&quot;&gt;Spill to Disk&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-313.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>This version fixes incorrect results for queries involving GROUPING SETS and LIMIT, fixes selecting the UUID type from the CLI and JDBC driver, and adds support for compression and encryption when using Spill to Disk. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Using Precomputed Hash in SemiJoin Operations</title>
      <link href="https://trino.io/blog/2019/05/30/semijoin-precomputed-hasd.html" rel="alternate" type="text/html" title="Using Precomputed Hash in SemiJoin Operations" />
      <published>2019-05-30T00:00:00+00:00</published>
      <updated>2019-05-30T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/05/30/semijoin-precomputed-hasd</id>
      <content type="html" xml:base="https://trino.io/blog/2019/05/30/semijoin-precomputed-hasd.html">&lt;p&gt;Queries involving &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IN&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NOT IN&lt;/code&gt; over a subquery are much faster in 
&lt;a href=&quot;https://trino.io/docs/current/release/release-312.html&quot;&gt;Presto 312&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/semijoin-precomputed-hash/semijoin-precomputed-hash-gains.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;We ran the benchmark above with 3 workers (r3.2xlarge) and 1 coordinator (r3.xlarge) on 
TPC-DS scale 1000 stored in ORC format using the following queries:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;store_sales&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;store_sales&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ss_customer_sk&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;IN&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c_customer_sk&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;customer&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;store_sales&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;store_sales&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ss_store_sk&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;IN&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s_store_sk&lt;/span&gt; 
    &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;store&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s_hours&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;8AM-4PM&apos;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h1 id=&quot;what-was-the-improvement&quot;&gt;What was the improvement?&lt;/h1&gt;

&lt;p&gt;We found that the optimization to use precomputed hashes, which is enabled by 
default, was missing in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SemiJoin&lt;/code&gt; operator. Hash values were precomputed at the leaf 
stages, but they were not being used in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SemiJoin&lt;/code&gt; operator, leading to recalculation 
of the hash values in that operator. Since queries involving &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IN&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NOT IN&lt;/code&gt; over a 
subquery use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SemiJoin&lt;/code&gt; operator, &lt;a href=&quot;https://github.com/trinodb/trino/pull/767&quot;&gt;the fix to use the precomputed hash in the SemiJoin operator&lt;/a&gt; 
improves the performance of such queries significantly.&lt;/p&gt;

&lt;h1 id=&quot;how-does-optimize-hash-generation-optimization-work&quot;&gt;How does the &lt;em&gt;optimize-hash-generation&lt;/em&gt; optimization work?&lt;/h1&gt;

&lt;p&gt;Presto divides a query plan into parts called stages, which can run in parallel on 
multiple nodes, each node working on a different set of data. There are two types of stages:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Leaf stages: the stages at the leaves of the query plan, which read 
data from a data source, like a Hive table.&lt;/li&gt;
  &lt;li&gt;Intermediate stages: all stages other than the leaf stages, which process 
data from upstream stages.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Exchange&lt;/code&gt; operator shuffles and transfers the output from upstream stages to the 
intermediate stages. For certain operators like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JOIN&lt;/code&gt;, output data of 
the leaf stage is partitioned by the values of a column, and the shuffle operation ensures 
that a particular partition is always processed by the same task of the intermediate stage. 
This partitioning requires calculating a hash on that column’s values during the exchange, 
and later, in the intermediate stage, the same hash is needed during the execution of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt; 
or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JOIN&lt;/code&gt; operation. To prevent redundant calculations, Presto calculates this hash value 
in the leaf stage, uses it in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Exchange&lt;/code&gt; operator, and makes it available in the output so that 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JOIN&lt;/code&gt; operations can use it in the intermediate stage.&lt;/p&gt;

&lt;p&gt;Consider this query to count the number of stores per city:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;city&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stores&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;GROUP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;city&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The query plan (simplified) and its division into stages look like this:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/semijoin-precomputed-hash/query-plan.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The leaf stage (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Stage2&lt;/code&gt;) reads the table from a data source, feeds the partially 
aggregated data to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Stage1&lt;/code&gt; where final aggregation happens, and finally, the result is available 
via &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Stage0&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Each row produced by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Stage2&lt;/code&gt; needs to be partitioned by the value of its &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;city&lt;/code&gt; column to ensure 
data for the same city is processed by the same task of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Stage1&lt;/code&gt;. After the exchange, when a row is consumed 
in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Stage1&lt;/code&gt;, it needs to be hashed again to find the group for the row, so that the final aggregation 
accumulates results for each city in its corresponding group bucket. Calculating the hash twice on 
the values of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;city&lt;/code&gt; column is prevented by doing the calculation once while reading the data and then 
using it in both the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Exchange&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Final Aggregation&lt;/code&gt; operations, which reduces the CPU usage of the query. 
Additionally, pushing this calculation into the leaf stage, which is better parallelized when there is 
a large number of splits for that stage, improves query latency.&lt;/p&gt;
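&lt;p&gt;One way to observe the optimization is to compare the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXPLAIN&lt;/code&gt; output for the query above with the
optimization on and off. The session property name below is assumed to follow Presto’s usual convention of
deriving session property names from the config property name:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SET SESSION optimize_hash_generation = true;

EXPLAIN
SELECT count(*), city
FROM stores
GROUP BY city;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;With the optimization enabled, the plan shows a precomputed hash column produced in the table scan stage and
carried through the exchange into the aggregation, instead of being recomputed in each stage.&lt;/p&gt;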

&lt;h1 id=&quot;how-to-get-this-fix&quot;&gt;How to get this fix?&lt;/h1&gt;

&lt;p&gt;This fix is available in Presto version 312 and above. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;optimize-hash-generation&lt;/code&gt; setting is enabled 
by default, so the fix takes effect as soon as you upgrade your Presto installation.&lt;/p&gt;

      
        <author>
          <name>Shubham Tagra, Qubole</name>
        </author>
      

      <summary>Queries involving IN and NOT IN over a subquery are much faster in Presto 312.</summary>

      
      
    </entry>
  
    <entry>
      <title>Improved Hive Bucketing</title>
      <link href="https://trino.io/blog/2019/05/29/improved-hive-bucketing.html" rel="alternate" type="text/html" title="Improved Hive Bucketing" />
      <published>2019-05-29T00:00:00+00:00</published>
      <updated>2019-05-29T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/05/29/improved-hive-bucketing</id>
      <content type="html" xml:base="https://trino.io/blog/2019/05/29/improved-hive-bucketing.html">&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-312.html&quot;&gt;Presto 312&lt;/a&gt;
adds support for the more flexible bucketing introduced in recent
versions of Hive. Specifically, it allows any number of files per bucket,
including zero. This allows inserting data into an existing partition without
having to rewrite the entire partition, and improves the performance of
writes by not requiring the creation of files for empty buckets.&lt;/p&gt;

&lt;h1 id=&quot;hive-bucketing-overview&quot;&gt;Hive bucketing overview&lt;/h1&gt;

&lt;p&gt;Hive bucketing is a simple form of hash partitioning. A table is bucketed
on one or more columns with a fixed number of hash buckets. For example,
a table definition in Presto syntax looks like this:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;page_views&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;user_id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;bigint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;page_url&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;dt&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;date&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;partitioned_by&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ARRAY&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;dt&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;bucketed_by&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ARRAY&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;user_id&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;bucket_count&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The bucketing happens within each partition of the table (or across the entire
table if it is not partitioned). In the above example, the table is partitioned
by date and is declared to have &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;50&lt;/code&gt; buckets using the user ID column. This
means that the table will have &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;50&lt;/code&gt; buckets for each date. The assigned bucket
for each row is determined by hashing the user ID value, so all rows with the
same user ID will go into the same bucket.&lt;/p&gt;
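&lt;p&gt;Conceptually, the bucket assignment is a simple hash partitioning function (an illustration only; the actual
hash function used by Hive depends on the column types and the Hive version):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;bucket = hash(user_id) mod bucket_count
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;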

&lt;h1 id=&quot;original-hive-bucketing&quot;&gt;Original Hive bucketing&lt;/h1&gt;

&lt;p&gt;Originally, Hive required exactly one file per bucket. The files were named
such that the bucket number was implicit based on the file’s position within
the lexicographic ordering of the file names. For example, each of the following
lists of files represents buckets &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;2&lt;/code&gt;, respectively:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;00000_0
00001_0
00002_0
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;file0
file3
file5
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;bucketA
bucketB
bucketD
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The file names are meaningless aside from their ordering with respect to the
other file names.&lt;/p&gt;

&lt;h1 id=&quot;whats-the-problem&quot;&gt;What’s the problem?&lt;/h1&gt;

&lt;p&gt;The original Hive bucketing scheme has a couple of problems:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Inserting data into the table by adding additional files is not possible.
Instead, an insert operation requires rewriting all of the existing files,
which can be quite expensive.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;If the data is sparse, some of the buckets might be empty, but because there
must be a file for every bucket, the writer must create an empty file for
each bucket. Some file formats, such as ORC, support zero-byte files as empty
files. Other formats require writing a file with a valid header and footer.
Creating these files adds latency to the write operation, and storing these
tiny files is inefficient for file systems like HDFS, which are designed for
large files.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;improved-hive-bucketing&quot;&gt;Improved Hive bucketing&lt;/h1&gt;

&lt;p&gt;Newer versions of Hive support a bucketing scheme where the bucket number is
included in the file name. This is the same naming scheme that Hive has always
used, so it is backward compatible with existing data. The naming convention
places the bucket number at the start of the file name, padded with leading
zeros.&lt;/p&gt;

&lt;p&gt;The following list of files shows what data written by Hive might look like for
a table with a bucket count of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;4&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;000000_0            # bucket 0
000000_0_copy_1     # bucket 0
000000_0_copy_2     # bucket 0
000001_0            # bucket 1
000001_0_copy_1     # bucket 1
000003_0            # bucket 3
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We can see that there are multiple files for buckets &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1&lt;/code&gt;, one file for
bucket &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;3&lt;/code&gt;, and no files for bucket &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;2&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Unfortunately, Presto used a different naming convention that was valid
according to the lexicographical ordering requirement, but not the newer
explicit numbering convention. File names written by Presto used to look
like this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;20180102_030405_00641_x1y2z_bucket-00234
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;20180102_030405_00641_x1y2z&lt;/code&gt; value at the start of the file name
is the Presto query ID for the query that wrote the data. This is followed
by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bucket-&lt;/code&gt; plus the padded bucket number. Presto now writes file names
that match the new Hive naming convention, with the bucket number at the
start and the query ID at the end:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;000234_0_20180102_030405_00641_x1y2z
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;When reading bucketed tables, Presto supports both the new Hive convention
and the old Presto convention. Additionally, it still supports the original
Hive scheme when the files do not match either of the naming conventions,
keeping the requirement that there must be exactly one file per bucket.&lt;/p&gt;
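Supporting all three schemes means inferring the bucket number from the shape of the file name. A rough sketch of that inference in Python (the regular expressions and fallback here are illustrative; Presto's actual matching logic differs in detail):

```python
import re

# Illustrative patterns only, not the exact ones Presto uses.
OLD_PRESTO = re.compile(r"^.*_bucket-(\d+)$")  # query ID first, bucket number last
NEW_HIVE = re.compile(r"^(\d+)_\d+")           # zero-padded bucket number first

def bucket_of(file_name):
    # Try the more specific old Presto convention first, then the Hive one.
    for pattern in (OLD_PRESTO, NEW_HIVE):
        match = pattern.match(file_name)
        if match:
            return int(match.group(1))
    # Neither convention matched: fall back to the original position-based
    # scheme, which requires exactly one file per bucket.
    return None

print(bucket_of("20180102_030405_00641_x1y2z_bucket-00234"))  # old Presto name
print(bucket_of("000234_0_20180102_030405_00641_x1y2z"))      # new Hive name
```

Both example names resolve to bucket 234, even though the bucket number sits at opposite ends of the file name in the two conventions.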

&lt;h1 id=&quot;skipping-empty-buckets-for-faster-writes&quot;&gt;Skipping empty buckets for faster writes&lt;/h1&gt;

&lt;p&gt;Now that Hive and Presto no longer require files for empty buckets, Presto
does not need to create them. They are still created by default for
compatibility with earlier versions of Hive, Presto, and other tools, but
we expect to disable this behavior in a future release, making writes faster
by default. You can also disable it now if that works for your environment.
This is controlled by the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive.create-empty-bucket-files&lt;/code&gt; configuration
property or the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;create_empty_bucket_files&lt;/code&gt; session property.&lt;/p&gt;</content>

      
        <author>
          <name>David Phillips</name>
        </author>
      

      <summary>Presto 312 adds support for the more flexible bucketing introduced in recent versions of Hive. Specifically, it allows any number of files per bucket, including zero. This allows inserting data into an existing partition without having to rewrite the entire partition, and improves the performance of writes by not requiring the creation of files for empty buckets.</summary>

      
      
    </entry>
  
    <entry>
      <title>Release 312</title>
      <link href="https://trino.io/blog/2019/05/29/release-312.html" rel="alternate" type="text/html" title="Release 312" />
      <published>2019-05-29T00:00:00+00:00</published>
      <updated>2019-05-29T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/05/29/release-312</id>
      <content type="html" xml:base="https://trino.io/blog/2019/05/29/release-312.html">&lt;p&gt;This version has many performance improvements (including
&lt;a href=&quot;/blog/2019/05/21/optimizing-the-casts-away.html&quot;&gt;cast optimization&lt;/a&gt;),
a new &lt;a href=&quot;https://trino.io/docs/current/language/types.html#uuid-type&quot;&gt;UUID&lt;/a&gt; data type
and &lt;a href=&quot;https://trino.io/docs/current/functions/uuid.html#uuid&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;uuid()&lt;/code&gt;&lt;/a&gt; function,
a new &lt;a href=&quot;https://trino.io/docs/current/connector/phoenix.html&quot;&gt;Apache Phoenix connector&lt;/a&gt;,
support for the PostgreSQL &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TIMESTAMP WITH TIME ZONE&lt;/code&gt; data type,
support for the MySQL &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JSON&lt;/code&gt; data type,
&lt;a href=&quot;/blog/2019/05/29/improved-hive-bucketing.html&quot;&gt;improved support for Hive bucketed tables&lt;/a&gt;,
and some bug fixes.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-312.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>This version has many performance improvements (including cast optimization), a new UUID data type and uuid() function, a new Apache Phoenix connector, support for the PostgreSQL TIMESTAMP WITH TIME ZONE data type, support for the MySQL JSON data type, improved support for Hive bucketed tables, and some bug fixes. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Optimizing the Casts Away</title>
      <link href="https://trino.io/blog/2019/05/21/optimizing-the-casts-away.html" rel="alternate" type="text/html" title="Optimizing the Casts Away" />
      <published>2019-05-21T00:00:00+00:00</published>
      <updated>2019-05-21T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/05/21/optimizing-the-casts-away</id>
      <content type="html" xml:base="https://trino.io/blog/2019/05/21/optimizing-the-casts-away.html">&lt;p&gt;The next release of Presto (version 312) will include a new optimization to remove unnecessary casts 
which might have been added implicitly by the query planner or explicitly by users when they wrote the query.&lt;/p&gt;

&lt;p&gt;This is a long post explaining how the optimization works. If you’re only interested in the results,
skip to the &lt;a href=&quot;#results&quot;&gt;last section&lt;/a&gt;. For the full details, read on!&lt;/p&gt;

&lt;script type=&quot;text/javascript&quot; async=&quot;&quot; src=&quot;https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js?config=TeX-AMS_CHTML&quot;&gt;
&lt;/script&gt;

&lt;div style=&quot;display:none&quot;&gt;
$$ 
\newcommand\cast[2]{
    \text{cast}_{\text{#1} \rightarrow \text{#2}}
} 
\newcommand\trueOrNull[1]{
  \text{if}(#1 \text{ is null}, \text{null}, \text{true})
} 
\newcommand\falseOrNull[1]{
  \text{if}(#1 \text{ is null}, \text{null}, \text{false})
} 
$$
&lt;/div&gt;

&lt;p&gt;Like many programming languages, SQL allows certain operations between values of different 
types if there are implicit conversions (a.k.a., implicit casts or coercions) between those types.
This improves usability, as it allows writing expressions like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1.5 &amp;gt; 2&lt;/code&gt; without worrying &lt;em&gt;too much&lt;/em&gt;
whether the types are compatible (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1.5&lt;/code&gt; is of type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal(2,1)&lt;/code&gt;, while &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;2&lt;/code&gt; is an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;integer&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;During query analysis and planning, Presto introduces explicit casts for any implicit conversion in the
original query as it translates it into the intermediate query plan representation the engine uses 
internally for optimization and execution. This eliminates a layer of complexity for the optimizer, 
which, as a result, doesn’t need to reason about types (type inference) or worry about whether expressions 
are properly typed.&lt;/p&gt;

&lt;p&gt;More importantly, it simplifies the job of defining and implementing operators (e.g., &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;gt;&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;=&lt;/code&gt;, etc). 
Without implicit conversions, there would need to exist a variant of every operator for every combination
 of compatible types. For example, it would be necessary to have an implementation of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;=&lt;/code&gt; operator for 
 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(tinyint, tinyint)&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(tinyint, smallint)&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(tinyint, integer)&lt;/code&gt;, 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(tinyint, bigint)&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(smallint, integer)&lt;/code&gt;, and so on.&lt;/p&gt;

&lt;p&gt;Given two columns, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s :: tinyint&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t :: smallint&lt;/code&gt;, and an expression such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s = t&lt;/code&gt;, the planner 
determines that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tinyint&lt;/code&gt; can be implicitly coerced to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;smallint&lt;/code&gt; and derives the following expression:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CAST(s AS smallint) = t   
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This is not without challenges. The predicate pushdown logic relies on simple equality and 
range comparisons to move predicates around, and importantly, to infer that certain predicates
in one branch of a join can be used to constrain the values on the other side of the join. An
expression like the one above is not “simple” from this perspective due to the type conversion 
involved, and it can defeat the (arguably simplistic) predicate inference algorithm.&lt;/p&gt;

&lt;p&gt;Secondly, if &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&lt;/code&gt; is a constant (or an expression that is effectively constant), the engine has to 
convert every value of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s&lt;/code&gt; it sees during query execution in order to compare it with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&lt;/code&gt;. This 
brings up the obvious question: “can’t it somehow convert &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tinyint&lt;/code&gt; and compare directly?”
It would look like:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;s = CAST(t AS tinyint)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Since &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&lt;/code&gt; is a constant, the term &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CAST(t AS tinyint)&lt;/code&gt; can be trivially pre-computed and reused 
for the entire query. It’s not that simple in the general case, though. Narrowing casts, such 
as a conversion from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;smallint&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tinyint&lt;/code&gt;, or from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;double&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;integer&lt;/code&gt;, can fail or alter
the value due to rounding or truncation, so we must take special care to avoid errors or 
change query semantics. We discuss this at length in the sections below.&lt;/p&gt;
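To see concretely why a narrowing cast can't blindly be pushed onto the constant, consider what happens to values that don't survive the round trip. A minimal sketch, using Python's truncating `int()` as a stand-in for a narrowing cast:

```python
# Python's int() truncates toward zero, standing in for a narrowing cast.
value = 1.9
narrowed = int(value)
assert narrowed == 1  # the cast silently altered the value: 1.9 became 1

# A constant outside the narrower type's range has no representation at all;
# a checked cast would have to fail rather than produce a wrong answer.
TINYINT_MIN, TINYINT_MAX = -128, 127
assert not (TINYINT_MIN <= 1000 <= TINYINT_MAX)
```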

&lt;h1 id=&quot;some-properties-of-well-behaved-implicit-casts&quot;&gt;Some properties of (well-behaved) implicit casts&lt;/h1&gt;

&lt;p&gt;Let’s take a short detour and talk briefly about some properties of well-behaved implicit 
casts we can exploit to do the transformation we described in the previous section.&lt;/p&gt;

&lt;p&gt;Since the query engine is free to insert implicit casts wherever it sees fit, these functions
need to follow some ground rules. Failure to do so can result in queries producing incorrect
results due to changes in query semantics.&lt;/p&gt;

&lt;p&gt;Implicit casts need to have the following properties:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Injective_function&quot;&gt;Injective&lt;/a&gt;. Given \(\cast{S}{T}\) every value in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;S&lt;/code&gt; 
must map to a distinct value in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;T&lt;/code&gt; (this does not imply that every value in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;T&lt;/code&gt; has to map to a value 
in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;S&lt;/code&gt;, though).&lt;/li&gt;
  &lt;li&gt;Order-preserving. Given \(s_1 \in S\) and \(s_2 \in S\),&lt;/li&gt;
&lt;/ul&gt;

\[\begin{equation}
s_1 = s_2 \quad \Rightarrow \quad \cast{S}{T}(s_1) = \cast{S}{T}(s_2) \\
s_1 &amp;lt; s_2 \quad \Rightarrow \quad \cast{S}{T}(s_1) &amp;lt; \cast{S}{T}(s_2) \\
s_1 &amp;gt; s_2 \quad \Rightarrow \quad \cast{S}{T}(s_1) &amp;gt; \cast{S}{T}(s_2)
\end{equation}\]

&lt;p&gt;For exact numeric types (e.g., &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;smallint&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;integer&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal&lt;/code&gt;, etc.), this holds as long as 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;T&lt;/code&gt; has enough integer digits to hold the integral part of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;S&lt;/code&gt; and enough fractional digits to 
hold the fractional part of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;S&lt;/code&gt;.&lt;/p&gt;
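Both properties can be checked mechanically for a given conversion. As a toy illustration, here is a check of injectivity and order preservation for a widening tinyint → decimal(4,1) conversion over the whole tinyint range, modeled with Python's `decimal` module:

```python
from decimal import Decimal

# tinyint -> decimal(4,1): the target has enough integer and fractional
# digits, so both properties should hold over the range [-128, 127].
def widen(s):
    return Decimal(s).quantize(Decimal("0.1"))

widened = [widen(s) for s in range(-128, 128)]

# Injective: distinct inputs map to distinct outputs.
assert len(set(widened)) == len(widened)

# Order-preserving: a strictly increasing input sequence stays strictly
# increasing after the conversion.
assert all(a < b for a, b in zip(widened, widened[1:]))
```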

&lt;p&gt;As an example, the picture below depicts how every value of type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tinyint&lt;/code&gt;, which has a range
of \([-128, 127]\), maps to a distinct value of a wider type such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;smallint&lt;/code&gt;. Also, every value 
of the wider type that is within the range of representable values of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tinyint&lt;/code&gt; has a distinct 
mapping to a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tinyint&lt;/code&gt;. So, for the values within the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tinyint&lt;/code&gt; range, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tinyint&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;smallint&lt;/code&gt;
conversion is &lt;a href=&quot;https://en.wikipedia.org/wiki/Bijection&quot;&gt;bijective&lt;/a&gt;. This is not necessary for the 
transformation to work, but it simplifies one of the cases we’ll consider. We’ll cover this more later.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/optimizing-casts/tinyint-integer.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;On the other hand, some conversions such as those between integer types and decimal types with fractional
parts are injective but not bijective, even when excluding the values outside the range of the narrower
 type.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/optimizing-casts/tinyint-decimal.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The properties clearly hold for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tinyint&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;smallint&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;integer&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bigint&lt;/code&gt;. They also hold for:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tinyint&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal(3,0)&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal(4,1)&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal(5,2)&lt;/code&gt; → …&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;smallint&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal(5,0)&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal(6,1)&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal(7,2)&lt;/code&gt; → …&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;integer&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal(10,0)&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal(11,1)&lt;/code&gt; → …&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bigint&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal(19,0)&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal(20, 1)&lt;/code&gt; → …&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It even works for conversions between exact and approximate numbers, such as:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;smallint&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;real&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;real&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;double&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;integer&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;double&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It does &lt;em&gt;not&lt;/em&gt; work for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bigint&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;double&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;integer&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;real&lt;/code&gt;, or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;double&lt;/code&gt; when precision is large
because not all &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bigint&lt;/code&gt;s fit in a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;double&lt;/code&gt; (64 bits vs 53-bit mantissa) and not all &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;integer&lt;/code&gt;s fit in a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;real&lt;/code&gt; 
(32 bits vs 23-bit mantissa). Sadly, for legacy reasons Presto allows those conversions implicitly. We “justify” 
it with the argument that “since they are dealing with approximate numerics anyway, and given the conversions only 
lose precision in the least significant part, they are sort of ok”. This is something we’ll revisit in the
future once we have a reasonable story for handling the backward-compatibility
break that removing such conversions would cause.&lt;/p&gt;

&lt;p&gt;Finally, the properties also apply for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;varchar&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;varchar&lt;/code&gt; conversions:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;varchar(0)&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;varchar(1)&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;varchar(2)&lt;/code&gt; → … → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;varchar&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;getting-to-the-point&quot;&gt;Getting to the point…&lt;/h1&gt;

&lt;p&gt;With this in mind, let’s look at the simplest scenario: conversions between integer types.&lt;/p&gt;

&lt;p&gt;As in the example we covered in the introduction, the transformation is straightforward 
when the constant can be represented in the narrower type. Given &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s :: tinyint&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CAST(s AS smallint) = smallint &apos;1&apos;     ⟺  s = tinyint &apos;1&apos;
CAST(s AS smallint) = smallint &apos;127&apos;   ⟺  s = tinyint &apos;127&apos;
CAST(s AS smallint) = smallint &apos;-128&apos;  ⟺  s = tinyint &apos;-128&apos;

CAST(s AS smallint) &amp;gt; smallint &apos;10&apos;    ⟺  s &amp;gt; tinyint &apos;10&apos;
CAST(s AS smallint) &amp;lt; smallint &apos;10&apos;    ⟺  s &amp;lt; tinyint &apos;10&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Of course, when the value is at the edge of the range of the narrower type, we can cleverly 
turn some inequalities into equalities:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CAST(s AS smallint) &amp;gt;= smallint &apos;127&apos;   ⟺  s &amp;gt;= tinyint &apos;127&apos;  
                                        ⟺  s =  tinyint &apos;127&apos;
                                       
CAST(s AS smallint) &amp;lt;= smallint &apos;-128&apos;  ⟺  s &amp;lt;= tinyint &apos;-128&apos;  
                                        ⟺  s =  tinyint &apos;-128&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
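The tightening follows directly from the type's bounds: no tinyint can exceed 127, so `&gt;= 127` pins the value exactly. As a sketch, using a hypothetical helper that assumes the constant is already known to be in range:

```python
TINYINT_MIN, TINYINT_MAX = -128, 127

def tighten(op, constant):
    """Rewrite `s OP constant` when the constant sits at the tinyint edge."""
    if op == ">=" and constant == TINYINT_MAX:
        return ("=", TINYINT_MAX)   # s >= 127 is only satisfiable by s = 127
    if op == "<=" and constant == TINYINT_MIN:
        return ("=", TINYINT_MIN)   # s <= -128 is only satisfiable by s = -128
    return (op, constant)           # nothing to tighten

assert tighten(">=", 127) == ("=", 127)
assert tighten("<=", -128) == ("=", -128)
```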

&lt;p&gt;Additionally, we may be able to tell that an expression is always &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;true&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;false&lt;/code&gt;. Special
care needs to be taken when the value is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;null&lt;/code&gt;, though, since in SQL any comparison with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;null&lt;/code&gt; 
yields &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;null&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CAST(s AS smallint) &amp;gt; smallint &apos;127&apos;    ⟺  s &amp;gt; tinyint &apos;127&apos;  
                                        ⟺  if(s is null, null, false)
                                        
CAST(s AS smallint) &amp;lt;= smallint &apos;127&apos;   ⟺  s &amp;lt;= tinyint &apos;127&apos;  
                                        ⟺  if(s is null, null, true)

CAST(s AS smallint) &amp;lt; smallint &apos;-128&apos;   ⟺  s &amp;lt; tinyint &apos;-128&apos;  
                                        ⟺  if(s is null, null, false)
                                        
CAST(s AS smallint) &amp;gt;= smallint &apos;-128&apos;  ⟺  s &amp;gt;= tinyint &apos;-128&apos;  
                                        ⟺  if(s is null, null, true)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We can make similar inferences when the value is outside the range of possible values
for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tinyint&lt;/code&gt;. For equality comparisons, it’s trivial.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CAST(s AS smallint) = smallint &apos;1000&apos;  ⟺  if(s is null, null, false)    
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Conversely,&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CAST(s AS smallint) &amp;lt;&amp;gt; smallint &apos;1000&apos;  ⟺  if(s is null, null, true)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Just like the earlier cases involving comparisons with values at the edge of the range,
we can apply the same idea when the value falls outside of the range:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CAST(s AS smallint) &amp;lt; smallint &apos;1000&apos;   ⟺  if(s is null, null, true) 
CAST(s AS smallint) &amp;lt; smallint &apos;-1000&apos;  ⟺  if(s is null, null, false)

CAST(s AS smallint) &amp;gt; smallint &apos;1000&apos;   ⟺  if(s is null, null, false) 
CAST(s AS smallint) &amp;gt; smallint &apos;-1000&apos;  ⟺  if(s is null, null, true)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
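These rules are mechanical enough to express as a small lookup. A hypothetical sketch of the out-of-range case for `CAST(s AS smallint) OP constant`, where `s` is a tinyint (the real optimizer rule is general over types, and the strings stand in for the null-preserving `if` expressions above):

```python
TINYINT_MIN, TINYINT_MAX = -128, 127

def simplify_out_of_range(op, constant):
    """Return 'true'/'false' (shorthand for if(s is null, null, ...)),
    or None when the constant is within the tinyint range."""
    if constant > TINYINT_MAX:
        return {"=": "false", "<>": "true",
                "<": "true", "<=": "true",
                ">": "false", ">=": "false"}[op]
    if constant < TINYINT_MIN:
        return {"=": "false", "<>": "true",
                "<": "false", "<=": "false",
                ">": "true", ">=": "true"}[op]
    return None  # in range: other rules apply

assert simplify_out_of_range("<", 1000) == "true"
assert simplify_out_of_range(">", -1000) == "true"
assert simplify_out_of_range("=", 1000) == "false"
```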

&lt;h1 id=&quot;unrepresentable-values&quot;&gt;Unrepresentable values&lt;/h1&gt;

&lt;p&gt;Values that are outside the range of the narrower type may not be the only ones without a mapping. 
For example, for a type such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal(2,1)&lt;/code&gt;, any value with a fractional part (e.g., &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1.5&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;2.3&lt;/code&gt;) cannot 
be represented as a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tinyint&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;We can tell whether a value &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&lt;/code&gt; in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;T&lt;/code&gt; is representable in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;S&lt;/code&gt; by converting it to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;S&lt;/code&gt; and back to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;T&lt;/code&gt;. We’ll 
call this value &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&apos;&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t &amp;lt;&amp;gt; t&apos;&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&lt;/code&gt; is not representable in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;S&lt;/code&gt;, and similar rules as for out-of-range values apply when the 
expression involves an equality. For example, given &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s :: tinyint&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CAST(s AS double) =  double &apos;1.1&apos;  ⟺  if(s is null, null, false)    
CAST(s AS double) &amp;lt;&amp;gt; double &apos;1.1&apos;  ⟺  if(s is null, null, true)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
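The round-trip test itself is simple. A sketch of the representability check for a double constant against tinyint, assuming truncation as the narrowing behavior (the check works the same way if the cast rounds instead):

```python
def representable_as_tinyint(t):
    """Is the double constant t exactly representable as a tinyint?"""
    if t < -128 or t > 127:
        return False          # out of range: no mapping at all
    t_prime = float(int(t))   # double -> tinyint -> double round trip
    return t_prime == t       # t <> t' means t is not representable

assert representable_as_tinyint(2.0)
assert not representable_as_tinyint(1.1)
```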

&lt;p&gt;When some values in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;T&lt;/code&gt; are not representable in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;S&lt;/code&gt;, the cast between &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;T → S&lt;/code&gt; will generally either truncate
or round. The SQL specification doesn’t mandate which of those alternatives an implementation should follow,
and even allows the behavior to vary across different combinations of types.&lt;/p&gt;

&lt;p&gt;This throws a bit of a wrench in our plans, so to speak. If we can’t tell whether a cast will round or truncate,
how would we know whether a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;gt;&lt;/code&gt; comparison should turn into a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;gt;&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;gt;=&lt;/code&gt; in the resulting expression? To 
illustrate, let’s consider this example. Given &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s :: tinyint&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CAST(s AS double) &amp;gt; double &apos;1.9&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If the conversion from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;double&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tinyint&lt;/code&gt; truncates, the expression above is equivalent to:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;s &amp;gt; tinyint &apos;1&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;On the other hand, if the conversion rounds, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1.9&lt;/code&gt; becomes &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;2&lt;/code&gt;, and the expression is equivalent to:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;s &amp;gt;= tinyint &apos;2&apos;              
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In order to know which operator to use in the transformed expression (e.g., &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;gt;&lt;/code&gt; vs &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;gt;=&lt;/code&gt;), it is therefore 
crucial to distinguish between those two behaviors. The good news is that there’s a simple and elegant way
out of this hole.&lt;/p&gt;
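&lt;p&gt;The two behaviors are easy to see side by side in Python, whose &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;int()&lt;/code&gt; truncates while &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;round()&lt;/code&gt; rounds. This is a sketch for illustration only; Trino’s actual cast behavior is defined by the engine, not by Python:&lt;/p&gt;

```python
# Python offers both conversion behaviors, making the ambiguity concrete.
# int() truncates toward zero; round() rounds (half to even).
assert int(1.9) == 1    # a truncating cast maps 1.9 to the boundary 1
assert round(1.9) == 2  # a rounding cast maps 1.9 to the boundary 2
```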

&lt;p&gt;An important observation is that we don’t need to know how the conversion behaves &lt;em&gt;in general&lt;/em&gt;, but only how 
it behaves when applied to the constant &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&lt;/code&gt;. Regardless of whether the conversion truncates or rounds, for a 
given value of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&lt;/code&gt;, the outcome either &lt;em&gt;rounds up&lt;/em&gt; or &lt;em&gt;rounds down&lt;/em&gt;, as depicted below.&lt;/p&gt;

&lt;table&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;img src=&quot;/assets/blog/optimizing-casts/round-down.svg&quot; alt=&quot;&quot; /&gt;&lt;/td&gt;
      &lt;td&gt;&lt;img src=&quot;/assets/blog/optimizing-casts/round-up.svg&quot; alt=&quot;&quot; /&gt;&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;We can easily tell which of those scenarios applies by comparing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&lt;/code&gt; with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&apos;&lt;/code&gt;: if &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t &amp;gt; t&apos;&lt;/code&gt;, the operation rounded
down. Conversely, if &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t &amp;lt; t&apos;&lt;/code&gt;, it rounded up. If &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t = t&apos;&lt;/code&gt;, the value is representable in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;S&lt;/code&gt;, and the rules from the 
previous section apply.&lt;/p&gt;
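&lt;p&gt;This round-trip check is mechanical enough to sketch in a few lines of Python. The helper names and the conversions are hypothetical stand-ins; the real logic lives in the optimizer rule:&lt;/p&gt;

```python
def rounding_direction(t, to_source, to_target):
    """Classify how a conversion treated the constant t by round-tripping it.

    to_source converts T to S, to_target converts S back to T. Both are
    hypothetical stand-ins for the engine's casts.
    """
    s_prime = to_source(t)
    t_prime = to_target(s_prime)
    if t == t_prime:
        return 'exact'        # t is representable in S
    if t > t_prime:
        return 'rounded down'
    return 'rounded up'

# A truncating double-to-integer conversion rounds 1.9 down and -1.9 up:
assert rounding_direction(1.9, int, float) == 'rounded down'
assert rounding_direction(-1.9, int, float) == 'rounded up'
```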

&lt;h1 id=&quot;oh-the-nullability&quot;&gt;Oh, the nullability&lt;/h1&gt;

&lt;p&gt;Let’s take another quick detour and talk about the issue of nullability. After all, no discussion about
SQL is complete without an exploration of the semantics of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;null&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;SQL uses &lt;a href=&quot;https://en.wikipedia.org/wiki/Three-valued_logic#Application_in_SQL&quot;&gt;three-valued logic&lt;/a&gt;. In addition
to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;true&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;false&lt;/code&gt;, logical expressions can evaluate to an &lt;em&gt;unknown&lt;/em&gt; value, which is indicated by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;null&lt;/code&gt;.
Logical operations &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AND&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OR&lt;/code&gt; behave according to the following rules:&lt;/p&gt;

\[\begin{array}{|c|c|c|c|}
\hline
\text{A} &amp;amp; \text{B} &amp;amp; \text{A and B} &amp;amp; \text{A or B} \\ 
\hline
\text{true}&amp;amp; \text{null} &amp;amp; \text{null} &amp;amp; \text{true} \\ 
\hline
\text{false}&amp;amp; \text{null} &amp;amp; \text{false} &amp;amp; \text{null} \\ 
\hline
\end{array}\]

&lt;p&gt;The logical comparison operators =, &amp;lt;&amp;gt;, &amp;gt;, ≥, &amp;lt;, ≤ evaluate to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;null&lt;/code&gt; when one or both operands are &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;null&lt;/code&gt;.
Hence, if &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&lt;/code&gt; is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;null&lt;/code&gt;, our expression &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cast(s as smallint) = t&lt;/code&gt; can be simply replaced with a constant &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;null&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;As we mentioned in the previous section, there are cases where &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cast(s as smallint) = t&lt;/code&gt; can be reduced to 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;true&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;false&lt;/code&gt;, &lt;em&gt;except&lt;/em&gt; for the fact that if &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s&lt;/code&gt; is null, the expression needs to return &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;null&lt;/code&gt; to preserve
semantics. So, we use the following forms to capture this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;if(s IS null, null, false)
if(s IS null, null, true)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The catch is that the optimizer does not understand the semantics of these &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;if&lt;/code&gt; expressions and cannot 
use them to derive additional properties; in essence, they become an optimization barrier. On the other hand,
the optimizer is quite good at manipulating logical conjunctions (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AND&lt;/code&gt;) and disjunctions (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OR&lt;/code&gt;), so let’s see 
how we can use boolean logic to obtain an equivalent formulation.&lt;/p&gt;

&lt;p&gt;We can exploit the properties of SQL boolean logic to derive expressions that behave in the same manner as the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;if()&lt;/code&gt; constructs from above:&lt;/p&gt;

\[\begin{align}
    \text{if}(s \text{ is null}, \text{null}, \text{false}) &amp;amp; \iff (s \text{ is null}) \text{ and null} \\
    \text{if}(s \text{ is null}, \text{null}, \text{true})  &amp;amp; \iff (s \text{ is not null}) \text{ or null} \\
\end{align}\]

&lt;p&gt;Let’s break it down to see why that works.&lt;/p&gt;

\[\begin{align}         
   \text{if}(s \text{ is null}, \text{null}, \text{false}) &amp;amp; = (s \text{ is null}) \text{ and null} \\ 
      &amp;amp; = \begin{cases}
             \text{true and null}  &amp;amp; = \text{null},   &amp;amp; \text{if } s \text{ is null} \\
             \text{false and null} &amp;amp; = \text{false},  &amp;amp; \text{if } s \text{ is not null} 
          \end{cases} \\[5pt]
   \text{if}(s \text{ is null}, \text{null}, \text{true})  &amp;amp; = (s \text{ is not null}) \text{ or null} \\
      &amp;amp; = \begin{cases}
              \text{false or null}  &amp;amp; = \text{null},   &amp;amp; \text{if } s \text{ is null} \\
              \text{true or null}   &amp;amp; = \text{true},   &amp;amp; \text{if } s \text{ is not null} 
           \end{cases}
\end{align}\]
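&lt;p&gt;A small Python model of SQL’s three-valued logic, with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;None&lt;/code&gt; standing in for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;null&lt;/code&gt;, confirms both equivalences for the two possible states of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s&lt;/code&gt;. A sketch for illustration, not Trino code:&lt;/p&gt;

```python
def sql_and(a, b):
    # SQL three-valued AND: false dominates, then null, then true.
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def sql_or(a, b):
    # SQL three-valued OR: true dominates, then null, then false.
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

for s_is_null in (True, False):
    # if(s IS NULL, null, false)  is equivalent to  (s IS NULL) AND null
    expected = None if s_is_null else False
    assert sql_and(s_is_null, None) == expected
    # if(s IS NULL, null, true)   is equivalent to  (s IS NOT NULL) OR null
    expected = None if s_is_null else True
    assert sql_or(not s_is_null, None) == expected
```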

&lt;h1 id=&quot;putting-it-all-together&quot;&gt;Putting it all together&lt;/h1&gt;

&lt;p&gt;Now that we’ve had a taste of how this optimization works, let’s put it all together into one rule to rule
them all.&lt;/p&gt;

&lt;p&gt;Given an expression of the following form,&lt;/p&gt;

\[\cast{S}{T}(s) \otimes t \quad s \in S, t \in T, \otimes \in [=, \ne, &amp;lt;, \le, &amp;gt;, \ge]\]

&lt;p&gt;we derive a transformation based on the rules below.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;If \(t \text{ is null} \Rightarrow \cast{S}{T}(s) \otimes t \iff \text{null} \tag{1}\) \(\\[5pt]\)&lt;/li&gt;
  &lt;li&gt;If \(\exists s&apos; \in S \ldotp s&apos; = \cast{T}{S}(t)\), we calculate \(t&apos; = \cast{S}{T}(s&apos;)\) and consider 
the following cases:
    &lt;ol&gt;
      &lt;li&gt;&lt;a name=&quot;2.1&quot;&gt;&lt;/a&gt; If \(t = t&apos; \Rightarrow \cast{S}{T}(s) \otimes t \iff s \otimes \cast{T}{S}(t) \tag{2.1}\) \(\\[5pt]\)
        &lt;ul&gt;
          &lt;li&gt;&lt;a name=&quot;2.1.1&quot;&gt;&lt;/a&gt; In the special case where \(\\[5pt]\) \(\quad  s&apos; = \text{min}_S  \Rightarrow   
 \left\{
  \begin{array}{@{}ll@{}}
 \cast{S}{T}(s) &amp;gt; t   &amp;amp; \iff s \ne \text{min}_{S}     \\
 \cast{S}{T}(s) \ge t &amp;amp; \iff \trueOrNull{s}           \\
 \cast{S}{T}(s) &amp;lt;   t &amp;amp; \iff \falseOrNull{s}          \\
 \cast{S}{T}(s) \le t &amp;amp; \iff s = \text{min}_{S}
  \end{array}\right. \tag{2.1.1}  \\[5pt]\)&lt;/li&gt;
          &lt;li&gt;&lt;a name=&quot;2.1.2&quot;&gt;&lt;/a&gt; In the special case where \(\\[5pt]\) \(\quad s&apos; = \text{max}_S  \Rightarrow 
 \left\{
  \begin{array}{@{}ll@{}}
\cast{S}{T}(s) &amp;gt; t   &amp;amp; \iff \falseOrNull{s}        \\
\cast{S}{T}(s) \ge t &amp;amp; \iff s = \text{max}_{S}     \\
\cast{S}{T}(s) &amp;lt;   t &amp;amp; \iff s \ne \text{max}_{S}   \\
\cast{S}{T}(s) \le t &amp;amp; \iff \trueOrNull{s}
  \end{array}\right. \tag{2.1.2} \\[5pt]\)&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;Otherwise, \(\\[5pt]\) \(\quad  t \ne t&apos; \Rightarrow 
 \left\{
  \begin{array}{@{}ll@{}}
   \cast{S}{T}(s) = t   &amp;amp; \iff \falseOrNull{s}        \\
   \cast{S}{T}(s) \ne t &amp;amp; \iff \trueOrNull{s}            
  \end{array}\right. \tag{2.2} \\[5pt]\)&lt;/p&gt;

        &lt;ul&gt;
          &lt;li&gt;
            &lt;p&gt;Further, if \(\\[5pt]\) \(\quad \quad  t &amp;lt; t&apos; \Rightarrow 
 \left\{
  \begin{array}{@{}ll@{}}
\cast{S}{T}(s) &amp;gt; t   &amp;amp; \iff s \ge \cast{T}{S}(t)    \\
\cast{S}{T}(s) \ge t &amp;amp; \iff s \ge \cast{T}{S}(t)    \\
\cast{S}{T}(s) &amp;lt;   t &amp;amp; \iff s &amp;lt;  \cast{T}{S}(t)     \\
\cast{S}{T}(s) \le t &amp;amp; \iff s &amp;lt;  \cast{T}{S}(t)
  \end{array}\right. \tag{2.2.1} \\[5pt]\)&lt;br /&gt;
 In the special case where \(\\[5pt]\) \(\quad \quad s&apos; = \text{max}_S  \Rightarrow  
 \left\{
  \begin{array}{@{}ll@{}}
\cast{S}{T}(s) &amp;gt; t   &amp;amp; \iff s = \text{max}_{S}    \\
\cast{S}{T}(s) \ge t &amp;amp; \iff s = \text{max}_{S}    \\
  \end{array}\right. \\[5pt] \tag{2.2.1.1}\)&lt;/p&gt;
          &lt;/li&gt;
          &lt;li&gt;
            &lt;p&gt;Otherwise, if \(\\[5pt]\) \(\quad \quad  t &amp;gt; t&apos; \Rightarrow
 \left\{
  \begin{array}{@{}ll@{}}
\cast{S}{T}(s) &amp;gt; t   &amp;amp; \iff s &amp;gt;    \cast{T}{S}(t)    \\
\cast{S}{T}(s) \ge t &amp;amp; \iff s &amp;gt;    \cast{T}{S}(t)    \\
\cast{S}{T}(s) &amp;lt;   t &amp;amp; \iff s \le  \cast{T}{S}(t)    \\
\cast{S}{T}(s) \le t &amp;amp; \iff s \le  \cast{T}{S}(t)
  \end{array}\right. \\[5pt] \tag{2.2.2}\)&lt;br /&gt;
 In the special case where \(\\[5pt]\) \(\quad \quad s&apos; = \text{min}_S  \Rightarrow  
  \left\{
  \begin{array}{@{}ll@{}}
\cast{S}{T}(s) &amp;lt;   t &amp;amp; \iff s = \text{min}_{S}    \\
\cast{S}{T}(s) \le t &amp;amp; \iff s = \text{min}_{S}
 \end{array}\right. \\[5pt] \tag{2.2.2.1}\)&lt;/p&gt;
          &lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
  &lt;li&gt;If \(\cast{T}{S}\) is undefined or \(\cast{T}{S}(t)\) fails, \(\\[5pt]\) \(t &amp;lt; \cast{S}{T}(\text{min}_S) \Rightarrow  
  \left\{
 \begin{array}{@{}ll@{}}
         \cast{S}{T}(s) =   t &amp;amp; \iff \falseOrNull{s}    \\
         \cast{S}{T}(s) \ne t &amp;amp; \iff \trueOrNull{s}     \\
         \cast{S}{T}(s) &amp;lt;   t &amp;amp; \iff \falseOrNull{s}    \\
         \cast{S}{T}(s) \le t &amp;amp; \iff \falseOrNull{s}    \\
         \cast{S}{T}(s) &amp;gt;   t &amp;amp; \iff \trueOrNull{s}     \\
         \cast{S}{T}(s) \ge t &amp;amp; \iff \trueOrNull{s}     
\end{array}\right. \\[5pt] \tag{3.1}\)
\(t = \cast{S}{T}(\text{min}_S) \Rightarrow  
  \left\{
 \begin{array}{@{}ll@{}}
         \cast{S}{T}(s) =   t &amp;amp; \iff s = \text{min}_S       \\
         \cast{S}{T}(s) \ne t &amp;amp; \iff s &amp;gt; \text{min}_S       \\
         \cast{S}{T}(s) &amp;lt;   t &amp;amp; \iff \falseOrNull{s}        \\
         \cast{S}{T}(s) \le t &amp;amp; \iff s = \text{min}_S       \\
         \cast{S}{T}(s) &amp;gt;   t &amp;amp; \iff s &amp;gt; \text{min}_S       \\
         \cast{S}{T}(s) \ge t &amp;amp; \iff \trueOrNull{s}     
\end{array}\right. \\[5pt] \tag{3.2}\)
\(t &amp;gt; \cast{S}{T}(\text{max}_S) \Rightarrow  
  \left\{
    \begin{array}{@{}ll@{}}
            \cast{S}{T}(s) =   t &amp;amp; \iff \falseOrNull{s}    \\
            \cast{S}{T}(s) \ne t &amp;amp; \iff \trueOrNull{s}     \\
            \cast{S}{T}(s) &amp;lt;   t &amp;amp; \iff \trueOrNull{s}     \\
            \cast{S}{T}(s) \le t &amp;amp; \iff \trueOrNull{s}     \\
            \cast{S}{T}(s) &amp;gt;   t &amp;amp; \iff \falseOrNull{s}    \\
            \cast{S}{T}(s) \ge t &amp;amp; \iff \falseOrNull{s}    
   \end{array}\right. \\[5pt] \tag{3.3}\)
\(t = \cast{S}{T}(\text{max}_S) \Rightarrow  
 \left\{
   \begin{array}{@{}ll@{}}
           \cast{S}{T}(s) =   t &amp;amp; \iff s = \text{max}_S   \\
           \cast{S}{T}(s) \ne t &amp;amp; \iff s &amp;lt; \text{max}_S   \\
           \cast{S}{T}(s) &amp;lt;   t &amp;amp; \iff s &amp;lt; \text{max}_S   \\
           \cast{S}{T}(s) \le t &amp;amp; \iff \trueOrNull{s}     \\
           \cast{S}{T}(s) &amp;gt;   t &amp;amp; \iff \falseOrNull{s}    \\
           \cast{S}{T}(s) \ge t &amp;amp; \iff s = \text{max}_S       
  \end{array}\right. \\[5pt] \tag{3.4}\) &lt;br /&gt;
 Otherwise, the transformation is not applicable.&lt;/li&gt;
&lt;/ol&gt;
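&lt;p&gt;To make the rule concrete, here is a heavily simplified Python sketch, specialized to a 16-bit integer source type, a floating-point target type, and just the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;=&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;gt;&lt;/code&gt; operators. The symbolic results stand for the null-preserving constants discussed earlier, the min/max special cases (2.1.1, 2.1.2, 3.2, 3.4) are omitted, and all names are hypothetical:&lt;/p&gt;

```python
MIN_S, MAX_S = -32768, 32767  # range of the 16-bit source type S

def unwrap(op, t):
    """Rewrite cast(s AS T) op t into a symbolic constant or (op, source constant)."""
    if t is None:
        return 'NULL'                                        # rule 1
    if t > float(MAX_S):                                     # above the range: rule 3.3
        return 'FALSE_OR_NULL'
    if float(MIN_S) > t:                                     # below the range: rule 3.1
        return {'=': 'FALSE_OR_NULL', '>': 'TRUE_OR_NULL'}[op]
    s_prime = int(t)           # cast T to S (truncating stand-in)
    t_prime = float(s_prime)   # cast S back to T
    if t == t_prime:                                         # rule 2.1: exact round-trip
        return (op, s_prime)
    if t > t_prime:                                          # rule 2.2.2: t rounded down
        return {'=': 'FALSE_OR_NULL', '>': ('>', s_prime)}[op]
    return {'=': 'FALSE_OR_NULL', '>': ('>=', s_prime)}[op]  # rule 2.2.1: t rounded up

assert unwrap('=', 1.0) == ('=', 1)         # representable: compare directly
assert unwrap('>', 1.9) == ('>', 1)         # rounded down: strict stays strict
assert unwrap('>', -1.9) == ('>=', -1)      # rounded up: strict becomes non-strict
assert unwrap('=', 1.9) == 'FALSE_OR_NULL'  # not representable: never equal
assert unwrap('>', 1e9) == 'FALSE_OR_NULL'  # beyond the range: never greater
```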

&lt;h1 id=&quot;omgwtfnan&quot;&gt;OMGWTFNaN&lt;/h1&gt;

&lt;p&gt;As if all of this weren’t enough, there’s an additional complication we need to handle for types such
as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;real&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;double&lt;/code&gt;. Those types are what the SQL specification calls &lt;em&gt;approximate numeric&lt;/em&gt; types.
Presto implements them as &lt;a href=&quot;https://en.wikipedia.org/wiki/IEEE_754&quot;&gt;IEEE-754&lt;/a&gt; single and double 
precision floating point numbers, respectively.&lt;/p&gt;

&lt;p&gt;In addition to finite numbers, IEEE-754 defines an additional set of values: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;∞&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NaN&lt;/code&gt; (not a number).
It is worth noting that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-∞&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;+∞&lt;/code&gt; do not behave like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;∞&lt;/code&gt; in the mathematical sense. They are actual values
in the ordered set of numbers, but they don’t represent any finite number. Therefore, the following relations hold:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-∞ &amp;lt; -1.23E30 &amp;lt; 0 &amp;lt; 3.45E25 &amp;lt; +∞
-∞ = -∞
+∞ = +∞ 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Since &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-∞&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;+∞&lt;/code&gt; can be treated as regular values, we can use them as the minimum and maximum values of the range
for these types. Any other choice would not work, since all values of a type must be contained within the range of the type
for the transformation to be valid. That is,&lt;/p&gt;

\[\forall v \in T \quad T_{\text{min}} \le v \le T_{\text{max}}\]

&lt;p&gt;Let’s look at an example to understand why this is necessary. Instead of using \([-∞, ∞]\) as the range, 
let’s say we picked the minimum and maximum representable values for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;real&lt;/code&gt; type (-3.4028235E38 and 3.4028235E38), and
consider this expression (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s :: real&lt;/code&gt;):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;cast(s AS double) &amp;gt;= double &apos;3.4028235E38&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;From the rules in the previous section, \(t = 3.4028235\text{E}38\), \(s&apos;= 3.4028235\text{E}38\) and \(t&apos; = 3.4028235\text{E}38\). Since 
\(t = t&apos;\) and \(s&apos; = \text{max}_S\), from &lt;a href=&quot;#2.1.2&quot;&gt;rule 2.1.2&lt;/a&gt;, the expression reduces to:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;s = 3.4028235E38 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This is clearly incorrect. When &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s = Infinity&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cast(s AS double)&lt;/code&gt; results in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;double &apos;Infinity&apos;&lt;/code&gt;, which is not equal
to 3.4028235E38.&lt;/p&gt;
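&lt;p&gt;A quick Python check makes the problem obvious (illustration only; Python floats are IEEE-754 doubles):&lt;/p&gt;

```python
# With a finite "max" for real, the rewrite s = 3.4028235E38 misclassifies
# infinity: casting Infinity to double still yields Infinity, which is
# strictly greater than the finite maximum, not equal to it.
inf = float('inf')
real_max = 3.4028235e38
assert (inf >= real_max) is True   # the original predicate is satisfied
assert (inf == real_max) is False  # but the rewritten one is not
```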

&lt;p&gt;On the other hand, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NaN&lt;/code&gt; doesn’t obey any of the comparison rules. It’s neither equal to nor distinct from itself, and
it’s neither larger nor smaller than any other value:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;NaN =  NaN  ⟺  false  
NaN &amp;lt;&amp;gt; NaN  ⟺  false
NaN &amp;gt; 0     ⟺  false
NaN = 0     ⟺  false
NaN &amp;lt; 0     ⟺  false
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NaN&lt;/code&gt; is not part of the ordered set of values for these types, and the requirement that every value be contained 
in the range doesn’t hold. From &lt;a href=&quot;#2.1.1&quot;&gt;rule 2.1.1&lt;/a&gt;, an expression such as:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;cast(s AS double) &amp;gt;= double &apos;-Infinity&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;reduces to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;if(s is null, null, true)&lt;/code&gt;, which is incorrect, since the expression returns &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;false&lt;/code&gt; when &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s&lt;/code&gt; is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NaN&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Is all hope lost for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;real&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;double&lt;/code&gt;? Fortunately, no. The range is only needed as an optimization. If we
forgo defining a range for types that don’t have the required properties, the special cases &lt;a href=&quot;#2.1.1&quot;&gt;2.1.1&lt;/a&gt; and 
&lt;a href=&quot;#2.1.2&quot;&gt;2.1.2&lt;/a&gt; don’t apply, and by &lt;a href=&quot;#2.1&quot;&gt;rule 2.1&lt;/a&gt;, the expression is equivalent to:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;s &amp;gt;= real &apos;-Infinity&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;which correctly returns &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;false&lt;/code&gt; when &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s&lt;/code&gt; is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NaN&lt;/code&gt;.&lt;/p&gt;
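&lt;p&gt;Python floats follow the same IEEE-754 comparison semantics, so the failure mode is easy to reproduce (a sketch for illustration only):&lt;/p&gt;

```python
# Every ordering comparison involving NaN evaluates to false, so a
# comparison against -Infinity is not vacuously true once NaN is possible.
nan, neg_inf = float('nan'), float('-inf')
assert (nan == nan) is False
assert (nan > 0.0) is False
assert (0.0 > nan) is False
assert (nan >= neg_inf) is False  # why the range shortcut must be skipped
assert (1.0 >= neg_inf) is True   # ordinary values still compare as expected
```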

&lt;h1 id=&quot;-show-me-the-money&quot;&gt;&lt;a name=&quot;results&quot;&gt;&lt;/a&gt; Show me the money!&lt;/h1&gt;

&lt;p&gt;So, does all of this even matter? Why, yes! Glad you asked.&lt;/p&gt;

&lt;p&gt;As with any performance optimization, you can improve things by working smarter (avoiding work that can be
proven unnecessary) or by working harder (doing the work you must do more efficiently). This
optimization does a little of both. Let’s consider three scenarios where it has a positive effect.&lt;/p&gt;

&lt;h4 id=&quot;dead-code&quot;&gt;Dead code&lt;/h4&gt;

&lt;p&gt;Since the optimization can sometimes prove that a comparison always produces &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;false&lt;/code&gt;, regardless of the input,
it can short-circuit entire conditions or subplans before a single row of data is read. Some query generation 
tools are not sophisticated enough to avoid emitting queries that contain such constructs. Also, everyone makes
mistakes, and it’s not hard to end up with queries that contain what’s effectively &lt;em&gt;dead code&lt;/em&gt;. The last thing you
want is to sit in front of the screen waiting for a query to complete … waiting … waiting … just for Presto
to tell you &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;¯\_(ツ)_/¯&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For example, given:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;smallint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- &amp;lt;insert lots of rows into t&amp;gt; --&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;IS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1000000&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;the query produces the following plan (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Values&lt;/code&gt; is an empty inline table):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;- Output[x]
  - Values
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 id=&quot;improved-join-performance&quot;&gt;Improved JOIN performance&lt;/h4&gt;

&lt;p&gt;What’s nice about this optimization is that it &lt;em&gt;enables&lt;/em&gt; other optimizations to work better. We mentioned earlier
that comparisons that are not simple expressions between columns, or between columns and constants, make it harder for the
predicate pushdown optimization to infer predicates that can be propagated to the other branch of a join.&lt;/p&gt;

&lt;p&gt;Given two tables:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;v&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;smallint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;v&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;bigint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And the following query:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;v&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;v&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;v&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;BIGINT&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;1&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The query plan without this optimization is:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;- Output[name]
  - InnerJoin[expr = v]
    - ScanFilterProject[t1, filter = CAST(v AS bigint) = BIGINT &apos;1&apos;]
        expr := CAST(v AS bigint)
    - TableScan[t2]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The optimization allows the predicate pushdown logic to apply the condition to the other side of the join, producing
a much better plan. If data in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t1&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t2&lt;/code&gt; is organized by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;v&lt;/code&gt; (e.g., a partition key in Hive), or if the
connector can apply the filter at the source, the query won’t even need to read certain parts of the
table. The query plan with the optimization enabled:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;- Output[name]
  - CrossJoin
    - ScanFilterProject[t1, filter = (v = SMALLINT &apos;1&apos;)]
    - ScanFilterProject[t2, filter = (v = BIGINT &apos;1&apos;)]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 id=&quot;best-bang-for-the-buck&quot;&gt;Best bang for the buck&lt;/h4&gt;

&lt;p&gt;Finally, if the condition absolutely needs to be evaluated, the transformed expression could be significantly
more efficient, especially when the cast between the two types is expensive. To illustrate, given a table
with 1 billion rows and a column &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;k :: bigint&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;count_if&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;k&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CAST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;decimal&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;19&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Without the optimization:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;- [...]
    - ScanProject
===&amp;gt;    CPU: 3.75m (66.34%), Scheduled: 5.56m (145.22%)
        expr := (CAST(&quot;k&quot; AS decimal(19,0)) &amp;gt; CAST(DECIMAL &apos;0&apos; AS decimal(19,0)))
        
        
Query 20190515_072240_00006_rgzb4, FINISHED, 4 nodes
Splits: 110 total, 110 done (100.00%)
0:22 [1000M rows, 8.4GB] [46M rows/s, 395MB/s]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;With the optimization:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;- [...]
    - ScanProject
===&amp;gt;    CPU: 29.93s (58.17%), Scheduled: 47.44s (145.07%)
        expr := (&quot;k&quot; &amp;gt; BIGINT &apos;0&apos;)
        
        
Query 20190515_071912_00005_bz6cb, FINISHED, 4 nodes
Splits: 110 total, 110 done (100.00%)
0:03 [1000M rows, 8.4GB] [335M rows/s, 2.81GB/s]        
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Thirsty for more? Here’s the &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-main/src/main/java/io/prestosql/sql/planner/iterative/rule/UnwrapCastInComparison.java&quot;&gt;code&lt;/a&gt;. 
Happy querying!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Many thanks to &lt;a href=&quot;https://github.com/kasiafi&quot;&gt;kasiafi&lt;/a&gt; for their thoughtful and thorough feedback on early
drafts of this post.&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Martin Traverso</name>
        </author>
      

      <summary>The next release of Presto (version 312) will include a new optimization to remove unnecessary casts which might have been added implicitly by the query planner or explicitly by users when they wrote the query.</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto Summit 2019 @TwitterSF</title>
      <link href="https://trino.io/blog/2019/05/17/Presto-Summit.html" rel="alternate" type="text/html" title="Presto Summit 2019 @TwitterSF" />
      <published>2019-05-17T00:00:00+00:00</published>
      <updated>2019-05-17T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/05/17/Presto-Summit</id>
      <content type="html" xml:base="https://trino.io/blog/2019/05/17/Presto-Summit.html">&lt;p&gt;Next month will mark the 2nd annual Presto Summit hosted by the
&lt;a href=&quot;https://trino.io/foundation.html&quot;&gt;Presto Software Foundation&lt;/a&gt;,
&lt;a href=&quot;https://starburstdata.com&quot;&gt;Starburst Data&lt;/a&gt;, and &lt;a href=&quot;https://twitter.com&quot;&gt;Twitter&lt;/a&gt;. Last year’s event was
a great success (see the
&lt;a href=&quot;https://www.starburstdata.com/technical-blog/presto-summit-2018-recap/&quot;&gt;Presto Summit 2018 recap&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Please join the community of Presto users and developers for an all-day event dedicated to the world’s fastest 
distributed SQL query engine. At the Summit we’ll share the latest on Presto and learn how some of the most 
innovative companies are using this technology to power their analytics platforms.&lt;/p&gt;

&lt;p&gt;The agenda will feature talks from some of the world’s largest and most innovative Presto users:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Comcast&lt;/li&gt;
  &lt;li&gt;Twitter&lt;/li&gt;
  &lt;li&gt;Nordstrom&lt;/li&gt;
  &lt;li&gt;Grubhub&lt;/li&gt;
  &lt;li&gt;Lyft&lt;/li&gt;
  &lt;li&gt;Netflix&lt;/li&gt;
  &lt;li&gt;LinkedIn&lt;/li&gt;
  &lt;li&gt;Criteo&lt;/li&gt;
  &lt;li&gt;Starburst&lt;/li&gt;
  &lt;li&gt;Presto Software Foundation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;(Details will be announced soon.)&lt;/p&gt;

&lt;p&gt;If you wish to speak at the event, the call for papers is still open:
&lt;a href=&quot;https://www.starburstdata.com/2019-presto-summit-speaker-registration/&quot;&gt;2019 Presto Summit – Speaker Registration&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Please RSVP to secure your spot (space is limited):
&lt;a href=&quot;https://prestosummit.splashthat.com/&quot;&gt;Presto Summit 2019 @TwitterSF&lt;/a&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Kamil Bajda-Pawlikowski</name>
        </author>
      

      <summary>Next month will mark the 2nd annual Presto Summit hosted by the Presto Software Foundation, Starburst Data, and Twitter. Last year’s event was a great success (see the Presto Summit 2018 recap).</summary>

      
      
    </entry>
  
    <entry>
      <title>Release 311</title>
      <link href="https://trino.io/blog/2019/05/15/release-311.html" rel="alternate" type="text/html" title="Release 311" />
      <published>2019-05-15T00:00:00+00:00</published>
      <updated>2019-05-15T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/05/15/release-311</id>
      <content type="html" xml:base="https://trino.io/blog/2019/05/15/release-311.html">&lt;p&gt;This version adds standard
&lt;a href=&quot;https://trino.io/docs/current/sql/select.html#offset-clause&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OFFSET&lt;/code&gt;&lt;/a&gt;
syntax, a new function
&lt;a href=&quot;https://trino.io/docs/current/functions/array.html#combinations&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;combinations()&lt;/code&gt;&lt;/a&gt;
for computing k-combinations of array elements,
and support for nested collections in Cassandra.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-311.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>This version adds standard OFFSET syntax, a new function combinations() for computing k-combinations of array elements, and support for nested collections in Cassandra. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto Community Meeting 2019-05-08</title>
      <link href="https://trino.io/blog/2019/05/08/Presto-Community-Meeting.html" rel="alternate" type="text/html" title="Presto Community Meeting 2019-05-08" />
      <published>2019-05-08T00:00:00+00:00</published>
      <updated>2019-05-08T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/05/08/Presto-Community-Meeting</id>
      <content type="html" xml:base="https://trino.io/blog/2019/05/08/Presto-Community-Meeting.html">&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/FL0O62iCkE8&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h3 id=&quot;agenda&quot;&gt;Agenda&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;Existing function support&lt;/li&gt;
  &lt;li&gt;Function namespaces&lt;/li&gt;
  &lt;li&gt;Connector-resolved functions&lt;/li&gt;
  &lt;li&gt;SQL-defined functions&lt;/li&gt;
  &lt;li&gt;Remote functions&lt;/li&gt;
  &lt;li&gt;Polymorphic table functions&lt;/li&gt;
&lt;/ul&gt;

&lt;!--more--&gt;</content>

      

      <summary>Agenda Existing function support Function namespaces Connector-resolved functions SQL-defined functions Remote functions Polymorphic table functions</summary>

      
      
    </entry>
  
    <entry>
      <title>Faster S3 Reads</title>
      <link href="https://trino.io/blog/2019/05/06/faster-s3-reads.html" rel="alternate" type="text/html" title="Faster S3 Reads" />
      <published>2019-05-06T00:00:00+00:00</published>
      <updated>2019-05-06T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/05/06/faster-s3-reads</id>
      <content type="html" xml:base="https://trino.io/blog/2019/05/06/faster-s3-reads.html">&lt;p&gt;Presto is known for working well with Amazon S3. We recently made an
improvement that greatly reduces network utilization and latency when
reading ORC or Parquet data.&lt;/p&gt;

&lt;h1 id=&quot;the-problem&quot;&gt;The problem&lt;/h1&gt;

&lt;p&gt;The improvement started with a question
from &lt;a href=&quot;https://github.com/bzillins&quot;&gt;Brenton Zillins&lt;/a&gt;
at &lt;a href=&quot;https://www.stackpath.com/&quot;&gt;Stackpath&lt;/a&gt;
on our &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;Slack&lt;/a&gt; workspace. He noticed
that the network traffic to Presto workers was many times larger than the
amount of input data reported by Presto for the query.&lt;/p&gt;

&lt;p&gt;After a lively discussion on the Slack channel, we found the cause. Parquet
would perform a positioned read against the S3 file system to ask for an
exact byte range (start and end). However, the file system only implemented
the streaming API, so it would tell S3 about the starting location, but
not the end location. The file system would stop reading from the stream once
it reached the requested end location, but substantial additional data could
be read from S3 due to various buffers in different parts of the system.&lt;/p&gt;

&lt;p&gt;The streaming API has an additional problem. Establishing a new connection
to S3 incurs latency, especially when using secure connections over TLS.
There is no way to abort a streaming request to S3, other than by closing
the connection, so the file system is forced to close connections after
every request, thus preventing the connection from being reused.&lt;/p&gt;

&lt;h1 id=&quot;the-fix&quot;&gt;The fix&lt;/h1&gt;

&lt;p&gt;We solved this by implementing positioned reads in the S3 file system.
Positioned reads, which are the only type used by the ORC and Parquet readers,
work by asking S3 for the exact byte range required. These reads use the minimal
amount of network traffic and allow the connection to be reused.&lt;/p&gt;
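To make the contrast concrete, here is a minimal sketch of a positioned read against an in-memory "object"; the names and shapes are illustrative assumptions, not Trino's actual S3 file system code. A positioned read maps to an S3 GET with an HTTP Range header covering exactly the requested bytes, so nothing past the end is fetched and the connection stays reusable:

```java
import java.util.Arrays;

// Illustrative sketch only: not Trino's actual S3 file system code.
public class PositionedReadSketch
{
    // A positioned read asks for an exact, inclusive byte range up front,
    // which maps to an S3 GET with a Range header like "bytes=100-149".
    static String rangeHeader(long position, int length)
    {
        return "bytes=" + position + "-" + (position + length - 1);
    }

    // Copy exactly the requested range: no trailing over-read, unlike a
    // streaming request that only tells the server where to start.
    static byte[] positionedRead(byte[] object, int position, int length)
    {
        return Arrays.copyOfRange(object, position, position + length);
    }
}
```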

&lt;p&gt;Brenton tested out the change and reported success:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;This PR brought us from &amp;gt;1 GB/s object read rate to under 10 MB/s
for the same query. Thank you.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;While this issue is obvious in retrospect, we are surprised that it took
so long to find it, given that S3 is one of the most popular storage systems.
This is a great example of how the community makes everything better.
Being observant and reporting an issue can be a huge win for everyone.&lt;/p&gt;

&lt;h1 id=&quot;how-to-get-it&quot;&gt;How to get it&lt;/h1&gt;

&lt;p&gt;This improvement is in &lt;a href=&quot;https://trino.io/download.html&quot;&gt;Presto 302+&lt;/a&gt;,
so you will need to upgrade if you are using an earlier version.&lt;/p&gt;</content>

      
        <author>
          <name>David Phillips</name>
        </author>
      

      <summary>Presto is known for working well with Amazon S3. We recently made an improvement that greatly reduces network utilization and latency when reading ORC or Parquet data.</summary>

      
      
    </entry>
  
    <entry>
      <title>A review of the first international Presto Conference, Tel Aviv, April 2019</title>
      <link href="https://trino.io/blog/2019/05/03/Presto-Conference-Israel.html" rel="alternate" type="text/html" title="A review of the first international Presto Conference, Tel Aviv, April 2019" />
      <published>2019-05-03T00:00:00+00:00</published>
      <updated>2019-05-03T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/05/03/Presto-Conference-Israel</id>
      <content type="html" xml:base="https://trino.io/blog/2019/05/03/Presto-Conference-Israel.html">&lt;p&gt;&lt;strong&gt;Community&lt;/strong&gt;, &lt;em&gt;noun&lt;/em&gt;: “A feeling of fellowship with others, as a result of sharing common attributes, interests, and goals”&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Israel-2019/audience.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The fun picture you see here was taken at the first lecture of the first international
Presto Summit in Israel last month.&lt;/p&gt;

&lt;p&gt;The atmosphere in the room during the various presentations was unique. It’s as if you
could physically feel the brainpower of 250 engineers fascinated by technology in one room.&lt;/p&gt;

&lt;p&gt;We would like to share with you a bit of the content that was discussed during
the conference. Enjoy the read and the videos!&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;presto-software-foundation-presentation&quot;&gt;Presto Software Foundation presentation&lt;/h1&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Israel-2019/intro.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The day started with &lt;a href=&quot;https://www.linkedin.com/in/dainsundstrom/&quot;&gt;Dain Sundstrom&lt;/a&gt;,
&lt;a href=&quot;https://www.linkedin.com/in/traversomartin/&quot;&gt;Martin Traverso&lt;/a&gt;, and
&lt;a href=&quot;https://www.linkedin.com/in/electrum/&quot;&gt;David Phillips&lt;/a&gt;, Presto founders
who gave us a great panoramic view on &lt;a href=&quot;https://trino.io/foundation.html&quot;&gt;Presto Software Foundation&lt;/a&gt;,
past, present, and future roadmap.&lt;/p&gt;

&lt;p&gt;In their talk, the Presto founders presented the following topics:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The creation of the Presto Software Foundation&lt;/li&gt;
  &lt;li&gt;ORC improvements&lt;/li&gt;
  &lt;li&gt;The complex pushdown algorithm in detail&lt;/li&gt;
  &lt;li&gt;The open source roadmap strategy, and more&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Israel-2019/pushdown.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;You can find the entire video of the presentation &lt;a href=&quot;https://vimeo.com/331764101&quot;&gt;here&lt;/a&gt; and the
slides &lt;a href=&quot;https://www.slideshare.net/OriReshef/presto-summit-israel-201904&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h1 id=&quot;varada-presentation&quot;&gt;Varada presentation&lt;/h1&gt;

&lt;p&gt;&lt;a href=&quot;https://www.linkedin.com/in/david-krakov/&quot;&gt;David Krakov&lt;/a&gt;, co-founder and CTO at &lt;a href=&quot;https://varada.io&quot;&gt;Varada&lt;/a&gt;,
explained how Varada leverages Presto to create an innovative technology that
allows interactive analytics on top of extracted sets from data lakes, or, in other words, Presto for apps.&lt;/p&gt;

&lt;p&gt;David presented the axes of innovation that the Varada team created to achieve indexed big
data on a distributed platform:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;SSD and NVMeF distributed calculation&lt;/li&gt;
  &lt;li&gt;All dimensions are indexed in the ingest process&lt;/li&gt;
  &lt;li&gt;Synchronization&lt;/li&gt;
  &lt;li&gt;Fully automated copy management directly connected to the raw data in the data lake.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Israel-2019/varada1.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;You can find the video of the presentation &lt;a href=&quot;https://vimeo.com/331767154&quot;&gt;here&lt;/a&gt; and the slides
&lt;a href=&quot;https://www.slideshare.net/OriReshef/presto-for-apps-deck-varada-prestoconf&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h1 id=&quot;wix-open-sourcing-quix&quot;&gt;WiX open sourcing Quix&lt;/h1&gt;

&lt;p&gt;The big announcement of the conference came from &lt;a href=&quot;https://www.linkedin.com/in/valeryfrolov/&quot;&gt;Valery Frolov&lt;/a&gt;
of &lt;a href=&quot;http://wix.com/&quot;&gt;Wix&lt;/a&gt;. As a web-scale data-driven company, with 150M users, Wix has more than 1000 users
of Presto, and over 100K daily queries.&lt;/p&gt;

&lt;p&gt;All those queries come through a unified front end for data discovery, transformation, and query: the Quix
IDE. Quix is simultaneously:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;A notebook manager for users to write and share executable notes&lt;/li&gt;
  &lt;li&gt;A dataset explorer showing catalogs and metadata&lt;/li&gt;
  &lt;li&gt;A feature-rich SQL query editor&lt;/li&gt;
  &lt;li&gt;A job scheduler for ETL jobs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Wix has open-sourced most of Quix, available under an MIT license at
&lt;a href=&quot;https://github.com/wix-incubator/quix&quot;&gt;github.com/wix-incubator/quix&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Israel-2019/wix.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;As a Presto-centric company, Wix has developed a few more exciting enhancements:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;HBase + Parquet interleaving to mix compacted historic data and the latest 14 days of data&lt;/li&gt;
  &lt;li&gt;One SQL - a query rewriter that unifies usage of Presto and BigQuery under one SQL dialect&lt;/li&gt;
  &lt;li&gt;ActiveDirectory data security layer to control access to data&lt;/li&gt;
  &lt;li&gt;Google Drive integration - run Presto SQL directly on Google Sheets. This is one of the coolest connectors
to be created and generated a lot of excitement. Can’t wait for Wix to open source this one as well!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;See more in the &lt;a href=&quot;https://vimeo.com/331767442&quot;&gt;video&lt;/a&gt;,
&lt;a href=&quot;https://www.slideshare.net/OriReshef/quix-presto-ide-presto-summit-il&quot;&gt;slides&lt;/a&gt;,
&lt;a href=&quot;https://github.com/wix-incubator/quix&quot;&gt;source code&lt;/a&gt;.&lt;/p&gt;

&lt;h1 id=&quot;ironsource----analyzing-data-at-a-petabyte-scale&quot;&gt;Ironsource - Analyzing data at a petabyte scale&lt;/h1&gt;

&lt;p&gt;&lt;a href=&quot;https://www.ironsrc.com/&quot;&gt;Ironsource&lt;/a&gt; is the ad network of choice for the gaming industry, supplying
solutions for application developers, customer engagement, and ad monetization. Ironsource collects
terabytes of events on a daily basis.&lt;/p&gt;

&lt;p&gt;In his talk, &lt;a href=&quot;https://www.linkedin.com/in/korenor/&quot;&gt;Or Koren&lt;/a&gt;, head of the data team at Ironsource, shared
their journey from terabyte scale to petabyte scale. He showed how their entire interactive
analytics platform was rebuilt on Presto, and the huge savings they got from it, including new
business insights from their data science and data analyst teams.&lt;/p&gt;

&lt;table&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;img src=&quot;/assets/blog/Israel-2019/ironsource1.png&quot; alt=&quot;&quot; /&gt;&lt;/td&gt;
      &lt;td&gt;&lt;img src=&quot;/assets/blog/Israel-2019/ironsource2.png&quot; alt=&quot;&quot; /&gt;&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The before-and-after slides that Or presented show very clearly the reduction in cost and the increase
in efficiency that the use of Presto brought to Ironsource.&lt;/p&gt;

&lt;p&gt;See Or’s slides &lt;a href=&quot;https://www.slideshare.net/OriReshef/data-analytics-at-a-petabyte-scale-final&quot;&gt;here&lt;/a&gt; and the
talk &lt;a href=&quot;https://vimeo.com/333732300&quot;&gt;video&lt;/a&gt;.&lt;/p&gt;

&lt;h1 id=&quot;datorama-on-mutable-data-at-scale&quot;&gt;Datorama on mutable data at scale&lt;/h1&gt;

&lt;p&gt;A charismatic presenter, &lt;a href=&quot;https://www.linkedin.com/in/afinkelstein/&quot;&gt;Alexey Finkelstein&lt;/a&gt; from
&lt;a href=&quot;https://datorama.com/&quot;&gt;Salesforce Datorama&lt;/a&gt; had the room rolling with laughter more than once,
on a topic that is no laughing matter: managing mutable data with Presto. Datorama provides a marketing intelligence
platform serving 30,000 customers, who can interactively query 1.5PB of data.&lt;/p&gt;

&lt;p&gt;For that, Datorama provides a “data lake as a service”, called a DatoLake. Files on data lakes by their nature
are not transactionally updatable on a row level, but the users of Datorama require the ability to delete or update
specific rows in a transactional manner.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Israel-2019/datorama.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;To solve this, Datorama embarked on a journey based on partitioning the data by a version number (such as
 20190101_&lt;strong&gt;009&lt;/strong&gt;) and rebuilding a partition based on updates. There were three attempts along the journey,
with lessons learned at each step:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;At first, using an external Postgres metastore to store the versions, swapping versions in the metastore, and using
that as part of a sub-query in Presto to select the correct version. This approach did not push down partition pruning.&lt;/li&gt;
  &lt;li&gt;Next, moving the metastore query to happen before query generation, and dynamically generating the right filter
for each sub-query. This approach required two-pass processing for each query and did not support direct SQL from clients.&lt;/li&gt;
  &lt;li&gt;And finally, swapping the partition transactionally, directly in the Hive Metastore
database (MySQL), and refreshing the Presto Hive cache. With this approach, queries do not need to know about the
version change, and full separation of the mutability logic from the query is achieved.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;See many more details in the &lt;a href=&quot;https://vimeo.com/333759030&quot;&gt;video&lt;/a&gt; and &lt;a href=&quot;https://www.slideshare.net/OriReshef/mutable-data-scale&quot;&gt;slides&lt;/a&gt;.&lt;/p&gt;

&lt;h1 id=&quot;varada-join-optimization-and-dynamic-filtering&quot;&gt;Varada, Join Optimization and Dynamic filtering&lt;/h1&gt;

&lt;p&gt;&lt;a href=&quot;https://www.linkedin.com/in/romanzeyde/&quot;&gt;Roman Zeyde&lt;/a&gt; is Varada’s Presto architect. Roman has a unique
algorithmic background, being a Talpiot graduate and an ex-Googler.&lt;/p&gt;

&lt;p&gt;Roman’s talk discussed a new approach to making joins faster. Varada will contribute Roman’s work on dynamic
filtering back to the community. Stay tuned :)&lt;/p&gt;

&lt;p&gt;The talk went over the following major topics:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The Presto cost-based optimizer as a basis for join optimization&lt;/li&gt;
  &lt;li&gt;Join optimization strategies&lt;/li&gt;
  &lt;li&gt;Applying dynamic filtering for join optimization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Israel-2019/varada2.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Roman’s &lt;a href=&quot;https://vimeo.com/331946107&quot;&gt;talk&lt;/a&gt;, &lt;a href=&quot;https://www.slideshare.net/OriReshef/dynamic-filtering-for-presto-join-optimisation&quot;&gt;slides&lt;/a&gt;.&lt;/p&gt;

&lt;h1 id=&quot;qa-session&quot;&gt;Q&amp;amp;A session&lt;/h1&gt;

&lt;p&gt;The event finished with an hour-long Q&amp;amp;A session led by &lt;a href=&quot;https://www.linkedin.com/in/demibenari/&quot;&gt;Demi Ben-Ari&lt;/a&gt;, VP R&amp;amp;D at
&lt;a href=&quot;https://www.panorays.com/&quot;&gt;Panorays&lt;/a&gt; and co-founder of Big Things, an Israeli meetup group with 5,000 members,
all fans of big data technologies.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Israel-2019/qa.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;See you all at the second international Presto Conference in Tel Aviv!&lt;/p&gt;

      
        <author>
          <name>Ori Reshef, VP Product, Varada</name>
        </author>
      

      <summary>Community, noun: “A feeling of fellowship with others, as a result of sharing common attributes, interests, and goals” The fun picture you see here was taken at the first lecture of the First international Presto summit in Israel last month. The atmosphere in the room during the various presentations was unique. It’s as if you could physically feel the brainpower of 250 engineers fascinated by technology in one room. We would like to share with you a bit of the content that was discussed during the conference. Enjoy the read and the videos!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/Israel-2019/audience.jpg" />
      
    </entry>
  
    <entry>
      <title>Release 310</title>
      <link href="https://trino.io/blog/2019/05/03/release-310.html" rel="alternate" type="text/html" title="Release 310" />
      <published>2019-05-03T00:00:00+00:00</published>
      <updated>2019-05-03T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/05/03/release-310</id>
      <content type="html" xml:base="https://trino.io/blog/2019/05/03/release-310.html">&lt;p&gt;This version adds standard
&lt;a href=&quot;https://trino.io/docs/current/sql/select.html#limit-or-fetch-first-clauses&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH FIRST&lt;/code&gt;&lt;/a&gt;
syntax, support for using an
&lt;a href=&quot;https://trino.io/docs/current/connector/hive.html#s3-credentials&quot;&gt;alternate AWS role&lt;/a&gt;
when accessing S3 or Glue, and improved handling of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DECIMAL&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DOUBLE&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;REAL&lt;/code&gt;
when Hive table and partition metadata differ.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-310.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>This version adds standard FETCH FIRST syntax, support for using an alternate AWS role when accessing S3 or Glue, and improved handling of DECIMAL, DOUBLE, and REAL when Hive table and partition metadata differ. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Release 309</title>
      <link href="https://trino.io/blog/2019/04/25/release-309.html" rel="alternate" type="text/html" title="Release 309" />
      <published>2019-04-25T00:00:00+00:00</published>
      <updated>2019-04-25T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/04/25/release-309</id>
      <content type="html" xml:base="https://trino.io/blog/2019/04/25/release-309.html">&lt;p&gt;This version adds support for case-insensitive name matching in
JDBC-based connectors, more data types in
&lt;a href=&quot;https://trino.io/docs/current/connector/postgresql.html&quot;&gt;PostgreSQL connector&lt;/a&gt;,
and some bug fixes.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-309.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>This version adds support for case-insensitive name matching in JDBC-based connectors, more data types in PostgreSQL connector, and some bug fixes. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Even Faster ORC</title>
      <link href="https://trino.io/blog/2019/04/23/even-faster-orc.html" rel="alternate" type="text/html" title="Even Faster ORC" />
      <published>2019-04-23T00:00:00+00:00</published>
      <updated>2019-04-23T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/04/23/even-faster-orc</id>
      <content type="html" xml:base="https://trino.io/blog/2019/04/23/even-faster-orc.html">&lt;p&gt;Trino is known for being the fastest SQL-on-Hadoop engine, and our custom ORC
reader implementation is a big reason for this speed – now it is even faster!&lt;/p&gt;

&lt;h2 id=&quot;why-is-this-important&quot;&gt;Why is this important?&lt;/h2&gt;

&lt;p&gt;For the TPC-DS benchmark, the new reader reduced the global query time by ~5%
and CPU usage by ~9%, which improves user experience while reducing the cost.&lt;/p&gt;

&lt;h2 id=&quot;what-improved&quot;&gt;What improved?&lt;/h2&gt;

&lt;p&gt;ORC uses a two-step system to decode data. The first step is a traditional
compression algorithm like gzip that generically reduces data size. The second
step uses data-type-specific compression algorithms that convert the raw bytes
into values (e.g., text, numbers, timestamps). It is this latter step that we
improved.&lt;/p&gt;

&lt;h2 id=&quot;how-much-faster-is-the-decoder&quot;&gt;How much faster is the decoder?&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/orc-speedup.svg&quot; alt=&quot;ORC Speedup&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;why-exactly-is-this-faster&quot;&gt;Why exactly is this faster?&lt;/h2&gt;

&lt;p&gt;Explaining why the new code is faster requires a brief explanation of the
existing code. In the old code, a typical value reader looked like this:&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dataStream&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;presentStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;skip&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;RunLengthEncodedBlock&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;create&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;nc&quot;&gt;BlockBuilder&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;createBlockBuilder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;presentStream&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;writeLong&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;presentStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;nextBit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;writeLong&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;appendNull&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;build&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This code does a few things well. First, for the &lt;em&gt;all values are null&lt;/em&gt; case, it
returns a run-length-encoded block, which has custom optimizations throughout
Trino (this &lt;a href=&quot;https://github.com/trinodb/trino/pull/229&quot;&gt;optimization&lt;/a&gt; was
recently added by &lt;a href=&quot;https://github.com/Praveen2112&quot;&gt;Praveen Krishna&lt;/a&gt;). Second,
it separates the unconditional &lt;em&gt;no nulls&lt;/em&gt; loop from the conditional &lt;em&gt;mixed nulls&lt;/em&gt;
loop. It is common to have a column without nulls, so it makes sense to split
this out, since unconditional loops are faster than conditional loops.&lt;/p&gt;

&lt;p&gt;On the downside, this code has several performance issues:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Many data encodings can be efficiently read in bulk, but this code reads one
value at a time.&lt;/li&gt;
  &lt;li&gt;In some cases, the code can be called with different type instances, which
results in slow dynamic dispatch call sites in the loop.&lt;/li&gt;
  &lt;li&gt;Value reading in the null loop is conditional, which is expensive.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;optimize-for-bulk-reads&quot;&gt;Optimize for bulk reads&lt;/h3&gt;

&lt;p&gt;As you can see from the code above, Trino is always loading values in batches
(typically 1024). This makes the reader and the downstream code more efficient as
the overhead of processing data is amortized over the batch, and in some cases
data can be processed in parallel. ORC has a small number of low-level decoders
for booleans, numbers, bytes, and so on. These decoders are specialized for each
data type, which means each must be optimized individually. In some cases, the
decoders already had internal batch output buffers, so the optimization was
trivial. In another equally trivial case, we changed the float and double stream
decoders from loading values a byte at a time to bulk loading an entire array of
values directly from the input, which improved performance more than 10x.&lt;/p&gt;
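&lt;p&gt;As a rough illustration of the difference, the following standalone sketch
contrasts a per-value read loop with a bulk copy through a buffer view. The
class and method names are invented for the example and are not Trino's decoder
API; the real decoders work against ORC's compressed input streams.&lt;/p&gt;

```java
import java.nio.ByteBuffer;

// Illustrative only: contrasts per-value reads with a bulk copy for a
// stream of doubles. Assumes the caller has already set the buffer's
// byte order to match the on-disk encoding.
public class DoubleDecodeSketch
{
    // one value at a time: every call re-checks bounds and reassembles 8 bytes
    public static double[] readOneByOne(ByteBuffer input, int count)
    {
        double[] values = new double[count];
        for (int i = 0; i < count; i++) {
            values[i] = input.getDouble();
        }
        return values;
    }

    // bulk: view the bytes as doubles and copy the whole batch in one call
    public static double[] readBulk(ByteBuffer input, int count)
    {
        double[] values = new double[count];
        input.asDoubleBuffer().get(values, 0, count);
        // the view has its own cursor, so advance the byte buffer manually
        input.position(input.position() + count * Double.BYTES);
        return values;
    }
}
```

&lt;p&gt;The bulk version replaces a batch of bounds-checked per-value calls with a
single copy, which is the same shape of change described above for the float
and double stream decoders.&lt;/p&gt;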

&lt;p&gt;Some changes, however, were significantly more complex. One example is the
boolean reader, which was changed from decoding a single bit at a time to
decoding 8 bits at a time. This sounds simple, but in practice doing this
efficiently is complex, since reads are not aligned to 8 bits, and there is the
general problem of forming JVM-friendly loops. For those interested, the code is
&lt;a href=&quot;https://github.com/trinodb/trino/blob/308/presto-orc/src/main/java/io/prestosql/orc/stream/BooleanInputStream.java#L218&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
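&lt;p&gt;To give a feel for the approach, here is a simplified, standalone sketch of
unpacking eight MSB-first bits per byte. It assumes the read starts on a byte
boundary; handling reads that start mid-byte is exactly the hard part the real
code above has to deal with, and this sketch skips it.&lt;/p&gt;

```java
// Illustrative only: decode a bit-packed (MSB-first) boolean stream eight
// values at a time, with a short conditional tail for the last partial byte.
public class BooleanDecodeSketch
{
    public static boolean[] decode(byte[] packed, int count)
    {
        boolean[] result = new boolean[count];
        int position = 0;
        // unconditional inner loop: unpack all 8 bits of each full byte
        while (position + 8 <= count) {
            byte b = packed[position >>> 3];
            for (int bit = 0; bit < 8; bit++) {
                result[position + bit] = ((b >>> (7 - bit)) & 1) != 0;
            }
            position += 8;
        }
        // tail: remaining 0-7 values from the last partial byte
        if (position < count) {
            byte b = packed[position >>> 3];
            for (int bit = 0; position < count; bit++, position++) {
                result[position] = ((b >>> (7 - bit)) & 1) != 0;
            }
        }
        return result;
    }
}
```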

&lt;h3 id=&quot;avoid-dynamic-dispatch-in-loops&quot;&gt;Avoid dynamic dispatch in loops&lt;/h3&gt;

&lt;p&gt;This is the kind of problem that is not obvious when reading code, and it is
easily missed in benchmarks. The core problem happens when you have a loop
containing a method call whose target class can vary over the lifetime of the
execution. For example, this simple loop from above may or may not be fast,
depending on how many different classes it sees for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;type&lt;/code&gt; across multiple
executions:&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;writeLong&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Most of the ORC column readers can only be called with a single type
implementation, but the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LongStreamReader&lt;/code&gt; is called with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BIGINT&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INTEGER&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SMALLINT&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TINYINT&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DATE&lt;/code&gt; types. This causes the JVM to generate a dynamic
dispatch in the core of the loop. Besides the obvious extra work to select the
target code and branch prediction problems, dynamic dispatch calls are normally
not inlined, which disables many powerful optimizations in the JVM. The good news
is that the fix is trivial:&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;type&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;instanceof&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;BigintType&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;nc&quot;&gt;BlockBuilder&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;createBlockBuilder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;writeLong&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;build&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;type&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;instanceof&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;IntegerType&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;nc&quot;&gt;BlockBuilder&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;createBlockBuilder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;writeLong&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;build&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The hard part is knowing that this is a problem. The existing benchmarks for ORC
only tested a single type at a time, which allowed the JVM to inline the target
method and produce much more optimal code. In this case, we happen to know that
the code is being invoked with multiple types, so we updated the benchmark to
warm up the JVM with multiple types before benchmarking.&lt;/p&gt;
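&lt;p&gt;The pitfall can be demonstrated without a full JMH setup: warm the same call
site with several receiver classes before measuring, so the JIT compiles the
megamorphic dispatch the production reader actually sees. The &lt;code&gt;LongTransform&lt;/code&gt;
interface below is a stand-in invented for the example, not Trino's type SPI.&lt;/p&gt;

```java
// Minimal stand-in showing why a benchmark must warm a call site with
// several receiver classes: the JIT optimizes based on the types it has seen.
interface LongTransform
{
    long apply(long value);
}

public class WarmupSketch
{
    static long sumWith(LongTransform transform, long[] values)
    {
        long sum = 0;
        for (long value : values) {
            sum += transform.apply(value);  // dispatch target varies by receiver
        }
        return sum;
    }

    public static void main(String[] args)
    {
        long[] data = new long[1024];
        java.util.Arrays.fill(data, 1);
        // warming with a single implementation lets the JIT inline apply();
        // cycling through all of them forces the megamorphic call shape the
        // real reader sees with BIGINT, INTEGER, SMALLINT, TINYINT, and DATE
        LongTransform[] transforms = {v -> v, v -> v + 1, v -> v * 2};
        for (int i = 0; i < 10_000; i++) {
            for (LongTransform transform : transforms) {
                sumWith(transform, data);
            }
        }
        System.out.println(sumWith(transforms[0], data));  // 1024
    }
}
```

&lt;p&gt;Timing the loop after single-implementation warmup versus after this mixed
warmup makes the inlining difference visible even in a crude harness.&lt;/p&gt;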

&lt;p&gt;For more information on this kind of optimization, I suggest reading Aleksey
Shipilëv’s blog posts on JVM performance. Specifically, &lt;a href=&quot;https://shipilev.net/blog/2015/black-magic-method-dispatch&quot;&gt;The Black Magic of (Java)
Method Dispatch&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;improve-null-reading&quot;&gt;Improve null reading&lt;/h3&gt;

&lt;p&gt;With the above improvements, we were getting great performance of 0.5ns to 3ns
per value for most types without nulls, but the benchmarks with nulls were taking
an additional ~6ns per value. Some of that is expected, since we must decode the
additional &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;present&lt;/code&gt; boolean stream, but booleans decode at a rate of ~0.5ns per
value, so that isn’t the problem. &lt;a href=&quot;https://github.com/martint&quot;&gt;Martin Traverso&lt;/a&gt;
and I built and benchmarked many different implementations, but we only found one
with really good performance.&lt;/p&gt;

&lt;p&gt;The first implementation we built was simply to bulk read a null array, bulk read
the values packed into the front of an array, and then spread the nulls across
the array:&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;// bulk read and count null values&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;boolean&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;isNull&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;boolean&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;];&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nullCount&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;presentStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getUnsetBits&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;isNull&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// bulk read non-null values into an array large enough for the full results&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;];&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;dataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nullCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// copy non-null values into output position (in reverse order)&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nullSuppressedPosition&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nullCount&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;outputPosition&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;isNull&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;length&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;outputPosition&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;outputPosition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;--)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;isNull&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outputPosition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;])&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outputPosition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outputPosition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nullSuppressedPosition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;];&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;nullSuppressedPosition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;--;&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This is better because it always bulk reads the values, but there is still a ~4ns
per value penalty for nulls. We haven’t been able to explain why it happens, but
we’ve observed that the number drops dramatically after we adjusted the code to
assign to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;result[outputPosition]&lt;/code&gt; outside the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;if&lt;/code&gt; block. We can’t do that
in-place, as in the snippet above, so we introduce a temporary buffer:&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;// bulk read and count null values&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;boolean&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;isNull&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;boolean&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;];&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nullCount&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;presentStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getUnsetBits&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;isNull&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// bulk read non-null values into a temporary array&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;dataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tempBuffer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nullCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// copy values into result&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;isNull&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;length&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;];&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;position&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;isNull&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;length&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tempBuffer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;position&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;];&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(!&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;isNull&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;])&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;position&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++;&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;With this change, the null penalty drops to ~1.5ns per value, which is reasonable
given that just reading the null flag costs ~0.5ns per value. There are two
downsides to this approach. First, there is an extra temporary buffer, but
since the reader is single-threaded, we can reuse it for the whole file read.
Second, the null values are no longer zero. This should not be a problem for
correctly written code, but it could potentially trigger latent bugs. We did find
another approach that left the nulls unset, but it was a bit slower and required
another temp buffer, so we settled on this approach.&lt;/p&gt;

&lt;h2 id=&quot;how-much-will-my-setup-improve&quot;&gt;How much will my setup improve?&lt;/h2&gt;

&lt;p&gt;We tested the performance using the standard TPC-DS and TPC-H benchmarks on zlib
compressed ORC files:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Benchmark&lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt;Duration&lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt;CPU&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;TPC-DS&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;5.6%&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;9.3%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;TPC-H&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;4.5%&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;8.3%&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;There are a number of reasons you may get a larger or smaller win:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The exact queries matter: In the benchmarks above, some queries saved more than
20% CPU and others only saved 1%.&lt;/li&gt;
  &lt;li&gt;The compression matters: In our tests we used zlib, which is the most expensive
compression supported by ORC. Compression algorithms that use less CPU (e.g.,
Zstd, LZ4, or Snappy) will generally see larger relative improvements.&lt;/li&gt;
  &lt;li&gt;This improvement is only in &lt;a href=&quot;https://trino.io/download.html&quot;&gt;Trino 309+&lt;/a&gt;,
so if you are using an earlier version you will need to upgrade. Also, if you are
still using Facebook’s version of Presto, you can either upgrade to Trino 309+ or
wait to see if they backport it.&lt;/li&gt;
&lt;/ul&gt;</content>

      
        <author>
          <name>Dain Sundstrom, Martin Traverso</name>
        </author>
      

      <summary>Trino is known for being the fastest SQL on Hadoop engine, and our custom ORC reader implementation is a big reason for this speed – now it is even faster!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/orc-speedup.png" />
      
    </entry>
  
    <entry>
      <title>Release 308</title>
      <link href="https://trino.io/blog/2019/04/12/release-308.html" rel="alternate" type="text/html" title="Release 308" />
      <published>2019-04-12T00:00:00+00:00</published>
      <updated>2019-04-12T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/04/12/release-308</id>
      <content type="html" xml:base="https://trino.io/blog/2019/04/12/release-308.html">&lt;p&gt;This version includes significant 
&lt;a href=&quot;/blog/2019/04/23/even-faster-orc.html&quot;&gt;performance improvements&lt;/a&gt;
when reading ORC data, authorization checks for 
&lt;a href=&quot;https://trino.io/docs/current/sql/show-columns.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW COLUMNS&lt;/code&gt;&lt;/a&gt;,
and limit pushdown for JDBC-based connectors.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-308.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>This version includes significant performance improvements when reading ORC data, authorization checks for SHOW COLUMNS, and limit pushdown for JDBC-based connectors. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Release 307</title>
      <link href="https://trino.io/blog/2019/04/08/release-307.html" rel="alternate" type="text/html" title="Release 307" />
      <published>2019-04-08T00:00:00+00:00</published>
      <updated>2019-04-08T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/04/08/release-307</id>
      <content type="html" xml:base="https://trino.io/blog/2019/04/08/release-307.html">&lt;p&gt;This version includes some important security fixes, support for inner and outer
joins involving lateral derived tables (&lt;a href=&quot;https://trino.io/docs/current/sql/select.html#lateral&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LATERAL&lt;/code&gt;&lt;/a&gt;),
new syntax for setting &lt;a href=&quot;https://trino.io/docs/current/sql/comment.html&quot;&gt;table comments&lt;/a&gt;, and performance
improvements.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-307.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>This version includes some important security fixes, support for inner and outer joins involving lateral derived tables (LATERAL), new syntax for setting table comments, and performance improvements. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto Community Meeting 2019-04-03</title>
      <link href="https://trino.io/blog/2019/04/03/Presto-Community-Meeting.html" rel="alternate" type="text/html" title="Presto Community Meeting 2019-04-03" />
      <published>2019-04-03T00:00:00+00:00</published>
      <updated>2019-04-03T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/04/03/Presto-Community-Meeting</id>
      <content type="html" xml:base="https://trino.io/blog/2019/04/03/Presto-Community-Meeting.html">&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/VQhDBPltUyk&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h3 id=&quot;agenda&quot;&gt;Agenda&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;Memory management&lt;/li&gt;
  &lt;li&gt;Spilling&lt;/li&gt;
&lt;/ul&gt;

&lt;!--more--&gt;</content>

      

      <summary>Agenda Memory management Spilling</summary>

      
      
    </entry>
  
    <entry>
      <title>Release 306</title>
      <link href="https://trino.io/blog/2019/03/16/release-306.html" rel="alternate" type="text/html" title="Release 306" />
      <published>2019-03-16T00:00:00+00:00</published>
      <updated>2019-03-16T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/03/16/release-306</id>
      <content type="html" xml:base="https://trino.io/blog/2019/03/16/release-306.html">&lt;p&gt;This version includes some bug fixes, as well as performance improvements when decoding ORC data.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-306.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>This version includes some bug fixes, as well as performance improvements when decoding ORC data. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto Community Meeting 2019-03-13</title>
      <link href="https://trino.io/blog/2019/03/13/Presto-Community-Meeting.html" rel="alternate" type="text/html" title="Presto Community Meeting 2019-03-13" />
      <published>2019-03-13T00:00:00+00:00</published>
      <updated>2019-03-13T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/03/13/Presto-Community-Meeting</id>
      <content type="html" xml:base="https://trino.io/blog/2019/03/13/Presto-Community-Meeting.html">&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/hMmFM1MBEB8&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h3 id=&quot;agenda&quot;&gt;Agenda&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;Dynamic Filtering&lt;/li&gt;
  &lt;li&gt;Changes to TIMESTAMP semantics&lt;/li&gt;
&lt;/ul&gt;

&lt;!--more--&gt;</content>

      

      <summary>Agenda Dynamic Filtering Changes to TIMESTAMP semantics</summary>

      
      
    </entry>
  
    <entry>
      <title>Release 305</title>
      <link href="https://trino.io/blog/2019/03/08/release-305.html" rel="alternate" type="text/html" title="Release 305" />
      <published>2019-03-08T00:00:00+00:00</published>
      <updated>2019-03-08T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/03/08/release-305</id>
      <content type="html" xml:base="https://trino.io/blog/2019/03/08/release-305.html">&lt;p&gt;Changes in this version include peak-memory awareness in
&lt;a href=&quot;https://trino.io/docs/current/optimizer/cost-based-optimizations.html&quot;&gt;cost-based optimizer&lt;/a&gt;,
improved handling of CSV output in &lt;a href=&quot;https://trino.io/docs/current/client/cli.html&quot;&gt;CLI&lt;/a&gt;,
and performance improvements for Parquet.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-305.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>Changes in this version include peak-memory awareness in the cost-based optimizer, improved handling of CSV output in the CLI, and performance improvements for Parquet. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto Community Meeting 2019-02-27</title>
      <link href="https://trino.io/blog/2019/02/27/Presto-Community-Meeting.html" rel="alternate" type="text/html" title="Presto Community Meeting 2019-02-27" />
      <published>2019-02-27T00:00:00+00:00</published>
      <updated>2019-02-27T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/02/27/Presto-Community-Meeting</id>
      <content type="html" xml:base="https://trino.io/blog/2019/02/27/Presto-Community-Meeting.html">&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/7bclzfYUfQg&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h3 id=&quot;agenda&quot;&gt;Agenda&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;Pushdown of complex operations (filter, project, join, etc.)&lt;/li&gt;
  &lt;li&gt;Coordinator high availability&lt;/li&gt;
&lt;/ul&gt;

&lt;!--more--&gt;</content>

      

      <summary>Agenda Pushdown of complex operations (filter, project, join, etc.) Coordinator high availability</summary>

      
      
    </entry>
  
    <entry>
      <title>Release 304</title>
      <link href="https://trino.io/blog/2019/02/27/release-304.html" rel="alternate" type="text/html" title="Release 304" />
      <published>2019-02-27T00:00:00+00:00</published>
      <updated>2019-02-27T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/02/27/release-304</id>
      <content type="html" xml:base="https://trino.io/blog/2019/02/27/release-304.html">&lt;p&gt;New features include &lt;a href=&quot;https://trino.io/docs/current/admin/spill.html&quot;&gt;spilling&lt;/a&gt; for queries
that use ORDER BY or window functions, support for PostgreSQL’s json and jsonb types, and a Hive 
&lt;a href=&quot;https://trino.io/docs/current/connector/hive.html#procedures&quot;&gt;procedure&lt;/a&gt; to synchronize 
partition metadata with the file system.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-304.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>New features include spilling for queries that use ORDER BY or window functions, support for PostgreSQL’s json and jsonb types, and a Hive procedure to synchronize partition metadata with the file system. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Release 303</title>
      <link href="https://trino.io/blog/2019/02/14/release-303.html" rel="alternate" type="text/html" title="Release 303" />
      <published>2019-02-14T00:00:00+00:00</published>
      <updated>2019-02-14T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/02/14/release-303</id>
      <content type="html" xml:base="https://trino.io/blog/2019/02/14/release-303.html">&lt;p&gt;This version includes bug fixes and performance improvements.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-303.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>This version includes bug fixes and performance improvements. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto Community Meeting 2019-02-06</title>
      <link href="https://trino.io/blog/2019/02/06/Presto-Community-Meeting.html" rel="alternate" type="text/html" title="Presto Community Meeting 2019-02-06" />
      <published>2019-02-06T00:00:00+00:00</published>
      <updated>2019-02-06T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/02/06/Presto-Community-Meeting</id>
      <content type="html" xml:base="https://trino.io/blog/2019/02/06/Presto-Community-Meeting.html">&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/YfDe_YVzMyI&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h3 id=&quot;agenda&quot;&gt;Agenda&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;About the Foundation&lt;/li&gt;
  &lt;li&gt;Getting involved&lt;/li&gt;
  &lt;li&gt;Summary of new features&lt;/li&gt;
  &lt;li&gt;Top requested features&lt;/li&gt;
  &lt;li&gt;Release verification&lt;/li&gt;
&lt;/ul&gt;

&lt;!--more--&gt;</content>

      

      <summary>Agenda About the Foundation Getting involved Summary of new features Top requested features Release verification</summary>

      
      
    </entry>
  
    <entry>
      <title>Release 302</title>
      <link href="https://trino.io/blog/2019/02/06/release-302.html" rel="alternate" type="text/html" title="Release 302" />
      <published>2019-02-06T00:00:00+00:00</published>
      <updated>2019-02-06T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/02/06/release-302</id>
      <content type="html" xml:base="https://trino.io/blog/2019/02/06/release-302.html">&lt;p&gt;New features include native support for 
&lt;a href=&quot;https://trino.io/docs/current/connector/hive-gcs-tutorial.html&quot;&gt;Google Cloud Storage&lt;/a&gt; 
and a connector for 
&lt;a href=&quot;https://trino.io/docs/current/connector/elasticsearch.html&quot;&gt;Elasticsearch&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-302.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>New features include native support for Google Cloud Storage and a connector for Elasticsearch. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto Software Foundation Launch</title>
      <link href="https://trino.io/blog/2019/01/31/presto-software-foundation-launch.html" rel="alternate" type="text/html" title="Presto Software Foundation Launch" />
      <published>2019-01-31T00:00:00+00:00</published>
      <updated>2019-01-31T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/01/31/presto-software-foundation-launch</id>
      <content type="html" xml:base="https://trino.io/blog/2019/01/31/presto-software-foundation-launch.html">&lt;p&gt;We are pleased to &lt;a href=&quot;https://www.prweb.com/releases/prweb16070792.htm&quot;&gt;announce&lt;/a&gt;
the launch of the Presto Software Foundation,
a not-for-profit organization dedicated to the advancement of the Presto
open source distributed SQL engine. The foundation is committed to ensuring
the project remains open, collaborative and independent for decades to come.&lt;/p&gt;

&lt;p&gt;We started the Presto project in 2012 as a small team at Facebook,
with the goals of building a high-performance, standards-compliant, easy-to-use,
and dependable query engine capable of scaling to the largest datasets
(exabyte scale) in the world. From day one, we designed and developed Presto
to be maintained by an independent open source community.&lt;/p&gt;

&lt;p&gt;In 2013, we released Presto under the Apache License and opened development to the public.
Since then, the Presto community has expanded globally, with developers in
Brazil, China, Germany, India, Israel, Japan, Poland, Singapore, the U.S., the U.K.,
and more. In recent years, the center of gravity of the Presto community has shifted,
with the majority of contributions now coming from developers outside of Facebook.&lt;/p&gt;

&lt;p&gt;From the beginning, we stressed the importance of code quality, architectural
extensibility, and open collaboration with the community. With the rapid expansion
of both the Presto user base and Presto developer community over the last several
years, establishing a non-profit to institutionalize these values is the next
logical step to ensure that this project stands the test of time.&lt;/p&gt;

&lt;p&gt;The foundation is dedicated to preserving the vision of high quality, performant
and dependable software developed by an open, collaborative and independent
community of developers throughout the world. Everyone is welcome to participate,
whether it be via code contributions, suggestions for improvements, or bug reports.&lt;/p&gt;</content>

      
        <author>
          <name>Martin Traverso, Dain Sundstrom, David Phillips</name>
        </author>
      

      <summary>We are pleased to announce the launch of the Presto Software Foundation, a not-for-profit organization dedicated to the advancement of the Presto open source distributed SQL engine. The foundation is committed to ensuring the project remains open, collaborative and independent for decades to come.</summary>

      
      
    </entry>
  
    <entry>
      <title>Release 301</title>
      <link href="https://trino.io/blog/2019/01/31/release-301.html" rel="alternate" type="text/html" title="Release 301" />
      <published>2019-01-31T00:00:00+00:00</published>
      <updated>2019-01-31T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/01/31/release-301</id>
      <content type="html" xml:base="https://trino.io/blog/2019/01/31/release-301.html">&lt;p&gt;New features include role-based access control and 
&lt;a href=&quot;https://trino.io/docs/current/sql/create-role.html&quot;&gt;role management&lt;/a&gt;, 
&lt;a href=&quot;https://trino.io/docs/current/sql/create-view.html#security&quot;&gt;invoker security&lt;/a&gt;
mode for views, and &lt;a href=&quot;https://trino.io/docs/current/sql/analyze.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ANALYZE&lt;/code&gt;&lt;/a&gt;
syntax for collecting table statistics.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-301.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>New features include role-based access control and role management, invoker security mode for views, and ANALYZE syntax for collecting table statistics. Release notes Download</summary>

      
      
    </entry>
  

</feed>
