In our last episode we already had a bit of a glimpse. Now the release is really out.
Official announcement items from Martin:
GROUP BY
or LIMIT
.Manfred’s additional notes:
DELETE
and batch insert support for various JDBC-based connectorsMore info at https://trino.io/docs/current/release/release-360.html.
The MATCH_RECOGNIZE
syntax was introduced in the latest SQL specification
of 2016. It is a super powerful tool for analyzing trends in your data. We are
proud to announce that Trino supports this great feature since
version 356. With
MATCH_RECOGNIZE
, you can define a pattern using the well-known regular
expression syntax, and match it to a set of rows. Upon finding a matching row
sequence, you can retrieve all kinds of detailed or summary information about
the match, and pass it on to be processed by the subsequent parts of your
query. This is a new level of what a pure SQL statement can do.
For more details, this blog post
gives you a taste of row pattern matching capabilities, and a quick overview of
the MATCH_RECOGNIZE
syntax.
Let’s look at an example with data similar to the TPCH data. Here is an example,
and the same goal: detect a “V”-shape of the price
values over time for different customers.
trino> WITH orders(customer_id, order_date, price) AS (VALUES
('cust_1', DATE '2020-05-11', 100),
('cust_1', DATE '2020-05-12', 200),
('cust_2', DATE '2020-05-13', 8),
('cust_1', DATE '2020-05-14', 100),
('cust_2', DATE '2020-05-15', 4),
('cust_1', DATE '2020-05-16', 50),
('cust_1', DATE '2020-05-17', 100),
('cust_2', DATE '2020-05-18', 6))
SELECT customer_id, start_price, bottom_price, final_price, start_date, final_date
FROM orders
MATCH_RECOGNIZE (
PARTITION BY customer_id
ORDER BY order_date
MEASURES
START.price AS start_price,
LAST(DOWN.price) AS bottom_price,
LAST(UP.price) AS final_price,
START.order_date AS start_date,
LAST(UP.order_date) AS final_date
ONE ROW PER MATCH
AFTER MATCH SKIP PAST LAST ROW
PATTERN (START DOWN+ UP+)
DEFINE
DOWN AS price < PREV(price),
UP AS price > PREV(price)
);
customer_id | start_price | bottom_price | final_price | start_date | final_date
-------------+-------------+--------------+-------------+------------+------------
cust_1 | 200 | 50 | 100 | 2020-05-12 | 2020-05-17
cust_2 | 8 | 4 | 6 | 2020-05-13 | 2020-05-18
(2 rows)
Two matches are detected, one for cust_1
, and one for cust_2
.
The matching algorithm was a collaboration between Martin and Kasia. This algorithm lives in the Matcher class.
The running semantics is the default both in the DEFINE
and MESAURES
clauses. Note that FINAL
only applies to the MEASURES
clause.
To sum up, here’s one complex measure expression combining different elements of the special syntax:
The PR of the week, is adding
documentation for applying pattern matching over windows. This is yet another
SQL functionality that Kasia added after getting the patter recognition to work
with MATCH_RECOGNIZE
.
Here are a few examples that Kasia will be running:
Demo preview:
SELECT custkey, match_no, start_price, bottom_price, final_price, start_date, final_date, classy
FROM orders
MATCH_RECOGNIZE (
PARTITION BY custkey
ORDER BY orderdate, orderkey
MEASURES
START.totalprice AS start_price,
LAST(DOWN.totalprice) AS bottom_price,
LAST(UP.totalprice) AS final_price,
START.orderdate AS start_date,
LAST(UP.orderdate) AS final_date,
MATCH_NUMBER() AS match_no,
CLASSIFIER() AS classy
ONE ROW PER MATCH
AFTER MATCH SKIP PAST LAST ROW
PATTERN (START DOWN+ UP+)
DEFINE
DOWN AS totalprice < PREV(totalprice),
UP AS totalprice > PREV(totalprice)
)
The query returns many results (many matches). Wrap it in a count()
aggregation to check how many there are:
SELECT count() FROM (SELECT custkey, match_no, start_price, bottom_price, final_price, start_date, final_date, classy
FROM orders
MATCH_RECOGNIZE (
PARTITION BY custkey
ORDER BY orderdate, orderkey
MEASURES
START.totalprice AS start_price,
LAST(DOWN.totalprice) AS bottom_price,
LAST(UP.totalprice) AS final_price,
START.orderdate AS start_date,
LAST(UP.orderdate) AS final_date,
MATCH_NUMBER() AS match_no,
CLASSIFIER() AS classy
ONE ROW PER MATCH
AFTER MATCH SKIP PAST LAST ROW
PATTERN (START DOWN+ UP+)
DEFINE
DOWN AS totalprice < PREV(totalprice),
UP AS totalprice > PREV(totalprice)
))
Modify the PATTERN
to limit the results. Now searching for a “big V”:
SELECT count() FROM (SELECT custkey, match_no, start_price, bottom_price, final_price, start_date, final_date, classy
FROM orders
MATCH_RECOGNIZE (
PARTITION BY custkey
ORDER BY orderdate, orderkey
MEASURES
START.totalprice AS start_price,
LAST(DOWN.totalprice) AS bottom_price,
LAST(UP.totalprice) AS final_price,
START.orderdate AS start_date,
LAST(UP.orderdate) AS final_date,
MATCH_NUMBER() AS match_no,
CLASSIFIER() AS classy
ONE ROW PER MATCH
AFTER MATCH SKIP PAST LAST ROW
PATTERN (START DOWN{3,} UP{4,})
DEFINE
DOWN AS totalprice < PREV(totalprice),
UP AS totalprice > PREV(totalprice)
))
Unwrap from count()
aggregation to see the actual matches:
SELECT custkey, match_no, start_price, bottom_price, final_price, start_date, final_date, classy
FROM orders
MATCH_RECOGNIZE (
PARTITION BY custkey
ORDER BY orderdate, orderkey
MEASURES
START.totalprice AS start_price,
LAST(DOWN.totalprice) AS bottom_price,
LAST(UP.totalprice) AS final_price,
START.orderdate AS start_date,
LAST(UP.orderdate) AS final_date,
MATCH_NUMBER() AS match_no,
CLASSIFIER() AS classy
ONE ROW PER MATCH
AFTER MATCH SKIP PAST LAST ROW
PATTERN (START DOWN{3,} UP{4,})
DEFINE
DOWN AS totalprice < PREV(totalprice),
UP AS totalprice > PREV(totalprice)
)
Change AFTER MATCH SKIP PAST LAST ROW
to AFTER MATCH SKIP TO NEXT ROW
to
detect overlapping matches:
SELECT custkey, match_no, start_price, bottom_price, final_price, start_date, final_date, classy
FROM orders
MATCH_RECOGNIZE (
PARTITION BY custkey
ORDER BY orderdate, orderkey
MEASURES
START.totalprice AS start_price,
LAST(DOWN.totalprice) AS bottom_price,
LAST(UP.totalprice) AS final_price,
START.orderdate AS start_date,
LAST(UP.orderdate) AS final_date,
MATCH_NUMBER() AS match_no,
CLASSIFIER() AS classy
ONE ROW PER MATCH
AFTER MATCH SKIP TO NEXT ROW
PATTERN (START DOWN{3,} UP{4,})
DEFINE
DOWN AS totalprice < PREV(totalprice),
UP AS totalprice > PREV(totalprice)
)
Change ONE ROW PER MATCH
to ALL ROWS PER MATCH
(also, revert the previous
change). Discuss the classy
column and explain the running
semantics on the
example of final_date
column:
SELECT custkey, match_no, start_price, bottom_price, final_price, start_date, final_date, classy
FROM orders
MATCH_RECOGNIZE (
PARTITION BY custkey
ORDER BY orderdate, orderkey
MEASURES
START.totalprice AS start_price,
LAST(DOWN.totalprice) AS bottom_price,
LAST(UP.totalprice) AS final_price,
START.orderdate AS start_date,
LAST(UP.orderdate) AS final_date,
MATCH_NUMBER() AS match_no,
CLASSIFIER() AS classy
ALL ROWS PER MATCH
AFTER MATCH SKIP PAST LAST ROW
PATTERN (START DOWN{3,} UP{4,})
DEFINE
DOWN AS totalprice < PREV(totalprice),
UP AS totalprice > PREV(totalprice)
)
Change the semantics of the final_date
column to FINAL
:
SELECT custkey, match_no, start_price, bottom_price, final_price, start_date, final_date, classy
FROM orders
MATCH_RECOGNIZE (
PARTITION BY custkey
ORDER BY orderdate, orderkey
MEASURES
START.totalprice AS start_price,
LAST(DOWN.totalprice) AS bottom_price,
LAST(UP.totalprice) AS final_price,
START.orderdate AS start_date,
FINAL LAST(UP.orderdate) AS final_date,
MATCH_NUMBER() AS match_no,
CLASSIFIER() AS classy
ALL ROWS PER MATCH
AFTER MATCH SKIP PAST LAST ROW
PATTERN (START DOWN{3,} UP{4,})
DEFINE
DOWN AS totalprice < PREV(totalprice),
UP AS totalprice > PREV(totalprice)
)
A StackOverflow user asked how to tag orders in a table that meet a certain
criterion that relies on periodicity. There are certainly some complicated and
inefficient SQL queries that you could craft to address these issues. However,
now with MATCH_RECOGNIZE
it is possible to do this and take advantage of the
efficient matching capabilities that Martin and Kasia have added.
Here is an example orders table represented as a csv table:
Create_time, Order_id, person_id, variable_a
'2021-06-01', 1234, 2232, 1
'2021-06-02', 1235, 2232, 0.6
'2021-06-03', 1236, 2232, 0.33
'2021-06-04', 1237, 2232, 0.7
'2021-06-05', 1238, 2232, 0.6
'2021-06-06', 1239, 2232, 0.4
'2021-06-07', 1240, 2232, 0.8
'2021-06-08', 1241, 2232, 0.7
'2021-06-09', 1242, 2232, 0.4
'2021-06-10', 1243, 2232, 0.6
'2021-06-11', 1244, 2232, 0.7
'2021-06-12', 1245, 2232, 0.6
The grace period logic will produce the final_hit column as the result of this logic:
is_hit
column equals to 1 if the variable A less than equal to 0.5final_hit
.Based on this logic, this is the desired result of the example is:
Create_time, Order_id, person_id, variable_a, is_hit, final_hit
'2021-06-01', 1234, 2232, 1, NULL, NULL
'2021-06-02', 1235, 2232, 0.6, NULL, NULL
'2021-06-03', 1236, 2232, 0.33, true, true
'2021-06-04', 1237, 2232, 0.7, NULL, NULL
'2021-06-05', 1238, 2232, 0.6, NULL, NULL
'2021-06-06', 1239, 2232, 0.4, true, NULL
'2021-06-07', 1240, 2232, 0.8, NULL, NULL
'2021-06-08', 1241, 2232, 0.7, NULL, NULL
'2021-06-09', 1242, 2232, 0.4, true, true
'2021-06-10', 1243, 2232, 0.6, NULL, NULL
'2021-06-11', 1244, 2232, 0.7, NULL, NULL
'2021-06-12', 1245, 2232, 0.6, NULL, NULL
To accomplish this with MATCH_RECOGNIZE
, you can do the following statement,
which gives us the correct answer:
WITH data(Create_time, Order_id, person_id, variable_a) AS (
VALUES
(DATE '2021-06-01', 1234, 2232, 1),
(DATE '2021-06-02', 1235, 2232, 0.6),
(DATE '2021-06-03', 1236, 2232, 0.33),
(DATE '2021-06-04', 1237, 2232, 0.7),
(DATE '2021-06-05', 1238, 2232, 0.6),
(DATE '2021-06-06', 1239, 2232, 0.4),
(DATE '2021-06-07', 1240, 2232, 0.8),
(DATE '2021-06-08', 1241, 2232, 0.7),
(DATE '2021-06-09', 1242, 2232, 0.4),
(DATE '2021-06-10', 1243, 2232, 0.6),
(DATE '2021-06-11', 1244, 2232, 0.7),
(DATE '2021-06-12', 1245, 2232, 0.6)
)
SELECT Create_time, Order_id, person_id, variable_a, if(variable_a <= 0.5, true, null) is_hit, final_hit
FROM data
MATCH_RECOGNIZE (
PARTITION BY person_id
ORDER BY Create_time
MEASURES if(classifier() = 'HIT', true, null) AS final_hit
ALL ROWS PER MATCH WITH UNMATCHED ROWS
AFTER MATCH SKIP PAST LAST ROW
PATTERN (HIT G{,4})
DEFINE /* G -- grace period */
HIT AS HIT.variable_a <= 0.5
)
Check out Martin and Kasia’s full answer to this question.
Trino Meetup groups
If you want to learn more about Trino, check out the definitive guide from OReilly. You can download the free PDF or buy the book online.
Music for the show is from the Megaman 6 Game Play album by Krzysztof Słowikowski.