In our last episode we already had a bit of a glimpse. Now the release is really out.
Official announcement items from Martin:
GROUP BY or LIMIT.Manfred’s additional notes:
DELETE and batch insert support for various JDBC-based connectorsMore info at https://trino.io/docs/current/release/release-360.html.
The MATCH_RECOGNIZE syntax was introduced in the latest SQL specification
of 2016. It is a super powerful tool for analyzing trends in your data. We are
proud to announce that Trino supports this great feature since
version 356. With
MATCH_RECOGNIZE, you can define a pattern using the well-known regular
expression syntax, and match it to a set of rows. Upon finding a matching row
sequence, you can retrieve all kinds of detailed or summary information about
the match, and pass it on to be processed by the subsequent parts of your
query. This is a new level of what a pure SQL statement can do.
For more details, this blog post
gives you a taste of row pattern matching capabilities, and a quick overview of
the MATCH_RECOGNIZE syntax.
Let’s look at an example with data similar to the TPCH data. Here is an example,
and the same goal: detect a “V”-shape of the price
values over time for different customers.
trino> WITH orders(customer_id, order_date, price) AS (VALUES
('cust_1', DATE '2020-05-11', 100),
('cust_1', DATE '2020-05-12', 200),
('cust_2', DATE '2020-05-13', 8),
('cust_1', DATE '2020-05-14', 100),
('cust_2', DATE '2020-05-15', 4),
('cust_1', DATE '2020-05-16', 50),
('cust_1', DATE '2020-05-17', 100),
('cust_2', DATE '2020-05-18', 6))
SELECT customer_id, start_price, bottom_price, final_price, start_date, final_date
FROM orders
MATCH_RECOGNIZE (
PARTITION BY customer_id
ORDER BY order_date
MEASURES
START.price AS start_price,
LAST(DOWN.price) AS bottom_price,
LAST(UP.price) AS final_price,
START.order_date AS start_date,
LAST(UP.order_date) AS final_date
ONE ROW PER MATCH
AFTER MATCH SKIP PAST LAST ROW
PATTERN (START DOWN+ UP+)
DEFINE
DOWN AS price < PREV(price),
UP AS price > PREV(price)
);
customer_id | start_price | bottom_price | final_price | start_date | final_date
-------------+-------------+--------------+-------------+------------+------------
cust_1 | 200 | 50 | 100 | 2020-05-12 | 2020-05-17
cust_2 | 8 | 4 | 6 | 2020-05-13 | 2020-05-18
(2 rows)
Two matches are detected, one for cust_1, and one for cust_2.
The matching algorithm was a collaboration between Martin and Kasia. This algorithm lives in the Matcher class.
The running semantics is the default both in the DEFINE and MESAURES
clauses. Note that FINAL only applies to the MEASURES clause.
To sum up, here’s one complex measure expression combining different elements of the special syntax:
The PR of the week, is adding
documentation for applying pattern matching over windows. This is yet another
SQL functionality that Kasia added after getting the patter recognition to work
with MATCH_RECOGNIZE.
Here are a few examples that Kasia will be running:
Demo preview:
SELECT custkey, match_no, start_price, bottom_price, final_price, start_date, final_date, classy
FROM orders
MATCH_RECOGNIZE (
PARTITION BY custkey
ORDER BY orderdate, orderkey
MEASURES
START.totalprice AS start_price,
LAST(DOWN.totalprice) AS bottom_price,
LAST(UP.totalprice) AS final_price,
START.orderdate AS start_date,
LAST(UP.orderdate) AS final_date,
MATCH_NUMBER() AS match_no,
CLASSIFIER() AS classy
ONE ROW PER MATCH
AFTER MATCH SKIP PAST LAST ROW
PATTERN (START DOWN+ UP+)
DEFINE
DOWN AS totalprice < PREV(totalprice),
UP AS totalprice > PREV(totalprice)
)
The query returns many results (many matches). Wrap it in a count()
aggregation to check how many there are:
SELECT count() FROM (SELECT custkey, match_no, start_price, bottom_price, final_price, start_date, final_date, classy
FROM orders
MATCH_RECOGNIZE (
PARTITION BY custkey
ORDER BY orderdate, orderkey
MEASURES
START.totalprice AS start_price,
LAST(DOWN.totalprice) AS bottom_price,
LAST(UP.totalprice) AS final_price,
START.orderdate AS start_date,
LAST(UP.orderdate) AS final_date,
MATCH_NUMBER() AS match_no,
CLASSIFIER() AS classy
ONE ROW PER MATCH
AFTER MATCH SKIP PAST LAST ROW
PATTERN (START DOWN+ UP+)
DEFINE
DOWN AS totalprice < PREV(totalprice),
UP AS totalprice > PREV(totalprice)
))
Modify the PATTERN to limit the results. Now searching for a “big V”:
SELECT count() FROM (SELECT custkey, match_no, start_price, bottom_price, final_price, start_date, final_date, classy
FROM orders
MATCH_RECOGNIZE (
PARTITION BY custkey
ORDER BY orderdate, orderkey
MEASURES
START.totalprice AS start_price,
LAST(DOWN.totalprice) AS bottom_price,
LAST(UP.totalprice) AS final_price,
START.orderdate AS start_date,
LAST(UP.orderdate) AS final_date,
MATCH_NUMBER() AS match_no,
CLASSIFIER() AS classy
ONE ROW PER MATCH
AFTER MATCH SKIP PAST LAST ROW
PATTERN (START DOWN{3,} UP{4,})
DEFINE
DOWN AS totalprice < PREV(totalprice),
UP AS totalprice > PREV(totalprice)
))
Unwrap from count() aggregation to see the actual matches:
SELECT custkey, match_no, start_price, bottom_price, final_price, start_date, final_date, classy
FROM orders
MATCH_RECOGNIZE (
PARTITION BY custkey
ORDER BY orderdate, orderkey
MEASURES
START.totalprice AS start_price,
LAST(DOWN.totalprice) AS bottom_price,
LAST(UP.totalprice) AS final_price,
START.orderdate AS start_date,
LAST(UP.orderdate) AS final_date,
MATCH_NUMBER() AS match_no,
CLASSIFIER() AS classy
ONE ROW PER MATCH
AFTER MATCH SKIP PAST LAST ROW
PATTERN (START DOWN{3,} UP{4,})
DEFINE
DOWN AS totalprice < PREV(totalprice),
UP AS totalprice > PREV(totalprice)
)
Change AFTER MATCH SKIP PAST LAST ROW to AFTER MATCH SKIP TO NEXT ROW to
detect overlapping matches:
SELECT custkey, match_no, start_price, bottom_price, final_price, start_date, final_date, classy
FROM orders
MATCH_RECOGNIZE (
PARTITION BY custkey
ORDER BY orderdate, orderkey
MEASURES
START.totalprice AS start_price,
LAST(DOWN.totalprice) AS bottom_price,
LAST(UP.totalprice) AS final_price,
START.orderdate AS start_date,
LAST(UP.orderdate) AS final_date,
MATCH_NUMBER() AS match_no,
CLASSIFIER() AS classy
ONE ROW PER MATCH
AFTER MATCH SKIP TO NEXT ROW
PATTERN (START DOWN{3,} UP{4,})
DEFINE
DOWN AS totalprice < PREV(totalprice),
UP AS totalprice > PREV(totalprice)
)
Change ONE ROW PER MATCH to ALL ROWS PER MATCH (also, revert the previous
change). Discuss the classy column and explain the running semantics on the
example of final_date column:
SELECT custkey, match_no, start_price, bottom_price, final_price, start_date, final_date, classy
FROM orders
MATCH_RECOGNIZE (
PARTITION BY custkey
ORDER BY orderdate, orderkey
MEASURES
START.totalprice AS start_price,
LAST(DOWN.totalprice) AS bottom_price,
LAST(UP.totalprice) AS final_price,
START.orderdate AS start_date,
LAST(UP.orderdate) AS final_date,
MATCH_NUMBER() AS match_no,
CLASSIFIER() AS classy
ALL ROWS PER MATCH
AFTER MATCH SKIP PAST LAST ROW
PATTERN (START DOWN{3,} UP{4,})
DEFINE
DOWN AS totalprice < PREV(totalprice),
UP AS totalprice > PREV(totalprice)
)
Change the semantics of the final_date column to FINAL:
SELECT custkey, match_no, start_price, bottom_price, final_price, start_date, final_date, classy
FROM orders
MATCH_RECOGNIZE (
PARTITION BY custkey
ORDER BY orderdate, orderkey
MEASURES
START.totalprice AS start_price,
LAST(DOWN.totalprice) AS bottom_price,
LAST(UP.totalprice) AS final_price,
START.orderdate AS start_date,
FINAL LAST(UP.orderdate) AS final_date,
MATCH_NUMBER() AS match_no,
CLASSIFIER() AS classy
ALL ROWS PER MATCH
AFTER MATCH SKIP PAST LAST ROW
PATTERN (START DOWN{3,} UP{4,})
DEFINE
DOWN AS totalprice < PREV(totalprice),
UP AS totalprice > PREV(totalprice)
)
A StackOverflow user asked how to tag orders in a table that meet a certain
criterion that relies on periodicity. There are certainly some complicated and
inefficient SQL queries that you could craft to address these issues. However,
now with MATCH_RECOGNIZE it is possible to do this and take advantage of the
efficient matching capabilities that Martin and Kasia have added.
Here is an example orders table represented as a csv table:
Create_time, Order_id, person_id, variable_a
'2021-06-01', 1234, 2232, 1
'2021-06-02', 1235, 2232, 0.6
'2021-06-03', 1236, 2232, 0.33
'2021-06-04', 1237, 2232, 0.7
'2021-06-05', 1238, 2232, 0.6
'2021-06-06', 1239, 2232, 0.4
'2021-06-07', 1240, 2232, 0.8
'2021-06-08', 1241, 2232, 0.7
'2021-06-09', 1242, 2232, 0.4
'2021-06-10', 1243, 2232, 0.6
'2021-06-11', 1244, 2232, 0.7
'2021-06-12', 1245, 2232, 0.6
The grace period logic will produce the final_hit column as the result of this logic:
is_hit column equals to 1 if the variable A less than equal to 0.5final_hit.Based on this logic, this is the desired result of the example is:
Create_time, Order_id, person_id, variable_a, is_hit, final_hit
'2021-06-01', 1234, 2232, 1, NULL, NULL
'2021-06-02', 1235, 2232, 0.6, NULL, NULL
'2021-06-03', 1236, 2232, 0.33, true, true
'2021-06-04', 1237, 2232, 0.7, NULL, NULL
'2021-06-05', 1238, 2232, 0.6, NULL, NULL
'2021-06-06', 1239, 2232, 0.4, true, NULL
'2021-06-07', 1240, 2232, 0.8, NULL, NULL
'2021-06-08', 1241, 2232, 0.7, NULL, NULL
'2021-06-09', 1242, 2232, 0.4, true, true
'2021-06-10', 1243, 2232, 0.6, NULL, NULL
'2021-06-11', 1244, 2232, 0.7, NULL, NULL
'2021-06-12', 1245, 2232, 0.6, NULL, NULL
To accomplish this with MATCH_RECOGNIZE, you can do the following statement,
which gives us the correct answer:
WITH data(Create_time, Order_id, person_id, variable_a) AS (
VALUES
(DATE '2021-06-01', 1234, 2232, 1),
(DATE '2021-06-02', 1235, 2232, 0.6),
(DATE '2021-06-03', 1236, 2232, 0.33),
(DATE '2021-06-04', 1237, 2232, 0.7),
(DATE '2021-06-05', 1238, 2232, 0.6),
(DATE '2021-06-06', 1239, 2232, 0.4),
(DATE '2021-06-07', 1240, 2232, 0.8),
(DATE '2021-06-08', 1241, 2232, 0.7),
(DATE '2021-06-09', 1242, 2232, 0.4),
(DATE '2021-06-10', 1243, 2232, 0.6),
(DATE '2021-06-11', 1244, 2232, 0.7),
(DATE '2021-06-12', 1245, 2232, 0.6)
)
SELECT Create_time, Order_id, person_id, variable_a, if(variable_a <= 0.5, true, null) is_hit, final_hit
FROM data
MATCH_RECOGNIZE (
PARTITION BY person_id
ORDER BY Create_time
MEASURES if(classifier() = 'HIT', true, null) AS final_hit
ALL ROWS PER MATCH WITH UNMATCHED ROWS
AFTER MATCH SKIP PAST LAST ROW
PATTERN (HIT G{,4})
DEFINE /* G -- grace period */
HIT AS HIT.variable_a <= 0.5
)
Check out Martin and Kasia’s full answer to this question.
Trino Meetup groups
If you want to learn more about Trino, check out the definitive guide from OReilly. You can download the free PDF or buy the book online.
Music for the show is from the Megaman 6 Game Play album by Krzysztof Słowikowski.