We discuss the Trino 347 release notes: https://trino.io/docs/current/release/release-347.html
Official release announcement from Martin Traverso:
We’re happy to announce the release of Presto 347! This version includes:
Notes from Manfred:
EXPLAIN to learn what is planned.
Also refer to chapter 4 in Trino: The Definitive Guide.
This week’s pull request, https://github.com/trinodb/trino/pull/730, comes from one of the co-creators, Martin Traverso. It removes duplicate predicates in logical binary expressions (AND, OR) and canonicalizes commutative arithmetic expressions and comparisons to handle a larger number of variants. Canonicalize is a big word, but all it means is that when there are multiple representations of the same logic or data, they are reduced to a single, agreed-upon normal form.
For example, the expression
COALESCE(a * (2 * 3), 1 - 1) is equivalent to
COALESCE(6 * a, 0), because 2 * 3 can be folded into the constant 6 and 1 - 1 into the constant 0.
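To make the idea concrete, here is a minimal Python sketch of canonicalization on a toy expression tree. It is not the code from the pull request: the tuple representation, the canonicalize function, and the "literal goes first" ordering rule are all assumptions made for illustration, chosen so the output matches the COALESCE example above.

```python
# Toy expression tree (hypothetical, not Trino's representation):
#   int            -> literal
#   str            -> column reference
#   (op, *args)    -> operator or function call
ARITHMETIC = {
    "+": lambda l, r: l + r,
    "-": lambda l, r: l - r,
    "*": lambda l, r: l * r,
}
COMMUTATIVE = {"+", "*"}
LOGICAL = {"AND", "OR"}


def canonicalize(expr):
    """Rewrite an expression into one agreed-upon form."""
    if not isinstance(expr, tuple):
        return expr  # literals and column names are already canonical

    op, *args = expr
    args = [canonicalize(a) for a in args]  # canonicalize children first

    if op in LOGICAL:
        # Remove duplicate predicates while keeping the original order.
        unique = []
        for a in args:
            if a not in unique:
                unique.append(a)
        return unique[0] if len(unique) == 1 else (op, *unique)

    if op in ARITHMETIC and all(isinstance(a, int) for a in args):
        # Constant folding: 2 * 3 -> 6, 1 - 1 -> 0.
        left, right = args
        return ARITHMETIC[op](left, right)

    if op in COMMUTATIVE:
        # Assumed ordering rule: put the literal first, so a * 6 -> 6 * a.
        left, right = args
        if isinstance(right, int) and not isinstance(left, int):
            left, right = right, left
        return (op, left, right)

    return (op, *args)


# COALESCE(a * (2 * 3), 1 - 1)
before = ("COALESCE", ("*", "a", ("*", 2, 3)), ("-", 1, 1))
print(canonicalize(before))

# x = 1 AND x = 1 collapses to the single predicate x = 1
print(canonicalize(("AND", ("=", "x", 1), ("=", "x", 1))))
```

Because every variant of the same expression ends up in the same shape, later steps only have to recognize one form instead of many.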
This is an example of optimizing the logical plan, because it rewrites the query’s expressions without changing what the SQL means. It differs from the distributed plan: at this stage we are not deciding how the plan will be split up or where it will run, and we are not applying the further optimizations handled by the cost-based optimizer, such as predicate pushdown. We’ll talk about that step more in the next episode. For now, let’s cover a few examples.
In this week’s question, we answer:
How should I allocate memory properties? CPU : 16Core MEM:64GB
Before answering this, we should make sure a few things about memory are clear.
User memory: space needed that the user is capable of reasoning about:
query.max-memory-per-node - The maximum amount of user memory that a query is allowed to use on a given worker.
query.max-memory (without the -per-node at the end) - This config caps the amount of user memory used by a single query over all worker nodes in your cluster.
System memory: memory needed to facilitate internal usage.
NOTE: There is no setting for system memory; it is implied by the user and total memory settings. To calculate it, use:
Total Memory = System + User, so System = Total - User. Properties exist only for total and user memory.
query.max-total-memory-per-node - The maximum amount of total memory that a query is allowed to use on a given worker.
query.max-total-memory (without the -per-node at the end) - This config caps the total memory used by a single query over all worker nodes in your cluster.
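As a quick worked example of Total Memory = System + User, the Python sketch below derives the implied system memory from the two settings that do exist. The function name and the 12 GB / 10 GB values are hypothetical and are not a recommendation.

```python
def implied_system_memory_gb(total_gb, user_gb):
    """System memory is whatever is left of total memory after user memory."""
    if user_gb > total_gb:
        raise ValueError("user memory cannot exceed total memory")
    return total_gb - user_gb


# Hypothetical per-node settings, for illustration only:
#   query.max-total-memory-per-node = 12GB
#   query.max-memory-per-node       = 10GB
print(implied_system_memory_gb(total_gb=12, user_gb=10))  # 2 GB left for system memory
```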
The final setting I would like to cover is memory.heap-headroom-per-node. This config sets aside part of the JVM heap for allocations that are not tracked by Presto. You can typically go with the default for this setting, which is 30% of the JVM’s max heap size.
Presto is a Java application, which means it runs on the JVM. None of these memory settings mean anything unless the JVM that Presto runs on has sufficient memory set aside for it. So how do I know whether the JVM has enough memory for my settings?
-Xmx setting (Java heap)
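One way to sanity-check the settings against the heap is sketched below in Python. It assumes that the per-node total query memory plus the heap headroom has to fit within the -Xmx heap, and it uses the 30% default headroom mentioned above. The function name and the 50 GB heap are hypothetical; they are not the proportions recommended in the training videos.

```python
HEADROOM_PERCENT = 30  # default memory.heap-headroom-per-node, as a share of max heap


def check_node_memory(xmx_gb, max_total_per_node_gb, headroom_gb=None):
    """Check that per-node query memory plus headroom fits inside the JVM heap."""
    if headroom_gb is None:
        headroom_gb = xmx_gb * HEADROOM_PERCENT / 100
    available_gb = xmx_gb - headroom_gb  # heap left over for tracked query memory
    return {
        "headroom_gb": headroom_gb,
        "available_for_queries_gb": available_gb,
        "fits": max_total_per_node_gb <= available_gb,
    }


# Hypothetical worker: -Xmx50G on a 64 GB machine,
# query.max-total-memory-per-node = 30GB.
print(check_node_memory(xmx_gb=50, max_total_per_node_gb=30))

# Asking for 40 GB per node does not fit once the headroom is reserved.
print(check_node_memory(xmx_gb=50, max_total_per_node_gb=40)["fits"])
```

If the check fails, either raise -Xmx (leaving room for the operating system) or lower the per-node query limits.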
Dain covers the proportions in detail in the recent training videos. Here’s a snippet of what he recommends.
All in all, try to estimate the amount of memory needed by your max anticipated query load, and if possible try to get even more than your estimate. Once Presto is discovered by users, they will start to use it even more and demands on the system will grow.
Latest training from David, Dain, and Martin (now with timestamps!):
Presto Summit Series - Real world usage
If you want to learn more about Presto yourself, you should check out the O’Reilly book Trino: The Definitive Guide. You can download the free PDF or buy the book online.
Music for the show is from the Megaman 6 Game Play album by Krzysztof Słowikowski.