In this week’s concept, Manfred discusses Hive Partitioning.
See here for more details: https://www.educba.com/partitioning-in-hive/
In this week’s pull request https://github.com/trinodb/trino/pull/223, came from contributor Hao Luo. What this function does is similar to Hive’s MSCK REPAIR TABLE where if it finds a hive partition directory in the filesystem that exist but no partition entry in the metastore, then it will add the entry to the metastore. If there is an entry in the metastore but the partition was deleted from the filesystem, then it will remove the metastore entry. You can find more information about this procedure in the documentation.
Here are the commands and SQL I ran during the show on Presto
We ran some queries against the metastore database. It’s a complicated model so here is a database diagram to show the different tables and their relations in the metastore.
This diagram was generated by niftimusmaximus on The Analytics Anvil.
MariaDB (metastore database)
In this week’s question, we answer:
Why am I getting, “Query exceeded maximum columns. Please reduce the number of columns referenced and re-run the query.”?
I’m running this query to check for duplicates. My table has approx. 650 columns and I get this error.
SELECT *, COUNT(1)
GROUP BY *
HAVING COUNT(1) > 1
getting a stacktrace like this
io.prestosql.spi.PrestoException: Compiler failed
at jdk.internal.reflect.GeneratedMethodAccessor480.invoke(Unknown Source)
The throwable that causes this error
MethodTooLargeException comes from the ASM
library https://asm.ow2.io/ when you ask it to create a method with more
bytecode than is allowed by the JVM specification.
We try to generate code for handling given query and the code generated is too large. Since the code is proportional to number of columns referenced, we rewrap the exception in something more meaningful to the user.
The general strategy would be to lower the number of columns that you reference.
The problem is that in removing columns you will remove important information to the query. For example, in the example looking for duplicates above, you won’t be able to discard false positive duplicate matches, but this may be good enough to help narrow the search space. As always, it depends…
To learn more about the JVM limit and search for code_length in the Java SE specification.
Special thanks to Ashhar Hasan for asking this question and providing some useful context!
Release Notes discussed: https://trino.io/docs/current/release/release-346.html
Manfred’s Training - SQL at any scale https://www.simpligility.com/2020/10/join-me-for-presto-first-steps/ https://learning.oreilly.com/live-training/courses/presto-first-steps/0636920462859/
Latest training from David, Dain, and Martin(Now with timestamps!):
Presto Summit Series - Real world usage
If you want to learn more about Presto yourself, you should check out the O’Reilly Trino Definitive guide. You can download the free PDF or buy the book online.
Music for the show is from the Megaman 6 Game Play album by Krzysztof Słowikowski.