Do you ❤️ Trino? Give us a 🌟 on GitHub

Trino Community Broadcast

4: Presto on ACID, row-level INSERT/DELETE, and why JDK11?

Nov 4, 2020

Audio

 

Video

Video sections

In this week’s concept, Manfred discusses ACID in general, CAP theorem, HDFS and Hive before ACID, and now ORC ACID and similar support.

ACID https://en.wikipedia.org/wiki/ACID

  • Atomicity - Transaction completely succeeds or completely fails, no partial results so no inconsistent relationships left tangling and such. The database remains in a consistent state.
  • Consistency - database content always adheres to defined rules (key constraints).
  • Isolation, transactions are isolated from each other and can run in parallel with same result as sequentially.
  • Durability - no data is lost after transaction completion.

ACID used to be a crucial criteria for a “serious” relational database system.

Then came big data and the CAP theorem. https://en.wikipedia.org/wiki/CAP_theorem

  • Consistency
  • Availability
  • Partition tolerance

In this week’s pull request https://github.com/trinodb/trino/pull/5402, came from contributor David Stryker. David covers some interesting aspects to working on this pull request. This commit adds support for row-level insert and delete for Hive ACID tables, and product tests that verify that row-level insert and delete where allowed.

Here is the SQL that we ran in the INSERT/DELETE demo

/*
  Ran against Presto
*/
SHOW SCHEMAS IN minio;
SHOW TABLES IN minio.acid;

CREATE SCHEMA minio.acid
WITH (location = 's3a://acid/');


CREATE TABLE minio.acid.test (a int, b int)
WITH (
   format='ORC',
   transactional=true
);

INSERT INTO minio.acid.test VALUES (10, 10), (20, 20);

SELECT * FROM  minio.acid.test;

DELETE FROM minio.acid.test WHERE a = 10;

/*
  Ran against Hive
*/

SHOW DATABASES;

SELECT * FROM acid.test;

David also mentioned this blog to better understand the hive acid model.

In this week’s question we answer, “Why is Java 11 needed in the newer version of Presto and how do I get the older version of Presto as I need the 328 latest on Java 8 as Java 11 isn’t available to use?

Using Java 11 because it is the next LTS verison of java since 8. Java 11 provides significant performance and stability improvements, so we believe everyone should be running that version to get the best experience out of Presto. Moving to Java 11 allows us to take advantage of many improvements to the JDK and the Java language that were introduced since Java 8.

For older versions, you can download it from maven or an older document version. https://repo.maven.apache.org/maven2/io/prestosql/presto-server/ https://trino.io/docs/328/

One thing to point out is you’re only required to use JDK11 for the server. The client can be on JDK8. One reason you would need to run Presto on JDK8 is if the server had to be run with another service running JDK8 which we do not recommend as this will degrade the performance of your cluster and could cause other issues if Presto is fighting for resources.

Another possibility is that there is a company policy requiring specific JDKs be installed on all servers. You can have side-by-side installs of multiple versions of the JDK and use the appropriate one. You just need to launch Presto with the correct java command. If your company is against using a newer JDK, you can point out the arguments above to update the policy to at least include JDK11.

Release Notes discussed: https://trino.io/docs/current/release/release-345.html

Manfred’s Training - SQL at any scale https://www.simpligility.com/2020/10/join-me-for-presto-first-steps/ https://learning.oreilly.com/live-training/courses/presto-first-steps/0636920462859/

Blogs

https://postgresconf.org/conferences/postgres-webinar-series/program/proposals/live-demo-creating-a-single-point-of-access-to-multiple-postgres-servers-using-starburst-presto

https://postgresconf.org/conferences/postgres-webinar-series/program/proposals/live-demo-unlock-data-in-postgres-servers-to-query-it-with-other-data-sources-like-hive-kafka-other-dbmss-and-more

https://blog.bigdataboutique.com/2020/09/presto-meets-elasticsearch-our-elasticsearch-connector-for-presto-video-mbywtm

Upcoming events

Latest training from David, Dain, and Martin(Now with timestamps!):

Presto Summit Series - Real world usage

Recent Podcasts:

If you want to learn more about Presto yourself, you should check out the O’Reilly Trino Definitive guide. You can download the free PDF or buy the book online.

Music for the show is from the Megaman 6 Game Play album by Krzysztof Słowikowski.