The Hive community is centered around a few different Hive distributions, one of them being Hortonworks Data Platform (HDP). Even after the Cloudera-Hortonworks merger, there is lively interest in HDP 3, which ships Hive 3. Presto is ready for it.
In this post, we summarize which Hive 3 features Presto already supports, covering all the work that went into Presto to achieve that. We also outline next steps lying ahead.
Introduction
There are several Hive versions in active use by the Hive community: 0.x, 1.x, 2.x and 3.x. The Hive 3 major release brings a number of interesting features, including:
- support for Hadoop Erasure Coding (EC), allowing much better HDFS storage capacity utilization without reducing data availability,
- ORC ACID transactional tables no longer need to be bucketed,
- transactional tables for all file formats (“insert-only” for formats other than ORC),
- materialized views,
- a new bucketing function, offering better data distribution and less data skew,
- new timestamp semantics and timestamp-related changes in file formats,
- and a lot more (let’s skip over features and changes that are not interesting from a Presto perspective).
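To put the erasure coding point in numbers, here is a minimal Python sketch comparing HDFS’s classic three-way replication with a Reed-Solomon layout of 6 data cells and 3 parity cells (the scheme behind the RS-6-3-1024k policy that ships with Hadoop 3). The `Layout` class is a hypothetical helper for illustration only: it models raw storage per logical byte and how many lost blocks each layout tolerates, not HDFS’s actual block placement or recovery logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Hypothetical model of a storage layout: data units plus redundant units."""
    name: str
    data_units: int
    redundancy_units: int

    def raw_bytes_per_logical_byte(self) -> float:
        # Every `data_units` of user data occupy data_units + redundancy_units on disk.
        return (self.data_units + self.redundancy_units) / self.data_units

    def storage_overhead_percent(self) -> float:
        # Extra space consumed beyond the user data itself.
        return (self.raw_bytes_per_logical_byte() - 1.0) * 100.0

    def tolerated_failures(self) -> int:
        # Data stays readable after losing any `redundancy_units` copies or cells.
        return self.redundancy_units


# Three-way replication: the original block plus 2 extra copies.
REPLICATION_3X = Layout("3x replication", data_units=1, redundancy_units=2)
# Reed-Solomon erasure coding: 6 data cells plus 3 parity cells per stripe.
RS_6_3 = Layout("RS(6,3) erasure coding", data_units=6, redundancy_units=3)

if __name__ == "__main__":
    for layout in (REPLICATION_3X, RS_6_3):
        print(
            f"{layout.name}: {layout.raw_bytes_per_logical_byte():.1f}x raw storage, "
            f"{layout.storage_overhead_percent():.0f}% overhead, "
            f"survives {layout.tolerated_failures()} lost blocks"
        )
```

Three-way replication stores 3.0 raw bytes per logical byte (200% overhead) and survives two lost copies, while RS(6,3) stores 1.5 raw bytes per logical byte (50% overhead) and survives three lost cells. The cost, not modeled here, is extra CPU and network work for encoding and for reconstructing lost data.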
It’s no surprise that many people want to try out all these features and run Hive 3, either the Apache project’s official release or the version shipped with HDP 3.
Hive 3 in Presto
The Presto community expressed interest in using Presto with Hive 3, both in the project’s issues and on Slack.
You spoke, we listened. Actually, we, the community, spoke and listened.
In a collaboration between Starburst, Qubole and the wider Presto community, Presto has been gradually improving its compatibility with Hive 3:
- Presto 319 fixed issues caused by backwards-incompatible changes in the Hive metastore Thrift API
- Presto 320 added continuous integration with Hive 3
- Presto 321 added support for Hive bucketing v2 (`"bucketing_version"="2"`)
- Presto 325 added continuous integration with HDP 3’s Hive 3
- Presto 327 added support for reading from insert-only transactional tables, and added compatibility with timestamp values stored in ORC by Hive 3.1
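Bucketing v2 support matters because the bucketing version changes which bucket a row lands in. As a rough illustration of why Hive 3 introduced the new function, the Python sketch below compares an identity-style hash (for a single INT bucketing column, Hive’s original function effectively uses the integer value as its hash code) with a Murmur3-based hash, which is what the v2 function is built on. The helpers `bucket_v1_style` and `bucket_murmur_style` are hypothetical; in particular, the byte encoding and seed fed to Murmur3 here are assumptions, so the resulting bucket numbers will not match what Hive or Presto compute.

```python
from collections import Counter

MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    # 32-bit left rotation.
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """Standard MurmurHash3 x86_32; returns an unsigned 32-bit integer."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    n = len(data)
    body_end = n - (n % 4)
    # Body: mix each little-endian 4-byte block into the state.
    for i in range(0, body_end, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (_rotl32((k * c1) & MASK32, 15) * c2) & MASK32
        h = _rotl32(h ^ k, 13)
        h = (h * 5 + 0xE6546B64) & MASK32
    # Tail: mix the remaining 1-3 bytes, if any.
    tail = data[body_end:]
    if tail:
        k = int.from_bytes(tail, "little")
        h ^= (_rotl32((k * c1) & MASK32, 15) * c2) & MASK32
    # Finalization: avalanche the bits.
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def bucket_v1_style(value: int, num_buckets: int) -> int:
    # Hypothetical helper: the hash code of a single INT column is the value itself.
    return (value & 0x7FFFFFFF) % num_buckets


def bucket_murmur_style(value: int, num_buckets: int) -> int:
    # Hypothetical helper: Murmur3 over the 4-byte big-endian encoding (assumed layout).
    h = murmur3_32(value.to_bytes(4, "big", signed=True))
    return (h & 0x7FFFFFFF) % num_buckets


def histogram(bucket_fn, values, num_buckets: int) -> Counter:
    # Count how many values land in each bucket.
    return Counter(bucket_fn(v, num_buckets) for v in values)


if __name__ == "__main__":
    ids = range(0, 4000, 4)  # 1,000 IDs that are all multiples of 4
    print("identity:", dict(histogram(bucket_v1_style, ids, 4)))
    print("murmur3: ", dict(histogram(bucket_murmur_style, ids, 4)))
```

With 1,000 IDs that are all multiples of 4 and 4 buckets, the identity-style function puts every row into bucket 0, while the Murmur3-based function spreads them across all four buckets. It also shows why an engine must know a table’s bucketing version: the same value generally maps to a different bucket under each function.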
Upcoming improvements already being worked on include:
Try it out
The amazing Presto community is working hard on getting Hive 3 support fully integrated into the Presto project, and a lot has already been accomplished. Chances are that everything you need is already included in the latest release. If you need one of the upcoming improvements, watch the pull requests linked above and the roadmap issue, join Slack, and stay tuned for upcoming release announcements. In the meantime, you can try out the features today by running the 323-e release of Starburst Presto.