Qubole organized the first-ever Presto Summit in India on September 5, 2019. Bangalore, the technology and startup hub of India, was the perfect venue for the country’s first Presto Summit. Presto has seen a lot of interest and adoption in the South Asia and Asia Pacific region, as was evident from the turnout at the last two Presto Meetups organized by Qubole over the past year. The conference venue, Courtyard by Marriott on Outer Ring Road (ORR), a 17 km stretch that hosts 10% of Bangalore’s working population (around 1 million people), proved to be an ideal destination for Presto enthusiasts, several of whom work in its immediate vicinity.
With 150 attendees from more than 75 companies, the Presto community in India was excited and eager to meet and interact with Presto co-creators Martin Traverso, Dain Sundstrom and David Phillips, who flew down to Bangalore for the event.
Welcome Note by Joydeep Sen Sarma
Joydeep Sen Sarma, co-creator of Hive and co-founder of Qubole, kicked off the event by welcoming the Presto co-creators, the speakers and all the attendees. He also provided a brief historical perspective on Qubole’s contributions to Presto and highlighted the importance of Presto in Qubole’s customer base.
Keynote by Martin, Dain and David
This was followed by the most-awaited presentation of the day - the keynote from Martin, Dain and David. Martin took the audience through Presto’s journey, from its birth, growth and adoption at Facebook to the present, with the formation of the Presto Software Foundation for wider community involvement. He also highlighted some of their design choices and some missteps they made along the way.
Presto at Grab
The first industry speaker of the day was Edwin Hui Hean Law, Data Engineering Lead at Grab, Singapore. He and his team flew all the way from Singapore for the Presto Summit - a true testament to their passion for and interest in Presto. His talk covered Grab’s experience of using Presto on Amazon EMR, followed by their migration to Presto on Qubole, and he shared his insights on the relative pros and cons of these platforms. The final part of his talk covered his team’s recent experimentation with Presto on Kubernetes.
Read Support for Hive ACID tables in Presto
Next, Shubham Tagra, Sr. Staff at Qubole, presented his work on providing read support for Hive ACID tables in Presto. This has become increasingly important with the arrival of data privacy regulations like GDPR and CCPA that grant users a “Right to erasure” and/or a “Right to rectification”. These regulations oblige organisations that store user data to delete or update it at the user’s request. Hive ACID is an open source solution that addresses these problems around deletes and updates. Shubham’s talk covered why he picked Hive ACID over other open source options, as well as details of the Hive ACID and Presto integration that he added.
Presto Optimizations at Zoho Corporation
After lunch, Praveen Krishna from Zoho Corporation presented a summary of his team’s journey with Presto. To serve their teams with a fairly small cluster, they had to optimize Presto at various levels. The team started by analyzing the various phases of query execution and their impact on performance. They optimized Presto’s planner and reduced planning time by 20-30% for queries involving multiple joins on wide tables. He also highlighted how they integrated Apache Lucene to speed up full-text search operations. After several iterations, his team came up with a model in which they maintain the Lucene index for each row group in the ORC file itself. For columns with a high null ratio, replacing normal blocks with run-length encoded blocks reduced memory consumption. With this logic implemented in the ORC reader and core Presto, they were able to reduce memory pressure in the cluster.
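To make the run-length encoding idea concrete, here is a minimal Python sketch of why an all-null block is cheap to hold in memory once it is stored as a single value plus a count. It is only an illustration of the general technique, not Zoho's or Presto's actual code; the names `RleBlock`, `to_block` and `stored_slots` are hypothetical and chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class RleBlock:
    """Hypothetical block that stores one value and how often it repeats."""
    value: Optional[str]
    count: int

    def get(self, position: int) -> Optional[str]:
        # Every valid position resolves to the same stored value.
        if not 0 <= position < self.count:
            raise IndexError(position)
        return self.value


Block = Union[RleBlock, List[Optional[str]]]


def to_block(values: List[Optional[str]]) -> Block:
    """Use the compact form when a chunk is entirely null, else keep a plain list."""
    if values and all(v is None for v in values):
        return RleBlock(None, len(values))
    return list(values)


def stored_slots(block: Block) -> int:
    """Count the value slots a block actually holds in memory."""
    return 1 if isinstance(block, RleBlock) else len(block)


# A mostly-null column split into three chunks of 1000 rows each (made-up data).
chunks = [[None] * 1000, [None] * 1000, ["x"] + [None] * 999]
blocks = [to_block(chunk) for chunk in chunks]
total_slots = sum(stored_slots(b) for b in blocks)
```

In this toy column, two of the three chunks collapse to a single slot each, so the saving grows with the share of all-null chunks.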
Presto at Walmart Labs
The second presentation in this session was from Ashish Kumar Tadose, Principal Engineer at Walmart Labs. He gave an overview of how his team is using Presto on Google Cloud Platform (GCP). He highlighted the challenges associated with querying diverse data sources at Walmart and how his team has tackled them using Presto. His talk also described how his team has implemented monitoring, autoscaling, caching (via Alluxio), and security policies (via Ranger).
Presto at InMobi
After a coffee break, Rohit Chatter, CTO at InMobi, provided a historical perspective on how his team migrated from Hive in private data centers to Presto on the public cloud. His talk covered various aspects of how his team handles autoscaling and workload management on the cloud.
Presto Scheduler Changes for RubiX
Next, Garvit Gupta from Microsoft presented his work on Presto scheduler changes for data locality and optimized scheduling for caching engines like RubiX. This work was done primarily as part of his internship at Qubole. The talk was co-presented by Ankit Dixit from Qubole, who first gave an overview of the RubiX caching engine and its architecture. Garvit highlighted the need to treat locality as another dimension when assigning splits to nodes, and how this led to the implementation of a new Presto scheduler. The new scheduling model prioritizes locality while ensuring a uniform distribution of workload across nodes, and it improves the efficacy of any data caching framework used with Presto. His talk covered the new scheduler changes in detail and concluded with performance numbers showing up to a 9x improvement in cached/local reads with RubiX.
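The trade-off between locality and even load can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the scheduler presented in the talk: the function `assign_splits`, the per-node cap, and the split and node names are all hypothetical. Each split goes to its preferred node (for example, the node that already caches its data) unless that node has reached an even share of the work, in which case the split falls back to the least-loaded node.

```python
from math import ceil
from typing import Dict, List


def assign_splits(preferred: Dict[str, str], nodes: List[str]) -> Dict[str, List[str]]:
    """Assign splits to nodes, favouring the preferred node up to an even-share cap.

    `preferred` maps a split id to the node assumed to hold its data locally.
    """
    cap = ceil(len(preferred) / len(nodes))  # even share of splits per node
    assignment: Dict[str, List[str]] = {node: [] for node in nodes}
    deferred: List[str] = []

    # First pass: honour locality while the preferred node still has room.
    for split, node in preferred.items():
        if node in assignment and len(assignment[node]) < cap:
            assignment[node].append(split)
        else:
            deferred.append(split)

    # Second pass: place the remaining splits on the least-loaded node.
    for split in deferred:
        target = min(nodes, key=lambda n: len(assignment[n]))
        assignment[target].append(split)

    return assignment


# Made-up example: most splits prefer n1, which would overload it.
preferred = {"s1": "n1", "s2": "n1", "s3": "n1", "s4": "n2", "s5": "n3", "s6": "n1"}
result = assign_splits(preferred, ["n1", "n2", "n3"])
local_reads = sum(1 for node, splits in result.items() for s in splits if preferred[s] == node)
```

Here four of the six splits still run on their preferred node while every node receives exactly two splits; a real scheduler would also have to account for split sizes, node failures and splits that arrive incrementally.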
Presto at MiQ Digital
The final presentation of the day was from Rohit Srivastava, Engineering Manager at MiQ Digital, who presented an overview of the Unified Insights & Data Analytics platform at MiQ. He highlighted several challenges that his team had to overcome, such as scaling the team, infrastructure and company; dealing with data copies; duplicated data pre-processing and the cost and effort that goes into it; and meeting strict SLAs. He explained how using Presto on Qubole for all dashboarding needs, along with additions like standardising on the Apache Parquet format on S3 for most of their data, has helped overcome some of these challenges.
In summary, the first Presto Summit in India had a great mix of talks: some covered Presto usage and the experience of operating large Presto deployments across multiple clouds, while others focussed on niche technical contributions such as Presto scheduler changes for data locality, speeding up the ORC reader, and read support for Hive ACID tables in Presto. Participants had interesting and engaging questions for all the speakers and, in general, enjoyed interacting with the Presto founders and with other Presto users and developers in the region.
Videos and slides for all talks can be found here.
We look forward to the next Presto Summit in this region soon!