CDInsights was a media partner of PrestoCon Day
PrestoCon Day, a Linux Foundation event on July 21st, 2022, marked an important milestone. It had been ten years since four engineers started working on Presto, and just three years since Facebook, Twitter, Alibaba, and Uber founded the Presto Foundation, hosted by the Linux Foundation. The PrestoCon sessions—presented by founders, open source community leaders, contributing engineers, and the data teams that have solved some very challenging problems using Presto—amounted to a very deep dive on the popular distributed SQL query engine that has found a new use case in the data lakehouse.
Speakers included companies with a reputation for being innovators and early adopters, proving technology in complex scenarios and often at immense scale: Ahana, Alluxio, Blinkit, Bytedance, CData, Intel, Meta, Onehouse, Platform24, Tencent, Uber, and the Apache Software Foundation.
Presto is finding new life in the data lakehouse
The event was a rich resource of data lakehouse case studies and distributed query optimization deep dives. Sessions covered topics including:
- Using Presto for real-time analytics (at Uber)
- Using Kubernetes to build a cloud-agnostic (or on-premises) data architecture for Presto
- Open data lakehouse architecture
- The process of building an open data lakehouse with Presto (by Blinkit)
- Query optimization for broadcast joins (very useful in a mega-scale distributed data environment)
Aside from these, project leaders from the open source ecosystem that has sprouted around Presto gave updates on the engineering work. Specifically, the Apache Software Foundation did a deep dive on the recent enhancements to the integration with Hudi, a self-managing, exascale data platform that can access Hive. In addition, Intel presented on the Velox project, and Ahana shared details on using the Ahana Community Cloud as a free service for running Presto instances.
What’s Next for Presto? Technology and Community
Tim Meehan, chair of the Presto Technical Steering Committee and a software engineer at Meta, shared high points of the Presto community’s upcoming work. The Query Predictor project uses ML to estimate the cost of a query. The Presto Router sits in front of multiple Presto clusters and provides load balancing. These two features complement each other well: the predictor's cost estimates are information that can be used to manage resource-intensive queries.
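To make the pairing concrete, here is a minimal Python sketch of how a cost estimate could feed a routing decision. It is an illustration only, not the Presto Router or Query Predictor code: `Cluster`, `predict_cost`, and `route` are hypothetical names, and the keyword-counting heuristic merely stands in for a trained ML model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cluster:
    """Hypothetical view of one Presto cluster behind the router."""
    name: str
    active_cost: float = 0.0  # sum of predicted costs of queries routed here


def predict_cost(sql: str) -> float:
    """Stand-in for an ML cost model (assumption: a crude keyword heuristic)."""
    text = sql.lower()
    return 1.0 + 5.0 * text.count(" join ") + 2.0 * text.count("group by")


def route(sql: str, clusters: List[Cluster]) -> Cluster:
    """Send the query to the cluster with the least predicted load."""
    cost = predict_cost(sql)
    target = min(clusters, key=lambda c: c.active_cost)
    target.active_cost += cost  # record the predicted load for later decisions
    return target


if __name__ == "__main__":
    clusters = [Cluster("a"), Cluster("b")]
    for query in [
        "SELECT 1",
        "SELECT k, count(*) FROM x JOIN y ON x.id = y.id GROUP BY k",
        "SELECT 2",
    ]:
        print(query, "->", route(query, clusters).name)
```

In this sketch the expensive join lands on an otherwise idle cluster, so later lightweight queries keep flowing to the other one. A production router would also track query completion and cluster health, which are omitted here.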
The foundation is working to include Presto functions in Velox, an embeddable vectorized engine, and to integrate Velox with Presto on Spark. The upcoming work also includes the Ranger Plug-in, which enables access to the Apache Ranger data security framework. One of the projects that Tim Meehan was most excited about was history-based optimization. It will compensate for unpredictable hardware by introducing interactive fault-tolerance. Meanwhile, the community continues to increase scale (by disaggregating coordinators, for one) and performance (in terms of speed and reliability). He noted that Meta still uses Presto as its primary exploration engine.
Tim didn’t just discuss the community’s technology. He also outlined some important changes to the community’s organization to make it easier for more people (not just engineers) to contribute. As a result, participation in the Technical Steering Committee will be easier, and new code owners have been designated. They are responsible for connectors, and one of their roles is to help new contributors get started and become part of the community.
View past presentations and attend the next PrestoCon
Even though the conference is over, you can register and access session information and available slides on the PrestoCon site.
You can also watch the recorded presentations on YouTube without registration.
The next Presto Foundation event is PrestoCon, December 7-8, 2022. It will be a hybrid event with a physical footprint in Mountain View, California, and a live virtual stream. The call for proposals is open until Friday, September 30, 11:59 pm PDT.