Uber Deploys Exactly-Once Processing System for Ads

*Uber’s new ad processing system was built with open-source technologies and an innovative exactly-once semantic system for accuracy and reliability.*

Uber recently launched a new ad system for vendors on Uber Eats, allowing them to purchase sponsored space on the app, alongside other inventory.

The ad system was a “greenfield” project, as Uber had not built ads into its mobility or freight services. That gave the team the freedom to build a system from the ground up that could comprehensively meet client needs.

“This system is responsible for processing all the events that get generated from ads,” said Jacob Tsafatinos, ex-senior software engineer at Uber and lead engineer of the project, in an online presentation. “It has a lot of dependencies, the clients for this system are data scientists, engineers, operations managers for vendors, automated systems, and advertisers. When we talk about the amount of requirements, they come from trying to balance the different needs of each client.”

The three key requirements of the processing system were:

Speed: A near-real-time system gives clients a clearer understanding of how much ad budget has been spent. For example, a vendor can reduce the number of impressions or pause the ad campaign for a set time if there have been too many sales in a single day.
Reliability: Anything money-related needs to be reliable in order to gain client trust. Uber very rarely goes down, and the ad system’s backend was built in a similar way so that clients always have access to their dashboard.
Accuracy: As with reliability, a processing system dealing with money needs to be accurate. If Uber overcharges a client, the client may stop its ad campaign; if Uber undercharges, the company loses revenue.

One of the core features of the ad processing system is exactly-once delivery semantics, which helps Uber address some of the reliability and accuracy problems inherent in a processing system.

“Exactly-once was crucial for the reliability and the accuracy,” said Tsafatinos. “Flink jobs go down for a variety of reasons, and this entire project is built off the back of Flink. When an issue happens, it restarts and reprocesses all the events from the last time it saved, and if you reprocess ad events you’re going to end up overcharging clients. It really helps with the accuracy guarantees that you need, and when things go wrong you do have that confidence to know we’re not going to reprocess everything.”
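The failure mode Tsafatinos describes can be illustrated with a small simulation. The Python sketch below is a simplified illustration only, not Uber's code or Flink's API: the `Checkpoint` class, the `run` function, and the event costs are all hypothetical. It assumes a job that periodically saves its input offset together with its running total, crashes once, and replays from the last save. When the restored total matches the restored offset, replayed events are not counted twice; when the total has already escaped the checkpoint (for example, charges written to an external system), the replay overcharges.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    """Hypothetical snapshot: the input offset and the state derived from it."""
    offset: int = 0
    charged_cents: int = 0


def run(events, crash_at=None, checkpoint_every=2, restore_state=True):
    """Process (event_id, cost_cents) pairs, crashing at most once.

    restore_state=True rolls the running total back with the offset, so
    replayed events are counted once. restore_state=False models a side
    effect that is not covered by the checkpoint and cannot be rolled back.
    """
    saved = Checkpoint()
    offset, charged = 0, 0
    crashed = False
    while offset < len(events):
        if crash_at is not None and not crashed and offset == crash_at:
            crashed = True
            # Restart: rewind the input to the last saved offset.
            offset = saved.offset
            if restore_state:
                charged = saved.charged_cents
            continue
        _, cost = events[offset]
        charged += cost
        offset += 1
        if offset % checkpoint_every == 0:
            # Offset and total are saved together as one snapshot.
            saved = Checkpoint(offset, charged)
    return charged


events = [("e1", 10), ("e2", 20), ("e3", 30), ("e4", 40), ("e5", 50)]
no_crash = run(events)
exactly_once = run(events, crash_at=3, restore_state=True)
double_counted = run(events, crash_at=3, restore_state=False)
print(no_crash, exactly_once, double_counted)
```

In this toy run the crash happens after the third event but the last save covers only the first two, so the third event is replayed. With state restored, the total matches the crash-free run; without it, the third event's cost is charged twice.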

The choice of Apache Flink, Kafka, and Pinot was made for several reasons. Uber is one of the biggest deployers of Kafka in the world, so it was clear that Kafka would be used for message queues. Flink and Pinot offered key advantages for client value, ease of development, and the needs of the system.

Speaking about Uber’s decision to build on open-source technologies, Tsafatinos said: “I think it says a lot about the state of open-source because these tools can be leveraged by massive companies but they’re the kinds of companies that give back to the scene.”