Change data capture: The critical link for Airbnb, Netflix and Uber


The modern data stack (MDS) is foundational for digital disruptors. Consider Netflix. The company pioneered a new business model around video as a service, but much of its success is built on real-time streaming data.

They’re using analytics to push highly relevant recommendations to viewers. They’re monitoring real-time data to maintain constant visibility into network performance. They’re synchronizing their database of movies and shows with Elasticsearch to enable users to quickly and easily find what they’re looking for.

All of this has to happen in real time, and it has to be 100% accurate. Old-school extract, transform, load (ETL) is simply too slow. To fill this need, Netflix built a change data capture (CDC) tool called DBLog that captures changes in MySQL, PostgreSQL and other data sources, then streams those changes to target data stores for search and analytics.

Netflix required high availability and real-time synchronization. They also needed to minimize the impact on operational databases. Log-based CDC reads the database’s transaction log and replicates changes to target databases in the order in which they occur, so it captures changes as they happen without locking records or otherwise bogging down the source database.
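To make the log-based idea concrete, here is a minimal sketch in Python. It is not how DBLog or any other product is implemented; the `ChangeEvent` type, the `lsn` (log sequence number) field and the `apply_changes` function are hypothetical names chosen for illustration, and the "log" is just an in-memory list. What it shows is the core pattern: read change events in log order, apply each one to the target, and remember a checkpoint so that nothing is applied twice.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One entry in a hypothetical source change log."""
    lsn: int                    # position in the log (assumed to be unique and increasing)
    op: str                     # "insert", "update" or "delete"
    key: str                    # primary key of the affected row
    row: Optional[dict] = None  # new row image; None for deletes


def apply_changes(log: Iterable[ChangeEvent],
                  target: Dict[str, dict],
                  checkpoint: int = 0) -> int:
    """Apply events newer than `checkpoint` to `target` in log order.

    Returns the new checkpoint (the last log position applied).
    """
    for event in sorted(log, key=lambda e: e.lsn):
        if event.lsn <= checkpoint:
            continue  # already applied on an earlier pass
        if event.op in ("insert", "update"):
            target[event.key] = dict(event.row or {})
        elif event.op == "delete":
            target.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op!r}")
        checkpoint = event.lsn
    return checkpoint
```

Because the consumer only reads the log and tracks its own position, the source tables are never queried or locked in this sketch, and re-running it from the saved checkpoint is a no-op.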



Data is central to what Netflix does, but they’re not alone in that regard. Companies like Uber, Amazon, Airbnb and Meta are thriving because they truly understand how to make data work to their advantage. Data management and data analytics are strategic pillars for these organizations, and CDC technology plays a central role in their ability to carry out their core missions.

The same can be said of just about any company operating at the top of its game in today’s business environment. If you want your company to operate as an A-player, you need to modernize and master your data. Your competitors are definitely already doing it.

Sub-second integration is the new standard at Airbnb and Uber

In today’s world, a strong customer experience (CX) calls for real-time data flows. Airbnb recognized the value of CDC technology in creating a great CX for their customers and hosts. They, too, built their own CDC platform, which they call SpinalTap. Airbnb’s dynamic pricing, availability of listings, and reservation status demand flawless accuracy and consistency across all systems. When an Airbnb customer books a visit, they expect workflows to be very fast and 100% accurate.

For Uber, immediacy is arguably even more important. Whether a customer is waiting for a ride to the airport or ordering a food delivery, timing is critical. Just like Netflix and Airbnb, Uber developed their own CDC platform to synchronize data across multiple data stores in real time. Again, a common set of requirements emerged. Uber needed their solution to be extremely fast and fault tolerant, with zero data loss. They also needed a solution that wouldn’t drag down performance on their source databases.

Change data capture for the rest of us

Once again, CDC fits the bill. In the old days, overnight batch-mode ETL might have been adequate to provide a daily executive update or operational reports. Today, real time is increasingly the norm. If information is power, then immediate access to information is turbo power.

That’s why CDC is rapidly becoming a foundational requirement for the modern data stack. It’s all well and good, though, that big companies like Netflix, Airbnb and Uber have the resources to build custom CDC platforms, but what about everyone else?

Off-the-shelf CDC solutions are filling that gap, delivering the same low-latency, high-quality streaming pipelines without the need to build from scratch.

Unfortunately, they’re not all created equal. Most companies operate a collection of systems that handle enterprise resource planning (ERP), customer relationship management (CRM) or specialized operational functions such as procurement or HR. These run on different database platforms, with incongruent data models. If a company operates mainframe systems, then they’re likely dealing with arcane data structures that don’t easily fit alongside modern relational data.

This makes heterogeneous integration especially important. It requires connecting to multiple data sources and targets, including transactional systems and databases like SAP, Oracle, IBM Db2 and Salesforce. It means delivering real-time streaming data to platforms like Databricks, Kafka, Snowflake, Amazon DocumentDB, and Azure Synapse Analytics.

Real-time CDC automation

To drive artificial intelligence (AI) and advanced analytics, enterprises need to push their data to a common MDS platform. That means ingesting information from a variety of sources, transforming it to fit a unified model for analytics, and delivering it to a modern cloud-based data platform.

Change data capture technology serves as a critical link in the data-driven value chain: first by automating data ingestion from source systems, then by transforming that data on the fly and delivering it to a cloud data platform. Real-time CDC automation ensures that the right information gets to the right place, immediately.
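The ingest, transform and deliver steps can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `FIELD_MAPS` table, the source names `crm` and `erp`, the field names, and the list standing in for a cloud data platform. The point is only to show records from differently shaped sources being renamed into one unified model as they stream through, rather than in a later batch step.

```python
from typing import Dict, Iterable, Iterator, List

# Hypothetical per-source mappings from native field names to a unified model.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "crm": {"cust_id": "customer_id", "full_name": "name"},
    "erp": {"CUSTNO": "customer_id", "CUSTNAME": "name"},
}


def to_unified(source: str, records: Iterable[dict]) -> Iterator[dict]:
    """Rename each record's fields to the unified model, one record at a time."""
    mapping = FIELD_MAPS[source]
    for record in records:
        unified = {mapping.get(field, field): value
                   for field, value in record.items()}
        unified["source"] = source  # keep lineage so rows can be traced back
        yield unified


def deliver(records: Iterable[dict], sink: List[dict]) -> int:
    """Append records to `sink` (a stand-in for the target platform).

    Returns the number of records delivered.
    """
    count = 0
    for record in records:
        sink.append(record)
        count += 1
    return count
```

A call such as `deliver(to_unified("erp", erp_changes), warehouse)` pulls each change through the transform lazily, so no intermediate copy of the whole data set is built.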

Because they focus only on data that has changed, streaming CDC pipelines offer tremendous efficiency advantages over the batch-mode operations of the past. The best CDC solutions can deliver 100-plus terabytes of data from source to target in less than 30 minutes, with zero data loss.
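A rough back-of-the-envelope comparison shows where that efficiency comes from. The function below and the numbers used with it are made up for illustration and are not measurements from any product: it simply counts how many rows a full reload would move over a number of cycles versus how many a changed-rows-only pipeline would move.

```python
def compare_volume(table_rows: int, changed_per_cycle: int, cycles: int):
    """Return (rows moved by full reloads, rows moved by CDC, ratio).

    Assumes every cycle reloads the whole table in the batch case and
    moves only the changed rows in the CDC case.
    """
    full_reload = table_rows * cycles
    cdc = changed_per_cycle * cycles
    ratio = full_reload / cdc if cdc else float("inf")
    return full_reload, cdc, ratio
```

For a hypothetical table of 1,000,000 rows where 5,000 rows change per hour, `compare_volume(1_000_000, 5_000, 24)` gives 24,000,000 rows moved by hourly full reloads against 120,000 rows moved by CDC over a day, a 200-fold difference under these assumed numbers.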

The shift to cloud computing is well underway. Cloud analytics, in particular, offer distinct advantages for companies that truly understand the transformational role of data. Leading companies in every industry are aligning their strategic visions around data analytics. They’re digitizing their interactions with customers and using algorithms to study data, extract insights, and take action. AI and machine learning are ingesting vast amounts of information, discovering correlations, and identifying anomalies.

Whether you’re leading the way in digital disruption or simply trying to keep up with the pack, CDC technology will play a pivotal role in making the modern data stack a reality and opening the door to digital transformation.

Gary Hagmueller is CEO at Arcion.

