Alvin improves data quality, maps flows with data lineage platform, nabs $6M

As the global datasphere continues to grow, companies of all sizes, from startups to enterprises, are aggressively migrating to the modern data stack and leveraging artificial intelligence (AI) and analytics to gain insights across key business functions. The shift has been so rapid that the global market for big data analytics alone is expected to reach $68 billion by 2025.

While this is good for business, the growth in the volume of data and the number of data consumers is also creating a complex data environment. Data teams are having a hard time managing the resulting pipelines across data quality, discoverability, reliability, cost and governance.

During their stints with various data companies, Dan Mashiter and Martin Sahlen encountered similar challenges. As a data engineer, Sahlen was frustrated at learning of errors in the data pipelines via Slack, when it was already too late, while Mashiter, as a data consumer, found it increasingly difficult to trust data, with metrics looking off and dashboards breaking.

Both traced the problem to poor tooling for tracking data lineage and for identifying the errors and inefficiencies that affected data quality.


Alvin to the rescue

To address the challenge, the duo came up with Alvin, a plug-and-play data lineage platform that lets enterprises map their entire data architecture (how data is connected, how it is transformed and how it is consumed) to track down data quality issues and inefficiencies.

Today, Alvin announced it has raised $6 million in a seed round of funding.

The core technology behind Alvin’s toolkit, which also launches today, automatically builds and maintains a highly accurate graph dataset representing the connections between columns, tables, dashboards, SaaS platforms and people. Using this dataset, the platform gives teams an automated way to detect and trace pipeline errors, reducing data downtime. It also automates regression testing, providing a detailed report of downstream impact before code deployment, and helps cut costs by identifying unused assets and pipelines and safely removing them.
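To make the idea of a lineage graph concrete, here is a minimal sketch in Python. It is not Alvin's implementation, and every asset name, the edge list and the function names are hypothetical, chosen only for illustration. It assumes lineage is stored as directed edges from an upstream asset to a downstream one, and shows how downstream impact and unused assets could be derived from such a graph.

```python
from collections import deque

# Hypothetical lineage edges: (upstream asset, downstream asset).
EDGES = [
    ("raw.orders.amount", "staging.orders.amount_usd"),
    ("staging.orders.amount_usd", "marts.revenue.total"),
    ("marts.revenue.total", "dashboard.weekly_revenue"),
    ("raw.orders.coupon", "staging.orders.coupon_code"),
]


def build_graph(edges):
    """Map every asset to the set of assets directly downstream of it."""
    graph = {}
    for upstream, downstream in edges:
        graph.setdefault(upstream, set()).add(downstream)
        graph.setdefault(downstream, set())
    return graph


def downstream_impact(graph, asset):
    """Breadth-first search for everything reachable downstream of an asset."""
    seen = set()
    queue = deque([asset])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


def unused_assets(graph, consumed):
    """Assets that are not consumed and feed nothing that is consumed."""
    return sorted(
        asset
        for asset in graph
        if asset not in consumed
        and not (set(downstream_impact(graph, asset)) & consumed)
    )


GRAPH = build_graph(EDGES)
IMPACT = downstream_impact(GRAPH, "raw.orders.amount")
UNUSED = unused_assets(GRAPH, consumed={"dashboard.weekly_revenue"})
```

In this toy graph, changing raw.orders.amount affects the staging column, the revenue mart and the weekly revenue dashboard, while the two coupon columns feed nothing that anyone consumes and would be candidates for removal. A production system would also need to derive the edges automatically, for example from query logs or SQL parsing, which this sketch does not attempt.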

“By automatically mapping data flows within and across systems, and how it is consumed throughout the business, Alvin is building the operating system for the modern data stack. Alvin gives data teams the tools to measure and improve the key metrics they will now be judged on, and ultimately maximize their impact,” Mashiter said.

Impact analysis on Alvin platform.

The solution connects to enterprise data tools in minutes and starts producing the graph dataset to trace lineage and address data quality issues. It saw organic interest from over 400 companies in the beta stage and is already in use by many of them, Mashiter told VentureBeat.

“Using Alvin, companies succeeded in significantly reducing the time they spent on important data engineering workflows such as removing unused data assets and diagnosing pipeline errors. Alvin has already secured [its] first commercial contracts ahead of [its] full product launch,” he added.

Heated data quality space

A number of companies are already tackling data quality issues, including Monte Carlo, Datafold and Atlan. However, according to Mashiter, most of these players treat automated data lineage tracing as an add-on capability.

“Whilst other companies see data lineage as a feature where 70% accuracy and manual curation is acceptable, we see it as the foundational dataset needed to solve many of the challenges facing modern data teams. The accuracy of the automated lineage and usage dataset we are able to generate is market-leading, allowing us to tackle the operational use cases our competitors can’t,” he said.

With this round of funding, which was led by Project A Ventures, the company plans to expand its engineering team and strengthen its product. The roadmap for the platform includes increasing the number of tools it can integrate with to serve more companies and become more integrated into data pipelines and workflows; building out SDKs and CLIs to help engineers build their own tooling and pipelines on top of Alvin; and expanding the feature set of the product, particularly in the area of observability.
