Datafold automates data pipeline testing for data engineers. With Datafold, data engineers can catch data quality issues in the pull request by seeing how a change to source code affects the data produced throughout the entire data pipeline/DAG. Datafold is used by data teams at Patreon, Thumbtack, Substack, and AngelList, among others, and has raised $22M from YC, NEA, and Amplify Partners. Our founding story and Launch HN: [https://news.ycombinator.com/item?id=24071955]
With Diff, we already compute and store detailed statistical profiles for every column in a table. Next, we plan to track those profiles over time.
Diff is the first tool we've built to gain a foothold in the workflows of high-velocity data teams and start delivering value. It is only the beginning of the more comprehensive and, we hope, more valuable product we aspire to build.