Summary
- Defines data products as APIs for data that require clear documentation.
- Explains that data contracts cover usage terms and lifecycle.
- Stresses lineage and metadata for managing data that is not static.
- Advocates keeping data in its native environment and layering governance over it.
Speaking of contracts, we still don't have a universal definition either. How are you thinking about data contracts?
So we think of a data product as an API for data.
And as with APIs, you want documentation that makes it clear what it is you're getting into, and what it all means. My understanding is that all of the different manifestations of data contracts emerging right now are very similar, with some minor differences.
I do think the underlying idea of describing what it is, how it can be used, the terms under which it's provided, and whether there's cross-charging for it is standard across all of the contracts. The SQL script that may be behind a join or a data set will be provided, and data contracts will have a life cycle.
As with data products, there's going to be the ability to manage a data product from its inception through to when it gets deleted at the end of its useful life. Data is not a static thing. One of the things people talk about is, "Do I have my data in the right place?"
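To make the contract elements and lifecycle described above more concrete, here is a minimal sketch in Python. It assumes a simple model: the field names (`description`, `allowed_uses`, `cross_charge_per_query`, `sql`), the lifecycle stages, and the allowed transitions are all hypothetical illustrations, not any particular data contract specification mentioned in the conversation.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    # Hypothetical lifecycle stages, from inception to end of useful life.
    DRAFT = "draft"
    ACTIVE = "active"
    DEPRECATED = "deprecated"
    DELETED = "deleted"


# Hypothetical allowed transitions; any other move is rejected.
TRANSITIONS = {
    Stage.DRAFT: {Stage.ACTIVE, Stage.DELETED},
    Stage.ACTIVE: {Stage.DEPRECATED},
    Stage.DEPRECATED: {Stage.DELETED},
    Stage.DELETED: set(),
}


@dataclass
class DataContract:
    name: str
    description: str               # what the data product is
    allowed_uses: set[str]         # how it can be used
    cross_charge_per_query: float  # terms: any cost charged back to consumers
    sql: str                       # the query behind the join or data set
    stage: Stage = Stage.DRAFT
    history: list[Stage] = field(default_factory=list)

    def advance(self, new_stage: Stage) -> None:
        """Move the contract to a new lifecycle stage, enforcing TRANSITIONS."""
        if new_stage not in TRANSITIONS[self.stage]:
            raise ValueError(f"cannot go from {self.stage.value} to {new_stage.value}")
        self.history.append(self.stage)
        self.stage = new_stage


# Usage: walk a hypothetical contract from draft to deletion.
contract = DataContract(
    name="accounts_by_region",
    description="Active accounts aggregated by sales region",
    allowed_uses={"dashboard"},
    cross_charge_per_query=0.0,
    sql="SELECT region, COUNT(*) FROM accounts GROUP BY region",
)
contract.advance(Stage.ACTIVE)
contract.advance(Stage.DEPRECATED)
contract.advance(Stage.DELETED)
```

Keeping the transitions in one table makes the lifecycle rules explicit and easy to audit; a real contract format would likely carry far more metadata than this.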
Well, maybe it is today, but have you thought about lineage, and have you thought about metadata? Have you thought about what this data actually means and how it's classified? Because that has downstream consequences, from who gets access to some of these data sets or data products, to how long they can use them, or for what purposes.
That's where the contract comes into play: making sure that the terms and conditions under which the data product is provided are actually honored when it is delivered.
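One way to picture "honoring the terms in delivery" is a check that runs before data is handed over. The sketch below is hypothetical: it assumes the contract's terms reduce to allowed roles, allowed purposes, and a retention period tied to the data's classification. All names (`Terms`, `Request`, `check_delivery`) are illustrative, not part of any product discussed here.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Terms:
    # Hypothetical contract terms tied to the data's classification.
    classification: str             # e.g. "internal" or "restricted"
    allowed_roles: frozenset[str]   # who may receive the data
    allowed_purposes: frozenset[str]
    retention_days: int             # how long a consumer may keep a delivered copy


@dataclass(frozen=True)
class Request:
    role: str
    purpose: str
    requested_on: date


def check_delivery(terms: Terms, req: Request) -> date:
    """Return the date by which the delivered copy must be deleted,
    or raise PermissionError if the request violates the terms."""
    if req.role not in terms.allowed_roles:
        raise PermissionError(f"role {req.role!r} not allowed for {terms.classification} data")
    if req.purpose not in terms.allowed_purposes:
        raise PermissionError(f"purpose {req.purpose!r} not covered by the contract")
    return req.requested_on + timedelta(days=terms.retention_days)


# Usage with hypothetical values.
terms = Terms(
    classification="restricted",
    allowed_roles=frozenset({"sales_analyst"}),
    allowed_purposes=frozenset({"quarterly_dashboard"}),
    retention_days=90,
)
expires = check_delivery(terms, Request("sales_analyst", "quarterly_dashboard", date(2024, 1, 1)))
```

Returning an expiry date rather than a simple yes/no keeps the "how long can they use it" term attached to each delivery.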
But where data lives and how it's governed is complex, because data is complex and messy, and we're trying to simplify all of that. So to our way of thinking, data should live in its natural home. If you're using Salesforce data, leave it in Salesforce.
If you need to make that data part of a dashboard that you're presenting, then let's extract the data we need for the dashboard from Salesforce and put it there. But what we've seen is that people are building these massive data swamps, data lakes, that are just polluted with all kinds of different...
I like "data swamps." I like that word.
So what we would like to do is keep that as clean and as simple as possible by leaving the data where it naturally belongs, and then providing the ability to put governance over it.