Data Intelligence

Webinar – Data, Explored #3 – Fundamentals of Metadata Management

November 10, 2025

53:26

Summary

Discusses Ole’s book on metadata management, session logistics, and IT landscapes.
Explores metadata clarity, the Meta Grid concept, and AI’s impact.
Emphasizes leveraging existing data assets, especially in startups.
Calls for continued industry dialogue on metadata applications.

Chapters

00:00 Introduction

00:14 Event Kickoff

I'm gonna talk a little bit about the logistics and why we are here. So, we are here to talk about the book Fundamentals of Metadata Management, and we're gonna have an ask-me-anything format on any topic about metadata you have on your mind. Before we start, I want to do a quick introduction for myself and Ole. Ole and I, you know, we met when I first read his first book, which was about the enterprise data catalog.

And we've been in touch ever since. We've met at different places, different locations, and I feel that Ole has deep thought leadership on metadata. So I'm really excited for him to talk about his new book.

Uh, today, myself, I've been in the oil and gas sector doing data analytics and software engineering for about 17 years. Currently I'm at Databricks, enabling our different customers to use the platform effectively. With that, in terms of logistics, all the questions and answers:

Please post them in the Q&A section in your Zoom so that we can see them and prioritize them for our session today. Any conversation you need to have, obviously you can use the Zoom chat. You can't use the audio in this session. It's just Ole and me who will be talking.

But all your questions and chat are welcome in chat and Q&A, appropriately. Having said that, I think we can kick off. So Ole, before we kick off, is there anything you wanna say to the audience today?

Well, thank you everyone for coming. I think it's so early that the next couple of minutes people will still log on. Um, but, uh, thank you for everyone for, for joining and continuously joining.

And also Abbe, thank you for, for wanting to do this. I mean, normally I interview people on, on this, uh, webinar series, so we turned the mics around because, uh, you reached out and had a lot of questions on LinkedIn and you're not the only one. And then finally I decided, well, why not?

Why not just do an ask-me-anything format? And you were up for it. And so thank you, Abbe, for taking the time to host this webinar.

No, no, I'm truly excited about this. You know, I mean, this is a great book. It's like, I don't remember seeing a book focused on metadata in a while.

So thank you for, you know, taking time to write the book and, you know, introducing Meta Grid to the world. I mean, it's a topic that a lot of people are trying to figure out, I guess, you know, what is meta grid? How do we use it?

02:50 Meta Grid and IT

So I guess we can start with the Meta Grid. What do you see as the value of the Meta Grid once people understand it and try to adopt it in their environments? How do you think that's going, Ole?

So, so writing the book, there were actually a lot of people reaching out to me saying, this is exactly what we're doing in our company. And I, I guess I can reveal one guy called, uh, Craig Bob that reached out, uh, from UBS, uh, saying that he was building this kind of architecture. Many more reached out, but uh, had to stay, uh, relatively, um, discreet about it.

Uh, so I couldn't cite them. But basically the Meta Grid architecture that I describe in my book is a very small and slow and simple architecture to unite, or coordinate, I should say, metadata across different metadata repositories.

So it's actually the most, you could say, complicated topic of the book, because I talk about metadata in general in the book, but it is also an attempt to deliver an answer in a universe defined very much by the lack of open standards and the complexity of large IT landscapes with a lot of technical debt, a lot of political discussions or tendencies in organizations, and a lot of uncertainty about what metadata is in general. So I didn't want to leave the reader without an answer, without some kind of perspective on how to solve the problems that I address, and that answer is the Meta Grid. Fantastic.

Thank you. And those who have joined in the last few minutes, please feel free to post your questions in the Q&A section in Zoom. We are just getting started.

Um, I don't see any questions right now. Uh, so let's just start with, uh, one question. I talked about meta grid.

What kind of motivated you to write the book and, you know, in such detail that you've written? I think that's an important question too. Uh, the writing process itself, you know, the nature of writing about technical topics is changing and, you know, and, you know, saturation book format.

Can you kind of talk a little bit about the process itself? Like how did you research this topic? Why did you think that it was important to talk about the Meta Grid in a consistent fashion?

Right? Yeah, sure, sure. Thank you.

Thank you, Abbe. I think, um, so this book has been on my mind for a long time, because I have worked in big industries in a lot of different verticals, but in particular in regulated industry and pharma. And throughout my career, I've seen this thing again and again, which is a lack of understanding of the IT landscape, like a fundamental lack of understanding of the IT landscape in a company. Classical questions: how many applications do we have?

How are they integrated? What types of data do they concern? To what extent is that data PII, like personally identifiable information?

To what extent is it confidential?

What is the reality of our physical infrastructure, our servers? Where are they located? What are their names? Who owns them?

Like, all of the elements in the IT landscape in a company are extremely opaque. And in my first book, I explored the data catalog as a super effective solution to surface the data in the IT landscape in a company, right? And I believe deeply in data catalogs. Essentially, I believe deeply in all the technologies we use for various types of tasks in companies.

But I saw that many of these technologies are implemented in silos. So I've seen that as an enterprise architect, as a data leader in many different companies. I, and I, I, I'm quite certain that it's the reality in every big company, uh, of a certain scale, industrial companies, right?

And so I wanted to write a book about that topic. And it's complicated because it's about reality. So it's not about ideal architectures, it's not about programming languages and greenfield, uh, design ideas.

It's about the reality. And the reality is something different, right? So, so, so that's the book, uh, in its, uh, core essence.

It's a book about all the different types of technologies we use to perform metadata management. And there are quite a lot, and a lot of problems arise out of that. I was actually recently asked by someone that's pretty smart:

If I should describe my book in one sentence, what would it be? And I think my answer to that is: companies typically have more than one data catalog. Correct.

And so if you, if you think about that answer, then you can begin thinking about, okay, so what is it you represent in one data catalog? What is it you represent in another? And what's the link between these technologies?

And if you expand that to say, okay, but we also have configuration management databases and asset management systems, and learning management systems and information security management systems, a lot of different technologies that do not actually perform something in the value chain, but just look at the entire IT infrastructure that is performing the value chain. It's just metadata. And so how is that connected?

That's what, that is what the book is about. Fantastic. That, that's how I read the book.

I'm like, okay, there are all these catalogs. Well, how do we make sense of them? Right?

Um, yeah.

09:20 Data Catalogs and AI

But talking about data catalogs, I think one interesting thing we are seeing because of AI, and I know we are jumping straight to AI now, but I think it's an important question, is the role of catalogs in AI. Like, it used to be that you organize the catalog, you try to have a centralized enterprise catalog, and you have, you know, all these different catalogs in different places. You know, as you mentioned, asset management, security, learning management systems, so many of them, right?

Um, uh, the, with the advances of AI where, you know, conversational interfaces are becoming kind of more, uh, user friendly, I would say. Like, do you see, do you see the role of these catalogs becoming even more important? Or how do you see, uh, this transition or, uh, this change in terms of how people interact with catalogs?

Right. Oh yeah. Great question.

So I think, and I partly discuss that in my book, but it's also a little outside of the book. But that doesn't matter, because I also do discuss it in my book a little bit. So the way I see people interacting with catalogs in the age of AI has actually changed quite substantially.

Um, because of two things. What we do in data catalogs will, just as in every other metadata repository, be augmented by AI. So a lot of things that are going on in the data catalog will be augmented, are being augmented, by AI.

And that is a delicate, uh, development. 'cause some things can be, uh, improved by AI or augmented, and some things cannot. Uh, we're simply talking about the laws of semantics here.

Like certain things will be augmented and other things won't. So that is definitely something that is changing with AI. It's the behavior, it's the actual performance of metadata management inside, in this case, a data catalog.

But then on, on the other side, I think data catalogs are also becoming, um, sources themselves, which is a, a completely new thing, a shift in the way, uh, metadata management, uh, is being performed. So instead of just registering a lot of sources, like the core element of, like, the core definition of metadata that I put forward in my book is, is that it is in two places at once so that its purpose making. And think of Amazon, for example, or any other books online bookshop, right?

You search on that online bookshop, and then you find a book. Now, the metadata is what ties together the object that you find on that online bookshop and the object itself. So that could be the title, it could be the publisher, it could be the author.

All of that is metadata, and it is essentially in two places at once, right? So that's the traditional role of metadata. It's to be in two places at once, so that we can discover and find and use and manage and govern the things themselves, be it servers, be it data sets, computers, whatever.
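The bookshop example can be sketched in a few lines of Python. This is only an illustration of the "two places at once" idea, not anything taken from the book: the class names, the `shared_metadata` helper, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    """The object itself (hypothetical model of a book)."""
    title: str
    author: str
    publisher: str
    text: str


@dataclass(frozen=True)
class CatalogEntry:
    """The record in the online bookshop, stored apart from the book."""
    title: str
    author: str
    publisher: str
    shelf: str  # hypothetical pointer to where the object lives


def shared_metadata(book: Book, entry: CatalogEntry) -> dict:
    """Return the fields that are present, with equal values, in both places."""
    fields = ("title", "author", "publisher")
    return {
        name: getattr(book, name)
        for name in fields
        if getattr(book, name) == getattr(entry, name)
    }


# Placeholder values, chosen only for the illustration.
book = Book("Example Title", "A. Author", "Example Press", "full text ...")
entry = CatalogEntry("Example Title", "A. Author", "Example Press", "aisle 3")

print(shared_metadata(book, entry))
```

The fields returned by `shared_metadata` are what lets a search in the bookshop lead to the object: they exist on the record and on the thing being described.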

Um, but with the introduction of AI, we also see that metadata structures, especially ontologies, are becoming very, very valuable sources themselves. So the entire rise of Model Context Protocol and the Agent2Agent protocol is something that really testifies to the new role of metadata, which is to be a source. And I describe that at the end of my book, using the Meta Grid architecture as a source for AI.

Oh, that's fantastic. I think, yeah, it's, uh, it's one of those, uh, things, things that are emerging. And it would be interesting to see how this plays out, because the reason I asked this question was really about, you know, the semantics used to live close to the, where the business users were, right?

Mm-hmm. I was thinking whether semantics should live close to the catalog now, right? Because mm-hmm.

You know, if people have their own semantics close to the edge or close to where they're using this data, where does it make most sense to have that data? Right? What are your views on that?

Where should semantics live? Like, you know, in the business intelligence, visualization tools or close to the catalog, right? Or, you know, maybe a centralized catalog if there is one, or if people are trying to have a centralized catalog.

Have you thought about that? Is there Yeah. Yeah.

I think, I think, I think there are no right or wrongs in, in what you're saying. I wanna, I wanna flip flop it a little bit in the sense that I, I think a measure of success should be that your business users, your end users do not consider, uh, a data catalog as something that is far away from the business users, right? I think, I think the semantics being very close to, to every single employee, end users, if you will, that should be in a catalog that should not be something that is very far away from a catalog or any other metadata repository.

So that's my ideal. But basically, I think you will not be able to have one solution or one particular setup. I think metadata will live in many different solutions, many different storage solutions and technologies, pretty much forever.

That is just how metadata, uh, behaves. Yep. We have a couple questions in, uh, q and a.

Lemme take the first question here. So one of the questions is: does a serialized AI model depend on this? Or is there an architecture exclusive to a serialized model?

I think the way I read that question is, uh, I think, uh, probably we need some more clarification on that question, but there's a first question about how critical is the data catalog and industry data model adoption of AI model? I think that's a pertinent question, an important question, right? Like in AI adoption, what role does the data catalog really play, and how important is it?

Right? Yeah, I think it's super important if your data catalog is built in the right way. And I wanna say this now, I know we're with, um, like I'm the chief evangelist at Actian, and I believe deeply in our technology, but keeping it vendor agnostic, I still wanna say something that I think is not technology agnostic, because I think the days of the non-knowledge-graph-powered data catalogs, at least if you have the ambition to be an enterprise catalog that is for the entire enterprise, I think the days of those catalogs that are not powered by knowledge graphs are over.

And I see AI as a very, very strong sign of that. So back to the question, that's why I mentioned this. In the adoption of AI, data catalogs play a huge role simply because they can be built on a knowledge graph, and that knowledge graph will be able to provide context for many different types of AI use cases.

For example, data analytics. Knowledge graphs are capable of delivering very precious context so that you can improve the AI-performed data analytics with a touch of human intelligence, through the graph. So to answer the question in a simple way, I think data catalogs play a huge role in AI adoption, quite honestly. Okay.

The way I understood it all is that you're talking about catalogs powered by knowledge graph or kind of very crucial to, uh, making AI work in a 11 fashion. Is that kind of a good summary of what you're talking about in terms of knowledge graph in? Yeah, sure.

It's a very popular pattern, right? The knowledge graph, uh, plus, uh, large language models, uh, is something that is super interesting because say the large language model is the text and the, and the knowledge graph is the context, so, right. Uh, so, so that is a very nice, uh, nice way of combining, uh, technologies to achieve, uh, better outcomes for ai.

And it just so happens that data catalogs are either built on, or connect to, or are expanded by knowledge graphs. And that knowledge graph context is something that is really precious for AI use cases. Fantastic.
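The "language model is the text, knowledge graph is the context" pattern can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the triples, the asset names, and the `context_for` and `build_prompt` helpers are hypothetical, the matching is naive substring matching, and no actual language model is called.

```python
# A tiny knowledge graph as (subject, predicate, object) triples.
# All names and facts here are made up for the illustration.
TRIPLES = [
    ("customer_orders", "is_a", "data product"),
    ("customer_orders", "owned_by", "Sales Operations"),
    ("customer_orders", "contains", "PII"),
    ("server_inventory", "owned_by", "Infrastructure"),
]


def context_for(question: str, triples: list) -> list:
    """Select triples whose subject or object is mentioned in the question."""
    q = question.lower()
    return [t for t in triples if t[0].lower() in q or t[2].lower() in q]


def build_prompt(question: str, triples: list) -> str:
    """Prepend the selected graph facts to the question as plain-text context."""
    facts = context_for(question, triples)
    lines = [f"- {s} {p.replace('_', ' ')} {o}" for s, p, o in facts]
    return "Context:\n" + "\n".join(lines) + "\n\nQuestion: " + question


print(build_prompt("Who owns customer_orders?", TRIPLES))
```

The resulting string is what would be handed to a model. The point of the sketch is only that the graph, curated by people, supplies facts the model would otherwise have to guess.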

Um, the second question, uh, I think once I get clarity on that, I'll ask that. Uh, but let's go to kind of fundamentals. You know, the, your book is about fundamentals.

In your worldview, how do you define metadata? What are the criteria when you say something is metadata? How do you kind of think about that?

Like, okay, this is metadata, right? Because the thing is, you know, people could call uni of metadata, people will call, you know, the, the vector embedding some metadata. I mean, there's, there's so many things that could be tagged as metadata.

So I was very interested in learning about your thought process. Like, how do we say that, oh, this truly is metadata and this is kind of, uh, how do we define it by use case? Like, how do we approach, uh, definition of metadata in an enterprise and in a, you know, enterprise environment or for use case, whichever way you think about it, right?

Yeah. Thank you for that question, Abbe. I think that, so to start, and I've been bashing the data management community lately.

So why don't I continue? I think the way metadata has been defined in traditional data management literature, data engineering literature, and there are a couple of exceptions, but in much literature, I unfortunately find definitions of metadata that are simply lists of subcategories of metadata. So for example, the typical subcategories would be listed as technical metadata, operational metadata, business metadata.

And so these, these explanations, and you find them in a lot of books, they say, okay, metadata comes in various types, comes in these technical types and these operational types and these business types. And there's really nothing wrong with these lists. But like philosophically, and I know this is a dangerous word in the tech context, but philosophically, yeah.

Um, philosophically, you cannot really define anything by just listing subcategories. That's not a definition, that's just a list of subcategories. You don't capture the essence of what you're talking about.

And so, back to your question, couldn't everything be, like, a vector? Why is that not metadata? Well, it can be, obviously, it can be, because the core definition of metadata, as I see it, and I stand on the shoulders of centuries of library and information science and library practice, which is my education.

I have a, like an academic background in this field. Um, metadata is about something, metadata basically is in two places at once. That is the definition of metadata, meaning that it is not what it is, but where it is that characterizes metadata.

Metadata can be anything. It can be operational, it can be technical, it can be business, it can be a lot of other, it can be social media, uh, metadata. Like you can list whatever subcategory you want.

That's not the essence of metadata. It's where it is, and it is in two places at once. That's what characterizes metadata.

It needs to be in two places at once. So yes, a vector embedding, why not? List that somewhere to find that vector embedding, and then it becomes metadata. Okay.

And, you know, there's a fundamental question. I mean, I'm not gonna get into philosophy, like, you know, business wants data they can trust, you know, I think let's talk about that. And data that is trusted sometimes, you know, people, uh, speak in terms of trust, you know, that this is the data we can trust, right?

So, uh, in terms of, you know, your, uh, definition of metadata, how do we, like how many layers and how much lineage, and how do we kind of build a lineage that did not exist to kind of get to the definition of trusted data? How, how should we approach that in enterprises where typically metadata management hasn't been a priority, right? Like, how do we now think about that and, you know, tackle that head on?

Right? I think that we're witnessing something that is quite significant. I think we're witnessing a substantial shift in the prioritization of strategic initiatives in companies, going from data scientific ambitions with structured data to AI ambitions based on unstructured data.

Mm-hmm. And so I think a lot of studies document that, and they say a lot of different things. The MIT study, for example, recently listed that 95% of all AI projects have no ROI. You can discuss the details of that study.

Is it even fair to do this analysis at this point in time? Are the metrics right? I think that the critics have some very good points about the metrics in this study, but the study itself proves that the strategic interest, like the C-level conversations in companies, has fundamentally changed towards AI.

That is out of discussion. Like every company in the world with a bit of ambition wants to push forward on an AI agenda. What does that take?

Well, suddenly it doesn't only take structured data, it also takes unstructured data, right? Right. And that unstructured data is in the format of text, of pictures, of everything that vector embeddings help out with, right?

Speaking of that, right? And so I think that metadata in this context is suddenly part of these very, very strategic conversations at the C level. It's not something that engineers or compliance people are trying to push.

It's actually a very important ingredient in being successful with AI, because it can deliver so much context, because it can deliver that increase of precision that makes for better-performing AI. And I think more and more C levels are beginning to understand that. And I think that traditional data management actually finds itself in a bit of a challenge.

Like, if you are in traditional BI, if you've been a traditional data engineer building pipelines for more established data analytics disciplines, then I think that the strategic funding in your company is about to change, and you should adapt to that situation. And that, I think, is something that is really turning the data community around these years: the hierarchy between structured and unstructured data is changing. The role of metadata is morphing into something else.

It's becoming part of the C-suite discussions because of AI, and a lot of more traditional scientific machine learning projects, which are all really cool, like I have absolutely no disrespect for these disciplines, but they will have to face a different kind of conversation to get into the C-suite conversations. So that's how I see it changing. And on building trust in data:

Well, I think it will take the same things as before, right? If you're talking about structured data, pretty much, right? Yeah.

My question was more like, you know, obviously that's like a ever, uh, it's about, you know, constant battle, you know, building trust in data, right? And obviously metadata and lineage plays a huge role. So, um, as you said, you know, as we, uh, confront AI with the, you know, we cannot trust AI responses, so we need to ground it to lineage and metadata and knowledge graphs, you know, ontologies that you mentioned.

Right? So those are the things I think people are working on. Uh, we have other questions about serialized, uh, AI model.

My understanding of serialized AI models is that they are models we can save and kind of retrieve when we need them. The question is really around: is there a recommended metadata management architecture to support the deployment and lifecycle of a serialized AI model? Well, I feel like I'm repeating myself here, but I would say that making use of Model Context Protocol to enable knowledge graphs to be consumed for increased precision is something that is really to be recommended.

And by that I mean, look at technologies that allow for that, that have an MCP server or an MCP connector, that make their data available on an MCP server. So if your AI technologies, if your AI projects, want to increase their precision with metadata, then connecting to an MCP server is a way to possibly increase that. So in any metadata management strategy that works with AI, I think that is a fundamental building block that you can't ignore.
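To make the MCP point slightly more concrete: Model Context Protocol messages are JSON-RPC 2.0, and a client invokes a server-side tool with the `tools/call` method. The sketch below only builds such a request as a Python dictionary; it does not talk to any server. The tool name `search_glossary` and its arguments are hypothetical, since which tools exist depends entirely on the metadata repository exposing them.

```python
import json


def mcp_tool_call(request_id: int, tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request for MCP's tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool exposed by a metadata repository's MCP server.
request = mcp_tool_call(1, "search_glossary", {"term": "customer"})
print(json.dumps(request, indent=2))
```

In practice an MCP client library would handle the transport and the handshake. The sketch is only meant to show that what the AI side consumes is an ordinary structured request against the metadata repository.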

I hope that answers the question, and if not, please, please expand. Yeah, I'll monitor q and a if there's follow up. But talking about, you know, since you're talking about AI and metadata, and we also talked about kind of definition of metadata quite a bit.

Uh, one of the phenomena of, you know, common usage pattern of AI is kind of summarization. People summarize long documents and, uh, you know, then the thing that kind of contains the essence of that document. What is your view on that?

Do you think auto summaries or AI summaries are useful metadata, or is it noise that you can't really turn off, that's just added to the overhead of our common misunderstanding? You know, I think... Yeah. Well, on that, we're just at the beginning of this, so this will become way better. I think actually we're beginning to see some really cool results in that particular domain.

So no, I'm not at all annoyed by that. And I think the precision, again, will increase over time. But I think one call to action is to begin to think very carefully about the unique actions that you perform in the enterprise. I think some very old school disciplines, such as building a business glossary, and, right, understanding your taxonomies and your ontologies and the various technologies that you make use of, is something that should have top priority attention, because it'll increase everything.

For AI, it's really, really needed, right? So that, I think, is something that we'll see increased attention on. No, I think you're absolutely right.

I, I even thought about the same topic. I'm like, okay, well, you gotta fine tune the models based on your glossary that's prevalent in kind of your space. Otherwise, you're gonna get this AI generated glossary and, uh, uh, word and usage, that that's not gonna be scalable.

You know? So you gotta probably fine tune the models so that it responds in a way that, uh, employees in your company or in your enterprise understand it, right? So, um, indeed, Indeed.

I, can I just comment on that, Abby? Yeah, Sure, sure, sure. Yeah, because I think one of the, one of the real, uh, interesting, uh, aspects here is that you cannot, you cannot automate the creation of a business glossary that's an addition.

You shouldn't do that. You shouldn't even try to do that. The problem you're facing is that you're creating, if we're to use a long word, a tautology, but a more simple word would just be a repetition.

You're not extracting anything from anywhere. You're just repeating stuff. And so trying to repeat stuff at the metadata level and say, now we have a business glossary, now we have a list of terms, or even create an ontology simply based off AI, would be very messy and very unproductive.

And the thing is, if you do that, if you do that human activity in the beginning of creating either a business glossary or taxonomy or, or even an ontology, right? Like a graph that you, that you create, yeah, that really unifies the understanding of, um, of your enterprise. If you do that and you, and you use that as a, as a source for, for improving ai, then you get great benefits.

Uh, you could, you could auto, like, you could make the activity of tagging. So a very basic example, you could make the activity of, of tagging, uh, data products or data assets in, uh, in a data catalog, uh, something that, that some, that activity AI could perform. But if you have created the business glossary with which you tag these data products, if you created that business glossary with ai, the entire activity falls apart.

It won't deliver any value. So you can automate repetitions, but you cannot automate unique human actions. The creation of the word itself is something that a human needs to do.
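The division of labor described here, with humans creating the glossary and automation applying it, can be sketched as below. This is a deliberately naive illustration: the glossary terms, definitions, and the `tag_asset` function are hypothetical, and simple word matching stands in for whatever an AI-based tagger would do.

```python
import re

# Human-curated business glossary (hypothetical terms and definitions).
GLOSSARY = {
    "Customer": "A party that has placed at least one order.",
    "Invoice": "A request for payment issued after delivery.",
    "Shipment": "A physical delivery of ordered goods.",
}


def tag_asset(description: str, glossary: dict) -> list:
    """Tag a data asset with every glossary term that appears in its description.

    Only single-word terms are matched, case-insensitively, on whole words.
    """
    words = set(re.findall(r"[a-z]+", description.lower()))
    return sorted(term for term in glossary if term.lower() in words)


print(tag_asset("Monthly invoice totals per customer", GLOSSARY))
```

The repetitive step, applying terms to assets, is what gets automated. The terms and their definitions stay a human product.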

Exactly. No, I think that doesn't estimate, that's what I was saying, that you cannot rely on AI for things like glossary because no, you know, that's, that's not a recipe for huge success. Um, okay, so talking about, you know, AI and generative content, um, um, what are your kind of some of the recommended strategies, like in the AI world?

Have you thought about, I don't think you've written this in the book itself, but with AI, what should the workflow of, you know, developing metadata look like? You're talking about existing repositories, like how do we look at them? How do we augment them with AI? How do we kind of bring them to discovery enterprise wide?

Have you given this a thought quite a bit, or what are your thoughts in each this? Yeah, sure. So, so in my book, I, I make it very clear that I don't think that a meta grid architecture is something that you should, uh, put into one specific technology.

It is really about making the existing technology stack, the metadata tech stack that you have in your company, better. So if you have a data catalog, or two data catalogs, five data catalogs, what I suggest is certainly not a new data catalog on top of all these data catalogs to unite them all. I think they do that internally themselves.

Uh, and, uh, the, one of those five data catalogs could be the enterprise data catalog. And that is just fine. I don't propose another layer on top, and you get another technology.

What I propose is a methodology of uniting, organizationally, all the teams that are working with these technologies so that they can learn from each other and improve from each other. And then if we really push forward, so for example, imagine a list of applications in an enterprise architecture management tool with great descriptions, dedicated to future exploration of how you could make your IT landscape more cost effective, more powerful. If you take that list and you coordinate closely with a configuration management database, then you all of a sudden have a better and more updated configuration management database once these things are handed over to that other team.

And if you take all these teams and all these technologies and explore how they work together, then you can improve how the total amount of technologies is performing. So it can save a lot of time, but since you ask about the AI side of it, you can also use that documentation, all these diagrams, all these descriptions of metadata types, and you can create a sort of RAG architecture that you could potentially have a conversation with, right? Like asking about what kind of metadata repositories are used for what kind of things, right?

And explore that horizontally. But being a, like a technology in itself, I don't think that would be the point. I wanna improve the existing technologies.
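The coordination between an enterprise architecture management tool and a configuration management database can be pictured as a simple reconciliation of two application lists. The sketch below is an illustration only, assuming both repositories can export application names; the `reconcile` function and the sample names are hypothetical, and real coordination is as much organizational as technical.

```python
def reconcile(eam_apps: list, cmdb_apps: list) -> dict:
    """Compare application lists from two metadata repositories."""
    eam, cmdb = set(eam_apps), set(cmdb_apps)
    return {
        "in_both": sorted(eam & cmdb),
        "missing_in_cmdb": sorted(eam - cmdb),
        "missing_in_eam": sorted(cmdb - eam),
    }


# Hypothetical exports from the two teams' tools.
eam_export = ["billing", "crm", "warehouse"]
cmdb_export = ["billing", "crm", "legacy-hr"]

report = reconcile(eam_export, cmdb_export)
print(report)
```

Nothing new sits on top of the two tools here. The report is just a prompt for the two teams to talk, which is closer to a methodology than to another layer of technology.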

Okay. Make it, that makes sense. Yeah.

Okay. That makes sense. Um, let's talk about, you know, uh, we haven't used the governance word so far, but let's use that a lot of, uh, lot of metadata initiatives, like they kind of, uh, die from the governance fatigue, so to speak, right?

Um, like how do we think about governance when we are talking about, um, you know, metadata initiatives? Like what role does governance play and how small or how big does that need to be? I guess that's the question.

Mm-hmm. Yeah. So I, I think I would cut it up in two, uh, questions, actually.

So one question is about, is about, um, data governance and, and another question would be about governance in general. So data governance I think is something that is typically performed. Yeah, I, I don't like data governance is typically performed in a, in a, in a relatively siloed way.

I would, I wish, I wish that more companies would be more, uh, holistic in their data, data governance, uh, perspectives. Some companies definitely are, and there are some great authors and, like, thought leaders out there explaining this. Um, but data governance activities as a, as a, as a dedicated discipline, I think should work, uh, more horizontally in the organization.

And that brings me, so, and I can explain this by saying, talking about governance in more general terms. So you also have a chief information security officer or a data protection officer or, uh, even like quality, uh, quality people that works with measuring the level of, uh, uh, training. Is it as does, does it like, is it sufficient and so forth, right?

If you are in a regulated industry, you know, you need to know the, the processes work, and so forth. So I think a governance aspect altogether is bigger than simply data governance. I would, right?

I really, really encourage data governance people to, to work with other governance and compliance functions in the company to actually improve, to save a lot of time and improve the quality of the metadata in their technologies, to be able to, to work, uh, together and create more efficient results. No, that's fantastic. Well, thank you for that.

I think, uh, your answer kind of reminded me, another aspect you touch upon in your book is the dark metadata. Could you talk a little bit about that and how are you talking about that in your book? And what do we really mean by dark metadata or dark data?

Like how do we think about that? Right. Um, come again, about rec metadata or what did you say?

No, I think you're talking about the dark metadata and the dark data. Oh, dark, dark metadata. Yes.

Thank you. Yeah. Cool.

Thank you. Thank you for bringing that. All right.

Yeah, I love that because, uh, that you bring it up because that is really something that is close to my heart. Uh, like we're talking about dark data, data that we haven't discovered with our technologies yet, that we haven't met. And, and, and, and I believe in that.

Uh, but dark metadata is something, um, that is, is quite, um, it's actually quite simple to explain, right? Every time we implement a new technology to look at our IT landscape, we start, unfortunately, I have found, and, and this has resonated quite, like, all over the world, to be honest. We start at a whiteboard.

We begin by explaining, um, uh, where we should, what, what kind of structure our company has, what kind of employees, what kind of data and so forth. So we begin, we begin from scratch, we begin at a greenfield every single time we implement a metadata technology. And I believe that is wrong, uh, because all the semantics, everything we are trying to map, it already exists.

It exists in technologies that we're not aware of because we typically work in a very siloed approach, right? So, so think of, for example, implementing a data catalog. That would be a bunch of data engineers, data scientists, AI analysts and what have you, strategists that work together on implementing a data catalog. But if they interviewed, uh, the asset management system team that would typically be sitting in finance, or if they interviewed, uh, like, uh, the records and information management team, uh, yet another compliance function, typically sitting in quality or legal, they would get much of the semantics, much of the metadata that they're, like, trying to deduce, uh, themselves in a, in a silo.

They would get it just out of the box. And so that's my dark metadata. It's all the metadata that is already sitting out there in your organization.

And that's the reality in every single organization out there, right? Every single company has that dark metadata sitting in a lot of different systems undiscovered. And it's just there for the taking.

And if you take it and use it, you can push faster, way faster with, uh, your initiatives of, uh, like getting value of metadata for AI or simply implementing new technologies. So, so that's, that's dark metadata. It's really this concept that, that I think we're ignoring the reality of the metadata already existing in companies.

It, it has to, like, it has to do with the way, um, companies, uh, push forward these agendas. It's always pushed forward with new technology in a combination of internal employees that wants to be successful, that wants to get promoted, consultants that help these companies implement technologies and then technology vendors that has created the software, right? Yeah.

So, so there, there's no way around that. And that is, that is a reality that you can sometimes be very annoyed at, but that's just the reality. Like you cannot, you cannot step out of that.

So how do we use that reality to, to create something better, faster, smarter? Well, my answer is quite simply, look at the metadata that already exists. Look outside of your silo.

If you're in data analytics, if you're a data scientist, if you are in AI, look at the, like the endpoint management system that the service desk is using to hand out laptops and iPhones for new employees. That system contains a lot of semantics. Look in the knowledge management system, uh, that, that, uh, was implemented 10, 15 years ago, there's a lot of semantics.

Don't let that go to waste. That's dark data. I think you're hinting at kind of thinking about looking at metadata and connecting disciplines across to kind of meet your business objective rather than doing in silos.

Am I, am I inferring your answer correctly? Like you're looking at existing data and trying to connect the dots to build out mm-hmm. To meet your objectives.

You know, you're looking at endpoints, you're looking at knowledge management, you know, knowledge management is, you know, huge, uh, kind of a, a place where there's a lot of metadata about the organizational competency in terms of employees, right? So I think you're talking about using those existing metadata to inform future development programs. And rather than, rather than building a new system altogether to gauge, you know, how employees are doing, I think that's, that's where you're hinting at, right?

That's, that's, that's how I took it. Well, my, the purpose of, of, of, of my book is, is really not to say that you shouldn't implement technology and you should give up and everything is a mess. What I'm saying is that companies typically find themselves in a mess, but there's a way out of the mess.

And that mess takes that, that mess is actually something that is extremely powerful. If you just use it, if you uncover all the metadata in that mess, you can, you can actually improve the existing technologies and you can increase the likelihood of success when implementing new technologies. So I'm not against anything here.

I'm not against technologies. I'm not against companies or persons or, or, or, or consultants. Um, I'm saying that we're working in a way that's not the best way we can do something different.

And that is the Meta Grid. The Meta Grid architecture is really saying don't try to explore what applications a company has. Don't try to explore what data types it has.

It is already listed, it is already out there. Go find those other metadata repositories and benefit from that. Harness it, use it instead of wasting your time with yet, yet another unrealistic greenfield project.
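To make the "go find those other metadata repositories" point concrete, here is a small sketch of seeding an application inventory from two repositories that already exist, rather than starting from a blank whiteboard. Everything in it is an assumption made for illustration: the two sample exports, the field names, and the `seed_inventory` helper are hypothetical and do not describe any particular tool.

```python
# Hypothetical exports from two repositories that already exist in a company.
EA_TOOL_APPLICATIONS = [
    {"name": "Billing", "description": "Creates and sends customer invoices."},
    {"name": "CRM", "description": "Tracks customer contacts and sales leads."},
]
CMDB_ITEMS = [
    {"name": "billing", "owner_team": "Finance IT", "description": ""},
    {"name": "HR Portal", "owner_team": "People IT", "description": ""},
]


def normalize(name):
    """Make names comparable across repositories (case and whitespace only)."""
    return name.strip().lower()


def seed_inventory(ea_apps, cmdb_items):
    """Combine what both repositories already know into one inventory keyed by name."""
    inventory = {}
    for app in ea_apps:
        inventory[normalize(app["name"])] = {
            "name": app["name"],
            "description": app["description"],
            "owner_team": None,
            "sources": ["ea_tool"],
        }
    for item in cmdb_items:
        entry = inventory.setdefault(
            normalize(item["name"]),
            {"name": item["name"], "description": "", "owner_team": None, "sources": []},
        )
        # The CMDB contributes ownership; descriptions from the EA tool are kept.
        entry["owner_team"] = item["owner_team"]
        if not entry["description"]:
            entry["description"] = item["description"]
        entry["sources"].append("cmdb")
    return inventory


if __name__ == "__main__":
    for key, entry in seed_inventory(EA_TOOL_APPLICATIONS, CMDB_ITEMS).items():
        print(key, entry)
```

Entries that appear in only one source (here "CRM" and "HR Portal") are exactly the gaps the two teams could close by talking to each other.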

46:33Leveraging Resources

That's, um, that's, that's, that's the thinking in the Meta Grid architecture, I think. Exactly. That's what I was looking at too.

I'm like, don't discard what you already have, build upon it. Right. That's what I was saying.

That's exactly, and I think we're often caught in a mechanic that incentivizes the, the contrary. We're often caught in, in, in greenfield projects because they are so seductive. If you have no legacy, everything is simple. Right?

But the problem is, even though you can, even though you can implement that in a small bubble, in a little silo of yours, it'll never really scale. 'cause it doesn't address the world around it, right? So, so you, you may be able to like get off the ground and do some interesting work, but if you really wanna scale, you need to talk to your peers, you need to engage with the people around you.

Yep. Yeah, that's what I was exactly, that's exactly how I took it. And that's how I was saying, I'm like, you look at existing assets that you have, you interconnect them and make sense of them.

Even if you're doing a greenfield project, it needs to be informed by existing, uh, assets that you have in terms of metadata or, you know, data, right? So, yeah. Um, so I think, uh, we touched upon, you know, knowledge graphs, ontologies, and the one question that I had in that regard was like, uh, what is your decision test for when to choose the graph, uh, uh, or the ontology models versus a simple, a key-value pair, right?

Like, how do we, how do we go about thinking, when do we actually need knowledge graphs and, you know, elaborate ontologies and things like that, rather than just key-value documents, right? I, I think we are now approaching, um, uh, the, the, the, the, like the subtle, uh, border of, uh, advice and ideals, right?

Uh, my ideals, I want to make them very clear. Uh, they are based on my own experience, but I welcome people to differ. But the way I see it is that you would be wise to choose, it's wise to choose, um, a graph at the metadata level.

So knowledge graphs are extremely efficient at the metadata level. For everything metadata management, they are really great. And I know data catalogs based on graphs.

I know enterprise architecture management tools based on graphs, and I know other types of, of metadata tools, knowledge management tools also, that are based on graphs. And I find graphs to work really, really well at the metadata layer. At the data layer itself, uh, I, I am not sure you can run an entire company on a graph database.

I'm not sure you, I would advise for that. So that's where my distinction goes. It's between the data and the metadata layer.

Now, some will, some will, will, will differ in that, and that's completely fine. That's just how I see it. What I've seen work is, is that graphs are extremely potent at the metadata layer.
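One way to picture the distinction drawn here between key-value lookups and graphs at the metadata layer is the sketch below. It is only an illustration under assumptions: the node names, relationship labels, and helper functions are invented, and a plain Python list of edges stands in for a real graph database. A key-value pair answers one question about one thing; a graph lets you follow relationships across things.

```python
from collections import deque

# Key-value style: one fact per key, fine for direct lookups.
KEY_VALUE = {"customer_table.owner": "Sales Analytics"}

# Graph style: (source, relationship, target) edges. All names are hypothetical.
EDGES = [
    ("customer_table", "stored_in", "crm_database"),
    ("crm_database", "runs_on", "server_42"),
    ("customer_table", "governed_by", "retention_policy_7y"),
    ("crm_app", "reads", "customer_table"),
]


def neighbours(node, edges):
    """Return the targets of all edges that start at the given node."""
    return [target for source, _relationship, target in edges if source == node]


def reachable(start, edges):
    """Breadth-first traversal: everything connected downstream of start."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        for nxt in neighbours(node, edges):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


if __name__ == "__main__":
    print(KEY_VALUE["customer_table.owner"])
    print(reachable("crm_app", EDGES))
```

The traversal answers a question such as "what does this application ultimately depend on", which would take several separate lookups and extra bookkeeping in a flat key-value store.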

Okay. We're coming to the top of the hour. Uh, if there are more questions from audience, please, uh, feel free to put that in question and answer.

Uh, I know you have a hard stop, Ole, and I do as well. Yes. Um, let me see, uh, while we are waiting for more questions, I guess, uh, I may have one more question.

One final question from my side.

50:23Metadata and AI

Um, you know, a company who doesn't really have, you know, a lot of metadata already and they are kind of doing greenfield, you know, it's a startup environment, something like that. Uh, how should they approach, uh, you know, there's, there's no concept of Meta Grid because they don't really, you know, have anything existing that they can tap into. Uh, what is your thinking around that environment, you know, a completely greenfield environment, and, you know, AI is kind of powered by metadata, you know, embeddings and, you know, all kinds of things.

Like what is your ideal approach to metadata management in a very greenfield environment? How do you think people should be thinking about that in that kind of environment? Right?

Oh, well, in that kind of environment, I think, uh, uh, I'm going to sound like such an old man, but I think, I think doing your documentation upfront will save you a lot of, lot of time down the road. So it's a typical startup problem, right? That you're so little that you don't need to do the documentation, and then you grow and then you explode, and then you can't find anything anymore, right?

Yeah. But if, if you're capable of doing the documentation upfront, and perhaps now in this era, I think you're capable of harnessing the results of, like, proper metadata management instantaneously. And so I think that is a game changer, right?

That because we're in this AI race, you can really use the metadata that you have registered effectively, uh, instantaneously. And so that I think changes, uh, that's a game changer. A real game changer.

No, I think that resonates with me. You know, I think we, for a while we were, uh, from a time where we are like, we don't need documentation, but now there's, uh, machines need a lot of context and documents to answer accurately. So we're going back to the point where we need to have this kind of documentation to be successful with AI in particular.

Uh, if you wanna, we're looking for truth right. On this. Exactly.

Exactly. Yeah. Okay.

Well, um, I guess if, uh, if there are no other questions, I'm happy to, to wrap it off, uh, unless you had another question. No, I think, uh, we're good to wrap off then. Thank you so much for taking time today. Uh, there, there are no more questions.

I think we can wrap it off and thank you again. I hope it was useful for audience and uh, I am looking forward to seeing if there's a further interest in, you know, having another discussion around metadata or maybe more kind of tailored towards how this is kind of being utilized in industry already. I'm sure we'll connect and figure it out.

Sure. And I'll see you, uh, around in the world for, um, Databricks events or action events, uh, or maybe joint events. Who knows? Exactly.

For sure. I'll see you. Take care.