Summary
- Piethein Strengholt explains medallion architecture concepts and structure.
- Clarifies logical vs. physical layers and practical implementation guidance.
- Compares medallion architecture with traditional data warehousing.
- Shares insights on collaboration, change management, and evolving tech.
Chapters
Thank you everyone for joining. This is a pleasure to host for me.
I'm the Chief Evangelist at Actian. My name is Ole Olesen-Bagneux, and in this big medallion debate that we have, I will interview Piethein Strengholt, the author of, let me see if I can catch it here, Building Medallion Architectures. And with me, we also have Martin, who will challenge us.
Chesbrough, is that correctly pronounced, Chesbro? Yeah, that's perfect. Chesbrough. And he was actually part of the reason why we have this debate right now, because I was writing on medallion architectures on LinkedIn, and the post went viral because I kind of hit a nerve by saying that I don't think medallion architectures will ever go away.
Independently of how modern our data architectures will be, we will have layered architectures to make data ready for analytical consumption. So without further ado, the agenda of this meeting, or this event, is that I will interview Piethein. I have prepared some detailed questions on his newly released O'Reilly book, Building Medallion Architectures.
It's a great book and that will take approximately 20, 25 minutes where you will learn everything about medallion architecture. And I will ask some critical questions. But the idea is that Martin joins the debate in roughly 20, 25 minutes.
He has prepared a very detailed, very precise set of critical points that he would like to examine on the medallion architecture and the pros and the cons of this architecture. And so we will enter more into a debate later in the conversation. And everyone on the call is obviously most welcome to ask questions.
So, without further ado and I think at this point, Martin, Yeah, I was just gonna say I guess instructions to the participants, they should lodge questions through the Q&A box at the bottom of their Zoom window and then we'll pick those questions up when we get into the debate. Is that the idea? Exactly.
Yeah. Perfect. Thank you.
At which point, I'll turn my video off, let you guys have your interview, and I'll come back later unless I go to sleep. Please stay awake. Please have a hot coffee.
Good, strong, good, strong coffee over there. I've got my coffee yah. Sorry.
Perfect. Thank you so much. The reason for the joke, by the way, is that it's well in the morning for me and I've just got off a flight.
So it's not because of any boredom or anything, I'm actually fascinated by this conversation, so go for it. I can't thank you enough for being here, Martin, thank you so much. And please stay awake.
We'll need you for the debate later. Okay Piethein, so I prepared a lot of questions. As you saw in the preparation of this meeting, I read your book again, I read it very, very carefully.
So I prepared a lot of different questions, but I want to keep it relatively high level. Some of them will go into the level of detail that also matches the level of detail in your great, great book. But for the people on the call here, on the webinar, let's open the set of questions with the obvious one.
Why did you want to write this book, Piethein? What motivated you to write it? Yeah, maybe before I answer, let me first quickly introduce myself.
Yes, obviously. I'm the author, of course, of this book, Building Medallion Architectures, the topic of this whole conversation. I also wrote another book called Data Management at Scale.
Some people I see in the chat might know me from Microsoft, where I worked in the Netherlands as the Chief Data Officer. I worked for a long time for ABN AMRO. I recently joined another company called NN Group, so, a big national insurance organization.
Coming back to your question, so why write a book about medallion architectures? I was fascinated by this subject while working at Microsoft. So lots of customers, they come to Microsoft, and they would like to get more value out of data, right?
Um, and then the answer often is, well, if you would like to get started with data and generate value, the first thing you need to do is build a platform. So buy a data service, a platform service. And then the question comes next.
So what should I do then? Well, building out that architecture. And then the best practice that's often provided is, well, you should look at building a medallion architecture.
But there often, honestly, also really the guidance stops. So there's some high level guidance given to enterprises, what these layers are about. And there's bronze and silver and gold, and there's some high level prescriptive guidance.
But in essence, it really boils down to making the right nuances. And yeah, there's ambiguity as well, and people also find it really hard to interpret these design patterns in a precise manner. So, because I saw the lack of guidance there, this motivated me to write a book.
And in the exercise of writing that book, before I started writing, I interviewed lots of potential readers about what they want to hear from me. And the difficulty there is that there are so many different expectations. And this is also a highly opinionated architecture.
So I did it in this way. So there's a more theoretical part. So what is medallion exactly, and where do we come from, the layering, et cetera.
Then there's a tutorial, a playbook prescriptively guiding you through the real design with code snippets, screenshots, instructions, how to exactly build this medallion architecture. Then there are case studies, and there I work with different customers from different industries, different sizes. So you will learn, so why they have implemented and how they designed their medallion architectures.
And then lastly, I fast forward and try to connect the dots to other areas of data management and AI. So this is roughly how the book has been structured. Yes, I love that structure.
Reading the book, it's like, you get an introduction to this is how you should do it, then let's do it. Then you see how others have done it. And then finally the book kind of dissolves into a more complicated pattern with multiple medallion architectures that do different things and scale differently.
And I totally prefer that last part. And I guess, you know. Also, obviously Piethein, I'm sorry for not giving you a more formal introduction.
I'm just taking it for granted that everyone here on the call knows you.
Obviously not everyone would know you, but you are a very well-known author, an O'Reilly author, with a lot of readers. Okay, so let's dive a little into the medallion architecture itself. What are the three layers, bronze, silver, and gold, briefly explained?
Yeah, briefly, it's important to take away upfront that these layers should be interpreted as logical layers. It's a logical design pattern. A mistake I often see among enterprises I've worked with is that they see these as physical layers.
So this is wrong. These are logical layers, how to logically organize your data. So they oversee an end-to-end architecture, but then within these logical boundaries those align to certain responsibilities and concerns.
In the first layer, I see it's mainly about capturing the data, the raw data, validating the data, archiving it. In the next layer, in silver, I mainly see it's about standardizing data, cleaning data, correcting data, slightly conforming data. But there, you see lots of discussions already start to happen.
Do you already across source systems need to combine and join that data? Do you build data products, for instance, in that layer? So it very much depends.
Yes. And then in, in gold, that layer I see there, this is where you make the data fit for purpose. So for the eventual consumption.
So the business usage of the data, there, again, well, lots of discussion. Do you need a classic integration layer? If you have overlapping concerns in different groups of users, or do you keep it apart?
What is the role of data products, and what are these then? Again, lots of discussions in these layers. But this is roughly how I see it.
Yes, yes, absolutely. It's very clear when you read the book also. So for people that do not know medallion architecture in depth, just to set the stage, when you talk about these layers being logical and not physical, what does that concretely mean?
What kind of architecture is the outcome of that? Is the question clear? Yeah, the question is clear.
So the physical implementation could very much differ compared to the logical, let's say, abstraction, how to organize your data across these three layers. So you could, for instance, go for a pattern, say you have three physical layers implemented. Right?
Another design pattern could be, for instance, now I have more physical layers, and I decouple within these layers, again, certain concerns. So the first stage is, for instance, for bronze, capturing those raw files, and you keep them within the original file format.
Then, still within bronze, you copy to another inner layer, and you transform the data into the Delta format. Maybe then again, you copy it to another layer where you merge it with already preexisting data, which is still raw and technical at that stage. So this is an example of three physical layers within one vertical layer.
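The distinction between logical layers and physical stages described here can be sketched in a few lines of Python. This is a minimal illustration only: the stage names (landing_raw_files, converted_tables, and so on) are hypothetical and not taken from the book; only the three logical layer names come from the conversation.

```python
# Hypothetical mapping: each logical layer owns one or more physical stages.
# Only "bronze", "silver", and "gold" come from the discussion; every stage
# name below is an invented placeholder for illustration.
LOGICAL_ORDER = ("bronze", "silver", "gold")

PHYSICAL_STAGES = {
    "bronze": ["landing_raw_files", "converted_tables", "merged_history"],
    "silver": ["cleaned_per_source"],
    "gold": ["harmonized", "use_case_marts"],
}


def logical_layer_of(stage):
    """Return the logical layer that a physical stage belongs to."""
    for layer, stages in PHYSICAL_STAGES.items():
        if stage in stages:
            return layer
    raise KeyError(f"unknown physical stage: {stage}")


def pipeline_order():
    """List (logical layer, physical stage) pairs in processing order."""
    return [
        (layer, stage)
        for layer in LOGICAL_ORDER
        for stage in PHYSICAL_STAGES[layer]
    ]


if __name__ == "__main__":
    for layer, stage in pipeline_order():
        print(f"{layer:<6} -> {stage}")
```

Here bronze alone has three physical stages, so the architecture has six physical stages in total while still being organized into three logical layers.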
Yes. Thank you for pointing that out. That's exactly what I was asking for, because I think at least part of the medallion architecture, becomes very heated because people really do not think about that distinction between the logical and the physical layer.
Because if you, and I should be asking you questions, but to what extent would you agree with that assumption, that people misunderstand that bit? Yeah. For me, that was the motivation for writing that book: I went into so many customer discussions while working at Microsoft where people interpreted the medallion architecture as just purely three physical layers. Yeah, I really had to open their eyes. Well, no, it's about decoupling concerns.
And therefore, you can have as many layers as you like. Let me ask you a couple of questions on the layers themselves. First of all, why, so I have a question for each layer.
Why shouldn't you query the bronze layer? Why shouldn't you query it? I mean, a lot of people are thinking about it, but why shouldn't you do that?
You could, but it's raw and therefore you are tightly coupled to the original structure of the source systems you have, let's say, on the left of your architecture. So if these structures at the source system side suddenly change, that might cause disrupting effects on the structure of that bronze layer. If you start to operationalize reports and agents on your bronze layer, you'll have difficulties keeping up with all those changes.
So again, a best practice would be to decouple yourself from the structure of the source systems, and then move to the next layer for this concern. So two questions on the silver layer, merged together. First of all, why is the silver layer simple?
You call it simple throughout the book. And secondly, why do primarily ML and AI engineers want to query the silver layer? I say simple because I recommend not yet starting to combine, mix, and integrate data across source systems.
So the concerns, therefore, should be very much tailored towards what you find at the source system level. In my view, clean and correct the data at that level. So source system level.
Because it still carries forward the authentic context. It's ideal for building operational agents and operational reports, because the context is very much the same as you would see it within your source system. If you already, in my view, at that stage, start to combine and integrate data across source systems, you need to conform context.
And yeah, you will lose the authentic context, or the original context as you would expect it from the source system side. And that makes it hard. So then, yeah, if you would like to operationalize or build an operational report, you need to reverse it back into the original context, let's say.
Not ideal. So I would therefore recommend listeners to defer that concern to a later layer. Yeah.
But again, this is where you see lots of... Please go ahead. Sorry. Yeah, so some people advocate, for instance, for applying data vaults within that silver layer, or already at that time starting to combine and integrate your data across source systems.
Yeah, it depends, again, so look at your own requirements, your own needs. It isn't wrong. It isn't right.
But yeah, I favor the design pattern we discussed. So yeah, park the cross source system integration and defer that to the last layer, to the gold. Yes.
Agreed. Agreed. Well, I don't wanna say I agree, because I'm not sure I agree on that, but anyway it's what you argue in the book that I agree in.
Um, so on the gold layer, my question is: you say it's extremely complicated, especially compared to the silver layer. Why is the gold layer complicated? Yeah, because I often see there are contrasting concerns.
So you would like to harmonize data, maybe at an enterprise level or for a certain scope, to ensure consistency of data throughout the different use cases. Then there's the need of making data highly specific for what the use case requires. So yeah, there again, you have a dilemma.
Those concerns are not easily reconciled. Then there's that need of distributing data, maybe across different domains, different teams, to external data consumers. They rely on stable, highly reusable data.
Again, you see that concern. So when you have a use case and you would like to be flexible and make lots of changes spontaneously, that contrasts with that need of providing highly reliable, reusable data to different consumers. So therefore, I often see within gold, you start to split these different concerns apart again, and therefore you could see, again, different physical layers within that single gold layer.
Absolutely. So moving into the part of the book that I preferred the most, and I think we will have some time to discuss that: that was part four of the book, and that's where you really open up. Because I sense that another thing that people do not like about medallion architecture is that they think of it as this kind of enterprise-wide highway of data transportation that every single use case has to conform to and go through in order to provide use cases for analytics, right? And so what I really like about part four of your book is that you really open it up and, just like in your first book, Data Management at Scale, you provide this perspective of flexibility and scalability, which are really core to any modern data architecture, even software architecture, right?
So you really open up and say, okay, we can have multiple medallion architectures in one company. And actually, you even admit that most companies have multiple medallion architectures. So one of the layers that I'm very interested in, in that context, is what you call the product design and distribution layer.
So I'm asking very detailed questions here. I hope you can remember all of this, Piethein. So can you elaborate a little bit on what is that layer about?
Because I think that is a very promising layer. Yeah. Coming back to your previous point, I think it's not an assumption.
When you are large, you will have for sure many of these medallion architectures. And the design of these architectures will differ depending on the size, number of source systems, whether you are more provider intensive or consumer intensive. So I, in the book, define different archetypes depending on kind of the nature of that domain and how you could best organize and align yourself, including to these different layers.
But in this multi-medallion architecture, when you start to share data seamlessly across, you would like to rely on high quality, reusable, stable data. And therefore, your data model, in my view, becomes way more an interface model. And this is what that data product layer is for.
So providing that robust, stable data, and here lots of data modeling design guidelines come into play, like how to deal with reference data. Just to give a simple example: imagine you have multiple domains, multiple of these medallion architectures, and they all, in isolation, rework their data structures and provide these data products. But you don't align on any data standards or reference data.
It will then be very hard for consumers, at the end, to combine data easily, because they need to fight different local reference values and non-conforming data types, for instance. So in my view, lots of data modeling guidance should be provided to all these different teams. So they model and rework their data in a certain way.
All these different teams can then easily interpret that data, consume it, combine it, and the list, yeah, as you learn in the book, is quite extensive. So lots of guidance should be there in order.
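The reference data problem described above can be made concrete with a small Python sketch. Everything in it is an assumption for illustration: the domain names (claims, policies), the field names, and the shared country code list are hypothetical and do not come from the book.

```python
# Hypothetical shared reference data that all domains agree on.
SHARED_COUNTRY_CODES = {"NL", "AU", "DK"}

# Hypothetical local reference values per domain, mapped to the shared codes.
LOCAL_TO_SHARED = {
    "claims": {"Netherlands": "NL", "Australia": "AU"},
    "policies": {"NLD": "NL", "AUS": "AU", "DNK": "DK"},
}


def conform(domain, record):
    """Rewrite one record so it uses shared reference values and data types."""
    code = LOCAL_TO_SHARED[domain].get(record["country"])
    if code not in SHARED_COUNTRY_CODES:
        raise ValueError(
            f"{domain}: no shared code for country {record['country']!r}"
        )
    # Conform the data type as well: customer_id is always a string downstream.
    return {**record, "country": code, "customer_id": str(record["customer_id"])}


if __name__ == "__main__":
    claim = conform("claims", {"customer_id": 7, "country": "Netherlands"})
    policy = conform("policies", {"customer_id": "7", "country": "NLD"})
    # After conforming, a consumer can join the two domains on plain equality.
    print(claim["customer_id"] == policy["customer_id"])
    print(claim["country"] == policy["country"])
```

Without the shared mapping, the consumer would have to know that "Netherlands" and "NLD" mean the same thing and that one domain stores the identifier as an integer, which is the kind of local knowledge the data product layer is meant to remove.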
No, but indeed, but I, but I'm thinking of this layer as a really, important layer for people that are eager to push forward with more decentralized approaches for the data architectures altogether. Right? Yes.
Maybe, maybe, I think we have time for one more question, because before we, I hand over the mic to Martin, I will stay with my camera on, and I may say a word or two, but I really want to let Martin, address his questions. They are excellent. So one, one last question for me is this, you have a concept that you call the medallion mesh.
We can just talk about that briefly before I hand over the mic to Martin. What is the medallion mesh? You know data mesh?
Uh, right? So, a distributed, federated architecture: different teams all operate their own, let's say, tiny data architectures, and they distribute data across. When you layer your data according to the medallion layering we just discussed, and all of these teams kind of do that in a similar way,
this is what I call a medallion mesh architecture. But importantly here, I think, it's not per se so much about the layering. At an enterprise level, I think that layering is more a communication pattern.
And there I often see things go wrong, because teams interpret differently from one another how the layering is done within these different architectures. So I recommend using the medallion layering as a communication pattern.
So all these teams kind of adhere more to the same way of organizing their data within these different architectures, so that eases the distribution and sharing of data between these different tiny architectures. I think that's a wonderful statement to hand the mic over to Martin. Martin, are you still awake?
Did the coffee work? Yeah, yeah, I'm here. Wonderful, Martin.
Just lurking in the background. So that was, that was a great set of interviews. Great, great set of questions.
I first of all wanted to say, yeah Ole, thanks very much for inviting me on. By way of intro, let me say that I'm honored to be here in the presence of two O'Reilly authors. You know, you're very honored and august personalities in the data world.
Uh, I figure that, you know, I'm a data architect working for a small engineering consultancy in Melbourne, Australia. And I'll just take on the mantle of representing the layman data architect in all of this. Yeah.
Um, hopefully I can do justice to the questions that might come in as well. Just for context, I wanted to start my position by saying, you know, when Ole praised Piethein's book on LinkedIn, I sort of said half jokingly to him in a comment: oh, I think I need to have a serious conversation with you. Ole, I respect your opinion, and I hope you can convince me otherwise.
But medallion architectures have been my biggest frustration as a data architect for some 20 years now. So that's sort of to put this in perspective. And there was a serious point to this as well as a sort of, you know, a nudge to I hope I can call you a friend, you know?
Um and, I guess this is where I think Piethein's book, because then I sent that having not read Piethein's book, right? So I then figured, well, I'd better read the book so that, you know, once you actually sort of started to, to bite on that one and say, yeah, we better have a debate about this. Um, and I found it to be an excellent book.
You know actually, as you say there, I don't believe there are too many books out there explaining medallion architectures. And effectively it came out of the Databricks world and the lakehouse world. And, or actually probably more strictly speaking the data lake world.
And I think that's been one of the challenges, is that it gets interpreted in different ways by different people. And there's not really a forum to debate. And this book gives us a forum to debate.
Now, it just occurs to me as I'm listening to you, Ole question Piethein, that actually, one other possible improvement to the book is actually to, you know, to maybe incorporate some of the different ways medallion gets used and almost like have a, what is it, patterns and anti patterns type of approach. Yeah. Yes.
Because I'm sure that we could probably agree on some of the anti-patterns that we see. Yes. But anyway, I was, by way of introduction, I was really going to focus my discussion around sort of three areas of argument.
And, and mostly it's around looking forward rather than looking back. Right? And so, rather than explaining how we got here, which I think Piethein's book does an excellent job of, let's start working out where we go from here.
Yeah. Given that, you know, we know that organizations are having, I think, greater challenges than ever in trying to work out how to take their architectures forward. And I think part of those challenges is actually that, interestingly enough, the word data has become more important because it gets twinned with AI.
I even see it within my small engineering consultancy that suddenly, you know, I have a bunch of software engineerings or engineers wanting to build AI applications and then figuring, okay, how am I gonna get the data for it? So, so they need some source. So let me summarize my three arguments to start with.
So my first argument was a general critique of layered architects, layered architectures in general, not really to do with medallion. And actually I agree with you Ole in the sense that I think layered architecture will never go outta style. But I guess the point I'd make is that if you look at the application space, I remember, you know, I'm old enough to remember client server architectures, and then we went to three layers.
You know, you have a, you know, a UI, a business logic, and a data storage layer or something like that. And then, you know, that still exists. You know, if I look at most people's GitHub projects, you know, or the, the Git projects that we build as, as Everest, they'll probably be three layers within the file structure.
Yeah. And you can see them, clearly called out. But interestingly enough, even though that's already built in, I think as application developers, we've gone beyond that.
We've said, okay, yeah, yeah, look, we've got the three layers. Um, you know, we don't care about them, because what we care about is maybe the hexagonal architecture, clean architecture, or maybe, you know, we are looking more at some of, let's say, Martin Fowler's enterprise application patterns, enterprise application architecture patterns, or we're looking at an AWS or Azure Well-Architected style of application architecture. And there's then a debate about, you know, all of these nuances of the architecture pattern in order to solve the problem, where the conversation has in a way gone beyond the layers.
Right? It's not a either or conversation. It's a yes and, right?
And this is, I think, the conversation that we need to get into within data, and I think you hint at that in your book anyway, Piethein, where you sort of go into, first of all, the three examples, which are really good examples. I think the case studies are worth anyone reading the book going through, because they are real life examples of companies, you know, putting medallion architecture to use. Yeah.
But I think underlying those case studies is a sort of yes and discussion Yes. Yes, we've got medallion, but and, sorry, not but, and we want to do more and we want to deliver greater value, and how are we gonna do that? Yeah.
So that's sort of argument number one. It was really nice. Uh, I enjoyed interviewing them most because they used the medallion architecture, but they also quickly realized we need to complement it with lots more. So there's the event driven architecture.
We would like to do application integration, build data intensive applications. How would that reconcile with the medallion architecture? Where do we draw the boundaries between an application and distributing data across different teams?
So yeah, it was a really nice conversation I had with them. Yeah, perfect. I mean, that's actually the conversation I find myself having with companies, which is, okay, you might have a Databricks setup or a Snowflake or an Azure setup, but then, you know, how are you going to integrate that back into the application layer?
Because that's where you get value out of your data because you've taken decisions and you're actually actioning those decisions. Yeah. And yeah, I have favorite, Like the reverse pattern.
Well, yeah, sometimes it's reverse ETL, but it's just the integration if you like, of the decisioning, let's call it the decisioning part and the actioning part. Yeah. Yeah.
Um, so that was my first argument. The second argument was really actually the AI argument. And this goes to, we were talking about it a little bit at the beginning.
And this goes to the fact that I think within a lot of companies I see now, the data folks in the company are suddenly raising their heads above the parapet in the technology organization, because suddenly the CEO wants something to happen from an AI perspective. They want some, I dunno, AI agents or something, right? And the first question comes, in fact, I said that before, you know, the application developer says, where do I get the data from?
Right? How do I, you know, you've told me I've gotta do some RAG. How do I build that RAG?
Right? Where does the, the context for the organization come from? Right?
The context is stored somewhere within the data layer. So rather than having this, you know, let's say monolith sitting to the side, just curated by a select number of data teams, it feels like it's being dragged into the application architecture of the company. And so I think that then, you know, also takes us away from a pure medallion structure and more into thinking, you know, how do the operational and the analytical planes of a company start to integrate, you know, in order to serve AI?
And then the third argument, is, you know, the good old data mesh data product alignment argument. And in fact, you addressed this in the book when I sort of eventually got round to reading the book. So in, you know, part four of the book, you actually have a I think it is chapter, what is it, 12, 13 something?
Um, 12, no, I think it's 13. No, no, 13 is the generative AI, and maybe 11 is the scaling one, where you start talking about data mesh. And I think that you start actually putting the domain and the federated structure to the fore and the layered structure to the back. To some extent, you're actually answering my question there, because you're sort of saying, well, maybe one of the futures that we can look at is where the data product and a polyglot type of data modeling architecture sort of starts to, you know, dominate within organizations.
And in fact, to some extent, you practically see that in a lot of large organizations. Yeah. So there are my three arguments and they're mostly future focused, you know, so it's almost like, you know, okay, we, we wanna learn medallion because I think it's an important part of our history, but I don't think we want to over rotate towards medallion.
Absolutely. Yeah. Reflecting on the last chapter, where I introduced the AI part, I think I would probably now rewrite it completely in a different way, knowing things move crazily fast.
But one of the discoveries I made, and we had a brief conversation on this before, is that here at the organization where I currently work, we experiment with agents already. We have quite a number of agents in production, but we see two types of agents. There are agents that are more data oriented, I call them. So, ordinary chat bots, or an agent that, I don't know, crawls a database, or does search, or looks at a historical profile, for instance, or does Customer 360.
There, I think the medallion architecture could make sense. So we have set a direction where we foresee agents and a runtime that sit close to the medallion architecture, but we also see agents that likely sit close to the operational source systems, within the application domain. And we call these operationally aligned agents.
And those are context-aware, and operate in low latency, high volumes. And there, I think we need to have a different type of architecture. So probably you need to have operational data stores or data intensive applications.
So, other patterns. And this is a risk I see, and maybe it's good you mentioned this, and we should take this away: that maybe readers or listeners might interpret it as, well, the medallion architecture does it all. And I think this is a false argument.
It covers a certain scope, but you need to complement it with the things you already mentioned. Yeah. Okay.
No, that makes perfect sense. Um, I mean, one of the things that I also realized as you went through this is that, you know, in most implementations I've seen, within the silver layer there is a combining of context. I mean, maybe you call that an anti-pattern, but I've seen, more often than not, you know, the combining of different contexts.
And so would you call that an anti-pattern? Yeah, I think it's a missed opportunity if you do not preserve the authentic context in a historized form. Lots of use cases require the authentic data historized in a slowly changing dimension, because you would like to use it to train operational machine learning, for instance, or for operational reporting.
If you too quickly start to combine and integrate, you lose that opportunity, which of course you can overcome with an extra operational data store. You offload the data into another architecture. But I think there's an opportunity there. It for sure won't address all these operationally oriented needs, but you could use it in that regard, and therefore I propose to defer that action of cross source system combining and integrating to the last layer.
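Historizing source-aligned data as a slowly changing dimension, as mentioned above, can be sketched as follows. This is a minimal in-memory illustration of the common "type 2" approach (close the old row, append a new one), assuming hypothetical field names; it is not code from the book and a lakehouse implementation would use table merges instead of Python lists.

```python
from datetime import date


def apply_snapshot(history, key, attributes, as_of):
    """Record a new version of `key` only if its attributes changed.

    `history` is a list of row dicts and is modified in place: the open row
    (valid_to is None) is closed at `as_of`, then a new open row is appended.
    """
    current = next(
        (r for r in history if r["key"] == key and r["valid_to"] is None),
        None,
    )
    if current is not None:
        if current["attributes"] == attributes:
            return history  # nothing changed, keep the open row as it is
        current["valid_to"] = as_of
    history.append(
        {
            "key": key,
            "attributes": dict(attributes),
            "valid_from": as_of,
            "valid_to": None,
        }
    )
    return history


def as_of_view(history, key, when):
    """Return the attributes that were valid for `key` on the date `when`."""
    for row in history:
        still_open = row["valid_to"] is None or when < row["valid_to"]
        if row["key"] == key and row["valid_from"] <= when and still_open:
            return row["attributes"]
    return None


if __name__ == "__main__":
    rows = []
    apply_snapshot(rows, "c1", {"city": "Utrecht"}, date(2024, 1, 1))
    apply_snapshot(rows, "c1", {"city": "Amsterdam"}, date(2024, 3, 1))
    print(as_of_view(rows, "c1", date(2024, 2, 15)))
```

Because each row keeps the source system's own attributes and validity period, a training job or operational report can ask what the source looked like on a given date without reversing any cross-system integration.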
But you're right, I see lots of organizations they already in Silver start to combine and merge and integrate it. You can also do both, right? So you can build that more operational data store type of data set, and you preserve those in your silver, and next to that you have that integration, classic integration.
It's fine. At least that you take away, and address these concerns. That's right.
Makes it sound like it's especially, you know, I think earlier mentioned or you mentioned data vault, you know, it feels like, silver layer is where data vault, or at least the is the main part of the data vault would set maybe the business vault. Would you say the business vault is then the gold layer? Yeah.
It's more the gold layer, and the raw vault or the integration layer is what you often see in silver. I'm honestly not really a fan of data vault. Not because of the methodology or the design principles.
They for sure give you lots of flexibility, but it's the underlying technology architecture that does not really facilitate that type of data modeling. In a lakehouse, we operate in a distributed architecture where storage and compute are decoupled from each other. So if you create thousands or millions of tiny Parquet files that are distributed across all these servers in a data center, you will be fighting the network for sure.
So in my previous role I worked with lots of enterprises that originally opted for a data vault design, but they had to stop because they could not overcome the network challenges. Sure. That's interesting.
I mean, yeah, you want to jump in with some of the questions you're seeing Ole? Um, just, uh, apologies. I think there's just a second or two of delay.
Uh, sorry. Apologies for cutting you off, Martin. Um, so please continue.
I have been looking at the questions in the chat, and they are amazing. Thank you everyone for such detailed discussions, advice and points of view. Obviously not everyone here agrees, and that is something that I'm very happy about.
I think we should behind, but Martin, I didn't want to cut you off. So, do you wanna, do you have more questions or should we take a question? Are you okay?
No, No. You, you take over Ole. Okay.
Okay. So, following up on this, I think I have one question that I'd like as a conclusion of your conversation, Martin, with Piethein: I would like to know the border of the medallion architecture, like what's outside of the medallion architecture. I would like to see that more clearly defined, but maybe we end with that question.
I think some of the participants on the call here deserve to get their questions out. So for example, Anyana, I hope I'm pronouncing that correctly, is asking: do we really need a landing zone before the bronze layer?
What are the scenarios where we should think of a landing zone before even getting into bronze? Yeah.
You make this a point in your book, you discuss this. Yes, absolutely. So there are lots of pages, even, describing the nuances and the tradeoffs on this.
First, it's more a question of interpretation. Some people make the landing zone part of their medallion architecture. Others clearly draw and position it outside.
So first you need to agree on how to interpret the medallion architecture, but in some cases I've seen, if you don't have robust or secure ingestion patterns, it might be wise to stage or land that data somewhere in between before pushing it into the formal bronze layer. Again, it depends on how you would design the bronze layer, but this is a motivation I see for lots of organizations to have that extra intermediate landing zone.
It's not mandatory. It's more an optional thing. So that's also good to mention.
Absolutely. Yes. Let's move on.
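A small Python sketch of the optional landing-zone pattern described above: files are first staged in a landing directory, checked, and only then promoted into bronze. The directory layout, the JSON-lines file format and the `validate` rule are assumptions made for illustration; a real platform would apply its own checks.

```python
import json
import shutil
from pathlib import Path


def validate(path: Path) -> bool:
    """Hypothetical check: every non-empty line must parse as a JSON object."""
    try:
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                if line.strip() and not isinstance(json.loads(line), dict):
                    return False
    except (json.JSONDecodeError, UnicodeDecodeError):
        return False
    return True


def promote(landing: Path, bronze: Path, rejected: Path) -> dict:
    """Move valid files from landing to bronze, and invalid ones to rejected."""
    bronze.mkdir(parents=True, exist_ok=True)
    rejected.mkdir(parents=True, exist_ok=True)
    outcome = {"bronze": [], "rejected": []}
    # sorted() materializes the listing before any file is moved
    for path in sorted(landing.glob("*.jsonl")):
        target = "bronze" if validate(path) else "rejected"
        destination = (bronze if target == "bronze" else rejected) / path.name
        shutil.move(str(path), str(destination))
        outcome[target].append(path.name)
    return outcome
```

With a trusted, robust ingestion path this step can be skipped and data written straight to bronze, which matches the point that the landing zone is optional rather than mandatory.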
So there's a question from John O'Gorman, who actually put it into the Q&A. So I'm just wondering whether we should prioritize what goes into the Q&A, or do you want... No, no, please. Let's take them based on interest.
Let, let's have, John O's question. That's great that he's here. So he says, does the existence of medallion, or he asks, does the existence of medallion architecture rely on the continued use of ETL processes?
Put another way, is an alternative architecture like data-centric combined with the use of AI to build applications going to be a factor? I have to say, I'm not quite sure I understand the question, but not Me. Same.
Do you want to take it, Piethein? I think the layering, in my view, will always be there, and this isn't new. Right.
I also explained in the beginning of the book that we come from traditional, classic enterprise data warehouses. The layering is there for good reasons, and that part won't disappear. You could question, of course, could you do it virtually, in real time?
What will AI at some point mean for the layering?
Can you skip a certain part? I don't know. But things for sure will change.
Maybe that's my view on the answer on that question. I mean, your current job with the agents, with the AI agents, is I think, answering the second part of John's question. Yes.
Which is effectively... I see agents as like mini applications, you know. Those are mini pieces of code, and they could pull data from a medallion architecture, but they could very well pull data straight from an application or a source system. If you need the most accurate data, yeah, for sure you will go directly to the source system, right? You will call an API endpoint and not consume data from the medallion architecture.
But maybe for asynchronous types of use cases, you could very well consider using that data. So again, that comes with lots of nuances. But that part has not really been described very well in the book.
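The trade-off just described can be sketched as a tiny routing rule in Python: an agent that needs the latest state calls the source system directly, while an agent that tolerates some delay reads from the medallion layers. The function name, the `max_staleness` parameter and the refresh interval are hypothetical and not taken from the book.

```python
from datetime import timedelta


def choose_data_source(max_staleness: timedelta,
                       medallion_refresh_interval: timedelta) -> str:
    """Pick where an agent should read from.

    max_staleness: how old the data may be for this use case.
    medallion_refresh_interval: how often the medallion layers are
    refreshed (an assumed, platform-specific value).
    """
    if max_staleness < medallion_refresh_interval:
        # The lakehouse copy may be older than the caller tolerates,
        # so go to the operational system, for example via its API.
        return "source_system_api"
    # Asynchronous or analytical use: the curated copy is fresh enough.
    return "medallion"
```

For example, with an hourly refresh, a use case that tolerates only a few seconds of staleness is routed to the source system, while one that tolerates a day is routed to the medallion data.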
I had to limit myself and conform to a maximum of 300 pages. Yeah. Yeah.
300. Actually, well, that was the other conversation we had just before everyone else joined, which is that if you actually wanted to address everything in medallion, you'd be writing maybe a 600 page book. Yes. Yeah.
And that's not practical for O'Reilly to publish. Yes. No, no.
There are so many things that would also... I guess it's a challenge to write a book on tech that does not become obsolete very fast. And I guess the longer it is, the harder that gets.
But there's a question from Tia, hope I'm pronouncing it correctly again. As a data product owner in an org standing up a medallion, we were forced to pull from bronze because silver wasn't there. Then they rolled out silver and they changed the names of things to business friendly names.
Two years into building out our data product meant we had a huge fit-for-purpose store. Refactoring to silver, would it be worth it? Do you copy the question, Piethein?
It's a difficult question again, but... Yeah, it is. Sorry to hear this, first. Yeah, I hope things get easier for you, but again, please look at the guidance and try to follow the best practices and recommendations there as much as you can.
But using bronze as a layer for direct input for data products, I'm personally no fan of, because you are tightly coupled again to the source system structures, the applications you will find on the left as input. So I'm much more in favor of decoupling things. So again, having an extra layer and then pulling the data out from the silver layer.
And on data products, again, also what I've written in the book, there are different types of data products. So I distinguish there are more the operational aligned data products. Um, I think Silver is a perfect candidate for that.
And you have the more analytical kind of data products, where data is combined, enriched and curated, and the gold layer is likely the candidate for that.
Yeah, I think Tia's question is a great one. I would pitch in here to say that if I was working with Tia, I would be trying to focus on collaboration across the organization versus being strict around medallion and gold, silver, bronze. Just because, you know, it feels like the challenge is less a challenge of adhering to an architecture and much more a challenge of getting business value out to stakeholders.
At least maybe I'm reading between the lines here, but that's the feeling I get. Yeah. And this is also the dilemma.
It's the dilemma I also started with in the first chapter: there's this continuous pressure from the business to deliver fast. And on the other hand, we would like to properly design and document our models, do it nicely, decouple, et cetera.
And yeah, those two don't go hand in hand. And therefore, often I see teams take shortcuts to facilitate the business with their high demands. And following up on that, Piethein, there is a question from Floris, which really like expands on what you're just saying.
How would you cope with organizational, especially C-level and board-level, challenges in applying a medallion architecture? How would you sell it? I know that it sometimes is seen as too simplistic, just marketing.
It is, in a way, but what I learned over time is that it really works. So on that level, I think you could speak abstractly about different layers and concerns. You could develop personas and argue that, well, stewards, for instance, work more within silver, maybe gold, while the business analysts would look at gold, for instance.
So it's business friendly labels, right? Bronze, silver, gold. It's a replacement for all of these different naming conventions we had for the classic layers, like the staging environment, landing zone, curated layer, integration layer.
It's more business friendly. That's exactly the other question that I had, right? Because there's what, that I saw in the text that I wanted to ask.
Um, where is it?
Um, yes, that's from Heni. I'm a strong advocate of the architecture pattern, but I'm trying to understand the growing interest in the medallion architecture and how it differs from conventional data warehouse layers, such as the staging layer, foundational layer, and data modeling. And so perhaps expand a little bit on what you were actually already saying, Piethein.
This is an excellent question. Honestly, it's really no different than what we did before with enterprise data warehousing or how we would manage in a data lake. It's, it's more that these vendors promote the usage of these business friendly labels.
And they work with the executives, on a management level. And from a distance it kind of seems that it solves it all, because now we have business friendly labels, and these concerns are nicely aligned. But in practice, it's no different than the really hard work you had to put into building a fantastic data warehouse, for instance.
Yes. The way you, please Martin, go ahead. I was gonna jump in there because what I've seen happen in practice is when a certain vendor comes in and says, no, you know what?
Your data architects are doing it all wrong. They've got these staging layers and these things, it should all be medallion, bronze, silver, gold, right? Let us send our consultants in to take over from you.
And, you know, at least in the specific example I'm thinking of, I thought that was very disrespectful to the people in the organization. I was, by the way, an external consultant, right? So, you know, adjudicating in a way across the top, but that particular vendor, I wasn't too happy with. We also combine now. What I see, for instance, in my organization: we have many different medallion architectures.
So we use the business friendly labels abstractly to describe the concerns and what happens. And then underneath, there are the more physical layers, and they have different names. There's a raw zone, a curated zone, an enriched zone, an integrated zone, you name it.
And we leave that up to the technical engineers, how to describe those concerns more at the physical level. Yes. So I see us getting close to the hour. Again, time passed faster than I expected.
I mean, obviously this is a technical discussion, and I was thinking that we would have time to discuss more things than we have already discussed. But time is up. I mean... Gee, before you call time, Ole, there is one question that I'd like to have a little go at, which is actually not really a medallion question, but it's from Juan Sequeda, who says: urgent versus important, short term versus long term.
This is the big dilemma: how to put in the right incentive structures. How do we get people to eat their vegetables and go to the gym? And I was going to be suggesting, you know, an answer to that.
A, you know, a sort of little habits type of approach. Yeah. And the idea that, you know, you don't get to lose 20 kilos overnight.
You get to lose 20 kilos if every day you increase or, you know, practice good habits, good eating habits, good exercise habits, et cetera, and almost every day you start again. And so, you know, I think it's the answer to a lot of the questions, like a generic answer, to say: okay, you don't solve your medallion problem by just throwing it all away and starting again and saying, now we're gonna go on a new fitness diet.
Right? You solve the problems by little habits every day. Try to get better at explaining and get better at the discipline of how you do things.
Getting better at thinking about how it's gonna adapt to the future.
At least that was gonna be my pitch to try and close off this debate. Thank you so much. Agreed.
Thank you, all of you. There's a bit of delay. So sorry, I'm not cutting you off, Piethein, this is just a bit of delay.
Thank you, Juan, for that last comment. Great one. And thank you, Martin, for waiting to join so late in the night where you are.
That's very, very friendly of you. Thank you very much for, for the detailed questions. And obviously, Piethein, thank you very much for your time being with us here today and answering questions.
I hope it got heated enough. If not, people are welcome to continue online. John O, please follow up with your question.
It's nice to hear from you again. And everyone in the chat, you're always welcome to reach out. We're on LinkedIn, Medium, Substack, both Piethein and I, Martin also obviously.
So Piethein, thank you so much. Thanks, Ole, for organizing this, big thanks. Sure.
Thank you. Thank you both of you and everyone on the call. Thank you.
Cheers. Thank you very much. Cheers.
Bye-bye. Cheers, take care.