Summary
- Demonstrates how Actian adds context and visibility to KYC data.
- Catalogs and enriches customer information for better understanding.
- Supports trusted data governance and effective data management.
- Enables smarter decisions and stronger KYC outcomes.
Chapters
Good morning, good afternoon, good evening, everyone. Welcome. Welcome to the webinar.
We're going to give a few minutes for folks to join. I see some people in the waiting room jumping in, so we will get going shortly. Just wanted to say a quick good morning, good afternoon, good evening, wherever you are joining us from.
We were just giving it about probably less than one minute before we kick things off here. As I mentioned earlier, we've got a few folks kind of joining late, and I wanted to make sure everyone had a chance to see the webinar from the beginning for those on the call or those joining late and watching the recording. Obviously, you've got the recording, so you can see all the information.
We'll give it about five more seconds here. I see one more person joining. All right.
Great. What do you think, Betty? You good to go?
We ready? Let's do it. All right.
Let's do it. Again, welcome everyone. Thank you so much for joining our webinar.
It's nice to see a good group of folks here on the participant panel. We're excited to have you. Before I kick things, or before I hand things off, I wanted to, again, welcome everyone to the session, cover a few of the agenda items, and then we'll get this thing going.
I think we're scheduled for about an hour. For those on the phone, it shouldn't take that long. I think we've got about 45 minutes worth of content.
And then, depending on questions, we should get you out of here early. So again, thanks for joining. I'm going to cover a couple of housekeeping items, just some general things about Zoom, and the webinar series.
We'll do some quick introductions, and then we'll hand the floor over to our presenter, Betty, who will jump into our data intelligence platform to give a little bit of an overview and a demo. We have a few calls to action for those on the attendee list, and then we'll cover any Q&A items. All right.
Sounds good. Thanks, Betty. So real quick, before I hand the floor to Betty, just a couple of housekeeping items for those that are new to Zoom.
I know there's so many different platforms out there. Just want to let you know that all of you are muted. You can only hear my wonderful voice, and Betty's here in a minute.
So if you have any questions, at the bottom of your window, you should see a More button. It's a circle with three dots on it. If you click that, you should be able to open up a Q&A window.
If you have any questions during the webinar, please type your question in here, and I've got a team standing by that should be able to address your questions. At the end, we can also answer any unanswered questions. We can answer them live.
As you probably heard, this session is being recorded. We will share the recording with you after the session, so you can come back and watch. Again, we're going to run about 45 minutes, just depending on questions and answers.
Should have plenty of time to get you all out of here early. All the resources, not only this recording but also the slides and any additional materials, will be emailed after this session. And then if you need any type of technical support, there is a chat function at the bottom as well.
Please feel free to send your chat or your question or your comment in there for any technical support items, and I will try to address those. You can also see my email address. If anything pops up, please feel free to email me.
But Q&A, please make sure you ask your questions in the Q&A and use the chat for any technical items. So those were all the housekeeping items. On to the next slide.
So I wanted to introduce the team that I have standing by for today. Our amazing speaker, Betty Wang, will be taking the floor shortly. We will hand the floor to her.
She'll take you through the material. Also manning our Q&A window, I've got John Dorney, who is a senior sales engineer. I also have Scarlet Webbe.
She is a principal solution architect available to answer any of your questions that you might have. So with that, I will hand the floor to Betty.
Betty, take it away. Amazing. Thanks, John.
So before we dive into today's demo, I just want to take a few minutes, kind of zoom out, and really frame where this session fits into the broader webinar series. We've really designed this program to be a progressive journey, where each session builds on the last, and it's all anchored around a central use case around know your customer. So in the previous session, we explored the AI analyst, which we describe as our activation layer.
This is where insights are surfaced in natural language through an agent user interface. And then today, we're really turning our focus on the data intelligence platform, which is that context layer. This provides that semantic foundation and makes that activation layer possible.
And then in our next session, led by Scarlet, we're going to look at the data observability layer, the trust aspect, really ensuring that the data that's powering your AI systems is reliable and continuously monitored. The reason I'm kind of going over this, right, is that all three of these sessions really reflect the core pillars of the Actian platform and a full end-to-end solution for building AI-ready data. So with that said, let's just quickly review what was covered in the previous webinar session with the AI analyst.
The use case was around a retail bank's compliance team needing to investigate a mismatch: customers rated low-risk who unexpectedly exceeded suspicious activity thresholds. This traditionally requires a lot of manual work, but in the demo, we saw how the AI analyst could identify not just those top KYC transaction mismatches, but also pinpoint specific risk channels, whether that was wire or Zelle, and provide plain-language explanations in minutes. And the key takeaway from that was that the AI analyst really grounds your answers in your governed business context, definitions, and KPIs, so that it ensures trusted and audit-ready reasoning for those enterprise analytics.
So with that said, now that you've seen how you can activate your data with the AI analyst, we're going to start peeling back the layers and understand that context engine behind your AI-ready data with the data intelligence platform. And before we dive into the demo, I'm just going to give a quick overview of how the platform works. So on the bottom here, we start by automatically connecting to all of your existing data sources and harvesting the metadata.
That feeds directly into what we call the core metadata management layer, and that covers all aspects from catalog to glossary, lineage, governance, data products, and data observability. All of that is grounded on our federated knowledge graph, which connects your physical and semantic assets into one single intelligent network so that the users and your AI systems always have full context behind their data. And from an access perspective, users are interfacing with the platform through two main UIs, the Explorer for those business users that are discovering and consuming data, and then the data stewards are interacting in the studio interface for their governing and curating activities.
Now, the end result of this is one unified platform that takes data from that raw source to a trusted and governed catalog.
So to continue from the use case from the last session, right? The demo today will showcase how we actually contextualize that data around KYC customer onboarding process, and the ultimate goal is to detect potential money laundering activities. So really think of KYC compliance as a bank really needing to verify who you are, what your background is, before allowing you to open an account.
And the focus will be on the five steps of this process that we see here. Account registration, identity verification, address verification, risk assessment, in which you're cross-checking customer data against external watch lists and sanctions to ultimately come up with an assigned risk factor. And then the final step with approval and onboarding.
So for the demo agenda here on the right-hand side, we're really going to start in the catalog to search for this KYC process and just get an understanding of that semantic layer. We'll then explore the data sets associated with each step of the process, how the data moves through a medallion architecture, and any data quality issues along the way throughout the pipeline.
And then finally, we're going to cover one of our product's newest releases with the data steward agent. That's really designed to help automate that enrichment of the metadata. So with that said, let's jump into the product demonstration.
Here, I've landed on the homepage of the Explorer UI. There are several navigation paths to find the relevant data that you need. You can search by keywords in the search bar, look within our data marketplace, search by specific catalogs, or for ease of access, I've actually created a topic specifically around the KYC onboarding process.
So this will take us to a curated search results page with all the process steps associated with KYC onboarding. When we click into that specific process, we see several details here, right? First and foremost is the description, which defines what KYC means.
It's the mandatory process financial institutions use to verify a customer's identity and assess their risk. I also see some feature properties here on the left highlighting regulations it's complying with, approval status, approval date, and the business domain it belongs to. I can also expand the context here on the left, so from a governance perspective, really important to know who's the owner and who can I turn to for questions.
Scrolling down here, I see a glossary hierarchy. Here I can see each of the five steps or sub-processes of that KYC onboarding in that graphic in a parent-child format. And if I wanted to explore a specific sub-step, like account registration, I can simply click on that definition and understand what are the process steps associated with that.
So this is saying, this is where customers provide personal information and agree to the terms. So again, the key takeaway from a view like this is that we can get clear ownership and governance and really fast navigation around our KYC-relevant assets. So now that we've kind of gotten a conceptual understanding of the KYC process, let's explore the actual implementations by navigating to the implementations tab.
So what this tab is doing is it's really serving as the bridge between the business language and the technical data assets. It's answering questions like, "Where is this business term actually used in our data?" And this is what makes the context layer actionable rather than just a static reference document. So we see here that there are three datasets tied to the KYC process, which is really representative of the medallion architecture.
We have the raw customer as the bronze layer, the staging customer as the silver layer, and then bank customer as the gold layer. Within each of these steps in the relationships down here, you can also see which of the five sub-processes of the KYC compliance process corresponds to each dataset. For example, in the raw customer, if we click on this, you'll notice from the definition the three first steps of account registration, identity verification, and address verification are tied to this one.
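As a rough illustration of what the implementations tab is answering, the links between glossary terms and data sets can be pictured as a small two-way lookup. This is only a sketch: the identifiers `raw_customer`, `staging_customer`, and `bank_customer`, and the functions `datasets_for` and `terms_for`, are hypothetical names chosen for the example, and only the three raw-customer links mentioned in the demo are included.

```python
# Hypothetical (glossary term, dataset) links; only the raw-customer links
# are stated in the demo, so no others are invented here.
IMPLEMENTATIONS = [
    ("Account Registration", "raw_customer"),
    ("Identity Verification", "raw_customer"),
    ("Address Verification", "raw_customer"),
]

# Medallion layer of each dataset, as described in the demo.
LAYERS = {
    "raw_customer": "bronze",
    "staging_customer": "silver",
    "bank_customer": "gold",
}


def datasets_for(term: str) -> list[str]:
    """Answer 'where is this business term used in our data?'"""
    return [dataset for t, dataset in IMPLEMENTATIONS if t == term]


def terms_for(dataset: str) -> list[str]:
    """Answer the reverse question: which process steps does a dataset serve?"""
    return [term for term, d in IMPLEMENTATIONS if d == dataset]


if __name__ == "__main__":
    print(datasets_for("Identity Verification"), LAYERS["raw_customer"])
    print(terms_for("raw_customer"))
```

The point of keeping both directions is that a business user starts from the term, while a data engineer usually starts from the data set.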
So again, coming to this details page, it's surfacing key information, including the feature properties here on the left, and of course, as we went through that description to understand which sub-steps are associated with this dataset. So in total, right, this dataset represents that raw, unprocessed data collection from individuals as you're moving through the early stages of the registration process. And this will serve as the foundational input for the downstream KYC activities.
Now, remember, our ultimate goal is to be able to monitor the customer risk throughout the entire data pipeline to detect potential fraudulent transactions. So in order to do that, let's take a look at the end-to-end data flow with the lineage tab here. Initially, when I click on this, I just see the first two data layers, but if I right-click on this plus button and hit expand all levels of lineage, I can see the full end-to-end pipeline.
So as we go through this, right, in this very first stage, as customers are going through the account registration, identity verification, address validation, all of that data is coming into this bronze layer of raw customer. At this stage, we've embedded certain validation rules like checking to see if the address is valid, making sure credit card information, Social Security information, are all in the correct format. And then as we move on to the silver layer with the staging customer, this is where we've cross-checked the external watchlist to identify individuals with high risk.
This could be people in political office or individuals that have high media attention, and we really represent that risk factor with a field that we've labeled as risk factor to assign a certain risk score to individuals. And actually, through our data observability functionality, we can bin data for individuals who are beyond a certain acceptable risk factor and only push those that are acceptable into the gold layer. And then finally, within the gold layer of bank customer in the transactions table, this is where we're really combining customer information with account and transaction information to flag potential fraudulent activity.
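The bronze-layer format checks and the silver-layer binning described above could look something like the following. This is a minimal sketch under stated assumptions: the Social Security number pattern, the `MAX_ACCEPTABLE_RISK` cut-off of 20, and the `Customer` fields are all made up for illustration, since the demo does not show the actual rule definitions.

```python
import re
from dataclasses import dataclass

# Assumed format and threshold; the real rules are not shown in the demo.
SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")
MAX_ACCEPTABLE_RISK = 20


@dataclass
class Customer:
    customer_id: str
    ssn: str
    address: str
    risk_factor: int = 0


def passes_bronze_checks(customer: Customer) -> bool:
    """Format checks on raw records: SSN shape and a non-empty address."""
    return bool(SSN_PATTERN.match(customer.ssn)) and bool(customer.address.strip())


def split_silver(customers: list[Customer]) -> tuple[list[Customer], list[Customer]]:
    """Split staging records into those promoted to gold and those binned."""
    promoted, binned = [], []
    for customer in customers:
        if customer.risk_factor <= MAX_ACCEPTABLE_RISK:
            promoted.append(customer)
        else:
            binned.append(customer)
    return promoted, binned
```

Binned records are set aside rather than deleted, so they remain available for investigation.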
So this is the end-to-end lineage that allows you to not just trace where a piece of data is coming from and where it flows, but also to do root cause analysis when an issue occurs and to see the full impact on the data pipeline before you make any changes to it. Now, within the lineage graph, each data set also has a summary-level data quality indicator, as you can see on the upper right-hand side. This is what provides an assessment of how each data set is measuring against its defined validation rules.
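Impact analysis over a lineage graph is, at its core, a graph traversal. The sketch below walks a toy lineage map breadth-first to list everything downstream of a data set. The edge list and the names `LINEAGE` and `downstream_impact` are hypothetical and only loosely mirror the pipeline in the demo; they do not represent how the platform stores lineage.

```python
from collections import deque

# Hypothetical lineage edges: upstream dataset -> direct downstream datasets.
LINEAGE = {
    "raw_customer": ["staging_customer"],
    "staging_customer": ["bank_customer"],
    "bank_customer": ["transactions_rollup"],
    "transactions_rollup": [],
}


def downstream_impact(graph: dict[str, list[str]], start: str) -> list[str]:
    """Return every dataset reachable downstream of `start`, in BFS order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


if __name__ == "__main__":
    # Everything that could be affected by a change to the silver layer.
    print(downstream_impact(LINEAGE, "staging_customer"))
```

Running the same traversal over reversed edges gives the upstream view used for root cause analysis.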
This allows teams to identify potential data quality issues across the entire pipeline without ever having to leave this lineage view. So let's jump into the transactions roll-up table that I've opened up in a new tab to really see how we can leverage data quality rules to flag potential money laundering transactions. So again, in this details page, as the description indicates, it's containing refined transaction data linked to account and customer information.
Now specifically, the logic here is combining that risk factor that we did in the silver layer assigned to each individual with their transaction amounts to flag potential instances of fraud. We can see the results of this validation check, along with other data quality rules in this data quality tab that I've navigated to. In this view, the top section here summarizes the results by category, like consistency, completeness, timeliness, while the bottom section here actually shows the live data quality checks run against this data set, along with their past failed outcomes.
So we can see here with the money laundering check, this has failed, meaning that a significant number of rows did not meet the specified condition, and this warrants further investigation. So if we click on the money laundering data quality rule here, this will take us directly into our data observability investigator view. This view is an interactive workspace where users can explore, diagnose, and even kick off remediation for data quality issues once anomalies are detected.
This also enables root cause analysis, allowing us to drill into anomalies that have failed those data quality checks and identify exactly which fields or segments or records are driving the issue. So here I filtered by the expectation violation, and we can see here, as this example shows in this first record, that the individual has a high risk factor. This is everything before the dash, even though their transaction amount of 1,282 is lower than 10,000.
So what this is doing is flagging any individual with a high risk factor, which we've defined as above 12, or anyone with a low risk factor, 12 and below, who has a transaction amount over 10,000. I can click on the specific value and identify the exact record that contains this value and be able to start troubleshooting from there. We're going to do a much deeper dive into the full data observability platform, like I mentioned, in the next session with Scarlet.
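The flagging rule just described can be written down in a few lines. This is a sketch of the logic only: the thresholds of 12 and 10,000 come from the demo, while the function name `is_flagged` and the sample risk values are assumptions made for the example.

```python
HIGH_RISK_THRESHOLD = 12         # risk factor above this counts as high risk
LARGE_AMOUNT_THRESHOLD = 10_000  # transaction amount above this is large


def is_flagged(risk_factor: int, amount: float) -> bool:
    """Flag high-risk individuals, or low-risk individuals with a large amount."""
    high_risk = risk_factor > HIGH_RISK_THRESHOLD
    low_risk_large_amount = (not high_risk) and amount > LARGE_AMOUNT_THRESHOLD
    return high_risk or low_risk_large_amount


if __name__ == "__main__":
    # A high-risk individual is flagged even with a small amount such as 1,282.
    print(is_flagged(risk_factor=15, amount=1_282))
    # A low-risk individual is flagged only when the amount exceeds 10,000.
    print(is_flagged(risk_factor=5, amount=15_000))
    print(is_flagged(risk_factor=5, amount=1_282))
```

Note that both boundary values, a risk factor of exactly 12 and an amount of exactly 10,000, fall on the unflagged side of the rule as stated.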
So if you're interested in understanding everything under the hood with data observability, make sure to join the next session. But from a data intelligence perspective, right, seeing this data quality score directly in this lineage view is what turns this lineage tab from a kind of a static data map to a living risk map, right? It lets you instantly judge not just where the data came from with this end-to-end lineage, but understand how you can trust your data each step of the way and what to troubleshoot if a problem occurred at any step in the data pipeline.
So now that we've seen how the data intelligence platform can surface context, lineage, and data quality across your KYC pipeline, let's take a look at what keeps all of that information accurate and governed at scale with the data steward agent. This piece of the product was launched just earlier this month, and you can see here that the data steward agent is embedded directly in the platform, and it's designed to really automate some of those time-consuming stewardship tasks like metadata documentation, enrichment, and governance. It can perform tasks like updating asset documentation, assigning ownership, and recommending classifications.
So the goal here is not necessarily to replace the human data stewards, but really to, again, reduce some manual stewardship work so that your team can focus on those higher-level governance tasks. You can see that I've already switched to the studio interface in the data intelligence platform, which again is the UI for the data stewards. I'm here filtered on the banking and finance services catalog, which is what houses the data sets we use for the KYC use case.
I have the agent panel open on the right-hand side here. And so as we dive into this, right, one of the most important tasks for data stewards is really making sure that catalog items are at full completion for end user consumption. So I can see here that the staging customer that's part of the silver layer is only at 25%.
So we can start there. I'm going to ask the data steward, "Why is the staging customer data set only at 25% completion?" Now as it's thinking and coming up with the answer, it's walking you through the logic. First, it's explaining how completion is defined, based on the actual meta model configuration.
So I can see here that there are four criteria that need to be complete in order for a data set to have completion. It has to have a description, it has to have contacts, it needs to have a security classification, and it needs to have a definition here. So each of those weighs exactly equally at 25%, which is why the staging customer here is only at 25%.
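The completion arithmetic above is simple enough to sketch: four equally weighted criteria, 25% each. The field names and the `completion_percent` function below are hypothetical, and the example assumes the one criterion already satisfied is the definition, which the demo implies but does not state outright.

```python
# The four criteria named in the demo, each worth an equal share.
REQUIRED_FIELDS = ("description", "contacts", "security_classification", "definition")


def completion_percent(asset: dict) -> int:
    """Return the share of required fields that are filled in, as a percentage."""
    filled = sum(1 for field_name in REQUIRED_FIELDS if asset.get(field_name))
    return 100 * filled // len(REQUIRED_FIELDS)


if __name__ == "__main__":
    staging_customer = {"definition": "Staged customer records"}
    print(completion_percent(staging_customer))  # one of four criteria filled

    staging_customer.update(
        description="Silver-layer customer data",
        contacts=["JD"],
        security_classification="Restricted",
    )
    print(completion_percent(staging_customer))  # all four criteria filled
```

Empty strings and empty lists count as missing here, which matches the intuition that a blank description does not complete an asset.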
So now, not only does it provide an answer, the agent also intelligently suggests next steps that we can take to close the gap. So here I have some options, whether I want to write a description, set a security classification, or assign contacts one at a time, or I can fix all those gaps at the same time. So let's go ahead and click on fix all of those gaps.
So you see that it actually switched over to what we call plan mode because we're actually updating multiple components. So what this is doing is allowing you to review a proposed set of changes before any of the writes actually happen. This is really useful for large or complex operations.
I'm going to click into the data set so that we can see the changes happening live. And now you can see the logic that it's going through. It's not hallucinating, it's not just making up random answers to be able to reach 100%.
It's really looking at peer data sets so that it can mirror description context, looking at contacts for the ownership piece in a similar manner.
And so you can see here that a peer data set had JD as a data architect. And so that's what it's suggesting as the contact. So I'm going to go ahead and hit execute plan.
And as that's making the proposed changes, we'll be able to see that once those changes are made, the completion level will increase respectively with the changes that are made. So soon here, we should be able to see a description that is generated. I'm going to hit approve.
We can see the description that the data steward agent generated. It's looking at where the data set fits in the overall pipeline, the schema, certain fields that are in it, and also its security classification, which is marked as restricted. And then under the people section, we have added JD as the data architect, and you can see that the completion level has reached 100%.
So this was a very simple example, but even from this example, we can see how the data steward agent really helps to automate the metadata governance, truly reducing that manual stewardship work to keep your data consistently trusted and AI ready. So we've covered a lot in a short amount of time.
So just kind of wrapping up and summarizing what we saw in today's session. We really looked at how we establish governance context. All of the data that was feeding that AI analyst, we were able to map that KYC compliance process and the data sets that powered them, again, setting that foundation for trusted and governed data.
We were able to trace the data end-to-end, following the full lineage from raw customer onboarding through the transaction outputs, and looking at the lineage and data quality issues across every step. And then finally, we just saw how we automate metadata at scale with the data steward agent, eliminating some of that manual enrichment work and keeping your catalog accurate and AI ready. So I'm going to turn this back to JJ, and we're going to kind of wrap up with some housekeeping items before we open it up to Q&A.
Really good. Great stuff, Betty. That data steward agent, such a game changer, I feel, in the industry.
It was a great example that you took us through. I was fortunate enough to be able to speak to a number of customers and prospects over the last week, and I think it's really resonating, so great job. So it's been a quiet group.
Not a lot of questions yet in the Q&A window. But I did want to cover a couple of calls to action or housekeeping items to wrap up. I think it was a great session, a lot of good content that Betty covered.
But if anybody has additional questions, we have the team here that can answer, again, any of your questions or take you deeper into the technology. Myself, Betty, we've got John, and Scarlet. Here's our contact information.
Again, if you've got any questions post-event, feel free to reach out to us. If you'd like a deeper dive, I know we covered things at a very high level, but we're here to help. So want to make sure you've got our information.
I'm going to leave this slide up for just another second in case anybody needs to write down or copy our contact information. Next slide, Betty. And then you heard Betty mention it earlier, but this is a series.
We've designed the webinars to flow in a series. We do have the next one coming up in July. If you haven't registered, please feel free to go out to our website.
We've got the registration link listed here. And you can hear Scarlet present on the next solution, which would be our data observability solution, where she'll wrap up this series and tie in this entire know-your-customer process, and tie that off with our data observability solution. Okay.
And then questions. Let's see, Betty. Why don't we go through, and I think we had a question come into the chat window.
But the questions around the knowledge graph, how does the knowledge graph actually work, and how is it different from any other traditional data catalogs that might be out there? Yeah, that's a good question, and I think this is really what helps bring Actian into what we call more of a next-generation data catalog. So, the federated knowledge graph does a number of things, right?
But the most important thing, I think I mentioned during the presentation, is it really creates those intelligent relationships between your data assets. So it's automatically capturing that context, enriching the metadata, and really just delivering those relevant search results and recommendations. So, while a traditional data catalog is essentially kind of an inventory, it tells you what type of data exists.
The knowledge graph goes one step further by understanding how things relate, so that as you're looking at the context of your data, everything is connected into one network. Good stuff, Betty. And again, for those on the line, if you've got any questions, feel free to chat them into the window and I can ask them live of Betty.
I know we muted you all so that we could not interrupt. Betty, so how long does it take to actually get the data steward agent up and running? The question is around does it need to be trained?
Does it need to be trained on the data in our environment? What does that process look like? I don't know if you can take us through that a little bit.
Yeah. It seems like it would take a long time, but it actually doesn't, right? The agent is really designed to be embedded directly within the catalog workflows, and it's grounded in your instance of the data intelligence platform.
So, tying it back to the first question, right? It's drawing from that existing federated knowledge graph, that semantic layer, meaning that it's pulling the context from what's already in your catalog. So it doesn't really need a separate training process, right?
It's already grounded in how you've set up your meta model and the context from there. Yeah. As I mentioned earlier, of course, I'm a little biased, right?
But I kind of feel like this is going to be a bit of a game changer for the industry. I know our first iteration of the data steward agent, there's a lot of capabilities there, and I know you just barely touched on some of them, and we'll continue to iterate on that. But I really do feel like it's powerful.
And to your point, it doesn't take long to get it up and running and for you to get that immediate value. Couple more questions here. Another one around the data steward agent and permissioning or permissions.
I think access control is always important, and security is always important. So, I guess the question is around is there role-based access control around the data steward agent? And so I guess, is there control around which users can have access to certain features?
Yeah, so I didn't necessarily go into the administration tab for time's sake in this demo. But for the super user who has access to that admin view, access to the data steward agent is a feature toggle that is controlled in that view. So, the way it works is a user is assigned to a specific group, and that group has permissions which designate which functions and catalogs that group has access to.
I kind of showed in the demo when we went to the data steward agent piece, that it was centered on the banking and financial services catalog. So, a specific data steward agent only runs when the user has a specific catalog selected in that studio view. So if you switch between different catalogs within the instance, it'll have a specific data steward for each respective catalog.
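The access model described in this answer, a feature toggle plus group permissions scoped to catalogs, can be sketched as a simple check. Everything here is illustrative: the `Group` class, the `data_steward_agent` feature key, and the `can_use_agent` function are hypothetical names and not the platform's actual administration API.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """A user group with the features and catalogs it is permitted to use."""
    name: str
    features: set[str] = field(default_factory=set)
    catalogs: set[str] = field(default_factory=set)


def can_use_agent(group: Group, selected_catalog: str) -> bool:
    """The agent is available only if the feature is enabled for the group
    and the group has access to the catalog currently selected."""
    return "data_steward_agent" in group.features and selected_catalog in group.catalogs


if __name__ == "__main__":
    stewards = Group(
        name="finance-stewards",
        features={"data_steward_agent"},
        catalogs={"Banking and Financial Services"},
    )
    print(can_use_agent(stewards, "Banking and Financial Services"))
    print(can_use_agent(stewards, "Marketing"))
```

Tying the check to the selected catalog reflects the point made above that the agent runs per catalog rather than across the whole instance.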
Awesome. I can see a question came in. Great demo, by the way, Betty.
Looks like John just actually answered the question, but maybe you can dig into it a little bit more live. The question is around data lineage, and how is the lineage generated? Is it generated automatically by the platform, or do users need to manually stitch/map lineage information after data cataloging?
Yeah, good question. Everything that you saw in the lineage view is created automatically through our data connectors. Happy to send out the documentation with the link to our connectors page to see all the sources we support.
But the way it works, right, is you would harvest the metadata from your data sources, and then we also have connectors for your ETL tools to be able to show what sort of transformations have happened between the different hops of your data pipeline. And so all of that is done automatically, and that lineage view is populated based on those. Nice job.
That was aligned with what JD actually typed in, too. So we're doing good, team. This question was actually a two-part question.
So the second part was, does the platform provide a data quality agent similar to the steward agent to profile and cleanse? So I think Scarlet will get into a little bit more of that in observability in the next session. We have a lot of AI capabilities- Yeah ...
around data observability. So there's agentic functions that we're developing there to be able to handle everything kind of from a functionality standpoint for data observability and quality there. So- Yeah ...
maybe a little bit more of a- A teaser ... preview. Yeah.
Yeah. Sneak. For the next session.
There you go. Way to tee it up. Make sure people show up and attend Scarlet's session here in a couple of weeks.
And again, if you haven't done so already, please head out and register for the last session in the series. You're welcome. There's a "Thanks, team" for answering the questions live.
Any other questions before we let people go? I think we're ahead of schedule, Betty. I think we said we'd be about 45 minutes total, and I think we're going to be about 40 minutes.
But do you want to give anybody else on the line a chance to ask a question? Again, great job highlighting the know your customer process, digging into that data steward agent. As a final reminder, you will be getting a follow-up email with links to information, resources, as well as the link to this recording.
So that will be coming out later today. Give it about another 30 seconds, a minute. It looks like we had a quiet group this time around, but good job.
Again, thanks everyone for participating and attending this session. Feel free to register and sign up for our next webinar. Again, that's July 16th, and we're going to focus in on our Actian Data Observability and close out this three-part series, tie it all up with a nice little bow.
Again, thank you everyone for your participation, and we look forward to seeing you in the next session. Again, great job, Betty. Thanks everyone.
Have a great rest- Thank you, everyone ... of your day, evening.