Data Intelligence

How to Deploy Effective Data Governance, Adopted by Everyone

Actian Corporation

October 8, 2020


It is no secret that the recent global pandemic has completely changed the way people do business. In March 2020, France was placed in total lockdown, and many companies had to adapt to new ways of working, whether by introducing remote work, changing production schedules, or even shutting down operations completely. The health crisis forced companies to ask themselves: how are we going to deal with the financial, technological, and compliance risks brought on by COVID-19?

At Big Data Paris 2020, we had the pleasure of attending the roundtable “How to deploy effective data governance that is adopted by everyone,” led by Christina Poirson, CDO of Société Générale; Chafika Chettaoui, CDO of the Suez Group; and Elias Baltassis, Partner & Director, Data & Analytics, at the Boston Consulting Group. In this roughly 35-minute roundtable, the three data experts explained the importance of data governance and best practices for implementing it.

First Steps to Implementing Data Governance

The impact of COVID-19 underlined the essential challenge of knowing, collecting, preserving, and transmitting quality data. So, has the lockdown pushed companies to put a data governance strategy in place? This first question, answered by Elias Baltassis, confirms the strong increase in demand for data governance in France:

“The lockdown certainly accelerated the demand for implementing data governance! It was already a topic for the majority of these companies long before the lockdown, but the health crisis has, of course, pushed companies to strengthen the security and reliability of their data assets”.

So, what is the objective of data governance? And where do you start? Elias explains that the first thing to do is to diagnose the data assets in the enterprise, and identify the sticking points: “Identify the places in the enterprise where there is a loss of value because of poor data quality. This is important because data governance can easily drift into a bureaucratic exercise, which is why you should always keep as a “guide” the value created for the organization, which translates into better data accessibility, better quality, etc”.

Once the diagnosis is done and the sources of value are identified, Elias explains that there are four methodological steps to follow:

  1. Know your company’s data, its structure, and who owns it (via a data glossary, for example).
  2. Set up a data policy targeted at the points of friction.
  3. Choose the right tool to deploy these policies across the enterprise.
  4. Establish a data culture within the organization, starting with hiring data-driven people, such as Chief Data Officers.

The above methodology is therefore essential before starting any data governance project, which, according to Elias, can be implemented fairly quickly: “Data governance can be implemented quickly, but increasing data quality will take more or less time depending on the complexity of the business; a company working with one country will take less time than a company working with several countries in Europe, for example”.

The Role of the Chief Data Officer in the Implementation of Data Governance

Christina Poirson explains that for her and Société Générale, data governance played a very important role during this exceptional period: “Fortunately, we had data governance in place that ensured the quality and protection of the data we provide to our professional and private customers during lockdown. We realized the importance of the pairing of digitization and data, which has been vital not only for our work during the crisis, but also for tomorrow’s business.”

So how did a company as large and long-established as Société Générale, with thousands of data records, implement a new data governance strategy? Christina explains that data at Société Générale is not a recent topic. Indeed, since its very beginnings, the firm has been asking clients for information in order to advise them on, for example, what type of loan to put in place.

However, Société Générale’s CDO tells us that today, with digitization, there are new types, formats, and volumes of data. She confirms what Elias Baltassis said just before: “The implementation of a data office and Chief Data Officers was one of the first steps in the company’s data strategy. Our role is to maximize the value of data while respecting the protection of sensitive data, which is very important in the banking world.”

To do this, Christina explains that Société Générale supports this strategy throughout the data’s life cycle: from its creation to its end, including its qualification, protection, use, anonymization and destruction.

For her part, Chafika Chettaoui, CDO of the Suez Group, explains that she sees herself as a conductor:

“What Suez lacked was a conductor to orchestrate how IT can meet the business objectives. Today, with the increasing amount of data, the CDO has to be the conductor for the IT, business, and even HR and communication departments, because data and digital transformation is above all a human transformation. They have to be the organizer who ensures the quality and accessibility of the data, as well as its analysis”.

But above all, the two speakers agreed that a CDO has two main missions:

  • Implementing standards for data quality and protection.
  • Breaking down data silos by creating a common language around data, or data fluency, in all parts of the enterprise.

Data Acculturation in the Enterprise

We don’t need to remind you that building a data culture within the company is essential to creating value from its data. Christina Poirson explains that data acculturation was quite a long process for Société Générale:

“To implement data culture, we went through what we call “data mapping” at all levels of the managerial structure, from top management to employees. We also had to set up coaching sessions, coding training and other dedicated awareness sessions. We have also made available all the SG Group’s use cases in a catalog of ideas so that every company in the group can be inspired: it’s a library of use cases that is there to inspire people”. 

She goes on to explain that they have other ways of acculturating employees at Société Générale:

  • Setting up a library of algorithms to reuse what has already been set up.
  • Implementing specific tools to assess whether the data complies with the regulations.
  • Making data accessible through a group data catalog.

Data acculturation was therefore not an easy task for Société Générale. But Christina remains positive and offers a little analogy: “Data is like water, CIOs are the pipes, and the businesses make demands related to water. There must therefore be a symbiosis between IT, the CIO, and the business departments”.

Chafika Chettaoui adds: “Indeed, it is imperative to work with and for the business. Our job is to appoint people in the business units who will be responsible for their data. We have to give responsibility back to everyone: IT for building the house, and the business for what we put inside. By putting this balance in place, there are back-and-forth exchanges, and it is not just IT’s responsibility”.

Roles in Data Governance

Although roles and responsibilities vary from company to company, in this roundtable discussion, the two Chief Data Officers explain how role allocation works within their data strategy.

At Société Générale, they have fairly strong convictions. First of all, they set up “Data Owners”, members of the business who are responsible for:

  • The definition of their data.
  • Their main uses.
  • Their associated quality level.

On the other hand, if a data user wants to use that data, they don’t have to ask permission from the Data Owner; otherwise the whole system breaks down. As a result, Société Générale has put measures in place to check compliance with rules and regulations without involving the Data Owner each time: “the data at Société Générale belongs either to the customer or to the whole company, but not to a particular BU or department. We manage to create value from the moment the data is shared”.

At Suez, Chafika Chettaoui confirms that they have the same definition of Data Owner, but she adds another role, that of the Data Steward. At Suez, the Data Steward is the one who is on site, making sure that the data flows work.

She explains: “The Data Steward is someone who will coordinate the so-called Data Producers (the people who collect the data in the systems), make sure they are well trained and understand data quality, and be the one who owns the dashboards and analyzes whether there are any inconsistencies. It’s someone in the business, but with a real affinity for data and an understanding of the data and its value”.

What are the Key Best Practices for Implementing Data Governance?

What should never be forgotten when implementing data governance is that data does not belong to one part of the organization; it must be shared across the whole of it. It is therefore imperative to standardize the data. To do this, Christina Poirson explains the importance of a data dictionary: “by adding a data dictionary that includes the name, definition, data owner, and quality level of the data, you already have a first brick in your governance”.
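
As an illustration of that first brick, here is a minimal sketch in Python. The four fields come from the quote above; the record type and the example entry are assumptions, not anything described at the roundtable:

```python
from dataclasses import dataclass

@dataclass
class DataDictionaryEntry:
    name: str           # business name of the data element
    definition: str     # agreed business definition
    data_owner: str     # business role accountable for the data
    quality_level: str  # e.g., "gold", "silver", "bronze"

# A hypothetical first entry in the dictionary.
customer_iban = DataDictionaryEntry(
    name="customer_iban",
    definition="International Bank Account Number of the customer",
    data_owner="Retail Banking - Account Management",
    quality_level="gold",
)
```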

As mentioned above, the second good practice in data governance is to define roles and responsibilities around data. In addition to a Data Owner or Data Steward, it is essential to define a series of roles to accompany each key stage in the use of the data. Some of these roles can be:

  • Data Quality Manager
  • Data Protection Analyst
  • Data Usage Analyst
  • Data Analyst
  • Data Scientist
  • Data Protection Officer

As a final best-practice recommendation for successful data governance, Christina Poirson explains the importance of knowing your data environment, as well as your risk appetite and the rules of each business unit, industry, and service, to truly facilitate data accessibility and compliance.

…and the Mistakes to Avoid?

To end the roundtable, Chafika Chettaoui talks about the mistakes to avoid in order to succeed in data governance. According to her, you must not start with technology. Even if technology and expertise are, of course, essential to implementing data governance, it is very important to focus first on the company’s culture.

She states: “Establishing a data culture with training is essential. On the one hand, we have to break the myth that data and AI are “magical”, and on the other, break the myth of the “intuition” of some experts, by explaining the importance of data in the enterprise. The cultural aspect is key, and at any level of the organization.”


About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.
Data Intelligence

Retail 4.0: How Monoprix Migrated to the Cloud

Actian Corporation

October 1, 2020


An omni-channel leader with a presence in more than 250 cities in France, the French retail chain Monoprix offers varied, innovative products and services every day with a single objective in mind: “making the good and the beautiful accessible to all”.

The company’s stores combine food retailing with hardware, clothing, household items, and gifts. To give some stats, in 2020 Monoprix has:

  • Nearly 590 stores in France.
  • 22,000 employees.
  • Approximately 100 stores internationally.
  • 800,000 customers per day.
  • 466 local partner producers.

With close to one million customers in-store and more than 1.5 million users on their website each day, it’s no secret that Monoprix has hundreds of thousands of data points to manage. Whether it’s from loyalty cards, customer receipts, or online delivery orders, the company has to manage a huge amount of data in a variety of formats.

At Big Data Paris 2020, Damien Pichot, Director of Operations and Merchandise Flows at Monoprix, shared with us the company’s journey in implementing a data-driven culture thanks to the Cloud.

Big Data at Monoprix

In response to the amount of data coming into Monoprix’s systems every day, the company had implemented various technologies: an on-premises data warehouse for structured data and a cloud data lake used to manage the semi-structured data coming from its websites. In addition, a lot of data also comes from partners and service providers through information exchanges and acquisitions.

Despite the fact that the architecture had been working well and fulfilling its role for many years, it was beginning to show its limitations and weaknesses:

“To illustrate: every Monday, our teams gather and analyze the profits made and everything that happened the previous week. As time went by, we realized that each week the number of users logging in to our information systems was increasing, and we were reaching saturation. In fact, some of our employees would have to get up at 5 a.m. to launch their queries, only to retrieve the results late that morning or early that afternoon,” explains Damien Pichot.

Another negative aspect of the company’s IT structure concerned its business users, and more specifically the marketing users. They were beginning to develop analytical environments outside the control of the IT department, creating what is known as “shadow IT”. The Monoprix data teams were understandably dissatisfied, because they had no oversight of the business projects.

“The IT department at Monoprix was therefore not at the service of the business and did not meet its expectations.”

After consulting the IT committee, Monoprix decided to terminate the contract for its large on-premises system. The new solution had to answer four questions:

  • Does the solution allow business users to be autonomous?
  • Is the service efficient / resilient?
  • Will the solution lower operating costs?
  • Will users have access to a single platform that will enable them to extract all the data from the data warehouse and the data lake in order to meet business, decision-making, machine learning and data science challenges?

After careful consideration, Monoprix finally decided to migrate everything to the Cloud. “Even if we had opted for another big on-premises solution, we would have faced the same problems at some point. We might have gained two years, but that’s not viable in the long term.”

Monoprix’s Journey to the Cloud

Monoprix started this new adventure in the Cloud with Snowflake. Only a few months after its implementation, Monoprix quickly saw improvements compared to the previous architecture. Snowflake was also able to meet its needs in terms of data sharing, something the company had struggled with before, as well as robustness and data availability.

The First Steps

During the conference, Damien Pichot explained that it was not easy to convince Monoprix teams that a migration to the Cloud was secure. They were reassured by the implementation of Snowflake, which provides a level of security as high as that of the pharmaceutical and banking industries in the United States.

To give themselves all the means possible to make this project a success, Monoprix decided to create a dedicated team, made up of numerous people such as project managers, integrators, managers of specific applications, etc. The official launch of the project took place in March 2019.

Damien Pichot organized a kickoff, inviting all the company’s business lines: “I didn’t want it to be an IT project but a company project; I am convinced that this project should be driven by the business lines and for the business lines”.

Damien tells us that the day before the project was launched, he had trouble sleeping! Indeed, Monoprix is the first French company to embark on the total migration of an on-premises data warehouse to the Cloud.

The Challenges of the Project

The migration was done iteratively, owing to a heavy technical legacy: everything needed to be reintegrated into a technology as modern as Snowflake. Indeed, Monoprix had big problems with its connectors: “We thought at the time that the hardest part of the project would be automating the data processing. But the most complicated part was replatforming our ETLs in a new environment. So we went from a 12-month project to a 15-month project.”

The New Architecture

Monoprix therefore handles two types of data: structured and semi-structured. The structured data comes from the classic data warehouse, which contains data from the supply chain, marketing, customer transactions, etc.; the semi-structured data comes from website-related events. All of this now converges via ETLs into a single platform running on Azure with Snowflake. “Thanks to this new architecture in the Cloud, we can access the data we want via different applications,” says Damien.
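
For readers curious what querying such a converged platform looks like in practice, here is a minimal sketch using Snowflake’s Python connector. The account identifier, warehouse, database, and table names are invented for illustration; Monoprix’s actual setup was not disclosed:

```python
import snowflake.connector  # pip install snowflake-connector-python

# Hypothetical connection details for a Snowflake account hosted on Azure.
conn = snowflake.connector.connect(
    user="analyst",
    password="***",
    account="xy12345.west-europe.azure",
    warehouse="ANALYTICS_WH",
    database="RETAIL",
    schema="SALES",
)
cur = conn.cursor()
try:
    # Warehouse and lake data sit behind one SQL surface, so a single query
    # can serve the Monday-morning analysis described earlier.
    cur.execute(
        "SELECT store_id, SUM(amount) AS weekly_sales "
        "FROM transactions "
        "WHERE sale_date >= DATEADD(day, -7, CURRENT_DATE) "
        "GROUP BY store_id "
        "ORDER BY weekly_sales DESC"
    )
    for store_id, weekly_sales in cur.fetchall():
        print(store_id, weekly_sales)
finally:
    cur.close()
    conn.close()
```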

Conclusion: Monoprix is Better in the Cloud

Since May 2020, Monoprix has been managing its data in the Cloud, and it’s been “life-changing”. On the business side, there is less latency, and queries that used to take hours now take minutes (and employees are finally sleeping in the morning!). Business analyses are also much deeper, with the possibility of running analyses over five years, which was not possible with the old IT structure. But the most important point is the ability to easily share data with the firm’s partners and service providers.

Damien proudly explains: “With the old structure, our marketing teams took 15 days to prepare the data and had to send thousands of files to our providers; today they connect in a few minutes and fetch the data themselves, without us having to intervene. That alone is a direct ROI.”

Data Management

A Two-Part Approach to Track & Trace in Urban and Suburban Settings

Actian Corporation

September 25, 2020


As businesses work to reopen and students return, in person, to schools nationwide, track and trace is a critical component to mitigate potential resurgences in COVID-19 cases. The more densely populated a city, the harder track and trace will be. However, getting urban and suburban areas moving again is critical to jumpstarting the US economy.

Track and trace is a tall order for several reasons, including public resistance – to wearing masks, to social distancing, and to allowing third-party surveillance of location and identity through mobile devices. Furthermore, the questions of how to prioritize where track and trace should be concentrated, and how and when to leverage automation versus human investigators, must be answered.

A combination of IoT, community-oriented, psychology-based messaging, and big data analytics could be the key to successfully reopening the economy.

With a Little Help From My Friends

No matter how many tracers are pressed into service, the task will be insurmountable without a behavior change, driven by a change in mindset. There are current and past programs that provide lessons, guidance and proof that behavior can be changed. For example, anti-smoking campaigns were very segmented, with different messaging for underage smokers versus adults. In both cases, the point was to use social expectations as a means of changing behavior. In the case of anti-drunk driving, ad campaigns were focused on peer pressure and provided a positive behavior recommendation – a designated driver – alongside the consequences of driving drunk.

These programs have been successful, but it has taken years – and in some cases, decades – for society to adapt to new norms and expectations. The key is to leverage as many different channels as possible, using as many unique messages and mechanisms as possible, combined with big data analytics to determine what is and isn’t working.

Mayors, city councils, county commissions and other government institutions will need to look to local organizations, such as travel and tourism, public health information, licensing and inspection, 311 and city services portals to help spread messaging on social distancing and mask-wearing. They’ll also need to leverage existing and new communication channels, such as anonymous tip lines through designated Snapchat and Instagram profiles, online chatbots, and toll-free phone calls, to ensure the public’s cooperation, as well as their willingness to opt-in to automated programs for track and trace.

This is exactly where big data analytics can step in, if the proper data science and underlying data warehouse can be put into place quickly. In most cases, the rapid ingestion of data from multiple city services – their web click streams, transactions, conversations, and other communications that will need analysis – is best stood up on a cloud platform.

IoT: The Front-End Piece to the Puzzle

For those individuals who opt in to having their mobile devices tracked, local governments will need to work with wireless service providers to track location and (masked) identity as a device moves from one cell tower to the next, while also using cell towers to triangulate position. Many wireless service providers already have network analytics around per-call measurement data that can be repurposed for this, but cell tower information is only part of the equation.

Identifying locations, location conditions, and who is at a given location based on their cell phone is the other, larger part of the equation. However, cities have some IoT infrastructure in place that can be leveraged in support of track and trace programs. For instance, existing video surveillance cameras can be used to evaluate social distancing and mask-wearing through facial and movement detection algorithms. This data can also be used in conjunction with network analytics data to review footage of those who have come in contact with someone who has COVID, confirming whether they had a mask on and for how long, and how far inside the 6-foot perimeter they were.

Adding inexpensive IoT solutions can further improve outcome analysis for opt-ins to track and trace and stop-the-spread compliance programs. For example, on trains and buses, seat separation can be monitored either by pressure monitors or LEDs with RF signaling to a local Raspberry Pi in the bus or train car. This then maps out how densely packed seating is and whether or not people are adhering to seating guidelines. This could also be applied to classrooms, movie theaters, and taxi/ridesharing services.
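
As a rough sketch of the kind of logic such a local Raspberry Pi could run, the snippet below checks occupancy and adjacent-seat spacing. The data model (one boolean reading per seat, seats named "row-col") is an assumption for illustration, not part of any described deployment:

```python
# Hypothetical seat readings aggregated from pressure sensors or LED/RF units.

def occupancy_rate(readings: dict[str, bool]) -> float:
    """Share of seats currently occupied."""
    return sum(readings.values()) / len(readings)

def adjacent_violations(readings: dict[str, bool]) -> list[tuple[str, str]]:
    """Pairs of directly adjacent occupied seats (seats named 'row-col')."""
    violations = []
    for seat, occupied in readings.items():
        if not occupied:
            continue
        row, col = seat.split("-")
        neighbor = f"{row}-{int(col) + 1}"
        if readings.get(neighbor):
            violations.append((seat, neighbor))
    return violations

readings = {"1-1": True, "1-2": True, "1-3": False, "2-1": False, "2-2": True}
print(occupancy_rate(readings))       # 0.6
print(adjacent_violations(readings))  # [('1-1', '1-2')]
```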

The problem is pressing, but the use of IoT devices and data gives us options and a way to act quickly. The combination of big data analytics and IoT to support comprehensive smart programs gives local governments the chance to get their cities, and their local economies, moving again while better avoiding potential resurgences of COVID-19 this fall.

Data Platform

Five Benefits of the Hybrid Cloud Approach

Actian Corporation

September 21, 2020


The ISO 17788 Cloud Computing Overview and Vocabulary standard defines a hybrid cloud as “a cloud deployment model using at least two different cloud deployment models.” The deployment models in question include private clouds, public clouds from multiple providers, and even legacy applications. Combining these deployment models creates the multi-hybrid cloud, a model that allows businesses to optimize their operations.

Competitive Advantage

When a business employs a private and public cloud, it can make careful decisions regarding what applications and services are deployed to each part of the architecture. Often the investment in services that differentiate the business from the competition is kept on the private cloud. Other services that need to be provided but are not part of the core business services can be deployed to the public cloud.

It is even likely that the cloud provider may offer these necessary services so the development team can focus on driving business value. Because it is easy to provision resources, the public cloud remains a good way to manage the resources needed for data storage.

Deployment to a Cloud Provider

There are many cloud service providers, but the top three in several assessments related to market share are Amazon Web Services (AWS), Microsoft Azure, and Google Cloud. A hybrid cloud solution for a business can include one or more providers. While all of the cloud providers offer similar basic services, they all have unique offerings that need to be reviewed for applicability to a business need. For example, GCP offers Cloud Interconnect, a dedicated connection option that bypasses the need for an Internet Service Provider. Cloud Interconnect allows Google to offer Service Level Agreements that include guaranteed uptimes as high as 99.99%.

Data Residency

Business needs for hybrid clouds include the performance of the network when accessing data and applications. There are two factors to consider: latency and throughput. Latency is the time taken for a data transaction request to complete a round trip between the sender and the receiver. One of the many options offered by the various cloud providers is the location of their servers; in the case of AWS, Azure, and Google Cloud, servers are spread throughout the world. The location of the data and applications affects the latency of serving that data, which is especially important for transactional database access. Each of the cloud service providers offers a simple web-based method for determining network latency.
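
The web-based tools differ per provider, but the underlying idea can be sketched in a few lines: time a small round trip to candidate regions and compare. The endpoint URLs below are placeholders, not real provider endpoints:

```python
import time
import urllib.request

def round_trip_ms(url: str, samples: int = 5) -> float:
    """Time several small HTTPS round trips and keep the best case."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        urllib.request.urlopen(url, timeout=10).read(0)
        timings.append((time.perf_counter() - start) * 1000)
    return min(timings)  # best case is less noisy than the mean

# Hypothetical per-region endpoints; substitute your provider's test URLs.
for region, url in {
    "eu-west": "https://eu-west.example.com/ping",
    "us-east": "https://us-east.example.com/ping",
}.items():
    print(region, f"{round_trip_ms(url):.1f} ms")
```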

Beyond performance, there are other issues tied to data residency. There are multiple categories of sensitive data, including personal data, trade-controlled data, and regulated data, and the location and movement of such data need to be understood, monitored, and managed. In a cloud solution, the questions become: what knowledge of and access to the customer’s data does the custodian (the cloud service provider, acting as processor) have, and who bears the risk of storing data that may contravene local laws? Answers to these questions are an important aspect of determining whether data is best stored in the cloud or on-premises.

Benefits of the Hybrid Approach

There are several benefits to the multi-hybrid cloud approach.

  • Flexibility – Businesses can repartition or deploy elements of a solution based on changing technology services or improved capabilities of one or more of the cloud service providers. Actian Data Platform supports AWS, Azure, Google Cloud and on-prem as deployment platforms. It uses the same database, data model, and ETL integration on-premises and in each of these cloud providers, thereby offering significant deployment flexibility.
  • Performance – Actian Data Platform takes full advantage of modern processors to maximize concurrency, parallelism, and resource utilization. In addition, businesses can quickly leverage new capabilities in their solutions no matter where they are available in the hybrid cloud.
  • Elasticity – The number of Compute Nodes in an Actian Data Platform cloud deployment can start as low as 4 nodes and automatically ramp up based on usage and demand, adding peak end-of-week or end-of-month capacity when needed. Thus, resources are allocated and paid for only if and when needed.
  • Consistency – Businesses can support continuous delivery of applications across the hybrid cloud leveraging common tools and processes.
  • Agility – Businesses can design and develop solutions in such a manner that where they are deployed across a hybrid cloud can be adjusted in a seamless manner.

Costs of the Hybrid Approach

Most cloud service providers promote the idea that infrastructure updates are applied automatically, so businesses do not need to manage them. However, automatic updates may cause issues with version-sensitive applications, so updates still need to be tracked. Beyond updates, there are potential concerns about managing multiple cloud providers, but the benefits outweigh the costs.

A multi-hybrid cloud solution, running an application architected for performance across the combined on-premises and cloud environment, provides the power to query and analyze business data at near real-time speed.

Actian Data Platform is a hybrid cloud data warehouse service designed to deliver high performance and scale at a fraction of the cost of alternative solutions. It is a hybrid platform that can be deployed on-premises as well as on multiple clouds, including AWS, Azure, and Google Cloud.

You can learn more about Actian here.

Data Analytics

Predictive Analytics Can Reduce Customer Churn, Optimize Marketing

Actian Corporation

September 17, 2020


Not all customers are equal – there are some your business just can’t afford to lose. Do you know who they are and what is required to retain them? With predictive analytics powered by Actian, you can do just that.

Companies have been using statistical modeling, data correlation, and behavioral forecasting to profile customers for many years. Unfortunately, traditional capabilities have been limited by the amount and types of data they could include in the analysis, leading to incomplete and inconclusive results. With Actian, you can increase the accuracy of your churn predictions by combining traditional transactional and account datasets with call center text logs, marketing campaign response data, competitive offers, social media, and many other sources to develop a truly holistic understanding of your customers’ buying behavior. Having all this data at hand enables your organization to apply predictive analytics far more effectively.
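
As a hypothetical sketch of this kind of churn modeling (the file name and column names are invented, and scikit-learn stands in for whatever tooling you pair with your analytics platform), the workflow might look like this:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Assumed: transactional, account, and call-center features already joined
# into one table, with a historical 'churned' label per customer.
df = pd.read_csv("customer_features.csv")
features = ["tenure_months", "orders_last_90d", "avg_basket",
            "support_tickets", "campaign_responses"]

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["churned"], test_size=0.2, random_state=0)

model = GradientBoostingClassifier().fit(X_train, y_train)
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))

# Churn scores for ranking which customers to focus retention efforts on.
df["churn_score"] = model.predict_proba(df[features])[:, 1]
```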

Aggregated customer profiling data can help you discover new classifications and customer segments and assign lifetime value and churn scores that clearly indicate which of your customers are most important to your business – and whom you can’t afford to lose. This information can be used to customize your customer experiences, provide enhanced support, and create programs to retain your high-value customers. Identify the forces that influence these customers’ buying behavior and use them to customize marketing efforts, provide personalized customer service, and even optimize your service supply chain. Predictive analytics is more than just a big data play; it is a critical business requirement.

Just as not all customers are equal, neither are all customer segments. Some groups of customers are high-value, purchasing your products and services repeatedly, ordering large quantities and generating large profit margins. Other customer segments are low-value, with larger customer acquisition costs, low-order volumes, few repeat purchases and low profitability due to price competition and discount demands.

If you want your company to grow and thrive, then you must focus your marketing and product planning efforts on developing high-value customer segments and potentially offload less-profitable segments. To accomplish this effectively, you must understand your customer segments and which customers align to these segments. Actian helps you do this by giving you the tools to aggregate all your customer data in one place, analyze it in real-time and make actionable insights available to your staff. These insights can help you improve customer satisfaction and loyalty, optimize supply chains, accurately price products, develop effective marketing campaigns and reduce the likelihood of your customers taking their business elsewhere.

Some level of customer churn is inevitable. Customers’ needs change and the offers from your competitors are enticing. Predictive analytics help you identify customer behavioral changes and market threats early, so you can make informed decisions about how you want to respond. Stop guessing with your marketing investments and focus your resources on the activities that will make your customers happy, increase their lifetime value to your company and generate sustainable results.

Actian Data Platform can help. Actian is a highly efficient data analytics database service that can process large amounts of data in near real-time by separating it into small chunks that are processed in parallel. What this means is you can perform customer-churn analysis behind the scenes to retain more customers. To learn more, visit www.actian.com/data-platform.

Data Intelligence

A Smart Data Catalog, a Must-Have for Data Leaders

Actian Corporation

August 26, 2020


The term “smart data catalog” has become a buzzword over the past few months. And when something is described as “smart,” most people automatically think, understandably, of a data catalog with Machine Learning capabilities.

We do not believe that a smart data catalog is reduced to having only ML features.

There are many different ways to be “smart”. This article is based on the talk that Guillaume Bodet gave at the Data Innovation Summit 2020: “Smart Data Catalogs, A Must-Have for Leaders”.

A Quick Definition of Data Catalog

We define a data catalog as being:

A detailed inventory of all data assets in an organization and their metadata, designed to help data professionals quickly find the most appropriate data for any analytical business purpose.

A data catalog is meant to serve many different end-users: data analysts, data stewards, data scientists, business analysts, and many more. All of these end-users have different expectations, needs, profiles, and ways of understanding data. As more and more people use and work with data, a data catalog must be smart for all of them.

What Does a “Data Asset” Refer to?

An asset, financially speaking, typically appears on the balance sheet with an estimate of its value. Data assets are just as important as other enterprise assets – in some cases even more so. The issue is that the value of data assets isn’t always known.

However, there are many ways to tap the value of your data. Enterprises can use their data’s value directly, for example by selling or trading it. Many organizations do this: they clean the data, structure it, and then sell it.

Enterprises can also derive value from their data indirectly. Data assets enable organizations to:

  • Innovate for new products/services.
  • Improve overall performance.
  • Improve product positioning.
  • Better understand markets/customers.
  • Increase operational efficiency.

High-performing enterprises are those that master their data landscape and exploit their data assets in every aspect of their activity.

The Hard Things About Data Catalogs

When your enterprise deals with data at scale, that usually means you are dealing with:

  • 100s of systems that store internal data (data warehouses, applications, data lakes, datastores, APIs, etc.) as well as external data from partners.
  • 1,000s of datasets, models, and visualizations (data assets) that are composed of thousands of fields.
  • And these fields contain millions of attributes (or metadata)!

Not to mention the hundreds of users using them…

This raises two different questions:

How can I build, maintain, and enforce the quality of my information for my end-users to trust in my catalog?

How can I quickly find data assets for specific use cases?

The answer is in smart data catalogs

We believe there are five core areas of “smartness” for a data catalog. It must be smart in its:

  • Design: The way users explore the catalog and consume information.
  • User Experience: How it adapts to different profiles.
  • Inventories: Provides a smart and automatic way of inventorying.
  • Search Engine: Supports the different expectations and gives smart suggestions.
  • Metadata Management: A catalog that tags and links data together through ML features.

Let’s go into detail for each of these areas:

A Smart Design

Knowledge Graph

A data catalog with smart design uses knowledge graphs rather than static ontologies (a way to classify information, most often built as a hierarchy). The problem with ontologies is that they are very hard to build and maintain, and usually only certain types of profiles truly understand the various classifications.

A knowledge graph, on the other hand, represents the different concepts in a data catalog and links objects together through semantic or static links. The idea of a knowledge graph is to build a network of objects and, more importantly, to create semantic or functional relationships between the different assets in your catalog.

Basically, a smart data catalog provides users with a way to find and understand related objects.
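
To make the idea concrete, here is a toy knowledge graph built with the networkx library; the object names and relation labels are invented, not drawn from any particular catalog:

```python
import networkx as nx  # pip install networkx

g = nx.DiGraph()
g.add_node("sales_dataset", type="dataset")
g.add_node("customer", type="glossary_term")
g.add_node("revenue_dashboard", type="visualization")

# Semantic link: the dataset is documented by a glossary term.
g.add_edge("sales_dataset", "customer", relation="documented_by")
# Lineage link: the dashboard is built from the dataset.
g.add_edge("revenue_dashboard", "sales_dataset", relation="built_from")

# "Find and understand related objects": everything one hop from a dataset.
for _, neighbor, attrs in g.edges("sales_dataset", data=True):
    print(neighbor, attrs["relation"])
print(list(g.predecessors("sales_dataset")))  # objects that depend on it
```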

Adaptive Metamodels

In a data catalog, users will find hundreds of different properties, many of which aren’t relevant to a given user. Typically, two types of information are managed:

  • Entities: Plain objects, glossary entries, definitions, models, policies, descriptions, etc.
  • Properties: The attributes that you put on the entities (any additional information such as create date, last updated date, etc.)

The design of the metamodel must serve the data consumer. It needs to be adapted to new business cases and must be simple enough to manage for users to maintain and understand it. Bonus points if it is easy to create new types of objects and sets of attributes!

Semantic Attributes

Most of the time, in a data catalog, the metamodel’s attributes are technical properties: generic types such as text, number, date, list of values, and so on. This information is necessary but not sufficient, because it says nothing about the semantics, or meaning, of an attribute. Semantics matter because, with that information, the catalog can adapt how it visualizes the attribute and improve its suggestions to users.
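
A minimal, purely illustrative metamodel fragment might attach a semantic type alongside the generic one, so the catalog knows an attribute is, say, an email address rather than arbitrary text (all names here are assumptions):

```python
from dataclasses import dataclass
from enum import Enum

class SemanticType(Enum):
    PLAIN_TEXT = "text"
    EMAIL = "email"
    DATE = "date"
    QUALITY_SCORE = "quality_score"  # e.g., rendered as a gauge, not a number

@dataclass
class Property:
    name: str
    generic_type: str          # text, number, date, list of values...
    semantic_type: SemanticType

@dataclass
class Entity:
    name: str
    kind: str                  # dataset, glossary entry, policy...
    properties: list[Property]

orders = Entity("orders", "dataset", [
    Property("owner_contact", "text", SemanticType.EMAIL),
    Property("quality", "number", SemanticType.QUALITY_SCORE),
])
```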

In conclusion, there is no one-size-fits-all design for a data catalog, and the design must evolve over time to support new data areas and use cases.

A Smart User Experience

As stated above, a data catalog holds a lot of information and end-users often struggle to find the information of interest to them. Expectations differ between profiles. A data scientist will expect statistical information, whereas a compliance officer expects information on various regulatory policies.

With a smart and adaptive user experience, a data catalog presents the most relevant information to each end-user. Information hierarchy and adjusted search results in a smart data catalog are based on:

  • Static Preferences: What is already known in the data catalog, such as whether the profile is focused more on data science, IT, etc.
  • Dynamic Profiling: To learn what the end-user usually searches, their interests, and how they’ve used the catalog in the past.

A Smart Inventory System

A data catalog’s adoption is built on trust, and trust can only come if its content is accurate. As the data landscape moves at a fast pace, the catalog must be connected to operational systems to maintain a first level of metadata on your data assets.

The catalog must synchronize its content with the actual content of the operational systems.

A catalog’s typical architecture includes scanners that scan your operational systems and bring in and synchronize information from various sources (big data, NoSQL, cloud, data warehouses, etc.). The idea is to have universal connectivity, so enterprises can scan any type of system automatically and place the results in the knowledge graph.

In the Actian Data Intelligence Platform, an automation layer brings information back from the systems to the catalog (a simplified sketch follows the list below). It can:

  • Update assets to reflect physical changes.
  • Detect deleted or moved assets.
  • Resolve links between objects.
  • Apply rules to select the appropriate set of attributes and define attribute values.
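
The sketch below shows only the bookkeeping part of such a synchronization, diffing what a scanner saw against what the catalog holds; the platform’s actual automation layer is, of course, richer than this:

```python
def diff_assets(scanned: dict[str, dict], catalog: dict[str, dict]):
    """Compare a scanner's snapshot with the catalog's current state."""
    added   = {k: v for k, v in scanned.items() if k not in catalog}
    removed = {k: v for k, v in catalog.items() if k not in scanned}
    changed = {k: v for k, v in scanned.items()
               if k in catalog and catalog[k] != v}
    return added, removed, changed

# Invented asset identifiers and metadata for illustration.
scanned = {"db.sales": {"columns": 12}, "db.orders": {"columns": 8}}
catalog = {"db.sales": {"columns": 11}, "db.legacy": {"columns": 4}}

added, removed, changed = diff_assets(scanned, catalog)
print(added)    # {'db.orders': {'columns': 8}}   -> new asset to inventory
print(removed)  # {'db.legacy': {'columns': 4}}   -> deleted or moved asset
print(changed)  # {'db.sales': {'columns': 12}}   -> physical change to reflect
```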

A Smart Search Engine

In a data catalog, the search engine is one of the most important features. We distinguish between two kinds of searches:

  • High Intent Search: The end-user already knows what they are looking for and has precise information for their query. They either already have the name of the dataset or already know where it is found. High intent searches are common among more data-savvy people.
  • Low Intent Search: The end-user isn’t exactly sure what they are looking for, but want to discover what they could use for their context. Searches are made through keywords and users expect the most relevant results to appear.

A smart data catalog must support both types of searches.

It must also provide smart filtering, a necessary complement to the user’s search experience (especially for low intent searches), allowing them to narrow their search results by excluding attributes that aren’t relevant. Just as on major platforms like Google, Booking.com, and Amazon, the filtering options must be adapted to the content of the search and the user’s profile so that the most pertinent results appear.
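
The mechanics behind such filtering can be sketched simply: count, per attribute, how the current result set breaks down, so the interface can propose the most useful facets. The result records here are invented:

```python
from collections import Counter

results = [
    {"name": "sales_2019", "domain": "retail", "format": "parquet"},
    {"name": "sales_2020", "domain": "retail", "format": "csv"},
    {"name": "hr_survey",  "domain": "hr",     "format": "csv"},
]

def facets(results: list[dict], attributes: list[str]) -> dict[str, Counter]:
    """Per-attribute value counts over the current result set."""
    return {attr: Counter(r[attr] for r in results) for attr in attributes}

print(facets(results, ["domain", "format"]))
# {'domain': Counter({'retail': 2, 'hr': 1}),
#  'format': Counter({'csv': 2, 'parquet': 1})}
```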

Smart Metadata Management

Smart metadata management is usually what we call the “augmented data catalog”: a catalog with machine learning capabilities that enable it to detect certain types of data, apply tags, or apply statistical rules to data.

One way to make metadata management smart is to apply data pattern recognition: identifying similar assets by relying on statistical algorithms and ML capabilities derived from other pattern recognition systems.

This data pattern recognition helps data stewards set their metadata (a toy detector is sketched after this list):

  • Identify duplicates and copy metadata.
  • Detect logical data types (emails, city, addresses, and so on).
  • Suggest attribute values (recognize documentation patterns to apply to a similar object or a new one).
  • Suggest links – semantic or lineage links.
  • Detect potential errors to help improve the catalog’s quality and relevance.
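
As a toy version of the logical data type detection mentioned above (real catalogs rely on statistical and ML detectors; this one just votes with regular expressions over a column sample):

```python
import re

# Illustrative detectors only: emails and five-digit (French-style) postal codes.
DETECTORS = {
    "email":       re.compile(r"^[^@\s]+@[^@\s]+\.[a-zA-Z]{2,}$"),
    "postal_code": re.compile(r"^\d{5}$"),
}

def detect_logical_type(sample: list[str], threshold: float = 0.9) -> str | None:
    """Return the first logical type matching at least `threshold` of the sample."""
    for type_name, pattern in DETECTORS.items():
        matches = sum(bool(pattern.match(value)) for value in sample)
        if matches / len(sample) >= threshold:
            return type_name
    return None

print(detect_logical_type(["ana@corp.com", "li@corp.com", "jo@corp.com"]))  # email
print(detect_logical_type(["75008", "69001", "13002"]))                     # postal_code
```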

It also helps data consumers find their assets. The idea is to use techniques derived from the content-based recommendations found in general-purpose catalogs: when a user has found something, the catalog suggests alternatives based both on their profile and on pattern recognition.

Start Your Data Catalog Journey

The Actian Data Intelligence Platform is a 100% cloud-based solution, available anywhere in the world in just a few clicks. By choosing the Actian Data Intelligence Platform Data Catalog, you control the costs associated with implementing and maintaining a data catalog while simplifying access for your teams.

Automatic feeding mechanisms, together with suggestion and correction algorithms, reduce the overall cost of the catalog and guarantee your data teams quality information in record time.

Data Platform

What to Do When Your Data Warehouse Chokes on Big Data

Actian Corporation

August 26, 2020


This may seem like an academic question, but it is increasingly becoming a reality for modern businesses. What do you do when you have millions of records with infinite width and depth, and your data warehouse chokes? Do you trim your data? Do you add more infrastructure capacity? Or do you need to look at a better data warehouse solution?

This problem is akin to owning an old car that makes a bunch of noises, smells terrible, and has wheels that rattle when you drive down the road. What do you do about it? Drive slower (that’s annoying), open the windows for some fresh air, and turn up the radio to drown out the sounds? Do you get some new tires, an air freshener, and a louder radio to mask the issues? Or do you consider buying a new car? Nostalgia may be a valid reason to keep a classic car, but it isn’t a good reason to keep a data warehouse around that isn’t meeting your business needs. Your business is evolving, and you need a data warehouse platform that will give you agility and the ability to move faster, not slow you down.

Where is the Infinite Data Problem Coming From?

The digital transformation of business processes and the rapid adoption of modern connected technology is what is driving the infinite data challenge. Instead of having a business run on a few core platforms with well-structured data schemas and transactional data growth curves that are relatively flat, modern businesses are embracing a wide variety of specialized systems and things like IoT and mobile devices that produce seemingly endless streams of data. This “measure everything” culture, combined with an uptick in data update volume from transactional systems, leads to a data profile where there can be an infinite number of rows of data and a seemingly infinite set of column attributes that are collected. This problem is a sign of success – it means that your organization understands the value of data and is actively working to collect the most diverse and expansive information footprint it can. You don’t want your data warehouse system to get in the way of that.

Why is Your Data Warehouse Choking on Big Data?

Most data warehouses were designed for on-premises infrastructure with fixed capacity and processing optimized for relational database schemas. That is what companies needed five years ago, but times have changed. Traditional data warehouses are choking because they aren’t architected for big data analytics in real time: they aren’t deployed on flexible, scalable cloud infrastructure configured for on-demand resource scaling, and they are trying to apply old-school scalar processing approaches to modern data structures. Given enough time, the system will get the job done, just not with the speed that most modern businesses demand.
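
The difference between scalar and vectorized processing is easy to demonstrate in miniature. This NumPy sketch illustrates the general principle only, not Actian’s engine: the same sum over ten million values, once value-by-value and once as a single columnar operation:

```python
import time
import numpy as np

amounts = np.random.rand(10_000_000)

start = time.perf_counter()
total = 0.0
for x in amounts:          # scalar: one value per iteration
    total += x
print("scalar:    ", time.perf_counter() - start, "s")

start = time.perf_counter()
total = amounts.sum()      # vectorized: the whole column in one SIMD-friendly call
print("vectorized:", time.perf_counter() - start, "s")
```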

A Modern Solution to The Big Data Problem

The Actian Data Platform is a modern solution to your big data problem. Designed for high-efficiency processing, deployed on scalable cloud infrastructure, and leveraging high-performance vectorized data processing, Actian can meet the big data challenges of today and give you plenty of room to grow. Yes, many other data warehouse solutions can be deployed in the cloud to give you access to compute and storage capacity, but in a side-by-side comparison, Actian’s approach out-performs the next best option through highly efficient hardware utilization, delivering higher performance at a much lower cloud cost. To learn more about how the Actian Data Platform delivers superior performance and can cut your cloud data warehouse bill in half, check out this video.

To learn more about how the Actian Data Platform can help you address your business’s big data problems, visit www.actian.com/data-platform.
