Data Intelligence

7 Lies of Data Catalogs #5: Not a Business Modeling Solution

Actian Corporation

July 9, 2021


The Data Catalog market has developed rapidly, and it is now deemed essential when deploying a data-driven strategy. Victim of its own success, this market has attracted a number of players from adjacent markets.

These players have rejigged their marketing positioning to present themselves as Data Catalog solutions.

The reality is that, while relatively weak on data catalog functionality itself, these companies attempt to convince buyers, with degrees of success proportional to their marketing budgets, that a Data Catalog is not merely a high-performance search tool for data teams but an integrated solution likely to address a host of other topics.

The purpose of this blog series is to deconstruct the pitch of these eleventh-hour Data Catalog vendors.

A Data Catalog is NOT a Business Modeling Solution

Some organizations, usually large ones, have invested for years in the modeling of their business processes and information architecture.

They have developed several layers of models (conceptual, logical, physical) and have put in place an organization that supports the maintenance and sharing of these models with specific populations (mostly business experts and IT people).

We do not question the value of these models. They play a key role in IS urbanization, schema blueprints, IS management, and regulatory compliance. But we seriously doubt that these modeling tools can provide a decent Data Catalog.

There is also a market phenomenon at play here: certain historical business modeling players are looking to widen the scope of their offer by positioning themselves on the Data Catalog market. After all, they do already manage a great deal of information on physical architecture, business classifications, glossaries, ontologies, information lineage, processes and roles, etc. But we can identify two major flaws in their approach.

The first flaw is inherent to modeling itself: by their nature, modeling tools produce top-down models to outline the information in an IS. However accurate it may be, a model remains a model: a simplified representation of reality.

Models are very useful communication tools in a variety of domains, but they are not an exact reflection of the day-to-day operational reality which, for us, is crucial to keeping the promises of a Data Catalog (enabling teams to find, understand, and know how to use the datasets).

The second flaw? It is not user-friendly.

A modeling tool is complex and handles a large number of abstract concepts, which entails a steep learning curve. It’s a tool for experts.

We could of course improve user-friendliness to open it up to a wider audience, but the built-in complexity of the information won’t go away.

Understanding the information provided by these tools requires a solid grasp of modeling principles (object classes, logical levels, nomenclatures, etc.). That is quite a challenge for data teams, and one that seems difficult to justify from an operational perspective.

The truth is, modeling tools that have been turned into Data Catalogs face serious adoption issues with the teams (they have to make huge efforts to learn how to use the tool, only to not find what they are looking for).

A prospective client recently presented us with a metamodel they had built and asked us whether it was possible to implement it in the Actian Data Intelligence Platform. Derived from their business models, the metamodel had several dozen object classes and thousands of attributes. The official answer to their question was yes (the platform metamodel is very flexible). But instead, we tried to dissuade them from taking that path: a metamodel that sophisticated ran the risk, in our opinion, of losing the end users and turning the Data Catalog project into a failure…

Should we Therefore Abandon Business Models When Putting a Data Catalog in Place? Absolutely Not.

It must, however, be remembered that business models are there to handle some issues, and the Data Catalog others. Some of the information contained within the models helps structure the catalog and enrich its content in very useful ways (for instance responsibilities, classifications, and of course business glossaries).

The best approach, in our view, is therefore to design the catalog metamodel by focusing exclusively on the added value for the data teams (always with the same underlying question: does this information help find, locate, understand, and correctly use the data?), and then to integrate the modeling tool with the Data Catalog in order to automate the supply of those elements of the metamodel already present in the business model.
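To make that last point concrete, here is a minimal sketch of such an integration, assuming (hypothetically) that the modeling tool can export its glossary as a CSV file and that the catalog accepts a JSON bulk-import file; real tools each have their own export formats and import APIs.

```python
"""Hedged sketch: feeding the Data Catalog from the business model.

Assumes a hypothetical CSV export from the modeling tool (columns:
term, definition, steward) and a hypothetical JSON bulk-import format
on the catalog side; real tools each expose their own interfaces.
"""
import csv
import json


def model_export_to_catalog_import(export_csv: str, import_json: str) -> None:
    """Map each glossary entry exported from the model to a catalog term."""
    terms = []
    with open(export_csv, newline="") as f:
        for row in csv.DictReader(f):
            terms.append({
                "name": row["term"],
                "definition": row["definition"],
                # Responsibilities defined in the model flow into the catalog.
                "steward": row["steward"],
            })
    with open(import_json, "w") as f:
        json.dump({"glossaryTerms": terms}, f, indent=2)


# Example run (file names are hypothetical):
# model_export_to_catalog_import("model_glossary_export.csv",
#                                "catalog_glossary_import.json")
```

Run on a schedule, a small bridge like this keeps glossary terms and responsibilities aligned with the business model, without asking data teams to ever open the modeling tool.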

Take Away

As useful and complete as they may be, business models are still just models: they are an imperfect reflection of the operational reality of the systems, and therefore struggle to deliver a useful Data Catalog.

Modeling tools, as well as business models, are too complex and too abstract to be adopted by data teams. Our recommendation is to define the metamodel of your catalog with a view to answering the questions of the data teams, and to feed some aspects of that metamodel from the business model.


About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.
Data Intelligence

7 Lies of Data Catalogs #4: Not a Query Solution

Actian Corporation

July 2, 2021


The Data Catalog market has developed rapidly, and it is now deemed essential when deploying a data-driven strategy. Victim of its own success, this market has attracted a number of players from adjacent markets.

These players have rejigged their marketing positioning to present themselves as Data Catalog solutions.

The reality is that, while relatively weak on data catalog functionality itself, these companies attempt to convince buyers, with degrees of success proportional to their marketing budgets, that a Data Catalog is not merely a high-performance search tool for data teams but an integrated solution likely to address a host of other topics.

The purpose of this blog series is to deconstruct the pitch of these eleventh-hour Data Catalog vendors.

A Data Catalog is NOT a Query Solution

Here is another oddity of the Data Catalog market: several vendors, whose initial aim was to allow users to query several data sources simultaneously, have “pivoted” towards a Data Catalog positioning.

There is a reason for them to pivot.

The emergence of Data Lakes and Big Data has cornered them in a technological cul-de-sac and weakened the market segment they were initially in.

A Data Lake is typically segmented into several layers. The “raw” layer ingests data without transformation, in more or less structured formats and in great quantities. A second layer, which we’ll call “clean”, contains roughly the same data but in normalized formats, after a dusting down. After that, there can be one or several “business” layers ready for use: a data warehouse and visualization tool for analytics, a Spark cluster for data science, a storage system for commercial distribution, etc. Within these layers, data is transformed, aggregated, and optimized for its intended uses, along with the tools supporting those uses (data visualization tools, notebooks, massive processing, etc.).
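As a toy illustration of this layering, here is a sketch using local files and an invented orders dataset; the paths, fields, and formats are assumptions for the example, and a real Data Lake would rely on distributed storage and processing engines rather than local CSV files.

```python
"""Toy sketch of the raw -> clean -> business layering described above."""
import csv
import json
from pathlib import Path

RAW = Path("datalake/raw")                      # data landed as-is
CLEAN = Path("datalake/clean")                  # normalized, filtered
BUSINESS = Path("datalake/business/analytics")  # aggregated, ready for use


def clean_orders() -> None:
    """Raw JSON lines -> normalized CSV: types coerced, bad rows dropped."""
    CLEAN.mkdir(parents=True, exist_ok=True)
    with open(RAW / "orders.jsonl") as src, \
         open(CLEAN / "orders.csv", "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(["order_id", "customer_id", "amount_eur"])
        for line in src:
            record = json.loads(line)
            try:
                writer.writerow([record["order_id"], record["customer_id"],
                                 round(float(record["amount"]), 2)])
            except (KeyError, ValueError):
                continue  # normalization happens on the way into "clean"


def build_revenue_by_customer() -> None:
    """Clean CSV -> business aggregate, ready for a dashboard."""
    BUSINESS.mkdir(parents=True, exist_ok=True)
    totals: dict[str, float] = {}
    with open(CLEAN / "orders.csv") as src:
        for row in csv.DictReader(src):
            totals[row["customer_id"]] = (
                totals.get(row["customer_id"], 0.0) + float(row["amount_eur"]))
    with open(BUSINESS / "revenue_by_customer.csv", "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(["customer_id", "revenue_eur"])
        writer.writerows(sorted(totals.items()))


# clean_orders(); build_revenue_by_customer()
```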

In This Landscape, a Universal Self-Service Query Tool isn’t Suitable.

It is of course possible to set up a SQL interpretation layer (like Hive) on top of the “clean” layer, but query execution remains a domain for specialists. The volumes of data are huge and rarely indexed.

Allowing users to define their own queries is very risky: on on-prem systems, they run the risk of collapsing the cluster by running a very expensive query, and on the Cloud, the bill could run very high indeed. Not to mention security and data sensitivity issues.

As for the “business” layers, they are generally coupled with more specialized solutions (such as a combination of Snowflake and Tableau for analytics) that offer very complete, secure tooling with great performance for self-service queries. With their market space melting away like snow in the sun, some multi-source query vendors have pivoted towards Data Catalogs.

Their pitch is now to convince customers that the ability to execute queries makes their solution the Rolls-Royce of Data Catalogs (in order to justify their six-figure pricing). We would invite you to think twice about it.

Take Away

In a modern data architecture, the capacity to execute queries from a Data Catalog isn’t just unnecessary, it’s also very risky (performance, cost, security, etc.).

Data teams already have their own tools to execute queries on data, and if they don’t, it may be a good idea to equip them. Bundling data access issues into the deployment of a catalog is the surest way to make it a long, costly, and disappointing project.

Data Intelligence

What is a Data Mesh?

Actian Corporation

June 28, 2021


In this new era of information, new terms keep appearing in organizations working with data: Data Management Platform, Data Quality, Data Lake, Data Warehouse, etc.

Behind each of these words, we find specificities, technical solutions, and more. Let’s decipher them.

Did you say “Data Mesh”? Don’t be embarrassed if you’re not familiar with the concept: the term only emerged in 2019, as a response to the growing number of data sources and the need for business agility.

The Data Mesh model is based on the principle of a decentralized or distributed architecture exploiting a literal mesh of data.

While a Data Lake can be thought of as a storage space for raw data, and the Data Warehouse is designed as a platform for collecting and analyzing heterogeneous data, Data Mesh responds to a different use case.

On paper, a Data Warehouse and Data Mesh have a lot in common, especially when it comes to their main purpose, which is to provide permanent, real-time access to the most up-to-date information possible. But Data Mesh goes further. The freshness of the information is only one element of the system.

Because it is part of a distributed model, Data Mesh is designed to serve each business line in your company with the key information that concerns it.

To meet this challenge, Data Mesh is based on the creation of data domains. 

The advantages? Your teams become more autonomous through local data management, your enterprise can decentralize and aggregate ever more data, and you gain more control over the overall organization of your data assets.

Data Mesh: Between Logic and Organization

If a Data Lake is ultimately a single reservoir for all your data, Data Mesh is the opposite. Forget the monolithic dimension of a Data Lake. Data is a living, evolving asset, a tool for understanding your market and your ecosystem and an instrument of knowledge and understanding. 

Therefore, in order to embrace the concept of meshing data, you need to think differently about data. How? By laying the foundations for a multi-domain organization. Each type of data has its own use, its own target, and its own exploitation. From there, all the business areas of your company will have to base their actions and decisions on the data that is genuinely useful to them in accomplishing their missions. The data used by marketing is not the same as the data used by sales or your production teams.

The implementation of a Data Catalog is therefore the essential prerequisite for the creation of a Data Mesh. Without a clear vision of your data’s governance, it will be difficult to initiate your company’s transformation. Data quality is also a central element. But ultimately, Data Mesh will help you by decentralizing the responsibility for data to the domain level and by delivering high-quality transformed data.

The Challenges

Does adopting Data Mesh seem impossible because the project seems both complex and technical? No cause for panic! Data Mesh, beyond its technicality, its requirements, and the rigor that goes with it, is above all a new paradigm. It must lead all the stakeholders in your organization to think of data as a product addressed to the business. 

In other words, by moving towards a Data Mesh model, the technical infrastructure of the data environment is centralized, while the operational management of the data is decentralized and entrusted to the business.

With Data Mesh, you create the conditions for data acculturation across all your teams, so that each employee can base their daily actions on data.

The Data Mesh Paradox

Data Mesh is meant to put data at the service of the business. This means that your teams must be able to access it easily, at any time, and manipulate the data to make it the basis of their daily activities.

But in order to preserve the quality of your data, or to guarantee compliance with governance rules, change management is crucial and the definition of each person’s prerogatives is decisive. When deploying Data Mesh, you will have to lay a sound foundation in the organization. 

On the one hand, free access to data for each employee (what we call functional governance). On the other hand, management and administration, in other words, technical governance in the hands of the Data teams.

Decompartmentalizing uses by compartmentalizing roles, that’s the paradox of Data Mesh.

Data Intelligence

7 Lies of Data Catalogs #3: Not a Compliance Solution

Actian Corporation

June 25, 2021


The Data Catalog market has developed rapidly, and it is now deemed essential when deploying a data-driven strategy. Victim of its own success, this market has attracted a number of players from adjacent markets.

These players have rejigged their marketing positioning to present themselves as Data Catalog solutions.

The reality is that, while relatively weak on data catalog functionality itself, these companies attempt to convince buyers, with degrees of success proportional to their marketing budgets, that a Data Catalog is not merely a high-performance search tool for data teams but an integrated solution likely to address a host of other topics.

The purpose of this blog series is to deconstruct the pitch of these eleventh-hour Data Catalog vendors.

A Data Catalog is NOT a Compliance Solution

As with governance, regulatory compliance is a crucial issue for any data-centric organization.

There is a plethora of data handling regulations spanning all sectors of activity and countries. On the subject of personal data alone, GDPR is mandatory across all EU countries, but each member state has a lot of wiggle room on how it’s implemented, and most have a large arsenal of legislation to complement, reinforce, and adapt it (Germany alone, for instance, has several dozen regulations related to personal data across different sectors of activity).

In the US, there are hundreds of laws and regulations across states and sectors of activity (with varying degrees of adherence). And here we are only referring to personal data… Rules and regulations also exist for financial data, medical data, biometric data, banking data, risk data, insurance data, etc. Put simply, every organization has some regulation it has to comply with.

So What Does Compliance Mean in this Case?

The vast majority of regulatory audits center on the following:

  • The ability to provide complete and up-to-date documentation on the procedures and controls put in place in order to meet the standards.
  • The ability to prove that the procedures described in the documentation are rolled out in the field.
  • The ability to supervise all the measures deployed with a view towards continuous improvement.

A Data Catalog is neither a procedure library nor an evidence consolidation system, and even less a process supervision solution.

It strikes us as obvious that assigning those responsibilities to a Data Catalog will make it considerably less simple to use (standards are too obscure for most people) and will jeopardize adoption by those most likely to benefit from it (data teams).

Should we Therefore Forget About Data Catalogs in our Quest for Compliance?

No, of course not. Again, in terms of compliance, it would be much wiser to use the Data Catalog for the data literacy of the teams, and to tag the data appropriately, enabling the teams to quickly identify any standard or procedure they need to adhere to before using the data. The Catalog can even help place the tags using a variety of approaches. It can, for example, automatically detect sensitive or personal data.

That said, even with the help of ML, detection will never work perfectly (the notion of “personal data” defined by GDPR, for instance, is much broader and harder to detect than North American PII). The Catalog’s ability to manage these tags is therefore critical.
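As an illustration, here is a minimal, rule-based sketch of this kind of detection; the patterns and tag names are assumptions for the example, and, as noted above, real detection combines rules with ML and still requires human curation of the resulting tags.

```python
"""Toy sketch: suggest 'personal data' tags for a column from sampled values."""
import re

# Illustrative patterns only; real detectors are far more sophisticated.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "iban": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
    "phone_fr": re.compile(r"\b0\d(?:[ .-]?\d{2}){4}\b"),
}


def suggest_tags(sample_values: list[str]) -> set[str]:
    """Return candidate tags for a column, based on a sample of its values."""
    tags = set()
    for value in sample_values:
        for tag, pattern in PATTERNS.items():
            if pattern.search(str(value)):
                tags.add(tag)
    return tags


print(suggest_tags(["alice@example.com", "06 12 34 56 78"]))
# -> {'email', 'phone_fr'} (a suggestion to review, not a verdict)
```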

Take Away

Regulatory compliance is above all a matter of documentation and proof and has no place in a Data Catalog.

However, the Data Catalog can help identify (more or less automatically) data that is subject to regulations. The Data Catalog plays a key role in the acculturation of the data teams with respect to the importance of regulations.

Data Intelligence

Data Lakes: The Benefits and Challenges

Actian Corporation

June 24, 2021


Data Lakes are increasingly used by companies for storing their enterprise data. However, storing large quantities of data in a variety of formats can lead to data chaos! Let’s take a look at the pros and cons of Data Lakes.

To understand what a Data Lake is, let’s imagine a reservoir or a water retention basin that runs alongside the road. Regardless of the type of data, its origin, its purpose, everything, absolutely everything, ends up in the Data Lake. Whether that data is raw or refined, cleansed or not, all of this information ends up in this single place where it isn’t modified, filtered, or deleted before being stored.

Sounds a bit messy, doesn’t it? But that’s the whole point of the Data Lake.

It’s because it frees the data from any preconceived idea that a Data Lake offers real added value. How? By allowing data teams to constantly reinvent the use and exploitation of your company’s data.

Improving the customer experience with a 360° analysis of the customer journey, detecting personas to refine marketing strategies, rapidly integrating new data flows from IoT: the Data Lake is an agile response to very concrete business problems.

Data Lakes: The Undeniable Advantages

The first advantage of a Data Lake is that it allows you to store considerable volumes of protean data. Structured or unstructured, from NoSQL databases or elsewhere… a Data Lake is, by nature, agnostic to the type of information it contains. It is precisely because it has no strict data exploitation scheme that the Data Lake is a valuable tool. And for good reason: none of the data it contains is ever altered, degraded, or distorted.

This is not the only advantage of a Data Lake. Indeed, since the data is raw, it can be analyzed on an ad-hoc basis.

The objective: to detect trends and generate reports according to business needs without it being a vast project involving another platform or another data repository. 

Thus, the data available in the Data Lake can be easily exploited, in real time, and allows you to place your company in a data-centric scheme, so that your decisions, your choices, and your strategies are never disconnected from the reality of your market or your activities.

Nevertheless, the raw data stored in your Data Lake can (and should!) be processed in a specific way, as part of a larger, more structured project. But your company’s data teams will know that they have, within a click’s reach, unrefined ore that can be put to use for further analysis.

The Challenges of a Data Lake

When you think of a Data Lake, poetic mental images come to mind: crystalline waves rippling in the wind of success that carries you away… but beware! A Data Lake carries the seeds of murky, muddy waters. This receptacle of data must be the object of particular attention, because without rigorous governance, the risk of sinking into a “chaos of data” is real.

In order for your Data Lake to reveal its full potential, you must have a clear and standardized vision of your data sources.

Controlling these flows is a first, essential safeguard to guarantee the proper exploitation of data that is heterogeneous by nature. You must also be very vigilant about data security and the organization of your data.

The fact that the data in a Data Lake is raw does not mean that it should not have a minimum structure to allow you to at least identify and find the data you want to exploit.

Finally, a Data Lake often requires significant computing power in order to refine masses of raw data in a very short time. This power must be adapted to the volume of data that will be hosted in the Data Lake.

Between method, rigor and organization, a Data Lake is a tool that serves your strategic decisions.

Data Intelligence

7 Lies of Data Catalogs #2: Not a Quality Solution

Actian Corporation

June 21, 2021


The Data Catalog market has developed rapidly, and it is now deemed essential when deploying a data-driven strategy. Victim of its own success, this market has attracted several players from adjacent markets.

These players have rejigged their marketing positioning to present themselves as Data Catalog solutions.

The reality is that, while relatively weak on data catalog functionality itself, these companies attempt to convince buyers, with degrees of success proportional to their marketing budgets, that a Data Catalog is not merely a high-performance search tool for data teams but an integrated solution likely to address a host of other topics.

The purpose of this blog series is to deconstruct the pitch of these eleventh-hour Data Catalog vendors.

A Data Catalog is NOT a Data Quality Management (DQM) Solution

We do not underestimate the importance of data quality in successfully delivering a data project, quite the contrary. It just seems absurd to us to put it in the hands of a solution which, by its very nature, cannot perform the controls at the right time.

Let us explain. There is a very elementary rule of quality control, a rule that applies in virtually any domain where quality is an issue, be it an industrial production chain, software development, or the cuisine of a 5-star restaurant: the sooner a problem is detected, the less it costs to correct.

To demonstrate the point: a car manufacturer does not wait until a new vehicle is fully built, when all the production costs have already been incurred and fixing a defect would cost the most, to test its battery. No. Each part is closely controlled, each step of the production is tested, defective parts are removed before ever being integrated into the production line, and the entire chain of production can be halted if quality issues are detected at any stage. Quality issues are corrected at the earliest possible stage of the production process, where corrections are the least costly and the most durable.

“In a modern data organization, data production rests on the same principles. We are dealing with an assembly chain whose aim is to deliver uses with high added value. Quality control and correction must happen at each step. The nature and level of the controls will depend on what the data is used for.”

If you are handling data, you obviously have at your disposal pipelines to feed your uses. These pipelines can involve dozens of steps – data acquisition, data cleaning, various transformations, mixing various data sources, etc.

In order to develop these pipelines, you probably have a number of technologies at play, anything from in-house scripts to costly ETLs and exotic middleware tools. It is within those pipelines that you need to insert and drive your quality controls, as early as possible, adapting them to what is at stake for the end product. Measuring data quality levels only at the end of the chain isn’t just absurd, it’s totally inefficient.
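To illustrate the principle, here is a minimal sketch of a quality gate inserted at the acquisition step of a pipeline; the record fields, checks, and thresholds are assumptions for the example, not a prescribed control set.

```python
"""Toy sketch: in-pipeline quality control, applied at the earliest step.

Records that fail a check are diverted for inspection instead of
propagating downstream, where fixing them would cost far more.
"""
from typing import Callable, Iterable, Iterator

Check = Callable[[dict], bool]


def quality_gate(records: Iterable[dict], checks: list[Check],
                 rejects: list[dict]) -> Iterator[dict]:
    """Yield records passing every check; divert the rest for inspection."""
    for record in records:
        if all(check(record) for check in checks):
            yield record
        else:
            rejects.append(record)


# Checks attached to the *acquisition* step, not bolted on at the end.
acquisition_checks: list[Check] = [
    lambda r: r.get("customer_id") not in (None, ""),
    lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0,
]

raw = [{"customer_id": "C1", "amount": 42.0},
       {"customer_id": "", "amount": -1}]
rejects: list[dict] = []
clean = list(quality_gate(raw, acquisition_checks, rejects))
print(len(clean), "passed,", len(rejects), "rejected")  # 1 passed, 1 rejected
```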

It is therefore difficult to see how a Data Catalog (whose purpose is to inventory and document all potentially usable datasets in order to facilitate data discovery and usage) can be a useful tool to measure and manage quality.

A Data Catalog operates on available datasets, on any system that contains data, and should be as minimally invasive as possible in order to be deployed quickly throughout the organization.

A DQM solution works on the data feeds (the pipelines), focuses on production data, and is, by design, intrusive and time-consuming to deploy. We cannot think of any software architecture that can tackle both issues without compromising the quality of either one.

Data Catalog vendors promising to solve your data quality issues are, in our opinion, in a bind and it seems unlikely they can go beyond a “salesy” demo.

As for DQM vendors (who also often sell ETLs), their solutions are often too complex and costly to deploy as credible Data Catalogs.

The good news is that the orthogonal nature of data quality and data cataloging makes it easy for specialized solutions in each domain to coexist without encroaching on each other’s lane.

Indeed, while a data catalog isn’t purposed for quality control, it can exploit the information on the quality of the datasets it contains, which obviously provides many benefits.

The Data Catalog uses this metadata, for example, to share the information (and any alerts it may identify) with data consumers. The catalog can also use this information to adjust its search and recommendation engine and thus orient users towards higher-quality datasets.

And both solutions can be integrated at little cost with a couple of APIs here and there.
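For example, a hypothetical integration could look like the sketch below, where the DQM tool pushes each dataset’s latest quality score into the catalog over REST. The endpoint, payload, and authentication are invented for the illustration; real catalogs and DQM tools each document their own APIs.

```python
"""Hedged sketch: a DQM tool pushing quality results to a Data Catalog."""
import json
from urllib import request

CATALOG_API = "https://catalog.example.com/api/v1"  # hypothetical endpoint
API_TOKEN = "..."  # placeholder; supplied by the catalog administrator


def push_quality_score(dataset_id: str, score: float,
                       alerts: list[str]) -> None:
    """Attach a quality score (0-100) and open alerts to a catalog entry."""
    payload = json.dumps({"quality_score": score, "alerts": alerts}).encode()
    req = request.Request(
        f"{CATALOG_API}/datasets/{dataset_id}/properties",
        data=payload,
        method="PATCH",
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {API_TOKEN}"},
    )
    with request.urlopen(req) as resp:  # raises on HTTP errors
        resp.read()


# Called by the DQM tool after each control run, e.g.:
# push_quality_score("sales.orders", 97.5, alerts=[])
```

The catalog can then surface the score next to the dataset and feed it into its search ranking, exactly the division of labor described above.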

Take Away

Data quality needs to be assessed as early as possible in the data pipelines.

The role of the Data Catalog is not to perform quality control but to share the results of these controls as widely as possible. By their nature, Data Catalogs are bad DQM solutions, and DQM solutions are mediocre and overly complex Data Catalogs.

An integration between a DQM solution and a Data Catalog is very straightforward and is the most pragmatic approach.

Events

Hybrid Data Conference Recap and Highlights

Traci Curran

June 17, 2021


That’s a Wrap!

Wow! What a wonderful time we had at the 2021 Hybrid Data Conference! Over two days, we showcased amazing demos, customer stories and technology advancements across the Actian portfolio. For those in attendance, we hope you enjoyed the event and the opportunity to see a few of the ways Actian is innovating and enabling our customers to gain greater value from their data at a fraction of the time and cost of other cloud data platforms.

For those who missed the event, here’s a quick recap of some of our most popular sessions.

Some of Our Favorite Sessions from the 2021 Hybrid Data Conference

Delivering on the Vision – Actian Hybrid Data Platform, presented by Emma McGrattan, Actian VP of Engineering

Emma McGrattan, Actian’s VP of Engineering, gave an in-depth overview of how Actian products are delivering on the vision of hybrid cloud. Highlighting the Actian Data Platform, Emma showcased how Actian’s product portfolio is accelerating cloud adoption and changing the way customers advance along their cloud journey. If you’re looking to make the shift right away or to modernize and preserve investments in critical applications, this session is a great overview of the many options and use cases that support your unique path to the cloud.

Actian on Google Cloud, Presented by Lak Lakshmanan, Google’s Director of Analytics

This brief 15-minute session, presented by Lak Lakshmanan, Google’s Director of Analytics and AI Solutions, is a great introduction to why Actian has chosen Google as our preferred cloud. We all love a better-together story, but Lak provides a glimpse from the cloud provider’s perspective.

Of course, no conference would be complete without perspectives from our customers. Actian would like to thank all of the customers and partners that made the 2021 Hybrid Data Conference a success.

Actian Customer Panel Featuring Key Customer Speakers from Sabre, Finastra, and Goldstar Software

One Final Highlight


We were delighted to have Greg Williams, Editor-in-Chief of Wired, deliver his thoughts on why data-driven insights are no longer optional in today’s modern world. Greg summarized it best in his presentation – every company is a data company.

Please visit the on-demand conference to hear more of his outstanding commentary on the future of data and how companies are creating advantage in a global economy.

Once again, we want to thank everyone who attended this year’s Hybrid Data Conference. We hope you found the networking and content valuable, and we can’t wait to see you in 2022 – hopefully in person! Stay safe, and enjoy your summer!


About Traci Curran

Traci Curran is Director of Product Marketing at Actian, focusing on the Actian Data Platform. With 20+ years in tech marketing, Traci has led launches at startups and established enterprises like CloudBolt Software. She specializes in communicating how digital transformation and cloud technologies drive competitive advantage. Traci's articles on the Actian blog demonstrate how to leverage the Data Platform for agile innovation. Explore her posts to accelerate your data initiatives.
Data Intelligence

7 Lies of Data Catalog Providers #1: Not a Data Governance Solution

Actian Corporation

June 16, 2021


The Data Catalog market has developed rapidly, and it is now deemed essential when deploying a data-driven strategy. Victim of its own success, this market has attracted a number of players from adjacent markets.

These players have rejigged their marketing positioning to present themselves as Data Catalog solutions.

The reality is that, while relatively weak on data catalog functionality itself, these companies attempt to convince buyers, with degrees of success proportional to their marketing budgets, that a Data Catalog is not merely a high-performance search tool for data teams but an integrated solution likely to address a host of other topics.

The purpose of this blog series is to deconstruct the pitch of these eleventh-hour Data Catalog vendors.

A Data Catalog is NOT a Data Governance Solution

This is probably our most controversial stance on the role of a Data Catalog, and the controversy originates with the powerful marketing messages pumped out by the world leader in metadata management, whose solution is in reality a data governance platform being sold as a Data Catalog.

To be clear, having sound data governance is one of the pillars of an effective data strategy. Governance, however, has little to do with tooling.

Its main purpose is the definition of roles, responsibilities, company policies, procedures, controls, and committees. In a nutshell, its function is to deploy and orchestrate, in its entirety, the internal control of data in all its dimensions.

Let’s just acknowledge that data governance has many different aspects (processing and storage architecture, classification, retention, quality, risk, compliance, innovation, etc.) and that there is no universal, one-size-fits-all model adapted to every organization. As in other governance domains, each organization must design and steer its own landscape based on its capacities and ambitions, as well as a thorough risk analysis.

Putting effective data governance in place is not a project; it is a transformation program.

No commercial “solution” can replace that transformation effort.

So Where Does the Data Catalog fit into All This?

The quest for a Data Catalog is usually the result of a very operational requirement: once the Data Lake and a number of self-service tools are set up, the next challenge quickly becomes finding out what the Data Lake actually contains (from both a technical and a semantic perspective), where the data comes from, what transformations the data may have undergone, who is in charge of the data, what internal policies apply to the data, who is currently using it and why, etc.

An inability to provide this type of information to the end user can have serious consequences for an organization, and a Data Catalog is the best means of mitigating that risk. When selecting such a cross-functional solution, involving people from many different departments, the choice is often handed to those in charge of data governance, as they appear to be in the best position to coordinate the expectations of the largest number of stakeholders.

This is where the alchemy begins. The Data Catalog, whose initial purpose was to provide data teams with a quick solution to discover, explore, understand, and exploit the data, becomes a gargantuan project in which all aspects of governance have to be solved.

The project will be expected to:

  • Manage data quality.
  • Manage personal data and compliance (GDPR first and foremost).
  • Manage confidentiality, security, and data access.
  • Propose a new Master Data Management (MDM).
  • Ensure field-by-field automated lineage for all datasets.
  • Support all the roles as defined in the system of governance and enable the relevant workflow configuration.
  • Integrate all the business models produced in the last 10 years for the urbanization program.
  • Authorize cross-source querying while complying with user entitlements on those same sources, as well as anonymizing the results.

Certain vendors manage to convince their clients that their solution can be this unique one-stop shop for data governance. If you believe this is possible, by all means call them, they will gladly oblige. But to be frank, we simply do not believe such a platform is possible, or even desirable. Too complex, too rigid, too expensive, and too bureaucratic, this kind of solution can never be adapted to a data-centric organization.

For us, the Data Catalog plays a key role in a data governance program. That role is not to support every aspect of governance, but rather to facilitate communication and awareness of governance rules within the company and to help each stakeholder become an active part of this governance.

In our opinion, a Data Catalog is one of the components that delivers the biggest return on investment in data-centric organizations that rely on Data Lakes and modern data pipelines… provided it can be deployed quickly and comes with reasonable pricing.

Take Away

A Data Catalog is not a data governance management platform.

Data governance is essentially a transformation program with multiple layers that cannot be addressed by one single solution. In a data-centric organization, the best way to start, learn, educate, and remain agile is to blend clear governance guidelines with a modern Data Catalog that can share those guidelines with the end users.

Data Intelligence

Data Governance Framework | S03-E02 – Start in Under 6 Weeks

Actian Corporation

June 9, 2021

This is the last episode of our third and final season of “The Effective Data Governance Framework”.

Divided into two episodes, this final season will focus on the implementation of metadata management with a data catalog.

In this final episode, we will help you start a 3-6 week data journey and then deliver the first iteration of your Data Catalog.

Season 1: Alignment

Evaluate your Data maturity

Specify your Data strategy

Getting sponsors

Build a SWOT analysis

Season 2: Adapting

Organize your Data Office

Organize your Data Community

Creating Data Awareness

Season 3: Implementing Metadata Management with a Data Catalog

The importance of metadata

6 weeks to start your data governance journey

Metadata Governance Iterations

We are using an iterative approach based on short cycles (6 to 12 weeks at most) to progressively deploy and extend the metadata management initiative in the Data Catalog.

These short cycles make it possible to quickly obtain value. They also provide an opportunity to communicate regularly via the Data Community on each initiative and its associated benefits.

Each cycle is organized in predetermined steps, as follows:

1. Identify the Goal

A perimeter (data, people), a target.

2. Deploy / Connect

Technical configuration of scanners and ability to harvest the information.

Scanners deployed and operational.

3. Conceive and Configure

A metamodel tailored to meet expectations.

4. Import the Items

Define the core (minimum viable) information to properly serve the users.

5. Open and Test

Validate whether the effort produced the expected value.

6. Measure the Gains

A fine-grained analysis of the cycle to identify what worked, what didn’t, and how to improve the next cycle.

Data Intelligence

Data Strategy: How to Break Down Data Silos

Actian Corporation

June 8, 2021


Whether it comes from product life cycles, marketing, or customer relations, data is omnipresent in the daily life of a company. Customers, suppliers, employees, partners… they all collect, analyze, and exploit data in their own way.

The risk: The appearance of silos. Let’s discover why your data is siloed and how to put an end to it.

A company is made up of different professions that coordinate their actions to establish themselves in their market and generate profit. Each of these professions fulfills specific missions and collects data. Marketing, sales, customer success teams, communication… all of these entities act on a daily basis and base their actions on their own data.

The problem is that, over the course of their journey, a customer will generate a certain amount of information.

A simple lead then becomes a prospect, who then becomes a customer. The same person may have different taxonomies based on which part of the business is analyzing this data.

This reality is what we call a data silo. In other words, data that is poorly shared, or never shared at all, and therefore too often left untapped.

In a study by IDC entitled “The Data-Forward Enterprise” published in December 2020, 46% of French companies forecast a 40% annual growth in the volume of data to be processed over the next two years.

Nearly 8 out of 10 companies consider data governance to be essential. However, only 11% of them believe they are getting the most out of their data. The most common reason for this is data silos.

What are the Major Consequences of Data Silos?

Among the frequent problems linked to data silos, we find first and foremost duplicated data. Since data is used blindly by each business line, what could be more natural?

These duplicates have unfortunate consequences. They distort the knowledge you can have of your products or your customers. This biased, imperfect information often leads to imprecise or even erroneous decisions.

Duplicated data also takes up unnecessary space on your servers – storage space that represents an additional cost for your company! Beyond the impact of data silos on your company’s decisions, strategies, or finances, there is also the organizational deficit.

When your data is in silos, your teams can’t collaborate effectively because they don’t know if they’re mining the same soil.

At a time when collective intelligence is a cardinal value, this is undoubtedly the most harmful effect of data silos.

Does Your Company Suffer From Data Silos?

There are many causes for siloed data. Most often, they are associated with the history of your information systems. Over the years, these systems were built as a patchwork for business applications that were not always designed with interoperability in mind.

Moreover, a company is like a living organism. It welcomes new employees when others leave. In everyday life, spreading data culture throughout the workforce is a challenge! Finally, there is the place of data in the key processes of organizations.

Today, data is central. But go back 5 or 10 years, and it was much less so. Now that you know you are suffering from data silos, you need to take action.

How do you get rid of Data Silos?

To get started on the road to eradicating data silos, you need to proceed methodically.

Start by recognizing that the process will inevitably take some time. The prerequisite is creating a detailed mapping of all your databases and information systems. These can be produced by different tools and solutions such as emails, CRMs, various spreadsheets, financial documents, customer invoices, etc.

It is also necessary to identify all your data sources in order to centralize them in a unique repository. To do this, you can, for example, open breaches in the silos by using specific connectors, also called APIs. The second option is to implement a platform on your information system that will centralize all the data.

Working as a data aggregator, this platform will also consolidate data by tracking down duplicates and keeping the most recent information. Once deployed, a Data Catalog solution will also prevent the reappearance of data silos.
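Here is a minimal sketch of that consolidation step, assuming invented record shapes from two silos; a real platform would of course handle fuzzy matching, many more sources, and conflict rules beyond “most recent wins”.

```python
"""Toy sketch: merge records from several silos, keeping the freshest."""
from datetime import date

# Hypothetical extracts from two silos holding the same customers.
crm = [{"email": "a@ex.com", "city": "Paris", "updated": date(2021, 3, 1)}]
billing = [{"email": "a@ex.com", "city": "Lyon", "updated": date(2021, 5, 12)},
           {"email": "b@ex.com", "city": "Lille", "updated": date(2021, 1, 7)}]


def consolidate(*sources: list[dict]) -> dict[str, dict]:
    """Merge all sources; on duplicate keys, keep the most recent record."""
    merged: dict[str, dict] = {}
    for source in sources:
        for record in source:
            key = record["email"].lower()
            if key not in merged or record["updated"] > merged[key]["updated"]:
                merged[key] = record
    return merged


print(consolidate(crm, billing))
# a@ex.com resolves to the Lyon record (updated 2021-05-12)
```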

But beware: data quality, optimized circulation between departments, and the coordinated use of data to improve performance are also a human project.

Sharing best practices, training, raising awareness – in a word, creating a data culture within the company – will be the key to eradicating data silos once and for all.

Data Intelligence

Essential Keys to a Successful Cloud Migration

Actian Corporation

June 8, 2021


The recent COVID-19 pandemic has brought about major changes in the work culture, and the Cloud is becoming an essential part of that culture by offering employees access to the company’s data, wherever they are. But why migrate? How do you migrate? And for what benefits? Here is an overview:

Head in the clouds and feet on the ground, that’s the promise of the Cloud, which has proven to be an essential tool for business continuity during the health crisis.

In a study conducted by Vanson Bourne at the end of 2020, it appears that more than 8 out of 10 business leaders (82%) accelerated their decision to migrate their critical data and business functions to the Cloud after facing the COVID-19 crisis. 91% of survey participants say they have become more aware of the importance of data in the decision-making process since the crisis began.

Cloud and data. A duo that is now inseparable from business performance.

A reality that is not limited to a specific market: the enthusiasm for Cloud data migration is almost worldwide. The Vanson Bourne study highlights a shared awareness on an international scale, with striking figures:

  • United States (97%)
  • Germany and Japan (93%)
  • United Kingdom (92%)

Finally, 99% of Chinese executives are accelerating their plans to complete their migration to the Cloud. In this context, the question “Why migrate to the Cloud” is unequivocally answered: if you don’t, your competitors will do it before you and will definitely beat you to it.

The Main Benefits of Cloud Migration

Ensuring successful Cloud data migration is first and foremost a question of guaranteeing its availability in all circumstances. Once stated, this benefit leads to many others. If data is accessible everywhere and at all times, a company is able to meet the demand for mobility and flexibility expressed by employees.

A requirement that was fulfilled during the successive lockdowns and that should continue as the return to normalcy finally seems possible. Employees who are fully operational at home, in the office, or in the countryside promise not only increased productivity but also a considerable improvement in the user experience. HR benefits are not the only consequences of Cloud migration.

From a financial point of view, the Cloud opens the way to better control of IT costs. By shifting data from a CAPEX dimension to an OPEX dimension, you can improve the TCO (Total Cost of Ownership) of your information system and your data assets. Better experience, budget control: the Cloud opens the way to optimized data availability.

Indeed, when migrating to the Cloud, your partners make commitments in terms of maintenance or backups that guarantee maximum access to your data. You should therefore pay particular attention to these commitments, which are referred to as SLAs (Service Level Agreements).

Finally, by migrating data to the cloud, you benefit from the expertise and technical resources of specialized partners who deploy resources that are far superior to those that you could have on your own.

How to Successfully Migrate to the Cloud

Data is, After Human Resources, the Most Valuable Asset of a Company

This is one of the reasons why companies should migrate to the Cloud. But the operation must be carried out in the best conditions to limit the risk of data degradation, as well as the temporary unavailability that impacts your business.

To do this, preparation is essential and relies on one prerequisite: the project does not only concern IT teams, but the entire company. 

Support, reassurance, training: the triptych essential to any change management process must be applied. Then make sure you give yourself time. Avoid a big-bang approach, which could irritate your teams and dampen their enthusiasm. Even if the Cloud migration of your data should go smoothly, stack the odds in your favor by backing up your data.

Rely on redundancy to prepare for any eventuality, including (and especially!) the most unlikely. Once the deployment on the cloud is complete, ensure the quality of the experience for your employees. By conducting rigorous long-term project management, you can easily identify if you need to make adjustments to your initial choices.

The scalability of the Cloud model is a strength that you should seize upon to constantly adapt your strategy.

Data Intelligence

Data Governance Framework | S03-E01 – Importance of Metadata

Actian Corporation

June 2, 2021


This is the first episode of our third and final season of “The Effective Data Governance Framework”.

Divided into two episodes, this final season will focus on the implementation of metadata management with a data catalog.

For this first episode, we will give you the right questions to ask yourself to build a metamodel for your metadata.

Season 1: Alignment

Evaluate your Data maturity

Specify your Data strategy

Getting sponsors

Build a SWOT analysis

Season 2: Adapting

Organize your Data Office

Organize your Data Community

Creating Data Awareness

Season 3: Implementing Metadata Management with a Data Catalog

The importance of metadata

6 weeks to start your data governance journey

In our previous season, we gave you our tips on how to build your Data Office, organize your Data Community, and build your Data Awareness.

In this third season, you will step into the real world of implementing a Data Catalog, building on Seasons 1 and 2, which helped you specify your data journey strategy.

In this episode, you will learn how to ask the right questions for designing your Metamodel.

The Importance of Metadata

Metadata management is an emerging discipline and is necessary for enterprises wishing to bolster innovation or regulatory compliance initiatives on their data assets.

Many companies are therefore trying to establish their convictions on the subject and brainstorm solutions to meet this new challenge. As a result, metadata is increasingly being managed, alongside data, in a partitioned and siloed way that does not allow the full, enterprise-wide potential of this discipline to be realized.

Before beginning your data governance implementation, you will have to cover different aspects, ask yourself the right questions and figure out how to answer them.

Our Metamodel Template is a way to identify the main aspects of data governance by asking the right questions; in each case, you decide on their relevance.

These questions can also be used as support for your data documentation model and can provide useful elements to data leaders.

The Who

  • Who created this data?
  • Who is responsible for this data?
  • Who does this data belong to?
  • Who uses this data?
  • Who controls or audits this data?
  • Who is accountable for the quality of this data?
  • Who gives access to this data?

The What

  • What is the “business” definition for this data?
  • What are the associated business rules of this data?
  • What is the security/confidentiality level of this data?
  • What are the acronyms or aliases associated with this data?
  • What are the security/confidentiality rules associated with this data?
  • What is the reliability level (quality, velocity, etc.) of this data?
  • What are the authorized contexts of use (related to confidentiality for example)?
  • What technical contexts of use are possible (or not) for this data?
  • Is this data considered a “Golden Source”?

The Where

  • Where is this data located?
  • Where does this data come from? (a partner, open data, internally, etc.)
  • Where is this data used/shared?
  • Where is this data saved?

The Why

  • Why are we storing this data (rather than processing it as a flow)?
  • What is this data’s current purpose/usage?
  • What are the possible usages for this data? (in the future)

The When

  • When was the data created?
  • When was this data last updated?
  • What is this data’s life cycle (update frequency)?
  • How long are we storing this data for?
  • When does this data need to be deleted?

The How

  • How is this data structured (diagram)?
  • How do your systems consume this data?
  • How do you access this data?

Start Defining Your Metamodel Template

These questions can serve as a foundation for building your data documentation model and providing data consumers with the elements that are useful to them.
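As a sketch of what that can look like in practice, the example below turns a few of the template questions into a lean catalog metamodel; the class and property names are assumptions for the illustration, not the schema of any particular platform.

```python
"""Hedged sketch: turning the question template into a catalog metamodel.

Each property answers one of the questions above; names are illustrative.
"""
from dataclasses import dataclass, field


@dataclass
class Property:
    name: str          # e.g. "owner"
    question: str      # the template question this property answers
    required: bool = False


@dataclass
class DatasetTemplate:
    """Keep it lean: only properties that help find, understand, use data."""
    properties: list[Property] = field(default_factory=list)


template = DatasetTemplate(properties=[
    Property("owner", "Who is responsible for this data?", required=True),
    Property("business_definition",
             "What is the 'business' definition for this data?", required=True),
    Property("confidentiality",
             "What is the security/confidentiality level?", required=True),
    Property("origin", "Where does this data come from?"),
    Property("update_frequency", "What is this data's life cycle?"),
])

for prop in template.properties:
    print(f"{prop.name:22} <- {prop.question}")
```

Starting lean like this leaves room to add properties later, once real user questions justify them.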
