Data Intelligence

Everything You Need to Know About a Data Fabric

Actian Corporation

April 13, 2022

As early as 2019, Gartner identified the concept of a Data Fabric as a major technological trend for 2022. Behind this buzzword lies an important objective: to maximize the value of your data and accelerate your digital transformation. Find out how in this guide.

Bringing order to your data is the promise of a Data Fabric. However, it is not merely a solution for organizing or structuring information: a Data Fabric is a tool designed to create value from your data. The volume of data generated by companies is growing exponentially, and every second there is more data to exploit that can make organizations more efficient and more in tune with their market and their customers. The figures speak for themselves: IDC estimates that by 2025, the volume of data generated globally will reach 175 zettabytes. That volume is so large that, if stored on Blu-ray, it would represent a stack of discs 23 times the distance from the Earth to the Moon.

What is Data Fabric?

Gartner defines Data Fabric as “a design concept that acts as an integrated layer of data and connection processes.” In other words, a Data Fabric continuously analyzes combinations of existing, accessible, and inferred metadata assets to provide smarter information and support data management tasks more efficiently. A Data Fabric then uses all of this metadata analysis to design new processes and establish standardized access to data for all business profiles within the enterprise: application developers, analysts, data scientists, etc.

A Data Fabric is, therefore, a series of processes that read, capture, integrate, and deliver data based on the understanding of who is using the data, the classification of usage types, and the monitoring of changes in data usage patterns.

The Benefits of a Data Fabric for Enterprises

Gartner explains that by 2024, the deployment of Data Fabrics within organizations will quadruple the efficiency of data exploitation while halving the data management tasks performed by humans. In this sense, the firm identifies three main areas of opportunity brought by a Data Fabric:

  1. A 70% reduction in data discovery, analysis and integration tasks for data teams;
  2. The increase in the number of data users, by reusing data for a greater number of use cases;
  3. The ability to get more out of more data by significantly accelerating the introduction and exploitation of secondary and third-party data.

From a technological standpoint, a Data Fabric adapts to the tools already in place within an organization. It can evolve from existing integration and quality tools, data management, and governance platforms (such as a data catalog, for example – we’ll come back to this). In this sense, its design model is ideal since it uses your existing technology while pursuing a strategic shift in your overall data management.

Finally, a Data Fabric helps companies break down data silos. You can then reduce the cost and effort of data teams who must otherwise constantly merge, recast, and replace existing data management silos with new ones.

The Contribution of a Data Catalog to a Data Fabric

If we take the notion of “integrated layer” from the definition of a Data Fabric as well as the diagram proposed by Gartner (below) as a guide, we observe that a data catalog plays a fundamental part in the constitution of a Data Fabric. Indeed, it influences the higher layers that form an efficient Data Fabric.

Layer 1 – Access to all Types of Metadata

A data catalog is the foundation of a Data Fabric structure – it is the first (gray) layer. It supports the identification, collection, and analysis of all data sources and all types of metadata. The data catalog is a starting point for a Data Fabric.

Layer 2 – Metadata Enablement and the Knowledge Graph

In the second layer of a Data Fabric (yellow), Gartner focuses on metadata activation. This activation involves the continuous analysis of metadata to calculate key indicators. This analysis is facilitated by the use of artificial intelligence (AI), machine learning (ML), and automated data integration.

The patterns and connections detected are then fed back into the data catalog and other data management tools to make recommendations to the people and machines involved in data management and integration. This requires continuous analysis from a connected knowledge graph – the means to create and visualize existing relationships between data assets of different types, to make business sense of them, and to make this set of relationships easy to discover and navigate by all users in the organization.
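To make this more concrete, here is a minimal, hypothetical sketch (in C#, using C# 9 record syntax; it is not taken from Gartner or from any specific product) of how a knowledge graph can be represented: typed data assets connected by typed relationships that can be navigated by both users and tools.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical, simplified model of a connected knowledge graph.
    record Asset(string Id, string Type, string Name);                    // e.g., a dataset, a glossary term, a data owner
    record Relationship(string FromId, string RelationType, string ToId); // e.g., "is documented by", "is owned by"

    class KnowledgeGraph
    {
        public List<Asset> Assets { get; } = new();
        public List<Relationship> Relationships { get; } = new();

        // Navigate the outgoing relationships of a given asset.
        public IEnumerable<(string RelationType, Asset Target)> RelatedTo(string assetId) =>
            Relationships.Where(r => r.FromId == assetId)
                         .Join(Assets, r => r.ToId, a => a.Id, (r, a) => (r.RelationType, a));
    }

The value lies in keeping these relationships continuously up to date and easy to traverse, which is what allows recommendations to be fed back to people and machines.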

Layer 3 – Dynamic Data Integration

Gartner’s third layer (blue) primarily addresses the technical consumers of data in organizations. This layer of the Data Fabric refers to the need to prepare, integrate, explore, and transform data. The challenge here is to make data assets from a wide range of tools accessible to a wide range of business users. The keywords here are flexibility and compatibility to break down data silos, with the following features:

    • A data permission management system: The Data Fabric must automate data access according to each user's permissions.
    • Automated provisioning: Anyone in the organization should be able to request access to a dataset from the Data Fabric – via ticket creation with built-in data governance capabilities.
    • A data exploration tool: The Data Fabric should allow users to explore data (not just metadata) without having to leave the fabric.

Automated data orchestration – as described in the top part of this third layer of the diagram – refers to DataOps. It is a collaborative data management practice aimed at improving the communication, integration, and automation of data flows between data managers and data consumers within an organization. You can read more about it in this article.

Is There a Single Tool for Implementing a Data Fabric?

As Gartner points out, there is no single tool that supports all layers of the fabric in a comprehensive manner. In this sense, no single vendor is able to offer a data structure that can be equated to a complete Data Fabric. The solution lies in the interaction between the different layers. An open platform is the key, and companies need to equip themselves with the best, interconnected data tools to achieve a Data Fabric worthy of its name. Building a Data Fabric should be viewed as a marathon, not a sprint, and approached in stages – the data catalog being the first.

Building a Data Fabric

The companies that have adopted our Smart Data Catalog have already laid the foundations of their Data Fabric. In addition to the identification, collection, and analysis of all data sources and all types of metadata (first layer), the Actian Data Intelligence Platform offers all the features necessary for the activation of metadata through the knowledge graph at its core (second layer). Finally, our catalog addresses the third layer: on the one hand, via the integration of data governance rules; on the other hand, via the Actian Explorer application, which acts as a true data marketplace so that each business user can easily access the key datasets that interest them and quickly create value from the available data.

To learn more about our Smart Data Catalog, please consult our two eBooks below or contact us:

About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale, streamlining complex data environments and accelerating the delivery of AI-ready data. The Actian data intelligence approach combines data discovery, metadata management, and federated governance to enable smarter data usage and enhance compliance. With intuitive self-service capabilities, business and technical users can find, understand, and trust data assets across cloud, hybrid, and on-premises environments. Actian delivers flexible data management solutions to 42 million users at Fortune 100 companies and other enterprises worldwide, while maintaining a 95% customer satisfaction score.
Data Intelligence

Why a Privacy by Design Approach Works for Data Catalogs

Actian Corporation

April 11, 2022

Since the beginning of the 21st century, we've been experiencing a true digital revolution. The world is constantly being digitized, and human activity is increasingly structured around data and network services. The manufacturing, leisure, administration, service, medical, and many other industries are now organized around complex and interconnected information systems. As a result, more and more data is continuously collected by the devices and technologies present in our daily lives (web, smartphone, IoT) and transferred from system to system. It has become essential for any company that provides products or services to do everything possible to protect its customers' data. The best approach to do so is Privacy by Design.

In this article, we explain what Privacy by Design is, how we applied this approach in the design of our data catalog, as well as how a data catalog can help companies implement Privacy by Design.

Data Protection: A Key Issue for Enterprises

Among all the data mentioned above, some allows the direct or indirect identification of natural persons. This is known as personal data, as defined by the CNIL (the French data protection authority). It is of paramount importance in the modern world because of its intrinsic value.

On a daily basis, huge volumes of personal data pass between individuals, companies, and governments. There is a real risk of misuse, as the Cambridge Analytica scandal that began in 2015 showed. Cybercriminals can also make substantial gains from it, via account hacking, reselling data to other cybercriminals, identity theft, or attacking companies through phishing or CEO fraud ("president scams"). For example, a real estate developer was recently robbed of several tens of millions of euros in France.

The need to protect data has never been so important.

States have quickly become aware of this issue and of the need to protect individuals from abuses related to the exploitation of their data. In Europe, for example, the GDPR (the General Data Protection Regulation), adopted in 2016, is already well established in the daily activities of companies. In the rest of the world, regulations are constantly evolving and are a concern for nearly every country. Recently, California passed a consumer data privacy law, a U.S. equivalent of the GDPR. Even China has just legislated on this topic.

Privacy by Design: Defining a Key Concept for Data Protection

While many data protection laws now rely heavily on the notion of Privacy by Design, the concept was developed by Ann Cavoukian in the late 1990s when she was the Information and Privacy Commissioner of the Province of Ontario in Canada. The essence of the idea is to include personal data protection right from the design of a computer system.

In this sense, Privacy by Design lists seven fundamental principles:

  1. Proactivity: Any company must put in place the necessary provisions for data protection upstream, and must not rely on a reactive policy;
  2. Personal data protection as a default setting: Any system must take as a default setting the highest possible level of protection for the sensitive data of its users;
  3. Privacy by design: Privacy should be a systematically studied and considered aspect of the design and implementation of new functionality;
  4. Full functionality: No compromise should be made with security protocols or with the user experience;
  5. End-to-end security: The system must ensure the security of data throughout its lifecycle, from collection to destruction (including if the data is outsourced);
  6. Visibility and transparency: The system and the company must document and communicate personal data protection procedures and actions taken in a clear, consistent, and transparent manner;
  7. Respect for user privacy: Every design and implementation decision must be made with the user's interest at the center.

The Application of Privacy by Design

We’ve built our product on the foundations of Privacy by Design.

The Treatment of Users’ Personal Data

First of all, we have anchored data protection at the heart of our architecture. Each customer's data is segregated into its own tenant, encrypted with its own key. User authentication is managed through a specialized third-party system. We encourage identity federation among our customers, which allows them to maintain control over the data needed for user identification and authentication.

We have also embedded the concept of Privacy by Design in the design of our application. For example, we collect only the bare minimum of information, and all system outputs (logs, application errors, APIs) are anonymized.

Processing Customer Business Data

Since our main mission is to document data, our solution essentially contains metadata. By design, the Actian Data Intelligence Platform does not extract any data from our customers' systems, and the risk is intrinsically lower for metadata than for the data itself.

Nevertheless, we offer several features within the platform that provide information on the data present in client systems (statistics, sampling, etc.). Because of our architecture, these calculations are always done on the client's infrastructure, as close as possible to the data and its security. And in compliance with principle #2 of Privacy by Design, we have set the protection of personal data as the default: all of these features are disabled by default and can only be activated by the customer.

How Our Data Catalog Helps Companies Implement Privacy by Design

Our data catalog can help your company implement Privacy by Design, especially on the control and verification aspects. Taking the 7 principles described earlier, the data catalog can effectively participate in two of them: the visibility and transparency principle, and the end-to-end security principle. The data catalog also enables the automation of the identification of sensitive data.

Visibility and Transparency via the Data Catalog

The objective of a data catalog is to centralize a company's data assets, document them, and share them with as many people as possible. This centralization allows each employee to know, for example, what data is collected by the CRM, and lets the marketing and customer success teams use this information in acquisition and churn-tracking reports.

Once this inventory has been established, the catalog can be used to document certain additional information that is necessary for the company’s proper functioning. This is notably the case for the sensitive or non-sensitive nature of the documented information, the rules of governance, the processing, or the access procedures that must be applied.

In the context of a Privacy by Design approach, the data catalog can be used to add a business term corresponding to sensitive data (a social security number, a telephone number, etc.). This business term can then be easily associated with the tables or physical fields that contain the data, thus allowing its easy identification. This initiative contributes to the principle of visibility and transparency of Privacy by Design.

End-to-End Security via the Data Catalog

The data catalog also provides data lineage capabilities. Automatic data lineage ensures that the processes applied to data identified as sensitive comply with what is defined by the company's data governance. With the data catalog, it is then simple to document the governance rules to be applied to sensitive data.

Moreover, the lineage allows us to follow the whole life cycle of the data, from its creation to its final use, including its transformations. This makes it easy to check that all the stages of this life cycle comply with the rules and correct any errors.

The data catalog, via the data lineage, thus contributes to the principle of the end-to-end security of Privacy by Design.

With that said, we remain convinced that a data catalog is not a compliance solution, but rather a tool for familiarizing teams with sensitive data and the specific rules that govern its use.

Identifying Sensitive Data via the Data Catalog

In a rapidly changing data environment, the data catalog must reflect reality as much as possible in order to maintain the trust of its users. Without this, the entire adoption of the data catalog project is put into question.

We are firmly convinced that the data catalog must be automated as much as possible to be scalable and efficient. This starts with the inventory of available data. Our inventory is automated and propagates any modification made in the original (source) system directly into the catalog. Thus, at any time, the customer has an exhaustive list of the data present in its systems.

And to help our customers identify which of the inventoried data deserves special treatment because of its sensitive status, the automation does not stop at the inventory. We now offer a system that suggests tagging newly inventoried data with a sensitive data profile. This makes it easier to bring this data to the forefront and to spread the information faster and more easily throughout the company.

Conclusion

In the past few years, personal data has become a real concern for most consumers. More and more countries are setting up regulations to guarantee their citizens maximum protection. One of the major principles governing all these regulations is Privacy by Design.

From the start, we have placed personal data considerations at the heart of our product: in our technical development, in the processing of our users' data, and in how the data our clients process via our catalog is handled.

We believe that a data catalog can be a significant asset in the implementation and monitoring of Privacy by Design policies. We also heavily rely on automation and AI to bring many more improvements in the upcoming months: automatic construction of technical data lineage, improved detection of sensitive data in the catalog objects to better document them, quality control of processes applied to sensitive data, etc. The possibilities are numerous.

To learn more about the advantages of the catalog in the management of your sensitive and personal data, don’t hesitate to schedule a meeting with one of our experts.

Data Intelligence

Why Data Privacy is Essential for Successful Data Governance

Actian Corporation

April 8, 2022

Data Privacy is a priority for organizations that wish to fully exploit their data. Considered the foundation of trust between a company and its customers, Data Privacy is the pillar of successful data governance. Understand why in this article.

Whatever the sector of activity or the size of a company, data now plays a key role in the ability of organizations to adapt to their customers, ecosystem, and even competitors. The numbers speak for themselves: according to a study by Stock Apps, the global Big Data market was worth $215.7 billion in 2021 and is expected to grow 27% in 2022 to exceed $274 billion.

Companies are generating such large volumes of data that data governance has become a priority. Indeed, a company's data is vital for identifying its target audiences, creating buyer personas, providing personalized responses to its customers, or optimizing the performance of its marketing campaigns. However, this is not the only issue. While data governance makes it possible to create value from enterprise data assets, it also ensures the proper administration of data confidentiality, also known as Data Privacy.

Data Privacy vs. Data Security: Two Not-So-Very-Different Notions

Data Privacy is one of the key aspects of Data Security. Although different, the two share the same mission: building trust between a company and the customers who entrust it with their personal data.

On the one hand, Data Security is the set of means implemented to protect data from internal or external threats, whether malicious or accidental (strong authentication, information system security, etc.).

Data Privacy, on the other hand, is a discipline that concerns the treatment of sensitive data: not only personal data (also called PII, for Personally Identifiable Information) but also other confidential data (certain financial data, intellectual property, etc.). Data Privacy is furthermore clearly defined in the General Data Protection Regulation (GDPR), which came into force in Europe in 2018 and has since helped companies redefine responsible and efficient data governance.

Data confidentiality has two main aspects. The first is controlling access to the data – who is allowed to access it and under what conditions. The second aspect of data confidentiality is to put in place mechanisms that will prevent unauthorized access to data.

Why is Data Privacy so Important?

While data protection is essential to preserve this valuable asset and to create the conditions for rapid data recovery in the event of a technical problem or malicious attack, data privacy addresses another equally important issue.

Consumers are suspicious of how companies collect and use their personal information. In a world full of options, customers who lose trust in one company can easily buy elsewhere. To cultivate trust and loyalty, organizations must make data privacy a priority. Indeed, consumers are becoming increasingly aware of data privacy. The GDPR has played a key role in the development of this sensitivity: customers are now very vigilant about the way their personal data is collected and used.

Because digital services are constantly developing, companies operate in a world of hyper-competition where customers will not hesitate to switch to a competitor if a company has not done everything possible to preserve the confidentiality of their data. This is the main reason why Data Privacy is so crucial.

Why is Data Privacy a Pillar of Data Governance?

Data governance is about ensuring that data is of sufficient quality and that access is managed appropriately. The company’s objectives are to reduce the risk of misuse, theft, or loss. As such, data privacy should be understood as one of the foundations of sound and effective data governance. 

Even though data governance addresses data in a much broader way, it cannot succeed without a clear understanding of the levers available to ensure optimal data confidentiality.

Data Intelligence

Guide to Data Quality Management #4 – Data Catalog Contribution to DQM

Actian Corporation

April 4, 2022

Data Quality refers to an organization's ability to maintain the quality of its data over time. If we were to take some data professionals at their word, improving Data Quality is the panacea to all our business woes and should therefore be the top priority.

We believe this should be nuanced: Data quality is a means amongst others to limit the uncertainties of meeting corporate objectives.

In this series of articles, we will go over everything data professionals need to know about Data Quality Management (DQM):

  1. The nine dimensions of data quality
  2. The challenges and risks associated with data quality
  3. The main features of Data Quality Management tools
  4. The data catalog contribution to DQM

A Data Catalog is not a DQM Tool

It is essential to understand that a data catalog should not be considered a Data Quality Management tool per se.

First of all, one of the core principles at the heart of Data Quality is that controls should ideally take place in the source system. Running these controls solely in the data catalog – rather than at the source and in the data transformation flows – increases the overall cost of the undertaking.

Furthermore, a data catalog must be both comprehensive and minimally intrusive to facilitate its rapid deployment within the company. This is simply incompatible with the complex nature of data transformations and the multitude of tools used to carry them out.

Lastly, a data catalog must remain a simple tool to understand and use.

How Does a Data Catalog Contribute to DQM?

While the data catalog isn’t a Data Quality tool, its contribution to the upkeep of Data Quality is nonetheless substantial. Here is how:

  • A data catalog enables data consumers to easily understand metadata and avoid hazardous interpretations of the data, echoing the clarity dimension of quality;
  • A data catalog gives a centralized view of all the available enterprise data. Data Quality information is therefore metadata like any other: it carries value, should be made available to all, and is easy to interpret and extract, echoing the dimensions of accuracy, validity, consistency, uniqueness, completeness, and timeliness;
  • A data catalog has data traceability capabilities (Data Lineage), echoing the traceability dimension of quality;
  • A data catalog usually allows direct access to the data sources, echoing the availability dimension of quality.

The Implementation Strategy of the DQM

The following table details how Data Quality is taken into account depending on the different solutions on the market:

As stated above, quality testing should by default take place directly in the source system. Quality test integration in a data catalog can improve user experience, but it isn’t a must in light of its limitations – as Data Quality isn’t integrated into the transformation flow.

That said, when the systems stacks become too complex and we need, for example, to consolidate data from different systems with different functional rules, a Data Quality tool becomes unavoidable.

The implementation strategy will depend on use cases and company objectives. It is nonetheless advisable to put Data Quality in place incrementally to:

  1. Ensure the source systems have put in place the relevant quality rules;
  2. Implement a data catalog to improve quality on the dimensions of clarity, traceability, and/or availability;
  3. Integrate Data Quality in the transformation flows with a specialized tool while importing this information automatically into the data catalog via APIs.

Conclusion

Data Quality refers to the ability of a company to maintain the quality of its data over time. We define it through the prism of nine of the sixty dimensions described by DAMA International: completeness, accuracy, validity, uniqueness, consistency, timeliness, traceability, clarity, and availability.

As a data catalog provider, we reject the idea that a data catalog is a full-fledged quality management tool. In fact, it is only one of several ways to contribute to the improvement of Data Quality, notably through the dimensions of clarity, availability, and traceability.

Get our Data Quality Management Guide for Data-Driven Organizations

For more information on Data Quality and DQM, download our free guide: “A Guide to Data Quality Management” now! Download the eBook.

Data Intelligence

Guide to Data Quality Management #3 – The Main Features of DQM Tools

Actian Corporation

April 3, 2022

Data Quality refers to an organization’s ability to maintain the quality of its data over time. If we were to take some data professionals at their word, improving Data Quality is the panacea to all our business woes and should therefore be the top priority.

We believe this should be nuanced: Data Quality is a means amongst others to limit the uncertainties of meeting corporate objectives.

In this series of articles, we will go over everything data professionals need to know about Data Quality Management (DQM):

  1. The nine dimensions of Data Quality
  2. The challenges and risks associated with Data Quality
  3. The main features of Data Quality Management tools
  4. The Data Catalog contribution to DQM

One way to better understand the challenges of Data Quality is to look at the existing Data Quality solutions on the market.

From an operational point of view, how do we identify and correct Data Quality issues? What features do Data Quality Management tools offer to improve Data Quality?

Without going into too much detail, let’s illustrate the pros of a Data Quality Management tool through the main evaluation criteria of Gartner’s Magic Quadrant for Data Quality Solutions.

Connectivity

A Data Quality Management tool has to be able to plug into all enterprise data (internal, external, on-premises, cloud, relational, non-relational, etc.) in order to gather and apply quality rules wherever the relevant data lives.

Data Profiling, Data Measuring, and Data Visualization

You cannot correct Data Quality issues if you cannot detect them first. Data profiling enables IT and business users to assess the quality of the data in order to identify and understand the Data Quality issues.

The tool must be able to measure the dimensions outlined in The Nine Dimensions of Data Quality in order to identify quality issues across the dimensions that matter most to the organization.
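As an illustration, here is a minimal data profiling sketch in C#. It is an assumption of what a basic profile could compute, not the behavior of any particular vendor tool: it counts rows, null or empty values, and distinct values per column of an ADO.NET DataTable.

    using System;
    using System.Data;
    using System.Linq;

    static class SimpleProfiler
    {
        // Minimal column profile: row count, null/empty count, and distinct values.
        public static void Profile(DataTable table)
        {
            foreach (DataColumn column in table.Columns)
            {
                var values = table.Rows.Cast<DataRow>().Select(r => r[column]).ToList();
                int nullsOrEmpty = values.Count(v => v == DBNull.Value || string.IsNullOrWhiteSpace(v?.ToString()));
                int distinct = values.Where(v => v != DBNull.Value).Distinct().Count();
                Console.WriteLine($"{column.ColumnName}: rows={values.Count}, nulls/empty={nullsOrEmpty}, distinct={distinct}");
            }
        }
    }

Real DQM tools go much further (statistics, value patterns, outlier detection), but this is essentially the kind of signal data profiling produces.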

Monitoring

The tool must be able to monitor the evolution of data quality over time and alert management when quality crosses a defined threshold.

Data Standardization and Data Cleaning

Then comes the data cleaning phase. The aim here is to provide data cleaning functionality that applies standards or business rules to transform the data (format, values, layout).

Data Matching and Merging

The aim is to identify and delete duplicates that can be present within or between datasets.

Address Validation

The aim is to standardize addresses that could be incomplete or incorrect.

Data Curation and Enrichment

A Data Quality Management tool should enable the integration of data from external sources to improve completeness, thereby adding value to the data.

The Development and Putting in Place of Business Rules

A Data Quality Management tool should also enable the creation, deployment, and management of business rules, which can then be used to validate the data.

Problem Resolution

The quality management tool helps both IT and business users to assign, escalate, solve, and monitor Data Quality problems.

Metadata Management

The tool should also be capable of capturing and reconciling all the metadata related to the Data Quality process.

User-Friendliness

Lastly, a solution should be able to adapt to the different roles within the company, and specifically to non-technical business users.

Get our Data Quality Management Guide for Data-Driven Organizations

For more information on Data Quality and DQM, download our free guide: “A Guide to Data Quality Management” now! Download the eBook.

Data Intelligence

Guide to Data Quality Management #2 – The Challenges With Data Quality

Actian Corporation

April 2, 2022

Data Quality refers to an organization’s ability to maintain the quality of its data over time. If we were to take some data professionals at their word, improving Data Quality is the panacea to all our business woes and should therefore be the top priority. 

We believe this should be nuanced: Data Quality is one means, among others, to limit the uncertainties of meeting corporate objectives. 

In this series of articles, we will go over everything data professionals need to know about Data Quality Management (DQM):

  1. The nine dimensions of Data Quality
  2. The challenges and risks associated with Data Quality
  3. The main features of Data Quality Management tools
  4. The Data Catalog contribution to DQM

The Challenges of Data Quality for Organizations

Initiatives for improving the quality of data are typically implemented by organizations to meet compliance requirements and reduce risk. They are indispensable for reliable decision-making. There are, unfortunately, many stumbling blocks that can hinder Data Quality improvement initiatives. Below are some examples:

  • The exponential growth of the volume, speed, and variety of data makes the environment more complex and uncertain.
  • Increasing pressure from compliance regulations such as GDPR, BCBS 239, or HIPAA.
  • Teams are increasingly decentralized, and each has its own domain of expertise.
  • IT and data teams are snowed under and don't have time to solve Data Quality issues.
  • The data aggregation processes are complex and time-consuming.
  • It can be difficult to standardize data between different sources.
  • Auditing changes across systems is complex.
  • Governance policies are difficult to implement.

Having said that, there are also numerous opportunities to seize. High-quality data enables organizations to innovate with artificial intelligence and to deliver a more personalized customer experience – assuming there is enough quality data.

Gartner has actually forecast that through 2022, 85% of AI projects will deliver erroneous outcomes as a result of bias in the data, the algorithms, or the teams in charge of managing them.

Reducing the Level of Risk by Improving the Quality of the Data

Poor Data Quality should be seen as a risk and quality improvement software as a possible solution to reduce this level of risk.

Processing a Quality Issue

If we accept the notion above, any quality issue should be addressed in several phases:

1. Risk Identification: This phase consists of seeking out, recognizing, and describing the risks that can help or prevent the organization from reaching its objectives – in part because of a lack of Data Quality.

2. Risk Analysis: The aim of this phase is to understand the nature of the risk and its characteristics. It includes factors such as the likelihood of events and their consequences, and the nature and magnitude of those consequences. Here, we should seek to identify what has caused the poor quality of the marketing data. We could cite, for example:

  • A poor user experience of the source system leading to typing errors;
  • A lack of verification of the completeness, accuracy, validity, uniqueness, consistency, or timeliness of the data;
  • A lack of simple means to ensure the traceability, clarity, and availability of the data;
  • The absence of a governance process and of involvement from business teams.

3. Risk Evaluation: The purpose of this phase is to compare the results of the risk analysis with the established risk criteria. It helps determine whether further action is needed – for instance, keeping the current measures in place, undertaking further analysis, etc.

Let’s focus on the nine dimensions of Data Quality and evaluate the impact of poor quality on each of them:

The values for the levels of probability and severity should be defined by the main stakeholders, who know the data in question best. 
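As a purely illustrative sketch (not the method behind the table referenced above), one common way to combine these levels is a simple probability times severity score per dimension, which can then be sorted to prioritize treatment. The dimensions and values below are assumptions.

    using System;
    using System.Linq;

    class RiskEvaluation
    {
        // Hypothetical risk score per quality dimension: probability x severity,
        // with both levels (here on a 1-5 scale) set by the main stakeholders.
        record DimensionRisk(string Dimension, int Probability, int Severity)
        {
            public int Score => Probability * Severity;
        }

        static void Main()
        {
            var risks = new[]
            {
                new DimensionRisk("Completeness", Probability: 4, Severity: 3),
                new DimensionRisk("Validity",     Probability: 3, Severity: 4),
                new DimensionRisk("Timeliness",   Probability: 2, Severity: 2),
            };

            foreach (var r in risks.OrderByDescending(r => r.Score))
                Console.WriteLine($"{r.Dimension}: probability={r.Probability}, severity={r.Severity}, score={r.Score}");
        }
    }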

4. Risk Processing: This phase aims to set out the available options for reducing the risk and to roll them out. It also involves assessing the effectiveness of the actions taken and determining whether the residual risk is acceptable – and, if not, considering further treatment.

Therefore, improving the quality of the data is clearly not a goal in itself:

  • Its cost must be evaluated based on company objectives.
  • The treatments to be implemented must be evaluated through each dimension of quality.

Get our Data Quality Management Guide for Data-Driven Organizations

For more information on Data Quality and DQM, download our free guide: “A Guide to Data Quality Management” now! Download the eBook

Data Intelligence

Guide to Data Quality Management #1 – The 9 Dimensions of Data Quality

Actian Corporation

April 1, 2022

Data Quality refers to an organization’s ability to maintain the quality of its data over time. If we were to take some data professionals at their word, improving Data Quality is the panacea to all our business woes and should therefore be the top priority. 

We believe this should be nuanced: Data Quality is one means, among others, to limit the uncertainties of meeting corporate objectives. 

In this series of articles, we will go over everything data professionals need to know about Data Quality Management (DQM):

    1. The nine dimensions of data quality
    2. The challenges and risks associated with data quality
    3. The main features of Data Quality Management tools
    4. The data catalog contribution to DQM

Some Definitions of Data Quality

Asking Data Analysts or Data Engineers for a definition of Data Quality will yield very different answers, even within the same company and among similar profiles. Some, for example, will focus on the uniqueness of data, while others will prefer to reference standardization. You may have your own interpretation.

The ISO 9000:2015 standard defines quality as “the degree to which a set of inherent characteristics fulfils requirements”.

DAMA International (The Global Data Management Community) – a leading international association involving both business and technical data management professionals – adapts this definition to a data context: “Data Quality is the degree to which the data dimensions meet requirements.”

The Dimensional Approach to Data Quality

From an operational perspective, Data Quality translates into what we call Data Quality dimensions, in which each dimension relates to a specific aspect of quality.

The four dimensions most often used are completeness, accuracy, validity, and availability. The literature describes many other dimensions and criteria for Data Quality; there is, however, no consensus on what these dimensions actually are.

For example, DAMA enumerates sixty dimensions – when most Data Quality Management (DQM) software vendors usually offer up five or six.

The Nine Dimensions of Data Quality

At Zeenea, we believe that the ideal compromise is to take into account nine Data Quality dimensions: completeness, accuracy, validity, uniqueness, consistency, timeliness, traceability, clarity, and availability.

We will illustrate these nine dimensions and the different concepts we refer to in this publication with a straightforward example:

Arthur is in charge of sending marketing campaigns to clients and prospects to present his company’s latest offers. He encounters, however, certain difficulties:

  • Arthur sometimes sends communications to the same people several times.
  • The emails provided in his CRM are often invalid.
  • Prospects and clients do not always receive the right content.
  • Some information pertaining to the prospects is obsolete.
  • Some clients receive emails with erroneous gender qualifications.
  • There are two addresses for clients/prospects but it’s difficult to understand what they relate to.
  • He doesn’t know the origin of some of the data he is using or how he can access their source.

Below is the data Arthur has at hand for his sales efforts. We shall use it to illustrate each of the nine dimensions of Data Quality:

1. Completeness

Is the data complete? Is there information missing? The objective of this dimension is to identify the empty, null, or missing data. In this example, Arthur notices that there are missing email addresses:

To remedy this, he could try and identify whether other systems have the information needed. Arthur could also ask data specialists to manually insert the missing email addresses.

2. Accuracy

Are the existing values coherent with the actual data, i.e., the data we find in the real world?

Arthur noticed that some letters sent to important clients are returned because of incorrect postal addresses. Below, we can see that one of the addresses doesn’t match the standard address formats in the real world:

It could be helpful here for Arthur to use postal address verification services.

3. Validity

Does the data conform with the syntax of its definition? The purpose of this dimension is to ensure that the data conforms to a model of a particular rule.

Arthur noticed that he regularly gets bounced emails. Another problem is that certain prospects/clients do not receive the right content because they haven’t been accurately qualified. For example, the email address annalincoln@apple isn’t in the correct format and the Client Type Customer isn’t correct.

To solve this issue, he could for example make sure that the Client Type values are part of a list of reference values (Customer or Prospect) and that email addresses conform to a specific format.
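As an illustration, here is a minimal validity-check sketch in C#; the email pattern and the reference values are simplified assumptions, not a production-grade validator.

    using System;
    using System.Linq;
    using System.Text.RegularExpressions;

    static class ValidityChecks
    {
        // Deliberately simple email pattern, for illustration only.
        static readonly Regex EmailPattern = new Regex(@"^[^@\s]+@[^@\s]+\.[^@\s]+$");
        static readonly string[] AllowedClientTypes = { "Customer", "Prospect" };

        public static bool IsValidEmail(string email) =>
            !string.IsNullOrWhiteSpace(email) && EmailPattern.IsMatch(email);

        public static bool IsValidClientType(string clientType) =>
            AllowedClientTypes.Contains(clientType, StringComparer.OrdinalIgnoreCase);
    }

With these rules, IsValidEmail("annalincoln@apple") returns false because the address has no top-level domain.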

4. Consistency

Are the different values of the same record in conformity with a given rule? The aim is to ensure the coherence of the data between several columns.

Arthur noticed that some of his male clients complain about receiving emails in which they are referred to as Miss. There does indeed appear to be an inconsistency between the Gender and Title columns for Lino Rodrigez.

To solve these types of problems, it is possible to create a logical rule that ensures that when the Gender is Male, the Title is Mr.
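A minimal sketch of such a cross-column rule in C# (the title values accepted for other genders are assumptions added for illustration):

    using System;

    static class ConsistencyRule
    {
        // Cross-column rule: Gender and Title must agree.
        public static bool GenderAndTitleAgree(string gender, string title)
        {
            if (string.Equals(gender, "Male", StringComparison.OrdinalIgnoreCase))
                return title == "Mr.";
            if (string.Equals(gender, "Female", StringComparison.OrdinalIgnoreCase))
                return title == "Mrs." || title == "Ms." || title == "Miss";
            return true; // no rule defined for other values
        }
    }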

5. Timeliness

Is the time lapse between the creation of the data and its availability appropriate? The aim is to ensure the data is accessible in as short a time as possible.

Arthur noticed that certain information on prospects is not always up to date because the data is too old. As a company rule, data on a prospect that is older than 6 months cannot be used.

He could solve this problem by creating a rule that identifies and excludes data that is too old. An alternative would be to harness this same information in another system that contains fresher data.
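A minimal sketch of such a freshness rule in C# (using C# 9 record syntax; the field names are assumptions, and the 6-month threshold comes from the company rule above):

    using System;
    using System.Collections.Generic;
    using System.Linq;

    record Prospect(string Email, DateTime LastUpdatedUtc);

    static class TimelinessRule
    {
        // Keep only prospects whose data is less than 6 months old.
        public static IEnumerable<Prospect> UsableProspects(IEnumerable<Prospect> prospects, DateTime nowUtc) =>
            prospects.Where(p => p.LastUpdatedUtc >= nowUtc.AddMonths(-6));
    }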

6. Uniqueness

Are there duplicate records? The aim is to ensure the data is not duplicated.

Arthur noticed he was sending the same communications several times to the same people. Lisa Smith, for instance, is duplicated in the folder:

In this simplified example, the duplicated records are identical. More advanced algorithms such as Jaro, Jaro-Winkler, or Levenshtein can group near-duplicates more accurately.
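For reference, here is the classic dynamic-programming Levenshtein distance in C#; records whose key fields are within a small distance of each other (say, 2) can be flagged as potential duplicates. This is a generic implementation, not the one used by any specific tool.

    using System;

    static class Deduplication
    {
        // Levenshtein edit distance between two strings.
        public static int Levenshtein(string a, string b)
        {
            var d = new int[a.Length + 1, b.Length + 1];
            for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
            for (int j = 0; j <= b.Length; j++) d[0, j] = j;

            for (int i = 1; i <= a.Length; i++)
                for (int j = 1; j <= b.Length; j++)
                {
                    int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                    d[i, j] = Math.Min(Math.Min(d[i - 1, j] + 1,   // deletion
                                                d[i, j - 1] + 1),  // insertion
                                       d[i - 1, j - 1] + cost);    // substitution
                }
            return d[a.Length, b.Length];
        }
    }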

7. Clarity

Is the metadata easy for the data consumer to understand? The aim here is to understand the meaning of the data and avoid misinterpretations.

Arthur has doubts about the two addresses given as it is not easy to understand what they represent. The names Street Address 1 and Street Address 2 are subject to interpretation and should be modified, if possible.

Renaming fields within a database is often a complicated operation, so the fields should at least be properly documented with a description.

8. Traceability

Is it possible to obtain traceability from data? The aim is to get to the origin of the data, along with any transformations it may have gone through.

Arthur doesn’t really know where the data comes from or where he can access the data sources. It would have been quite useful for him to know this as it would have ensured the problem was fixed at the source. He would have needed to know that the data he is using with his marketing tool originates from the data of the company data warehouse, itself sourced from the CRM tool.

9. Availability

How can the data be consulted or retrieved by the user? The aim is to facilitate access to the data.

Arthur doesn’t know how to easily access the source data. Staying with the previous diagram, he wants to effortlessly access data from the data warehouse or the CRM tool.

In some cases, Arthur will need to make a formal request to access this information directly.

Get our Data Quality Management Guide for Data-Driven Organizations

For more information on Data Quality and DQM, download our free guide: “A Guide to Data Quality Management”.

Data Management

Zen Edge Database and Ado.net on Raspberry Pi

Actian Corporation

March 31, 2022

Do you have a data-centric Windows application you want to run at the edge? If so, this article demonstrates an easy and affordable way to accomplish this by using the Zen Enterprise Database through ADO.NET on a Raspberry Pi. The Raspberry Pi features a 64-bit ARM processor, can accommodate several operating systems, and costs around $50 (USD).

These instructions use Windows 11 for ARM64 installed on a Raspberry Pi 4 with 8 GB of RAM. (You could consider using Windows 10 or another ARM64-based board, but you would first need to ensure Microsoft supports your configuration.)

The steps and results are as follows:

  • Use the Microsoft Windows emulation that ships with Windows 11, installed via the Windows 11 ARM64 installer.
  • After the installer finishes, the Windows 11 directory structure should look like the figure below:

  • The installer creates ARM, x86, and x64 directories for Windows emulation.
  • Next, run a .NET Framework application using the Zen ADO.NET provider on Windows 11 for ARM64 on the Raspberry Pi.

Once the framework has been established, create an ADO.NET application using Visual Studio 2019 on a Windows platform where Zen v14 is installed and running.

To build the simple application, use a C# Windows Forms application, as seen in the following diagram.

Name and configure the project and point it to a location on the local drive (next diagram).

Create a form and add two command buttons and two text boxes. Name the buttons “Execute” and “Clear,” and add a DataGridView as follows.

Add Pervasive.Data.SqlClient.dll under the project’s references by selecting the provider from the C:\Program Files (x86)\Actian\Zen\bin\ADONET4.4 folder. Then add a “using” clause in the program code:

    using Pervasive.Data.SqlClient;

Add the following code under the “Execute” button.
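The original screenshot of this code is not reproduced here. Below is a minimal sketch of what the Execute handler might look like in the form’s code-behind, assuming the connection-string and SQL text boxes are named txtConnection and txtSql and the grid is dataGridView1 (these names, and the exact handler wiring, are assumptions):

    // Sketch of the "Execute" button handler (control names are assumptions).
    private void btnExecute_Click(object sender, EventArgs e)
    {
        try
        {
            using (var connection = new PsqlConnection(txtConnection.Text))
            using (var adapter = new PsqlDataAdapter(txtSql.Text, connection))
            {
                var table = new System.Data.DataTable();
                adapter.Fill(table);              // opens the connection and runs the query
                dataGridView1.DataSource = table; // show the result in the DataGridView
            }
        }
        catch (Exception ex)
        {
            MessageBox.Show(ex.Message, "Query failed");
        }
    }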

Add the following code under the “Clear” button.
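And a corresponding sketch for the Clear handler, under the same naming assumptions:

    // Sketch of the "Clear" button handler: reset the grid and the SQL text box.
    private void btnClear_Click(object sender, EventArgs e)
    {
        dataGridView1.DataSource = null;
        txtSql.Clear();
    }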

Then, add the connection information and SQL statement to the text boxes added in the previous steps as follows.

Now the project is ready to compile, as seen below.

Use “localhost” in the connection string to connect to the local system where the Zen engine is running. This example selects data from the “class” table in the Demodata database.

Selecting “Execute” will then return the data in the grid as follows.

Now the application is ready to be deployed on the Raspberry Pi. To do so, copy “SelectData.exe” from the C:\test\SelectData\SelectData\bin\Debug folder along with the Zen ADO.NET provider “Pervasive.Data.SqlClient.dll”, and place both in a folder on Windows 11 for ARM64 on the Raspberry Pi.

Next, register the Zen ADO.NET provider in the GAC using gacutil as follows:

    gacutil /f /i <dir>\Pervasive.Data.SqlClient.dll

Run the SelectData app and connect to a remote server where the Zen engine is running, as a client-server application.

Change the server name or IP address in the connection string to your server where the Zen V14 or V15 engine is running.

Now the Windows application is running in client-server mode using the Zen ADO.NET provider on a Raspberry Pi with Windows 11 for ARM64 installed.

And that’s it! Following these instructions, you can build and deploy a data-centric Windows 11 application on a Raspberry Pi ARM64. This or a similar application can run as a client or server to serve upstream or downstream data clients such as sensors or other devices that generate or require data from an edge database. Zen Enterprise uses standard SQL queries to create and manage data tables, and the same application and database will run on your Microsoft Windows-based (or Linux) laptops and desktops, or in the cloud. For a quick tutorial on the broad applicability of Zen, watch this video.

Data Intelligence

What is the Difference Between Data Governance and Data Management?

Actian Corporation

March 31, 2022

In a world where companies aspire to become data-driven, data management and data governance are concepts that must be mastered at all costs. They are too often perceived as related or even interchangeable disciplines, yet the differences between them are important.

A company wanting to become data-driven must master the disciplines, concepts, and methodologies that govern the collection and use of data. Among those that are most often misunderstood are data governance and data management. 

Data governance consists, on the one hand, of defining the organizational structures around data – who owns it, who manages it, who exploits it, etc. – and, on the other hand, of the policies, rules, processes, and monitored indicators that allow for sound administration of data throughout its life cycle (from collection to deletion).

Data management can therefore be defined as the technical application of the recommendations and measures defined by data governance.

Data Governance vs. Data Management: Their Different Missions

The main difference between data governance and data management is that the former has a strategic dimension, while the latter is rather operational.

Without data governance, data management cannot be efficient, rational, or sustainable. Conversely, data governance that is not translated into appropriate data management will remain a theoretical document or a letter of intent that will not allow you to actively and effectively engage in data-driven decision-making.

To understand what is at stake, it is important to understand that all the disciplines related to data are permanently overlapping and interdependent. Data governance is a conductor that orchestrates the entire system. It is based on a certain number of questions such as:

  • What can we do with our data?
  • How do we ensure data quality?
  • Who is responsible for the processes, standards, and policies defined to exploit the data?

Data management is the pragmatic way to answer these questions and make the data strategy a reality. Data management and data governance can and should work in tandem. However, data governance is mainly concerned with the monitoring and processing of all the company’s data, while data management is mainly concerned with the storage and retrieval of certain types of information.

Who are the Actors of Data Governance and Management?

At the top management level, the CEO is naturally the main actor in data governance, as they are its legal guarantor. But they are not the only one who must get involved.

The CIO (Chief Information Officer) plays a key role in securing the infrastructure and ensuring its availability. Constant access to data is crucial for the business (marketing teams, field salespeople) as well as for all the data teams in charge of the daily reality of data management.

It is then up to the Chief Data Officer (CDO) to create the bridge between these two entities and break down the data silos in order to build agile data governance. He or she facilitates access to data and ensures its quality in order to add value to it.

And while the Data Architect will be more involved in data governance, the Data Engineer will be more involved in data management. As for the Data Steward, he or she is at the confluence of the two disciplines.

How Combining the Two Roles Helps Companies Become Data-Driven

Despite their differences in scope and means, the concepts of data governance and data management should not be opposed. In order for a company to adopt a data-driven strategy, it is imperative to reconcile these two axes within a common action. To achieve this, an organization’s director/CEO must be the first sponsor of data governance and the first actor in data management.

It is by communicating internally with all the teams and by continuously developing the data culture among all employees that data governance serves the business challenges while preserving a relationship of trust that unites the company with its customers.

About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale, streamlining complex data environments and accelerating the delivery of AI-ready data. The Actian data intelligence approach combines data discovery, metadata management, and federated governance to enable smarter data usage and enhance compliance. With intuitive self-service capabilities, business and technical users can find, understand, and trust data assets across cloud, hybrid, and on-premises environments. Actian delivers flexible data management solutions to 42 million users at Fortune 100 companies and other enterprises worldwide, while maintaining a 95% customer satisfaction score.
Data Intelligence

5 Product Values That Strengthen Team Cohesion & Experience

Actian Corporation

March 14, 2022

To remain competitive, organizations must make decisions quickly, as the slightest mistake can cost precious time in the race for success. Defining the company’s reason for being, its direction, and its strategy makes it possible to build a solid foundation for alignment – which in turn facilitates the decisions that shape product development. Aligning all stakeholders in product development is a real challenge for Product Managers. Yet it is an essential mission for building a successful product and an obvious prerequisite for motivating teams, who need to know why they get up each morning to go to work.

The Foundations of a Shared Product Vision Within the Company

Various frameworks (NorthStar, OKR, etc.) have been developed over the last few years to enable companies and their product teams to lay these foundations, disseminate them within the organization, and build a roadmap that creates cohesion. These frameworks generally define a few key artifacts and have already given rise to a large body of literature. Although versions may differ from one framework to another, the following concepts are generally found:

  • Vision: The dream, the true North of a team. The vision must be inspiring and create a common sense of purpose throughout the organization.
  • The Mission: It represents an organization’s primary objective and must be measurable and achievable.
  • The Objectives: These define measurable short and medium-term milestones to accomplish the mission.
  • The Roadmap: A source of shared truth – it describes the vision, direction, priorities, and progress of a product over time.

With a clear and shared definition of these concepts across the company, product teams have a solid foundation for identifying priority issues and effectively ordering product backlogs.

Product Values: The Key to Team Buy-in and Alignment Over Time

Although well-defined at the beginning, the concepts described above can nevertheless be forgotten over time or become obsolete. The company and the product evolve, teams change, and the product can consequently lose its direction. A continuous effort of reconsideration and acculturation must therefore be carried out by the product teams for this alignment to last.

Indeed, product development is both a sprint and a marathon. One of the main difficulties for product teams is to maintain this alignment over time. In this respect, another concept in these frameworks is often under-exploited when it is not completely forgotten by organizations: product values.

Jeff Weiner, Executive Chairman at LinkedIn, particularly emphasized the importance of defining company values through the Vision to Values framework. LinkedIn defines values as “the principles that guide the organization’s day-to-day decisions; a defining element of your culture” – for example, “be honest and constructive” or “demand excellence”.

Defining product values in addition to corporate values can be a great way for product teams to create this alignment over time and this is exactly what we do at the Actian Data Intelligence Platform.

From Corporate Vision to Product Values: A Focus on a Data Catalog

Organization & Product Consistency

We have a shared vision – “Be the first step of any data journey” – and a clear mission – “To help data teams accelerate their initiatives by creating a smart & reliable data asset landscape at the enterprise level”.

We position ourselves as a data catalog pure-player and we share the responsibility of a single product between several Product Managers. This is why we have organized ourselves into feature teams. This way, each development team can take charge of any new feature or evolution according to the company’s priorities, and carry it out from start to finish.

While we prioritize the backlog and delivery by defining and adapting our strategy and organization according to these objectives, three problems remain:

  • How do we ensure that the product remains consistent over time when there are multiple pilots onboard the plane?
  • How do we favor one approach over another?
  • How do we ensure that a new feature is consistent with the rest of the application?

Indeed, each Product Manager has their own sensitivities and their own background. And while problems are usually clearly identified, there are generally several ways to solve them. This is where product values come into play…

Actian Data Intelligence Platform’s Product Values

If the vision and the mission help us to answer the “why?”, the product values allow us to remain aligned with the “how?”. It is a precious tool that challenges the different possible approaches to meet customer needs. And each Product Manager can refer to these common values to make decisions, prioritize a feature or reject it, and ensure a unified & unique user experience across the product.

Thus, each new feature is built with the following 5 product values as guides:

Simplicity

This value is at the heart of our convictions. The objective of a Data Catalog is to democratize data access. To achieve this, facilitating catalog adoption for end users is key. Simplicity is clearly reflected in the way each functionality is offered. Many applications end up looking like Christmas trees with colored buttons all over the place that no one knows how to use; others require weeks of training before the first button is clicked. The use of the Data Catalog should not be reserved for experts and should therefore be obvious and fluid regardless of the user’s objective. This value was reflected in our decision to create two interfaces for our Data Catalog: one dedicated to search and exploration, and the other for the management and monitoring of the catalog’s documentation.

Empowering

Documentation tasks are often time-consuming and it can be difficult to motivate knowledgeable people to share and formalize their knowledge. In the same way, the product must encourage data consumers to be autonomous in their use of data. This is why we have chosen not to offer rigid validation workflows, but rather a system of accountability. This allows Data Stewards to be aware of the impacts of their modifications. Coupled with an alerting and auditing system after the fact, it ensures better autonomy while maintaining traceability in the event of a problem.
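
As a purely illustrative sketch of such an accountability system – the function names, fields, and threshold below are hypothetical and not the product’s actual code – each modification is applied without a blocking workflow, but it is logged and, when its impact is significant, raises an alert after the fact:

    # Hypothetical accountability model: no blocking validation workflow,
    # but every change is recorded and significant changes trigger an alert.
    from datetime import datetime, timezone

    audit_log = []

    def notify(message):
        print("ALERT:", message)                 # stand-in for a real alerting channel

    def apply_change(user, asset, field, new_value, impacted_items):
        audit_log.append({
            "when": datetime.now(timezone.utc).isoformat(),
            "who": user,
            "asset": asset,
            "field": field,
            "new_value": new_value,
            "impacted_items": impacted_items,    # traceability in the event of a problem
        })
        if impacted_items > 100:                 # arbitrary threshold, for illustration only
            notify(f"{user} changed '{field}' on {asset}, impacting {impacted_items} items")

    apply_change("data.steward@example.com", "sales.orders", "owner", "finance-team", 250)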

Reassuring

It is essential that end users can trust the data they consume. The product must therefore reassure the user through the way it presents its information. Similarly, Data Stewards who maintain a large amount of data need to be reassured about the operations for which they are responsible: have I processed everything correctly? How can I be sure that there are no inconsistencies in the documentation? What will really happen if I click this button? What if it crashes? The product must create an environment where the user feels confident using the tool and its content. This value translates into preventive messages rather than error reports, a reassuring tone of language, idempotent import operations, and so on.
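
To illustrate what idempotent import operations mean in practice – a minimal sketch with hypothetical names, not the product’s actual import code – replaying the same import must leave the catalog in exactly the same state, typically by upserting on a stable key rather than appending:

    # Minimal illustration of an idempotent import: running it twice
    # leaves the catalog in exactly the same state as running it once.
    catalog = {}                                  # stand-in documentation store, keyed by asset

    def import_assets(assets):
        for asset in assets:
            catalog[asset["key"]] = asset         # upsert: insert or overwrite, never duplicate

    batch = [{"key": "sales.orders", "owner": "jane"},
             {"key": "sales.customers", "owner": "omar"}]

    import_assets(batch)
    import_assets(batch)                          # replaying the same import changes nothing
    assert len(catalog) == 2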

Flexibility

Each client has their own business context, history, governance rules, needs, etc. The data catalog must be able to adapt to any context to facilitate its adoption. Flexibility is an essential value to enable the catalog to adapt to all current technological contexts and to be a true repository of data at enterprise level. The product must therefore adapt to the user’s context and be as close as possible to their uses. Our flat and incremental modeling is based on this value, as opposed to the more rigid hierarchical models offered on the market.

Deep Tech

This value is also very important in our development decisions. Technology is at the heart of our product and must serve the other values (notably simplicity and flexibility). Documenting, maintaining, and exploiting the value of enterprise-wide data assets cannot be done without the help of intelligent technology (automation, AI, etc.). The choice to base our search engine on a knowledge graph or our positioning in terms of connectivity are illustrations of this “deep tech” value.

The Take Away

Creating alignment around a product is a long-term task. It requires Product Managers – in synergy with all stakeholders – to define the vision, the mission, and the objectives of the company from the very beginning. This enables product management teams to effectively prioritize the work of their teams. However, to ensure the coherence of a product over time, the definition and use of product values are essential. With the Actian Data Intelligence Platform, our product values are simplicity, empowerment, reassurance, flexibility, and deep tech. They are reflected in the way we design and enhance our Data Catalog and allow us to ensure a better customer experience over time.

Data Security

Hybrid Cloud Security

Actian Corporation

March 7, 2022

One of the biggest fears around cloud adoption is the security of organizational data and information. IT security has always been an issue for organizations, but the thought of not having total control over corporate data is frightening. Security concerns are one of the reasons organizations do not move everything to the cloud and instead adopt a hybrid cloud approach. Hybrid cloud security architectures still carry the security risks associated with a public cloud; in fact, hybrid cloud risks are higher simply because there are more clouds to protect. With hybrid cloud architectures, the trust boundary extends beyond the organization for access to its most critical data.

Sensitive data can be kept off the public cloud to help manage risk. That helps, but hybrid cloud solutions are integrations between public and private clouds, and without the appropriate security this integration can still leave your private cloud vulnerable to attacks originating from the public cloud. Secure hybrid clouds bring significant benefits to organizations today. Alongside those benefits come the challenges of securing the organization’s data, and those challenges are continually being addressed so that organizations can realize the full benefits hybrid cloud architectures provide.

What is Hybrid Cloud Security?

Organizational IT infrastructures have grown in complexity, especially with hybrid cloud implementations. This complexity, combined with cloud characteristics such as broad network access and on-demand access from anywhere, complicates how a hybrid cloud can be secured. Protecting data, applications, and infrastructure – internally and externally – from both hackers’ malicious tactics and inadvertent, unintentional activities becomes correspondingly harder.

Many cloud vendors have adopted industry compliance and governance security standards, especially those created by the US government, to ease the security threats and risks that an organization may face in the cloud. The Federal Risk and Authorization Management Program (FedRAMP) provides standards and accreditations for cloud services. The DoD Cloud Computing Security Requirements Guide (SRG) provides security controls and requirements for cloud services used by the Department of Defense. These standards and others help cloud vendors and organizations improve their hybrid cloud security.

When securing the cloud, an organization should consider the cloud architecture components: applications, data, middleware, operating systems, virtualization, servers, storage, and networking. Security concerns are specific to the service type, and in a hybrid cloud, organizations share responsibility for security with the cloud service provider.

The responsibility for hybrid cloud security should include specific disciplines. Some essential discipline areas for managing risk and securing hybrid cloud are:

  • Physical controls to deter intruders and create protective barriers around IT assets are just as important as cybersecurity for protecting assets.
    • Security perimeters, cameras, locks, alarms.
    • Physical controls can be seen as the first line of defense for protecting organizational IT assets, not only from security threats but also from environmental harm.
    • Biometrics (one or more fingerprints, possibly retina scans) where system access ties to extremely sensitive data.
  • Technical controls.
    • Cloud patching fixes vulnerabilities in software and applications that are targets of cyber-attacks. Besides keeping systems up to date in general, this helps reduce security risk in hybrid cloud environments.
    • Multi-tenancy security – each tenant or customer is logically separated in a cloud environment. Every tenant has access to the cloud environment, but the boundaries are purely virtual, and hackers can find ways to access data across those boundaries if resources are improperly assigned or data overflows from one tenant into another. Data must be properly configured and isolated to avoid interference between tenants.
    • Encryption is needed for data at rest and data in transit. Data at rest sits in storage; data in transit moves across the network and the cloud layers (SaaS, PaaS, IaaS). Both have to be protected. More often than not, data at rest isn’t encrypted because it is an option that is not turned on by default.
    • Automation and orchestration are needed to replace slow manual responses in hybrid cloud environments. Monitoring, compliance checks, appropriate responses, and implementations should be automated to eliminate human error, and these responses should be reviewed and continuously improved.
    • Access controls – people and technology accesses should always be evaluated and monitored on a contextual basis, including date, time, location, network access points, and so forth. Define normal access patterns and monitor for abnormal patterns and behavior, which could signal a possible security issue (a minimal sketch of such a contextual check follows this list).
    • Endpoint security for remote access has to be managed and controlled. Devices can be lost, stolen, or hacked, providing an access point into a hybrid cloud and all of its data and resources. Local ports on devices that allow printing or USB drives need to be locked for remote workers, or monitored and logged when used.
  • Administrative controls to account for human factors in cloud security.
    • Zero trust architecture (ZTA) principles and policies continually evaluate trusted access to cloud environments and restrict access to the minimum privileges required. Allowing too much access to a person or a technology solution can cause security issues. Adjustments to entitlements can be made in real time: is a user suddenly downloading far more documents? Are those documents outside his or her normal scope of work or access? Of course, this requires data governance that includes tagging and role-based access that maps entitlements to those tags.
    • Disaster recovery – performing business impact analysis (BIA) and risk assessments is crucial for planning disaster recovery and deciding how hybrid cloud architectures should be implemented, including data redundancy and placement within a cloud architecture for service availability and rapid remediation after an attack.
    • Social engineering education and technical controls for phishing, baiting, etc. Social engineering is both an organizational issue and a personal issue for everyone. Hackers can steal corporate and personal data to access anything for malicious purposes.
    • A culture of security is critical for organizations. The activities of individuals are considered one of the most significant risks to the organization. Hackers target their way into an organization through its employees, as well as its partners and even third-party software vendors and services contractors. Employees, contractors, and partners need to be educated continuously to help avoid security issues that can be prevented with training and knowledge.
  • Supply chain controls.
    • Software, infrastructure, and platforms from third parties have to be evaluated for security vulnerabilities. Third-party software, once installed, could contain vulnerabilities or already be compromised, giving criminals complete access to an organization’s hybrid cloud environment. Be sure to check how all third-party software vendors approach and practice sound security controls over their products.
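
As a minimal, purely illustrative sketch of the contextual access checks mentioned in the list above (the roles, tags, and thresholds are hypothetical), an access decision can combine role-to-classification entitlements with a simple behavioral baseline:

    # Hypothetical contextual access check combining role/tag entitlements
    # with a simple behavioral baseline, as described in the list above.
    ENTITLEMENTS = {
        "analyst": {"public", "internal"},
        "finance": {"public", "internal", "pii"},
    }
    DAILY_DOWNLOAD_BASELINE = 50                  # "normal" volume, for illustration only

    def allow_access(role, data_tag, downloads_today, trusted_network):
        if data_tag not in ENTITLEMENTS.get(role, set()):
            return False, "role not entitled to this data classification"
        if not trusted_network:
            return False, "untrusted network access point: require extra verification"
        if downloads_today > DAILY_DOWNLOAD_BASELINE:
            return False, "abnormal download volume: raise a security alert"
        return True, "ok"

    print(allow_access("analyst", "pii", 3, trusted_network=True))   # denied: not entitled
    print(allow_access("finance", "pii", 3, trusted_network=True))   # allowed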

Security in the cloud is a shared responsibility that becomes more complex as deployments are added. Shared Services are a way to deliver functions such as security, monitoring, authorization, backups, patching, upgrades, and more in a cost-effective, reliable way to all clouds. Shared services reduce management complexity and are essential to achieve a consistent security posture across your hybrid cloud security architecture.

Configuration Management and Hybrid Cloud Security

Hybrid cloud security architecture risks are higher simply because there are more clouds to protect. For this reason, here are a few extra items that you should put on your hybrid cloud security best practices list, including visibility, shared services, and configuration management. First, you can’t secure what you can’t see. Hybrid cloud security requires visibility across the data center and private and public cloud borders to reduce hybrid cloud risks resulting from blind spots.

Another area to focus on is configuration management, since misconfigurations are one of the most common ways for digital criminals to land and expand in hybrid cloud environments: encryption isn’t turned on, access hasn’t been restricted, security groups aren’t set up correctly, ports aren’t locked down – the list goes on. Increasingly, hybrid cloud security teams need to understand cloud infrastructure better in order to secure it better, and they will need to include cloud configuration auditing as part of their delivery processes.
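
A configuration audit can start as simply as checking each resource’s settings against a small rule set. The sketch below is illustrative only; the resource attributes and rules are hypothetical and not tied to any particular cloud provider’s API:

    # Hypothetical configuration audit flagging the common misconfigurations
    # mentioned above: missing encryption, public access, unrestricted ports.
    RESOURCES = [
        {"name": "customer-bucket", "encrypted_at_rest": False, "public": True},
        {"name": "api-vm", "open_ports": [22, 3389, 443]},
    ]
    ALLOWED_PORTS = {443}

    def audit(resources):
        findings = []
        for r in resources:
            if r.get("public"):
                findings.append((r["name"], "publicly accessible"))
            if r.get("encrypted_at_rest") is False:
                findings.append((r["name"], "encryption at rest disabled"))
            for port in sorted(set(r.get("open_ports", [])) - ALLOWED_PORTS):
                findings.append((r["name"], f"port {port} open"))
        return findings

    for name, issue in audit(RESOURCES):
        print(f"{name}: {issue}")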

One of the hybrid cloud security tools that can be used is a Configuration Management System (CMS) built on configuration management database (CMDB) technology, which helps organizations gain visibility into hybrid cloud configurations and the relationships between all cloud components. The first activity with a CMS is discovering all cloud assets, or configuration items, that make up the services being offered. At that point, a snapshot of the environment is taken with the essential details of the cloud architecture. Once they have discovered their hybrid cloud architecture, many organizations immediately look for security concerns that violate their security governance.

Once the CMS is in place, other hybrid cloud security tools, such as drift management and monitoring of changes in the cloud architecture, can alert on cloud attacks. Once unauthorized drift is detected, additional automation can correct it and raise alerts to counter the attack. The CMS and the CMDB support cloud security operations and other service management areas, such as incident, event, and problem management, to help provide a holistic solution for the organization’s service delivery and service support.
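
Drift detection against a CMDB snapshot essentially boils down to comparing the recorded baseline with the configuration discovered right now; anything added, removed, or changed outside an approved change becomes an alert candidate. Here is a minimal illustration with hypothetical data structures, not a specific CMS product:

    # Hypothetical drift check: compare the CMDB baseline snapshot with the
    # configuration discovered right now and report anything that changed.
    baseline = {"web-lb": {"tls": "1.2", "ports": [443]},
                "db-01":  {"encrypted": True}}

    current  = {"web-lb": {"tls": "1.2", "ports": [443, 8080]},   # new open port
                "db-01":  {"encrypted": True},
                "tmp-vm": {"encrypted": False}}                   # untracked asset

    def detect_drift(baseline, current):
        drift = []
        for name in sorted(current.keys() - baseline.keys()):
            drift.append(f"new untracked asset: {name}")
        for name in sorted(baseline.keys() - current.keys()):
            drift.append(f"tracked asset disappeared: {name}")
        for name in sorted(baseline.keys() & current.keys()):
            if baseline[name] != current[name]:
                drift.append(f"configuration changed on {name}")
        return drift

    for alert in detect_drift(baseline, current):
        print(alert)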

Conclusion

Security issues in hybrid cloud computing aren’t that different from security issues in cloud computing. You can review the articles Security, Governance, and Privacy for the Modern Data Warehouse, Part 1 and Part 2, which provide many pointers on how to protect your data and cloud services.

Hybrid cloud security risks and issues will remain an organizational IT and business challenge for a long time. Organizations need to stay informed and have the latest technologies and guidance for combating hybrid cloud security issues and threats, including partnering with hybrid cloud solution providers such as Actian. This is essential to the organization’s ability to function amid constantly changing cloud security needs.

Data Intelligence

Interview With Ruben Marco Ganzaroli – CDO at Autostrade per l’Italia

Actian Corporation

March 3, 2022

We are pleased to have been selected by Autostrade per l’Italia – a European leader among concessionaires for the construction and management of toll highways – to deploy the Actian Data Intelligence Platform’s data catalog at the group level. We took this opportunity to ask a few questions to Ruben Marco Ganzaroli, who joined the company in 2021 as Chief Data Officer to support the extensive Next to Digital program to digitally transform the company – a program with a data catalog as its starting point.

Q: CDOs are becoming critical to a C-level team. How important is data to the strategic direction of Autostrade per l’Italia?

Data is at the center of the huge Digital Transformation program started in 2021, called “Next to Digital”, which aims to transform Autostrade per l’Italia into a Sustainable Mobility Leader. We wanted to protect whoever is traveling on our highways, execute decisions faster, and be agile and fluid. We not only want to react immediately to what is happening around us, but also to anticipate events and take action before they occur. The company was started in the early 1950s, and we realized that all the data we have collected over the years could be a unique advantage and a strong lever to transform the company.

Q: What are the main challenges you want to address by implementing a data catalog in your organization?

We think that only the business functions of the Autostrade group can truly transform the company into a data-driven one. To do this, business functions need to be supported by the right tools – efficient and usable – and they must be fully aware of the data they have available. Ideas, and therefore value, are generated only if you have a clear idea of the environment in which you are moving and the objective you are aiming for. If, without knowing it, you have a gold bar under your mattress, you will sleep uncomfortably and realize that you could do something to improve your situation – probably by changing mattresses, for example. However, if you are aware that you have that gold bar, you will lift the mattress, take the bar, and turn it into a jewel – maximizing its value.

The data catalog builds the bridge between business and data at Autostrade. It is the tool that lets business users know that there are many gold bars available and where they can be found.

Q: What features were you looking for in a data catalog and that you found in the platform?

From a business perspective, a data catalog is the access point to all data. It must be fast, complete, easy to understand, and user-friendly, and it must be a lever (not an obstacle): business users must not be forced to spend the majority of their time in it. From an IT perspective, a data catalog must be agile, scalable, and quickly and continuously upgradeable, as data is continuously being ingested or created.

Q: What is your vision of a data catalog in the data management solutions’ ecosystem?

We don’t think of the catalog as a tool, but as part of the environment that we, as IT, need to make available to the business functions. This ecosystem naturally includes tools, but what’s also important is the mindset of its users. To lead this mindset change, business functions must be able to work with data, and that’s the reason Self-BI is our main goal for 2022 as the CDO Office. As mentioned previously, the catalog is the starting point for all of that. It is the door that lets the business into the data room.

Q: How will you drive catalog adoption among your data teams?

All the leaders from our team – Leonardo B. for Data Product, Fulvio C. for Data Science, Marco A. and Andrea Q. for Data Engineering, and Cristina M. as Scrum (super)Master – are focused on managing the program. The program foresees an initial training phase for business users, dedicated on-the-job support, and in-room support. Business users will participate in the delivery of their own analyses. We will onboard business functions incrementally, to focus the effort and maximize the effectiveness of each one. The goal is to onboard all business functions by the end of 2022: it represents a lot of work, but it is made easier by knowing that there is a whole company behind us that supports us and strongly believes we are going in the right direction.
