Data Intelligence

Guide to Data Quality Management #3 – The Main Features of DQM Tools

Actian Corporation

April 3, 2022

Data Quality Management Tools Features

Data Quality refers to an organization’s ability to maintain the quality of its data over time. If we were to take some data professionals at their word, improving Data Quality is the panacea to all our business woes and should therefore be the top priority.

We believe this should be nuanced: Data Quality is one means, among others, to limit the uncertainties of meeting corporate objectives.

In this series of articles, we will go over everything data professionals need to know about Data Quality Management (DQM):

  1. The nine dimensions of Data Quality
  2. The challenges and risks associated with Data Quality
  3. The main features of Data Quality Management tools
  4. The Data Catalog contribution to DQM

One way to better understand the challenges of Data Quality is to look at the existing Data Quality solutions on the market.

From an operational point of view, how do we identify and correct Data Quality issues? What features do Data Quality Management tools offer to improve Data Quality?

Without going into too much detail, let’s illustrate the pros of a Data Quality Management tool through the main evaluation criteria of Gartner’s Magic Quadrant for Data Quality Solutions.

Connectivity

A Data Quality Management tool has to be able to gather and apply quality rules on all enterprise data (internal, external, on-prem, cloud, relational, non-relational, etc.). In other words, the tool must be able to plug into all relevant data sources in order to apply its quality rules.

Data Profiling, Data Measuring, and Data Visualization

You cannot correct Data Quality issues if you cannot detect them first. Data profiling enables IT and business users to assess the quality of the data in order to identify and understand the Data Quality issues.

The tool must be able to assess the data against the dimensions outlined in The Nine Dimensions of Data Quality, in order to identify quality issues across the dimensions that matter most to the organization.

Monitoring

The tool must be able to monitor how the quality of the data evolves over time and alert the relevant stakeholders when quality falls below acceptable thresholds.

Data Standardization and Data Cleaning

Then comes the data cleaning phase. The aim here is to provide data cleaning functionalities that apply standards or business rules to transform the data (formats, values, structure).
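As a rough illustration of what such standardization rules can look like in practice, consider the small C# sketch below. The record shape, field names, and mappings are invented for the example and are not taken from any particular DQM tool.

using System;
using System.Text.RegularExpressions;

// Example usage of the rules defined below (input values are invented).
Console.WriteLine(StandardizationRules.Standardize(
    new Contact("  Lisa.Smith@EXAMPLE.com ", "France", "+33 (0)6 12-34-56-78")));

// Record shape, field names, and mappings are illustrative only.
public record Contact(string Email, string Country, string Phone);

public static class StandardizationRules
{
    // Trim whitespace and lowercase the whole email address.
    public static string CleanEmail(string email) =>
        (email ?? string.Empty).Trim().ToLowerInvariant();

    // Map free-text country labels onto a single normalized code.
    public static string CleanCountry(string country) =>
        (country ?? string.Empty).Trim().ToUpperInvariant() switch
        {
            "FRANCE" or "FR" => "FR",
            "UNITED STATES" or "USA" or "US" => "US",
            var other => other
        };

    // Keep only digits and a leading '+' in phone numbers.
    public static string CleanPhone(string phone) =>
        Regex.Replace(phone ?? string.Empty, @"[^\d+]", "");

    public static Contact Standardize(Contact c) =>
        new(CleanEmail(c.Email), CleanCountry(c.Country), CleanPhone(c.Phone));
}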

Data Matching and Merging

The aim is to identify and delete duplicates that can be present within or between datasets.

Address Validation

The aim is to standardize addresses that could be incomplete or incorrect.

Data Curation and Enrichment

A Data Quality Management tool should also be able to integrate data from external sources in order to improve completeness, thereby adding value to the data.

Development and Deployment of Business Rules

The tool should enable the creation, deployment, and management of business rules, which can then be used to validate the data.

Problem Resolution

The quality management tool helps both IT and business users to assign, escalate, solve, and monitor Data Quality problems.

Metadata Management

The tool should also be capable of capturing and reconciling all the metadata related to the Data Quality process.

User-Friendliness

Lastly, a solution should be able to adapt to the different roles within the company, and specifically to non-technical business users.

Get our Data Quality Management Guide for Data-Driven Organizations

For more information on Data Quality and DQM, download our free guide: “A Guide to Data Quality Management” now!


About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.
Data Intelligence

Guide to Data Quality Management #2 – The Challenges With Data Quality

Actian Corporation

April 2, 2022

The Challenges and Risks Associated With Data Quality

Data Quality refers to an organization’s ability to maintain the quality of its data over time. If we were to take some data professionals at their word, improving Data Quality is the panacea to all our business woes and should therefore be the top priority. 

We believe this should be nuanced: Data Quality is one means, among others, to limit the uncertainties of meeting corporate objectives. 

In this series of articles, we will go over everything data professionals need to know about Data Quality Management (DQM):

  1. The nine dimensions of Data Quality
  2. The challenges and risks associated with Data Quality
  3. The main features of Data Quality Management tools
  4. The Data Catalog contribution to DQM

The Challenges of Data Quality for Organizations

Initiatives for improving the quality of data are typically implemented by organizations to meet compliance requirements and reduce risk. They are indispensable for reliable decision-making. There are, unfortunately, many stumbling blocks that can hinder Data Quality improvement initiatives. Below are some examples:

  • The exponential growth in the volume, speed, and variety of data makes the environment more complex and uncertain.
  • Increasing pressure from compliance regulations such as GDPR, BCBS 239, or HIPAA.
  • Teams are increasingly decentralized, and each has its own domain of expertise.
  • IT and data teams are overloaded and don’t have time to solve Data Quality issues.
  • Data aggregation processes are complex and time-consuming.
  • It can be difficult to standardize data between different sources.
  • Auditing changes across systems is complex.
  • Governance policies are difficult to implement.

Having said that, there are also numerous opportunities to seize. High-quality data enables organizations to facilitate innovation with artificial intelligence and ensure a more personalized customer experience – provided there is enough quality data.

Gartner has in fact forecast that through 2022, 85% of AI projects will deliver erroneous outcomes as a result of bias in the data, the algorithms, or the teams in charge of managing them.

Reducing the Level of Risk by Improving the Quality of the Data

Poor Data Quality should be seen as a risk and quality improvement software as a possible solution to reduce this level of risk.

Processing a Quality Issue

If we accept the notion above, any quality issue should be addressed in several phases:

1. Risk Identification: This phase consists of seeking out, recognizing, and describing the risks that can help or prevent the organization from reaching its objectives – in this case, because of a lack of Data Quality.

2. Risk Analysis: The aim of this phase is to understand the nature of the risk and its characteristics: the likelihood of events and their consequences, the nature and importance of those consequences, etc. Here, we should seek to identify what has caused the poor quality of the marketing data. We could cite, for example:

  • A poor user experience of the source system leading to typing errors;
  • A lack of verification of the completeness, accuracy, validity, uniqueness, consistency, or timeliness of the data;
  • A lack of simple means to ensure the traceability, clarity, and availability of the data;
  • The absence of a governance process and of business team involvement.

3. Risk Evaluation: The purpose of this phase is to compare the results of the risk analysis with the established risk criteria. It helps establish whether further action is needed for the decision-making – for instance keeping the current means in place, undertaking further analysis, etc.

Let’s focus on the nine dimensions of Data Quality and evaluate the impact of poor quality on each of them.

The values for the levels of probability and severity should be defined by the main stakeholders, who know the data in question best. 
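The article does not prescribe a scoring formula, but a common convention is to combine the two ratings into a single score (risk = probability × severity). Below is a small, illustrative C# sketch with made-up dimensions and ratings.

using System;
using System.Collections.Generic;

// Hypothetical scoring: risk = probability × severity, both rated from 1 to 5.
// The dimensions and ratings below are illustrative, not taken from the article.
var ratings = new Dictionary<string, (int Probability, int Severity)>
{
    ["Completeness"] = (4, 3),
    ["Validity"] = (3, 4),
    ["Uniqueness"] = (2, 2)
};

foreach (var (dimension, r) in ratings)
{
    int score = r.Probability * r.Severity;                      // simple multiplicative score
    string level = score >= 12 ? "High" : score >= 6 ? "Medium" : "Low";
    Console.WriteLine($"{dimension}: {score} ({level})");
}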

4. Risk Processing: This phase aims to set out the available options for reducing risk and to roll them out. It also involves assessing the usefulness of the actions taken and determining whether the residual risk is acceptable – and, if it is not, considering further treatment.

Therefore, improving the quality of the data is clearly not a goal in itself:

  • Its cost must be evaluated based on company objectives.
  • The treatments to be implemented must be evaluated against each dimension of quality.

Get our Data Quality Management Guide for Data-Driven Organizations

For more information on Data Quality and DQM, download our free guide: “A Guide to Data Quality Management” now!


About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.
Data Intelligence

Guide to Data Quality Management #1 – The 9 Dimensions of Data Quality

Actian Corporation

April 1, 2022

The 9 Dimensions of Data Quality

Data Quality refers to an organization’s ability to maintain the quality of its data over time. If we were to take some data professionals at their word, improving Data Quality is the panacea to all our business woes and should therefore be the top priority. 

We believe this should be nuanced: Data Quality is one means, among others, to limit the uncertainties of meeting corporate objectives. 

In this series of articles, we will go over everything data professionals need to know about Data Quality Management (DQM):

    1. The nine dimensions of data quality
    2. The challenges and risks associated with data quality
    3. The main features of Data Quality Management tools
    4. The data catalog contribution to DQM

Some Definitions of Data Quality

Asking Data Analysts or Data Engineers for a definition of Data Quality will get you very different answers, even within the same company and among similar profiles. Some, for example, will focus on the uniqueness of the data, while others will prefer to reference standardization. You may well have your own interpretation.

The ISO 9000:2015 standard defines quality as “the degree to which a set of inherent characteristics of an object fulfils requirements”.

DAMA International (The Global Data Management Community) – a leading international association involving both business and technical data management professionals – adapts this definition to a data context: “Data Quality is the degree to which the data dimensions meet requirements.”

The Dimensional Approach to Data Quality

From an operational perspective, Data Quality translates into what we call Data Quality dimensions, in which each dimension relates to a specific aspect of quality.

The four dimensions most often used are completeness, accuracy, validity, and availability. In the literature, there are many more dimensions and criteria for describing Data Quality; there is, however, no consensus on what these dimensions actually are.

For example, DAMA enumerates sixty dimensions, while most Data Quality Management (DQM) software vendors usually cover five or six.

The Nine Dimensions of Data Quality

At Zeenea, we believe that the ideal compromise is to take into account nine Data Quality dimensions: completeness, accuracy, validity, uniqueness, consistency, timeliness, traceability, clarity, and availability.

We will illustrate these nine dimensions and the different concepts we refer to in this publication with a straightforward example:

Arthur is in charge of sending marketing campaigns to clients and prospects to present his company’s latest offers. He encounters, however, certain difficulties:

  • Arthur sometimes sends communications to the same people several times.
  • The emails provided in his CRM are often invalid.
  • Prospects and clients do not always receive the right content.
  • Some information pertaining to the prospects is obsolete.
  • Some clients receive emails with erroneous gender qualifications.
  • There are two addresses for clients/prospects but it’s difficult to understand what they relate to.
  • He doesn’t know the origin of some of the data he is using or how he can access their source.

Below is the data Arthur has at hand for his sales efforts. We shall use it to illustrate each of the nine dimensions of Data Quality:

1. Completeness

Is the data complete? Is there information missing? The objective of this dimension is to identify the empty, null, or missing data. In this example, Arthur notices that there are missing email addresses:

To remedy this, he could try and identify whether other systems have the information needed. Arthur could also ask data specialists to manually insert the missing email addresses.
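To make the idea concrete, here is a minimal C# sketch of such a completeness check. The contact list and field names are invented for the example.

using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative contact list; in a real check the rows would come from the CRM.
var contacts = new List<(string Name, string Email)>
{
    ("Lisa Smith", "lisa.smith@example.com"),
    ("Lino Rodrigez", ""),          // missing email -> completeness issue
    ("Anna Lincoln", "   ")
};

// Completeness check: flag records whose email is null, empty, or whitespace.
var missingEmails = contacts.Where(c => string.IsNullOrWhiteSpace(c.Email)).ToList();

Console.WriteLine($"{contacts.Count - missingEmails.Count} of {contacts.Count} records have an email address.");
missingEmails.ForEach(c => Console.WriteLine($"Missing email for: {c.Name}"));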

2. Accuracy

Are the existing values coherent with the actual data, i.e., the data we find in the real world?

Arthur noticed that some letters sent to important clients are returned because of incorrect postal addresses. Below, we can see that one of the addresses doesn’t match the standard address formats in the real world:

It could be helpful here for Arthur to use postal address verification services.

3. Validity

Does the data conform to the syntax of its definition? The purpose of this dimension is to ensure that the data conforms to a defined model or rule.

Arthur noticed that he regularly gets bounced emails. Another problem is that certain prospects/clients do not receive the right content because they haven’t been accurately qualified. For example, the email address annalincoln@apple isn’t in the correct format and the Client Type Customer isn’t correct.

To solve this issue, he could for example make sure that the Client Type values are part of a list of reference values (Customer or Prospect) and that email addresses conform to a specific format.
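As a minimal sketch, such validity rules could look like the C# fragment below. The regular expression and the reference list are deliberately simplified examples, not a production-grade validator.

using System;
using System.Text.RegularExpressions;

// Illustrative validity rules: a simple email pattern and a closed list of Client Type values.
string[] allowedClientTypes = { "Customer", "Prospect" };

// Deliberately simple pattern: local part, '@', domain, dot, top-level domain.
bool IsValidEmail(string email) =>
    Regex.IsMatch(email ?? string.Empty, @"^[^@\s]+@[^@\s]+\.[^@\s]+$");

bool IsValidClientType(string clientType) =>
    Array.Exists(allowedClientTypes, t => t.Equals(clientType, StringComparison.OrdinalIgnoreCase));

Console.WriteLine(IsValidEmail("annalincoln@apple"));   // False: no top-level domain
Console.WriteLine(IsValidClientType("Prospect"));       // True: part of the reference list
Console.WriteLine(IsValidClientType("Custommer"));      // False: not a reference value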

4. Consistency

Are the different values of the same record in conformity with a given rule? The aim is to ensure the coherence of the data between several columns.

Arthur noticed that some of his male clients complain about receiving emails in which they are referred to as Miss. There does indeed appear to be an inconsistency between the Gender and Title columns for Lino Rodrigez.

To solve these types of problems, it is possible to create a logical rule that ensures that when the Gender is Male, the Title should be Mr.
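A minimal C# sketch of such a cross-column rule, using illustrative values, could look like this:

using System;

// Illustrative consistency rule between two columns of the same record:
// when Gender is "Male", the Title is expected to be "Mr".
bool IsGenderTitleConsistent(string gender, string title) => gender switch
{
    "Male"   => title == "Mr",
    "Female" => title == "Mrs" || title == "Miss" || title == "Ms",
    _        => true   // unknown gender: no rule to enforce
};

Console.WriteLine(IsGenderTitleConsistent("Male", "Miss")); // False -> Lino Rodrigez's case
Console.WriteLine(IsGenderTitleConsistent("Male", "Mr"));   // True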

5. Timeliness

Is the time lapse between the creation of the data and its availability appropriate? The aim is to ensure the data is accessible in as short a time as possible.

Arthur noticed that certain information on prospects is not always up to date because the data is too old. As a company rule, data on a prospect that is older than 6 months cannot be used.

He could solve this problem by creating a rule that identifies and excludes data that is too old. An alternative would be to harness this same information in another system that contains fresher data.
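A minimal C# sketch of that exclusion rule, with invented records and the six-month cutoff mentioned above, could look like this:

using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative timeliness rule: prospect data older than 6 months is excluded.
var prospects = new List<(string Name, DateTime LastUpdated)>
{
    ("Lisa Smith",   DateTime.UtcNow.AddMonths(-2)),
    ("Anna Lincoln", DateTime.UtcNow.AddMonths(-9))   // too old under the company rule
};

DateTime cutoff = DateTime.UtcNow.AddMonths(-6);
var usable = prospects.Where(p => p.LastUpdated >= cutoff).ToList();

Console.WriteLine($"{usable.Count} of {prospects.Count} prospect records are fresh enough to use.");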

6. Uniqueness

Are there duplicate records? The aim is to ensure the data is not duplicated.

Arthur noticed he was sending the same communications several times to the same people. Lisa Smith, for instance, is duplicated in the folder:

In this simplified example, the duplicated data is identical. More advanced approaches – Jaro, Jaro-Winkler, or Levenshtein distance, for example – can detect near-duplicates more accurately, as sketched below.
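As an illustration, here is a small C# sketch of the classic Levenshtein edit distance; the names are invented, and a low distance between two records suggests a probable duplicate.

using System;

// Classic Levenshtein edit distance (dynamic programming), used here to flag
// near-duplicate names that exact matching would miss.
static int Levenshtein(string a, string b)
{
    var d = new int[a.Length + 1, b.Length + 1];
    for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
    for (int j = 0; j <= b.Length; j++) d[0, j] = j;

    for (int i = 1; i <= a.Length; i++)
        for (int j = 1; j <= b.Length; j++)
        {
            int cost = a[i - 1] == b[j - 1] ? 0 : 1;
            d[i, j] = Math.Min(Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1), d[i - 1, j - 1] + cost);
        }
    return d[a.Length, b.Length];
}

Console.WriteLine(Levenshtein("Lisa Smith", "Lisa Smith"));  // 0 -> exact duplicate
Console.WriteLine(Levenshtein("Lisa Smith", "Liza Smyth"));  // 2 -> likely the same person

In practice, DQM tools typically combine such a distance with thresholds and blocking keys to decide which records should be grouped or merged.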

7. Clarity

Is the metadata easy for the data consumer to understand? The aim here is to understand the significance of the data and avoid misinterpretation.

Arthur has doubts about the two addresses given as it is not easy to understand what they represent. The names Street Address 1 and Street Address 2 are subject to interpretation and should be modified, if possible.

Renaming fields within a database is often a complicated operation and should be properly documented with at least a description.

8. Traceability

Is it possible to trace the data’s lineage? The aim is to get back to the origin of the data, along with any transformations it may have gone through.

Arthur doesn’t really know where the data comes from or where he can access the data sources. It would have been quite useful for him to know this as it would have ensured the problem was fixed at the source. He would have needed to know that the data he is using with his marketing tool originates from the data of the company data warehouse, itself sourced from the CRM tool.

9. Availability

How can the data be consulted or retrieved by the user? The aim is to facilitate access to the data.

Arthur doesn’t know how to easily access the source data. Staying with the previous diagram, he wants to effortlessly access data from the data warehouse or the CRM tool.

In some cases, Arthur will need to make a formal request to access this information directly.

Get our Data Quality Management Guide for Data-Driven Organizations

For more information on Data Quality and DQM, download our free guide: “A Guide to Data Quality Management”.


About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.
Data Management

Zen Edge Database and Ado.net on Raspberry Pi

Actian Corporation

March 31, 2022


Do you have a data-centric Windows application you want to run at the Edge? If so, this article demonstrates an easy and affordable way to accomplish this by using the Zen Enterprise Database through ADO.NET on a Raspberry Pi. The Raspberry Pi features a 64-bit ARM processor, can accommodate several operating systems, and costs around $50 (USD).

These instructions use Windows 11 for ARM64 installed on a Raspberry Pi V4 with 8 GB RAM for this example. (You could consider using Windows 10 (or another ARM64-based board), but you would first need to ensure Microsoft supports your configuration.)

Here are the steps and results.

  • Use the Microsoft Windows 11 for ARM64 installer; Windows 11 on ARM provides built-in emulation for x86 and x64 applications.
  • After the installer finishes, the Windows 11 directory structure should look like the figure below:

  • The installer creates ARM, x86, and x64 directories for Windows emulation.
  • Next, run a .NET Framework application using the Zen ADO.NET provider on Windows 11 for ARM64 on the Raspberry Pi.

Once the framework has been established, create an ADO.NET application using Visual Studio 2019 on a Windows platform where Zen v14 is installed and running.

To build the simple application, use a C# Windows Forms application, as seen in the following diagram.

Name and configure the project and point it to a location on the local drive (next diagram).

Create a form, add two command buttons and two text boxes, name the buttons “Execute” and “Clear,” and add a DataGridView as follows.

Add Pervasive.Data.SqlClient.dll under the project’s references by selecting the provider from the C:\Program Files (x86)\Actian\Zen\bin\ADONET4.4 folder. Add a “using” clause in the program code as follows:

using Pervasive.Data.SqlClient;

Add the following code under the “Execute” button.
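The code itself appears as a screenshot in the original post. As a minimal sketch of what the handler behind “Execute” could look like, assuming the connection-string and SQL text boxes are named txtConnection and txtSql, the grid is named dataGridView1, and the file has using directives for System, System.Data, System.Windows.Forms, and Pervasive.Data.SqlClient:

// Illustrative sketch only; control names are assumptions, not the original code.
private void btnExecute_Click(object sender, EventArgs e)
{
    try
    {
        // Build the connection from the text box and run the SQL typed by the user.
        using (var connection = new PsqlConnection(txtConnection.Text))
        using (var adapter = new PsqlDataAdapter(txtSql.Text, connection))
        {
            var table = new DataTable();
            adapter.Fill(table);               // Fill opens and closes the connection as needed.
            dataGridView1.DataSource = table;  // Display the result set in the DataGridView.
        }
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message, "Query failed");
    }
}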

Add the following code under the “Clear” button.
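Again as a sketch (the original shows this as a screenshot), the “Clear” handler simply resets the form controls:

// Illustrative sketch only; control names are assumptions, not the original code.
private void btnClear_Click(object sender, EventArgs e)
{
    dataGridView1.DataSource = null;  // Drop the previously loaded result set.
    txtSql.Clear();                   // Clear the SQL statement text box.
    txtConnection.Clear();            // Clear the connection string text box.
}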

Then, add the connection information and SQL statement to the text boxes added in the previous steps as follows.


Now the project is ready to compile, as seen below.

Use “localhost” in the connection string to connect to the local system where the Zen engine is running. This example uses the “class” table in the Demodata sample database to select data.
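For illustration, the values typed into the two text boxes could look like the following; the exact connection-string keywords depend on the provider version, so check the Zen ADO.NET documentation:

// Illustrative values only; connection-string keywords can vary by provider version.
string connectionString = "Server=localhost;Database=Demodata";
string sql = "SELECT * FROM class";   // the "class" table from the Demodata sample database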

Selecting “Execute” will then return the data in the grid as follows.

Now the application is ready to be deployed on the Raspberry Pi. To do so, copy “SelectData.exe” from the C:\test\SelectData\SelectData\bin\Debug folder, along with the Zen ADO.NET provider “Pervasive.Data.SqlClient.dll”, to a folder on Windows 11 for ARM64 on the Raspberry Pi.

Next, register the Zen ADO.NET provider in the GAC using Gacutil as follows.

Gacutil /f /I <dir>\Pervasive.Data.SqlClient.dll


Run the SelectData app and connect to a remote server where the Zen engine is running, as a client-server application.

Change the server name or IP address in the connection string to your server where the Zen V14 or V15 engine is running.

The Windows application is now running in client-server mode using the Zen ADO.NET provider on a Raspberry Pi with Windows 11 for ARM64 installed.

And that’s it! Following these instructions, you can build and deploy a data-centric Windows 11 application on a Raspberry Pi ARM64. This or a similar application can run on a client or server to serve upstream or downstream data clients such as sensors or other devices that generate or require data from an edge database. Zen Enterprise uses standard SQL queries to create and manage data tables, and the same application and database will run on your Microsoft Windows-based (or Linux) laptops, desktops, or in the cloud. For a quick tutorial on the broad applicability of Zen, watch this video.


About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.
Data Intelligence

What is the Difference Between Data Governance and Data Management?

Actian Corporation

March 31, 2022

Difference between Data Governance and Data Management

In a world where companies aspire to become data-driven, data management and data governance are concepts that must be mastered at all costs. Although they are too often perceived as related or even interchangeable disciplines, the differences between them are important.

A company wanting to become data-driven must master the disciplines, concepts, and methodologies that govern the collection and use of data. Among those that are most often misunderstood are data governance and data management. 

Data governance consists, on the one hand, of defining the organizational structures around data – who owns it, who manages it, who exploits it, etc. – and, on the other hand, of the policies, rules, processes, and indicators that allow for a sound administration of data throughout its life cycle (from collection to deletion).

Data management can therefore be defined as the technical application of the recommendations and measures defined by data governance.

Data Governance vs. Data Management: Their Different Missions

The main difference between data governance and data management is that the former has a strategic dimension, while the latter is rather operational.

Without data governance, data management cannot be efficient, rational, or sustainable. Conversely, data governance that is not translated into appropriate data management will remain a theoretical document or a letter of intent that will not allow you to actively and effectively engage in data-driven decision-making.

To understand what is at stake, it is important to understand that all the disciplines related to data are permanently overlapping and interdependent. Data governance is a conductor that orchestrates the entire system. It is based on a certain number of questions such as:

  • What can we do with our data?
  • How do we ensure data quality?
  • Who is responsible for the processes, standards, and policies defined to exploit the data?

Data management is the pragmatic way to answer these questions and make the data strategy a reality. Data management and data governance can and should work in tandem. However, data governance is mainly concerned with the monitoring and processing of all the company’s data, while data management is mainly concerned with the storage and retrieval of certain types of information.

Who are the Actors of Data Governance and Management?

At the top management level, the CEO is naturally the main actor in data governance, as they are its legal guarantor. But they are not the only one who must get involved.

The CIO (Chief Information Officer) plays a key role in securing and ensuring the availability of the infrastructure. However, constant access to data is crucial for the business (marketing teams, field salespeople) but also for all the data teams who are in charge of the daily reality of data management.

It is then up to the Chief Data Officer (CDO) to create the bridge between these two entities and break down the data silos in order to build agile data governance. He or she facilitates access to data and ensures its quality in order to add value to it.

And while the Data Architect will be more involved in data governance, the Data Engineer will be more involved in data management. As for the Data Steward, he or she is at the confluence of the two disciplines.

How Combining the Two Roles Helps Companies Become Data-Driven

Despite their differences in scope and means, the concepts of data governance and data management should not be opposed. In order for a company to adopt a data-driven strategy, it is imperative to reconcile these two axes within a common action. To achieve this, an organization’s director/CEO must be the first sponsor of data governance and the first actor in data management.

It is by communicating internally with all the teams and by continuously developing the data culture among all employees that data governance serves the business challenges while preserving a relationship of trust that unites the company with its customers.


About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.
Data Intelligence

5 Product Values That Strengthen Team Cohesion & Experience

Actian Corporation

March 14, 2022


To remain competitive, organizations must make decisions quickly, as the slightest mistake can lead to a waste of precious time in the race for success. Defining the company’s reason for being, its direction, and its strategy makes it possible to build a solid foundation for creating alignment – subsequently facilitating decisions that impact product development. Aligning all stakeholders in product development is a real challenge for Product Managers. Yet it is an essential mission for building a successful product, and an obvious prerequisite to motivate teams who need to know why they get up each morning to go to work.

The Foundations of a Shared Product Vision Within the Company

Various frameworks (NorthStar, OKR, etc.) have been developed over the last few years to enable companies and their product teams to lay these foundations, disseminate them within the organization, and build a roadmap that creates cohesion. These frameworks generally define a few key artifacts and have already given rise to a large body of literature. Although versions may differ from one framework to another, the following concepts are generally found:

  • Vision: The dream, the true North of a team. The vision must be inspiring and create a common sense of purpose throughout the organization.
  • The Mission: It represents an organization’s primary objective and must be measurable and achievable.
  • The Objectives: These define measurable short and medium-term milestones to accomplish the mission.
  • The Roadmap: A source of shared truth – it describes the vision, direction, priorities, and progress of a product over time.

With a clear and shared definition of these concepts across the company, product teams have a solid foundation for identifying priority issues and effectively ordering product backlogs.

Product Values: The Key to Team Buy-in and Alignment Over Time

Although well-defined at the beginning, the concepts described above can nevertheless be forgotten after a while or become obsolete! Indeed, the company and the product evolve, teams change, and consequently the product can lose its direction… Product teams must therefore continuously revisit and re-communicate these foundations in order for the alignment to last.

Indeed, product development is both a sprint and a marathon. One of the main difficulties for product teams is to maintain this alignment over time. In this respect, another concept in these frameworks is often under-exploited when it is not completely forgotten by organizations: product values.

Jeff Weiner, Executive Chairman at LinkedIn, particularly emphasized the importance of defining company values through the Vision to Values framework. LinkedIn defines values as “The principles that guide the organization’s day-to-day decisions; a defining element of your culture”. For example, “be honest and constructive”, “demand excellence”, etc.

Defining product values in addition to corporate values can be a great way for product teams to create this alignment over time, and this is exactly what we do with the Actian Data Intelligence Platform.

From Corporate Vision to Product Values: A Focus on a Data Catalog

Organization & Product Consistency

We have a shared vision – “Be the first step of any data journey” – and a clear mission – “To help data teams accelerate their initiatives by creating a smart & reliable data asset landscape at the enterprise level”.

We position ourselves as a data catalog pure-player and we share the responsibility of a single product between several Product Managers. This is why we have organized ourselves into feature teams. This way, each development team can take charge of any new feature or evolution according to the company’s priorities, and carry it out from start to finish.

Even though we prioritize the backlog and delivery by defining and adapting our strategy and organization according to these objectives, three problems remain:

  • How do we ensure that the product remains consistent over time when there are multiple pilots onboard the plane?
  • How do we favor one approach over another?
  • How do we ensure that a new feature is consistent with the rest of the application?

Indeed, each product manager has his or her own sensibility and background. And even when the problems are clearly identified, there are usually several ways to solve them. This is where product values come into play…

Actian Data Intelligence Platform’s Product Values

If the vision and the mission help us to answer the “why?”, the product values allow us to remain aligned with the “how?”. It is a precious tool that challenges the different possible approaches to meet customer needs. And each Product Manager can refer to these common values to make decisions, prioritize a feature or reject it, and ensure a unified & unique user experience across the product.

Thus, each new feature is built with the following 5 product values as guides:

Simplicity

This value is at the heart of our convictions. The objective of a Data Catalog is to democratize data access. To achieve this, facilitating catalog adoption for end users is key. Simplicity is clearly reflected in the way each functionality is proposed. Many applications end up looking like Christmas trees with colored buttons all over the place that no one knows how to use; others require weeks of training before the first button is clicked. The use of the Data Catalog should not be reserved to experts and should therefore be obvious and fluid regardless of the user’s objective. This value was reflected in our decision to create two interfaces for our Data Catalog: one dedicated to search and exploration, and the other for the management and monitoring of the catalog’s documentation.

Empowering

Documentation tasks are often time-consuming and it can be difficult to motivate knowledgeable people to share and formalize their knowledge. In the same way, the product must encourage data consumers to be autonomous in their use of data. This is why we have chosen not to offer rigid validation workflows, but rather a system of accountability. This allows Data Stewards to be aware of the impacts of their modifications. Coupled with an alerting and auditing system after the fact, it ensures better autonomy while maintaining traceability in the event of a problem.

Reassuring

It is essential to allow end-users to trust in the data they consume. The product must therefore reassure the user by the way it presents its information. Similarly, Data Stewards who maintain a large amount of data need to be reassured about the operations for which they are responsible: have I processed everything correctly? How can I be sure that there are no inconsistencies in the documentation? What will really happen if I click this button? What if it crashes? The product must create an environment where the user feels confident using the tool and its content. This value translates into preventive messages rather than error reports, the type of language used, idempotency of import operations, etc.

Flexibility

Each client has their own business context, history, governance rules, needs, etc. The data catalog must be able to adapt to any context to facilitate its adoption. Flexibility is an essential value to enable the catalog to adapt to all current technological contexts and to be a true repository of data at enterprise level. The product must therefore adapt to the user’s context and be as close as possible to their uses. Our flat and incremental modeling is based on this value, as opposed to the more rigid hierarchical models offered on the market.

Deep Tech

This value is also very important in our development decisions. Technology is at the heart of our product and must serve the other values (notably simplicity and flexibility). Documenting, maintaining, and exploiting the value of enterprise-wide data assets cannot be done without the help of intelligent technology (automation, AI, etc.). The choice to base our search engine on a knowledge graph or our positioning in terms of connectivity are illustrations of this “deep tech” value.

The Take Away

Creating alignment around a product is a long-term task. It requires Product Managers – in synergy with all stakeholders – to define from the very beginning the vision, the mission, and the objectives of the company. This enables product management teams to effectively prioritize the work of their teams. However, to ensure the coherence of a product over time, the definition and use of product values are essential. With the Actian Data Intelligence Platform, our product values are simplicity, empowerment, reassurance, flexibility, and deep tech. They are reflected in the way we design and enhance our Data Catalog and allow us to ensure a better customer experience over time.


About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.
Data Security

Hybrid Cloud Security

Actian Corporation

March 7, 2022


One of the biggest fears of cloud adoption is the security of organizational data and information. IT security has always been an issue for all organizations, but the thought of not having total control over corporate data is frightening. One of the factors for organizations not moving everything to the cloud and adopting a hybrid cloud approach is security concerns. Hybrid cloud security architectures still have security risks related to a public cloud; however, hybrid cloud risks are higher simply because there are more clouds to protect. The trust boundary is extended beyond the organization for access to its essential critical data with hybrid cloud architectures.

Sensitive data can be kept off the public cloud to help manage risk. Doing so may be helpful, but hybrid cloud solutions are integrations between public and private clouds, and such integration without appropriate security could still leave your private cloud vulnerable to attacks originating from the public cloud. Secure hybrid clouds bring significant benefits to organizations today. Along with these benefits come the challenges of securing the organization’s data, and these challenges are continually being addressed to help realize the considerable benefits that hybrid cloud architectures can provide.

What is Hybrid Cloud Security?

Organizational IT infrastructures have increased in complexity, especially with hybrid cloud implementations. This complexity, combined with cloud characteristics such as broad network access and on-demand access from anywhere, complicates how a hybrid cloud can be secured. The difficulty of securing data, applications, and infrastructure – internally and externally – against hackers’ malicious tactics as well as inadvertent, unintentional activities is compounded.

Many cloud vendors have adopted industry compliance and governance security standards, especially those created by the US government, to ease the security threats and risks that an organization may experience in the cloud. The Federal Risk and Authorization Management Program (FedRAMP) provides standards and accreditations for cloud services. The Security Requirements Guide (SRG) provides security controls and requirements for cloud services in the Department of Defense (DoD). These standards and others help cloud vendors and organizations improve their hybrid cloud security.

When securing the cloud, an organization should consider the cloud architecture components, which consist of applications, data, middleware, operating systems, virtualization, servers, storage, and networking components. Security concerns are specific to the service type. With hybrid cloud security, organizations share responsibility for security with the cloud service provider.

The responsibility for hybrid cloud security should include specific disciplines. Some essential discipline areas for managing risk and securing hybrid cloud are:

  • Physical controls to deter intruders and create protective barriers to IT assets are just as important as cybersecurity for protecting assets.
    • Security perimeters, cameras, locks, alarms.
    • Physical controls can be seen as the first line of defense for protecting organizational IT assets. Not only from security threats but from overall harm from environmental challenges.
    • Biometrics (one or more fingerprints, possibly retina-scans) where system access ties to extremely sensitive data.
  • Technical controls.
    • Cloud patching fixes vulnerabilities in software and applications that are targets of cyber-attacks. Besides overall keeping systems up to date, this helps reduce security risk for hybrid cloud environments.
    • Multi-tenancy security – each tenant or customer is logically separated in a cloud environment. Each tenant has access to the cloud environment, but the boundaries are purely virtual, and hackers can find ways to access data across virtual boundaries if resources are improperly assigned; data overflows from one tenant can also impinge on another. Data must be properly configured and isolated to avoid interference between tenants.
    • Encryption is needed for data at rest and data in transit. Data at rest is sitting in storage, and data in transit, going across the network and the cloud layers (SaaS, PaaS, IaaS). Both have to be protected. More often than not, data at rest isn’t encrypted because it’s an option that is not turned on by default.
    • Automation and orchestration are needed to replace slow manual responses in hybrid cloud environments. Monitoring, compliance checking, appropriate responses, and implementations should be automated to eliminate human error. These responses should also be reviewed and continuously improved.
    • Access controls – People and technology accesses should always be evaluated and monitored on a contextual basis including date, time, location, network access points, and so forth. Define normal access patterns and monitor for abnormal patterns and behavior, which could be an alert to a possible security issue.
    • Endpoint security for remote access has to be managed and controlled. Devices can be lost, stolen, or hacked, providing an access point into a hybrid cloud and all of its data and resources. Local ports on devices that allow printing or USB drives would need to be locked for remote workers or monitored and logged when used.
  • Administrative controls to account for human factors in cloud security.
    • Zero trust architecture (ZTA) principles and policies continually evaluate trusted access to cloud environments to restrict access to only the minimum privileges needed. Allowing too much access to a person or technology solution can cause security issues. Adjustments to entitlements can be made in real time – for example, is a user suddenly downloading far more documents? Are those documents outside his or her normal scope of work or access? Of course, this requires data governance that includes tagging and role-based access that maps entitlements to tagging.
    • Disaster recovery – performing business impact analysis (BIA) and risk assessments is crucial for planning disaster recovery and deciding how hybrid cloud architectures should be implemented. This includes concerns related to data redundancy and placement within a cloud architecture for service availability and rapid remediation after an attack.
    • Social engineering education and technical controls for phishing, baiting, etc. Social engineering is an organizational issue and a personal issue for everyone.  Hackers can steal corporate data and personal data to access anything for malicious purposes.
    • A culture of security is critical for organizations. The activities of individuals are considered one of the most significant risks to the organization. Hackers target access to any organization through the organization’s employees as well as partners and even third-party software vendors and services contractors. Employees, contractors, and partners need to be educated continuously to help avoid security issues that can be prevented with training and knowledge.
  • Supply chain controls.
    • Software, infrastructure, and platforms from third parties have to be evaluated for security vulnerabilities. Software from a third-party supplier, when installed, could have security vulnerabilities or could have been compromised in ways that allow criminals complete access to an organization’s hybrid cloud environment. Be sure to check how all third-party software vendors approach and practice safe security controls over their products.

Security in the cloud is a shared responsibility that becomes more complex as deployments are added. Shared Services are a way to deliver functions such as security, monitoring, authorization, backups, patching, upgrades, and more in a cost-effective, reliable way to all clouds. Shared services reduce management complexity and are essential to achieve a consistent security posture across your hybrid cloud security architecture.

Configuration Management and Hybrid Cloud Security

Hybrid cloud security architecture risks are higher simply because there are more clouds to protect. For this reason, here are a few extra items that you should put on your hybrid cloud security best practices list, including visibility, shared services, and configuration management. First, you can’t secure what you can’t see. Hybrid cloud security requires visibility across the data center and private and public cloud borders to reduce hybrid cloud risks resulting from blind spots.

Another area to focus on is configuration management since misconfigurations are one of the most common ways for digital criminals to land and expand in your hybrid cloud environments. Encryption isn’t turned on, and access hasn’t been restricted; security groups aren’t set up correctly, ports aren’t locked down. The list goes on and on. Increasingly, hybrid cloud security teams need to understand cloud infrastructure better to secure it better and will need to include cloud configuration auditing as part of their delivery processes.

One of the hybrid cloud security tools that can be utilized is a Configuration Management System (CMS) built on configuration management database (CMDB) technology, which can help organizations gain visibility into hybrid cloud configurations and the relationships between all cloud components. The first activity with a CMS involves discovering all cloud assets or configuration items that make up the services being offered. At this point, a snapshot of the environment is made with essential details of the cloud architecture. Once they have discovered their hybrid cloud architecture, many organizations immediately look for security concerns that violate security governance.

Once the CMS is in place, other hybrid cloud security tools such as drift management and monitoring of changes in the cloud architecture can alert teams to cloud attacks. Once unauthorized drift is detected, other automation tools can be implemented to correct the issue, raise alerts, and counter the attack. The CMS and the CMDB support cloud security operations and other service management areas, such as incident, event, and problem management, to help provide a holistic solution for the organization’s service delivery and service support.

Conclusion

Security issues in hybrid cloud computing aren’t that different from security issues in cloud computing. You can review the articles on Security, Governance, and Privacy for the Modern Data Warehouse, Part 1 and Part 2, that provide a lot of pointers on how to protect your data and cloud services.

Hybrid cloud security risks and issues will be one of those IT organizational business challenges that will be around for a long time. Organizations need to stay informed and have the latest technologies and guidance for combating the hybrid cloud security issues and threats. This includes partnering with hybrid cloud solution providers such as Actian. It is essential for the organization’s ability to function with consistently changing cloud security needs.


About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.
Data Intelligence

Interview With Ruben Marco Ganzaroli – CDO at Autostrade per l’Italia

Actian Corporation

March 3, 2022

We are pleased to have been selected by Autostrade per l’Italia – a European leader among concessionaires for the construction and management of toll highways – to deploy the Actian Data Intelligence Platform’s data catalog at the group level. We took this opportunity to ask a few questions of Ruben Marco Ganzaroli, who joined the company in 2021 as Chief Data Officer to support the extensive Next to Digital program to digitally transform the company. A program with a data catalog as its starting point.

Q: CDOs are becoming critical to a C-level team. How important is data to the strategic direction of Autostrade per l’Italia?

Data is at the center of the huge Digital Transformation program started in 2021, called “Next to Digital”, which aims at transforming Autostrade per l’Italia into a Sustainable Mobility Leader. We wanted to protect whoever is traveling on our highways, execute decisions faster, and be agile and fluid. We not only want to react immediately to what is happening around us, but also to be able to anticipate events and take action before they occur. The company was founded in the early 1950s, and we realized that all the data we have collected throughout the years could be a unique advantage and a strong lever to transform the company.

Q: What are the main challenges you want to address by implementing a data catalog in your organization?

We think that only the business functions of the Autostrade group can truly transform the company into a data-driven one. To do this, business functions need to be supported by the right tools – efficient and usable – and they must be fully aware of the data they have available. Ideas, and therefore value, are generated only if you have a clear idea of the environment in which you are moving and the objective you are aiming for. If, without knowing it, you have a gold bar under your mattress, you will sleep uncomfortably and realize that you could do something to improve your situation – probably by changing mattresses, for example. However, if you are aware that you have that gold bar, you will lift the mattress, take the bar, and turn it into a jewel – maximizing its value.

The data catalog builds the bridge between business and data at Autostrade. It is the tool that allows business users to have knowledge on the fact that there are many gold bars available and to know where they can be found.

Q: What features were you looking for in a data catalog and that you found in the platform?

From a business perspective, a data catalog is the access point to all data. It must be fast, complete, easy to understand and user friendly, and represent a lever (not an obstacle). Business users must not be forced to spend the majority of their time on it. Whereas from an IT perspective, a data catalog must be agile, scalable, as well as quickly and continuously upgradeable as data is continuously being ingested or created.

Q: What is your vision of a data catalog in the data management solutions’ ecosystem?

We don’t think of the catalog as a tool, but as a part of the environment we, as IT, need to make available to the business functions. This ecosystem naturally includes tools, but what’s also important is the mindset of its users. To lead this mindset change, business functions must be able to work with data, and that’s the reason Self-BI is our main goal for 2022 as the CDO Office. As mentioned previously, the catalog is the starting point for all of that. It is the door that lets the business into the data room.

Q: How will you drive catalog adoption among your data teams?

All the leaders from our team – Leonardo B. for the Data Product, Fulvio C. for Data Science, Marco A. and Andrea Q. for Data Engineering, and Cristina M. as Scrum (super)Master – are focused on managing the program. This program foresees an initial training phase for business users, dedicated on-the-job support, and in-room support. Business users will participate in the delivery of their own analyses. We will onboard business functions incrementally, to focus the effort and maximize the effectiveness of each business function. The goal is to onboard all business functions within 2022: it represents a lot of work, but it is made easier by knowing that there is a whole company behind us that supports us and strongly believes that we are going in the right direction.


About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.
Data Architecture

Enterprise Data Warehouse

Teresa Wingfield

March 2, 2022

Enterprise Data Warehouse

Do you need a data warehouse or Enterprise Data Warehouse (EDW) in your organization today? An organization’s functional units, people, services, and other assets support its vision and mission. An organization can have strategies, tactics, and operational activities that are performed to meet its overall vision and mission in serving its customers. Being high-performing today requires harnessing the power of data, supported by innovations in information technology. People and advanced technologies are making decisions every day to support the organization’s success. Data drives decisions.

Today, one of the essential tools is an Enterprise Data Warehouse to support effective decisions using a single source of truth across the organization. The organization has to work as a high-performing team, exchanging data, information, and knowledge for decisions, and the EDW plays a central role. As organizations move their IT to the Cloud, the EDW is also transforming and moving there as well, further improving the organization’s business decision-making.

What is an Enterprise Data Warehouse?

An Enterprise Data Warehouse is a central repository of data that supports a value chain of business practice interactions between all functional units within the organization. In an EDW, data is collected from multiple sources and normalized to support insightful, data-driven analytical decisions across the organization for the services and products it delivers and supports for its customers. Data is transformed into information, information into knowledge, and knowledge into decisions for analytics and overall Business Intelligence (BI). Enterprise Data Warehouse reporting capabilities take advantage of the EDW to provide the organization with needed business and customer insights.

Enterprise Data Warehouse architecture makes use of an Extract, Transform, and Load (ETL) process to ingest, consolidate, and normalize data for organizational use. The data is modeled based on decisions that need to be made by the stakeholders and put in a consistent format for usage and consumption with integrated technologies and applications. The organization’s sales, marketing, and other teams can use the EDW for an end-to-end perspective of the organization and the customers that are being serviced. Enablement of the EDW utilizing the cloud is a plus because of the power of cloud technologies today in making data accessible anywhere and anytime.
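As a deliberately tiny, in-memory illustration of the Extract, Transform, and Load pattern described above (the sources, field names, and mappings are invented; a real pipeline would read from source systems and write to warehouse tables):

using System;
using System.Collections.Generic;
using System.Linq;

// Extract: two invented source extracts with inconsistent formatting.
var crmCustomers  = new[] { ("Acme Inc.", "FR"), ("Globex", "fr") };      // source A
var shopCustomers = new[] { ("ACME", "France"), ("Initech", "US") };      // source B

// Transform: normalize names and country labels into one consistent shape.
(string Name, string Country) Normalize((string name, string country) row) =>
    (row.name.ToUpperInvariant().Replace(" INC.", ""),
     row.country.Trim().ToUpperInvariant() == "FRANCE" ? "FR" : row.country.Trim().ToUpperInvariant());

// Load: consolidate both sources into the (here, in-memory) warehouse table, dropping duplicates.
var warehouseCustomers = crmCustomers.Concat(shopCustomers)
    .Select(Normalize)
    .Distinct()
    .ToList();

warehouseCustomers.ForEach(c => Console.WriteLine($"{c.Name} | {c.Country}"));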

The basic Enterprise data warehouse requirements that make up the EDW system include but are not limited to the following components:

  • Data sources – databases, including a transactional database, and other files with various formats
  • Data transformation engine – ETL tools (typically an external third-party tool)
  • The EDW database repository itself
  • The database administration for creation, management, and deletion of data tables, views, and procedures
  • End-user tools to access data or perform analytics and business intelligence

Leveraging data from multiple data silos into one unified data repository that contains all business data is powerful. An EDW is a database platform of multidimensional business data that different parts of the organization can use. The EDW has current and historical information that can be easily modified, including the model to support changes in business needs. EDWs can support additional sources of data quickly without redesigning the system. As the organization learns how to use the data and gives feedback, the solution can transform rapidly to support the organization and the data stakeholders. As the organization matures, so can the data in the EDW rapidly mature.

Enterprise Data Warehouse vs. Data Mart

An Enterprise Data Warehouse becomes a single source of truth for organizational decisions that require collaboration between multiple functional areas in the organization. The EDW can be implemented as a one-tier architecture, with all functional units accessing the data in the warehouse. It can also be implemented with the addition of Data Marts (DM). The difference between a DM and an EDW is that the data mart is much smaller and more focused than an enterprise data warehouse. Enterprise Data Warehouse services also serve the entire organization, whereas a Data Mart is usually for a single line of business within the organization.

Data Marts contain domain-specific or unique functional data, such as only sales data or marketing data. A data mart can extend the usage of the EDW in a two-tier architecture, leveraging on-premises and/or cloud capabilities and using the EDW as a source of data for specific use cases. Data marts typically involve integration from a limited number of data sources and focus on a single line of business or functional unit. The size of a data mart is measured in gigabytes, versus terabytes for an EDW. Data Marts do not have to use an EDW as a data source; they can use other sources specific to their needs.
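
To make the EDW-to-data-mart relationship concrete, the sketch below carves a sales-focused data mart out of the warehouse table from the earlier illustrative ETL sketch, simply by exposing a narrow view over the EDW. The table, column, and view names are hypothetical and carried over from that example.

```python
import sqlite3

conn = sqlite3.connect("edw.db")  # the same stand-in warehouse as the ETL sketch

# A data mart for a single line of business: only the columns and rows
# the sales team needs, exposed as a view over the EDW table.
conn.execute(
    """CREATE VIEW IF NOT EXISTS mart_sales AS
       SELECT customer_id, order_date, amount
       FROM fact_orders
       WHERE source = 'crm'"""
)

# End users and BI tools query the narrow mart rather than the full EDW.
for row in conn.execute(
    "SELECT customer_id, SUM(amount) FROM mart_sales GROUP BY customer_id"
):
    print(row)
```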

Organizations may want to use a data mart or multiple data marts to help increase the security of the EDW by limiting access to only domain-specific data through the data mart if using a two-tier architecture. An organization may also use the data mart to reduce the complexity of managing access to EDW data for a single line of business.

Choosing between an EDW and a data mart does not have to be one or the other. Both are valuable. Remember, the outcome is to provide data for high-performing decision support within the organization. The EDW brings the broader organizational perspective for delivering and supporting business services. Data marts can complement the EDW to optimize performance and data delivery. Overall, enterprise-wide performance for decisions, reporting, analytics, and business intelligence is best achieved with a solution that spans the organization. A complete end-to-end view of customers, products, tactics, and operations that support the organizational vision and mission will benefit everyone in the organization, including the customers.

Data Marts are easier and quicker to deploy than an EDW and cost less. A line of business can derive value quickly with a solution that can be deployed faster, with a limited scope, fewer stakeholders, and less modeling and integration complexity than an EDW. The data mart will be designed specifically for that line of business to support its ability to work in a coordinated, collaborative way within its function. This can help create a competitive advantage by enabling better data analytics for decision support within a specific line of business or functional unit.

Enterprise Data Warehouse and the Cloud

Cloud Enterprise Data Warehouse (EDW) takes advantage of the value of the cloud in the same manner as many other cloud services that are becoming the norm for many organizations. The EDW itself may be better suited to reside in the cloud instead of on-premise. The cloud provides:

  • The flexibility to build out and modify services in an agile manner.
  • The potential to scale almost infinitely.
  • The assurance of enhanced business continuity.
  • The ability to avoid capital expenditures (CapEx).

Organizations can still choose to architect hybrid-cloud solutions for the EDW that take advantage of on-premises organizational capabilities and vendor cloud capabilities. The EDW should be planned with expertise focused on organizational constraints and business objectives to arrive at the best long-term solution, one that stays easy to use and can be continuously improved. That expertise includes deciding how Data Marts fit into the solution for maximum benefit to the organization.

Conclusion

EDW architecture can be challenging because it brings the organization’s data into one database, especially if attempted all at once. Organizations should design for the big picture and deploy incrementally, starting with specific business challenges or specific lines of business. This creates patterns of success that improve each subsequent increment. It also helps deliver a solution that benefits the organization before the complete system is finished.

In many instances, an organization can’t simply rely on silos of line-of-business data marts. They need enterprise data warehouse reporting to get a complete view of customers, products, operations, and more to make decisions that best benefit the whole company. Yes, enterprise data warehouse architectures can be painful. In most instances, you can deploy incrementally, starting with specific domains or business challenges. This will help you deliver value faster and evolve into the holistic purpose your EDW is intended to serve.

The power of having a cross-organizational repository of meaningful data to enable better decision-making and better overall service delivery and support for the customer outweighs the challenges of the architecture. An organization that does this successfully will gain improved marketability, sales, and better overall relationships with its customers. The business data insights will also enable the organization to position its internal assets more appropriately based on the improvements in data insights, analytics, and business intelligence.

Managing and utilizing data effectively, efficiently, and economically is essential to getting value from it. Data is the lifeblood that supports the long-term viability of the organization itself. An organization that is not well informed and does not treat data as critical to business service performance and decisions may find itself becoming optional in the marketplace. An EDW can help with the organization’s current and future business decision needs.

About Teresa Wingfield

Teresa Wingfield is Director of Product Marketing at Actian, driving awareness of the Actian Data Platform's integration, management, and analytics capabilities. She brings 20+ years in analytics, security, and cloud solutions marketing at industry leaders such as Cisco, McAfee, and VMware. Teresa focuses on helping customers achieve new levels of innovation and revenue with data. On the Actian blog, Teresa highlights the value of analytics-driven solutions in multiple verticals. Check her posts for real-world transformation stories.
Data Management

5 Use Cases for Hybrid Cloud Data Management

Actian Corporation

February 25, 2022

Hybrid Cloud Data Management

With the rise of cloud computing, many organizations are opting to use a hybrid approach to their data management. Even though many companies still rely on on-premises storage, the benefits of having cloud storage as a backup or disaster recovery plan can be significant. This post will give you five of the most popular use cases for hybrid cloud data management.

Why Hybrid Cloud Data Management?

Hybrid cloud data management isn’t a new concept, but it’s finally starting to hit its stride as a viable option for enterprise data management. It uses a mixture of on-premises storage, cloud storage, and cloud computing to handle all aspects of a company’s data needs. Often, it’s the merger of on-premises databases or enterprise data warehouses (EDW) with cloud storage, SaaS application data, and/or a cloud data warehouse (CDW). The benefits of this hybrid approach are twofold: it provides a backup plan for disaster recovery situations, and it gives an organization the ability to scale up as needed without purchasing additional hardware.

Backup and Disaster Recovery

One of the most obvious benefits of hybrid cloud data management is that it provides a backup for your data. If your on-premises storage system fails or you lose some important data, you can rely on your cloud storage to get it back. It will act as an additional fail-safe plan in case anything happens to your on-site server.
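
As a minimal sketch of this use case, the snippet below copies an on-premises database backup file to cloud object storage, assuming an S3-compatible bucket and the boto3 library are available; the file, bucket, and key names are hypothetical.

```python
import boto3  # assumes boto3 and S3-compatible cloud object storage are available

def backup_to_cloud(local_path, bucket, key):
    """Copy an on-premises backup file to cloud object storage for disaster recovery."""
    s3 = boto3.client("s3")
    s3.upload_file(local_path, bucket, key)

# Hypothetical names: a nightly backup of the on-premises warehouse file.
backup_to_cloud("edw_backup_2022-02-25.db", "example-dr-bucket",
                "backups/edw_backup_2022-02-25.db")
```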

Data Accessibility

Data is not just one homogeneous entity. Many companies feel hampered by data access. They may not have the in-house expertise or budget to handle the IT demands of data storage and real-time access. Through a hybrid cloud environment, your business can access data and applications stored in both on-premises and off-site locations. Global companies can store data closer to applications or users to improve processing time and reduce latency without needing local data centers or infrastructure.

Data Analytics

Currently, many businesses are combining internal data sources with external data sources from partners or public sources for improved data analytics. A hybrid data warehouse can allow data teams to combine this third-party data with internal data sources to gain greater insights for decision-making. Data engineers can reduce the amount of effort required to source and combine data needed for users to explore new analytical models.
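
As a simple illustration of combining internal and third-party data, the pandas sketch below joins internal sales figures with external market data on a shared product key to estimate market share. The column names and values are made up for the example.

```python
import pandas as pd

# Internal data source (e.g., pulled from the on-premises EDW) -- illustrative values.
internal_sales = pd.DataFrame({
    "product_id": [101, 102, 103],
    "units_sold": [250, 90, 430],
})

# External third-party data source (e.g., partner or public data in the cloud).
market_data = pd.DataFrame({
    "product_id": [101, 102, 103],
    "market_size": [12000, 3000, 20000],
})

# Combine the two sources on a shared key to enable richer analytics,
# such as estimating market share per product.
combined = internal_sales.merge(market_data, on="product_id")
combined["market_share"] = combined["units_sold"] / combined["market_size"]
print(combined)
```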

Data Migration

When an organization migrates its storage to the cloud, it can take advantage of public, private, and hybrid cloud solutions. This means utilizing a host of services, including backup storage, disaster recovery solutions, analytics, and more, all while spending less on infrastructure and avoiding large capital expenses.

Data Compliance

The adoption of a hybrid data warehouse can relieve some of the compliance burdens that can often accompany stored data. For example, retired systems may leave behind orphaned databases, often with useful, historic data. This can create a data gap for analytic teams, but it can also pose a security and compliance risk for the business. Cloud service providers have teams of experts that work with governments and regulators globally to develop standards for things such as data retention times and security measures. Additionally, leveraging the cloud for data storage can also help address the challenges of data residency and data sovereignty regulations, which can become complex as data moves across geographical boundaries.

Regardless of where you are on your cloud journey, data is the most valuable asset to any organization. The cloud is an increasingly important component as businesses look for ways to leverage their data assets to maintain competitive advantage. Learn more about how the Actian Data Platform is helping organizations unlock more value from their data.

About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.
Data Leader

Does Your Organization Have a Data Platform Leader? It Could Soon.

Teresa Wingfield

February 17, 2022

data platform leader

There’s no one-size-fits-all solution for a modern data platform, and there likely never will be with the proliferation of multiple public and private cloud environments, entrenched on-premises data centers, and the exponential rise in edge computing – data sources are multiplying almost at the rate of data itself.

Today’s data platforms increasingly take a broad multi-platform approach that incorporates a wide range of data services (e.g., data warehouse, data lake, transactional database, IoT database, and third-party data services) and integration services that support all major clouds and on-premises platforms, along with the applications that run on and across these environments. Modern data platforms need a data fabric – technology that enables data distributed across different areas to be accessed in real time through a unifying data layer – to drive data flow orchestration, data enrichment, and automation. To meet the varied requirements of users across an organization, including data engineers, data scientists, business analysts, and business users, the platform should also incorporate shared management and security services, as well as support a wide range of application development and analytical tools.

However, these needs create a singular challenge: who’s going to manage the creation and maintenance of such a platform? That’s where the role of the platform leader comes in. Just as we’ve seen the creation of roles like Chief Data Officer and Chief Diversity Officer in response to critical needs, organizations require a highly skilled individual to manage the creation and maintenance of their platform(s). Enter the data platform leader – someone with a broad understanding of databases and streaming technologies, as well as a practical understanding of how to facilitate frictionless access to these data sources, how to formulate a new purpose, vision and mission for the platform and how to form close partnerships with analytics translators. We’ll get to those folks in a minute.

Developing a New Purpose, Vision and Mission

Why must a data platform leader develop a new purpose, vision and mission? Consider this: data warehouse users have traditionally been data engineers, data scientists and business analysts who are interested in complex analytics. These users typically represent a relatively small percentage of an organization’s employees. The power and accessibility of a data platform capable of running not just in the data center, but also in the cloud or at the edge, will invariably bring in a broader base of business users who will use the platform to run simpler queries and analytics to make operational decisions.

However, accompanying these users will be new sets of business and operational requirements. To satisfy this ever-expanding user base and their different requirements, the data platform leader will need to formulate a new purpose for the platform (why it exists), a new vision for the platform (what it hopes to deliver) and a new mission (how will it achieve the vision).

Facilitating Data Service Convergence

Knowledge of relational databases with analytics-optimized schemas and/or analytic databases has long been part of a data warehouse manager’s wheelhouse. However, the modern data platform extends access much further, enabling access to data lakes and transactional and IoT databases, and even streaming data. Increasing demand for real-time insights and non-relational data that can enable decision intelligence are bringing these formerly distinct worlds closer together. This requires the platform leader to have a broad understanding of databases and streaming technologies as well as a practical understanding of how to facilitate frictionless access to these data sources.

Enabling Frictionless Data Access

A data warehouse typically includes a semantic layer that represents data so end users can access that data using common business terms. A modern data platform, though, demands more. While a semantic layer is valuable, data platform leaders will need to enable more dynamic data integration than is typically sufficient to support a centralized data warehouse design. Enter the data fabric to provide a service layer that enables real-time access to data sourced from the full range of the data platform’s various services. The data fabric offers frictionless access to data from any source located on-premises and in the cloud to support the wide range of analytic and operational use cases that such a platform is intended to serve.
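
To illustrate the idea of a semantic layer in the simplest possible terms, the toy sketch below maps business-friendly terms onto physical warehouse columns and builds a query from them. A real semantic layer or data fabric is far richer; the mapping, table, and column names here are purely hypothetical.

```python
# Toy semantic layer: business terms mapped to physical warehouse columns (hypothetical names).
SEMANTIC_LAYER = {
    "Customer":     "fact_orders.customer_id",
    "Order Date":   "fact_orders.order_date",
    "Order Amount": "fact_orders.amount",
}

def build_query(business_terms, table="fact_orders"):
    """Translate business terms into a SQL statement against the physical schema."""
    columns = [SEMANTIC_LAYER[term] for term in business_terms]
    return f"SELECT {', '.join(columns)} FROM {table}"

# A business user asks in business language; the layer resolves the physical details.
print(build_query(["Customer", "Order Amount"]))
```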

Working With Analytics Translators

I mentioned earlier that data platform leaders would need the ability to form close partnerships with analytics translators. Let’s start with what an analytics translator does and then we’ll get to why a close relationship is important.

According to McKinsey & Company, the analytics translator serves the following purpose:

“At the outset of an analytics initiative, translators draw on their domain knowledge to help business leaders identify and prioritize their business problems, based on which will create the highest value when solved. These may be opportunities within a single line of business (e.g., improving product quality in manufacturing) or cross-organizational initiatives (e.g., reducing product delivery time).”

I expect the analytics translator and the data platform leader will become important partners. The analytics translator will be invaluable in establishing data platform priorities, and the platform leader will provide the analytics translator with key performance indicators (KPIs) on mutually-agreed-upon usage goals.

In conclusion, the data platform leader has many soft and hard skillset requirements in common with a data warehouse manager, but there are a few fundamental and significant differences. The key differences include developing a new purpose, vision, and mission; having expertise in new data services and data fabrics; knowing how best to access those services; and possessing the ability to form close partnerships with analytics translators.

About Teresa Wingfield

Teresa Wingfield is Director of Product Marketing at Actian, driving awareness of the Actian Data Platform's integration, management, and analytics capabilities. She brings 20+ years in analytics, security, and cloud solutions marketing at industry leaders such as Cisco, McAfee, and VMware. Teresa focuses on helping customers achieve new levels of innovation and revenue with data. On the Actian blog, Teresa highlights the value of analytics-driven solutions in multiple verticals. Check her posts for real-world transformation stories.
Data Intelligence

What Makes a Data Catalog “Smart”? #5 – User Experience

Actian Corporation

February 16, 2022

Smart Data Catalog #5 – User Experience

A data catalog harnesses enormous amounts of very diverse information, and its volume will grow exponentially. This raises two major challenges:

  • How to feed and maintain the volume of information without tripling (or more) the cost of metadata management?
  • How to find the most relevant datasets for any specific use case?

A data catalog should be Smart to answer these two questions, with smart technological and conceptual features that go beyond the mere integration of AI algorithms.

In this respect, we have identified 5 areas in which a data catalog can be “Smart” – most of which do not involve machine learning:

  1. Metamodeling
  2. The data inventory
  3. Metadata management
  4. The search engine
  5. User experience

A data catalog should also be smart in the experience it offers to its different pools of users. Indeed, one of the main challenges with the deployment of a data catalog is its level of adoption from those it is meant for: data consumers. And user experience plays a major role in this adoption.

User Experience Within the Data Catalog

The underlying purpose of user experience is the identification of personas whose behavior and objectives we are looking to model in order to provide them with a slick and efficient graphic interface. Pinning down personas in a data catalog is challenging – it is a universal tool that provides added value for any company regardless of its size, across all sectors of activity anywhere in the world.

Rather than attempting to model personas that are hard to define, it’s possible to handle the situation by focusing on the issue of data cataloging adoption. Here, there are two user populations that stand out:

  • Metadata producers who feed the catalog and monitor the quality of its content – this population is generally referred to as Data Stewards.
  • Metadata consumers who use the catalog to meet their business needs – we will call them Users.

These two groups are not totally unrelated to each other of course: some Data Stewards will also be Users.

The Challenges of Enterprise-Wide Catalog Adoption

The real value of a data catalog resides in large-scale adoption by a substantial pool of (meta) data consumers, not just the data management specialists.

The pool of data consumers is very diverse. It includes data experts (engineers, architects, data analysts, data scientists, etc.), business people (project managers, business unit managers, product managers, etc.), compliance and risk managers. And more generally, all operational managers are likely to leverage data to improve their performances.

Data Catalog adoption by Users is often slowed down for the following reasons:

  • Data catalog usage is sporadic. Users log on from time to time to obtain very specific answers to specific queries. They rarely have the time or patience to climb a learning curve on a tool they will only use periodically – weeks can go by between catalog sessions.
  • Not everyone has the same stance on metadata. Some will focus more on technical metadata, others will focus heavily on the semantic challenges, and others might be more interested in the organizational and governance aspects.
  • Not everybody will understand the metamodel or the internal organization of the information within the catalog. They can quickly feel put off by an avalanche of concepts that feel irrelevant to their day-to-day needs.

The Smart Data Catalog attempts to jump these hurdles in order to accelerate catalog adoption. Here is how the Actian Data Intelligence Platform meets these challenges.

How the Actian Data Intelligence Platform Facilitates Catalog Adoption

The first solution is the graphic interface. The Users’ learning curve needs to be as short as possible. Indeed, the User should be up and running without the need for any training. To make this possible, we made a number of choices.

The first choice was to provide two different interfaces, one for the Data Stewards and one for the Users:

Studio: The management and monitoring tool for the catalog content – an expert tool solely for the Data Stewards.

Explorer: For the Users, it provides them with the simplest search and exploration experience possible.

Our approach is aligned with the user-friendly principles of marketplace solutions – the recognized specialists in catalog management (in the general sense). These solutions usually have two applications on offer. The first, a “back office” solution, which enables the staff of the marketplace (or its partners) to feed the catalog in the most automated manner possible and control its content to ensure its quality. The second application, for the consumers, usually takes the form of an e-commerce website and enables end-users to find articles or explore the catalog. Studio and Explorer reflect these two roles.

The Information is Ranked in Accordance With the Role of the User Within the Organization

Our second choice is still at the experimental stage and consists of dynamically adapting the information hierarchy in the catalog according to User profiles.

This information hierarchy challenge is what differentiates a data catalog from a marketplace type catalog. Indeed, a data catalog’s information hierarchy depends on the operational role of the user. For some, the most relevant information in a dataset will be technical: location, security, formats, types, etc. Others will need to know the data semantics and their business lineage. Others still will want to know the processes and controls that drive data production – for compliance or operational considerations.

The Smart Data Catalog should be able to dynamically adjust the structure of the information to adapt to these different perspectives.
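
As a purely illustrative sketch of how an information hierarchy could be re-ranked per user profile (not a description of how the Actian Data Intelligence Platform implements it), the snippet below orders a dataset's metadata sections differently depending on the role of the person viewing it. The roles and section names are invented for the example.

```python
# Metadata sections available for a dataset (illustrative).
SECTIONS = ["location", "security", "formats", "semantics", "lineage", "processes", "controls"]

# Hypothetical per-role priorities: each role sees its most relevant sections first.
ROLE_PRIORITIES = {
    "engineer":   ["location", "formats", "security"],
    "analyst":    ["semantics", "lineage"],
    "compliance": ["processes", "controls", "security"],
}

def ordered_sections(role):
    """Return all sections, with the role's priority sections promoted to the front."""
    priority = ROLE_PRIORITIES.get(role, [])
    return priority + [s for s in SECTIONS if s not in priority]

print(ordered_sections("compliance"))  # processes, controls, and security come first
```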

The last remaining challenge is the manner in which the information is organized in the catalog in the form of exploration paths by theme (similar to shelving in a marketplace). It is difficult to find a structure that suits everybody. Some will explore the catalog along technical lines (systems, applications, technologies, etc.). Others will explore the catalog from a more functional perspective (business domains), and others still from a semantic angle (through business glossaries, etc.).

The challenge of having everyone agree on a sole universal classification seems (to us) insurmountable. The Smart Data Catalog should be adaptable and should not ask Users to understand a classification that makes no sense to them. Ultimately, user experience is one of the most important success factors for a data catalog.

For more information on how a Smart search engine enhances a Data Catalog, download our eBook: “What is a Smart Data Catalog?”.

About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.