What is Data Lineage?

Actian Corporation

September 13, 2021

To access and use your data assets on a regular basis, your organization needs to know as much as possible about your data: its origins, its transformations over time, and its overall life cycle. All of this knowledge can be gathered through Data Lineage.

In this article, we will define Data Lineage, give an analogy, and explain its main benefits for data-driven organizations. 

After human resources, data has become the most valuable asset for businesses today. 

It is the foundation that links companies, clients, and partners together. Because it contains all of an organization’s intelligence, data must be preserved and leveraged.

However, with great information comes great responsibility for those who manage or use this data. On one hand, they must identify the data that reveals strategic insights for the company, and on the other, they must implement the right security measures to prevent devastating financial and reputational consequences. 

With the arrival of data compliance regulations such as BCBS 239 or the GDPR, the person in charge of data compliance (usually the DPO) must put in place transparent conditions to ensure that no data will be exploited to the detriment of a customer.

This is where Data Lineage intervenes. Behind the word lineage lies an essential concept: data traceability. This traceability covers the entire life cycle of the data, from its collection to its use, storage, and preservation over time.

How Data Lineage Works

As mentioned above, the purpose of Data Lineage is to ensure the absolute traceability of your data assets. This traceability is not limited to knowing the source of information. It goes much further than that.

To understand the nature of lineage information, let’s use a little analogy:

Imagine that you are dining in a gourmet restaurant. The menu includes dishes with poetic names, composed of many more or less exotic ingredients, some of which are foreign to you. When the waiter brings you your plate, you taste, appreciate, and wonder about the origin of what you are eating.

Depending on your point of view, you will not expect the same answer.

As a fine cuisine enthusiast, you will want to know how the different ingredients were transformed and assembled to obtain the finished product. You will want to know the different steps of preparation, the cooking technique, the duration, the condiments used, the seasoning, etc. In short, you are interested in the most technical aspects of the final preparation: the recipe.

As a controller, you will focus more on the complete supply and processing chain: who the suppliers are, places and conditions of breeding or cultivation of raw products, transport, packaging, cutting and preparation, etc. You will also want to make sure that this supply chain complies with the various labels or appellations that the restaurant owner highlights (origin of ingredients, organic, “home-made”, AOC, AOP, etc.).

Others may focus on the historical and cultural dimensions: from what region or tradition is the dish derived or inspired? When and by whom was it originally created? Others (admittedly rarer) will wonder about the phylogenetic origin of the breed of veal prepared by the chef…

In short, when it comes to gastronomy, the question of origin does not have a single, uniform answer. And the same is true for data.

Indeed, With Data Lineage, You Will Have Access to a Real-Time Data Monitoring Tool

Once collected, the data is constantly monitored in order to:

  • Detect and monitor any errors in your data processing.
  • Manage and continuously monitor all process changes while minimizing the risks of data degradation.
  • Manage data migrations.
  • Have a 360° view on metadata.

Data Lineage ensures that your data comes from a reliable and controlled source, that the transformations it has undergone are known, monitored, and legitimate, and that it is available in the right place, at the right time and for the right user.

Acting as a control tool, the main mission of Data Lineage is to validate the accuracy and consistency of your data.

How do you do this? By allowing your employees to conduct research on the entire life cycle of the data, both upstream and downstream, from the source of the data to its final destination, in order to detect and isolate any anomalies and correct them.
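To make the upstream and downstream idea concrete, here is a minimal sketch in Python. It assumes lineage is stored as a simple directed graph in which an edge from A to B means "B is derived from A". The class name, method names, and dataset names are all hypothetical and do not describe any particular product.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Toy lineage store: an edge source -> target means target is derived from source."""

    def __init__(self):
        self._children = defaultdict(set)  # asset -> assets derived from it
        self._parents = defaultdict(set)   # asset -> assets it is derived from

    def add_flow(self, source, target):
        self._children[source].add(target)
        self._parents[target].add(source)

    def _walk(self, start, neighbours):
        # Breadth-first traversal that collects every reachable asset.
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def upstream(self, asset):
        """Everything the asset depends on, back to its original sources."""
        return self._walk(asset, self._parents)

    def downstream(self, asset):
        """Everything that would be affected if the asset were wrong."""
        return self._walk(asset, self._children)


# Hypothetical data flows
graph = LineageGraph()
graph.add_flow("crm.customers", "staging.customers")
graph.add_flow("erp.orders", "staging.orders")
graph.add_flow("staging.customers", "warehouse.sales")
graph.add_flow("staging.orders", "warehouse.sales")
graph.add_flow("warehouse.sales", "dashboard.revenue")

print(sorted(graph.upstream("warehouse.sales")))
print(sorted(graph.downstream("staging.orders")))
```

If an anomaly shows up in `staging.orders`, the downstream call lists what needs to be rechecked, while the upstream call on `warehouse.sales` points to where the problem may have originated.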

The Main Advantages of Data Lineage

The first benefit of Data Lineage has to do with compliance. It helps identify and map all of the data production and exploitation processes and limits your exposure to the risk of non-compliance with personal data regulations.

Data Lineage also facilitates data governance because it provides your company and its employees with a complete repository describing your data flows and metadata. This knowledge is essential to design a 100% operational data architecture. 

Data Lineage makes it easier to automate the documentation of your data production flows. So, if you are planning to increase the importance of data in your development strategy, Data Lineage will allow you to save a considerable amount of time in the deployment of projects where data is key. 

Finally, the last major benefit of Data Lineage concerns your employees themselves. With data whose origin, quality and reliability are guaranteed by Data Lineage, they can fully rely on your data flows and base their daily actions on this indispensable asset. 

Save time, guarantee the compliance of your data, and make your teams’ work more fluid, all while moving your company toward an uncompromising data strategy. Don’t wait any longer: get started now.

About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale, streamlining complex data environments and accelerating the delivery of AI-ready data. The Actian data intelligence approach combines data discovery, metadata management, and federated governance to enable smarter data usage and enhance compliance. With intuitive self-service capabilities, business and technical users can find, understand, and trust data assets across cloud, hybrid, and on-premises environments. Actian delivers flexible data management solutions to 42 million users at Fortune 100 companies and other enterprises worldwide, while maintaining a 95% customer satisfaction score.

Data Integration Architecture – What is it and Why Does it Matter?

Actian Corporation

September 8, 2021

Data is everywhere. In any organization, you will find data in multiple applications, databases, data warehouses, and in many cases, public and private clouds. Usually, data does not belong to a specific group within an organization and is often shared across teams and applications.

Similar to a professional sports team, each function in the organization should have a specific role, sharing data and information in real-time to support better outcomes.

In order to efficiently share data, businesses need to focus on integrating their data in an automated and timely manner. This can be a challenge when functional units use multiple applications and store data in multiple locations. Organizations need a data integration architecture to connect secondary to primary data sources, normalize the information, and automate the flow of information without manual interventions.

What is Data Integration Architecture?

Data integration architecture defines the flow of data between IT assets and organizational processes to enable interoperability between systems. Today, data is everywhere, often stored in multiple formats in a complex non-integrated fashion. This means users spend far more time searching for data and information instead of using it to make better business decisions. When data is shared manually, obtaining knowledge for decision support becomes cumbersome and impacts customers and business performance.

Creating a data integration architecture allows integration of disparate data and provides normalization to enable faster decision support. The underpinning data and information used by functional units must be systemized and architected to enable collaborative decisions and faster innovation.

Creating a data integration architecture does not mean combining all data sources into one data source, such as a giant database or a data warehouse. It does mean understanding data relationships and enabling interoperability between the systems and tools used across the organization. Data integration architecture helps define the flow of data between internal and external people and technologies. This helps remove data silos and enables accurate data usage across the organization.

Data integration design consists of mapping primary systems and secondary systems. Secondary systems feed data and information to the primary systems. The primary system can vary across functional units, but the data needs to remain consistent across the organization. Each functional unit in an organization can have a specific primary perspective based on their job function and the decisions they have to make. Some secondary systems will always be secondary systems. The overall architecture has to consider the users of the systems and the data sources that need to be accessed. In other words, enterprises need a single source of truth.

The Need for Data Integration Architecture

Data integration needs architecture to map, reconcile, and deliver data across multiple sources, often with different expressions. The architecture should understand the source of the data and help reconcile and normalize the data for use. This helps enable better overall communication between functional units in the organization and improves service performance.
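As a rough illustration of "map, reconcile, and normalize", the following Python sketch takes records from two secondary systems that express the same facts with different field names and folds them into one consistent record. The system names, field names, and the choice of email as the matching key are assumptions made only for this example.

```python
# Hypothetical field mappings: each secondary system names the same facts differently.
FIELD_MAPS = {
    "marketing": {"Email": "email", "FullName": "name"},
    "support": {"contact_email": "email", "customer": "name"},
}


def normalize(source, record):
    """Rename a source record's fields to the shared vocabulary and tidy the values."""
    mapping = FIELD_MAPS[source]
    out = {}
    for field_name, value in record.items():
        if field_name in mapping:  # fields without a mapping are ignored
            out[mapping[field_name]] = value.strip()
    out["email"] = out["email"].lower()  # assumed matching key
    return out


def reconcile(records):
    """Merge normalized records by email; the first source seen wins on conflicts."""
    merged = {}
    for source, record in records:
        row = normalize(source, record)
        current = merged.setdefault(row["email"], {})
        for key, value in row.items():
            current.setdefault(key, value)
    return merged


records = [
    ("marketing", {"Email": "Ada@Example.com ", "FullName": "Ada Lovelace"}),
    ("support", {"contact_email": "ada@example.com", "customer": "A. Lovelace", "ticket": "T-1"}),
]
primary = reconcile(records)
print(primary)
```

Real integrations need richer matching and conflict rules than "first source wins", but the shape is the same: a declared mapping per source, a normalization step, and a reconciliation step.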

Integration architecture management can be done from multiple perspectives.

  • Service-oriented data integration architectures (SOA).
  • Operational data integrations looking at key performance indicators (KPI) from multiple related operational processes.

All types of data can be segmented into a specific area with its architecture, data model, scope, and details. Organizations should understand the value of data integrations for decision support and knowledge management.

Examples of Data Integration Architecture

There are many starting points for the creation of a data integration architecture. Organizations can begin with a single functional unit or set of applications. Investigating what data sources are used to make decisions helps map data sources into primary and secondary use cases.

Examples of data integration architectures are:

  • Configuration Management Database (CMDB) feeding Configuration Management System (CMS) feeding Service Knowledge Management System (SKMS).
  • Marketing systems feeding into a Customer Relationship Management System (CRM) or Enterprise Resource Planning (ERP) application.
  • Moving SharePoint data into a Knowledge Management System (KMS).
  • Multiple data sources feeding an application.

Using a data integration architecture can also help with technology consolidation, saving money and time while improving the performance of functional units within the organization. Often, the work uncovers duplicate sources of information that cause inconsistent decisions and degrade the performance of the business. The organization should apply Lean principles when performing data integration architecture projects.

Actian and Data Integration Architecture

Actian is a leader in data management, including data integration architecture. Our data solutions enable organization performance and reduce the risk of manual processes. Actian helps ensure that business-critical enterprise information is effectively harnessed for real-time service delivery success no matter where it resides.

Specific enterprise data integration solutions are:

  • DataConnect – Highly scalable hybrid integration solution that enables you to quickly and easily design, deploy and manage integrations on-premises and in the cloud.
  • DataFlow – Provides a parallel execution platform for real-time processing of data-in-motion. Accelerate the analysis, extraction, transformation, and loading of data across the business.
  • Business Xchange – Fully managed B2B integration service that enables electronic data interchange (EDI) to exchange procurement and supply documents.

Contact us today to discuss how we can help your organization become higher-performing with your data and information.

The Data Catalog: An Essential Solution for Metadata Management

Actian Corporation

September 6, 2021

Does your company produce or use more and more data? To better classify, manage, and give meaning to your data, there must be order. By putting in place rigorous metadata management, with the help of a data catalog, you can gain relevance and efficiency.

Companies are producing more and more data, to the point where processing and exploitation capacities can be undermined, not because of a lack of knowledge, but because of a lack of organization. When data volumes explode, data management becomes more complex.

To put it all in order, metadata management becomes a central issue. 

What is Metadata and How Do You Manage It?

Metadata is used to describe the information contained in data: source, type, time, date, size, and so on. The range of metadata that can be attached to data is vast.

Without metadata, your data is decontextualized; it loses its meaning and becomes difficult to classify, order, and value. But because metadata is so abundant and disparate, you must be able to master this mountain of information.

Metadata management is becoming an essential practice to ensure that metadata is up to date, accurate, and accessible. To meet the challenge of optimal metadata management, it is essential to rely on a Data Catalog.

Data Catalog: What is it For?

A data catalog is a bit like the index of a gigantic encyclopedia. Because the data you collect and manage daily is diverse by nature, it must be classified and identified. Otherwise, your data portfolio would become an unfathomable mess from which you would not derive any added value.

We define a data catalog as:

A detailed inventory of all of an organization’s data assets and their metadata, designed to help data professionals quickly find the most appropriate information for any business and analytical purpose.

A Data Catalog is a Pillar of Metadata Management Through the Following Features

Data Dictionary

Each piece of data collected or used is described in such a way that it can be put into perspective with others. This metadata thesaurus is a pillar of efficient and pragmatic exploitation of your data catalog. By referencing all of your company’s data in a Data Dictionary, the Data Catalog helps optimize accessibility to information even if the user does not have access to the software concerned.

Metadata Registry

A dynamic metadata repository intervenes at all levels: from the dataset to the data itself. For each element, this metadata registry can include a business and technical description, give you information on its owners, provide quality indicators, or even help create a taxonomy (properties, tags, etc.) for your items.

The Data Search Engine

Your data catalog will allow you to access your data through its integrated search features. All the metadata entered in the registry can be searched from the data catalog search engine. Searches can be sorted and filtered at all levels.
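To show how a registry and a search engine fit together, here is a small Python sketch. It assumes each catalog entry carries a description, an owner, and a set of tags, and that search is a simple case-insensitive text match with optional filters. The class and field names are hypothetical; a real catalog would use a proper index.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    description: str
    owner: str
    tags: set = field(default_factory=set)


class DataCatalog:
    """Toy metadata registry with a filterable search over its entries."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.name] = entry

    def search(self, text="", owner=None, tag=None):
        """Return sorted names of entries matching the text and optional filters."""
        needle = text.lower()
        hits = []
        for entry in self._entries.values():
            haystack = f"{entry.name} {entry.description}".lower()
            if needle not in haystack:
                continue
            if owner is not None and entry.owner != owner:
                continue
            if tag is not None and tag not in entry.tags:
                continue
            hits.append(entry.name)
        return sorted(hits)


# Hypothetical entries
catalog = DataCatalog()
catalog.register(CatalogEntry("sales.orders", "One row per customer order", "finance", {"pii", "daily"}))
catalog.register(CatalogEntry("sales.refunds", "Refunded orders by day", "finance", {"daily"}))
catalog.register(CatalogEntry("hr.employees", "Employee master data", "hr", {"pii"}))

print(catalog.search("order"))             # text search across name and description
print(catalog.search(tag="pii"))           # filter by tag only
print(catalog.search("order", tag="pii"))  # text search plus filter
```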

Data Catalog and Metadata: The Two Pillars of Data Excellence

There’s no need to set the data catalog against metadata management: the two simply go hand in hand.

A Data Catalog is an indispensable repository for standardizing all the metadata that is likely to be shared in your company. This repository contributes to a detailed understanding and documentation of all your data assets.

But beware! The integration of a data catalog is a project that requires rigor and method. To begin this project and unleash your data potential, start by conducting a complete audit of your data and proceed in an iterative manner.

Security, Governance and Privacy for the Modern Data Warehouse, Part 2

Teresa Wingfield

August 26, 2021

Part 2: Cloud Service Security

In my previous post on modern data warehouse security, we looked at issues related to database security. This post will focus on the separate, but related issue of cloud service security, which is increasingly important since nearly half of all data warehouses are hosted in the cloud.* The economics of the cloud, the agility with which you can build out and modify services, and the potential to scale almost infinitely continue to accelerate the adoption of cloud data warehouses.

As I did in my database security post, I’ll run through some of the key questions you should ask when evaluating cloud service security. As noted before: you really do need to look at both data warehouse security and cloud service security, as without both you are left with security vulnerabilities you do not need. Answers detailed below and summarized in the Cloud Service Diagram on the right include key cloud service security features offered by our hybrid cloud data warehouse service, the Actian Data Platform, that strengthen security, governance and privacy.

How Can I Ensure That My Data Warehouse is Isolated From Threats?

Since the cloud data warehouse is under constant attack from threats that are often difficult to detect, a data warehouse should provide a variety of isolation and access control techniques to keep it secure from bad actors. These include:

  • Limiting data warehouse access to specific IP ranges with an “IP Allow” list using Classless Inter-Domain Routing (CIDR). CIDR is a set of Internet protocol standards used to create unique identifiers for networks and individual devices.
  • Having only a single tenant for your data warehouse (more on this in a minute).
  • Using the cloud service’s virtual private cloud (VPC) to isolate a private network.
  • Restricting administrative access to metadata and provisioning, management, and monitoring information with platform access control.
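As an illustration of the first technique in the list, the sketch below checks a client address against an "IP Allow" list expressed in CIDR notation, using Python's standard `ipaddress` module. The ranges shown are documentation-only example networks, and the function name is hypothetical; how a given data warehouse service actually stores and enforces its allow list is not covered here.

```python
import ipaddress

# Hypothetical allow list written in CIDR notation (documentation-only address ranges).
ALLOWED_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),    # 203.0.113.0 to 203.0.113.255
    ipaddress.ip_network("198.51.100.16/28"),  # 198.51.100.16 to 198.51.100.31
]


def is_allowed(client_ip):
    """Return True if the client address falls inside any allowed CIDR range."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ALLOWED_RANGES)


print(is_allowed("203.0.113.42"))   # inside the /24
print(is_allowed("198.51.100.32"))  # just outside the /28
```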

How Can I Ensure That Other Cloud Service Customers Can’t Access My Data Warehouse?

Make sure your data warehouse provider has configured your data warehouse as a single-tenant solution. In a single-tenancy architecture, dedicated infrastructure supports a single instance of a software application that is used by just one customer. Since nothing is shared with other tenants, there is less opportunity for other tenants to access the data in your data warehouse.

Are My Cryptographic Keys Safe?

Leading cloud service providers offer secure and robust key management services (KMSs) that use a hardware security module (HSM)—a hardened, tamper-resistant device—to manage, process and store cryptographic keys. HSMs ensure the safety of keys through strong access control systems. A cloud-ready data warehouse should be able to, at a minimum, leverage the KMS of its underlying cloud provider to encrypt and decrypt data.

How Can I Ensure Secure Data Sharing with Customers, Affiliates, and Trusted Partners?

Nowadays, organizations are extending data warehouse access to their customers, affiliates, and trusted partners to collaborate on opportunities to drive revenue, cut costs and increase efficiency. Data warehouse support for a federated single sign-on (SSO) service provides a highly secure, low-maintenance way to share data with trusted external organizations. Each external organization maintains and manages its own identities and links these through a third-party enterprise identity provider (IdP) that centralizes management and governance of permissions and authentication.

How Can I Make Sure My Users’ Communication With the Data Warehouse is Private and Secure?

By encrypting messages at both ends of a conversation, end-to-end encryption (E2EE) prevents anyone in the middle from reading private communications. Data warehouse support for E2EE is vital to help stop man-in-the-middle (MiTM) attacks that intercept a data transfer.

Am I Protected Against Unauthorized Access? 

All technology service or SaaS companies that store customer data in the cloud should perform an audit using the System and Organization Controls 2 (SOC 2) framework, which will validate whether their organizational controls and practices effectively safeguard the privacy and security of customer and client data. The American Institute of Certified Public Accountants (AICPA) developed the SOC 2 framework which focuses on security controls related to availability, privacy, processing integrity, and confidentiality. Successful completion of a SOC 2 audit shows that the data warehouse can keep sensitive organizational and client data secure.

Next Steps:

That’s a wrap on modern data warehouse security at the cloud service level. If you haven’t already done so, check out my blog on database security: securing data within the data warehouse is just as important as securing the cloud services supporting the cloud data warehouse. Naturally, you should also check out the Actian data warehouse, which offers all the key security features enumerated above to provide you with a highly secure and extraordinarily powerful cloud data warehouse platform.

Finally, for other tips on data modernization, watch Actian’s on-demand webcast where you’ll hear from Jim Curtis, 451 Group’s resident expert on Data Modernization, as well as Actian’s Raghu Chakravarthi and Paul Wolmering. Raghu is Actian’s SVP of R&D, who formerly ran Teradata’s Big Data Group, and Paul is Actian’s VP of Solution Engineering, who previously led field engineering teams at Netezza.

* TDWI Best Practices Report | Building the Unified Data Warehouse and Data Lake | Transforming Data with Intelligence

 

About Teresa Wingfield

Teresa Wingfield is Director of Product Marketing at Actian, driving awareness of the Actian Data Platform's integration, management, and analytics capabilities. She brings 20+ years in analytics, security, and cloud solutions marketing at industry leaders such as Cisco, McAfee, and VMware. Teresa focuses on helping customers achieve new levels of innovation and revenue with data. On the Actian blog, Teresa highlights the value of analytics-driven solutions in multiple verticals. Check her posts for real-world transformation stories.

Data Curation: Essential for Enhancing Your Data Assets

Actian Corporation

August 24, 2021

Having large volumes of data isn’t enough: it’s what you make of it that counts. To make the most out of your data, you need to distill a real data culture within your company. The foundation of this culture is data curation.

90% of the world’s data has been created in the last two years. With the exponential growth of connected devices, companies will be confronted with the unfortunate reality that our ability to create data will far surpass our ability to manage and exploit it.

And it’s not going to get any better! According to estimates published in Statista’s Digital Economy Compass 2020, the annual volume of digital data created globally has increased more than 20-fold over the past decade and will surpass the 50 zettabyte mark by 2021.

In this context, it is not surprising that most companies are currently only able to analyze 12% of the data they have at their disposal. Because, behind the collection, storage, and security of data, there is, above all, the business value that can be derived from it.

This is the challenge addressed by the concept of Data Curation: the essential step to exploit the potential of an organization’s abundant data assets. 

The Definition of Data Curation

According to the definition given by the INIST (Institut de l’Information Scientifique et Technique), which is attached to the CNRS, 

“Curation refers to all the activities and operations necessary for the active management of digital research data throughout its life cycle. The objective is to make them accessible, shareable, and reusable in a sustainable way. Three stakeholders can be identified in the data life cycle: the creators, most often researchers, the “curators,” and the users.”

In other words, data curation consists of identifying, within a data catalog, the data that can be enhanced and exploited, and then making it available to the users most likely to draw value from it.

To set up an efficient and relevant Data Curation, you need to start with a precise mapping of the available data. This initial mapping is the basis for pragmatic and operational data governance.

Once the rules of governance have been established, all attention must turn to the data user. Data is a raw material that is only worthwhile if it is properly refined, and that refinement must be designed as a response to the user’s needs. It is the user who is at the origin of the data curation project.

Data curation is therefore an iterative and continuous process for data exploitation, distinct from the tasks essential to data governance (from quality management to data protection and even data life cycle management).

Data Curation: Essential Prerequisites, Undeniable Benefits

Data Curation offers the prospect of rapidly developing a data culture across an organization.

The creation of a data management and curation strategy allows you to take stock of the data produced. It is then possible to select the most relevant data and enrich it with the metadata necessary to understand and reuse it, including by business users.

Everyone in the company can then base their choices, decisions, strategies and methods on the systematic use of data, without having to have specific skills.

The objective: to create the conditions for systematic use of data as a basis for any project or approach, rather than limiting its use to Data Science or data expert teams.

To effectively deploy your data curation strategy, you must therefore rely on elements that are essential to the proper management of your data assets. The core of the approach is not limited to data catalogs.

While data catalogs are essential and follow directly from your data map, metadata governance plays an even more crucial role. Metadata makes it easier for users to interact with data portfolios in natural language.

With data curation, get into a data-driven dynamic for good.

Security, Governance and Privacy for the Modern Data Warehouse, Part 1

Teresa Wingfield

August 23, 2021

Database Security: Part 1

Did you know that security breaches involving the cloud now surpass breaches involving on-premises infrastructure? According to the Verizon 2021 Data Breach Investigations Report, 73% of cybersecurity incidents involved external cloud assets, compared with 27% only one year ago.* Having just joined Actian from a cloud security provider, even I was surprised by this tremendous surge.

This new insight has motivated me to review the security requirements for a modern cloud data warehouse. The topic turns out to be broad, so I’m writing this blog in two parts: part 1 focuses on database security; part 2 focuses on cloud service security. They’re related but separate issues, and you won’t have truly robust security in the cloud unless you address both.

Let’s start with a series of key questions that will help determine how secure the data in your cloud data warehouse really is. The critical database security features described below and summarized in the diagram on the right are all standard features of the Actian Data Platform.

Database Security Questions:

Who Can Connect to the Data Warehouse?

At a minimum, a cloud data warehouse should require users to authenticate their identity using a unique login ID and password. Single sign-on (SSO) using OAuth allows a user to log into multiple independent software systems with one ID and password. Data warehouse support for SSO helps eliminate password fatigue for users and, more importantly, it can help you ensure that your organization’s identity policies are extended to protect your data warehouse. For added authentication protection, you can leverage your identity provider (IdP) to use multifactor authentication (MFA). This adds an extra level of security by requiring multiple methods of authentication (such as a password and a one-time numeric code sent via SMS to a known mobile phone number) before allowing a user to access the data warehouse.
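For readers curious about what a two-factor check involves, here is a minimal Python sketch that combines a password check with a one-time numeric code. The post does not say how codes are generated, so the sketch assumes the counter-based HOTP scheme from RFC 4226 purely for illustration; in practice your identity provider handles this for you. The `verify_login` function, the salt, the password, and the shared secret are all hypothetical.

```python
import hashlib
import hmac
import struct


def hotp(secret, counter, digits=6):
    """One-time code per RFC 4226: HMAC-SHA1 over the counter, then dynamic truncation."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_login(password, stored_hash, salt, submitted_code, secret, counter):
    """Both factors must pass: something the user knows and a one-time code."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    password_ok = hmac.compare_digest(candidate, stored_hash)
    expected = hotp(secret, counter).encode()
    code_ok = hmac.compare_digest(submitted_code.encode(), expected)
    return password_ok and code_ok


# Hypothetical enrolment data for one user
salt = b"demo-salt"
stored_hash = hashlib.pbkdf2_hmac("sha256", b"correct horse", salt, 100_000)
secret = b"12345678901234567890"  # the test secret used in RFC 4226, Appendix D

code = hotp(secret, 0)
print(verify_login("correct horse", stored_hash, salt, code, secret, 0))      # both factors valid
print(verify_login("correct horse", stored_hash, salt, "000000", secret, 0))  # wrong code
```

A stolen password alone fails the second check, which is the point of requiring more than one method of authentication.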

What Can Users See and Do?

The answer had better not be “everything”. Your data warehouse should have access control features that limit who can read, write to, and update different data in the warehouse.

One discretionary approach to securing direct access to the data warehouse involves the use of an “IP Allow” list. This list limits access to those users with approved device IP addresses. When used in conjunction with the user authentication methods discussed above, an IP Allow list provides a valuable layer of additional security by precluding access to the data warehouse when a user’s credentials are correct, but the device being used is not recognized. An unscrupulous individual may have stolen a legitimate user’s credentials but won’t be able to use them to access the data warehouse unless the individual also has stolen a computer whose IP address appears on the IP Allow list.

A non-discretionary approach to access control is exemplified by Role-Based Access Control (RBAC). RBAC is a technique that grants access to specified resources based on an individual’s role in an organization. RBAC does more than just simplify user administration; it can also help enforce the principle of least privilege where users have only the privileges they need for their job function. This helps comply with privacy and confidentiality regulations. A grant statement can narrow down exactly what user or role is granted a privilege.
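The following Python sketch shows the core of RBAC under simple assumptions: privileges are attached to roles, users are assigned roles, and an access check only ever looks at the roles. The role names, table names, and the `grant` helper are hypothetical; the helper stands in for a grant statement and is not the syntax of any particular database.

```python
# Hypothetical role definitions: privileges belong to roles, never directly to users.
ROLE_PRIVILEGES = {
    "analyst": {("sales", "select")},
    "etl": {("sales", "select"), ("sales", "insert"), ("sales", "update")},
}
USER_ROLES = {"alice": {"analyst"}, "bob": {"etl"}}


def grant(role, table, privilege):
    """Give one privilege on one table to a role."""
    ROLE_PRIVILEGES.setdefault(role, set()).add((table, privilege))


def is_permitted(user, table, privilege):
    """A user may act only if one of their roles holds the privilege."""
    return any(
        (table, privilege) in ROLE_PRIVILEGES.get(role, set())
        for role in USER_ROLES.get(user, set())
    )


print(is_permitted("alice", "sales", "select"))  # analysts can read
print(is_permitted("alice", "sales", "update"))  # but cannot modify

grant("analyst", "forecasts", "select")
print(is_permitted("alice", "forecasts", "select"))
```

Because Alice holds only the analyst role, she gets exactly the privileges that role needs and nothing more, which is the principle of least privilege in miniature.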

How Can I Ensure That Users See Only the Data They Should?

This kind of data access control can be enabled by dynamic data masking. This is a process by which original data is dynamically occluded by modified content. Dynamic masking is often used to protect fields containing personally identifiable information, sensitive personal data, or commercially sensitive data. Because sensitive data masking is a mandatory requirement for achieving PCI DSS, GDPR, and HIPAA compliance, it is a must for data warehouse security in industries governed by these regulations.
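To show what "dynamically occluded" means in practice, the following Python sketch masks sensitive columns at read time while leaving the stored row untouched. The column names, the role name, and the helper functions are assumptions made for this example only.

```python
# Hypothetical configuration: which columns are sensitive, which roles see them in full.
MASKED_COLUMNS = {"card_number", "ssn"}
UNMASKED_ROLES = {"compliance_officer"}


def mask_value(value: str, visible_suffix: int = 4) -> str:
    """Replace all but the last visible_suffix characters with asterisks."""
    if len(value) <= visible_suffix:
        return "*" * len(value)
    hidden = len(value) - visible_suffix
    return "*" * hidden + value[hidden:]


def apply_masking(row: dict, role: str) -> dict:
    """Return a copy of the row as the given role is allowed to see it."""
    if role in UNMASKED_ROLES:
        return dict(row)
    return {
        column: mask_value(str(value)) if column in MASKED_COLUMNS else value
        for column, value in row.items()
    }
```

The masking is applied to the query result, not to the stored data, so privileged roles can still read the original values.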

Who Owns/Has Control Over What?

Just as no individual should have access to everything in a data warehouse, no individual should be able to control everything in a data warehouse. What an organization needs is a database server option ensuring role separation. The idea is based on the principle of separation of duties where more than one person should be involved when completing critical or sensitive tasks. For example, role separation in the data warehouse could require that the person who determines what to audit (DBSSO) must be different from the person who monitors the audit trail (AAO), and both must be different from the person who is responsible for the operations of the database server (the DBSA).**

By requiring individuals with distinct roles to work together to complete critical or sensitive tasks, you create a checks-and-balances mechanism that can reduce security risks and facilitate compliance with regulatory mandates such as SOX, HIPAA, PCI DSS, and GDPR, as well as with industry standards such as ISO 17799.
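A simple way to picture role separation is a check that no single person holds more than one of the sensitive roles. The Python sketch below uses the DBSSO, AAO, and DBSA roles mentioned above; the person names and the function name are hypothetical.

```python
def violates_role_separation(assignments: dict) -> bool:
    """Return True if any one person holds more than one sensitive role.

    assignments maps a role name (for example "DBSSO") to the person holding it.
    """
    holders = list(assignments.values())
    return len(set(holders)) < len(holders)


# Hypothetical assignments for illustration.
separated = {"DBSSO": "alice", "AAO": "bob", "DBSA": "carol"}
overlapping = {"DBSSO": "alice", "AAO": "alice", "DBSA": "carol"}
```

Here violates_role_separation(separated) is False, while violates_role_separation(overlapping) is True because one person both decides what to audit and reviews the audit trail.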

How Would I Know If My Warehouse Experiences a Database Security Event?

Two data warehouse features can help here. Start with audit logs. These form a critical part of data protection and compliance because they record all or specified classes of security events for the entire data warehouse installation. Selected classes of events, such as use of database procedures or access to tables, can be recorded in the security audit log file for later analysis. Criteria can be selected that apply to a single object or across an entire class of installation objects.

The second key feature is a security alarm capability, which enables you to specify the events to be recorded in the security audit log for individual tables and databases. Using security alarms, you can place triggers on important databases and tables. If any user attempts to perform an access operation that is not normally expected, the security alarm will raise an alert.
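The relationship between the two features can be sketched as follows: every event goes to the audit log, and events that match an alarm defined on a specific table also raise an alert. This is an illustrative Python sketch only; the table names, the alarm definitions, and the record_event function are assumptions, not the interface of any particular product.

```python
from datetime import datetime, timezone

audit_log = []       # every recorded security event
raised_alerts = []   # events that matched a security alarm

# Hypothetical alarms: (table, operation) pairs that are not normally expected.
security_alarms = {("payroll", "DELETE"), ("payroll", "UPDATE")}


def record_event(user: str, table: str, operation: str, success: bool) -> dict:
    """Append the event to the audit log and raise an alert if an alarm matches."""
    event = {
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "table": table,
        "operation": operation,
        "success": success,
    }
    audit_log.append(event)
    if (table, operation) in security_alarms:
        raised_alerts.append(event)
    return event
```

The audit log supports later analysis of all activity, while the alarm list gives immediate visibility into the few operations that matter most.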

How Can I Protect My Data at Rest and in Motion?

Data encryption scrambles data into “ciphertext” that makes it unreadable to anyone who doesn’t have a decryption key or password. Encryption of data at rest (that is, stored in a database) and in motion (in transit on a network) is needed to protect data everywhere.

Data at rest in the cloud provider’s filesystem should generally be protected by AES 256-bit encryption. Additionally, some data may warrant different privacy or security measures. To meet these needs, your data warehouse should offer the flexibility for both full database encryption and individual column encryption. The data warehouse should also support the rekeying of encryption keys where the entire data warehouse is decrypted and then re-encrypted with a new encryption key as recommended by NIST guidelines. This is a valuable way to limit the amount of time a bad actor can use a stolen key to access your data warehouse.

Because data in motion (data transported from one location to another) is vulnerable to man-in-the-middle (MiTM) attacks, it should be encrypted while in motion to prevent interception.

Is Data Being Handled Responsibly?

I’ll be discussing this in more detail in my next blog, but here are a few details: A data warehouse that operates in compliance with the System and Organization Controls 2 (SOC 2) framework demonstrates that your data warehouse maintains a high level of information security. A data warehouse vendor that has passed on-site audits shows that it has taken all steps necessary to keep the data warehouse, and the valuable information contained therein, safe from breaches.

Next Steps:

As noted earlier, there’s much more to say on this topic, and I will continue this thread in my next blog post on cloud service security. Meanwhile, you can learn more about Actian here.

* Verizon 2021 DBIR Master’s Guide

** HCL Software, Using Role Separation


About Teresa Wingfield

Teresa Wingfield is Director of Product Marketing at Actian, driving awareness of the Actian Data Platform's integration, management, and analytics capabilities. She brings 20+ years in analytics, security, and cloud solutions marketing at industry leaders such as Cisco, McAfee, and VMware. Teresa focuses on helping customers achieve new levels of innovation and revenue with data. On the Actian blog, Teresa highlights the value of analytics-driven solutions in multiple verticals. Check her posts for real-world transformation stories.
Data Platform

Semi-Structured Data: What it is and Why it Matters

Actian Corporation

August 15, 2021


Semi-structured data is emerging as a critical element of business operations and strategies. Typically, business leaders make decisions based on analysis of data stored in forms, spreadsheets, and relational databases – in other words, structured data. However, in a modern business environment, constraining data with forms and tables is no longer sufficient.

What is Semi-Structured Data?

While structured data is the most common type of business data to be analyzed, it is not the most common type of information. Structured data represents only 5% to 10% of the information that modern businesses need to deal with regularly.

Most of the data that most businesses deal with is unstructured data, predominantly text and images. The many documents, email messages, photos, and social media posts we generate are all examples of unstructured data.

If you consider structured data as one end of a continuum and unstructured data as the other end, everything in between is semi-structured data. The amount of this type of data is growing, driven by new tools such as machine learning (ML) and new data formats such as JavaScript Object Notation (JSON).

Why Semi-Structured Data Matters

Much of the data that we once considered unstructured is better treated as semi-structured data. Unlike unstructured data, which is difficult to mine for business value, semi-structured data is easier to collate, query, and analyze. Semi-structured data, supported by a custom data model, can better support sound business decision-making and generate greater business value than unstructured data.

Many businesses are evolving from a focus on specific products or customers to a recognition that they are parts of one or more networks of products and services. This change in focus is driving a need for business intelligence beyond what can be derived from internal data sources. The outputs from external data sources that explore the marketplace and a business’s position within that marketplace are often in the form of semi-structured data. Analyzing data trends is essential if a business is to transition from analyzing what was to gaining insight and foresight about what needs to be.

Analysis of semi-structured data can also provide significant input to business process management. Business processes are often constrained by limitations imposed by data collection and analysis. When combined with semi-structured data and goal-driven behavior, the business processes can be more easily adapted to markets and even market segments, and more responsive to customer needs and conditions. The more a business can access and analyze semi-structured data, the more that business can refine its processes.

The improved insights gained from the analysis of new data sources like semi-structured data help business leaders to develop more efficient operations and improve the chance of success of strategic initiatives. These gains can translate into new competitive advantages.

Data Storage Considerations

Multiple factors are driving the need for additional data storage and processing. In the Business-to-Consumer (B2C) world, there is an ever-increasing use of digital devices to connect to a business. This means more direct data to collect, store, and analyze, as well as increased opportunities to collect secondary data. Feedback forms, surveys and similar tools generate additional focused information. All this data tends to be semi-structured.

Most structured data can be stored, managed and analyzed with a relational database management system (RDBMS). For simple, one-table data, a spreadsheet can suffice. Regardless of your chosen management tool, you must be able to create data models that conform to that tool’s table format. As business data grows in volume and variety of forms, it becomes increasingly difficult to fit all data into a structured, relational mold.

Learn More About Semi-Structured Data

A hybrid cloud data warehouse such as Actian makes it easy to work with semi-structured data by natively ingesting JSON data and supporting it within a relational database.
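To make the idea of keeping JSON inside a relational table concrete, here is a small Python sketch that uses SQLite as a stand-in. It is not Actian's native JSON support; the table, the column names, and the sample documents are invented for illustration.

```python
import json
import sqlite3

# In-memory SQLite database standing in for a relational warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, customer TEXT, payload TEXT)"
)

# Hypothetical semi-structured documents: the fields differ from one to the next.
documents = [
    '{"customer": "acme", "device": {"type": "mobile"}, "rating": 4}',
    '{"customer": "globex", "comments": "late delivery"}',
]

for raw in documents:
    doc = json.loads(raw)
    # The shared field becomes a relational column; the full document is kept as-is.
    conn.execute(
        "INSERT INTO events (customer, payload) VALUES (?, ?)",
        (doc["customer"], raw),
    )


def ratings_by_customer(connection) -> dict:
    """Read an optional JSON field for every row; missing fields become None."""
    result = {}
    query = "SELECT customer, payload FROM events ORDER BY id"
    for customer, payload in connection.execute(query):
        result[customer] = json.loads(payload).get("rating")
    return result
```

The point of the sketch is that documents with different shapes can share one table: fields that are always present are queried relationally, and optional fields are read from the stored document.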


About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale, streamlining complex data environments and accelerating the delivery of AI-ready data. The Actian data intelligence approach combines data discovery, metadata management, and federated governance to enable smarter data usage and enhance compliance. With intuitive self-service capabilities, business and technical users can find, understand, and trust data assets across cloud, hybrid, and on-premises environments. Actian delivers flexible data management solutions to 42 million users at Fortune 100 companies and other enterprises worldwide, while maintaining a 95% customer satisfaction score.
Data Integration

Data Migration: A Roadmap to Success

Actian Corporation

August 13, 2021


Organizations initiate data migration projects for several reasons. They might need to overhaul an entire system, upgrade databases, establish a new data warehouse, support a cloud migration roadmap, or merge new data from an acquisition or other source. Data migration is also necessary when deploying another system that resides next to incumbent applications. Regardless of the exact purpose for migration, the goal is generally to enhance business performance and competitiveness.

Achieving your migration goals can be difficult. But with a solid migration strategy and implementation approach and the right set of tools, you will be well-positioned for data migration success.

Why You Should Have a Data Migration Plan and Strategy

A strategic data migration plan should include consideration of these critical factors:

Know Your Data

A successful migration starts with knowing (and understanding) what you’re migrating and how it fits within the intended destination system. Understand how much data is moving and what that data looks like. Ask yourself what needs to migrate, what can be left behind, and what might be missing. If an organization skips this “source review” step and assumes an understanding of the data, the result could be wasted time and money. Worse, an organization could encounter a critical flaw in the data mapping that halts progress in its tracks.
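A source review can begin with very simple profiling: how many rows are there, and which fields are missing values? The Python sketch below assumes the source rows have already been read into dictionaries; the function name and the sample columns are hypothetical.

```python
def profile_rows(rows: list, columns: list) -> dict:
    """Count rows and, per column, how many rows have no usable value."""
    profile = {"row_count": len(rows), "missing": {column: 0 for column in columns}}
    for row in rows:
        for column in columns:
            # Treat absent keys, None, and empty strings as missing.
            if row.get(column) in (None, ""):
                profile["missing"][column] += 1
    return profile


# Hypothetical sample of source data.
sample_rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 3},
]
```

Running profile_rows(sample_rows, ["id", "email"]) reports three rows with two missing email values, the kind of finding that should be resolved before any mapping is designed.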

Ensure Data Quality

Once you identify any issues with your source data, they must be resolved. Otherwise, potentially fatal disruptions of your migration plans could result. This may require additional software tools and third-party resources because of the scale of the work.

Define and Design the Migration

The design phase is where organizations define the data migration strategy and implementation approach – “Big Bang” (all at once) or “trickle” (a bit at a time). Your data migration plans should also include details of the technical architecture of your chosen solution and of the migration processes. By first considering the design of the strategy, the data to be moved and the destination system, you can begin to define timelines and unearth any project concerns. By the end of this step, your whole project should be documented.

Maintain and Protect Your Data

Data degrades over time and can often become unreliable. This means there must be controls in place to maintain data quality, before, during and after any integration and migration projects are undertaken.

Don’t Forget Security

During planning, it’s important to consider security plans for the data. Any data that must be protected should have protection included throughout the plan.

Build the Migration Solution

It can be tempting to approach migration with a “just-enough” development approach. However, since you will only implement once, it’s crucial to do it correctly. A common tactic is to separate the data into subsets and build one subset at a time, testing each before moving on. If an organization is working on a particularly large migration, then it might make sense to build and test in parallel.

Conduct a Live Test

The testing process isn’t completed after testing the code during the build phase. It’s important to test the data migration design with real data to ensure the accuracy of the implementation and completeness of the solution.

Deploy, Then Audit

After final testing, proceed with implementation as defined in the plan. Once the implementation is live, establish a system to audit the data to ensure the accuracy of the migration.
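One common way to audit a migration is to compare row counts and content checksums between the source and the target. The following Python sketch does this with SQLite connections as stand-ins for the two systems; the function names are hypothetical, and the table and key names are assumed to be trusted identifiers rather than user input.

```python
import hashlib
import sqlite3


def table_fingerprint(conn, table: str, key: str) -> tuple:
    """Return (row count, SHA-256 digest) of a table read in key order.

    table and key are interpolated into the query, so this sketch assumes
    they come from trusted configuration, never from user input.
    """
    digest = hashlib.sha256()
    count = 0
    for row in conn.execute(f"SELECT * FROM {table} ORDER BY {key}"):
        digest.update(repr(row).encode("utf-8"))
        count += 1
    return count, digest.hexdigest()


def audit_migration(source, target, table: str, key: str) -> bool:
    """True only if both systems hold the same rows with the same content."""
    return table_fingerprint(source, table, key) == table_fingerprint(target, table, key)
```

A matching count with a different digest points to altered values, while a different count points to dropped or duplicated rows, so the two numbers together narrow down where to look.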

Govern Your Data

Tracking and reporting on data quality is important because it enables a better understanding of data integrity. Clear, easy-to-follow processes and highly automated tools can greatly ease data governance and ensure greater data integrity after any successful data migration.

How a Hybrid Integration Platform Can Aid Your Migration

A hybrid data integration platform, such as Actian DataConnect, can make the process of data migration much easier and lower the risk of business-disrupting connectivity issues. With DataConnect, instead of managing numerous point-to-point interactions between applications, connections to source systems are managed through a central platform, much like a telephone switchboard. The number of connections to source systems is reduced because all consuming services and applications share the same connection managed by DataConnect. That means fewer sets of credentials to manage, fewer points of potential failure or disruption, and the ability to consolidate management of the flow of data across your organization and IT environment.
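The switchboard analogy can be expressed as simple arithmetic. The Python sketch below assumes the worst case, in which every consuming application needs data from every source system; real environments are usually sparser, so treat the numbers as an upper bound. The function names are hypothetical.

```python
def point_to_point_connections(consumers: int, sources: int) -> int:
    """Each consumer connects directly to each source."""
    return consumers * sources


def hub_connections(consumers: int, sources: int) -> int:
    """Each consumer and each source connects once to the central platform."""
    return consumers + sources
```

With 6 consuming applications and 4 source systems, point-to-point integration needs up to 24 connections, while a central hub needs 10, and the gap widens as systems are added.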

As a hybrid integration platform, Actian DataConnect can manage connections with systems both on-premises and in the cloud. By connecting your applications through DataConnect before you start your migration or cloud migration project, you can shift configuration tasks from the migration window and test to make sure that they work ahead of time.

During the data migration event, data connections for migrated apps can be updated to point to whatever application environment is active by simply updating the configuration in the DataConnect dashboard. If connection issues are encountered, Actian DataConnect enables you to see where the problems are and correct them quickly. After the data migration is completed, DataConnect can help you verify that dependencies on on-premises applications have been eliminated and the applications can be safely decommissioned.

Before you plan your next data migration project, consider how Actian DataConnect can help you accelerate migration timelines and improve the likelihood of a successful and effective data migration project. To learn more, visit DataConnect.

Data Analytics

Data: The Beating Heart of Healthcare

Actian Corporation

August 4, 2021


Payers, health plans, and integrated health systems run on data. Integration, processing, and analytics are integral to their business processes and to managing costs and risk.

The role of payers in the healthcare industry has been expanding for years. As care models have shifted from fee-for-service to coordinated and value-based care, payers and providers alike have been generating—and relying upon—more and more data to run operations smoothly. This evolution had been underway for some time, but the Covid-19 pandemic abruptly put new pressure on the system, heightening the need and requirements for better interoperability and data sharing between payers, providers, members, and patients.

Once, payers focused primarily on claims processing, payments, and plan membership management. But that was in a simpler time. Now, payers are focused on managing the relationship between members and providers and on delivering better patient health outcomes for members. They’re also focused on delivering better experiences for members and providers alike as new competitors are appearing on the horizon from unexpected sectors. Large organizations like Walmart, Amazon, and Google, are expanding their own operational profiles and starting to compete as payers in their own right. In some cases, these organizations are even taking on the role of healthcare providers and competing with traditional providers.

All these changes are putting considerable strain on payers. Add to these the erosion in commercial plan revenues, accelerated by Covid-19 through an uptick in self-care as well as increases in Medicaid enrollment and direct consumer purchasing of plans through the Affordable Care Act exchanges, which has disrupted financial forecasting. During 2020, between 35% and 45% of Americans delayed or opted not to receive care, citing numerous concerns, including the pandemic-driven layoffs, both temporary and long-term, that hit many workers whose healthcare was tied to their employment.

Driving Better Outcomes With Data

To meet these challenges, gain more predictability, and lower costs and risk while improving member and patient outcomes, organizations throughout the healthcare industry are turning to data and analytics. According to a Society of Actuaries (SOA) survey, more than 90% of payer and provider executives say predictive analytics is key to effective healthcare management.

However, the goal of integrating the systems housing the relevant data and then deriving actionable insight from an analysis of this consolidated pool of data can be costly and elusive. IT is often overburdened, already tasked with integrating cumbersome legacy systems and data sources, including claims systems, walled-in provider data (EMR/EHR), operational systems (ERP), member and patient engagement (CRM) systems, external data such as social determinants of health (SDOH), and new technologies such as wearables and virtual care systems. Nor are these long-needed integration projects the only demands weighing down IT. CMS and ONC are pressing ahead with industry mandates to open up data access through newly finalized rules for data sharing, not only to patients but also across the healthcare ecosystem. Initially introduced by the Cares Act in response to Covid-19, these new rules require the industry to enable member access to data and facilitate interoperability between payers and health systems. Deadlines for compliance begin in 2021.

While these rules will, in theory, benefit payers, members, and providers by ensuring greater and more standardized access to data across the healthcare ecosystem, the work to achieve compliance can almost completely tie up an organization’s IT resources. This can make it even more difficult for data analysts, actuaries, and fraud and claims analysts to dive into these deep pools of data and extract actionable insights, because consolidating and accessing all this data in advance of analysis still requires the assistance of IT, which now has even less bandwidth to support users.

But there is a solution that can accelerate the integration and data sharing goals facing IT and that can enable an organization’s data scientists and analysts to consolidate and analyze critical healthcare data themselves, without having to heavily rely on the support of the IT team.

Enter the Healthcare Data Analytics Hub

The Actian Healthcare Data Analytics Hub enables payers, providers, and others in the healthcare ecosystem to gain greater insights and drive better outcomes with data. This SaaS-based hub includes native integration with the powerful data extraction, transformation, and load (ETL) features of Actian DataConnect, which enable an organization to automate and accelerate the process of accessing, pulling, and qualifying relevant data from a wide range of internal and external systems. Because connectors have already been built to link many of the systems and data repositories that healthcare organizations use, the integration work that IT has already begun becomes dramatically easier to complete. Similarly, the IT team can build the data access APIs described by CMS and ONC right in the Data Analytics Hub, ensuring that data can be shared securely and that member data sharing preferences can be both tracked and enforced across the healthcare data fabric, regardless of source or destination. The Actian Healthcare Data Analytics Hub also ensures that the IT organization can comply with the rules requiring that the systems and networks supporting both the sharing and protection of personal data stay on top of constantly evolving HIPAA and other regulatory requirements.

Beyond helping healthcare IT organizations to accelerate the completion of their integration and data sharing projects, the Data Analytics Hub can all but eliminate the need for an organization’s end users to petition IT for support when striving to perform their analytics. Self-service, native integration tools in the Actian Data Platform make it easy for healthcare data analysts, actuaries, provider network, revenue cycle management, and fraud and claims analysts to access and integrate data sets on their own, without help from IT—and without impacting the existing systems of record. The Actian Data Platform supports all popular analytic and visualization tools, so users can use the tools with which they are already familiar to gain the insights needed to automate and optimize processes, improve outcomes, and drive a more satisfying experience for all parties involved.

HCA Use Case Mapping Diagram

The result? Having an intelligent, end-to-end healthcare data analytics hub enables IT organizations to accelerate completion of the critical integration and data access projects they are already committed to completing. It also empowers an organization’s data scientists and analysts to perform the critical work that they have been tasked to perform by removing the roadblocks preventing them from gaining the insights they are trying to discover and act upon. An Actian Healthcare Data Analytics Hub enables an organization to shift from siloed models of business and operations to models that are forward-looking and collaborative. An organization can better manage the move to value-based care, find ways to compete more effectively against new competitors (many of whom are less burdened with a legacy IT infrastructure and can move nimbly in seizing new opportunities), improve the experiences of and alignment with members and partners alike, and, ultimately, improve outcomes, save time, and reduce costs.

About Actian

Actian has helped hundreds of payers, providers, clearinghouses, and healthcare technology organizations automate data integration, processing, and analysis, enabling them to capitalize on real-time analytics to derive essential insights and automate processes in an ever-evolving healthcare data landscape. Actian helps these organizations apply analytics and data processing to address a breadth of use cases from claims processing to population health insights and compliance.

Data Management

The Top 10 Benefits of an Operational Data Warehouse for 2021

Actian Corporation

August 1, 2021


The previous blogs in this series discussed the top 5 pitfalls of traditional operational data warehouses and defined the Operational Data Warehouse (ODW) as a potential solution. Below is my list of the top 10 benefits of an effective ODW:

Current. Continuous data updates via “micro-batches” or streamed singleton updates throughout the day provide the most current information for analytics-based decision-making.

Fast. Changes to ODW data need to be made with the lowest performance penalty. Columnar data blocks that maintain their min-max value metadata eliminate the overhead of creating indexes that need to be updated with every change, as traditional row-based databases do. The ability to make better business decisions faster can translate into multiple data warehouse benefits.

Scalable. An effective enterprise data warehouse must be scalable in two dimensions. Vertical scalability enables workloads to take advantage of more CPU and storage capacity on a single system. When you have saturated the hardware capacity of a single system, the ability to scale horizontally to a cluster of systems provides the ability to grow the ODW to handle larger databases and more users. The ability to increase capacity as demand grows is a key advantage of a modern data warehouse.

Secure. The explosive growth of cybercrime and increased regulation of data privacy means that even “internal” systems must be secured. A good ODW must offer built-in support for advanced encryption, auditing, role-based security and data masking.

Flexible. The days when an organization could standardize on a single computing platform are over. The ODW needs to offer the flexibility to be deployed on-premises (on Linux, Windows, or Hadoop Clusters) or in the cloud (on AWS, Microsoft Azure and beyond).

Consistent. Some databases sacrifice query integrity for speed. A good ODW needs to provide row-level locking and full read consistency for running queries even as the underlying data changes.

Robust. A key advantage of a modern data warehouse is the ability to deliver enterprise-level resiliency and manageability. This translates to having an ODW with solid back-up, recovery, failover and replication capabilities.

Economical. Several factors can affect the total cost of ownership (TCO) for a specific database technology being used to support a particular business case. One is the ability to run standard servers to avoid esoteric appliances. Others include offering flexible deployment models to match different business needs, flexibility to scale up and down according to performance requirements, and the option to use different sized components (compute, storage) to optimize operating efficiencies.

Interoperable. A good ODW needs to provide open application programming interfaces (APIs) such as those that support Open Database Connectivity (ODBC) and American National Standards Institute Structured Query Language (ANSI SQL). These are necessary to enable the data warehouse to work with the multitude of query tools an organization might use. Many organizations use more than 20 different visualization and query tools.

Connected. The ability to ingest data at high speed is a critical ODW requirement. If you cannot load your data in a reasonable time, you end up working with summary data or, worse, stale data.
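The "Fast" benefit above rests on a technique worth illustrating: when each block of column values carries its minimum and maximum, a range query can skip whole blocks without any index. The Python sketch below is a simplified model of that idea, not the storage format of any specific product; the function names and block size are assumptions.

```python
def build_blocks(values: list, block_size: int) -> list:
    """Split a column into blocks, each tagged with its min and max value."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        blocks.append({"min": min(chunk), "max": max(chunk), "values": chunk})
    return blocks


def scan_range(blocks: list, low, high) -> tuple:
    """Return (matching values, number of blocks actually read)."""
    matches = []
    blocks_read = 0
    for block in blocks:
        # Skip the block if its min-max range cannot overlap the query range.
        if block["max"] < low or block["min"] > high:
            continue
        blocks_read += 1
        matches.extend(v for v in block["values"] if low <= v <= high)
    return matches, blocks_read
```

For a column holding 1 through 12 in blocks of four, a query for values between 6 and 7 reads only one of the three blocks. Adding new data only requires computing the min and max of the new block, which is why there is no separate index to maintain on every change.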

I would be very interested to hear which benefits you value the most, or which others I could have included. Email me at Pradeep.bhanot@actian.com if you would like to share your views.
