What is Data Fingerprinting and Similarity Detection?

Actian Corporation

December 3, 2019

With the emergence of Big Data, enterprises found themselves with a colossal amount of data. In order to understand and analyze their data, as well as meet the various regulatory requirements, it is vital for organizations to document their data assets. However, documenting and giving context to thousands of datasets is a very difficult, even impossible, task to do by hand.

This is where Data Fingerprinting comes in.

What is Data Fingerprinting?

In the data domain, a fingerprint is a “signature” that characterizes a data column. The goal is to give context to these columns.

With this technology, similar datasets in your databases can be detected automatically and documented more easily, making data stewards’ tasks less tedious and more efficient. For example, under the data steward’s supervision, data fingerprinting can recognize that a column containing the values “France”, “United States”, and “Australia” represents “Countries”.

The objective of the Actian Data Intelligence Platform, our metadata management platform, is to give meaning and context to your cataloged datasets as automatically as possible. Using Machine Learning technologies, the platform identifies the columns in each dataset schema, analyzes them, and gives each one its own “signature”. If two fingerprints are similar, our Data Catalog suggests that the Data Steward apply the documentation from one column to the other.
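
To make the idea concrete, here is a minimal sketch of column fingerprinting and similarity-based suggestions. It is not how the Actian Data Intelligence Platform is implemented: the signature used here (normalized value and character-shape frequencies), the cosine similarity measure, the 0.6 threshold, and the names shape, fingerprint, similarity, and suggest_label are all assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt


def shape(text: str) -> str:
    """Coarse pattern of a value: letters become 'a', digits become '9',
    and runs of the same symbol are collapsed (e.g. '75001' -> '9')."""
    out = []
    for ch in text:
        sym = "a" if ch.isalpha() else "9" if ch.isdigit() else ch
        if not out or out[-1] != sym:
            out.append(sym)
    return "".join(out)


def fingerprint(values):
    """Hypothetical column signature: relative frequency of each
    lowercased value and of each value shape."""
    features = Counter()
    for value in values:
        text = str(value).strip().lower()
        features["value:" + text] += 1
        features["shape:" + shape(text)] += 1
    total = sum(features.values())
    return {key: count / total for key, count in features.items()}


def similarity(fp_a, fp_b):
    """Cosine similarity between two fingerprints (0.0 to 1.0)."""
    dot = sum(weight * fp_b.get(key, 0.0) for key, weight in fp_a.items())
    norm_a = sqrt(sum(w * w for w in fp_a.values()))
    norm_b = sqrt(sum(w * w for w in fp_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest_label(column, documented, threshold=0.6):
    """Return (label, score) for the most similar documented column,
    or (None, score) if nothing reaches the assumed threshold."""
    fp = fingerprint(column)
    best_label, best_score = None, 0.0
    for label, reference_values in documented.items():
        score = similarity(fp, fingerprint(reference_values))
        if score > best_score:
            best_label, best_score = label, score
    if best_score >= threshold:
        return best_label, best_score
    return None, best_score


# Columns a data steward has already documented (illustrative data).
documented = {
    "Countries": ["France", "United States", "Australia"],
    "Postal codes": ["75001", "10001", "2000"],
}

label, score = suggest_label(["France", "Australia", "Germany"], documented)
print(label, round(score, 2))
```

In this sketch the undocumented column shares two values and its dominant shape with the “Countries” column, so “Countries” is proposed. The result is only a suggestion: a signature this simple can confuse any two columns of plain words, which is why the data steward still confirms or rejects it.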

This technology also gives Data Protection Officers (DPOs) a means to, among other things, flag personal or sensitive information that the organization holds in its databases.

About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.

Why Actian and DataConnect are Better Together – Part 2

Actian Corporation

December 2, 2019

Checklist: Critical Capabilities to Consider When Selecting a Data Integration Vendor That Enables Real-Time Analytics Use Cases

Migrating to a cloud data warehouse makes strategic sense in the modern context of cloud services and digital transformation. Operationally, connecting disparate data sources into your cloud data warehouse, managing ongoing change in your IT environment, and delivering on the promises of real-time operational data is more complicated.

This is the second article in a three-part series on how the Actian data warehouse and the Actian DataConnect integration platform work together to help companies realize the real-time analytics vision faster, by giving customers and users quick and easy access to the data, analytics, and insights they need to drive impactful business results. If you missed the first article in this series, you can view it here.

This article focuses on the critical capabilities that a cloud integration vendor must have to enable customers to ingest, transform, and deliver data from hundreds of applications and data sources across the enterprise into the Actian data warehouse quickly, securely, and at scale. As you consider this problem, it is important to realize that data migration into a data warehouse (cloud or otherwise) isn’t a “one and done” activity.

The initial data loading and migration are only the beginning. Most cloud data warehouse solutions include extract, transform, and load (ETL) capabilities for loading data into the system. Many of these ETL capabilities can also be used for subsequent data loads and refresh activities. Where these basic capabilities fall short is in managing the disparate set of technologies and operating environments present in most modern IT ecosystems.

So, going into this discussion, it is essential to think about the cost, time, and effort required both to stand up your data warehouse and to update and manage the web of connections that supports ongoing operations. Here is a checklist of things to consider in a cloud integration vendor for loading disparate data into your cloud data warehouse.

  • Is the data integration platform designed to work with the cloud data warehouse you are implementing?
  • Is the vendor a specialist in data management and integration or a generic software company?
  • Is the vendor’s offering stable, secure, and mature, or is it new to the market?
  • Does the vendor have a robust product support process to address bugs and issues beyond implementation?
  • Can the vendor support data sources on different operating environments (on-premises, cloud, SaaS, IoT, and mobile)?
  • Does the vendor’s offering support one-time, scheduled, and real-time data transfers?
  • Does the integration platform offer capabilities for securely managing credentials and data connections for all your source systems in a centralized place?
  • Do you have audit capabilities to show which connections are being used and who is accessing credentials?
  • Does the integration platform have security controls to enable you to block and isolate connections when incidents are encountered until you can mitigate any risks?
  • How easy is it to add, remove, and migrate connections as source systems change?
  • How does the cost of implementing an integration platform compare with managing point-to-point connections?

The choices you make in selecting an integration platform to help you connect your source systems into your cloud data warehouse will have a significant impact both on how long implementation takes and how much work it will be to operate and maintain the system once it goes live.

One of the most significant benefits of working with Actian is that you have access to both a robust cloud data warehouse in the Actian Data Platform and a world-class integration platform in Actian DataConnect. Together, they offer you a platform designed to support your integration needs of today as well as tomorrow. To learn more, visit DataConnect.
