
What is Data Fingerprinting and Similarity Detection?

Actian Corporation

December 3, 2019


With the emergence of Big Data, enterprises have found themselves holding colossal amounts of data. To understand and analyze that data, and to meet various regulatory requirements, organizations need to document their data assets. However, documenting and giving context to thousands of datasets by hand is very difficult, if not impossible.

This is where Data Fingerprinting comes in.

What is Data Fingerprinting?

In the data domain, a fingerprint is a “signature” of a data column. The goal is to give context to these columns.

With this technology, similar datasets in your databases can be detected automatically and documented more easily, making data stewards’ tasks less tedious and more efficient. For example, under the data steward’s supervision, data fingerprinting can recognize that a column containing the values “France”, “United States”, and “Australia” represents “Countries”.
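To make the “Countries” example concrete, here is a minimal Python sketch of how a column's values might be matched against known vocabularies to suggest a label. It is an illustration only, not a description of any particular product: the `REFERENCE_SETS` dictionary, the `suggest_label` function, and the 0.8 threshold are all hypothetical names and values chosen for this example.

```python
from typing import Iterable, Optional

# Hypothetical reference vocabularies (assumption for this sketch);
# a real catalog would curate or learn these.
REFERENCE_SETS = {
    "Countries": {"france", "united states", "australia", "germany", "japan"},
    "Currencies": {"eur", "usd", "aud", "jpy"},
}


def suggest_label(values: Iterable[str], threshold: float = 0.8) -> Optional[str]:
    """Suggest a label when enough distinct values match one reference set."""
    # Normalize values so "France" and " france " count as the same entry.
    distinct = {v.strip().lower() for v in values if v and v.strip()}
    if not distinct:
        return None

    best_label, best_share = None, 0.0
    for label, vocabulary in REFERENCE_SETS.items():
        # Share of the column's distinct values found in this vocabulary.
        share = len(distinct & vocabulary) / len(distinct)
        if share > best_share:
            best_label, best_share = label, share

    # Only return a suggestion; the data steward still confirms it.
    return best_label if best_share >= threshold else None


print(suggest_label(["France", "United States", "Australia"]))  # Countries
```

The function only returns a suggestion, which matches the supervised workflow described above: the data steward decides whether to accept the label.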


The objective of the Actian Data Intelligence Platform, our metadata management platform, is to give meaning and context to your cataloged datasets as automatically as possible. Using our Machine Learning technologies, the platform identifies the columns of each dataset schema, analyzes them, and gives each one its own “signature”. If two of these fingerprints are similar, our Data Catalog suggests that the Data Steward apply the documentation of one column to the other.
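The idea of comparing column signatures can be sketched in a few lines of Python. This sketch swaps in a deliberately simple technique, a set of normalized distinct values compared with Jaccard similarity, and should not be read as the platform's actual Machine Learning method, which the article does not detail. The function names, column names, and the 0.5 threshold are assumptions made for illustration.

```python
from itertools import combinations


def fingerprint(values):
    """Reduce a column to a set of normalized distinct values (a simplistic signature)."""
    return frozenset(
        str(v).strip().lower()
        for v in values
        if v is not None and str(v).strip()
    )


def jaccard(a, b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_columns(columns, threshold=0.5):
    """Return (name_a, name_b, score) for column pairs whose fingerprints are close."""
    prints = {name: fingerprint(vals) for name, vals in columns.items()}
    pairs = []
    for a, b in combinations(sorted(prints), 2):
        score = jaccard(prints[a], prints[b])
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs


# Hypothetical columns from two tables (illustrative data only).
columns = {
    "customers.country": ["France", "United States", "Australia"],
    "orders.ship_country": ["france", "Australia", "Germany", "United States"],
    "orders.amount": [12.5, 40.0, 7.25],
}

print(similar_columns(columns))
# [('customers.country', 'orders.ship_country', 0.75)]
```

Here the two country columns share three of four distinct values, so they are flagged as candidates for shared documentation, while the numeric column is not. A production system would likely rely on richer signatures (value patterns, distributions, data types) rather than exact value overlap.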

This technology also gives Data Protection Officers (DPOs) a means to, among other things, identify and flag personal or sensitive information that the organization holds in its databases.


About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.