Why a Privacy by Design Approach Works for Data Catalogs
Actian Corporation
April 11, 2022
 
Since the beginning of the 21st century, we’ve been experiencing a true digital revolution. The world is steadily being digitized, and human activity is increasingly structured around data and network services. Manufacturing, leisure, administration, services, healthcare, and many other industries are now organized around complex, interconnected information systems. As a result, more and more data is continuously collected by the devices and technologies present in our daily lives (the web, smartphones, IoT) and transferred from system to system. For any company that provides products or services, protecting its customers’ data has become central. The best way to do so is through Privacy by Design.
In this article, we explain what Privacy by Design is, how we applied this approach in the design of our data catalog, as well as how a data catalog can help companies implement Privacy by Design.
Data Protection: A Key Issue for Enterprises
Among all the data mentioned above, some allows the direct or indirect identification of natural persons. This is known as personal data, as defined by the CNIL, the French data protection authority. Personal data is of paramount importance in the modern world because of its intrinsic value.
Every day, huge volumes of personal data pass between individuals, companies, and governments. There is a real risk of misuse, as shown by the Cambridge Analytica scandal, first reported in 2015 and widely exposed in 2018. Cybercriminals can also make substantial gains from personal data through account hacking, reselling data to other cybercriminals, identity theft, or attacks on companies via phishing or CEO fraud (also known as “president scams”). For example, a real estate developer was recently robbed of several tens of millions of euros in France.
The need to protect data has never been so important.
States quickly became aware of the need to protect individuals from abuses related to the exploitation of their data. In Europe, for example, the GDPR (General Data Protection Regulation) was adopted in 2016, has applied since May 2018, and is already well established in the daily activities of companies. In the rest of the world, regulations are constantly evolving and are a concern for nearly every country. California passed a consumer data privacy law, the California Consumer Privacy Act (CCPA), which is often described as a U.S. counterpart to the GDPR. Even China has recently legislated on this topic.
Privacy by Design: Defining a Key Concept for Data Protection
Although many regulations now rely heavily on the notion of Privacy by Design, the concept predates them: it was conceptualized by Ann Cavoukian in the late 1990s, when she was the Information and Privacy Commissioner of the Province of Ontario in Canada. The essence of the idea is to address personal data protection right from the design of a computer system.
In this sense, Privacy by Design lists seven fundamental principles:
Proactivity: Any company must put in place the necessary provisions for data protection upstream, and must not rely on a reactive policy;
Personal data protection as a default setting: Any system must take as a default setting the highest possible level of protection for the sensitive data of its users;
Privacy embedded into design: Privacy should be a systematically studied and considered aspect of the design and implementation of new functionality;
Full functionality: No compromise should be made with security protocols or with the user experience;
End-to-end security: The system must ensure the security of data throughout its lifecycle, from collection to destruction (including if the data is outsourced);
Visibility and transparency: The system and the company must document and communicate personal data protection procedures and actions taken in a clear, consistent and transparent manner;
Respect for user privacy: Every design and implementation decision must be made with the user’s interest at the center.
The Application of Privacy by Design
We’ve built our product on the foundations of Privacy by Design.
The Treatment of Users’ Personal Data
First of all, we have anchored data protection at the heart of our architecture. Each customer’s data is segregated in its own tenant, and each tenant is encrypted with its own key. User authentication is managed through a specialized third-party system. We encourage identity federation among our customers, which allows them to maintain control over the data needed for user identification and authentication.
We have also applied Privacy by Design to the design of our application. For example, we collect only the bare minimum of information, and all system outputs (logs, application errors, APIs) are anonymized.
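As an illustration of anonymized system outputs, here is a minimal Python sketch of a logging filter that masks email addresses before a message is emitted. It is not the platform’s actual implementation: the `RedactEmails` class, the `redact` helper, the logger name, and the choice to mask only email addresses are assumptions made for the example.

```python
import logging
import re

# Assumption for this sketch: email addresses are the only identifiers masked.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(text: str) -> str:
    """Mask email addresses; reusable for error messages or API payloads."""
    return EMAIL_RE.sub("<redacted>", text)


class RedactEmails(logging.Filter):
    """Logging filter that anonymizes each record before it is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the message with its arguments, then mask what was rendered.
        record.msg = redact(record.getMessage())
        record.args = ()
        return True  # keep the record, now anonymized


logger = logging.getLogger("catalog")  # hypothetical logger name
logger.addFilter(RedactEmails())
logger.warning("login failed for %s", "jane.doe@example.com")
```

A real system would likely cover more identifier types (names, phone numbers, tokens) and apply the same masking to every output channel, not only to logs.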
Processing Customer Business Data
Because our main mission is to document data, our solution essentially contains metadata. By design, the Actian Data Intelligence Platform does not extract any data from our customers’ systems. Metadata intrinsically carries less risk than the data it describes.
Nevertheless, the platform offers several features that provide information on the data present in the client systems (statistics, sampling, etc.). Because of our architecture, the calculations are always done on the client’s infrastructure, as close as possible to the data and its security. In compliance with principle #2 of Privacy by Design, we have set the protection of personal data as a default setting: all these features are disabled by default and can only be activated by the customer.
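The “disabled by default” idea can be sketched in a few lines of Python. The `ProfilingSettings` class and `compute_sample` function below are hypothetical names invented for this example, not part of the platform; the point is only that every data-touching feature starts off and refuses to run until the customer turns it on.

```python
from dataclasses import dataclass


@dataclass
class ProfilingSettings:
    """Hypothetical per-customer settings: every feature is off by default."""
    statistics_enabled: bool = False
    sampling_enabled: bool = False


def compute_sample(settings: ProfilingSettings, rows: list[dict], limit: int = 5) -> list[dict]:
    """Return a sample only if the customer explicitly enabled sampling."""
    if not settings.sampling_enabled:
        raise PermissionError("sampling is disabled until the customer activates it")
    return rows[:limit]


rows = [{"id": i} for i in range(10)]
opted_in = ProfilingSettings(sampling_enabled=True)
print(compute_sample(opted_in, rows, limit=2))
```

Defaulting to `False` means a forgotten or missing setting fails safe: no statistics or samples are computed unless someone made an explicit choice.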
How Our Data Catalog Helps Companies Implement Privacy by Design
Our data catalog can help your company implement Privacy by Design, especially on the control and verification aspects. Taking the 7 principles described earlier, the data catalog can effectively participate in two of them: the visibility and transparency principle, and the end-to-end security principle. The data catalog also enables the automation of the identification of sensitive data.
Visibility and Transparency via the Data Catalog
The objective of a data catalog is to centralize a company’s data assets, document them, and share them with as many people as possible. For example, this centralization lets any employee see which data the CRM collects and how the marketing and customer success teams use it in acquisition and churn tracking reports.
Once this inventory has been established, the catalog can be used to document certain additional information that is necessary for the company’s proper functioning. This is notably the case for the sensitive or non-sensitive nature of the documented information, the rules of governance, the processing, or the access procedures that must be applied.
In the context of a Privacy by Design approach, the data catalog can be used to add a business term corresponding to sensitive data (a social security number, a telephone number, etc.). This business term can then be easily associated with the tables or physical fields that contain the data, thus allowing its easy identification. This initiative contributes to the principle of visibility and transparency of Privacy by Design.
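To make the link between a business term and physical fields concrete, here is a minimal Python sketch. The `Glossary`, `BusinessTerm`, and `FieldRef` classes, as well as the table and column names, are hypothetical and only illustrate the idea; they do not describe the catalog’s actual data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FieldRef:
    """A physical field, identified by table and column (hypothetical model)."""
    table: str
    column: str


@dataclass
class BusinessTerm:
    """A glossary entry that can be flagged as sensitive."""
    name: str
    sensitive: bool = False
    fields: set = field(default_factory=set)


class Glossary:
    def __init__(self) -> None:
        self._terms: dict[str, BusinessTerm] = {}

    def add_term(self, name: str, sensitive: bool = False) -> BusinessTerm:
        term = BusinessTerm(name, sensitive)
        self._terms[name] = term
        return term

    def link(self, term_name: str, table: str, column: str) -> None:
        """Associate a physical field with an existing business term."""
        self._terms[term_name].fields.add(FieldRef(table, column))

    def sensitive_fields(self) -> list[FieldRef]:
        """List every physical field linked to a sensitive term."""
        found = {f for t in self._terms.values() if t.sensitive for f in t.fields}
        return sorted(found, key=lambda f: (f.table, f.column))


glossary = Glossary()
glossary.add_term("Social security number", sensitive=True)
glossary.add_term("Product category")
glossary.link("Social security number", "crm.customers", "ssn")
glossary.link("Social security number", "hr.employees", "social_security_no")
glossary.link("Product category", "shop.products", "category")
print(glossary.sensitive_fields())
```

Flagging sensitivity once, on the term, and deriving the list of affected fields from the links keeps the inventory of sensitive data consistent as new tables are documented.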
End-to-End Security via the Data Catalog
The data catalog also provides data lineage capabilities. Automatic data lineage makes it possible to verify that the processes applied to data identified as sensitive comply with the rules defined by the company’s data governance. The data catalog also makes it simple to record the governance rules that must be applied to sensitive data.
Moreover, the lineage allows us to follow the whole life cycle of the data, from its creation to its final use, including its transformations. This makes it easy to check that all the stages of this life cycle comply with the rules and correct any errors.
The data catalog, via the data lineage, thus contributes to the principle of the end-to-end security of Privacy by Design.
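A lineage-based check of this kind can be sketched as a graph traversal. In the Python example below, the lineage is a plain dictionary, and the node names, process names, and `approved` set are all invented for illustration; this is one possible way to express the idea, not the catalog’s implementation.

```python
from collections import deque


def downstream_violations(lineage, start, approved):
    """Walk the lineage from a sensitive field and report unapproved processes.

    lineage maps a node to a list of (process, target) pairs.
    """
    violations = []
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for process, target in lineage.get(node, []):
            if process not in approved:
                violations.append((node, process, target))
            if target not in seen:  # avoid revisiting nodes or looping on cycles
                seen.add(target)
                queue.append(target)
    return violations


# Hypothetical lineage of a social security number field.
lineage = {
    "crm.customers.ssn": [
        ("mask_ssn", "dwh.customers.ssn_masked"),
        ("raw_export", "s3.dump.ssn"),
    ],
    "dwh.customers.ssn_masked": [("aggregate", "reports.churn")],
}
approved = {"mask_ssn", "aggregate"}  # processes allowed by governance rules

violations = downstream_violations(lineage, "crm.customers.ssn", approved)
print(violations)
```

Here the traversal flags the `raw_export` step, because it moves the sensitive field through a process that the governance rules do not list as approved.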
That said, we remain convinced that a data catalog is not a compliance solution, but rather a tool for building team awareness of sensitive data and the particularities of its use.
Identifying Sensitive Data via the Data Catalog
In a rapidly changing data environment, the data catalog must reflect reality as closely as possible in order to maintain the trust of its users. Without this trust, adoption of the entire data catalog project is put into question.
We are firmly convinced that the data catalog must be automated as much as possible to be scalable and efficient. This starts with the inventory of available data. Our inventory is automated and propagates every modification made in the source system of the data directly into the catalog. Thus, at any time, the customer has an exhaustive list of the data present in its systems.
The automation does not stop at the inventory. To help our customers identify which of the inventoried data deserves special treatment because of its sensitive status, we now offer a system that suggests tagging newly inventoried data with a sensitive data profile. This makes it easier to bring this data to the forefront and to spread the information faster throughout the company.
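One simple way to picture such suggestions is a rule-based matcher on column names. The Python sketch below is an assumption made for illustration: the `SENSITIVE_PROFILES` patterns and the `suggest_tags` function are hypothetical, and the actual system may rely on entirely different signals.

```python
import re

# Hypothetical profiles: each maps a label to a pattern on column names.
SENSITIVE_PROFILES = {
    "social security number": re.compile(r"(^|_)(ssn|social_security)(_|$)"),
    "telephone number": re.compile(r"(^|_)(phone|telephone|mobile)(_|$)"),
    "email address": re.compile(r"(^|_)e?mail(_|$)"),
}


def suggest_tags(column_names):
    """Suggest sensitive-data profiles for newly inventoried columns.

    Returns a dict of column name -> list of suggested profiles.
    Suggestions are meant to be reviewed by a person, not applied blindly.
    """
    suggestions = {}
    for name in column_names:
        normalized = name.lower()
        for profile, pattern in SENSITIVE_PROFILES.items():
            if pattern.search(normalized):
                suggestions.setdefault(name, []).append(profile)
    return suggestions


new_columns = ["customer_ssn", "Phone_Number", "order_total", "email"]
print(suggest_tags(new_columns))
```

Name-based rules are cheap but imprecise; they miss fields with opaque names and can produce false positives, which is why the output is framed as a suggestion for a data steward to confirm.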
Conclusion
In the past few years, personal data has become a real concern for most consumers. More and more countries are setting up regulations to guarantee their citizens maximum protection. One of the major principles governing all these regulations is Privacy by Design.
From the start, we have placed personal data protection at the heart of our product: in our technical development, in the processing of our users’ data, and in our thinking about the data that our clients process via our catalog.
We believe that a data catalog can be a significant asset in the implementation and monitoring of Privacy by Design policies. We also rely heavily on automation and AI to bring many more improvements in the upcoming months: automatic construction of technical data lineage, improved detection of sensitive data in the catalog objects to better document them, quality control of processes applied to sensitive data, etc. The possibilities are numerous.
To learn more about the advantages of the catalog in the management of your sensitive and personal data, don’t hesitate to schedule a meeting with one of our experts.