eBook

What is a Smart Data Catalog?

“The idea of a Smart Data Catalog has been around for a few years in the metadata management literature, although it has no official definition. The general consensus is that a modern data catalog must have machine learning and AI to unlock its full potential.

In this piece, we will attempt to define how Zeenea approaches the Smart Data Catalog, which, for us, cannot be limited to machine learning capabilities.”

– Guillaume Bodet – CEO, Zeenea

We have identified 5 areas in which a data catalog can be “smart” – most of which do not involve machine learning:

  • Metamodeling
  • The data inventory
  • Metadata management
  • The search engine
  • User experience


Overview

Regardless of its size, an information system contains several dozen systems and applications that store data in a wide variety of sources (relational and non-relational databases, distributed file systems, APIs, cloud solutions, etc.), each following its own protocols, formats, and rules. Each system manages hundreds or thousands of datasets (usually tables or files), each made up of dozens of fields (or columns). Every dataset and every field feeds into a metamodel (in other words, a set of structured metadata), which is what makes data exploration possible.

Ultimately, a data catalog has to harness enormous amounts of very diverse information, and that volume will grow exponentially, just as the volume of usable data does. This raises 2 major problems:

  • How to populate and maintain this volume of information without tripling (or more) the cost of metadata management.
  • How to find the most relevant datasets for any specific use case.

For us, a Smart Data Catalog should go well beyond the integration of AI algorithms: it should include a range of smart technological and conceptual features that answer the 2 questions above.