What is a Smart Data Catalog?
The idea of a Smart Data Catalog has been around for a few years in the metadata management literature, although it has no official definition. The general consensus is that a modern data catalog must have machine learning and AI to unlock its full potential.
- Metamodeling
- Data inventory
- Metadata management
- Search engine
- User experience
Overview
Whatever its size, an information system typically comprises several dozen systems and applications that store data in a wide variety of sources (relational and non-relational databases, distributed file systems, APIs, cloud solutions, etc.), each following its own protocols, formats, and rules. Each system manages hundreds or thousands of datasets, usually tables or files, which are themselves made up of dozens of fields (or columns). Every dataset and every field feeds into a metamodel (in other words, a structured set of metadata), which is what makes data exploration possible.
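To make the scale concrete, here is a minimal sketch of that hierarchy (systems contain datasets, datasets contain fields) as Python dataclasses, with a helper that counts how many objects a catalog would have to document at each level. The class names `System`, `Dataset`, and `Field`, the helper functions, and the uniform synthetic landscape are all hypothetical illustrations, not any product's actual metamodel; a real metamodel carries many more object types and attributes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Field:
    """Hypothetical: a column of a table or an attribute of a file."""
    name: str
    data_type: str


@dataclass
class Dataset:
    """Hypothetical: usually a table or a file, made up of fields."""
    name: str
    fields: List[Field] = field(default_factory=list)


@dataclass
class System:
    """Hypothetical: a source such as a database, file system, API, or cloud service."""
    name: str
    technology: str
    datasets: List[Dataset] = field(default_factory=list)


def count_catalog_objects(systems: List[System]) -> Dict[str, int]:
    """Count the systems, datasets, and fields the catalog must document."""
    n_datasets = sum(len(s.datasets) for s in systems)
    n_fields = sum(len(d.fields) for s in systems for d in s.datasets)
    return {"systems": len(systems), "datasets": n_datasets, "fields": n_fields}


def synthetic_landscape(n_systems: int, datasets_per_system: int,
                        fields_per_dataset: int) -> List[System]:
    """Build a uniform, made-up landscape for illustration only."""
    return [
        System(
            name=f"system_{i}",
            technology="relational database",
            datasets=[
                Dataset(
                    name=f"table_{j}",
                    fields=[Field(name=f"col_{k}", data_type="string")
                            for k in range(fields_per_dataset)],
                )
                for j in range(datasets_per_system)
            ],
        )
        for i in range(n_systems)
    ]


if __name__ == "__main__":
    counts = count_catalog_objects(synthetic_landscape(3, 4, 5))
    print(counts)  # {'systems': 3, 'datasets': 12, 'fields': 60}
```

The counts multiply across levels: with an assumed 50 systems, 1,000 datasets per system, and 30 fields per dataset, the catalog would already hold 1.5 million field entries, each of which may carry several metadata attributes.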
Ultimately, a data catalog has to harness enormous amounts of very diverse information, and that volume will grow exponentially along with the volume of usable data. This raises two major problems:
- How to populate and maintain this information without tripling (or more) the cost of metadata management.
- How to find the most relevant datasets for any specific use case.
In our view, a smart data catalog should go well beyond integrating AI algorithms: it should include a range of smart technological and conceptual features that answer the two questions above.