Traps to Avoid for a Data Catalog Project – Technical Integration

Summary

A data catalog must be technically integrated into the enterprise ecosystem in order to consolidate metadata and avoid duplicated manual effort.
Not all metadata can be captured automatically, so contributors are still needed to bring in domain knowledge and fill in information that is missing from the systems.
A data catalog is powerful, but it is not a "silver bullet" and cannot automatically retrieve every type of metadata from every source.
It should be connected to several complementary metadata sources rather than relying on a single system.
These integrations should be introduced gradually, following an iterative strategy focused on creating value.

Metadata management is an important component in a data management project and it requires more than just the data catalog solution, however connected it may be.

A data catalog tool will, of course, reduce the workload but won’t in and of itself guarantee the success of the project.

In this series of articles, discover the pitfalls and preconceived ideas that should be avoided when rolling out an enterprise-wide data catalog project. The traps described in this series are organized around four central themes that are crucial to the success of the initiative:

Data culture within the organization.
Internal project sponsorship.
Project leadership.
Technical integration of the data catalog.

Integrating the data catalog into the enterprise ecosystem will provide opportunities to create value. It is essential to consider these aspects and understand the potential rewards.

Not All Metadata Should Be Entered Manually

More and more systems produce, aggregate, and allow metadata to be captured for their own local purposes. This information must be retrieved and consolidated in the catalog without being entered a second time, for obvious reasons: cost savings, data reliability, and availability.

The data catalog, therefore, presents an opportunity to consolidate this information with the knowledge of the contributors in their respective fields. However, this consolidation has to be designed as a technical integration rather than a manual effort. Entering the same information twice is obviously inefficient, and having people carry out imports and exports between systems by hand is no better.

The strength of a data catalog remains its capacity to ingest metadata via technical integration chains and thus ensure a robust synchronization between systems.
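As an illustration of what such an integration chain does on each run, here is a minimal sketch of an idempotent synchronization step. It assumes a hypothetical in-memory catalog keyed by source and asset name; `CatalogEntry`, `sync_source`, and the example source `crm_db` are illustrative names, not the API of any particular catalog product.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One asset in a hypothetical in-memory catalog."""
    name: str
    technical: dict = field(default_factory=dict)  # metadata harvested from the source
    active: bool = True                            # set to False once the source drops the asset


def sync_source(catalog: dict, source: str, harvested: dict) -> dict:
    """Synchronize one source's harvested metadata into the catalog.

    `catalog` maps (source, asset_name) -> CatalogEntry.
    `harvested` maps asset_name -> technical metadata dict.
    Returns counts of created, updated, and deactivated entries.
    """
    stats = {"created": 0, "updated": 0, "deactivated": 0}

    # Create entries for new assets and refresh entries that changed or reappeared.
    for name, technical in harvested.items():
        key = (source, name)
        entry = catalog.get(key)
        if entry is None:
            catalog[key] = CatalogEntry(name=name, technical=dict(technical))
            stats["created"] += 1
        elif entry.technical != technical or not entry.active:
            entry.technical = dict(technical)
            entry.active = True
            stats["updated"] += 1

    # Assets that disappeared from the source are flagged rather than deleted,
    # so that anything contributors attached to them is not lost.
    for (src, name), entry in catalog.items():
        if src == source and name not in harvested and entry.active:
            entry.active = False
            stats["deactivated"] += 1

    return stats


if __name__ == "__main__":
    catalog = {}
    sync_source(catalog, "crm_db", {"customers": {"columns": ["id", "email"]}})
    print(sync_source(catalog, "crm_db", {"customers": {"columns": ["id", "email", "country"]}}))
```

Running the same harvest twice changes nothing the second time, which is what makes a scheduled, unattended synchronization safe compared with manual imports and exports.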

The Data Catalog isn’t an “Automagical” Tool

On the flip side, it would be misleading to think that a data catalog can extract every type of metadata, regardless of its source or format.

The catalog should of course facilitate metadata retrieval, but some metadata won’t be retrievable automatically. There will therefore always be a cost linked to the intervention of the contributors.

The first reason for this resides in the origin of some metadata: some information may simply not be present in the systems because it originates solely from the knowledge of experts. The data catalog is therefore, in this case, a potential candidate for becoming the master system and eligible to receive this information.

Conversely, some information may be present in a system but, for a variety of reasons, cannot be retrieved automatically. For example, there may be no interface that provides stable access to the information. The risk of disruptions affecting this information is therefore high, which can degrade the quality of the catalog's content and ultimately discourage users from relying on it.
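The split described in this section, between metadata a system can supply and metadata that only contributors hold, can be sketched as a simple field-ownership rule. This is an illustrative assumption about how one might model it: the field names and the helpers `apply_harvest` and `missing_contributions` are hypothetical.

```python
# Hypothetical split of field ownership: which side is the master for each field.
SYSTEM_FIELDS = {"columns", "row_count", "last_refreshed"}   # mastered by the source system
CATALOG_FIELDS = {"description", "owner", "business_terms"}  # mastered by the catalog


def apply_harvest(entry: dict, harvested: dict) -> dict:
    """Return a new entry in which harvested values replace only system-owned fields."""
    merged = dict(entry)
    for key, value in harvested.items():
        if key in SYSTEM_FIELDS:
            merged[key] = value
        # Any other harvested key is ignored so it cannot overwrite expert input.
    return merged


def missing_contributions(entry: dict) -> list:
    """List the catalog-owned fields that still need input from a contributor."""
    return sorted(f for f in CATALOG_FIELDS if not entry.get(f))


if __name__ == "__main__":
    entry = {"description": "Active customers", "columns": ["id"]}
    harvested = {"columns": ["id", "email"], "description": "tbl_cust_v2"}
    merged = apply_harvest(entry, harvested)
    print(merged)
    print(missing_contributions(merged))
```

The second helper makes the remaining cost visible: whatever it returns is work that automation will not do and that contributors still have to take on.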

The Data Catalog Must Not Be Connected to a Single Metadata Source

Metadata stems from many varied layers. As a result, there are multiple and complementary sources involved for a global understanding. It is precisely the reconciliation of this information in a central solution, a data catalog, that will provide the necessary elements to the users.

Opting for a connected data catalog is a real asset, because asset discovery and the associated metadata retrieval are made considerably easier as a result of automation.

This connectivity can also extend to other complementary systems. These systems may sit upstream or downstream of the first one, which makes it possible to trace lineage where needed and thus to document the flows and transformations between systems.
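To make the upstream and downstream idea concrete, here is a minimal sketch of lineage tracing once several systems are connected. It assumes the catalog stores lineage as simple (upstream, downstream) pairs; the asset names and the functions `build_lineage` and `trace_upstream` are hypothetical.

```python
from collections import defaultdict, deque


def build_lineage(edges):
    """Index (upstream, downstream) pairs by their downstream asset."""
    parents = defaultdict(set)
    for upstream, downstream in edges:
        parents[downstream].add(upstream)
    return parents


def trace_upstream(parents, asset):
    """Return every asset that feeds `asset`, directly or indirectly (breadth-first)."""
    seen = set()
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, ()):
            if parent not in seen:  # visiting each asset once also guarantees termination
                seen.add(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    # Hypothetical flows across three connected systems: sources, warehouse, BI tool.
    edges = [
        ("crm.customers", "dwh.dim_customer"),
        ("erp.orders", "dwh.fact_sales"),
        ("dwh.dim_customer", "bi.sales_dashboard"),
        ("dwh.fact_sales", "bi.sales_dashboard"),
    ]
    parents = build_lineage(edges)
    print(sorted(trace_upstream(parents, "bi.sales_dashboard")))
```

With only the warehouse connected, the dashboard and the operational sources would be invisible; each additional source extends how far the trace can reach.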

The systems can also be independent of one another; simply adding them to the catalog then makes it possible to build an exhaustive map of the company's data assets.

Finally, given the variety of object types that can be documented in the catalog, the different connected sources can also each help enrich a particular universe within the data catalog: semantic layers for some, physical layers for others, and so on.

Always with an iterative approach in mind, the multiple sources that will feed the data catalog will be integrated progressively, in accordance with a strategy that seeks the production of value, under the global supervision of the Data Office.
