Summary

  • Big Data increases the complexity of identifying, using, and managing data.
  • Organizations struggle to understand data sources, transformations, and impacts.
  • The key questions concern relevance, sensitivity, origin, and downstream impacts.
  • Data lineage improves data quality, regulatory compliance (for example, GDPR), and transparency.
  • It lets users better understand data, trust it, and use it autonomously.

The arrival of Big Data did not simplify how enterprises work with data. Data volumes, data variety, and the number of data storage systems are all exploding. With the Big Data revolution, it has become even harder to answer basic questions related to data mapping:

  • What are the most pertinent datasets and tables for my use cases and my organization?
  • Do I have sensitive data? How is it used?
  • Where does my data come from? How has it been transformed?
  • What will the impact on my datasets be if they are transformed?

These are the questions that information systems managers, Data Lab managers, Data Analysts, and Data Scientists ask themselves in order to deliver efficient and relevant data analysis.

Among other things, answering these questions allows enterprises to:

  • Improve data quality: providing as much information as possible about the data lets users judge whether it is suitable for their use.
  • Comply with European regulations (GDPR): identify personal data and the processing carried out on it.
  • Make employees more efficient and autonomous in understanding data through graphical, user-friendly data mapping.

To achieve this, companies must build what is called data lineage.
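To make the idea concrete, here is a minimal sketch of data lineage as a directed graph, written in Python. It is an illustration only, not a description of any particular product: the `LineageGraph` class, its method names, and the dataset names (`crm.customers`, `web.orders`, and so on) are all hypothetical. Each edge records that one dataset feeds another through a given transformation, which is enough to answer three of the questions above: where the data comes from, what is impacted downstream, and whether sensitive data is involved.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Hypothetical, in-memory lineage graph: datasets are nodes, transformations are edges."""

    def __init__(self):
        self.downstream = defaultdict(set)  # source dataset -> datasets built from it
        self.upstream = defaultdict(set)    # target dataset -> datasets it is built from
        self.transformations = {}           # (source, target) -> description of the step
        self.sensitive = set()              # datasets flagged as containing personal data

    def add_edge(self, source, target, transformation):
        """Record that `target` is produced from `source` by `transformation`."""
        self.downstream[source].add(target)
        self.upstream[target].add(source)
        self.transformations[(source, target)] = transformation

    def mark_sensitive(self, dataset):
        """Flag a dataset as sensitive (for example, personal data under GDPR)."""
        self.sensitive.add(dataset)

    def _walk(self, start, neighbors):
        """Breadth-first traversal returning every dataset reachable from `start`."""
        seen = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def origins(self, dataset):
        """Where does this dataset come from? All direct and indirect sources."""
        return self._walk(dataset, self.upstream)

    def impacted(self, dataset):
        """What is affected if this dataset changes? All direct and indirect consumers."""
        return self._walk(dataset, self.downstream)

    def sensitive_inputs(self, dataset):
        """Which sensitive datasets feed into this one?"""
        return self.origins(dataset) & self.sensitive


# Example pipeline with made-up dataset names.
graph = LineageGraph()
graph.add_edge("crm.customers", "staging.customers_clean", "deduplicate, normalize emails")
graph.add_edge("web.orders", "staging.orders_clean", "filter cancelled orders")
graph.add_edge("staging.customers_clean", "analytics.customer_revenue", "join on customer_id")
graph.add_edge("staging.orders_clean", "analytics.customer_revenue", "join on customer_id, sum amount")
graph.mark_sensitive("crm.customers")

print(sorted(graph.origins("analytics.customer_revenue")))
print(sorted(graph.impacted("crm.customers")))
print(sorted(graph.sensitive_inputs("analytics.customer_revenue")))
```

A real lineage system would typically extract these edges automatically from queries, jobs, or pipeline metadata rather than declaring them by hand. The graph traversal that answers the origin and impact questions stays the same in spirit.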