A data mesh is an architecture for democratizing data across a business. Unlike a centralized data warehouse, a data mesh federates data and delegates data ownership to the specialist business domains, which publish their data as a service for all business functions to consume. The result is a more agile data architecture that gives individual business units some autonomy to manage their core data assets.
Why use a data mesh architecture?
The primary idea behind mesh architectures is to enable a more flexible and scalable data architecture. Monolithic, centralized enterprise data warehouses can be cumbersome to implement, inflexible and expensive to change. By devolving the curation and administration of domain-specific data sets to the business functions that know them best, the business can better adapt to changing business conditions.
One of the primary reasons the data mesh model scales is that it avoids overburdening centralized data teams. This is accomplished by propagating standard best practices across business domains. Skills shortages are a common cause of big data and data lake projects stagnating into data swamps. Because every domain follows the same practices, skills gained by staff in one business domain transfer readily to other domains, reducing training times and allowing projects to be delivered faster.
Maintaining interoperability between pools of data
A core component of a data mesh is the built-in universal interoperability bus that all the domain-specific data warehouses or data marts plug into. This avoids the problems of traditional siloed data marts that often use duplicate, out-of-sync data, and ad hoc tools. Curated data held by one department is available to related business units. Each departmental data warehouse publishes its data-as-a-product to the interoperability bus.
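The publish-and-consume flow described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a real product API: the names `DataProduct`, `InteroperabilityBus`, `publish` and `consume` are assumptions made for this example, and a production bus would sit on shared storage or streaming infrastructure.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """A curated data set that a domain publishes (hypothetical shape)."""
    domain: str   # owning business domain, e.g. "sales"
    name: str     # product name, unique within the domain
    rows: tuple   # the curated records offered to consumers


@dataclass
class InteroperabilityBus:
    """Toy stand-in for the shared bus: a registry keyed by (domain, name)."""
    _products: dict = field(default_factory=dict)

    def publish(self, product: DataProduct) -> None:
        # Publishing again under the same key replaces the earlier copy,
        # so consumers never see two out-of-sync versions of one product.
        self._products[(product.domain, product.name)] = product

    def consume(self, domain: str, name: str) -> DataProduct:
        # Any business unit may read a product; ownership stays with the domain.
        return self._products[(domain, name)]


bus = InteroperabilityBus()
bus.publish(DataProduct("sales", "orders", ({"order_id": 1, "total": 120.0},)))

# A related business unit reads the curated data instead of keeping its own copy.
orders = bus.consume("sales", "orders")
```

Keying the registry by domain and product name keeps a single authoritative copy per product, which is the property that distinguishes this arrangement from siloed data marts.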
How is a data mesh different from a data fabric?
A data mesh is composed of an interconnected set of domain-specific data product services with ownership responsibilities delegated to the various domains in a business. A data fabric creates a single virtual centralized system without distributed data ownership.
Key elements of a data mesh
The main components of a data mesh are:
- Data sources
- Data infrastructure
- Domain-specific data-as-a-service
- Shared standardized governance, data quality and metadata conventions
Data ownership and responsibilities
Each domain data owner agrees on data quality and availability service levels with its peers. Every domain uses centralized standards for data pipelines. The data mesh provides standardized storage and streaming infrastructure. ETL pipelines can be domain-specific but need to use standard metadata labels, data formats, cataloging, lineage, and data governance conventions to ease interoperability and promote compliance.
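One way a domain pipeline might check itself against shared conventions before publishing is a simple metadata validation step. The sketch below is illustrative only: the label names in `REQUIRED_LABELS` and the `missing_labels` helper are assumptions for this example, not a published standard.

```python
# Hypothetical mesh-wide convention: every data product must carry these labels.
REQUIRED_LABELS = ("owner_domain", "format", "catalog_id", "lineage", "classification")


def missing_labels(metadata: dict) -> list:
    """Return the required labels that are absent or empty, in convention order."""
    return [label for label in REQUIRED_LABELS if not metadata.get(label)]


# Example metadata from an imagined finance domain pipeline.
finance_ledger = {
    "owner_domain": "finance",
    "format": "parquet",
    "catalog_id": "finance.ledger",
    "lineage": ["erp.journal_entries"],
    "classification": "",  # left blank by mistake
}

problems = missing_labels(finance_ledger)
if problems:
    print("Cannot publish, missing metadata:", ", ".join(problems))
```

Running the same check in every domain lets ETL logic stay domain-specific while the metadata remains uniform across the mesh.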
Some of the many benefits of data mesh architectures include the following:
- Faster time to value for data-oriented projects.
- Lines of business can respond quickly to competitive, regulatory and market pressures or opportunities to explore new markets.
- Shared tools, standards, and processes benefit the whole business by reducing duplicated effort and increasing efficiency.
- Avoids central resource bottlenecks by delegating data responsibilities to specialist business domains that best understand their data needs.
- More modular data services are easier to understand and use. As with microservices, refactoring monolithic applications into smaller, more digestible components makes them easier to share and consume.
- Consistent application of data quality and data governance requirements across a business improves cooperation and eases future data integration efforts.
- Data and process transparency in the mesh eliminate departmental pools of unconnected siloed data.
- Businesses get more value from their data because federating it across the organization enables better data-driven decision-making.
The three components of a data product
The three major components of data-as-a-product are:
- Code: includes data pipelines, governance controls, policies and application interfaces.
- Data and metadata: can include tables, views, referential integrity constraints, graphs, and associated metadata.
- Infrastructure: includes scripts to build and instantiate a data product service.
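The three components can be pictured as fields of a single record. The following is a rough sketch under assumed names (`DataProductSpec` and its fields are hypothetical), showing how a domain team might check that none of the three parts is missing before release.

```python
from dataclasses import dataclass, field


@dataclass
class DataProductSpec:
    """Hypothetical manifest grouping the three parts of a data product."""
    code: list = field(default_factory=list)            # pipelines, policies, interfaces
    data: list = field(default_factory=list)            # tables, views, graphs
    metadata: dict = field(default_factory=dict)        # descriptions of the data
    infrastructure: list = field(default_factory=list)  # build and deploy scripts

    def missing_components(self) -> list:
        # Data and metadata count as one component, so both must be present.
        checks = {
            "code": bool(self.code),
            "data and metadata": bool(self.data) and bool(self.metadata),
            "infrastructure": bool(self.infrastructure),
        }
        return [name for name, present in checks.items() if not present]


spec = DataProductSpec(
    code=["pipelines/load_orders.py"],
    data=["orders", "orders_by_region_view"],
    metadata={"orders": "Confirmed customer orders"},
)
gaps = spec.missing_components()
```

Here `gaps` lists only the infrastructure component, because the example spec supplies no build scripts.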
What are the characteristics of a successful data product?
The most significant success factor for a data product is adoption. The characteristics that drive adoption include discoverability, reliability, trustworthiness, security, and data quality.
Because a data mesh is essentially a self-service model, published data needs to be easy to find, well-documented and easy to consume. Consumers can provide feedback to domain owners on the quality and utility of a data product to ensure shortcomings are addressed and to enable continuous refinement.
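Discoverability and consumer feedback can be illustrated with a small catalog sketch. Everything here is an assumption for illustration: `CatalogEntry`, `search` and the 1-to-5 rating scale are hypothetical, and a real mesh would use a dedicated data catalog.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class CatalogEntry:
    """Hypothetical catalog record for one published data product."""
    name: str
    description: str
    ratings: list = field(default_factory=list)  # consumer scores, 1 to 5

    def add_feedback(self, score: int) -> None:
        if not 1 <= score <= 5:
            raise ValueError("score must be between 1 and 5")
        self.ratings.append(score)

    def average_rating(self):
        # None signals that no consumer has rated the product yet.
        return mean(self.ratings) if self.ratings else None


def search(catalog: list, keyword: str) -> list:
    """Case-insensitive keyword match on product name or description."""
    needle = keyword.lower()
    return [
        entry for entry in catalog
        if needle in entry.name.lower() or needle in entry.description.lower()
    ]


catalog = [
    CatalogEntry("sales.orders", "Confirmed customer orders, refreshed daily"),
    CatalogEntry("hr.headcount", "Monthly headcount by department"),
]

hits = search(catalog, "orders")
hits[0].add_feedback(4)
hits[0].add_feedback(2)
```

A domain owner could then review `average_rating()` per product to decide which shortcomings to address first.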
Data mesh management
Data products and pipelines need to be supervised at the domain and infrastructure levels to ensure high availability and to address failures. Monitoring and observability capabilities are therefore designed in to make life easier for developers and infrastructure teams.
Data products should be protected by encrypting data at rest and in motion. Versioning of data services enables the rollback of bad deployments.
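The rollback idea can be sketched with a simple deployment history. This is a toy model under assumed names (`VersionedService`, `deploy`, `rollback`); it covers versioning only and does not attempt encryption.

```python
class VersionedService:
    """Toy deployment history for one data service (hypothetical design)."""

    def __init__(self):
        self._versions = []  # deployed versions, oldest first

    def deploy(self, version: str) -> None:
        self._versions.append(version)

    @property
    def live(self):
        # The most recent deployment is the one consumers see.
        return self._versions[-1] if self._versions else None

    def rollback(self) -> str:
        # Discard the latest deployment and fall back to the one before it.
        if len(self._versions) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self._versions.pop()
        return self._versions[-1]


service = VersionedService()
service.deploy("1.0.0")
service.deploy("1.1.0")  # suppose this release turns out to be bad
restored = service.rollback()
```

After the rollback, `service.live` is the earlier release again, so consumers keep working while the domain team fixes the bad deployment.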
Actian supports data marts
The Actian Data Platform can support multiple data marts and warehouses hosted on-premises or on multiple cloud platforms. Actian has built-in connectors to hundreds of data sources, including NetSuite, Salesforce and ServiceNow. The Actian Data Platform uses a vectorized columnar database that outperforms alternatives by 7.9x and is ideal for staging data before it is published as a data product within a domain.
Try the Actian Data Platform for 30 days using the free trial at: https://www.actian.com/avalanche-try-now-start-free/