Data Intelligence

Building a Marketplace for Data Mesh: Domain Data Catalogs – Part 3

Actian Corporation

June 10, 2024


Over the past decade, data catalogs have emerged as important pillars in the landscape of data-driven initiatives. However, many vendors on the market fall short of expectations with lengthy timelines, complex and costly projects, bureaucratic data governance models, poor user adoption rates, and low-value creation. This discrepancy extends beyond metadata management projects, reflecting a broader failure at the data management level.

Given these shortcomings, a new concept is gaining popularity: the internal marketplace, or what we call the Enterprise Data Marketplace (EDM).

In this series of articles, get an excerpt from our Practical Guide to Data Mesh where we explain the value of internal data marketplaces for data product production and consumption, how an EDM supports data mesh exploitation on a larger scale, and how it goes hand in hand with a data catalog solution:

Facilitating data product consumption through metadata.
Setting up an enterprise-level marketplace.
Feeding the marketplace via domain-specific data catalogs.

Structuring data management around domains and data products is an organizational transformation that does not change the operational reality of most organizations: data is available in large quantities, from numerous sources, evolves rapidly, and its control is complex.

Data catalogs traditionally serve to inventory all available data and manage a set of metadata to ensure control and establish governance practices.

Data mesh does not eliminate this complexity: it singles out certain data, managed as data products, that are intended for sharing and use beyond the domain to which they belong. But each domain is also responsible for managing its internal data, in other words its proprietary data, which will be used to develop robust and high-value data products.

Metadata Management in the Context of an Internal Marketplace fed by Domain-Specific Catalogs

In the data mesh, the need for a Data Catalog does not disappear, quite the contrary: each domain should have a catalog allowing it to efficiently manage its proprietary data, support domain governance, and accelerate the development of robust and high-value data products. Metadata management is thus done at two levels:

At the domain level – in the form of a catalog allowing the documentation and organization of the domain’s data universe. Since the Data Catalog is a proprietary component, it is not necessary for all domains to use the same solution.
At the mesh level – in the form of a marketplace in which the data products shared by all domains are registered; the marketplace is naturally common to all domains.

With a dedicated marketplace component, the general architecture for metadata management is as follows:

data marketplace architecture

In this architecture, each domain has its own catalog. The catalogs may or may not rely on the same solution, but one should be instantiated for each domain so that the domain can organize its data most effectively and avoid the pitfalls of a universal metadata organization.

The marketplace is a dedicated component, offering simplified ergonomics, and in which each domain deploys metadata (or even data) for its data products. This approach requires close integration of the different modules:

Domain catalogs must be integrated with the marketplace to avoid duplicating efforts in producing certain metadata – especially lineage, but also data dictionaries (schema), or even business definitions that will be present in both systems.
Domain catalogs potentially need to be integrated with each other – to share/synchronize certain information, primarily the business glossary but also some repositories.
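
One way to picture this integration is a domain catalog that publishes a subset of its metadata to the shared marketplace instead of re-entering it there. The Python sketch below is illustrative only: the class names (CatalogEntry, Marketplace), their fields, and the publish function are hypothetical and do not correspond to any Actian API.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """A dataset documented in a domain catalog (hypothetical model)."""
    name: str
    domain: str
    schema: dict[str, str]          # column name -> type (the data dictionary)
    lineage: list[str]              # names of upstream datasets
    definition: str                 # business definition
    is_data_product: bool = False   # only data products are shared with the mesh


@dataclass
class Marketplace:
    """Mesh-level registry common to all domains (hypothetical model)."""
    listings: dict[str, dict] = field(default_factory=dict)

    def register(self, listing: dict) -> None:
        key = f"{listing['domain']}/{listing['name']}"
        self.listings[key] = listing


def publish(entries: list[CatalogEntry], marketplace: Marketplace) -> int:
    """Copy metadata for data products only; proprietary data stays in the domain."""
    published = 0
    for entry in entries:
        if not entry.is_data_product:
            continue
        marketplace.register({
            "name": entry.name,
            "domain": entry.domain,
            "schema": dict(entry.schema),
            "lineage": list(entry.lineage),
            "definition": entry.definition,
        })
        published += 1
    return published


sales_catalog = [
    CatalogEntry("raw_orders", "sales", {"id": "int"}, [], "Unprocessed order feed"),
    CatalogEntry("monthly_revenue", "sales", {"month": "date", "revenue": "decimal"},
                 ["raw_orders"], "Revenue per calendar month", is_data_product=True),
]
market = Marketplace()
count = publish(sales_catalog, market)
```

Here the schema, lineage, and business definition are written once in the domain catalog and reused in the marketplace listing, which is the duplication the integration is meant to avoid.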

Data Catalog vs. EDM Capabilities

When we look at the respective capabilities of an Enterprise Data Marketplace and a Data Catalog, we realize that these capabilities are very similar:

Data Catalog Vs Enterprise Data Marketplace

On a strictly functional level, then, the two overlap heavily. What distinguishes a modern Data Catalog from an EDM is:

Their scope – The Data Catalog is intended to cover all data, whereas the marketplace is limited to the objects shared by domains (data products and other domain analytics products).
Their user experience – The Data Catalog is often a fairly complex tool, designed to support governance processes globally – it focuses on data stewardship workflows. The marketplace, on the other hand, typically offers very simple ergonomics, heavily inspired by that of an e-commerce platform, and provides an experience centered on consumption – data shopping.

The Practical Guide to Data Mesh: Setting up and Supervising an Enterprise-Wide Data Mesh

Written by Guillaume Bodet, our guide was designed to arm you with practical strategies for implementing data mesh in your organization, helping you:

Start your data mesh journey with a focused pilot project.
Discover efficient methods for scaling up your data mesh.
Acknowledge the pivotal role an internal marketplace plays in facilitating the effective consumption of data products.
Learn how the Actian Data Intelligence Platform emerges as a robust supervision system, orchestrating an enterprise-wide data mesh.

Get the eBook.

About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.

AI & ML

Your Company is Ready for GenAI. But is Your Data?

Dee Radh

June 5, 2024

The buzz around Generative AI (GenAI) is palpable, and for good reason. This powerful technology promises to revolutionize how businesses like yours operate, innovate, and engage with customers. From creating compelling marketing content to developing new product designs, the potential applications of GenAI are vast and transformative. But here’s the kicker: to unlock these benefits, your data needs to be in tip-top shape. Yes, your company might be ready for GenAI, but the real question is—are your data and data preparation up to the mark? Let’s delve into why data preparation and quality are the linchpins for GenAI success.

The GenAI Foundation: Data Preparation

Think of GenAI as a master chef. No matter how skilled the chef is, the quality of the dish hinges on the ingredients. In the realm of GenAI, data is the primary ingredient. Just as a chef needs fresh, high-quality ingredients to create a gourmet meal, GenAI needs well-prepared, high-quality data to generate meaningful and accurate outputs.

Garbage In, Garbage Out

There’s a well-known adage in the data world: “Garbage in, garbage out.” This means that if your GenAI models are fed poor-quality data, the insights and outputs they generate will be equally flawed. Data preparation involves cleaning, transforming, and organizing raw data into a format suitable for analysis. This step is crucial for several reasons:

Accuracy

Ensuring data is accurate prevents AI models from learning incorrect patterns or making erroneous predictions.

Consistency

Standardizing data formats and removing duplicates ensure that the AI model’s learning process is not disrupted by inconsistencies.

Completeness

Filling in missing values and ensuring comprehensive data coverage allows AI to make more informed and holistic predictions.
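
As a rough illustration of these three steps, the Python sketch below cleans a small list of customer records. The field names (email, country), the validity rule, and the default value are assumptions chosen for the example, not a recommended standard.

```python
def prepare(records, default_country="unknown"):
    """Standardize, deduplicate, and fill gaps in a list of customer dicts."""
    seen = set()
    cleaned = []
    for rec in records:
        # Consistency: normalize every email to one format.
        email = (rec.get("email") or "").strip().lower()
        # Accuracy: drop rows whose email is obviously invalid.
        if not email or "@" not in email:
            continue
        # Consistency: remove duplicates after normalization.
        if email in seen:
            continue
        seen.add(email)
        # Completeness: fill a missing country with an explicit default.
        country = rec.get("country") or default_country
        cleaned.append({"email": email, "country": country})
    return cleaned


raw = [
    {"email": " Ana@Example.com ", "country": "CA"},
    {"email": "ana@example.com", "country": "CA"},   # duplicate once normalized
    {"email": "not-an-email", "country": "US"},      # invalid
    {"email": "bo@example.com", "country": None},    # missing country
]
clean = prepare(raw)
```

Of the four raw rows, two survive: one copy of the duplicated address, and the record whose missing country has been filled with the default.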

The Keystone: Data Quality

Imagine you’ve meticulously prepared your ingredients, but they’re of subpar quality. The dish, despite all your efforts, will be a disappointment. Similarly, even with excellent data preparation, the quality of your data is paramount. High-quality data is relevant, timely, and trustworthy. Here’s why data quality is non-negotiable for GenAI success:

Relevance

Your GenAI models need data that is pertinent to the task at hand. Irrelevant data can lead to noise and outliers, causing the model to learn patterns that are not useful or, worse, misleading. For example, if you’re developing a GenAI model to create personalized marketing campaigns, data on customer purchase history, preferences, and behavior is crucial. Data on their shoe size? Not so much.

Timeliness

GenAI thrives on the latest data. Outdated information can result in models that are out of sync with current trends and realities. For instance, using last year’s market data to generate this year’s marketing strategies can lead to significant misalignment with the current market demands and changing consumer behavior.

Trustworthiness

Trustworthy data is free from errors and biases. It’s about having confidence that your data reflects the true state of affairs. Biases in data can lead to biased AI models, which can have far-reaching negative consequences. For example, if historical hiring data used to train an AI model contains gender bias, the model might perpetuate these biases in future hiring recommendations.

Real-World Implications

Let’s put this into perspective with some real-world scenarios:

Marketing and Personalization

A retail company leveraging GenAI to create personalized marketing campaigns can see a substantial boost in customer engagement and sales. However, if the customer data is riddled with inaccuracies—wrong contact details, outdated purchase history, or incorrect preferences—the generated content will miss the mark, leading to disengagement and potentially damaging the brand’s reputation.

Product Development

In product development, GenAI can accelerate the creation of innovative designs and prototypes. But if the input data regarding customer needs, market trends, and existing product performance is incomplete or outdated, the resulting designs may not meet current market demands or customer needs, leading to wasted resources and missed opportunities.

Healthcare and Diagnostics

In healthcare, GenAI has the potential to revolutionize diagnostics and personalized treatment plans. However, this requires precise, up-to-date, and comprehensive patient data. Inaccurate or incomplete medical records can lead to incorrect diagnoses and treatment recommendations, posing significant risks to patient health.

The Path Forward: Investing in Data Readiness

To truly harness the power of GenAI, you must prioritize data readiness. Here’s how to get started:

Data Audits

Conduct regular data audits to assess the current state of your data. Identify gaps, inconsistencies, and areas for improvement. This process should be ongoing to ensure continuous data quality and relevance.
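
A data audit can start very small. The Python sketch below, which assumes hypothetical record fields (name, phone, updated) and an arbitrary one-year staleness threshold, reports how complete each field is and how many records are out of date.

```python
from datetime import date


def audit(records, fields, today, max_age_days=365):
    """Return per-field completeness ratios and the count of stale records."""
    total = len(records)
    completeness = {}
    for f in fields:
        filled = sum(1 for r in records if r.get(f) not in (None, ""))
        completeness[f] = filled / total if total else 0.0
    # A record is stale when it was last updated more than max_age_days ago.
    stale = sum(1 for r in records if (today - r["updated"]).days > max_age_days)
    return {"completeness": completeness, "stale": stale}


rows = [
    {"name": "Ana", "phone": "555-0100", "updated": date(2024, 5, 1)},
    {"name": "Bo", "phone": "", "updated": date(2022, 1, 15)},
]
report = audit(rows, ["name", "phone"], today=date(2024, 6, 5))
```

Running a report like this on a schedule and tracking the numbers over time turns the audit into the ongoing process described above.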

Data Governance

Implement robust data governance frameworks that define data standards, policies, and procedures. This ensures that data is managed consistently and remains high-quality across the organization.

Advanced Data Preparation Tools

Leverage advanced data preparation tools that automate the cleaning, transformation, and integration of data. These tools can significantly reduce the time and effort required to prepare data, allowing your team to focus on strategic analysis and decision-making.

Training and Culture

Foster a culture that values data quality and literacy. Train employees on the importance of data integrity and equip them with the skills to handle data effectively. This cultural shift ensures that everyone in the organization understands and contributes to maintaining high data standards.

The Symbiosis of Data and GenAI

GenAI holds immense potential to drive innovation and efficiency across various business domains. However, the success of these initiatives hinges on the quality and preparation of the underlying data. As the saying goes, “A chain is only as strong as its weakest link.” In the context of GenAI, the weakest link is often poor data quality and preparation.

By investing in robust data preparation processes and ensuring high data quality, you can unlock the full potential of GenAI. This symbiosis between data and AI will not only lead to more accurate and meaningful insights but also drive sustainable competitive advantage in the rapidly evolving digital landscape.

So, your company is ready for GenAI. But the million-dollar question remains—is your data?

Download our free GenAI Data Readiness Checklist shared at the Gartner Data & Analytics Summit.

About Dee Radh

As Senior Director of Product Marketing, Dee Radh heads product marketing for Actian. Prior to that, she held senior PMM roles at Talend and Formstack. Dee has spent 100% of her career bringing technology products to market. Her expertise lies in developing strategic narratives and differentiated positioning for GTM effectiveness. In addition to a post-graduate diploma from the University of Toronto, Dee has obtained certifications from Pragmatic Institute, Product Marketing Alliance, and Reforge. Dee is based out of Toronto, Canada.

Databases

Actian Ingres 12.0 Enhances Cloud Flexibility and Offers Faster Analytics

Emma McGrattan

June 4, 2024

Today, we are excited to announce Actian Ingres 12.0*, which is designed to make cloud deployment simpler, enhance security, and deliver up to 20% faster analytics. The first release I worked on was Ingres 6.4/02 back in 1992, and the first bug I fixed was for a major US car manufacturer that used Ingres to drive its production line. It gives me great pride to see that three decades later, Ingres continues to manage some of the world’s most mission-critical data deployments and that there’s so much affection for the Ingres product.

With this release, we’re returning to the much-loved Ingres brand for all platforms. We continue to partner with our customers to understand their evolving business needs and make sure that we deliver products that enable their modernization journey. With this new release, we focused on the following capabilities:

Backup to cloud and disaster recovery. Ingres 12.0 greatly simplifies these configurations for both on-premises and cloud deployments through the use of Virtual Machines (VMs) or Docker containers in Kubernetes.
Fortified protection automatically enables AES-256 encryption and hardened security to defend against brute force and Denial of Service (DoS) attacks.
Improved performance and workload management with up to 20% faster analytical queries using the X100 engine. Workload Manager 2.0 provides greater flexibility in the allocation of resources to meet specific user demand.
Elevated developer experiences in OpenROAD 12. We make it quick and easy to create and transform database-centric applications for web and mobile environments.

These new capabilities, coupled with our previous enhancements to cloud deployment, are designed to help our customers deliver on their modernization goals. They reflect Actian’s vision to develop solutions that our customers can trust, are flexible to meet their specific needs, and are easy to use so they can thrive when uncertainty is the only certainty they can plan for.

Customers like Lufthansa Systems rely on Actian Ingres to power their Lido flight and route planning software. “It’s very reassuring to know that our solution, which keeps airplanes and passengers safe, is backed up by a database that has for so many years been playing in the ‘premier league’,” said Rudi Koffer, Senior Database Software Architect at the Lufthansa Systems Airlines Operations Solutions division in Frankfurt Raunheim, Germany.

Experience the new capabilities first-hand. Connect with an Actian representative to get started. Below we dive into what each capability delivers.

A Database Built for Your Modernization Journey

Backup to Cloud and Disaster Recovery

Most businesses today have 24×7 data operations, so a system outage can have serious consequences. With Ingres 12.0, we’ve added new backup-to-cloud and disaster recovery capabilities that dramatically reduce the risk of application downtime and data loss, delivered through a new component called IngresSync. IngresSync copies a database to a target location for offsite storage and quick restoration.

Disaster recovery is now Docker or Kubernetes container-ready for Ingres 12.0 customers, allowing users to set up a read-only standby server in their Kubernetes deployment. Recovery Point Objectives are on the order of minutes and are user configurable.

Actian Ingres 12.0 Process to Disaster Recovery
Backup to cloud and disaster recovery are imperative for situations like:

Natural Disasters: When a natural disaster such as a hurricane or earthquake strikes a local datacenter, cloud backups ensure that a copy of the data is readily available, and an environment can be spun up quickly in the cloud of your choosing to resume business operations.
Cyberattacks: In the event of a cyberattack such as ransomware, having cloud backups and a disaster recovery plan is essential to establish a non-compromised version of the database in a protected cloud environment.

Fortified Protection

Actian Ingres 12.0 enables 256-bit AES encryption on data in motion by default. AES-256 is considered one of the most secure encryption standards available today and is widely used to protect sensitive data. The 256-bit key size makes it extremely resistant to attacks, and it is often used by governments and highly regulated industries like banking and healthcare.

In addition, Actian Ingres 12.0 offers user-protected privileges and containerized User Defined Functions (UDFs). These UDFs, which can be authored in SQL, JavaScript, or Python, safeguard against unauthorized activities within the company’s firewall that may target the database directly. Containerization of UDFs further enhances security by isolating user operations from core database management system (DBMS) processes.

Improved Performance and Workload Automation

Actian Ingres 12.0 customers can increase resource efficiency on transactional and analytic workloads in the same database. Workload Manager 2.0 enhances the data management experience with priority-driven queues, enabling the system to allocate resources based on predefined priorities and user roles. Now database administrators can define role-types such as DBAs, application developers, and end users, and assign a priority for each role-type.
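
To make the idea of priority-driven queues concrete, here is a generic Python sketch of role-based prioritization. It is not Workload Manager's implementation or configuration syntax; the role names, the numeric priorities, and the RoleQueue class are all hypothetical.

```python
import heapq
import itertools

# Hypothetical role priorities; a lower number is served first.
ROLE_PRIORITY = {"dba": 0, "developer": 1, "end_user": 2}


class RoleQueue:
    """Serve work by role priority, first in first out within a role."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()   # tie-breaker preserves arrival order

    def submit(self, role, job):
        heapq.heappush(self._heap, (ROLE_PRIORITY[role], next(self._counter), job))

    def next_job(self):
        return heapq.heappop(self._heap)[2]


queue = RoleQueue()
queue.submit("end_user", "report-1")
queue.submit("dba", "maintenance-1")
queue.submit("developer", "test-1")
queue.submit("dba", "maintenance-2")
order = [queue.next_job() for _ in range(4)]
```

Both DBA jobs run first, in the order they arrived, followed by the developer job and then the end-user report.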

The X100 engine, included with Ingres on Linux and Windows, brings efficiency improvements such as table cloning for X100 tables, which allows customers to conduct projects or experiments in isolation from core DBMS operations.

Our Performance Engineering Team has determined that for analytics workloads, these enhancements make Actian Ingres 12.0 the fastest Ingres version yet, with a 20% improvement over prior versions. Transactional workloads also see improved release-over-release performance.

Elevated Developer Experiences

Actian OpenROAD 12.0, the latest update to the Ingres graphical 4GL, also sees some new enhancements designed to assist customers on their modernization journey. Surprisingly or not, we still have customers with forms-based applications, and while many argue that these are the fastest and most reliable apps for data entry, our customers want to deliver more modern versions of these apps, mostly on tablet-style devices. To facilitate this modernization and to protect the decades of investments in business logic, we have delivered enhanced versions of abf2or and WebGen in OpenROAD 12.0.

Additionally, OpenROAD users will benefit from the new gRPC-based architecture, which streamlines administration, bolsters concurrency support, and offers a more efficient framework, thanks to HTTP/2 and protocol buffers. The gRPC design is optimized for microservices and can be neatly packaged within a distinct container for deployment. The introduction of a newly distributed Docker file lays the groundwork for cloud deployment, providing production-ready business logic ready for integration with any modern client.

Leading Database Modernization and Innovation

These latest innovations join our recent milestones to solidify Actian’s position as a data and analytics leader. These achievements build on recent recognitions, including:

With this momentum, we are ready to accelerate solutions that our customers can trust, are flexible to their needs, and are easy-to-use.

Get hands-on with the new capabilities today. Connect with an Actian representative to get started.

*Actian Ingres includes the product formerly known as Actian X.

About Emma McGrattan

Emma McGrattan is CTO at Actian, leading global R&D in high-performance analytics, data management, and integration. With over two decades at Actian, Emma holds multiple patents in data technologies and has been instrumental in driving innovation for mission-critical applications. She is a recognized authority, frequently speaking at industry conferences like Strata Data, and she's published technical papers on modern analytics. In her Actian blog posts, Emma tackles performance optimization, hybrid cloud architectures, and advanced analytics strategies. Explore her top articles to unlock data-driven success.

Data Intelligence

Building a Marketplace for Data Mesh: Enterprise-Level Marketplace – Part 2

Actian Corporation

June 3, 2024


Given these shortcomings, a new concept is gaining popularity: the internal marketplace, or what we call the Enterprise Data Marketplace (EDM).

Facilitating data product consumption through metadata.
Setting up an enterprise-level marketplace.
Feeding the marketplace via domain-specific data catalogs.

As described in our previous article, an Enterprise Data Marketplace is a simple system in which consumers can search among the data product offerings for one or more eligible to perform a specific use case, become aware of the information related to these products, and then order them. The order materializes as access opening, physical data delivery, or even a request for data product evolution to cover the new use case.
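
The search-then-order flow can be sketched as a small data model in Python. The names below (DataProduct, Order, Fulfillment, search) are hypothetical and only mirror the three outcomes described above; they are not part of any product API.

```python
from dataclasses import dataclass
from enum import Enum


class Fulfillment(Enum):
    """The three ways an order can materialize."""
    ACCESS = "access opening"
    DELIVERY = "physical data delivery"
    EVOLUTION = "data product evolution request"


@dataclass(frozen=True)
class DataProduct:
    name: str
    domain: str
    description: str


@dataclass(frozen=True)
class Order:
    product: DataProduct
    consumer: str
    fulfillment: Fulfillment


def search(products, keyword):
    """Return products whose name or description mentions the keyword."""
    kw = keyword.lower()
    return [p for p in products if kw in p.name.lower() or kw in p.description.lower()]


offerings = [
    DataProduct("monthly_revenue", "sales", "Revenue per calendar month"),
    DataProduct("churn_scores", "marketing", "Customer churn probability"),
]
matches = search(offerings, "revenue")
order = Order(matches[0], consumer="finance-team", fulfillment=Fulfillment.ACCESS)
```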

Three Main Options for Setting up an Internal Data Marketplace

When establishing an internal data marketplace, organizations typically consider three primary approaches:

Develop It

This approach involves building a custom data marketplace tailored to the organization’s unique requirements. While offering the potential for a finely tuned user experience, this option often entails significant time and financial investment.

Integrate a Solution From the Market

Alternatively, organizations can opt for pre-existing solutions available in the market. Originally designed for data commercialization or external data exchange, these solutions can be repurposed for internal use. However, they may require customization to align with internal workflows and security standards.

Use Existing Systems

Some organizations choose to leverage their current infrastructure by repurposing tools such as data catalogs and corporate wikis. While this approach may offer familiarity and integration with existing workflows, it might lack the specialized features of dedicated data marketplace solutions.

The Drawbacks of Commercial Marketplaces

Although they usually offer a satisfying user experience and native support for the data product concept, commercial marketplaces have significant drawbacks. Because they focus heavily on transactional aspects (distribution, licensing, contracting, purchase or subscription, payment, etc.), they are often poorly integrated with internal data platforms and access control tools. They also generally require data to be distributed by the marketplace itself, meaning they constitute a new infrastructure component onto which data must be transferred and shared (such a system is sometimes called a Data Sharing Platform).

Actian Data Intelligence Platform’s Enterprise Data Marketplace

In a pragmatic approach, it is not desirable to introduce a new infrastructure component to deploy a data mesh – it seems highly preferable to leverage existing capabilities as much as possible.

Therefore, we’ve evolved our data discovery platform and data catalog to offer a unique solution, one that mirrors the data mesh at the metadata level to continually adapt to the organization’s evolving data platform architecture. This Enterprise Data Marketplace (EDM) integrates a cross-domain marketplace with private data catalogs tailored to each domain’s needs.

We detail this approach in the next article of our series. It is made possible by what has long distinguished the Actian Data Intelligence Platform and differentiates it from most other data catalogs or metadata management platform vendors: an evolving knowledge graph.

In our final article, discover how an internal data marketplace, paired with domain-specific catalogs, provides a comprehensive data mesh supervision system.


Data Management

Actian Ingres Disaster Recovery

Emma McGrattan

May 31, 2024

Most production Actian Ingres installations need some degree of disaster recovery (DR). Options range from shipping nightly database checkpoints to off-site storage locations to near real-time replication to a dedicated off-site DR site.

Actian Ingres is an enterprise hybrid database that ships with built-in checkpoint and journal shipping features, which provide the basic building blocks for constructing low-cost, efficient DR implementations. One such implementation is IngresSync, which utilizes Actian Ingres’ native checkpoint/journal shipping and incremental roll-forward capabilities to implement a cost-effective DR solution.

ingressync

IngresSync works on the concept of source and target Actian Ingres installations. The source installation is the currently active production environment. The target, or multiple targets if needed, is kept current by an IngresSync job scheduled to execute on a user-defined interval. Each sync operation copies only journals created since the previous sync and applies those transactions to the targets. Checkpoints taken on the source node are automatically copied to and rolled forward on all targets.

Example

Suppose we have an environment where the production installation is hosted on node corp and we need to create two DR sites dreast and drwest.

The DR nodes each need:

An Ingres installation at the same version and patch level as corp.
Passwordless SSH configured to and from the other nodes.
Ingres/Net VNODE entries to the other nodes.

DR nodes for Ingres

To configure this environment, we must first designate the source and target hosts and apply the latest source checkpoint to the targets.

ingresSync --source=corp --target=dreast,drwest --database=corpdb --iid=II --ckpsync --restart

source and target hosts for Ingres

The two target installations are now synched with the source, and the target databases are in incremental rollforward (INCR_RFP) state. This state allows journals to be applied incrementally to keep the targets in sync with the source. Incremental rollforward is performed by:

ingresSync --hosts=corp,dreast,drwest --database=corpdb --iid=II --jnlsync

When executed, this will close the current journal on the source, copy new journals to the targets, and roll forward those journals to the targets. The journal sync step should be configured to execute at regular intervals using the system scheduler, such as cron. Frequent execution results in minimal sync delay between the source and targets.
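
If the scheduled job is driven from a script rather than typed into the scheduler directly, the journal sync command can be assembled programmatically. The Python sketch below only rebuilds the command line shown above from its parts; the jnlsync_command helper is hypothetical, and the sketch does not run ingresSync.

```python
import shlex


def jnlsync_command(hosts, database, iid="II"):
    """Build the journal sync command from this article as an argument list."""
    return [
        "ingresSync",
        f"--hosts={','.join(hosts)}",
        f"--database={database}",
        f"--iid={iid}",
        "--jnlsync",
    ]


argv = jnlsync_command(["corp", "dreast", "drwest"], "corpdb")
command_line = shlex.join(argv)   # shell-safe string form, e.g. for logging
```

On a node where ingresSync is installed, the argument list could be passed to subprocess.run(argv, check=True) from the scheduled job.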

The target installations at dreast and drwest are now in sync with the source installation at corp. Should the corp environment experience a hardware or software failure, we can designate one of the target nodes as the new source and direct client connections to that node. In this case, we’ll designate drwest as the new source and dreast will remain as a target (DR site).

ingresSync --target=drwest --database=corpdb --iid=II --incremental_done

This takes the drwest corpdb database out of incremental rollforward mode; the database will now execute both read and update transactions and is the new source. The dreast database is still in incremental rollforward mode and will continue to function as a DR target node.

drwest for Ingres

Since the corp node is no longer available, the journal sync job must be started on either drwest or dreast. The journal sync job can be configured and scheduled to execute on all three nodes using the --strict flag. In this case, the job determines whether it is executing on the current source node; if so, it executes normally. If it is executing on a target, the job simply terminates. This configuration allows synchronization to continue even as node roles change.
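
The behavior described for jobs scheduled on every node can be modeled in a few lines of Python. This is a sketch of the decision only: how IngresSync itself detects the current source is not covered here, and the function names are hypothetical.

```python
def should_sync(local_node, current_source):
    """Only the node that is currently the source performs the sync."""
    return local_node == current_source


def run_scheduled_job(local_node, current_source, sync):
    """Run the sync on the source; on a target, end without doing anything."""
    if not should_sync(local_node, current_source):
        return "skipped"
    sync()
    return "synced"


calls = []
results = {
    node: run_scheduled_job(node, "drwest", lambda: calls.append("sync"))
    for node in ["corp", "dreast", "drwest"]
}
```

With drwest as the current source, the job scheduled on corp and dreast ends immediately, and exactly one sync runs.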

Once corp is back online it can be brought back into the configuration as a DR target.

ingresSync --source=drwest --target=corp --database=corpdb --iid=II --ckpsync --restart

DR target for Ingres

At some point, we may need to revert to the original configuration with corp as the source. The steps are:

Terminate all database connections to drwest.
Sync corp with drwest to ensure corp is current:

ingresSync --source=drwest --target=corp --database=corpdb --iid=II --jnlsync

Reassign node roles:

ingresSync --target=corp --database=corpdb --iid=II --incremental_done

ingresSync --source=corp --target=drwest --database=corpdb --iid=II --ckpsync --restart

revert to original corp as source for Ingres

Summary

IngresSync is one mechanism for implementing a DR solution. It is generally appropriate in cases where some degree of delay is acceptable and the target installations have little or no database user activity. Target databases can be used for read only/reporting applications with the stipulation that incremental rollforwards cannot run while there are active database connections. The rollforward process will catch up on the first refresh cycle when there are no active database connections.

The main pros and cons of the alternative methods of delivering disaster recovery for Actian Ingres are outlined below:

Feature	Checkpoint Shipping	IngresSync	Replication
Scope	Database	Database	Table
Granularity	Database	Journal	Transaction
Sync Frequency	Checkpoint	User Defined	Transaction
Target Database	Read/Write(1)	Read Only	Read/Write(2)

(1) Target database supports read and write operations, but all changes are lost on the next checkpoint refresh.
(2) Target database supports read and write operations, but there may be update conflicts that require manual resolution.

Note: IngresSync currently runs on Linux and Microsoft Windows. Windows environments require the base Cygwin package and rsync.


Databases

Types of Databases, Pros & Cons, and Real-World Examples

Dee Radh

May 30, 2024

Summary

This blog offers a comprehensive overview of major database models—including relational, NoSQL, in‑memory, graph, and hybrid types—highlighting their strengths, weaknesses, and real-world use cases to guide decision-makers in selecting the right database for their needs.

Relational (SQL): Ideal for structured, ACID-compliant workloads—great for transactions and complex queries—but can struggle with horizontal scaling and rigid schema.
NoSQL (Document, Key‑Value, Columnar): Offers high flexibility and horizontal scalability for large, unstructured data sets; may sacrifice consistency, require complex modeling, and incur training/development costs.
In‑Memory & Graph/Hybrid Models: In‑memory databases deliver ultra-low latency; graph databases simplify relationship-heavy queries. Hybrid systems (like Actian’s) combine OLTP and OLAP strengths for real-world analytic performance.

Databases are the unsung heroes behind nearly every digital interaction, powering applications, enabling insights, and driving business decisions. They provide a structured and efficient way to store vast amounts of data. Unlike traditional file storage systems, databases allow for the organization of data into tables, rows, and columns, making it easy to retrieve and manage information. This structured approach, coupled with data governance best practices, ensures data integrity, reduces redundancy, and enhances the ability to perform complex queries. Whether it’s handling customer information, financial transactions, inventory levels, or user preferences, databases underpin the functionality and performance of applications across industries.

Types of Information Stored in Databases

Telecommunications: Verizon

Verizon uses databases to manage its vast network infrastructure, monitor service performance, and analyze customer data. This enables the company to optimize network operations, quickly resolve service issues, and offer personalized customer support. By leveraging database technology, Verizon can maintain a high level of service quality and customer satisfaction.

E-commerce: Amazon

Amazon relies heavily on databases to manage its vast inventory, process millions of transactions, and personalize customer experiences. The company’s sophisticated database systems enable it to recommend products, optimize delivery routes, and manage inventory levels in real-time, ensuring a seamless shopping experience for customers.

Finance: JPMorgan Chase

JPMorgan Chase uses databases to analyze financial markets, assess risk, and manage customer accounts. By leveraging advanced database technologies, the bank can perform complex financial analyses, detect fraudulent activities, and ensure regulatory compliance, maintaining its position as a leader in the financial industry.

Healthcare: Mayo Clinic

Mayo Clinic utilizes databases to store and analyze patient records, research data, and treatment outcomes. This data-driven approach allows the clinic to provide personalized care, conduct cutting-edge research, and improve patient outcomes. By integrating data from various sources, Mayo Clinic can deliver high-quality healthcare services and advance medical knowledge.

Types of Databases

The choice between relational and non-relational databases depends on the specific requirements of your application. Relational databases are ideal for scenarios requiring strong data integrity, complex queries, and structured data. In contrast, non-relational databases excel in scalability, flexibility, and handling diverse data types, making them suitable for big data, real-time analytics, and content management applications.

Types of databases: Relational databases and non-relational databases

Image ⓒ Existek

1. Relational Databases

Strengths

Structured Data: Ideal for storing structured data with predefined schemas
ACID Compliance: Ensures transactions are atomic, consistent, isolated, and durable (ACID)
SQL Support: Widely used and supported SQL for querying and managing data

Limitations

Scalability: Can struggle with horizontal scaling
Flexibility: Less suited for unstructured or semi-structured data

Common Use Cases

Transactional Systems: Banking, e-commerce, and order management
Enterprise Applications: Customer Relationship Management (CRM) and Enterprise Resource Planning (ERP) systems

Real-World Examples of Relational Databases

MySQL: Widely used in web applications like WordPress.
PostgreSQL: Used by organizations like Instagram for complex queries and data integrity.
Oracle Database: Powers large-scale enterprise applications in finance and government sectors.
Actian Ingres: Widely used by enterprises and public sector organizations, such as the Republic of Ireland.
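The ACID guarantee listed under Strengths can be illustrated with a small sketch. It uses Python's built-in sqlite3 module purely for convenience; the accounts table, the transfer function, and the sample balances are hypothetical and are not taken from any of the products named above. The point is atomicity: if one step of a transfer fails, the whole transfer is rolled back.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one atomic transaction."""
    try:
        # The connection context manager commits on success
        # and rolls back if any statement raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit made
        # in the first statement has been rolled back as well.
        return False


# Hypothetical in-memory table with a rule that balances cannot go negative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

After both calls, only the first transfer is reflected in the balances. The second one leaves no partial credit behind, which is the behavior transactional systems such as banking and order management rely on.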

2. NoSQL Databases

Strengths

Scalability: Designed for horizontal scaling
Flexibility: Ideal for handling large volumes of unstructured and semi-structured data
Performance: Optimized for high-speed read/write operations

Limitations

Consistency: Some NoSQL databases sacrifice consistency for availability and partition tolerance (CAP theorem)
Complexity: Can require more complex data modeling and application logic

Common Use Cases

Big Data Applications: Real-time analytics, IoT data storage
Content Management: Storing and serving large volumes of user-generated content

Real-World Examples of NoSQL Databases

MongoDB: Used by companies like eBay for its flexibility and scalability.
Cassandra: Employed by Netflix for handling massive amounts of streaming data.
Redis: Utilized by X (formerly Twitter) for real-time analytics and caching.
Actian Zen: Embedded database built for IoT and the intelligent edge. Used by 13,000+ companies.
HCL Informix: Small footprint and self-managing. Widely used in financial services, logistics, and retail.
Actian NoSQL: Object-oriented database used by the European Space Agency (ESA).

3. In-Memory Databases

Strengths

Speed: Extremely fast read/write operations due to in-memory storage
Low Latency: Ideal for applications requiring rapid data access

Limitations

Cost: High memory costs compared to disk storage
Durability: Data can be lost if not backed up properly

Common Use Cases

Real-Time Analytics: Financial trading platforms, fraud detection systems
Caching: Accelerating web applications by storing frequently accessed data

Real-World Examples of In-Memory Databases

Redis: Used by GitHub to manage session storage and caching.
SAP HANA: Powers real-time business applications and analytics.
Actian Vector: One of the world’s fastest columnar databases for OLAP workloads.
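The caching use case above can be sketched in a few lines. This is a minimal illustration of keeping frequently accessed data in memory with an expiry time, not how Redis or any other listed product is implemented; the TTLCache class, the slow_lookup function, and the key format are all hypothetical.

```python
import time


class TTLCache:
    """Keep recently loaded values in memory for a limited time."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: no call to the slow backend
        value = loader(key)  # cache miss or expired entry: reload
        self._store[key] = (now + self._ttl, value)
        return value


calls = []


def slow_lookup(key):
    """Stand-in for a disk-based database query (hypothetical)."""
    calls.append(key)
    return key.upper()


cache = TTLCache(ttl_seconds=60)
first = cache.get_or_load("user:1", slow_lookup)
second = cache.get_or_load("user:1", slow_lookup)  # served from memory
```

The second lookup never reaches slow_lookup. The trade-off matches the Limitations listed above: the cached copy lives only in memory, so it is lost on restart and must be reloadable from a durable source.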

Combinations of two or more database models are often developed to address specific use cases or requirements that cannot be fully met by a single type alone. Actian Vector blends OLAP principles, relational database functionality, and in-memory processing, enabling accelerated query performance for real-time analysis of large datasets. The resulting capability showcases the technical versatility of modern database platforms.

4. Graph Databases

Strengths

Relationships: Optimized for storing and querying relationships between entities
Flexibility: Handles complex data structures and connections

Limitations

Complexity: Requires understanding of graph theory and specialized query languages
Scalability: Can be challenging to scale horizontally

Common Use Cases

Social Networks: Managing user connections and interactions
Recommendation Engines: Suggesting products or content based on user behavior

Real-World Examples of Graph Databases

Neo4j: Used by LinkedIn to manage and analyze connections and recommendations.
Amazon Neptune: Supports Amazon’s personalized recommendation systems.
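To make the relationship-centric use cases concrete, here is a minimal sketch of a "people you may know" recommendation over an in-memory adjacency map. It only illustrates the kind of traversal graph databases are optimized for; it does not use Neo4j, Amazon Neptune, or their query languages, and the graph data and the recommend_connections function are hypothetical.

```python
from collections import Counter


def recommend_connections(graph, user):
    """Rank non-connected users by how many connections they share with `user`."""
    direct = graph.get(user, set())
    mutual_counts = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            # Skip the user themselves and people already connected.
            if candidate != user and candidate not in direct:
                mutual_counts[candidate] += 1
    # Most mutual connections first, then alphabetical for stable output.
    return sorted(mutual_counts, key=lambda c: (-mutual_counts[c], c))


# Hypothetical undirected social graph stored as adjacency sets.
graph = {
    "ana": {"ben", "cy"},
    "ben": {"ana", "dee", "eli"},
    "cy": {"ana", "dee"},
    "dee": {"ben", "cy"},
    "eli": {"ben"},
}

suggestions = recommend_connections(graph, "ana")
```

Here dee is suggested ahead of eli because dee shares two connections with ana and eli shares one. In a relational model the same question typically requires self-joins on a connections table, which is why relationship-heavy queries are often easier to express in a graph model.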

Factors to Consider in Database Selection

Selecting the right database involves evaluating multiple factors to ensure it meets the specific needs of your applications and organization. As organizations continue to navigate the digital landscape, investing in the right database technology will be crucial for sustaining growth and achieving long-term success. Here are some considerations:

1. Data Structure and Type

Structured vs. Unstructured: Choose relational databases for structured data and NoSQL for unstructured or semi-structured data.
Complex Relationships: Opt for graph databases if your application heavily relies on relationships between data points.

2. Scalability Requirements

Vertical vs. Horizontal Scaling: Consider NoSQL databases for applications needing horizontal scalability.
Future Growth: For growing data needs, cloud-based databases offer scalable solutions.

3. Performance Needs

Latency: In-memory databases are ideal for applications requiring high-speed transactions and low-latency, real-time data access.
Throughput: High-throughput applications may benefit from NoSQL databases.

4. Consistency and Transaction Needs

ACID Compliance: If your application requires strict transaction guarantees, a relational database might be the best choice.
Eventual Consistency: NoSQL databases often provide eventual consistency, suitable for applications where immediate consistency is not critical.

5. Cost Considerations

Budget: Factor in both initial setup costs and ongoing licensing, maintenance, and support.
Resource Requirements: Consider the hardware and storage costs associated with different database types.

6. Ecosystem and Support

Community and Vendor Support: Evaluate the availability of support, documentation, and community resources.
Integration: Ensure that the database can integrate seamlessly with your existing systems and applications.

Databases are foundational to modern digital infrastructure. By leveraging the right database for the right use case, organizations can meet their specific needs and leverage data as a strategic asset. In the end, the goal is not just to store data but to harness its full potential to gain a competitive edge.

About Dee Radh

Data Intelligence

Building a Marketplace for Data Mesh: Facilitating Data Product – Part 1

Actian Corporation

May 28, 2024

person building a marketplace for data mesh via laptop

Given these shortcomings, a new concept is gaining popularity: the internal marketplace, or what we call the Enterprise Data Marketplace (EDM).

Facilitating data product consumption through metadata
Setting up an enterprise-level marketplace
Feeding the marketplace via domain-specific data catalogs

Before diving into the internal marketplace, let’s quickly go back to the notion of a data product, which we believe is the cornerstone of the data mesh and the first step in transforming data management.

Sharing and Exploiting Data Products Through Metadata

As mentioned in our previous series on data mesh, a data product is a governed, reusable, scalable dataset that offers guarantees of data quality and of compliance with various regulations and internal rules. Note that this definition is quite restrictive – it excludes other types of products such as machine learning algorithms, models, or dashboards.

While these artifacts should be managed as products, they are not data products. There are other types of products, which could be very generally termed “Analytics Products”, of which data products are one subset.

In practice, an operational data product consists of two things:

Data – Materialized on a centralized or decentralized data platform, guaranteeing data addressing, interoperability, and access security.
Metadata – Providing all the necessary information for sharing and using the data.

Metadata ensures consumers have all the information they need to use the product.

It typically covers the following aspects:

Schema – Providing the technical structure of the data product, data classification, samples, and their origin (lineage).
Governance – Identifying the product owner(s), its successive versions, its possible deprecation, etc.
Semantics – Providing a clear definition of the exposed information, ideally linked to the organization’s business glossary and comprehensive documentation of the data product.
Contract – Defining quality guarantees, consumption modalities (protocols and security), potential usage restrictions, redistribution rules, etc.
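The four aspects listed above can be pictured as a structured descriptor that travels with the data product. The sketch below is one possible shape, assuming Python dataclasses; every class name, field name, and sample value is hypothetical and is not drawn from the guide or from any specific catalog product.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    columns: dict                      # column name -> type
    classification: str                # e.g. "internal", "confidential"
    lineage: list = field(default_factory=list)


@dataclass
class Governance:
    owners: list
    version: str
    deprecated: bool = False


@dataclass
class Semantics:
    definition: str
    glossary_terms: list = field(default_factory=list)


@dataclass
class Contract:
    quality_guarantees: dict           # e.g. {"freshness_hours": 24}
    protocols: list                    # consumption modalities
    usage_restrictions: list = field(default_factory=list)


@dataclass
class DataProductMetadata:
    name: str
    schema: Schema
    governance: Governance
    semantics: Semantics
    contract: Contract

    def is_consumable(self):
        """A product needs an owner, a definition, and must not be deprecated."""
        return (
            bool(self.governance.owners)
            and not self.governance.deprecated
            and bool(self.semantics.definition.strip())
        )


product = DataProductMetadata(
    name="customer_orders",
    schema=Schema({"order_id": "string", "amount": "decimal"}, "internal"),
    governance=Governance(owners=["sales-data-team"], version="1.2.0"),
    semantics=Semantics("One row per confirmed customer order."),
    contract=Contract({"freshness_hours": 24}, protocols=["sql"]),
)
```

Keeping the descriptor as plain structured data means it can be versioned and deployed alongside the pipelines that produce the data.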

In the data mesh logic, these metadata are managed by the product team and are deployed according to the same lifecycle as data and pipelines. There remains a fundamental question: where can metadata be deployed?

Using a Data Marketplace to Deploy Metadata

Most organizations already have a metadata management system, usually in the form of a Data Catalog.

But data catalogs, in their current form, have major drawbacks:

They don’t always support the notion of a data product – it must be more or less emulated with other concepts.
They are complex to use – designed to catalog a large number of assets with sometimes very fine granularity, they often suffer from a lack of adoption beyond centralized data management teams.
They mostly impose a rigid and unique organization of data, decided and designed centrally – which fails to reflect the variety of different domains or the organization’s evolution as the data mesh expands.
Their search capabilities are often limited, particularly for exploratory aspects – it’s often necessary to know what you’re looking for to be able to find it.
The experience they offer sometimes lacks the simplicity users aspire to – search with a few keywords, identify the appropriate data product, and then trigger the operational process of an access request or data delivery.

The internal marketplace, or Enterprise Data Marketplace (EDM), is therefore a new concept gaining popularity in the data mesh circle. Like a general-purpose marketplace, the EDM aims to provide a shopping experience for data consumers. It is thus an essential component to ensure the exploitation of the data mesh on a larger scale – it allows data consumers to have a simple and effective system to search for and access data products from various domains.
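The "search with a few keywords" experience described above can be sketched as a simple ranking over product metadata. This is a deliberately naive illustration, assuming each data product is a small dictionary with name, domain, and description fields; the catalog entries and the search_products function are hypothetical, and a real marketplace would use a proper search engine.

```python
def search_products(catalog, query):
    """Return product names ranked by how often the query terms appear."""
    terms = query.lower().split()
    results = []
    for product in catalog:
        text = " ".join(
            [product["name"], product["domain"], product["description"]]
        ).lower()
        # Naive relevance: count substring occurrences of each term.
        score = sum(text.count(term) for term in terms)
        if score > 0:
            results.append((score, product["name"]))
    # Highest score first, then name for stable ordering.
    results.sort(key=lambda r: (-r[0], r[1]))
    return [name for _, name in results]


# Hypothetical data products published by three domains.
catalog = [
    {
        "name": "customer_orders",
        "domain": "sales",
        "description": "Daily orders per customer with delivery status",
    },
    {
        "name": "customer_churn_scores",
        "domain": "marketing",
        "description": "Monthly churn risk per customer",
    },
    {
        "name": "warehouse_stock",
        "domain": "logistics",
        "description": "Current stock levels per warehouse",
    },
]

matches = search_products(catalog, "customer orders")
```

The consumer types two keywords and gets the relevant products from different domains back in ranked order, without needing to know where or how each one is stored.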

In our next article, learn the different ways to set up an internal data marketplace, and how it is essential for data mesh exploitation.

About Actian Corporation

Actian Life

Actian Life: Celebrating Our Author and Our SEO Award Winner

Actian Corporation

May 20, 2024

Headshots of Ron Weber and Thomas Schweser representing the joy of being a part of Actian Life

Actian employee Thomas Schweser coauthored a book on graph theory, and Ron Weber earned a 2023 SEO Evangelist Edgie award, exemplifying Actian’s culture of innovation.

At Actian, we believe that our employees’ achievements are a strong reflection of our vibrant culture and innovative spirit. That holds true whether employees are making breakthroughs and delivering value in their day jobs or finding success in work-adjacent activities.

Today, we’re proud to shine a spotlight on Thomas Schweser, who co-wrote a book on graph theory called “Brooks’ Theorem,” and Ron Weber, who received BrightEdge’s 2023 SEO Evangelist Edgie award. They showcase employee achievements in two highly competitive areas.

Graph Theory Guru and SEO Maestro

Schweser is a research engineer on Actian’s Vector team based in Ilmenau, Germany. His book, published by Springer, focuses on graph coloring and critical graphs, a niche but important area of discrete mathematics.

While graph theory isn’t his primary focus at work, he appreciates its ubiquitous presence in the technology world. “Graphs are everywhere, especially in computer science,” he explains. “They make complex information digestible and help visualize relationships clearly.”

His book offers a valuable resource for those studying or using Brooks’ Theorem, which bounds a graph’s chromatic number in terms of its maximum degree. “The book gives an overview of all the important graph coloring theorems and trends that have occurred over the last decades,” Schweser points out. “It should serve as a nice book if you want to give a college lecture on graph coloring.”

On the U.S. side of the business, Weber is the Senior Director, Web Communications and SEO, for Actian. Based in San Diego, he leads efforts to enable coworkers across the organization to succeed with and leverage SEO-driven content. As soon as he joined Actian a couple of years ago, he went right to work on a complete website redesign while creating an aggressive content development schedule and building a formal SEO program from scratch.

The results were immediate and impressive:

A whopping 96% improvement in Actian content that comes up on the first page in search results because of strong keywords and robust content.
33% improvement in second page results, and 51% increase in third page results.
7% improvement in website traffic.
1% boost in conversion rates.
Overall increases in website traffic, lead volume, and qualified leads.

“We grew our website traffic exponentially from 2022 to 2023, and again in 2024,” Weber notes. “A lot of it was the content strategy, like insisting that we start developing a lot of reader-friendly content. This is not a surprise—you can’t have SEO without content, and I’ve been advocating for SEO since I got here.”

Pursuing Passions Leads to Successes

Schweser and Weber’s successes are the direct result of pursuing their passions. Weber’s journey into SEO began in the early days of the internet when he was helping clients with website optimization and using paid search engines to drive results. His passion for search engine optimization has only grown since then, mirroring how important it is for Actian to rank near the top of internet search results.

“If we think about how we want companies to migrate to Actian, we have to know what they’re searching for and we need to have content around that part of the journey,” he explains. “More than 90% of the customer journey involves companies using search engines, so we need to meet them at every step.”

Weber continues to stay ahead of changes in search engine algorithms that impact page rankings. He enjoys seeing Actian place high in search results that feature specific keywords. “We’re number four right now in a search term against 23 billion results,” he notes. “That to me is a thrill—you get to number four or even number one against millions and millions of index pages—and that excitement never gets old.”

Schweser’s journey to having a book published began in 2015 when he was finishing his bachelor’s thesis. A professor, Michael Stiebitz, shared an early version of the book that he was working on with his colleague Bjarne Toft. That draft served as the starting point for Schweser’s master’s thesis and later his Ph.D. thesis. The three collaborated, gathered examples and papers about the theorem from across decades, and coauthored the book.

“In 2020, I was asked to join the book as a coauthor, and of course I accepted,” he relates. “A lot of the research that I was dealing with in my PhD thesis also made it into the book.”

Commitments to End Goals Are Validated

The SEO award is particularly gratifying for Weber because it validates his ongoing efforts at Actian. “It’s meaningful because it shows that our strategy works and that our team’s hard work pays off,” he notes.

He challenges himself and his team to continue evolving their strategy to engage and retain website visitors. “Our play is, ‘How do we bring people to our site? How do we engage them with good content? How do we get them to do the thing that we want them to do?’” he explains. “We have to understand how to acquire, convert, and then retain them over time.”

Weber credits the Actian leadership team, especially CMO Jennifer Jackson, for supporting his efforts, including investing in the tools needed to build and measure the success of a modern website. “This is very much their award too,” he says. “When I see our CMO showcase our site, it makes our work very meaningful.”

For Schweser, the book was a culmination of his ongoing interest and research in Brooks’ Theorem. “There was no comprehensive overview of all the recent trends in graph coloring theory,” he points out. “A lot of people were writing papers, but nobody tried to collect all of them, and nobody was trying to figure out the large trends that exist there. That’s what we did with our book.”

Helping the Next Generation of Employees

One area that Schweser and Weber have in common is their enthusiasm for helping students who are about to enter the workforce. Schweser, along with coworkers, works with interns on Actian projects, while Weber is an adjunct professor for content marketing at the University of California, San Diego.

Schweser is excited about mentoring the next generation of tech talent and has helped guide numerous students through hands-on projects that actively contributed to Actian goals and product releases. Over the last year, his office has mentored about 10 students.

“Along with my colleague Steffen Kläbe, I’m responsible for the German student program at Actian,” he says. “We try to find students from the universities who want to do an internship with us or want to write their thesis in collaboration with Actian. I have always enjoyed working with students, and it’s great that Actian offers us the opportunity to continue doing that here.”

Weber also has experience mentoring college students by serving as an adjunct professor. He teaches students about the value of SEO and how to optimize SEO platforms to drive results. In addition, he has experience working with interns and supporting them as they transition to full-time careers.

Many Paths to Innovation

Actian prides itself on innovation. As Schweser and Weber have demonstrated, there are many ways to innovate and drive success. Having a clear strategy, the right resources, and strong backing leads to exceptional results.

Their achievements reflect Actian’s culture of supporting and valuing all employees’ contributions. Employees’ diverse backgrounds and ability to combine different perspectives ultimately enable outstanding solutions. Whether it’s writing and researching graph theory or creating award-winning SEO strategies, Actian employees show how to achieve innovation in their fields.

About Actian Corporation

Data Integration

Top 5 Data Integration Use Cases for Data Leaders

Dee Radh

May 13, 2024

Summary

Chief Data Officers (CDOs) and Chief Information Officers (CIOs) play critical roles in navigating the complexities of modern data environments. As data grows exponentially and spans across cloud, on-premises, and various SaaS applications, the challenge of integrating and managing this data becomes increasingly daunting. In the guide Top 5 Data Integration Use Cases, we explore five key data integration use cases that empower business users by enabling seamless access, consolidation, and analysis of data. These use cases highlight the significance of robust data integration solutions in driving efficiency, informed decision-making, and overall business success.

Modern organizations face significant data integration challenges due to the exponential growth of cloud-based data. With the surge in projects fueled by cloud computing, IoT, and sophisticated ecosystems, there is an intensified pressure on data integration initiatives. Effective data integration strategies are necessary to leverage data and other technologies across multiple platforms such as SaaS applications, cloud-based data warehouses, and internal systems.

As digital transformations advance, the need for efficient data delivery methods grows, encompassing both on-premises and cloud-based endpoints. Integration capabilities provided as a service have emerged as a robust solution to meet the evolving demands of modern data integration. The widespread adoption of SaaS applications among line-of-business (LOB) users is a significant driver for cloud-based integration solutions. Business users require a straightforward way to exchange data across various SaaS applications, often without IT’s involvement.

However, integrating data stored in apps and enterprise systems typically necessitates IT assistance, creating barriers to data access and causing blind spots in both on-premises and cloud data. Enterprise systems hold crucial data that can provide insights into customer interactions, payments, support issues, and other business areas. Yet, this data is often isolated and safeguarded as mission-critical assets.

For effective integration, a solution is needed that enables secure information sharing with all users, independent of engineering or IT resources. Empowering LOBs to access and integrate data securely and independently helps avoid delays and bottlenecks associated with traditional integration methods. Ensuring critical information is easily accessible to all employees is essential for maintaining a competitive advantage, adapting swiftly to evolving business conditions, and building a data-driven culture.

Below are five typical use cases that can benefit from a modern data platform with self-service data integration:

Data Consolidation and Access

Data platforms with integration capabilities empower business users to access and leverage data stored in data warehouses, as well as on-premises and cloud-based data. Equipped with pre-built connectors, data quality features, and scheduling functions, these platforms minimize IT involvement. Business users can create tailored integration scenarios, effortlessly retrieving pertinent data from various sources, leading to improved decision-making and valuable insights tailored to user needs.

Process Automation

Integration and automation enhance efficiency and streamline operations. Through system integration and task automation, companies can accelerate data processing and analysis, enabling faster access to information. This saves significant time and allows business users to focus on more strategic endeavors. Automation optimizes workflows, minimizes errors, and ultimately improves operational efficiency.

Sales and Marketing Alignment

Integrating CRM systems with marketing automation platforms ensures seamless data flow between sales and marketing teams, optimizing lead management and customer engagement. This integration enhances revenue generation processes and facilitates informed decision-making through real-time tracking and analysis of customer data. By aligning sales and marketing efforts, businesses boost productivity and achieve cohesive goals faster, driving growth and delivering exceptional customer experiences.

Customer 360

Integrating customer data from various touchpoints, such as website interactions, support tickets, and sales interactions, offers a comprehensive understanding of each customer. This holistic view allows marketing teams to personalize activities based on individual customer preferences and behaviors. Integrated data helps identify patterns and trends, maximizing marketing efforts and better controlling budgets. It also enhances customer service, enabling businesses to anticipate and address customer needs effectively.
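As a rough illustration of what a customer 360 view involves, the sketch below groups records from several touchpoints under a shared customer identifier. It assumes every source already carries the same customer_id key, which in practice usually requires matching and cleansing first; the source names, record fields, and the build_customer_360 function are hypothetical.

```python
from collections import defaultdict


def build_customer_360(*sources):
    """Group records from several (source_name, records) pairs by customer_id."""
    profiles = defaultdict(lambda: defaultdict(list))
    for source_name, records in sources:
        for record in records:
            # Store everything except the join key under the source's name.
            details = {k: v for k, v in record.items() if k != "customer_id"}
            profiles[record["customer_id"]][source_name].append(details)
    # Convert nested defaultdicts to plain dicts for easier consumption.
    return {cid: dict(by_source) for cid, by_source in profiles.items()}


# Hypothetical extracts from three touchpoints.
web = [
    {"customer_id": 1, "page": "/pricing"},
    {"customer_id": 2, "page": "/docs"},
]
support = [{"customer_id": 1, "ticket": "login issue"}]
sales = [{"customer_id": 1, "stage": "proposal"}]

view = build_customer_360(("web", web), ("support", support), ("sales", sales))
```

Customer 1 ends up with web, support, and sales activity in one profile, while customer 2 has only web activity. That single combined profile is what lets marketing and service teams see patterns across touchpoints.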

Real-Time Reporting and Analytics

Integrating operational systems with business intelligence (BI) tools empowers business users to access real-time insights and reports, facilitating data-driven decision-making. Real-time reporting and analytics are indispensable for competitiveness in today’s fast-paced market, allowing businesses to react quickly to market changes and improve customer service with up-to-date information.

Data integration is a strategic necessity for organizations aiming to leverage their data effectively. For CDOs and CIOs, investing in robust data integration solutions is not just about addressing immediate challenges but also about laying the foundation for long-term success. By embracing the use cases outlined above, organizations can empower their teams, streamline operations, and drive sustainable growth. Ultimately, a well-integrated data environment enables leaders to make informed decisions, adapt swiftly to changes, and maintain a competitive edge in the marketplace.

For data leaders dealing with data that resides on-premises, in the cloud, and in hybrid environments, downloading the Top 5 Data Integration Use Cases guide is an essential step towards eliminating data silos.

About Dee Radh

Data Management

Modernizing Data Architectures in the Public Sector

Tim Williams

May 7, 2024

In our current digital landscape where trusted and integrated data plays an increasingly critical role for business success, the public sector is facing a significant challenge: how to modernize its data architecture to connect and share data. Strategic modernization is needed to manage the ever-growing volumes of diverse data while ensuring quality, efficient service delivery to meet the changing needs of government employees, citizens, and other stakeholders.

Relying on legacy systems in the public sector can lead to problems such as:

An inability to scale to meet current and future data needs.
A lack of integration capabilities creates barriers to data sharing.
Manual processes cause inefficiencies and increase the risk of errors.
Limited data accessibility leads to delays in data-driven processes.
Analysts don’t trust siloed data, hindering decision-making.
An increased risk of cybersecurity threats and breaches.

To solve these challenges and foster a data-driven culture, public sector organizations must move away from antiquated technologies to a modern, agile infrastructure. This will allow every person and every application that needs timely and accurate data to easily access it.

Embrace Hybrid Cloud Solutions as a First Step

One proven solution to data challenges is to implement hybrid cloud technologies. These technologies span third-party cloud services and on-premises infrastructure. Organizations benefit from the ultra-fast scalability, cost advantages, and efficiency of the cloud while also optimizing on-prem investments.

A hybrid approach lets organizations transition to the cloud at their own pace as part of their modernization efforts, while benefitting from apps or systems that run best on-premises. A gradual migration also helps minimize disruption and maintains data integrity.

For example, in the UK, local councils and even large government organizations are accustomed to siloed systems that require manual input and ongoing employee intervention to bring the silos together. These fragmented systems cause inefficiencies compared to modern and automated processes. This necessitates a shift to responsive systems that can handle organizations’ modern data needs.

Moving to the cloud can be complex because legacy systems are deeply entrenched in operational processes and store essential data. To make the migration as smooth as possible, organizations need to use a hybrid cloud data platform and work with a vendor experienced in data integration.

Make Data Integration and Data Access Completely Seamless

To be a modern and digital-first organization, public sector agencies must have the ability to integrate disparate data sources from a myriad of systems and bring data out of organizational silos. The data must then be made available to employees at all skill levels. Select data also needs to be made available to citizens and other organizations. The data can then be utilized for everything from informing decision-making to forming policies.

Modernizing systems and infrastructure can be more economical, too. Legacy systems may seem financially advantageous in the short term, but over time, maintenance costs, downtime, and barriers to using data will quickly increase the total cost of ownership (TCO). A strategic and well-executed modernization plan supported by advanced data management technologies can reduce overall operational costs, automate processes, gain public trust, and accelerate digital transformation initiatives.

Ongoing modernization efforts should include a plan to integrate advanced technologies such as machine learning, artificial intelligence (AI), and generative AI. This helps public organizations bring together systems and technologies to build a fully connected ecosystem that makes it easy to integrate, manage, and share data, and support new use cases.

It’s worth noting that for AI and GenAI initiatives to be successful, organizations must first ensure their data is ready. This means the data is prepared and has the quality needed to drive trusted outcomes. Training an AI model on inaccurate, untrustworthy data will produce unreliable results.

Take a Future-Looking Approach to Connecting Data

A comprehensive data management strategy enables public sector organizations to predict and quickly respond to changes, make integrated data actionable, and better meet the needs of the public. Like their counterparts in the private sector, public organizations need to prioritize their modernization efforts. They also need to stay current on technological advancements and integrate the ones that meet the specific needs of their organization.

By adopting scalable, secure, and integrated data management solutions, the public sector can pave the way for a more efficient, responsive, connected, and data-driven future. Actian can help with these efforts. The Actian Data Platform allows organizations to easily connect data and build new pipelines. The platform can integrate into an organization’s existing infrastructure to meet their changing needs, including providing real-time data access at scale.

The platform simplifies today’s complex data environment by breaking down silos, providing a unified approach to data, and bringing together data from diverse sources. In addition, the modern platform helps future-proof organizations by offering comprehensive data services spanning data integration, management, and accessibility. These capabilities facilitate a data-driven approach, enabling quick, reliable decisions across the public sector.

About Tim Williams

Tim Williams is an Account Director at Actian, advising organizations on data governance, quality, and real-time analytics. He has a broad range of expertise from enterprise to SMB, with a special focus on public sector challenges. Tim offers best practices on unifying data across systems, presenting at government tech seminars to share success stories. Check out his Actian blog posts for advice on modern data governance and continuous analytics at scale.

Data Intelligence

The Journey to Data Mesh – Part 4 – Federated Computational Governance

Actian Corporation

May 6, 2024

While the literature on data mesh is extensive, it often describes a final state, rarely how to achieve it in practice. The question then arises:

What approach should be adopted to transform data management and implement a data mesh?

In this series of articles, get an excerpt from our Practical Guide to Data Mesh where we propose an approach to kick off a data mesh journey in your organization, structured around the four principles of data mesh (domain-oriented decentralized data ownership and architecture, data as a product, self-serve data infrastructure as a platform, and federated computational governance) and leveraging existing human and technological resources.

Part 1: Scoping Your Pilot Project
Part 2: Assembling a Development Team & Data Platform for the Pilot Project
Part 3: Creating Your First Data Products
Part 4: Implementing Federated Computational Governance

Throughout this series of articles, and in order to illustrate this approach for building the foundations of a successful data mesh, we will rely on an example: that of the fictional company Premium Offices – a commercial real estate company whose business involves acquiring properties to lease to businesses.

In the previous articles of the series, we’ve identified the domains, defined an initial use case, assembled the team responsible for its development, and created our first data products. Now, it’s time to move on to the final data mesh principle, federated computational governance.

What is Federated Computational Governance?

Federated computational governance refers to a system of governance where decision-making processes are distributed across multiple entities or organizations, using computational algorithms and distributed technologies. In this system, decision-making authority is decentralized, with each participating entity retaining a degree of autonomy while collaborating within a broader framework. Federated computational governance’s key characteristics are:

Decentralization: Decision-making authority is distributed among multiple entities rather than concentrated in a single central authority.
Computational Algorithms: Algorithms play a significant role in governing processes, helping to automate decision-making, enforce rules, and ensure transparency and fairness.
Collaborative Framework: Entities collaborate within a broader framework, sharing resources, data, and responsibilities to achieve common goals.
Transparency and Accountability: Using computational algorithms and distributed ledgers can enhance transparency by providing a clear record of processes and ensuring accountability among participating entities.
Adaptability and Resilience: Federated computational governance systems are designed to be adaptable and resilient, capable of evolving and responding to changes in the environment or the needs of participants.

The Challenges of Federated Governance in a Data Mesh

The fourth data mesh principle, federated computational governance, implies that a central body defines the rules and standards that domains must adhere to. Local leaders are responsible for implementing these rules in their domain and providing the central body with evidence of their compliance – usually in the form of reporting.
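As a rough illustration of this reporting loop, the sketch below assumes the central body publishes its rules as simple checks, and each domain runs them against its own data products to produce compliance evidence. The rule names, the DataProduct fields, and the compliance_report function are hypothetical and are not taken from any specific platform.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DataProduct:
    """Minimal, hypothetical descriptor of a data product."""
    name: str
    domain: str
    owner: str = ""
    description: str = ""
    tags: dict[str, str] = field(default_factory=dict)


# Rules defined once by the central body (illustrative examples only).
CENTRAL_RULES: dict[str, Callable[[DataProduct], bool]] = {
    "has_owner": lambda p: bool(p.owner),
    "has_description": lambda p: len(p.description) >= 20,
    "has_sensitivity_tag": lambda p: "sensitivity" in p.tags,
}


def compliance_report(products: list[DataProduct]) -> dict[str, list[str]]:
    """Return, for each data product, the names of the central rules it fails."""
    report = {}
    for product in products:
        failed = [name for name, check in CENTRAL_RULES.items() if not check(product)]
        report[product.name] = failed
    return report
```

A domain could run such a report on a schedule and send the result to the central body, which keeps the rules centralized while leaving their application local.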

Although the model is theoretically simple, its implementation often faces internal cultural challenges. This is particularly the case in heavily regulated sectors, where centralized governance teams are reluctant to delegate all or part of the controls they historically had responsibility for.

Federated governance also runs into an unfavorable reality on the ground: data governance is closely tied to risk management and compliance, two areas that rarely excite operational teams.

Consequently, it becomes difficult to identify local responsible parties or to transfer certain aspects of governance to data product owners – who, for the most part, must already learn a new profession. Therefore, in most large organizations, the federated structure will likely be emulated by the central body and then gradually implemented in the domains as their maturity progresses.

To avoid an explosion of governance costs or fragmentation, Dehghani envisions that the data platform could eventually automatically support entire aspects of governance.

The Aspects of Governance That Can Be Automated

We firmly believe in harnessing automation to address this challenge on multiple fronts:

Quality controls – Many solutions already exist.
Traceability – Development teams can already automatically extract complete lineage information from their data products and document transformations.
Fine-grained access policy management – There are already solutions, all of which rely at least on tagging information.
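To make the tagging idea more concrete, here is a minimal sketch of a tag-based access check. It assumes columns carry tags and that a policy maps each tag to the roles allowed to read it. The column names, tags, and roles are invented for illustration, and the deny-by-default behavior for unknown tags is a design assumption rather than a description of any particular product.

```python
# Hypothetical column tags, as they might be recorded in a catalog.
COLUMN_TAGS = {
    "tenant_name": {"pii"},
    "monthly_rent": {"financial"},
    "building_id": set(),  # untagged columns are treated as unrestricted
}

# Hypothetical policy: which roles may read columns carrying each tag.
POLICY = {
    "pii": {"data_office"},
    "financial": {"data_office", "finance_analyst"},
}


def allowed_columns(role, column_tags, policy):
    """Return the columns a role may read: every tag on a column must allow the role.

    A tag that is missing from the policy allows nobody (deny by default).
    """
    visible = []
    for column, tags in column_tags.items():
        if all(role in policy.get(tag, set()) for tag in tags):
            visible.append(column)
    return visible
```

With these sample values, a finance analyst would see the rent and building columns but not the tenant name. Because the decision depends only on metadata, changing a tag or a policy entry changes access without touching the data itself.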

With a little imagination, one could even envision generative AI analyzing transformation SQL queries and translating them into natural language (solutions exist). The road is long, of course, but decentralization allows for iterative progress, domain by domain, product by product. And let’s also remember that any progress in automating governance, in whatever aspect, relies on the production and processing of metadata.

Premium Offices Example:

At Premium Offices, the Data Office has a very defensive governance culture – as the company operates in the capital market, it is subject to strict regulatory constraints.

As part of the pilot, it was decided not to impact the governance framework. Quality and traceability remain the responsibility of the Data Office and will be addressed retroactively with their tools and methods. Access control will also be its responsibility – a process is already in place, in the form of a ServiceNow workflow (setting permissions on BigQuery requires several manual operations and reviews). The only concession is that the workflow will be modified so that access requests are verified by the Data Product Owner before being approved and processed by the Data Office. In other words, a small step toward federated governance.
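The modified workflow can be pictured as a small state machine in which the Data Office cannot approve a request until the Data Product Owner has verified it. The sketch below is only an illustration of that ordering. It is not ServiceNow or BigQuery code, and the class, method, and status names are hypothetical.

```python
from enum import Enum


class Status(Enum):
    SUBMITTED = "submitted"
    OWNER_VERIFIED = "owner_verified"
    APPROVED = "approved"
    REJECTED = "rejected"


class AccessRequest:
    """Hypothetical access request that enforces owner verification first."""

    def __init__(self, requester, data_product):
        self.requester = requester
        self.data_product = data_product
        self.status = Status.SUBMITTED

    def verify_by_owner(self, accepted):
        # Step 1: the Data Product Owner verifies (or rejects) the request.
        if self.status is not Status.SUBMITTED:
            raise ValueError("only a submitted request can be verified")
        self.status = Status.OWNER_VERIFIED if accepted else Status.REJECTED

    def approve_by_data_office(self):
        # Step 2: the Data Office approves, but only after owner verification.
        if self.status is not Status.OWNER_VERIFIED:
            raise ValueError("the Data Product Owner must verify the request first")
        self.status = Status.APPROVED
```

Calling approve_by_data_office on a freshly submitted request raises an error, which is the whole point: the owner's check becomes a mandatory step rather than a courtesy.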

Regarding metadata, the new tables and views in BigQuery must be documented, at both the conceptual and physical levels, in the central data catalog (which is unaware of the concept of data product). It is a declarative process that the pilot team already knows. Any column tagging will be done by the Data Office after evaluation.

For the rest, user documentation for data products will be disseminated in a dedicated space on the internal wiki, organized by domain, which allows for very rich and structured documentation and has a decent search engine.

The Practical Guide to Data Mesh: Setting up and Supervising an Enterprise-Wide Data Mesh

Written by Guillaume Bodet, our guide was designed to arm you with practical strategies for implementing data mesh in your organization, helping you:

Start your data mesh journey with a focused pilot project.
Discover efficient methods for scaling up your data mesh.
Acknowledge the pivotal role an internal marketplace plays in facilitating the effective consumption of data products.
Learn how the Actian Data Intelligence Platform emerges as a robust supervision system, orchestrating an enterprise-wide data mesh.

Get the eBook.

Data Analytics

Real-Time Analytics for Smarter Decision-Making in Public Services

Tim Williams

April 30, 2024

Consumers and citizens are accustomed to getting instant answers and results from businesses. They expect the same lightning-fast responses from the public sector, too. Likewise, employees at public sector organizations need the ability to quickly access and utilize data—including employees without advanced technical or analytics skills—to identify and address citizens’ needs.

Giving employees the information to meet citizen demand and answer their questions requires public sector organizations to capture and analyze data in real time. Real-time data supports intelligent decision-making, automation, and other business-critical functions.

Easily accessible and trusted data can also increase operational effectiveness, predict risk with greater accuracy, and ultimately increase satisfaction for citizens. That data must be secure while still enabling frictionless sharing between departments for collaboration and use cases.

This naturally leads to a pressing question: how can your organization achieve real-time analytics to benefit citizens and staff alike? The answer, at a foundational level, is to implement a modern, high-performance data platform.

Make Efficient Data Utilization a Priority

Achieving a digital transformation in the public sector involves more than upgrading technology. It entails rethinking how services are delivered, how data is shared, and how your infrastructure handles current and future workloads. Too often in public service organizations, just like with their counterparts in the private sector, legacy systems are limiting the effectiveness of data.

These systems lack the scalability and integration needed to support digital transformation efforts. They also face limitations making trusted data available when and where it’s needed, including availability for real-time data analytics. Providing the data, analytics, and IT capabilities required by modern organizations is only possible with a modern and scalable data platform. This type of platform is designed to integrate systems and operations, capture and share all relevant data to predict and respond quickly to changes, and improve service delivery to citizens.

At the same time, modernization efforts that include a cloud migration can be complex. This is often due to the vast amounts of data that need to be moved to the cloud and the legacy systems entrenched in organizational processes. That’s why you need a clear and proven strategy and to work with an experienced vendor to make the transition seamless while ensuring data quality.

Meet Demand for Real-Time Analytics

Hybrid cloud data platforms have emerged as a proven solution for integrating and sharing data in the public sector. By combining on-premises infrastructure with cloud-based services, these platforms offer the flexibility, scalability, and capability to manage, integrate, and share large data volumes.

Another benefit of hybrid solutions is that they allow organizations to optimize their on-premises investments while keeping cloud costs from spiraling out of control, since unlimited scaling in the cloud comes at a price. Public sector organizations can use a hybrid platform to deliver uninterrupted service, even during peak times or critical events, while making data available in real time for analytics, apps, or other needs.

Smart decision-making demands accurate, trustworthy, and integrated data. This means that upstream, you need a platform capable of seamlessly integrating data and adding new data pipelines—without relying on IT or advanced coding.

Likewise, manual processes and IT intervention will quickly bog down an organization. For example, when a social housing team needs data from multiple systems to ensure buildings meet safety regulations, accessing and analyzing the information might take days or weeks—with no guarantee the data is trustworthy. Automating the pipelines reduces time to insights and ensures data quality measures are in place to catch errors and duplication.
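As a simple illustration of the kind of quality measure an automated pipeline might apply, the sketch below flags records with missing fields and removes duplicates before the data reaches analysts. The function name, the field names, and the sample records are hypothetical. Real pipelines would typically rely on the checks built into their data platform.

```python
def clean_records(records, key, required):
    """Split records into clean rows and rejected rows with a reason.

    A record is rejected if the key or any required field is empty, or if
    its key value has already been seen (duplicate).
    """
    needed = list(dict.fromkeys([key, *required]))  # key is always required
    seen = set()
    clean, rejected = [], []
    for record in records:
        missing = [f for f in needed if record.get(f) in (None, "")]
        if missing:
            rejected.append((record, "missing: " + ", ".join(missing)))
        elif record[key] in seen:
            rejected.append((record, "duplicate"))
        else:
            seen.add(record[key])
            clean.append(record)
    return clean, rejected


# Hypothetical inspection records merged from several housing systems.
inspections = [
    {"building_id": "B1", "inspection_date": "2024-01-10"},
    {"building_id": "B1", "inspection_date": "2024-01-10"},
    {"building_id": "B2", "inspection_date": ""},
]
clean, rejected = clean_records(inspections, "building_id", ["inspection_date"])
```

Keeping the rejected rows together with a reason, rather than silently dropping them, gives the team something to investigate at the source system.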

Data integration is essential to breaking down data silos, providing deeper context and relevancy to data, and ensuring the most informed decisions possible. For example, central government agencies can use the data to drive national policies while identifying issues and needs, and strategically allocating resources.

Expect New Value and Use Cases With Real-Time Analytics

Moving from legacy systems to a modern platform and migrating to the cloud at a pace your organization is comfortable with enables a range of benefits:

Lower long-term costs and total cost of ownership (TCO).
Enhanced service delivery.
Greater trust among data users and the public.
Confidence in the data and analytic insights.
Immediate scalability coupled with increased flexibility.

With a solution like the Actian Data Platform, you can do even more. For example, the platform lets you easily connect, transform, and manage data. The data platform enables real-time data access at scale along with real-time analytics. Public sector organizations can benefit, for instance, by using the data to craft employee benefits programs, housing policies, tax guidelines, and other government programs.

The Actian Data Platform can integrate into your existing infrastructure and easily scale to meet changing needs. The platform makes data easy to use so you can better predict citizen needs, provide more personalized services, identify potential problems, and automate operations.

Taking a modern approach to data management, integration, and quality, along with having the ability to process, store, and analyze even large and complex data sets, allows you to digitally transform faster and be better positioned for intelligent decision-making. As the public sector strives to effectively serve the needs of the public in a cost-effective, sustainable, and responsible way, data-driven decision-making will play a greater role for all stakeholders.

Building a Marketplace for Data Mesh: Domain Data Catalogs – Part 3

Metadata Management in the Context of an Internal Marketplace fed by Domain-Specific Catalogs

Data Catalog vs. EDM Capabilities

The Practical Guide to Data Mesh: Setting up and Supervising an Enterprise-Wide Data Mesh

About Actian Corporation

Your Company is Ready for GenAI. But is Your Data?

The GenAI Foundation: Data Preparation

Garbage In, Garbage Out

Accuracy

Consistency

Completeness

The Keystone: Data Quality

Relevance

Timeliness

Trustworthiness

Real-World Implications

Marketing and Personalization

Product Development

Healthcare and Diagnostics

The Path Forward: Investing in Data Readiness

Data Audits

Data Governance

Advanced Data Preparation Tools

Training and Culture

The Symbiosis of Data and GenAI

About Dee Radh

Actian Ingres 12.0 Enhances Cloud Flexibility and Offers Faster Analytics

A Database Built for Your Modernization Journey

Backup to Cloud and Disaster Recovery

Fortified Protection

Improved Performance and Workload Automation

Elevated Developer Experiences

Leading Database Modernization and Innovation

About Emma McGrattan

Building a Marketplace for Data Mesh: Enterprise-Level Marketplace – Part 2

Three Main Options for Setting up an Internal Data Marketplace

Develop It

Integrate a Solution From the Market

Use Existing Systems

The Drawbacks of Commercial Marketplaces

Actian Data Intelligence Platform’s Enterprise Data Marketplace

About Actian Corporation

Actian Ingres Disaster Recovery

Example

Summary

About Emma McGrattan

Types of Databases, Pros & Cons, and Real-World Examples

Types of Information Stored in Databases

Telecommunications: Verizon

E-commerce: Amazon

Finance: JPMorgan Chase

Healthcare: Mayo Clinic

Types of Databases

1. Relational Databases

2. NoSQL Databases

3. In-Memory Databases

4. Graph Databases

Factors to Consider in Database Selection

1. Data Structure and Type