Data Management

Actian Ingres Disaster Recovery

Emma McGrattan

May 31, 2024

Actian Ingres Disaster Recovery

Most production Actian Ingres installations need some degree of disaster recovery (DR). Options range from shipping nightly database checkpoints to an off-site storage location to maintaining near real-time replication to a dedicated off-site DR site.

Actian Ingres is an enterprise hybrid database that ships with built-in checkpoint and journal shipping features, which provide the basic building blocks for constructing low-cost, efficient DR implementations. One such implementation is IngresSync, which uses Actian Ingres’ native checkpoint/journal shipping and incremental roll-forward capabilities to deliver a cost-effective DR solution.


IngresSync works on the concept of source and target Actian Ingres installations. The source installation is the currently active production environment. The target, or multiple targets if needed, is kept current by an IngresSync job scheduled to execute at a user-defined interval. Each sync operation copies only the journals created since the previous sync and applies those transactions to the targets. Checkpoints taken on the source node are automatically copied to and rolled forward on all targets.

Example

Suppose we have an environment where the production installation is hosted on node corp, and we need to create two DR sites, dreast and drwest.

The DR nodes each need:

  • An Ingres installation at the same version and patch level as corp.
  • Passwordless SSH configured to and from the other nodes (see the sketch after this list).
  • Ingres/Net VNODE entries to the other nodes.
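
For the SSH prerequisite, a minimal sketch follows (the account name ingres, the key type, and the host names are illustrative assumptions; adapt them to your environment). On corp, generate a key pair and copy it to the other nodes:

ssh-keygen -t ed25519
ssh-copy-id ingres@dreast
ssh-copy-id ingres@drwest
ssh ingres@dreast hostname    # verify that a non-interactive login works

Repeat the equivalent steps on dreast and drwest so that every node can reach the others without a password.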


To configure this environment, we must first designate the source and target hosts and apply the latest source checkpoint to the targets.

ingresSync --source=corp --target=dreast,drwest --database=corpdb --iid=II --ckpsync --restart


The two target installations are now synced with the source, and the target databases are in incremental rollforward (INCR_RFP) state. This state allows journals to be applied incrementally to keep the targets in sync with the source. Incremental rollforward is performed by:

ingresSync --hosts=corp,dreast,drwest --database=corpdb --iid=II --jnlsync

When executed, this closes the current journal on the source, copies any new journals to the targets, and rolls those journals forward on the targets. The journal sync step should be configured to execute at regular intervals using the system scheduler, such as cron; frequent execution keeps the sync delay between the source and targets to a minimum.
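
As a purely illustrative sketch (the ingresSync path, the log location, and the 15-minute interval are assumptions to be adapted to your environment), a crontab entry for the journal sync might look like:

# run the journal sync every 15 minutes and append the output to a log
*/15 * * * * /path/to/ingresSync --hosts=corp,dreast,drwest --database=corpdb --iid=II --jnlsync >> /var/log/ingressync.log 2>&1

The shorter the interval, the smaller the window of transactions that has not yet reached the targets at any given moment.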

The target installations at dreast and drwest are now in sync with the source installation at corp. Should the corp environment experience a hardware or software failure, we can designate one of the target nodes as the new source and direct client connections to that node. In this case, we’ll designate drwest as the new source and dreast will remain as a target (DR site).

ingresSync --target=drwest --database=corpdb --iid=II --incremental_done

This takes the drwest corpdb database out of incremental rollforward mode; the database can now execute both read and update transactions and is the new source. The dreast database is still in incremental rollforward mode and will continue to function as a DR target node.


Since the corp node is no longer available, the journal sync job must be started on either drwest or dreast. The journal sync job can be configured and scheduled to execute on all three nodes using the --strict flag, as sketched below. In this case, the job determines whether it is running on the current source node; if so, it executes normally. If it is running on a target, the job simply terminates. This configuration allows synchronization to continue even as node roles change.
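
Continuing the illustrative crontab sketch above (path and interval remain assumptions), the same entry with --strict added can be installed on corp, dreast, and drwest; only the node currently acting as the source performs the sync, while the others exit immediately:

*/15 * * * * /path/to/ingresSync --hosts=corp,dreast,drwest --database=corpdb --iid=II --jnlsync --strict >> /var/log/ingressync.log 2>&1

This avoids having to move or re-enable the scheduler entry after a failover.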

Once corp is back online it can be brought back into the configuration as a DR target.

ingresSync --source=drwest --target=corp --database=corpdb --iid=II --ckpsync --restart


At some point, we may need to revert to the original configuration with corp as the source. The steps are:

  • Terminate all database connections to drwest
  • Sync corp with drwest to ensure corp is current:
    ingresSync --source=drwest --target=corp --database=corpdb --iid=II --jnlsync
  • Reassign node roles:
    ingresSync --target=corp --database=corpdb --iid=II --incremental_done
    ingresSync --source=corp --target=drwest --database=corpdb --iid=II --ckpsync --restart


Summary

IngresSync is one mechanism for implementing a DR solution. It is generally appropriate where some degree of sync delay is acceptable and the target installations have little or no database user activity. Target databases can be used for read-only/reporting applications, with the stipulation that incremental rollforwards cannot run while there are active database connections; the rollforward process will catch up on the first refresh cycle that finds no active connections.

The main pros and cons of the alternative methods of delivering disaster recovery for Actian Ingres are outlined below:

Feature          | Checkpoint Shipping | IngresSync   | Replication
Scope            | Database            | Database     | Table
Granularity      | Database            | Journal      | Transaction
Sync Frequency   | Checkpoint          | User Defined | Transaction
Target Database  | Read/Write (1)      | Read Only    | Read/Write (2)

 

  1. Target database supports read and write operations but all changes are lost on the next checkpoint refresh.
  2. Target database supports read and write operations but there may be update conflicts that require manual resolution.

Note: IngresSync currently runs on Linux and Microsoft Windows. Windows environments require the base Cygwin package and rsync.


About Emma McGrattan

Emma McGrattan is CTO at Actian, leading global R&D in high-performance analytics, data management, and integration. With over two decades at Actian, Emma holds multiple patents in data technologies and has been instrumental in driving innovation for mission-critical applications. She is a recognized authority, frequently speaking at industry conferences like Strata Data, and she's published technical papers on modern analytics. In her Actian blog posts, Emma tackles performance optimization, hybrid cloud architectures, and advanced analytics strategies. Explore her top articles to unlock data-driven success.
Databases

Types of Databases, Pros & Cons, and Real-World Examples

Dee Radh

May 30, 2024


Summary

This blog offers a comprehensive overview of major database models—including relational, NoSQL, in‑memory, graph, and hybrid types—highlighting their strengths, weaknesses, and real-world use cases to guide decision-makers in selecting the right database for their needs.

  • Relational (SQL): Ideal for structured, ACID-compliant workloads—great for transactions and complex queries—but can struggle with horizontal scaling and rigid schema.
  • NoSQL (Document, Key‑Value, Columnar): Offers high flexibility and horizontal scalability for large, unstructured data sets; may sacrifice consistency, require complex modeling, and incur training/development costs.
  • In‑Memory & Graph/Hybrid Models: In‑memory databases deliver ultra-low latency; graph databases simplify relationship-heavy queries. Hybrid systems (like Actian’s) combine OLTP and OLAP strengths for real-world analytic performance.

Databases are the unsung heroes behind nearly every digital interaction, powering applications, enabling insights, and driving business decisions. They provide a structured and efficient way to store vast amounts of data. Unlike traditional file storage systems, databases allow for the organization of data into tables, rows, and columns, making it easy to retrieve and manage information. This structured approach, coupled with data governance best practices, ensures data integrity, reduces redundancy, and enhances the ability to perform complex queries. Whether it’s handling customer information, financial transactions, inventory levels, or user preferences, databases underpin the functionality and performance of applications across industries.

Types of Information Stored in Databases


Telecommunications: Verizon

Verizon uses databases to manage its vast network infrastructure, monitor service performance, and analyze customer data. This enables the company to optimize network operations, quickly resolve service issues, and offer personalized customer support. By leveraging database technology, Verizon can maintain a high level of service quality and customer satisfaction.

E-commerce: Amazon

Amazon relies heavily on databases to manage its vast inventory, process millions of transactions, and personalize customer experiences. The company’s sophisticated database systems enable it to recommend products, optimize delivery routes, and manage inventory levels in real-time, ensuring a seamless shopping experience for customers.

Finance: JPMorgan Chase

JPMorgan Chase uses databases to analyze financial markets, assess risk, and manage customer accounts. By leveraging advanced database technologies, the bank can perform complex financial analyses, detect fraudulent activities, and ensure regulatory compliance, maintaining its position as a leader in the financial industry.

Healthcare: Mayo Clinic

Mayo Clinic utilizes databases to store and analyze patient records, research data, and treatment outcomes. This data-driven approach allows the clinic to provide personalized care, conduct cutting-edge research, and improve patient outcomes. By integrating data from various sources, Mayo Clinic can deliver high-quality healthcare services and advance medical knowledge.

 

Types of Databases


The choice between relational and non-relational databases depends on the specific requirements of your application. Relational databases are ideal for scenarios requiring strong data integrity, complex queries, and structured data. In contrast, non-relational databases excel in scalability, flexibility, and handling diverse data types, making them suitable for big data, real-time analytics, and content management applications.

Image: Types of databases, relational and non-relational (ⓒ Existek)

1. Relational Databases


Strengths

Structured Data: Ideal for storing structured data with predefined schemas
ACID Compliance: Ensures transactions are atomic, consistent, isolated, and durable (ACID)
SQL Support: Uses the widely adopted and well-supported SQL language for querying and managing data

Limitations

Scalability: Can struggle with horizontal scaling
Flexibility: Less suited for unstructured or semi-structured data

Common Use Cases

Transactional Systems: Banking, e-commerce, and order management
Enterprise Applications: Customer Relationship Management (CRM) and Enterprise Resource Planning (ERP) systems

Real-World Examples of Relational Databases

  • MySQL: Widely used in web applications like WordPress.
  • PostgreSQL: Used by organizations like Instagram for complex queries and data integrity.
  • Oracle Database: Powers large-scale enterprise applications in finance and government sectors.
  • Actian Ingres: Widely used by enterprises and public sector organizations, including in the Republic of Ireland.

2. NoSQL Databases


Strengths

Scalability: Designed for horizontal scaling
Flexibility: Ideal for handling large volumes of unstructured and semi-structured data
Performance: Optimized for high-speed read/write operations

Limitations

Consistency: Some NoSQL databases sacrifice consistency for availability and partition tolerance (CAP theorem)
Complexity: Can require more complex data modeling and application logic

Common Use Cases

Big Data Applications: Real-time analytics, IoT data storage
Content Management: Storing and serving large volumes of user-generated content

Real-World Examples of NoSQL Databases

  • MongoDB: Used by companies like eBay for its flexibility and scalability.
  • Cassandra: Employed by Netflix for handling massive amounts of streaming data.
  • Redis: Utilized by X (formerly Twitter) for real-time analytics and caching.
  • Actian Zen: Embedded database built for IoT and the intelligent edge. Used by 13,000+ companies.
  • HCL Informix: Small footprint and self-managing. Widely used in financial services, logistics, and retail.
  • Actian NoSQL: Object-oriented database used by the European Space Agency (ESA).

3. In-Memory Databases


Strengths

Speed: Extremely fast read/write operations due to in-memory storage
Low Latency: Ideal for applications requiring rapid data access

Limitations

Cost: High memory costs compared to disk storage
Durability: Data can be lost if not backed up properly

Common Use Cases

Real-Time Analytics: Financial trading platforms, fraud detection systems
Caching: Accelerating web applications by storing frequently accessed data

Real-World Examples of In-Memory Databases

  • Redis: Used by GitHub to manage session storage and caching.
  • SAP HANA: Powers real-time business applications and analytics.
  • Actian Vector: One of the world’s fastest columnar databases for OLAP workloads.

Combinations of two or more database models are often developed to address specific use cases or requirements that cannot be fully met by a single type alone. Actian Vector blends OLAP principles, relational database functionality, and in-memory processing, enabling accelerated query performance for real-time analysis of large datasets. The resulting capability showcases the technical versatility of modern database platforms.

4. Graph Databases


Strengths

Relationships: Optimized for storing and querying relationships between entities
Flexibility: Handles complex data structures and connections

Limitations

Complexity: Requires understanding of graph theory and specialized query languages
Scalability: Can be challenging to scale horizontally

Common Use Cases

Social Networks: Managing user connections and interactions
Recommendation Engines: Suggesting products or content based on user behavior

Real-World Examples of Graph Databases

  • Neo4j: Used by LinkedIn to manage and analyze connections and recommendations.
  • Amazon Neptune: Supports Amazon’s personalized recommendation systems.

Factors to Consider in Database Selection


Selecting the right database involves evaluating multiple factors to ensure it meets the specific needs of your applications and organization. As organizations continue to navigate the digital landscape, investing in the right database technology will be crucial for sustaining growth and achieving long-term success. Here are some considerations:

1. Data Structure and Type

Structured vs. Unstructured: Choose relational databases for structured data and NoSQL for unstructured or semi-structured data.
Complex Relationships: Opt for graph databases if your application heavily relies on relationships between data points.

2. Scalability Requirements

Vertical vs. Horizontal Scaling: Consider NoSQL databases for applications needing horizontal scalability.
Future Growth: For growing data needs, cloud-based databases offer scalable solutions.

3. Performance Needs

Latency: In-memory databases are ideal for applications requiring high-speed transactions, real-time data access, and low-latency access.
Throughput: High-throughput applications may benefit from NoSQL databases.

4. Consistency and Transaction Needs

ACID Compliance: If your application requires strict transaction guarantees, a relational database might be the best choice.
Eventual Consistency: NoSQL databases often provide eventual consistency, suitable for applications where immediate consistency is not critical.

5. Cost Considerations

Budget: Factor in both initial setup costs and ongoing licensing, maintenance, and support.
Resource Requirements: Consider the hardware and storage costs associated with different database types.

6. Ecosystem and Support

Community and Vendor Support: Evaluate the availability of support, documentation, and community resources.
Integration: Ensure that the database can integrate seamlessly with your existing systems and applications.

Databases are foundational to modern digital infrastructure. By leveraging the right database for the right use case, organizations can meet their specific needs and leverage data as a strategic asset. In the end, the goal is not just to store data but to harness its full potential to gain a competitive edge.


About Dee Radh

As Senior Director of Product Marketing, Dee Radh heads product marketing for Actian. Prior to that, she held senior PMM roles at Talend and Formstack. Dee has spent 100% of her career bringing technology products to market. Her expertise lies in developing strategic narratives and differentiated positioning for GTM effectiveness. In addition to a post-graduate diploma from the University of Toronto, Dee has obtained certifications from Pragmatic Institute, Product Marketing Alliance, and Reforge. Dee is based out of Toronto, Canada.
Data Intelligence

Building a Marketplace for Data Mesh: Facilitating Data Product – Part 1

Actian Corporation

May 28, 2024


Over the past decade, data catalogs have emerged as important pillars in the landscape of data-driven initiatives. However, many vendors on the market fall short of expectations with lengthy timelines, complex and costly projects, bureaucratic data governance models, poor user adoption rates, and low-value creation. This discrepancy extends beyond metadata management projects, reflecting a broader failure at the data management level.

Given these shortcomings, a new concept is gaining popularity: the internal marketplace, or what we call the Enterprise Data Marketplace (EDM).

In this series of articles, get an excerpt from our Practical Guide to Data Mesh where we explain the value of internal data marketplaces for data product production and consumption, how an EDM supports data mesh exploitation on a larger scale, and how they go hand-in-hand with a data catalog solution:

  1. Facilitating data product consumption through metadata
  2. Setting up an enterprise-level marketplace
  3. Feeding the marketplace via domain-specific data catalogs

Before diving into the internal marketplace, let’s quickly go back to the notion of a data product, which we believe is the cornerstone of the data mesh and the first step in transforming data management.

Sharing and Exploiting Data Products Through Metadata

As mentioned in our previous series on data mesh, a data product is a governed, reusable, scalable dataset offering guarantees of data quality and of compliance with various regulations and internal rules. Note that this definition is quite restrictive – it excludes other types of products such as machine learning algorithms, models, or dashboards.

While these artifacts should be managed as products, they are not data products. There are other types of products, which could be very generally termed “Analytics Products”, of which data products are one subset.

In practice, an operational data product consists of two things:

  • Data – Materialized on a centralized or decentralized data platform, guaranteeing data addressing, interoperability, and access security.
  • Metadata – Providing all the necessary information for sharing and using the data.

Metadata ensures consumers have all the information they need to use the product.

It typically covers the following aspects:

  • Schema – Providing the technical structure of the data product, data classification, samples, and their origin (lineage).
  • Governance – Identifying the product owner(s), its successive versions, its possible deprecation, etc.
  • Semantics – Providing a clear definition of the exposed information, ideally linked to the organization’s business glossary and comprehensive documentation of the data product.
  • Contract – Defining quality guarantees, consumption modalities (protocols and security), potential usage restrictions, redistribution rules, etc.

In the data mesh logic, these metadata are managed by the product team and are deployed according to the same lifecycle as data and pipelines. There remains a fundamental question: where can metadata be deployed?

Using a Data Marketplace to Deploy Metadata

Most organizations already have a metadata management system, usually in the form of a Data Catalog.

But data catalogs, in their current form, have major drawbacks:

  • They don’t always support the notion of a data product – it must be more or less emulated with other concepts.
  • They are complex to use – designed to catalog a large number of assets with sometimes very fine granularity, they often suffer from a lack of adoption beyond centralized data management teams.
  • They mostly impose a rigid and unique organization of data, decided and designed centrally – which fails to reflect the variety of different domains or the organization’s evolution as the data mesh expands.
  • Their search capabilities are often limited, particularly for exploratory aspects – it’s often necessary to know what you’re looking for to be able to find it.
  • The experience they offer sometimes lacks the simplicity users aspire to – search with a few keywords, identify the appropriate data product, and then trigger the operational process of an access request or data delivery.

The internal marketplace, or Enterprise Data Marketplace (EDM), is therefore a new concept gaining popularity in data mesh circles. Like a general-purpose marketplace, the EDM aims to provide a shopping-like experience for data consumers. It is thus an essential component for exploiting the data mesh on a larger scale – it gives data consumers a simple and effective system for searching for and accessing data products from various domains.

In our next article, learn the different ways to set up an internal data marketplace, and how it is essential for data mesh exploitation.


About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.
Actian Life

Actian Life: Celebrating Our Author and Our SEO Award Winner

Actian Corporation

May 20, 2024


Actian employees Thomas Schweser, who coauthored a book on graph theory, and Ron Weber, who earned a 2023 SEO Evangelist Edgie award, exemplify Actian’s culture of innovation.

At Actian, we believe that our employees’ achievements are a strong reflection of our vibrant culture and innovative spirit. That holds true whether employees are making breakthroughs and delivering value in their day jobs or finding success in work-adjacent activities.

Today, we’re proud to shine a spotlight on Thomas Schweser, who co-wrote a book on graph theory called “Brooks’ Theorem,” and Ron Weber, who received BrightEdge’s 2023 SEO Evangelist Edgie award. They showcase employee achievements in two highly competitive areas.

Graph Theory Guru and SEO Maestro

Schweser is a research engineer on Actian’s Vector team based in Ilmenau, Germany. His book, published by Springer, focuses on graph coloring and critical graphs, a niche but important area of discrete mathematics.

While graph theory isn’t his primary focus at work, he appreciates its ubiquitous presence in the technology world. “Graphs are everywhere, especially in computer science,” he explains. “They make complex information digestible and help visualize relationships clearly.”

His book offers a valuable resource for those studying or utilizing Brooks’ Theorem—which states a relationship between the maximum degree of a graph and its chromatic number. “The book gives an overview of all the important graph coloring theorems and trends that have occurred over the last decades,” Schweser points out. “It should serve as a nice book if you want to give a college lecture on graph coloring.”

On the U.S. side of the business, Weber is the Senior Director, Web Communications and SEO, for Actian. Based in San Diego, he leads efforts to enable coworkers across the organization to succeed with and leverage SEO-driven content. As soon as he joined Actian a couple years ago, he went right to work on a complete website redesign while creating an aggressive content development schedule and building a formal SEO program from scratch.

The results were immediate and impressive:

  • A whopping 96% improvement in Actian content that comes up on the first page in search results because of strong keywords and robust content.
  • 33% improvement in second page results, and 51% increase in third page results.
  • 7% improvement in website traffic.
  • 1% boost in conversion rates.
  • Overall increases in website traffic, lead volume, and qualified leads.

“We grew our website traffic exponentially from 2022 to 2023, and again in 2024,” Weber notes. “A lot of it was the content strategy, like insisting that we start developing a lot of reader-friendly content. This is not a surprise—you can’t have SEO without content, and I’ve been advocating for SEO since I got here.”

Pursuing Passions Leads to Successes

Schweser and Weber’s successes are the direct result of pursuing their passions. Weber’s journey into SEO began in the early days of the internet, when he was helping clients with website optimization and using paid search engines to drive results. His passion for search engine optimization has only grown since then, which mirrors how important it is for Actian to place near the top of internet search results.

“If we think about how we want companies to migrate to Actian, we have to know what they’re searching for and we need to have content around that part of the journey,” he explains. “More than 90% of the customer journey involves companies using search engines, so we need to meet them at every step.”

Weber continues to stay ahead of changes in search engine algorithms that impact page rankings. He enjoys seeing Actian place high in search results that feature specific keywords. “We’re number four right now in a search term against 23 billion results,” he notes. “That to me is a thrill—you get to number four or even number one against millions and millions of index pages—and that excitement never gets old.”

Schweser’s journey to having a book published began in 2015 when he was finishing his bachelor’s thesis. A professor, Michael Stiebitz, shared an early version of the book that he was working on with his colleague Bjarne Toft. That draft served as the starting point for Schweser’s master’s thesis and later his Ph.D. thesis. The three collaborated, gathered examples and papers about the theorem from across decades, and co-authored the book together.

“In 2020, I was asked to join the book as a coauthor, and of course I accepted,” he relates. “A lot of the research that I was dealing with in my PhD thesis also made it into the book.”

Commitments to End Goals Are Validated

The SEO award is particularly gratifying for Weber because it validates his ongoing efforts at Actian. “It’s meaningful because it shows that our strategy works and that our team’s hard work pays off,” he notes.

He challenges himself and his team to continue evolving their strategy to engage and retain website visitors. “Our play is, ‘How do we bring people to our site? How do we engage them with good content? How do we get them to do the thing that we want them to do?’” he explains. “We have to understand how to acquire, convert, and then retain them over time.”

Weber credits the Actian leadership team, especially CMO Jennifer Jackson, for supporting his efforts, including investing in the tools needed to build and measure the success of a modern website. “This is very much their award too,” he says. “When I see our CMO showcase our site, it makes our work very meaningful.”

For Schweser, the book was a culmination of his ongoing interest and research in Brooks’ Theorem. “There was no comprehensive overview of all the recent trends in graph coloring theory,” he points out. “A lot of people were writing papers, but nobody tried to collect all of them, and nobody was trying to figure out the large trends that exist there. That’s what we did with our book.”

Helping the Next Generation of Employees

One area that Schweser and Weber have in common is their enthusiasm for helping students who are about to enter the workforce. Schweser, along with coworkers, works with interns on Actian projects, while Weber is an adjunct professor for content marketing at the University of California, San Diego.

Schweser is excited about mentoring the next generation of tech talent and has helped guide numerous students through hands-on projects that actively contributed to Actian goals and product releases. Over the last year, his office has mentored about 10 students.

“Along with my colleague Steffen Kläbe, I’m responsible for the German student program at Actian,” he says. “We try to find students from the universities who want to do an internship with us or want to write their thesis in collaboration with Actian. I have always enjoyed working with students, and it’s great that Actian offers us the opportunity to continue doing that here.”

Weber also has experience mentoring college students by serving as an adjunct professor. He teaches students about the value of SEO and how to optimize SEO platforms to drive results. In addition, he has experience working with interns and supporting them as they transition to full-time careers.

Many Paths to Innovation

Actian prides itself on innovation. As Schweser and Weber have demonstrated, there are many ways to innovate and drive success. Having a clear strategy, the right resources, and strong backing leads to exceptional results.

Their achievements reflect Actian’s culture of supporting and valuing all employees’ contributions. Employees’ diverse backgrounds and ability to combine different perspectives ultimately enable outstanding solutions. Whether it’s writing and researching graph theory or creating award-winning SEO strategies, Actian employees show how to achieve innovation in their fields.


About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.
Data Integration

Top 5 Data Integration Use Cases for Data Leaders

Dee Radh

May 13, 2024


Summary

Chief Data Officers (CDOs) and Chief Information Officers (CIOs) play critical roles in navigating the complexities of modern data environments. As data grows exponentially and spans across cloud, on-premises, and various SaaS applications, the challenge of integrating and managing this data becomes increasingly daunting. In the guide Top 5 Data Integration Use Cases, we explore five key data integration use cases that empower business users by enabling seamless access, consolidation, and analysis of data. These use cases highlight the significance of robust data integration solutions in driving efficiency, informed decision-making, and overall business success.


Modern organizations face significant data integration challenges due to the exponential growth of cloud-based data. With the surge in projects fueled by cloud computing, IoT, and sophisticated ecosystems, there is an intensified pressure on data integration initiatives. Effective data integration strategies are necessary to leverage data and other technologies across multiple platforms such as SaaS applications, cloud-based data warehouses, and internal systems.

As digital transformations advance, the need for efficient data delivery methods grows, encompassing both on-premises and cloud-based endpoints. Integration capabilities provided as a service have emerged as a robust solution to meet the evolving demands of modern data integration. The widespread adoption of SaaS applications among line-of-business (LOB) users is a significant driver for cloud-based integration solutions. Business users require a straightforward way to exchange data across various SaaS applications, often without IT’s involvement.

However, integrating data stored in apps and enterprise systems typically necessitates IT assistance, creating barriers to data access and causing blind spots in both on-premises and cloud data. Enterprise systems hold crucial data that can provide insights into customer interactions, payments, support issues, and other business areas. Yet, this data is often isolated and safeguarded as mission-critical assets.

For effective integration, a solution is needed that enables secure information sharing with all users, independent of engineering or IT resources. Empowering LOBs to access and integrate data securely and independently helps avoid delays and bottlenecks associated with traditional integration methods. Ensuring critical information is easily accessible to all employees is essential for maintaining a competitive advantage, adapting swiftly to evolving business conditions, and building a data-driven culture.

Below are five typical use cases that can benefit from a modern data platform with self-service data integration:

Data Consolidation and Access

Data platforms with integration capabilities empower business users to access and leverage data stored in data warehouses, as well as on-premises and cloud-based data. Equipped with pre-built connectors, data quality features, and scheduling functions, these platforms minimize IT involvement. Business users can create tailored integration scenarios, effortlessly retrieving pertinent data from various sources, leading to improved decision-making and valuable insights tailored to user needs.

Process Automation

Integration and automation enhance efficiency and streamline operations. Through system integration and task automation, companies can accelerate data processing and analysis, enabling faster access to information. This saves significant time and allows business users to focus on more strategic endeavors. Automation optimizes workflows, minimizes errors, and ultimately improves operational efficiency.

Sales and Marketing Alignment

Integrating CRM systems with marketing automation platforms ensures seamless data flow between sales and marketing teams, optimizing lead management and customer engagement. This integration enhances revenue generation processes and facilitates informed decision-making through real-time tracking and analysis of customer data. By aligning sales and marketing efforts, businesses boost productivity and achieve cohesive goals faster, driving growth and delivering exceptional customer experiences.

Customer 360

Integrating customer data from various touchpoints, such as website interactions, support tickets, and sales interactions, offers a comprehensive understanding of each customer. This holistic view allows marketing teams to personalize activities based on individual customer preferences and behaviors. Integrated data helps identify patterns and trends, maximizing marketing efforts and better controlling budgets. It also enhances customer service, enabling businesses to anticipate and address customer needs effectively.

Real-Time Reporting and Analytics

Integrating operational systems with business intelligence (BI) tools empowers business users to access real-time insights and reports, facilitating data-driven decision-making. Real-time reporting and analytics are indispensable for competitiveness in today’s fast-paced market, allowing businesses to react quickly to market changes and improve customer service with up-to-date information.

Data integration is a strategic necessity for organizations aiming to leverage their data effectively. For CDOs and CIOs, investing in robust data integration solutions is not just about addressing immediate challenges but also about laying the foundation for long-term success. By embracing the use cases outlined above, organizations can empower their teams, streamline operations, and drive sustainable growth. Ultimately, a well-integrated data environment enables leaders to make informed decisions, adapt swiftly to changes, and maintain a competitive edge in the marketplace.

For data leaders dealing with data that resides on-premises, in the cloud, and in hybrid environments, downloading the Top 5 Data Integration Use Cases guide is an essential step towards eliminating data silos. 


About Dee Radh

As Senior Director of Product Marketing, Dee Radh heads product marketing for Actian. Prior to that, she held senior PMM roles at Talend and Formstack. Dee has spent 100% of her career bringing technology products to market. Her expertise lies in developing strategic narratives and differentiated positioning for GTM effectiveness. In addition to a post-graduate diploma from the University of Toronto, Dee has obtained certifications from Pragmatic Institute, Product Marketing Alliance, and Reforge. Dee is based out of Toronto, Canada.
Data Management

Modernizing Data Architectures in the Public Sector

Tim Williams

May 7, 2024


In our current digital landscape where trusted and integrated data plays an increasingly critical role for business success, the public sector is facing a significant challenge—how to modernize their data architecture to connect and share data. Strategic modernization is needed to manage the ever-growing volumes of diverse data while ensuring quality, efficient service delivery to meet the changing needs of government employees, citizens, and other stakeholders.

Relying on legacy systems in the public sector can lead to problems such as:

  • An inability to scale to meet current and future data needs.
  • A lack of integration capabilities that creates barriers to data sharing.
  • Manual processes that cause inefficiencies and increase the risk of errors.
  • Limited data accessibility that leads to delays in data-driven processes.
  • Siloed data that analysts don’t trust, hindering decision-making.
  • An increased risk of cybersecurity threats and breaches.

To solve these challenges and foster a data-driven culture, public sector organizations must move away from antiquated technologies to a modern, agile infrastructure. This will allow every person and every application that needs timely and accurate data to easily access it.

Embrace Hybrid Cloud Solutions as a First Step

One proven solution to data challenges is to implement hybrid cloud technologies. These technologies span third-party cloud services and on-premises infrastructure. Organizations benefit from the ultra-fast scalability, cost advantages, and efficiency of the cloud while also optimizing on-prem investments.

A hybrid approach lets organizations transition to the cloud at their own pace as part of their modernization efforts, while benefitting from apps or systems that run best on-premises. A gradual migration also helps minimize disruption and maintains data integrity.

For example, in the UK, local councils and even large government organizations are accustomed to siloed systems that require manual input and ongoing employee intervention to bring the silos together. These fragmented systems cause inefficiencies compared to modern and automated processes. This necessitates a shift to responsive systems that can handle organizations’ modern data needs.

Moving to the cloud can be complex because legacy systems are deeply entrenched in operational processes and store essential data. To make the migration as smooth as possible, organizations need to use a hybrid cloud data platform and work with a vendor experienced in data integration.

Make Data Integration and Data Access Completely Seamless

To be a modern and digital-first organization, public sector agencies must have the ability to integrate disparate data sources from a myriad of systems and bring data out of organizational silos. The data must then be made available to employees at all skill levels. Select data also needs to be made available to citizens and other organizations. The data can then be utilized for everything from informing decision-making to forming policies.

Modernizing systems and infrastructure can be more economical, too. Legacy systems may seem financially advantageous in the short term, but over time, maintenance costs, downtime, and barriers to using data will quickly increase the total cost of ownership (TCO). A strategic and well-executed modernization plan supported by advanced data management technologies can reduce overall operational costs, automate processes, gain public trust, and accelerate digital transformation initiatives.

Ongoing modernization efforts should include a plan to integrate advanced technologies such as machine learning, artificial intelligence (AI), and generative AI. This helps public organizations bring together systems and technologies to build a fully connected ecosystem that makes it easy to integrate, manage, and share data, and support new use cases.

It’s worth noting that for AI and GenAI initiatives to be successful, organizations must first ensure their data is ready. This means the data is prepared and has the quality needed to drive trusted outcomes. Training an AI model on inaccurate, untrustworthy data will produce unreliable results.

Take a Future-Looking Approach to Connecting Data

A comprehensive data management strategy enables public sector organizations to predict and quickly respond to changes, make integrated data actionable, and better meet the needs of the public. Like their counterparts in the private sector, public organizations need to prioritize their modernization efforts. They also need to stay current on technological advancements and integrate the ones that meet the specific needs of their organization.

By adopting scalable, secure, and integrated data management solutions, the public sector can pave the way for a more efficient, responsive, connected, and data-driven future. Actian can help with these efforts. The Actian Data Platform allows organizations to easily connect data and build new pipelines. The platform can integrate into an organization’s existing infrastructure to meet their changing needs, including providing real-time data access at scale.

The platform simplifies today’s complex data environment by breaking down silos, providing a unified approach to data, and bringing together data from diverse sources. In addition, the modern platform helps future-proof organizations by offering comprehensive data services spanning data integration, management, and accessibility. These capabilities facilitate a data-driven approach, enabling quick, reliable decisions across the public sector.

Our new eBook “Accelerate a Digital Transformation in the UK Public Sector” offers proven approaches to help organizations meet their need for a modern infrastructure that connects data, ensures quality, and builds trust in the data. The eBook can help the public sector achieve new levels of automation and modernization to enable intelligent growth, faster outcomes, and digital services.


About Tim Williams

Tim Williams is an Account Director at Actian, advising organizations on data governance, quality, and real-time analytics. He has a broad range of expertise from enterprise to SMB, with a special focus on public sector challenges. Tim offers best practices on unifying data across systems, presenting at government tech seminars to share success stories. Check out his Actian blog posts for advice on modern data governance and continuous analytics at scale.
Data Intelligence

The Journey to Data Mesh – Part 4 – Federated Computational Governance

Actian Corporation

May 6, 2024

While the literature on data mesh is extensive, it often describes a final state, rarely how to achieve it in practice. The question then arises:

What approach should be adopted to transform data management and implement a data mesh?

In this series of articles, get an excerpt from our Practical Guide to Data Mesh where we propose an approach to kick off a data mesh journey in your organization, structured around the four principles of data mesh (domain-oriented decentralized data ownership and architecture, data as a product, self-serve data infrastructure as a platform, and federated computational governance) and leveraging existing human and technological resources.

Throughout this series of articles, and in order to illustrate this approach for building the foundations of a successful data mesh, we will rely on an example: that of the fictional company Premium Offices – a commercial real estate company whose business involves acquiring properties to lease to businesses.

In the previous articles of the series, we’ve identified the domains, defined an initial use case, assembled the team responsible for its development, and created our first data products. Now, it’s time to move on to the final data mesh principle, federated computational governance.

What is Federated Computational Governance?

Federated computational governance refers to a system of governance where decision-making processes are distributed across multiple entities or organizations, using computational algorithms and distributed technologies. In this system, decision-making authority is decentralized, with each participating entity retaining a degree of autonomy while collaborating within a broader framework. Federated computational governance’s key characteristics are:

  • Decentralization: Decision-making authority is distributed among multiple entities rather than concentrated in a single central authority.
  • Computational Algorithms: Algorithms play a significant role in governing processes, helping to automate decision-making, enforce rules, and ensure transparency and fairness.
  • Collaborative Framework: Entities collaborate within a broader framework, sharing resources, data, and responsibilities to achieve common goals.
  • Transparency and Accountability: Using computational algorithms and distributed ledgers can enhance transparency by providing a clear record of processes and ensuring accountability among participating entities.
  • Adaptability and Resilience: Federated computational governance systems are designed to be adaptable and resilient, capable of evolving and responding to changes in the environment or the needs of participants.

The Challenges of a Federated Governance in a Data Mesh

The fourth data mesh principle, federated computational governance, implies that a central body defines the rules and standards that domains must adhere to. Local leaders are responsible for implementing these rules in their domain and providing the central body with evidence of their compliance – usually in the form of reporting.

Although the model is theoretically simple, its implementation often faces internal cultural challenges. This is particularly the case in heavily regulated sectors, where centralized governance teams are reluctant to delegate all or part of the controls they historically had responsibility for.

Federated governance also faces a rarely favorable ground reality: data governance is closely linked to risk management and compliance, two areas that rarely excite operational teams.

Consequently, it becomes difficult to identify local responsible parties or to transfer certain aspects of governance to data product owners – who, for the most part, must already learn a new profession. Therefore, in most large organizations, the federated structure will likely be emulated by the central body and then gradually implemented in the domains as their maturity progresses.

To avoid an explosion of governance costs or fragmentation, Dehghani envisions that the data platform could eventually automatically support entire aspects of governance.

The Aspects of Governance That Can be Automated

We firmly believe in harnessing automation to address this challenge on multiple fronts:

  • Quality controls – Many solutions already exist.
  • Traceability – Development teams can already automatically extract complete lineage information from their data products and document transformations.
  • Fine-grained access policy management – There are already solutions, all of which rely at least on tagging information.

With a little imagination, one could even envision generative AI analyzing transformation SQL queries and translating them into natural language (solutions exist). The road is long, of course, but decentralization allows for iterative progress, domain by domain, product by product. And let’s also remember that any progress in automating governance, in whatever aspect, relies on the production and processing of metadata.

Premium Offices Example:

At Premium Offices, the Data Office has a very defensive governance culture – as the company operates in the capital market, it is subject to strict regulatory constraints.

As part of the pilot, it was decided not to impact the governance framework. Quality and traceability remain the responsibility of the Data Office and will be addressed retroactively with their tools and methods. Access control will also be its responsibility – a process is already in place, in the form of a ServiceNow workflow (setting permissions on BigQuery requires several manual operations and reviews). The only concession is that the workflow will be modified so that access requests are verified by the Data Product Owner before being approved and processed by the Data Office. In other words, a small step toward federated governance.

Regarding metadata, the new tables and views in BigQuery must be documented, at both the conceptual and physical levels, in the central data catalog (which is unaware of the concept of data product). It is a declarative process that the pilot team already knows. Any column tagging will be done by the Data Office after evaluation.

For the rest, user documentation for data products will be disseminated in a dedicated space on the internal wiki, organized by domain, which allows for very rich and structured documentation and has a decent search engine.

The Practical Guide to Data Mesh: Setting up and Supervising an Enterprise-Wide Data Mesh

Written by Guillaume Bodet, our guide was designed to arm you with practical strategies for implementing data mesh in your organization, helping you:

  • Start your data mesh journey with a focused pilot project.
  • Discover efficient methods for scaling up your data mesh.
  • Acknowledge the pivotal role an internal marketplace plays in facilitating the effective consumption of data products.
  • Learn how the Actian Data Intelligence Platform emerges as a robust supervision system, orchestrating an enterprise-wide data mesh.

Get the eBook.


About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.
Data Analytics

Real-Time Analytics for Smarter Decision-Making in Public Services

Tim Williams

April 30, 2024


Consumers and citizens are accustomed to getting instant answers and results from businesses. They expect the same lightning-fast responses from the public sector, too. Likewise, employees at public sector organizations need the ability to quickly access and utilize data—including employees without advanced technical or analytics skills—to identify and address citizens’ needs.

Giving employees the information to meet citizen demand and answer their questions requires public sector organizations to capture and analyze data in real-time. Real-time data supports intelligent decision-making, automation, and other business-critical functions.

Easily accessible and trusted data can also increase operational effectiveness, predict risk with greater accuracy, and ultimately increase satisfaction for citizens. That data must be secure while still enabling frictionless sharing between departments for collaboration and use cases.

This naturally leads to a pressing question—How can your organization achieve real-time analytics to benefit citizens and staff alike? The answer, at a foundational level, is to implement a modern, high-performance data platform.

Make Efficient Data Utilization a Priority

Achieving a digital transformation in the public sector involves more than upgrading technology. It entails rethinking how services are delivered, how data is shared, and how your infrastructure handles current and future workloads. Too often in public service organizations, just like with their counterparts in the private sector, legacy systems are limiting the effectiveness of data.

These systems lack the scalability and integration needed to support digital transformation efforts. They also face limitations making trusted data available when and where it’s needed, including availability for real-time data analytics. Providing the data, analytics, and IT capabilities required by modern organizations is only possible with a modern and scalable data platform. This type of platform is designed to integrate systems and operations, capture and share all relevant data to predict and respond quickly to changes, and improve service delivery to citizens.

At the same time, modernization efforts that include a cloud migration can be complex. This is often due to the vast amounts of data that need to be moved to the cloud and the legacy systems entrenched in organizational processes. That’s why you need a clear and proven strategy and to work with an experienced vendor to make the transition seamless while ensuring data quality.

Meet Demand for Real-Time Analytics

Hybrid cloud data platforms have emerged as a proven solution for integrating and sharing data in the public sector. By combining on-premises infrastructure with cloud-based services, these platforms offer the flexibility, scalability, and capability to manage, integrate, and share large data volumes.

Another benefit of hybrid solutions is that they allow organizations to optimize their on-premises investments while keeping costs from spiraling out of control in the cloud—unlimited scaling in the cloud can have costs associated with it. Public sector organizations can use a hybrid platform to deliver uninterrupted service, even during peak times or critical events, while making data available in real time for analytics, apps, or other needs.

Smart decision-making demands accurate, trustworthy, and integrated data. This means that upstream, you need a platform capable of seamlessly integrating data and adding new data pipelines—without relying on IT or advanced coding.

Likewise, manual processes and IT intervention will quickly bog down an organization. For example, when a social housing team needs data from multiple systems to ensure buildings meet safety regulations, accessing and analyzing the information might take days or weeks—with no guarantee the data is trustworthy. Automating the pipelines reduces time to insights and ensures data quality measures are in place to catch errors and duplication.

Data integration is essential to breaking down data silos, providing deeper context and relevancy to data, and ensuring the most informed decisions possible. For example, central government agencies can use the data to drive national policies while identifying issues and needs, and strategically allocating resources.

Expect New Value and Use Cases With Real-Time Analytics

Moving from legacy systems to a modern platform and migrating to the cloud at a pace your organization is comfortable with enables a range of benefits:

  • Lower long-term costs and total cost of ownership (TCO).
  • Enhanced service delivery.
  • Greater trust from data users and the public.
  • Confidence in the data and analytic insights.
  • Immediate scalability coupled with increased flexibility.

With a solution like the Actian Data Platform, you can do even more. The platform lets you easily connect, transform, and manage data, and enables real-time data access and analytics at scale. Public sector organizations can benefit, for instance, by using the data to craft employee benefits programs, housing policies, tax guidelines, and other government programs.

The Actian Data Platform can integrate into your existing infrastructure and easily scale to meet changing needs. The platform makes data easy to use so you can better predict citizen needs, provide more personalized services, identify potential problems, and automate operations.

Taking a modern approach to data management, integration, and quality, along with having the ability to process, store, and analyze even large and complex data sets, allows you to digitally transform faster and be better positioned for intelligent decision making. As the public sector strives to effectively serve the needs of the public in a cost-effective, sustainable, and responsible way, data-driven decision-making will play a greater role for all stakeholders.

The path toward an effective and responsive public sector lies in the power of data and a modern data platform. Our new eBook “Accelerate a Digital Transformation in the UK Public Sector” explains why a shift from legacy technologies to a modern infrastructure is essential for today’s organizations. The eBook shares how local councils and central government organizations can balance the need to modernize with maximizing investments in current on-prem systems, meeting the changing needs of the public, and making decisions with confidence.

Tim Williams

About Tim Williams

Tim Williams is an Account Director at Actian, advising organizations on data governance, quality, and real-time analytics. He has a broad range of expertise from enterprise to SMB, with a special focus on public sector challenges. Tim offers best practices on unifying data across systems, presenting at government tech seminars to share success stories. Check out his Actian blog posts for advice on modern data governance and continuous analytics at scale.
Data Integration

8 Key Reasons to Consider a Hybrid Data Integration Solution

Dee Radh

April 23, 2024

Data Cloud for Hybrid Data Integration

The guide “8 Key Reasons to Consider a Hybrid Data Integration Solution” delves deeply into the complexities of managing vast and diverse data environments, particularly emphasizing the merits for data engineers working with on-premises data systems. This comprehensive guide illustrates how hybrid data integration not only addresses existing challenges but also prepares organizations for future data management needs.

One of the primary challenges highlighted in the guide is data fragmentation. Organizations today often operate on a mix of on-premises and cloud-based systems, leading to isolated data silos that can hinder efficient data utilization and business operations. Hybrid data integration solutions bridge these gaps, enabling seamless communication and data flow between disparate systems. This integration is crucial for supporting robust analytics and business intelligence capabilities, as it ensures that data from various sources can be consolidated into a cohesive, accessible format.

Data Processing, Security, and Compliance

Resource utilization is another significant aspect addressed by hybrid data integration. This approach allows data engineers to optimally allocate computing resources based on specific needs. For example, processing large volumes of data or handling peak loads may be more feasible using cloud resources due to their scalability and cost-effectiveness. Conversely, certain operations might be better suited to on-premises environments due to performance or security considerations. By balancing these resources, organizations can achieve a more efficient and cost-effective operation.

Security and compliance are paramount in data management, especially with the increasing emphasis on data privacy regulations across different regions. Hybrid data integration systems facilitate adherence to such regulations by enabling organizations to store and manage data in a manner that complies with local data sovereignty laws. Furthermore, these systems enhance data security through advanced measures such as encryption and detailed access controls, ensuring that sensitive information is protected across all environments.

Improving Resource Management

Business agility is greatly enhanced through hybrid data integration. The flexibility provided by these systems allows organizations to swiftly adapt to changing business conditions and data requirements. Whether scaling up operations to meet increased demand or integrating new technologies, hybrid solutions offer the agility needed to respond effectively to dynamic market conditions. This adaptability is crucial for maintaining competitiveness and operational efficiency.

Cost management is another benefit of hybrid data integration. By leveraging the specific advantages of both on-premises and cloud environments, organizations can optimize their operational expenditures. Cloud environments, for instance, allow for scaling resources up or down based on current needs, which can significantly reduce costs during periods of low demand. Conversely, maintaining critical operations on-premises can mitigate risks associated with third-party services and fluctuating cloud pricing models.

Building a Better Business Ecosystem

The guide also discusses the importance of digital ecosystems and the integration of services from various external partners. Hybrid data integration supports these efforts by enabling seamless data exchanges and collaborations, thereby expanding business capabilities and facilitating entry into new markets or sectors.

Self-service integration capabilities are emphasized as a key feature of modern hybrid systems. These allow business users and departmental staff to manage their integration tasks directly, which speeds up processes and enhances overall business agility. The empowerment of non-specialist users to perform complex integrations democratizes data handling and accelerates decision-making processes.

Lastly, the guide touches on the future-proof nature of hybrid data integration solutions. As technologies evolve, organizations equipped with hybrid systems can easily integrate new tools and platforms without needing to overhaul their existing data infrastructure. This readiness not only protects the organization’s investment in technology but also ensures it can continuously adapt to the latest innovations and industry standards.

In conclusion, hybrid data integration offers a versatile and strategic solution for managing complex data environments, particularly benefiting organizations that manage a blend of on-premises and cloud-based data systems. The Actian Data Platform, a hybrid data integration solution, addresses the multifaceted challenges presented throughout the guide. For data engineers dealing with data that resides on-premises, in the cloud, and in hybrid environments, downloading the “8 Key Reasons to Consider a Hybrid Data Integration Solution” guide is an essential step toward developing a future-proof data strategy.

dee radh headshot

About Dee Radh

As Senior Director of Product Marketing, Dee Radh heads product marketing for Actian. Prior to that, she held senior PMM roles at Talend and Formstack. Dee has spent 100% of her career bringing technology products to market. Her expertise lies in developing strategic narratives and differentiated positioning for GTM effectiveness. In addition to a post-graduate diploma from the University of Toronto, Dee has obtained certifications from Pragmatic Institute, Product Marketing Alliance, and Reforge. Dee is based out of Toronto, Canada.
Data Intelligence

The Journey to Data Mesh – Part 3 – Creating Your First Data Products

Actian Corporation

April 22, 2024

While the literature on data mesh is extensive, it often describes a final state, rarely how to achieve it in practice. The question then arises:

What approach should be adopted to transform data management and implement a data mesh?

In this series of articles, get an excerpt from our Practical Guide to Data Mesh where we propose an approach to kick off a data mesh journey in your organization, structured around the four principles of data mesh (domain-oriented decentralized data ownership and architecture, data as a product, self-serve data infrastructure as a platform, and federated computational governance) and leveraging existing human and technological resources.

Throughout this series of articles, and in order to illustrate this approach for building the foundations of a successful data mesh, we will rely on an example: that of the fictional company Premium Offices – a commercial real estate company whose business involves acquiring properties to lease to businesses.

In the initial articles of the series, we’ve identified the domains, defined an initial use case, and assembled the team responsible for its development. Now, it’s time to move on to the second data mesh principle, “data as a product,” by developing the first data products.

The Product-Thinking Approach of the Mesh

Over the past decade, domains have often developed a product culture around their operational capabilities. They offer their products to the rest of the organization as APIs that can be consumed and composed to develop new services and applications. In some organizations, teams strive to provide the best possible experience to developers using their domain APIs: search in a global catalog, comprehensive documentation, code examples, sandbox environments, guaranteed and monitored service levels, etc.

These APIs are then managed as products that are born, evolve over time (without breaking compatibility), are enriched, and are eventually deprecated, usually replaced by a newer, more modern, more performant version.

The data mesh proposes to apply this same product-thinking approach to the data shared by the domains.

Data Products Characteristics

In some organizations, this product-oriented culture is already well established. In others, it will need to be developed or introduced. But let’s not be mistaken:

A data product is not a new digital artifact requiring new technical capabilities (like an API Product). It is simply the result of a particular data management approach exposed by a domain to the rest of the organization.

Managing APIs as a product did not require a technological breakthrough: existing middleware did the job just fine. Similarly, data products can be deployed on existing data infrastructures, whatever they may be. Technically, a data product can be a simple file in a data lake with an SQL interface; a small star schema, complemented by a few views facilitating querying, instantiated in a relational database; or even an API, a Kafka stream, an Excel file, etc.
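For illustration only, here is a minimal sketch of the simplest of these forms – a file in a data lake exposed through a SQL view – written in Python with DuckDB as an arbitrary engine; the file path, schema, and column names are hypothetical.

import duckdb

# Connect to a lightweight local database that will hold the product's interface.
con = duckdb.connect("brokerage_products.duckdb")

# The Parquet files are the product's internal storage (hypothetical path)...
con.execute("""
    CREATE OR REPLACE VIEW tenancy_contracts AS
    SELECT tenant_id, lease_start, lease_end, rent_amount
    FROM read_parquet('data_lake/brokerage/tenancy/*.parquet')
""")

# ...and the view is its SQL-addressable public interface.
print(con.execute("SELECT COUNT(*) FROM tenancy_contracts").fetchone())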

A data product is not defined by how it is materialized but by how it is designed, managed, and governed; and by a set of characteristics allowing its large-scale exploitation within the organization.

These characteristics are often condensed into the acronym DATSIS (Discoverable, Addressable, Trustworthy, Self-describing, Interoperable, Secure).

In addition, obtaining a DATSIS data product does not require significant investments. It involves defining a set of global conventions that domains must follow (naming, supported protocols, access and permission management, quality controls, metadata, etc.). The operational implementation of these conventions usually does not require new technological capabilities – existing solutions are generally sufficient to get started.

An exception, however, is the catalog. It plays a central role in the deployment of the data mesh by allowing domains to publish information about their data products, and consumers to explore, search, understand, and exploit these data products.

Best Practices for Data Product Design

Designing a data product is certainly not an exact science – a given use case might call for a single product, or for three or four. To guide this choice, it is once again useful to leverage some best practices from distributed architectures – a data product must:

  • Have a single and well-defined responsibility.
  • Have stable interfaces and ensure backward compatibility.
  • Be usable in several different contexts and therefore support polyglotism.

Data Products Developer Experience

Developer experience is also a fundamental aspect of the data mesh, with the ambition to converge the development of data products and the development of services or software components. It’s not just about being friendly to engineers but also about responding to a certain economic rationality:

The decentralization of data management implies that domains have their own resources to develop data products. In many organizations, the centralized data team is not large enough to support distributed teams. To ensure the success of the data mesh, it is essential to be able to draw from the pool of software engineers, which is often larger.

The state of the art in software development relies on a high level of automation: declarative allocation of infrastructure resources, automated unit and integration testing, orchestrated build and deployment via CI/CD tools, Git workflows for source and version management, automatic documentation publishing, etc.

The development of data products should converge toward this state of the art – and depending on the organization’s maturity, its teams, and its technological stack, this convergence will take more or less time. The right approach is to automate as much as possible using existing and mastered tools, then identify operations that are not automated to gradually integrate additional tooling.
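As a purely hypothetical illustration of what this automation can look like, the following sketch shows a data quality check that could run in a data product’s CI pipeline, using pytest conventions and pandas; the extract function, column names, and sample values are invented.

import pandas as pd

def load_tenancy_extract() -> pd.DataFrame:
    # Stand-in for the pipeline output; in CI this would read a staging table or fixture.
    return pd.DataFrame({
        "tenant_lei": ["LEI-0001", "LEI-0002"],
        "rent_amount": [12000.0, 8500.0],
    })

def test_tenant_identifier_is_present_and_unique():
    df = load_tenancy_extract()
    assert df["tenant_lei"].notna().all(), "tenant_lei must not contain nulls"
    assert df["tenant_lei"].is_unique, "tenant_lei must be unique"

def test_rent_amount_is_positive():
    df = load_tenancy_extract()
    assert (df["rent_amount"] > 0).all(), "rent_amount must be positive"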

In practice, here is what constitutes a data product (a consolidated sketch follows the list):

  1. Code first – For pipelines that feed the data product with data from different sources or other data products; for any consumption APIs of the data product; for testing pipelines and controlling data quality; etc.
  2. Data, of course – But most often, the data already exists in source systems and is simply extracted and transformed by pipelines. It is therefore not present in the source code (with rare exceptions).
  3. Metadata – Some of which document the data product: schema, semantics, syntax, quality, lineage, etc. Others are intended to ensure product governance at the mesh scale – contracts, responsibilities, access policies, usage restrictions, etc.
  4. Infrastructure – Or more precisely, the declaration of the physical resources necessary to instantiate the data product: deployment and execution of code, deployment of metadata, resource allocation for storage, etc.
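The sketch below gathers these four constituents into a single, hypothetical product declaration in Python; the field names and values are illustrative and do not follow any particular standard.

from dataclasses import dataclass, field

@dataclass
class DataProductDescriptor:
    name: str
    domain: str
    owner: str
    # 1. Code: repositories holding pipelines, consumption APIs, and tests.
    code_repositories: list[str] = field(default_factory=list)
    # 2. Data: pointers to the source systems (the data itself is not in source control).
    sources: list[str] = field(default_factory=list)
    # 3. Metadata: schema, semantics, quality, contract, access policies, ...
    metadata: dict = field(default_factory=dict)
    # 4. Infrastructure: declarative resources needed to instantiate the product.
    infrastructure: dict = field(default_factory=dict)

tenancy_analytics = DataProductDescriptor(
    name="tenancy-analytics",
    domain="brokerage",
    owner="data.product.owner@premium-offices.example",
    code_repositories=["git@example.com:brokerage/tenancy-analytics.git"],
    sources=["crm.lease_contracts", "erp.entities"],
    metadata={"update_frequency": "daily", "classification": "internal"},
    infrastructure={"warehouse": "bigquery", "project": "dp-brokerage-tenancy-analytics"},
)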

On the infrastructure side, the data mesh does not require new capabilities – the vast majority of organizations already have a data platform. Nor does implementing the data mesh require a centralized platform. Some companies have already invested in a common platform, and it seems logical to leverage its capabilities to develop the mesh. But others have several platforms, with some entities or domains running their own infrastructure. It is entirely possible to deploy the data mesh on these hybrid infrastructures: as long as the data products respect common standards for addressability, interoperability, and access control, the technical modalities of their execution matter little.

Premium Offices Example:

To establish an initial framework for the governance of its data mesh, Premium Offices has set the following rules:

  • A data product materializes as a dedicated project in BigQuery – this allows setting access rules at the project level, or more finely if necessary. These projects will be placed in a “data products” folder and a sub-folder bearing the name of the domain to which they belong (in our example, “Brokerage”).
  • Data products must offer views to access data – these views provide a stable consumption interface and allow the internal model of the product to evolve without impacting its consumers (see the sketch after this list).
  • All data products must identify data using common references for common data (Clients, Products, Suppliers, Employees, etc.) – this simplifies cross-referencing data from different data products (LEI, product code, UPC, EAN, email address, etc.).
  • Access to data products requires strong authentication based on GCP’s IAM capabilities – using a service account is possible, but each user of a data product must then have a dedicated service account. When access policies depend on users, the end user’s identity must be used via OAuth2 authentication.
  • The norm is to grant access only to views – and not to the internal model.
  • Access requests are processed by the Data Product Owner through workflows established in ServiceNow.
  • DBT is the preferred ETL for implementing pipelines – each data product has a dedicated repository for its pipeline.
  • A data product can be consumed either via the JDBC protocol or via BigQuery APIs (read-only).
  • A data product must define its contract – data update frequency, quality levels, information classification, access policies, and usage restrictions.
  • The data product must publish its metadata and documentation in a marketplace – in the absence of an existing system, Premium Offices decides to document its first data products in a dedicated space on its company’s wiki.
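To make the second rule concrete, here is a minimal sketch – assuming the google-cloud-bigquery Python client, valid GCP credentials, and hypothetical project, dataset, and table names – of a view published as the stable interface over a product’s private star schema.

from google.cloud import bigquery

# Each data product lives in its own GCP project (first rule above).
client = bigquery.Client(project="dp-brokerage-tenancy-analytics")

# Only this view is exposed to consumers; the internal star schema stays private.
client.query("""
    CREATE OR REPLACE VIEW `dp-brokerage-tenancy-analytics.interface.rent_by_tenant` AS
    SELECT t.tenant_lei, d.year, SUM(f.rent_amount) AS total_rent
    FROM `dp-brokerage-tenancy-analytics.internal.fact_leases` f
    JOIN `dp-brokerage-tenancy-analytics.internal.dim_tenant` t ON f.tenant_key = t.tenant_key
    JOIN `dp-brokerage-tenancy-analytics.internal.dim_date` d ON f.date_key = d.date_key
    GROUP BY t.tenant_lei, d.year
""").result()

Access grants would then be applied to the interface views only, in line with the rule of exposing views rather than the internal model.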

This initial set of rules will of course evolve, but it sets a pragmatic framework to ensure the DATSIS characteristics of data products by exclusively leveraging existing technologies and skills. For its pilot, Premium Offices has chosen to decompose the architecture into two data products:

  • Tenancy Analytics – This first data product offers analytical capabilities on lease contracts – entity, parent company, property location, lease start date, lease end date, lease type, rent amount, etc. It is modeled as a small star schema allowing analysis along two dimensions, time and tenant – the dimensions needed to build the first version of the dashboard. It also includes one or two views that leverage the star schema to provide pre-aggregated data – these views constitute the public interface of the data product. Finally, it includes a view to obtain the most recent list of tenants.
  • Entity Ratings – This second data product provides historical ratings of entities in the form of a simple dataset and a mirror view that serves as its interface, in line with the common rules. Ratings are obtained from a specialized provider, which distributes them through an API. To invoke this API, a list of entities must be provided; that list is obtained by consuming the appropriate interface of the Tenancy Analytics product (see the pipeline sketch after this list).
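The interaction between the two products could look like the following hypothetical pipeline sketch, which reads the tenant list from the Tenancy Analytics interface view and then calls a fictional ratings provider endpoint; all names, URLs, and payload shapes are invented for illustration.

import requests
from google.cloud import bigquery

client = bigquery.Client(project="dp-brokerage-entity-ratings")

# Consume the other data product only through its published interface view.
tenants = [
    row["tenant_lei"]
    for row in client.query(
        "SELECT DISTINCT tenant_lei "
        "FROM `dp-brokerage-tenancy-analytics.interface.current_tenants`"
    ).result()
]

# Fetch ratings from the external provider (fictional endpoint and payload).
response = requests.post(
    "https://ratings-provider.example/api/v1/ratings",
    json={"entities": tenants},
    timeout=30,
)
response.raise_for_status()

# Load the ratings into the product's internal dataset, behind its mirror view.
client.load_table_from_json(
    response.json()["ratings"],
    "dp-brokerage-entity-ratings.internal.entity_ratings",
).result()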

In conclusion, adopting the mindset of treating data as a product is essential for organizations undergoing data management decentralization. This approach cultivates a culture of accountability, standardization, and efficiency in handling data across different domains. By viewing data as a valuable asset and implementing structured management frameworks, organizations can ensure consistency, reliability, and seamless integration of data throughout their operations.

In our final article, we will go over the fourth and last principle of data mesh: federated computational governance.

The Practical Guide to Data Mesh: Setting up and Supervising an Enterprise-Wide Data Mesh

Written by Guillaume Bodet, our guide was designed to arm you with practical strategies for implementing data mesh in your organization, helping you:

  • Start your data mesh journey with a focused pilot project.
  • Discover efficient methods for scaling up your data mesh.
  • Acknowledge the pivotal role an internal marketplace plays in facilitating the effective consumption of data products.
  • Learn how the Actian Data Intelligence Platform emerges as a robust supervision system, orchestrating an enterprise-wide data mesh.

Get the eBook.

actian avatar logo

About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.
Data Intelligence

The Journey to Data Mesh – Part 2 – Building a Team & Data Platform

Actian Corporation

April 15, 2024

While the literature on data mesh is extensive, it often describes a final state, rarely how to achieve it in practice. The question then arises:

What approach should be adopted to transform data management and implement a data mesh?

In this series of articles, get an excerpt from our Practical Guide to Data Mesh where we propose an approach to kick off a data mesh journey in your organization, structured around the four principles of data mesh (domain-oriented decentralized data ownership and architecture, data as a product, self-serve data infrastructure as a platform, and federated computational governance) and leveraging existing human and technological resources.

Throughout this series of articles, and in order to illustrate this approach for building the foundations of a successful data mesh, we will rely on an example: that of the fictional company Premium Offices – a commercial real estate company whose business involves acquiring properties to lease to businesses.

In the previous article, we discussed the essential prerequisites for defining the scope of your data management decentralization pilot project, by identifying domains and selecting a use case. In this article, we will explain how to establish its development team and data platform.

Building the Pilot Development Team

As mentioned, the first step in our approach is to identify an initial use case and, more importantly, to develop it by implementing the four principles of data mesh with existing resources. Forming the team responsible for developing the pilot project will help implement the first principle of data mesh: domain-oriented decentralized data ownership.

PREMIUM OFFICES EXAMPLE

The data required for the pilot belongs to the Brokerage domain, where the team responsible for developing the pilot will be created. This multidisciplinary team includes:

  • A Data Product Owner
    • Should have both a good understanding of the business and a strong data culture to fulfill the following responsibilities: designing data products and managing their lifecycle, defining and enforcing usage policies, ensuring compliance with internal standards and regulations, and measuring and overseeing the economic performance and compliance of their product portfolio.
  • Two Engineers
    • One from the Brokerage domain teams – bringing knowledge of operational systems and domain software engineering practices, and the other from the data team – familiar with DBT, GCP, and BigQuery.
  • A visualization developer
    • Who can design and build the dashboard.

Domain Tooling: The Data Platform of the Data Mesh

One of the main barriers to decentralization is the risk of multiplying the efforts and skills required to operate pipelines and infrastructures in each domain. But in this regard, there is also a solid state-of-the-art inherited from distributed architectures.

The solution is to structure a team responsible for providing domains with the technological primitives and tools needed to extract, process, store, and serve data from their domain.

This model has existed for several years for application infrastructures and has gradually become generalized and automated through virtualization, containerization, DevOps tools, and cloud platforms. Although data infrastructure tooling is not as mature as software infrastructure tooling, especially in terms of automation, most solutions are transferable, and the capabilities are already present in organizations as a result of past investments. Nothing, therefore, prevents establishing a data infrastructure team, setting its roadmap, and gradually improving its service offering, with simplification and automation as the main axes of progression.

The Three Planes of the Data Mesh Platform

The data platform for data mesh covers a wide range of capabilities, broader than infrastructure services. This platform is divided into three planes (illustrated in the sketch after the list):

  1. The Data infrastructure provisioning plane – Provides low-level services to allocate the physical resources needed for big data extraction, processing, storage, distribution (real-time or not), encryption, caching, access control, network, co-location, etc.
  2. The Data product developer experience plane – Provides the tools needed to develop data products: declaration of data products, continuous build and deployment, testing, quality controls, monitoring, securing, etc. The idea is to provide abstractions above the infrastructure to hide its complexity and automate the conventions adopted on the mesh scale.
  3. The Data mesh supervision plane – Provides a set of global capabilities for discovering data products, lineage, governance, compliance, global reporting, policy control, etc.
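The layering can be pictured with a deliberately simplified Python sketch – the functions, names, and behavior below are hypothetical and only illustrate how the developer experience plane hides the provisioning plane while feeding the supervision plane.

# 3. Supervision plane: a global catalog of data products (toy in-memory version).
CATALOG: dict[str, dict] = {}

def register_in_catalog(descriptor: dict) -> None:
    CATALOG[descriptor["name"]] = descriptor

# 1. Provisioning plane: low-level, resource-oriented services.
def provision_storage(product_name: str, size_gb: int) -> str:
    return f"{product_name}-bucket-{size_gb}gb"  # pretend a bucket was allocated

# 2. Developer experience plane: a declarative abstraction over the layers below.
def deploy_data_product(descriptor: dict) -> None:
    bucket = provision_storage(descriptor["name"], descriptor.get("size_gb", 10))
    register_in_catalog(descriptor)
    print(f"Deployed {descriptor['name']} with storage {bucket}")

deploy_data_product({"name": "tenancy-analytics", "size_gb": 50})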

On the infrastructure side, the data mesh does not require new capabilities – the vast majority of organizations already have a data platform. Nor does implementing the data mesh require a centralized platform. Some companies have already invested in a common platform, and it seems logical to leverage its capabilities to develop the mesh. But others have several platforms, with some entities or domains running their own infrastructure. It is entirely possible to deploy the data mesh on these hybrid infrastructures: as long as the data products respect common standards for addressability, interoperability, and access control, the technical modalities of their execution matter little.

PREMIUM OFFICES EXAMPLE

Premium Offices has invested in a shared cloud platform – specifically, GCP (Google Cloud Platform). The platform includes experts in a central team who understand its intricacies. For its pilot project, Premium Offices simply chose to integrate one of these experts into the project team. This individual will be responsible for finding solutions to automate the deployment of data products as much as possible, and for identifying manual steps that could be automated later, as well as any missing tools.

In conclusion, establishing a dedicated development team is essential for the success of your data management decentralization pilot project. By bringing together individuals with diverse skills and expertise, organizations can effectively implement the principles of data mesh and drive meaningful insights from their data. Moreover, leveraging existing platforms and investing in automation facilitates the development process, paving the way for scalability and long-term success.

In our next article, learn how to execute your data mesh pilot project through the design and development of your first data products.

The Practical Guide to Data Mesh: Setting up and Supervising an Enterprise-Wide Data Mesh

Written by Guillaume Bodet, our guide was designed to arm you with practical strategies for implementing data mesh in your organization, helping you:

  • Start your data mesh journey with a focused pilot project.
  • Discover efficient methods for scaling up your data mesh.
  • Acknowledge the pivotal role an internal marketplace plays in facilitating the effective consumption of data products.
  • Learn how the Actian Data Intelligence Platform emerges as a robust supervision system, orchestrating an enterprise-wide data mesh.
actian avatar logo

About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.
Data Integration

Actian Data Platform Receives Data Breakthrough Award

Actian Corporation

April 11, 2024

Data Breakthrough Award

Data integration is a critical capability for any organization looking to connect their data—in an era when there’s more data from more sources than ever before. In fact, data integration is the key to unlocking and sustaining business growth. A modern approach to data integration elevates analytics and enables richer, more contextual insights by bringing together large data sets from new and existing sources.

That’s why you need a data platform that makes integration easy. And the Actian Data Platform does exactly that. It’s why the platform was recently honored with the prestigious “Data Integration Solution of the Year” award from Data Breakthrough. The Data Breakthrough Award program recognizes the top companies, technologies, and products in the global data technology market.

Whether you want to connect data from cloud-based sources or use data that’s on-premises, the integration process should be simple, even for those without advanced coding or data engineering skill sets. Ease of integration allows business analysts, other data users, and data-driven applications to quickly access the data they need, which reduces time to value and promotes a data-driven culture.

Access the Autonomy of Self-Service Data Integration

Being recognized by Data Breakthrough, an independent market intelligence organization, at its 5th annual awards program highlights the Actian platform’s innovative capabilities for data integration and our comprehensive approach to data management. With the platform’s modern API-first integration capabilities, organizations in any industry can connect and leverage data from diverse sources to build a more cohesive and efficient data ecosystem.

The platform provides a unified experience for ingesting, transforming, analyzing, and storing data. It meets the demands of your modern business, whether you operate across cloud, on-premises, or in hybrid environments, while giving you full confidence in your data.

With the Actian platform, you can leverage a self-service data integration solution that addresses multiple use cases without requiring multiple products—one of the benefits that Data Breakthrough called out when giving us the award. The platform makes data easy to use for analysts and others across your organization, allowing you to unlock the full value of your data.

Making Data Integration Easy

The Actian Data Platform offers integration as a service while making data integration, data quality, and data preparation easier than you may have ever thought possible. The recently enhanced platform also assists in lowering costs and actively contributes to better decision making across the business.

The Actian platform is unique in its ability to collect, manage, and analyze data in real time with its transactional database, data integration, data quality, and data warehouse capabilities. It manages data from any public cloud, multi or hybrid cloud, and on-premises environments through a single pane of glass.

All of this innovation will be increasingly needed as more organizations—more than 75% of enterprises by 2025—will have their data in data centers across multiple cloud providers and on-premises. Having data in various places requires a strategic investment in data management products that can span multiple locations and have the ability to bring the data together.

This is another area where the Actian Data Platform delivers value. It lets you connect data from all your sources and from any environment to break through data silos and streamline data workflows, making trusted data more accessible for all users and applications.

Try the Award-Winning Platform With a Guided Experience

The Actian Data Platform also enables you to prep your data to ensure it’s ready for AI and helps you use your data to train AI models effectively. The platform can automate time-consuming data preparation tasks, such as aggregating data, handling missing values, and standardizing data from various sources.
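As a generic illustration of these preparation tasks (not the platform’s own API), the snippet below uses pandas to handle missing values, standardize a column, and aggregate the result; the data is invented.

import pandas as pd

orders = pd.DataFrame({
    "region": [" north", "North", "south ", None],
    "amount": [120.0, None, 80.0, 95.0],
})

# Handle missing values: label unknown regions, impute amounts with the median.
orders["region"] = orders["region"].fillna("Unknown")
orders["amount"] = orders["amount"].fillna(orders["amount"].median())

# Standardize values coming from different sources.
orders["region"] = orders["region"].str.strip().str.title()

# Aggregate for downstream analytics or model training.
print(orders.groupby("region", as_index=False)["amount"].sum())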

One of our platform’s greatest strengths is its extreme performance. It offers a 9x speed advantage and 16x better cost savings over alternative platforms. We’ve also made recent updates to improve user friendliness. In addition to using pre-built connectors, you can easily connect data and applications using REST- and SOAP-based APIs that can be configured with just a few clicks.

actian avatar logo

About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.