Dealing with large volumes of data is essential to any organization’s success. But knowing what kind of data it is, where it comes from, and how it can be used is just as important. This is the role of metadata. So, how can companies optimize and enhance it? Follow this guide.

Data is essential for gaining in-depth knowledge of an organization’s market, industry, customers, or products. But to exploit the full potential of this data, it is essential to focus on its metadata. This data about data is a prerequisite for knowing how best to use it. By having a precise view of what generated the data, at what time, and through which source, it becomes possible to contextualize this information. Metadata is, in a way, structured information that describes, explains, locates, or otherwise facilitates access to, use of, or management of an information resource.

However, the role of metadata is not limited to understanding the origin of data.

Properly managed and structured, metadata will also allow organizations to know how to get the most out of the information they have, according to the objectives they’ve set.

How is Metadata Useful?

Metadata is everywhere. Not just in client files or in website archives. When taking a picture with a smartphone, metadata is instantly attached to the image: date, time, location… All this information can be valuable when wanting to create a virtual photo album, for example.
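To make the photo example concrete, here is a minimal Python sketch, using made-up file names and EXIF-style fields already extracted into plain dictionaries, of how metadata alone is enough to assemble a virtual album by grouping shots by month:

```python
from collections import defaultdict

# Hypothetical EXIF-style metadata, as it might be read from photo files.
photos = [
    {"file": "IMG_001.jpg", "datetime": "2023-07-14 18:32", "gps": (48.8566, 2.3522)},
    {"file": "IMG_002.jpg", "datetime": "2023-07-14 21:05", "gps": (48.8584, 2.2945)},
    {"file": "IMG_003.jpg", "datetime": "2023-08-02 09:12", "gps": (43.2965, 5.3698)},
]

def group_by_month(photos):
    """Group photos into album pages by year-month, using only their metadata."""
    albums = defaultdict(list)
    for p in photos:
        albums[p["datetime"][:7]].append(p["file"])  # "YYYY-MM"
    return dict(albums)

albums = group_by_month(photos)
# Two album pages: one for July 2023, one for August 2023.
```

The pictures themselves are never opened; the metadata alone carries enough context to organize them.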

It’s the same in the context of a company’s data project.

While metadata is necessary to truly understand where data comes from and how it can be used, its usefulness does not stop there. In fact, when properly managed, metadata is a major lever for organizations seeking to structure and enhance their information on a daily basis. Optimal metadata management is therefore the foundation of any data-driven transformation project.

The Different Types of Metadata

While the generic term “metadata” designates any information describing data, it is important to know that metadata can be classified into different types.

Thus, it is important to distinguish between descriptive metadata, which presents a resource in a way that facilitates the identification of the available data, and structural metadata. The latter provides information on the composition or organization of a data resource. To describe a data portfolio, there is also administrative metadata, which provides information on the date of creation or acquisition of the data, but also on its associated permissions, lifespan, and use. 
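As an illustration only, here is one way these three generic types could be recorded for a single dataset; the field names are assumptions chosen for the example, not a standard:

```python
# Illustrative only: one way to record the three generic metadata types
# for a single dataset (field names are assumptions, not a standard).
dataset_metadata = {
    "descriptive": {          # identifies the available resource
        "title": "Monthly sales by region",
        "description": "Aggregated sales figures per region and month",
        "keywords": ["sales", "region", "monthly"],
    },
    "structural": {           # composition and organization of the resource
        "format": "parquet",
        "columns": ["region", "month", "revenue"],
        "partitioned_by": "month",
    },
    "administrative": {       # creation date, permissions, lifespan, use
        "created": "2024-01-15",
        "owner": "sales-analytics",
        "retention_days": 730,
        "access": ["analyst", "data-steward"],
    },
}
```

Each block answers a different question: what the resource is, how it is organized, and how it may be used.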

Alongside this generic metadata is a wide range of other types. They provide context on the application and business uses of information, cover technical aspects, or reinforce the descriptive dimension of the information.

The larger a company’s volume of data, and the more varied its data acquisition and collection sources, the more it will benefit from fine-tuned metadata management.

What Tools Manage Metadata?

To organize and optimize metadata use for all employees, it is essential to use a Data Catalog. Through this metadata management solution, organizations are able to index their data and metadata as well as quickly identify the sources of information that are available to data teams. But a Data Catalog’s mission goes even further: it will enable companies to reference all their data assets, facilitate data access when needed, and perform thematic searches.

Indeed, the quality of this metadata conditions the quality of a data description, with a direct impact on its visibility and ease of use. 

We’ve identified three types of metadata within our data catalog:

  • Technical Metadata: It describes the structure of a dataset and the information related to storage systems.
  • Business Metadata: It applies business context to datasets: descriptions (context and usage), owners and referents, tags, and properties, in order to create a taxonomy over the datasets that will be indexed by our search engine. Business metadata is also present at the schema level of a dataset: descriptions, tags, or data confidentiality level per column.
  • Operational Metadata: It allows us to understand when and how the data was created or transformed: statistical analysis of the data, date of update, origin (lineage), volume, cardinality, the identifier of the processes that created or transformed the data, the status of those processes, etc.
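The three types above could be sketched as a single catalog entry; the class and field names below are illustrative assumptions, not the platform’s actual schema:

```python
from dataclasses import dataclass, field

# Hypothetical shapes for the three metadata types described above.
@dataclass
class TechnicalMetadata:
    storage_system: str
    schema: dict                     # column name -> type

@dataclass
class BusinessMetadata:
    description: str
    owner: str
    tags: list = field(default_factory=list)

@dataclass
class OperationalMetadata:
    last_updated: str
    row_count: int
    produced_by: str                 # identifier of the pipeline that built it

@dataclass
class CatalogEntry:
    name: str
    technical: TechnicalMetadata
    business: BusinessMetadata
    operational: OperationalMetadata

entry = CatalogEntry(
    name="orders",
    technical=TechnicalMetadata("s3://warehouse/orders", {"order_id": "int", "amount": "float"}),
    business=BusinessMetadata("Customer orders", "sales-team", ["pii-free"]),
    operational=OperationalMetadata("2024-03-01", 1_250_000, "etl_orders_daily"),
)
```

Keeping the three facets separate makes it clear who maintains what: engineers own the technical block, business referents the business block, and pipelines emit the operational block.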

Airbnb is a burgeoning enterprise. To keep pace with its rapid expansion, Airbnb needed to rethink its data and the scaling of its operations. From this momentum the Data Portal was born, a fully Data-Centric tool at the disposal of employees.

This article is the first of a series dedicated to Data-Centric enterprises. We will shed light on successful examples of democratization and the mastery of data within inspiring organizations. These pioneering enterprises demonstrate the ambition of the Actian Data Intelligence Platform’s data catalog: to help each structure better understand and use their data assets.

Airbnb Today:

In a few years, Airbnb has secured its position as a leader of the collaborative economy around the world. Today, they are among the top hoteliers on the planet. In numbers [1], they represent:

  • 3 million recorded homes.
  • 65,000 registered cities.
  • 190 countries with Airbnb offers.
  • 150 million users.

France is its second-largest market behind the United States. It alone accounts for more than 300,000 homes.

The Reflections That Led to the Data Portal

During a conference held in May 2017, John Bodley, a data engineer at Airbnb, outlined new issues arising from the rapid growth in employees (more than 3,500) and the massive increase in the amount of data, from users as well as employees (more than 200,000 tables in their Data Warehouse). The result was a confusing and fragmented landscape in which increasingly important information was not always accessible.

How could Airbnb reconcile its success with a very real data management problem? What should be done with all the information collected daily, and with the knowledge accumulated at both the user and employee level? How could it be turned into an asset for all Airbnb employees?

Those are the questions that led to the creation of the Data Portal. Beyond these challenges, the company also faced a problem of overall vision.

Since its creation in 2008, Airbnb has always paid great attention to its data and operations. This is why a dedicated team set out to develop a tool that democratizes data access within the enterprise. Their work drew both on analysts’ knowledge and ability to identify the critical points, and on engineers who offered a more concrete vision of the whole. At the heart of the project, an in-depth survey of employees and their problems was conducted.

From this survey, one constant emerged: difficulty finding the information employees need in order to work. The presence of tribal knowledge, held by a small group of people, is both counter-productive and unreliable.

The result: employees had to ask colleagues for information, trust in the data was low (its validity was uncertain, and it was impossible to know whether it was up-to-date), and, consequently, new but duplicate data was created, dramatically inflating the already existing volume.

To respond to these challenges, Airbnb created the Data Portal and released it to the public in 2017.

Data Portal, Airbnb’s Data Catalog

To give you a clear picture, the Data Portal could be defined as a cross between a search engine and a social network.

It was designed to centralize absolutely all the data the enterprise collects, whether it comes from employees or users. The goal of the Data Portal is to return this information, in graphic form, to whichever employee needs it.

This self-service system allows employees to access, by themselves, the information necessary for their projects. Beyond the data itself, the Data Portal provides contextualized metadata. The information comes with background that makes it easier to value the data and understand it as a whole.

The Data Portal was designed with a collaborative approach.

With this in mind, it makes visible the interactions between the enterprise’s different employees around data. Thus, it is possible to know who is connected to which data.

The Data Portal and a few of its Features

The Data Portal offers different features for accessing data in a simple and engaging way, offering the user an optimal experience. Each dataset has a dedicated page displaying a significant amount of metadata linked to it.

  • Search: Chris Williams, an engineer and a member of the team in charge of developing the tool, speaks of a “Google-esque” feature. The search page provides quick access to data, to charts, and to the relevant people, groups, or teams behind the data.
  • Collaboration: With an all-in sharing approach, data can be added to a user’s favorites, pinned on a team’s board, or shared via an external link. Just like a social network, each employee also has a profile page. As the tool is accessible to all employees and intended to be completely transparent, it includes every member of the hierarchy. Former employees continue to have a profile listing all the data they created and used, always with the aim of decompartmentalizing information and doing away with tribal knowledge.
  • Lineage: It is also possible to explore a dataset’s hierarchy by viewing both parent and child data.
  • Groups: Teams spend a lot of time working with the same data. To enable everyone to share information more quickly and easily, the ability to create working groups was added to the Data Portal. Thanks to these pages, a team’s members can organize their data, access it easily, and encourage sharing.

Within the Tool

Democratizing data has several virtues. First, it avoids creating dependence on a few information holders. If the information and the understanding of data are held by only one group of people, the dependency ratio becomes too high and the enterprise’s equilibrium is weakened.

In addition, it is important to simplify the understanding of data so that employees can make better use of it.

Globally speaking, the challenge for Airbnb is also to improve all employees’ trust in data, so that everyone can be sure they are working with correct, up-to-date information.

Airbnb is under no illusions: the team behind the Data Portal knows that adopting this tool and using it wisely will take time. Chris Williams put it this way: “Even if asking a colleague for information is easy, it is totally counterproductive on a larger scale.”

Changing these habits, and taking the first step of consulting the portal rather than asking a colleague directly, will require some effort from employees.

The Vision of the Data Portal Over Time

To promote trust in the supplied data, the team wants to create a system of data certification. It would make it possible to certify both the data and the person who initiated the certification. Certified content will be highlighted in the search results.

Over time, Airbnb hopes to develop this tool at different levels:

  • Analyze the network in order to identify obsolete data.
  • Create alerts and recommendations. Keeping its exploratory approach, the tool could become more intuitive, suggesting new content or updates for data a user has accessed.
  • Make data enjoyable. Create an appealing setting for employees by presenting, for example, the most viewed chart of the month.

With the Data Portal, Airbnb pushes the use of data to the highest level.

Democratizing data access makes all employees more autonomous and efficient in their work, and also reshapes the enterprise’s hierarchy. With more transparency, the company becomes less dependent on a few individuals. Collaboration takes precedence over the notion of dedicated services, and the use of data reinforces the enterprise’s strategy for its future development: a logical approach that Airbnb is both part of and promotes among its customers.

Sources


Data stewards are the first point of reference for data and serve as an entry point for data access. They have the technical and business knowledge of data, which is why they are often called the “masters of data” within an organization! As the true guardians of data, let’s discover their roles, missions, and responsibilities.

Faced with the challenges of data exploitation and optimization, organizations are in need of specialists who can combine their actions with their knowledge of data.

In a recent article, we discussed the prerogatives and differences between Data Engineers and Data Architects. We also deciphered the missions of a Data Analyst, a Data Product Manager, and a Chief Data Officer. All of these specialists have the mission of making data speak, of giving it life, either by organizing it, by defining a strategy, or by manipulating it. To do so, they all have a common requirement: to work with quality data. 

This is the essential mission of the Data Steward, who is responsible for the quality of the data, which ultimately conditions all of the processes and decisions of a company’s data strategy.

The Data Steward’s Multiple Skills

To do so, a Data Steward must have strong communication skills and be able to distinguish the different types and formats of data. 

Acting as a point of convergence for all the data generated and used in the company, they must also ensure constant vigilance over the quality of their data in order to identify the priority data that needs to be cleaned or standardized.  

Versatile and multi-skilled, the Data Steward is considered the key contact for an organization in terms of data. So much so that they are often called the “master of data”. In order to live up to Data Stewardship requirements, this expert must be present on all fronts, as they play a central role in the proper implementation of a data strategy.

What is the Role of the Data Steward in the Company?

Companies are reorganizing around their data to produce value and finally innovate from this raw material. Data Stewards are there to orchestrate the data in the company’s information systems. They must ensure the proper documentation of the data and facilitate their availability to their users such as Data Scientists or Project Managers, for example.

The essential role of the Data Steward is to supervise the life cycle of all available data and to ensure that its quality remains optimal. Behind the notion of data quality, there is also that of availability. The Data Steward, through their data quality missions, also contributes to ensuring that business teams can easily access the data they need.

To give the notion of Data Stewardship its full meaning and scope, the “master of data” must be able to play the role of the bridge between the data and business teams. Working closely with the business lines and in constant partnership with the IT teams, the Data Steward not only helps to identify and collect data but also to validate and structure it. Their communication skills enable them to identify the people responsible for and knowledgeable about the data, to collect the associated information in order to centralize it and perpetuate this knowledge within the company. More specifically, Data Stewards provide metadata knowledge; a structured set of information describing a dataset. They transform this abstract data into concrete assets for the business.

Although there is no specific training for the Data Steward profession, the most commonly sought-after profile is that of an expert business user, familiar with data management techniques and data processing.

What are the Data Steward’s Responsibilities?

The Data Steward must fulfill a wide range of missions. In particular, they must deal with the day-to-day management of data in the broadest sense of the term, ensuring that the processes for collecting and processing information are fluid. Finding and knowing the data, imposing a certain discipline in the management of metadata, and facilitating their availability to employees: these are just some of the issues that Data Stewards must address.

Once the data is collected, it is the Data Steward who is responsible for optimizing its storage and transmission to the business teams, after having created the conditions for indexing the data. As one of the key players in ensuring data quality, the Data Steward has another critical task: cleaning up the data by removing duplicates and eliminating useless information. To accomplish this, the Data Steward must ensure that the documentation of the data they manage is up-to-date.

Finally, as the Data Steward is also responsible for providing data access for all teams, they constantly monitor the security of their data assets, with regard to both external threats and internal dangers (particularly blunders by certain employees). Operational supervision of data, coordination of data documentation, compliance, and risk management: the Data Steward is a multi-faceted player who contributes to optimized data governance.

 

These last few months, it has become more and more difficult to attend a meeting without hearing the expression data governance. However, this subject is nothing new! Be that as it may, with the arrival of Big Data technologies, data and its use have become the cornerstone of approaches to innovation. An old subject evolving in a very new context.

Data and Governance: One Cannot be Without the Other

The data craze of recent years is such that enterprises invest a lot of time and money to break down data silos and unify their assets thanks to new, ever more efficient, and less costly storage infrastructures.

Nevertheless, enterprises understood rather quickly that the promise of innovating through data was going to be much more complicated than expected. Despite the latest technological advancements, data remains scattered across the enterprise, weighed down by legacy systems. The new storage systems are ultimately “only” additional technical stacks in the enterprise’s IS landscape; on their own, they cannot manage the data life cycle, guarantee the rules enabling the best use of data, and thus maximize the creation of data value. This is where data governance comes in.

The Objectives of Data Governance

In the pursuit of innovation, enterprises are rethinking their organizations to move towards a “data-driven” culture. Information systems must become the strong arm of the business, placing refined, secure, quality data at the center of strategic decisions.

To achieve this transformation, organizations construct what we call data governance. This project pursues quite clear objectives, among others:

  • Ensure metadata management (technical, operational, or even business) and data documentation.
  • Simplify data access and facilitate data use by as many employees as possible.
  • Ensure data quality and integrity.
  • Manage data security: Supervise data collection and their use, especially when it comes to personal data.

An Agile Data Governance Strategy

The way to approach the subject of data governance is evolving. Our experiences have brought us to promote data governance based on the following four pillars:

  • Non-Invasive and Post Hoc: Data governance should not be an obstacle to innovation in your enterprise. Collecting and aggregating metadata from an enterprise’s datasets after their creation or update through the various pipelines means you do not interfere with the owners of datasets or their users.
  • Automatic and Connected: Automating the collection of metadata and governance KPIs allows your tools to accurately reflect reality. Moreover, this automation guarantees that governance stays up-to-date and supports scaling up.
  • Bottom-Up and Collaborative: A bottom-up data governance strategy puts individuals and their interactions before processes and tools. An approach to data governance can only succeed if it involves all the employees in an organization, thus benefiting from collective intelligence.
  • Iterative: Construct data governance in stages to match the company’s expectations and operations as closely as possible. Adaptation to change must be at the heart of an enterprise’s data governance strategy.

Such an approach can be successful where many larger “data governance” initiatives have failed.

Agile Data Governance Conclusion

Just as software development has gradually shifted away from traditional methods (V-model, Waterfall, etc.) toward agile methods, data governance must be rethought.

Such an approach, both iterative and incremental, gives your data governance strategy the flexibility needed to take into account the ever-increasing complexity of your IS.


It is no secret that the enormous volumes of information that companies generate require the right tools in order to correctly manage them. Indeed, with great data comes great responsibility! For organizations to truly profit from their data, it is essential to be equipped with a solution that enables data-driven people to easily find, discover, manage, and above all, trust in their information assets.

A data catalog, created to unify all enterprise data, enables data managers and users to improve productivity and efficiency when working with their data.

In 2017, Gartner declared data catalogs as “the new black in data management and analytics”. In “Augmented Data Catalogs: Now an Enterprise Must-Have for Data and Analytics Leaders,” they state:

“The demand for data catalogs is soaring as organizations continue to struggle with finding, inventorying, and analyzing vastly distributed and diverse data assets.”

In this article, we will share everything there is to know about data catalogs for companies seeking to truly become data-driven.  

What Exactly is a Data Catalog?

Before getting into the subject of data cataloging, it is important to understand the concept of metadata management. A data catalog uses metadata – data on data – to create a searchable repository of all enterprise information assets. This metadata, collected from various data sources (Big Data, Cloud services, Excel sheets, etc.), is automatically scanned to enable users of the catalog to search for their data and get information such as the availability, freshness, and quality of a data asset.

Therefore, by definition, a data catalog has become a standard for efficient metadata management. We broadly define a data catalog as being:

“A detailed inventory of all data assets in an organization and their metadata, designed to help data professionals quickly find the most appropriate data for any analytical business purpose.”

What is the Purpose of a Data Catalog?

Topics on data are still considered to be an extremely technical domain. However, data innovation is only possible if it is shared by as many people as possible. This is the very purpose of a data catalog: to democratize data access.

A data catalog is meant to serve different people or end-users. All of these end-users – data analysts, data stewards, data scientists, business analysts, and many more – have different expectations, needs, profiles, and ways of understanding data. As more and more people use and work with data, a data catalog must adapt to all of them. In fact, data catalogs require no technical expertise to search for, discover, and understand a company’s data landscape.

What are the Benefits of a Data Catalog?

As mentioned above, a data catalog centralizes and unifies the metadata collected so that it can be shared with IT teams and business functions. This unified view of data allows organizations to:

Accelerate Data Discovery

As thousands of datasets and assets are being created each day, enterprises find themselves struggling to understand and gain insights from their information to create value. Many recent surveys still state that data science teams spend 80% of their time preparing and tidying their data instead of analyzing and reporting it. By deploying a data catalog, the speed of data discovery can increase up to 5 times. This way, data teams can focus on what’s important: delivering their data projects on time.

Sustain a Data Culture

Just like organizational or corporate culture, data culture refers to a workplace environment where decisions are grounded in empirical data. A data catalog allows data knowledge to no longer be limited to a group of experts: it enables organizations to better collaborate on their information assets.

Build Agile Data Governance

Instead of deploying overly complex processes, too difficult to maintain and based on assumed information, data catalogs enable a bottom-up, agile data governance approach. A data catalog enables data users to create a data process registry, document legal obligations, track the lifecycle of data, and identify sensitive information, all in a single centralized repository.

Maximize the Value of Data

By collecting all the data of an enterprise on a reference data tool, it becomes possible to cross-reference these assets and get value from them more easily. The collaboration of technical and professional teams within the data catalog enables innovations that meet proven market needs.

Produce Better and Faster

More than 70% of the time dedicated to data analysis is invested in data-wrangling activities. Cataloging simplifies data retrieval and the identification of associated contacts, and therefore speeds up data-driven decision-making.

Ensure Good Control Over Data

When data is misinterpreted or erroneous, enterprises expose themselves to the risk of basing their decisions on incorrect information. Connected data catalogs provide access to always up-to-date data, so data users can ensure that data and its information are correct and usable.

What are a Data Catalog’s Key Features to Look Out For?

A Flexible and Adaptable Metamodel Template

A data catalog should automatically capture and update metadata from an enterprise’s data sources. Through a flexible metamodel template, the data catalog’s administrator should be able to add, configure, and overlay documentation properties on cataloged datasets. This approach offers a simple and modular way to configure documentation templates according to the enterprise’s objectives and priorities.
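As a sketch of the idea, assuming a hypothetical property-list format invented for this example, a metamodel template might declare which documentation properties are required, letting the catalog flag incomplete documentation:

```python
# Hypothetical documentation template an administrator might configure:
# each cataloged dataset must (or may) carry these properties.
metamodel_template = {
    "dataset": {
        "properties": [
            {"name": "description", "type": "text", "required": True},
            {"name": "owner",       "type": "user", "required": True},
            {"name": "confidentiality", "type": "enum",
             "values": ["public", "internal", "restricted"], "required": True},
            {"name": "refresh_schedule", "type": "text", "required": False},
        ]
    }
}

def missing_required(doc, template):
    """Return the required properties absent from a dataset's documentation."""
    required = [p["name"] for p in template["dataset"]["properties"] if p["required"]]
    return [name for name in required if name not in doc]

missing = missing_required({"description": "Orders table", "owner": "sales"},
                           metamodel_template)
# The dataset's documentation is missing its confidentiality level.
```

Because the template is just data, the administrator can extend or reconfigure it without changing the validation logic.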

A Smart Search Engine

One of the core features of a data catalog is its search engine. All indexed metadata should be searchable via a search bar. Through simple keyword searches, a data catalog should be able to show the most accurate results for a query. It should also enable users to filter their search results. A smart search engine also optimizes results based on the user’s profile and preferences, enabling users to quickly find their information assets.
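A toy illustration of the principle, with an invented three-asset catalog (real catalog search engines add full-text indexing and relevance ranking, deliberately omitted here): keyword matching over indexed metadata, plus a tag filter:

```python
# A tiny, invented metadata index.
catalog = [
    {"name": "orders",   "description": "customer orders by region", "tags": ["sales"]},
    {"name": "sessions", "description": "web sessions per user",     "tags": ["web"]},
    {"name": "refunds",  "description": "refunded customer orders",  "tags": ["sales", "finance"]},
]

def search(query, tag=None):
    """Naive keyword search over names and descriptions, with an optional tag filter."""
    words = query.lower().split()
    hits = []
    for asset in catalog:
        text = (asset["name"] + " " + asset["description"]).lower()
        if all(w in text for w in words) and (tag is None or tag in asset["tags"]):
            hits.append(asset["name"])
    return hits

results = search("customer orders", tag="sales")
# Matches both "orders" and "refunds"; "sessions" contains neither keyword.
```

Filters over tags and properties are what narrow thousands of datasets down to the handful a user actually needs.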

A Knowledge Graph

The presence of a knowledge graph is essential to any data cataloging project. The knowledge graph represents different concepts and links objects together through semantic links. A data catalog’s knowledge graph therefore provides users with rich and in-depth search results, optimized data discovery, smart recommendations, and more.
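A minimal way to picture this: objects and their semantic links stored as (subject, predicate, object) triples, queryable in both directions. The nodes and predicates below are invented for illustration:

```python
# Semantic links between catalog objects, as (subject, predicate, object) triples.
triples = [
    ("orders",       "belongs_to",   "sales_domain"),
    ("refunds",      "belongs_to",   "sales_domain"),
    ("orders",       "owned_by",     "alice"),
    ("sales_domain", "described_by", "glossary:sales"),
]

def related(node):
    """Everything directly linked to a node, in either direction."""
    outgoing = {(p, o) for s, p, o in triples if s == node}
    incoming = {(p, s) for s, p, o in triples if o == node}
    return outgoing | incoming

links = related("sales_domain")
# The sales domain is linked to its glossary term and to the datasets it contains.
```

Traversing such links is what lets a catalog answer questions like “show me everything in the sales domain” or recommend related assets.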

Data Lineage

With data lineage, it is possible to visualize the full origin and transformations of a specific piece of data over time. This allows users to understand where the data originates, and when and where it separates from or merges with other data. Tracking these transformations and treatments carried out on the data is indispensable for complying with the GDPR and other data regulations.
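The idea can be sketched as a graph traversal: given each dataset’s direct upstream sources (the dataset names below are hypothetical), a dataset’s complete origin is everything reachable upstream:

```python
# Lineage edges: each dataset maps to the upstream datasets it was derived from.
upstream = {
    "revenue_report": ["orders_clean", "fx_rates"],
    "orders_clean":   ["orders_raw"],
    "fx_rates":       [],
    "orders_raw":     [],
}

def full_lineage(dataset):
    """All transitive upstream sources of a dataset (its complete origin)."""
    seen, stack = set(), list(upstream.get(dataset, []))
    while stack:
        d = stack.pop()
        if d not in seen:
            seen.add(d)
            stack.extend(upstream.get(d, []))
    return seen

sources = full_lineage("revenue_report")
# The report traces back through the cleaned orders to the raw orders and FX rates.
```

The same graph read in the opposite direction answers the impact question: if a raw source changes or must be deleted under the GDPR, which downstream reports are affected.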

A Business Glossary

A business glossary enables data consumers to manage a common business vocabulary and make it available across the entire organization. This must-have feature provides a clear meaning and context to data terms.

What are a Data Catalog’s Use Cases? And for Whom?

For Chief Data Officers

The Chief Data Officer plays a key role in the overall data strategy of an enterprise; their purpose is to master their data and facilitate their access in order to become data-driven. A data catalog helps them:

  • Ensure data reliability and value.
  • Create a data literate organization.
  • Highlight a dataset’s context for data explorers.
  • Evangelize a data culture with rights and duties.
  • Start a compliance process with the European regulation (GDPR).

For Data Stewards

Known as the main contact for data inquiries thanks to their technical and operational knowledge, the Data Steward is most commonly nicknamed the “Master of data”. A data catalog enables data stewards to:

  • Centralize data knowledge in a single platform.
  • Enrich data documentation.
  • Establish communication between them and data explorers.
  • Qualify the value of data.

For Data Scientists

To achieve their missions, end-users must be able to quickly find, discover, and understand the right data asset for their use-cases. A data catalog helps them:

  • Easily find data through a search engine.
  • View the history of their information: date of creation and the actions carried out on it.
  • Understand the context of their data.
  • Identify the associated people.
  • Easily collaborate with peers.

A Representative Data Catalog Journey

A data catalog becomes extremely handy in the different phases of your projects:

A Data Catalog in the Deployment Phase

Connect to your data sources – A data catalog plugs into all your data sources. Connect your data integration, data preparation, data visualization, CRM solutions, etc., in order to fully integrate all your technologies into a single source of truth.

A Data Catalog in the Documentation Phase

Create a metamodel – A data catalog captures and updates technical and operational metadata from an enterprise’s data sources. It allows the data catalog’s administrator to add, configure, or overlay information (mandatory or not) on its cataloged datasets.

A Data Catalog in the Discovery Phase

Understand your data – With a data catalog, data citizens – with technical capabilities or not – are able to fully understand their enterprise data. A data catalog allows users to have access to and easily search for any information within the catalog. 

Define your data – A data catalog allows data leaders, such as data stewards or chief data officers, to correctly define the pertinent data to be used. Through metadata, data managers can easily document their datasets, allowing their data teams to access contextualized data. 

Explore your data – Discover and collect available data in a data catalog. By cataloging all enterprise data in a central repository, data citizens are able to ensure that their data is reliable and usable.

A Data Catalog in the Collaboration Phase

Communicate with data – A data catalog allows users to become data fluent. Both the IT & business departments are able to understand and communicate around different data projects. Through collaborative features such as discussions, data becomes a topic for all to share across the enterprise. 

Start Your Cataloging Journey

Actian Data Intelligence Platform is a 100% cloud-based solution, available anywhere in the world with just a few clicks. By choosing the Actian Data Intelligence Platform, you give your data teams the best next-generation environment to find, understand and use your data assets.

Check out our two applications:

  • Actian Studio – Enable your data management teams to manage, maintain and enrich the documentation of their company’s data assets.
  • Actian Explorer – Provide your data teams with a user-friendly interface and customized exploration paths to make their data discovery more efficient.

FAQ

What is a data catalog?
A data catalog is a detailed inventory of all data assets in an organization and their metadata, designed to help data professionals quickly find the most appropriate data for any analytical business purpose.

Why use a data catalog?
A data catalog democratizes data access, accelerates data discovery up to 5 times, and enables organizations to better collaborate on information assets while reducing the time data teams spend preparing data instead of analyzing it.

What are the key features of a data catalog?
Key features include a flexible metamodel template for capturing metadata, a smart search engine for finding data assets, a knowledge graph for linking data concepts, data lineage for tracking data transformations, and a business glossary for managing common vocabulary.

How does a data catalog differ from a data dictionary?
A data catalog provides a comprehensive, searchable inventory of all data assets with features like search, lineage, and governance, while a data dictionary focuses mainly on technical metadata for data modeling and database design.

How does a data catalog support data governance?
A data catalog enables agile, bottom-up data governance by allowing users to create a data process registry, document legal obligations, track data lifecycle, identify sensitive information, and ensure GDPR compliance, all in a single centralized repository.

Who uses a data catalog?
Chief Data Officers use it to ensure data reliability and create data-literate organizations, Data Stewards use it to centralize knowledge and enrich documentation, and Data Scientists use it to quickly find, understand, and collaborate on the right data for their projects.

What is data lineage?
Data lineage visualizes the origin and transformations of specific data over time, allowing users to understand where data comes from and how it changes, which is essential for GDPR compliance and other data regulations.

How does a data catalog accelerate data discovery?
By centralizing metadata in a searchable repository with smart search capabilities, a data catalog can increase the speed of data discovery up to 5 times, allowing data teams to focus on analysis rather than data preparation.

Blog | Data Integration | 4 min read

How Do I Know What I Need to Connect?

With the digital transformation of business, companies are learning about the importance of data when fueling their digital business processes. Data from on-premises, cloud, infrastructure systems, internet of things (IoT), and the edge all have parts to play in enabling decision-making and the efficient performance of business processes. Emerging technologies, such as artificial intelligence (AI), machine learning (ML), blockchain, and IoT, are increasing the demand for real-time data integration and harmonization.

However, machine learning and AI capabilities won’t help you make informed decisions if they don’t have access to quality data.

This raises an interesting question for data management professionals and IT staff: “How do I know what I need to connect?” With hybrid-integration platforms like Actian DataConnect, companies can empower anyone to connect anything, anytime, anywhere.

Before you connect anything, you should have a plan: assess your data integration needs, and determine what data sources you have and what systems you have in place on-premises, in the cloud, or in a hybrid environment. Just because you can connect everything doesn’t mean you should, at least not all at once.

You may achieve this goal eventually, and it’s okay to use that as your North Star vision, but data integration (like digital transformation) is a journey, and every journey must start somewhere.

Start With a Foundation of Core Data

Every company has a core set of master data that is used throughout the organization.

Customer records, product records, and employee records are great examples of master data that are used in many different places, but there is only one real source of truth. There are also likely to be a few core transactional platforms, such as sales, customer service, manufacturing, and finance systems, that are critical for enabling operational decision-making. Master data and core platforms are a great first step when building a foundation of connected data to support your digital business processes.

Every company wants to achieve a 360-degree customer view to better understand customer behavior and improve service. A 360-degree customer view starts with master data, a single, authoritative view of each customer. Master data is the result of clean, quality data, and data integration is the underlying technology and process for achieving that end goal. The data must be gathered from many different sources; cleaned, transformed, and harmonized; and then distributed.
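The gather, clean, and distribute flow above can be sketched in miniature. The sources and field names below are invented for illustration, and real master-data management adds matching and survivorship rules, but the sketch shows the core idea of merging partial views into one golden record:

```python
# gather -> clean/merge, in miniature. Sources and fields are hypothetical.

def gather(sources):
    """Pull raw customer records from every connected source."""
    return [record for source in sources for record in source]

def clean(records):
    """Normalize the match key and merge duplicates into one golden record."""
    golden = {}
    for r in records:
        key = r["email"].strip().lower()  # normalize the matching key
        merged = {**golden.get(key, {}),
                  **{k: v for k, v in r.items() if v is not None}}
        merged["email"] = key
        golden[key] = merged
    return list(golden.values())

# Two hypothetical systems holding partial views of the same customer.
crm = [{"email": "Ana@example.com", "name": "Ana", "phone": None}]
billing = [{"email": "ana@example.com ", "name": None, "phone": "555-0100"}]

master = clean(gather([crm, billing]))
print(master)  # one merged record: name from the CRM, phone from billing
```

The merged result is then what gets distributed to downstream consumers; each system contributed the fields it knew, and normalization made the two records match.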

Focus on Solving Specific Problems

Once a foundation is established, it is tempting for IT staff to start compiling and prioritizing lists of source systems that should be connected and made available throughout the company. Unfortunately, this “build-it-and-they-will-come” approach often leads to increased implementation and maintenance costs, in addition to adoption issues. Successful companies have learned that it is better to focus on how data is consumed, not where it is created: what business questions must be answered, and what data is needed to provide actionable insights? By prioritizing the connections that answer specific business questions, the ROI of data-integration activities will be clear and adoption won’t be an issue.

Look for Cross-Technology Integrations

A great benefit of using a hybrid-integration platform like Actian DataConnect is the ability to integrate components built on different technologies. This was difficult in the past, but to fully leverage the power of digital transformation, you must remove the cross-technology barriers. While keeping specific business problems in mind, look for opportunities to gather data from different types of components.

For example, if your manufacturing system is already connected, then consider adding data from your logistics-related IoT devices to provide visibility to the end-to-end supply chain.

Enabling Digital Ecosystems

Most companies today don’t do everything in-house – they leverage a digital ecosystem of suppliers and partners to support some business activities. Often, these third parties have their own systems for managing service delivery. Connecting third-party systems with your hybrid-integration platform can allow you to see what’s occurring throughout your business ecosystem (regardless of who is doing the work).

Examples of this include component suppliers, transportation companies, facilities services, cloud service providers, and reseller sales partners.

Focus on Value, Not Volume

There is no right answer to what systems you should connect with your hybrid data-integration platform. Actian DataConnect provides an unlimited set of possibilities and the tools to make establishing new connections and managing existing ones simple. The key to a successful data-integration journey is focusing on value instead of volume. Find your biggest business problems and help solve those first! Near-term successes will provide momentum for future growth.


Blog | Data Architecture | 4 min read

Enabling Agility With Real-Time Data

The speed of change in modern business is accelerating.

New technologies, globalization, regulatory changes and competitors entering and leaving the market are just a few examples of the types of environmental forces that bombard companies every day.

Developing a sustainable, competitive advantage requires company leaders to identify and respond to opportunities and threats quickly, directing both strategic and operational course corrections. Management and employees must then be able to respond to leaders’ directions and implement changes quickly and effectively to capture opportunities before they are gone.

This is the essence of business agility.

Imagine trying to drive a car while only being able to see through the rear-view mirror. You wouldn’t be able to anticipate what is ahead of you, what direction you are heading, or how fast you are going. You wouldn’t know that course corrections are required until it is too late. Even if you somehow avoid running off the road or hitting something, you still aren’t likely to arrive at your destination very quickly.

Managing a modern business is very similar. Historical reports and batch data from last night or last week don’t provide leaders with the information and actionable insights they need to lead the company effectively – they need real-time data (and plenty of it!).

What it Means to Be Agile

Business agility is all about being able to identify opportunities and threats effectively in the environment (both internally and externally), determine an appropriate response and implement change quickly.

The optimal window of opportunity to achieve maximum benefit in some companies may be a few weeks or a few days – in other cases, companies may only have a few minutes or hours to assess a situation and execute a change.

Take for example a manufacturing company – commodity prices, government regulations, supply-chain issues and shifting customer demands all impact the price and quantity of products that are produced. A change to any of these factors could require an increase or decrease to production, change of suppliers, pricing changes or impact future sales forecasts.

Agility Requires Real-Time Data

It is easy to recognize that company leaders need access to real-time data for strategic decision-making, but they aren’t the only ones. Agility requires employees at all levels of an organization to use real-time data to make decisions about areas under their control.

Customer-service employees need real-time data about customer orders, known issues and current promotions to provide accurate guidance to customers. Operations managers need real-time data about business processes, systems health, transactional workflows, staff productivity, cost and quality to fine-tune operations for efficiency and profitability and to identify potential issues. The finance staff needs real-time data about revenue, expenses, investments and asset utilization to direct company resources effectively. IT teams need real-time data about system status, outages, performance issues, capacity utilization and security threats to provide service assurance and prevent business disruption.

Armed with real-time data about operations and forces in the external environment (including leadership-direction changes), employees can quickly respond to potential opportunities and threats and implement changes safely and effectively. Without real-time data (many companies still rely on nightly data refreshes to a data warehouse), employees find themselves making decisions based on old/incorrect information, waiting for data to refresh (slowing things down), or guessing.

None of these provides the agility that modern businesses need.

Business agility isn’t a nice-to-have, it is an imperative for modern companies. Sustainable, competitive advantage requires everyone in your organization to have complete, accurate and up-to-date information to make informed decisions and execute change with confidence. Vector from Actian provides tools for companies to unlock real-time data and make it available to employees throughout the organization. To learn more, visit www.actian.com.


Blog | Data Integration | 4 min read

Why Hybrid Integration is the Key to Digital Transformation

Digital transformation of business is one of the biggest trends of the IT industry during the past few years and is projected to continue to be an important part of companies’ business strategies through 2020. With digital transformation initiatives, companies are transforming business processes to leverage modern technology in new ways – deeply integrating technology-enabled automation with human-driven tasks to create hybrid business/technology processes. As a result of these new processes, companies are not only seeing incremental productivity improvements (as one would expect from any automation initiative), but also transformative changes in the way companies operate and how they use data to drive all aspects of the business – from strategic decision-making to low-level optimization of workflows.

The Diversity Challenges

While the benefits of digital transformation initiatives are clear and easily quantifiable, digital business processes create a unique challenge for IT organizations tasked with ensuring the resiliency and agility of the technology services on which the business users depend. The technology components that underpin modern digital business processes are not homogeneous pieces of infrastructure, hardware, and installed software packages. The technology that runs modern business operations is often a combination of on-premises hardware, cloud services, installed software, SaaS offerings, user-provided devices (BYOD), and IoT components distributed throughout the enterprise. The broad diversity of components makes assembling the big-picture view of operational data both very challenging and intensely important.

Overcoming Diversity With Data Integration

Integration of diverse data sources isn’t just a “nice-to-have” feature with digital transformation – it is an essential capability. IT teams need the ability to connect anything, anytime, and anywhere through a common platform to enable operational data to be aggregated (showing the big picture), integrated (to see dependencies), and shared across the organization (to drive operational insights). Data is the connective tissue of your enterprise, and data integration is an important part of your digital-transformation journey. For business processes to fully leverage available technology and achieve the high levels of scalability, efficiency, and agility that modern companies require, data must flow freely throughout your organization.

The Need for a Hybrid Integration Platform

It isn’t enough to have data from your cloud services, IoT devices, and on-premises systems managed separately – your business processes don’t care what technology is being used, who provides it, or where it is located. For digital business processes to operate successfully, the technology and data can’t be managed as a box of independent parts – they must be fully integrated. That’s why Actian developed the DataConnect platform – a hybrid integration platform that connects all of the various components you use across the enterprise, so the data can be integrated and digital business processes can just work.

Emerging Technology Requires Integrated Data

As companies begin to look beyond basic digital transformation towards a new future, emerging technologies such as machine learning (ML), artificial intelligence (AI), blockchain, and next-generation IoT devices are highlighting an increasing need for data integration. The number of data sources a company manages is increasing rapidly. This is good, because it provides a more complete and diverse perspective on how the business is running. Technologies such as ML and AI can help translate these newly available data sources into actionable insights, but only if the company integrates the data effectively so ML and AI capabilities can access it. The data-integration challenge is increasingly the key barrier to realizing the complete value of emerging technologies.

Actian DataConnect is a hybrid integration platform to help your company solve the data-integration challenges of both digital transformation and emerging technologies. It provides you with a set of tools to gather the data from a wide variety of source systems, so it can be integrated and distributed to all of the places across your company where you need to use it. To learn more, visit www.actian.com.


Modern businesses move fast and the speed of business is accelerating every day with no sign of slowing. To survive, companies must find ways to remove friction in their systems and business processes – leveraging real-time operational data and translating it into actionable insights that drive activities across the company. From strategic decision-making to low-level operations and customer experience, your entire company must have up-to-date information and insights to keep pace with the speed of business. It isn’t okay for your business to be waiting on daily batch updates.

Leaders Need Real-Time Insights to Make Informed Decisions

Technology innovations, customer preferences, global economics, and market changes are causing the environments in which companies operate to change quickly and dramatically. Business agility is a necessity to survive and thrive in modern commerce. Market opportunities are short-lived, and threats are more impactful than ever. For leaders to be effective in recognizing changes in the environment and making informed decisions that lead to favorable outcomes, they need not only complete and accurate data, but also current data, so they can respond to changes in the moment. Competitors are looking at real-time data and making decisions. If your leaders are waiting for nightly batch processing, then opportunities may disappear before they can act.

Management Needs Real-Time Insights to Achieve Productivity, Profitability, and Quality Goals

Sales, customer service, HR, finance, manufacturing, and logistics – almost every business process in a modern company is technology-enabled. This is good when the systems and people involved in operations are working smoothly together and everything is going well. Just because a business process has been digitally transformed, however, doesn’t mean it is operating at peak performance. Managers depend on data-driven insights about these business processes to understand operational performance, process quality, and cost drivers, enabling them to see where problems exist that require attention. The faster insights can be provided to managers, the faster they can respond and fine-tune operations to achieve company objectives.

Employees Need Real-Time Insights to Do Their Jobs Effectively

Modern businesses are complex, with operations spread across teams, IT systems and often geographic locations. For employees to be effective in their individual roles, they must understand what is occurring in the other parts of the company with which they interact. Manufacturing employees and planners need visibility of the sales-and-order-management pipeline. Sales teams need visibility to delivery schedules and logistics. Customer service agents need visibility of customers’ orders. To manage this complexity and make informed, tactical decisions, these employees need accurate and real-time data insights. Data workflow delays lead to misinformed decisions and slow business processes. Modern businesses that need to move quickly can’t afford this.

Customers Expect Real-Time Insights as a Part of the Modern Customer Experience

Employees and company leaders aren’t the only people who have a need for real-time data insights. Modern customer experiences are highly automated, and customers expect the data they view on the company’s Website to be current. Product availability, order status, shipping data and returns processing are where real-time operational data drive digital customer experiences. If there is a change, then customers expect to see the change reflected immediately – they have little tolerance for waiting until the next day for data to be refreshed.

Businesses evolve quickly, in big strategic ways and in small tactical ways. Real-time data and information insights are what enable all parts of your business to identify, understand and respond to changes quickly and decisively. Vector from Actian is a data-analytics-database platform that enables you to collect and harvest data insights in near real-time and at enterprise scale. This can help you accelerate your business-process execution, monitor and better respond to opportunities and threats and provide employees and customers with the data they need to be informed and effective. To learn more, visit www.actian.com.


Face the Inevitable: Local Persistent Data at the Edge Will Happen

It’s indisputable that edge intelligence will grow, whether that’s mobile applications running on smartphones, IoT applications running in smart cars (or the underlying sensors), the entertainment center, navigation systems, etc. There will be countless mobile and IoT use cases (edge scenarios, taken as a whole) where a native application will be a better approach than a web-based application, or where it would be inefficient, and potentially less secure, to send raw data back from IoT collection points rather than process the data locally and store or erase the input.

The complexity of process and workflow at the edge, the ability to run analytics at the point of action, and working in disconnected modes or with spotty connections are all examples of why you will need local data storage and therefore a local database. It’s a foregone conclusion that data associated with these applications will mushroom.

Unfortunately, what’s equally unavoidable is that security vulnerabilities and opportunistic attacks that target these weaknesses will increase for the foreseeable future.  There have been several studies and surveys undertaken over the last few years that clearly show a far larger number of security vulnerabilities in IoT and mobile device-based software than on more mature desktop or laptop platforms, let alone software running on servers in the data center. Let’s not forget that 10 years ago, each security breach in the cloud generated a sense of panic and perhaps slowed the adoption of cloud services. This could very well be where we are now with localized and embedded data management for edge devices.

Case in point: over the weekend, a very serious security vulnerability was discovered in SQLite and in the version of SQLite bundled into Chromium (the open-source roots of Google Chrome). This is not the first or the largest potential breach point found in open-source data management; the Heartbleed bug in OpenSSL in 2014 probably holds both of those records. But because this vulnerability is associated with SQLite, a database that is near ubiquitous in native mobile and web-based apps, and its APIs, we should brace for the knee-jerk reaction: perhaps data shouldn’t be stored locally on edge devices, and everything should be done in the cloud, where it’s assumed to be more secure (my, how times have changed).

Retrenchment Would Be An Overreaction

First off, SQLite is far better than a combination of temporary memory allocation and flat file systems, an approach I’d never recommend to anyone I call a friend. Why? Unlike memory allocation and the use of flat files, which provide little standardization, built-in indexing, or other real data manipulation, SQLite provides baseline database support for edge intelligence.

SQLite can run on a device to make fully optimized use of local compute resources, giving an application the ability to handle local data management while offering the same set of API calls for a web-based version of that same app, or even working across both the native and web components of a more complex app. It handles most standard SQL calls, so it’s standard as well.
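As a rough illustration of that baseline capability, Python’s standard-library `sqlite3` bindings show the kind of ordinary SQL an embedded SQLite instance handles entirely on-device (the table and rows here are invented for the example):

```python
import sqlite3

# An in-memory database standing in for an on-device local store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts TEXT, kind TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("2024-01-01T00:00", "boot"), ("2024-01-01T00:05", "reading")],
)

# The same standard SQL works whether the app is native or web-backed.
count = conn.execute(
    "SELECT COUNT(*) FROM events WHERE kind = 'reading'"
).fetchone()[0]
print(count)  # 1
```

No server, no network hop: indexing, querying, and storage all happen inside the application process, which is exactly the baseline that raw memory buffers and flat files lack.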

Settling Would Be An Equally Poor Choice

However, SQLite has many drawbacks compared to a commercial, enterprise-grade embedded database. Most notably, it doesn’t have built-in encryption for data at rest or in transit, let alone at 128-bit or above. It can also only be embedded in a single application as a single instance, so it can’t be scaled up to support multiple users that need to send or receive data from that SQLite image.

For example, if you were to put SQLite on a gateway and then have multiple downstream IoT devices attempt to write data to that SQLite instance, there is no way to manage more than one client (downstream IoT device) writing to the SQLite database at a time, a requirement in IoT environments that often have tens, hundreds, or even thousands of devices downstream. Client-server databases, by contrast, can handle hundreds or thousands of active downstream clients; therefore, flat-file and SQLite users must always pair their applications that send or receive data with MS SQL Server, MySQL, Oracle, or some other client-server database. This pairing guarantees that data reformatting, or ETL (Extract, Transform, Load), is a necessary evil.
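The single-writer limitation is easy to demonstrate. In this sketch (using Python’s standard `sqlite3` bindings and an invented `readings` table, with two connections standing in for two downstream devices), a second connection that tries to start a write transaction while the first holds the write lock fails immediately:

```python
import sqlite3, tempfile, os

# A throwaway on-disk database standing in for an edge gateway's local store.
path = os.path.join(tempfile.mkdtemp(), "edge.db")

# isolation_level=None: we manage transactions ourselves with explicit BEGIN;
# timeout=0: fail immediately instead of waiting for the lock to free up.
writer_a = sqlite3.connect(path, isolation_level=None, timeout=0)
writer_b = sqlite3.connect(path, isolation_level=None, timeout=0)

writer_a.execute("CREATE TABLE readings (sensor TEXT, value REAL)")

# Writer A (one downstream device) takes the database-wide write lock.
writer_a.execute("BEGIN IMMEDIATE")
writer_a.execute("INSERT INTO readings VALUES ('s1', 1.0)")

# Writer B (a second device) cannot start a write transaction concurrently.
try:
    writer_b.execute("BEGIN IMMEDIATE")
    blocked = False
except sqlite3.OperationalError:  # "database is locked"
    blocked = True

writer_a.execute("COMMIT")
print(blocked)  # True: SQLite serializes writers at the whole-database level
```

In a real deployment the second writer would retry or queue, but with many devices streaming data the retries compound, which is exactly why gateways pair SQLite with a client-server database upstream.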

There are three major drawbacks of ETL that we find most data architects and developers struggle with: integration cost, performance and data security. I’ll save the cost and performance penalties for another blog, but data security is worth discussing here. In the absence of a single architecture across client and server database management, even if you had built-in encryption, you would have no choice but to decrypt and re-encrypt so that you could perform ETL functions – even if you had no other data manipulation to perform. The requirement to decrypt means your data payloads are – even if temporarily – exposed to hackers.

A Superior Way to Securely Manage Data at the Edge

The Actian Zen database family is based on a single, scalable, secure architecture that allows Zen to run on VMs in the cloud and in virtually any operating environment: from Windows, Linux, and Mac OS as a full-fledged client-server database, to Windows IoT Core, Raspbian Linux distributions, Android, and iOS as a pared-down 2MB client-only edge data management platform. Since Actian Zen runs on virtually anything with completely transferable APIs (you can use SQL directly or NoSQL/SQL APIs programmatically from most popular programming languages), a common engine, and common underlying file storage, it requires zero ETL. It also provides 192-bit encryption at rest and in transit, thereby removing integration costs, closing data-security vulnerabilities, and boosting performance.

Summary

When it comes to SQLite and the recent security vulnerabilities uncovered, the response must be to plug the security vulnerabilities and reduce the risk, either by fixing SQLite or by moving to a superior enterprise-class solution like Actian Zen. The answer is not to avoid or severely constrain the placement of data on local devices. Those constraints would throttle the innovation and improved outcomes that will undoubtedly come from intelligence embedded at the point of action. Cloud security has seen marked improvements because vendors, industry customers, standards bodies, and government (NIST specifications, FedRAMP, etc.) have taken on the challenge rather than run back to legacy environments. There is always going to be risk; the point is to manage that risk by moving from static, reactionary, periodic security checks to a risk-based, continuous diagnostics and monitoring approach. Expect nothing less over time for mobile and IoT data security as vendors like Actian work together to help customers stay calm and keep their data at the edge secure.

Ready to reconsider SQLite? Learn more about Actian Zen. Or just kick the tires for free with Zen Core, which is royalty-free for development and distribution.


Blog | Insights | 4 min read

When Fresh Data Matters

How quickly does your business environment change? Are your leaders using outdated information, and do they know it? Do you find that your data is slowing your decision-making processes and preventing you from being truly agile? Imagine what you could do if you were to harness the power of real-time data.

Modern businesses operate in a constantly changing, intensely complex, and data-rich environment. The term “dynamic” doesn’t come close to expressing how fast things are changing. Business environments are fluid – as soon as one thing moves, everything else shifts to adapt to the change. Just because you haven’t recognized a change yet doesn’t mean it isn’t happening. This is important for leaders to understand because it means the information they use for decision-making, unless it is real-time data, doesn’t likely reflect the current reality. If they don’t have accurate information about how things are at present, then how can they be expected to make informed decisions about the future? Guessing is a scary way to run a business.

New Technology for Managing Operational Data

For many years, the method companies used to move data from one place to another was through batch processing and data warehousing. Data warehousing emerged as a response to three technical constraints:

  1. Running analytics on operational systems slowed critical transactional performance.
  2. Data needed to be consolidated from different operational sources to become a single source of truth.
  3. Analytic workloads had different performance-tuning requirements than operational systems.

The good news is that many of these technical constraints have now been relieved through advances in IT infrastructure, increased computing capacity, and modern analytics tools. Unfortunately, even modern data warehousing tools have their shortcomings.

  • Batch data loads lead to delays in data freshness.
  • IT change-management policies meant to ensure data quality and security increase the development time for new insights.
  • Tuning optimized for batch reporting doesn’t address ad-hoc query performance for discovery.

Tools like Vector from Actian can now enable you to access, integrate, and analyze your operational data in near real-time – creating an operational data warehouse with the scale and economy of a data lake and the consistency and performance of a data warehouse. Unique features, such as vector processing on commodity servers, multi-cloud deployment and zero-performance overhead updates, make Vector the most capable foundation for an operational data warehouse. Instead of waiting for overnight batch processing, your business processes and decision makers can access fresh data to help them understand what is occurring in your company now.

New Capabilities for Driving Increased Impact

It may not be surprising that technology has advanced and enabled some new capabilities, which weren’t available a few years ago – this is the case in all industries and all facets of business. What is exciting is the impact these capabilities can have on your company:

  • Accelerate business-process execution by avoiding data-replication delays.
  • Monitor real-time service availability and performance to prevent business disruption.
  • Fine-tune operations with real-time optimization to see immediate productivity and quality impacts.
  • Increase security and reduce exposure to risk through real-time threat analysis.
  • Become more responsive and proactive with improved data for decision-making.

The speed of change in your business environment is accelerating. To succeed in a fluid, complex and data-rich environment, your company needs the tools to manage your operational data more effectively, so you can transform it into meaningful information and actionable insights. By removing the data-processing delay, leaders can direct quick, informed and decisive actions that enable you to minimize risks and make the most of opportunities. Fresh data is the key to becoming a truly agile organization – you can’t reach that goal if you must wait 12–24 hours for your data to refresh.

Actian is the industry leader in hybrid data management, data integration, and analytics. These solutions enable you to seamlessly connect and manage your operational and analytics data for superior performance and insights.


Blog | Data Integration | 4 min read

Data Integration: The Connective Tissue of Business


People, products, processes and systems may come and go, but regardless of the structural components of your business, data is the one thing that brings all the pieces together, so you can perform as a cohesive unit.

There is no question modern businesses are evolving quickly. Market opportunities are short-lived, technology advancements are happening at a startling pace and customer preferences are constantly shifting – requiring companies to become more agile in how they identify and respond to both opportunities and threats.

Business strategies, supply-chain relationships, organizational structures and business processes must all be nimble to adapt to whatever changes the company encounters.

Modern Businesses Need Agility to Survive

With all this change occurring, it isn’t surprising most companies and employees experience what seems like endless churn. People and processes that were critical yesterday are not needed tomorrow. IT systems enter and exit the company’s technology environment like a revolving door. Supplier and partner relationships continuously change, with new players appearing and others disappearing almost daily.

With all this churn, the question becomes: “How can I match the pace of change?” Companies must constantly change and adapt to survive in the age of digital disruption. They must integrate diverse data sources while coping with the ever-growing volume, velocity and veracity of data types. New data sources are emerging from on-premises applications, the cloud and IoT, and companies need to access this data.

Agility in connecting to diverse data sources and integrating them so you can transform, manage and syndicate data to everyone who needs it is essential.

If Data is the New Oil, Then Integration is the Pipeline That Will Deliver it

It was once thought data existed to support people and processes and as an artifact of IT systems. This understanding was backward – data is the “thing” that makes companies run. People, processes and systems play the supporting role – creating, changing, merging, analyzing, moving and transforming data.

Data is the object or asset; the rest are the tools that help data flow throughout the company, so decisions can be made, products can be produced and company goals can be achieved. Think of it like cars on a highway supporting the goal of helping people travel from their various starting points to their desired destinations. The starting and ending points may change, a bridge or a stoplight may be replaced with something else, but what doesn’t change is a bunch of people on the move.

Data is what fuels your company growth and integration is what connects all the parts of your company.

How Data Can Help You Manage Change

The operational data your company produces (every second of every day) contains a wealth of information about what changes are occurring both within your business and in the environment where you operate. This information can give you insights into how you may need to adapt to take advantage of opportunities and protect your company from threats.

People, process and systems are constantly changing anyway (or so it seems) – data can help you direct this change in structured and impactful ways. Some of this data is sourced from inside your company through your day-to-day operations while other data is gathered from your environment.

The Need for a “Universal Connectivity Platform” to Connect Anything, Anywhere and Anytime

With the great diversity of IT systems your company operates directly, accesses through cloud providers, or connects to as part of the broader business ecosystem (partners, suppliers, customers, governments, banks, etc.), it can be difficult to manage all of the pieces that are coming and going to ensure your employees and leaders have access to all the data they need to do their jobs. That’s why Actian developed the DataConnect integration solution.

DataConnect is the Universal Connect™ platform that empowers anyone to integrate anything, anywhere and anytime. On-premises software, cloud services, partner systems, IoT devices and other data sources can be connected to reporting, analytics and decision-support systems, enabling seamless sharing of data across your company.
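To make the idea of “connect anything, anywhere” concrete, here is a minimal sketch of the extract-transform-load pattern an integration platform automates. This is illustrative only – the source names, field names and functions below are invented for the example and are not DataConnect’s actual API; a real platform adds connectors, scheduling, monitoring and error handling on top of this basic flow.

```python
# Hypothetical ETL sketch: two source systems with different schemas are
# normalized into one common record shape and loaded into a target store.

def extract(sources):
    """Pull raw records from heterogeneous sources (here, in-memory dicts)."""
    for name, records in sources.items():
        for record in records:
            yield name, record

def transform(name, record):
    """Normalize records from different systems into one common schema."""
    if name == "crm":  # e.g., an on-premises CRM export
        return {"customer": record["cust_name"], "amount": record["total"]}
    if name == "web":  # e.g., a cloud web-analytics feed
        return {"customer": record["user"], "amount": record["order_value"]}
    raise ValueError(f"unknown source: {name}")

def load(unified, target):
    """Append normalized records to a target store (here, a list)."""
    target.extend(unified)

# Two source systems using different field names for the same concepts.
sources = {
    "crm": [{"cust_name": "Acme", "total": 120.0}],
    "web": [{"user": "Beta LLC", "order_value": 75.5}],
}

warehouse = []
load((transform(n, r) for n, r in extract(sources)), warehouse)
print(warehouse)
```

The point of the transform step is the whole value of integration: downstream reporting and analytics see one consistent schema no matter which system a record came from.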

As your business evolves to match the dynamic needs of the marketplace, old systems can be removed, new systems can be added and business processes and supplier relationships can change, but your data will remain, to help you understand where you’ve been, where you are now and where you must move in the future.

Actian is the industry leader in hybrid data management, integration and analytics.

These solutions enable you to seamlessly manage and connect your operational and analytics data for superior performance, insights and business outcomes. Actian DataConnect is the universal connection platform that can help you manage your data – the connective tissue of your business – so you are better prepared to respond to the changes ahead. To learn more, visit www.actian.com.