Data Intelligence

5 Product Values That Strengthen Team Cohesion & Experience

Actian Corporation

March 14, 2022

To remain competitive, organizations must make decisions quickly, as the slightest mistake can waste precious time in the race for success. Defining the company’s reason for being, its direction, and its strategy makes it possible to build a solid foundation for creating alignment – subsequently facilitating decisions that impact product development. Aligning all stakeholders in product development is a real challenge for Product Managers. Yet it is an essential mission for building a successful product, and an obvious prerequisite for motivating teams who need to know why they get up each morning to go to work.

The Foundations of a Shared Product Vision Within the Company

Various frameworks (NorthStar, OKR, etc.) have been developed over the last few years to enable companies and their product teams to lay these foundations, disseminate them within the organization, and build a roadmap that creates cohesion. These frameworks generally define a few key artifacts and have already given rise to a large body of literature. Although versions may differ from one framework to another, the following concepts are generally found:

  • Vision: The dream, the true North of a team. The vision must be inspiring and create a common sense of purpose throughout the organization.
  • The Mission: It represents an organization’s primary objective and must be measurable and achievable.
  • The Objectives: These define measurable short and medium-term milestones to accomplish the mission.
  • The Roadmap: A source of shared truth – it describes the vision, direction, priorities, and progress of a product over time.

With a clear and shared definition of these concepts across the company, product teams have a solid foundation for identifying priority issues and effectively ordering product backlogs.

Product Values: The Key to Team Buy-in and Alignment Over Time

Although well-defined at the beginning, the concepts described above can nevertheless fall into oblivion after a while or become obsolete! Indeed, the company and the product evolve, teams change, and consequently the product can lose its direction. Product teams must therefore continuously revisit and re-share these foundations for the alignment to last.

Indeed, product development is both a sprint and a marathon. One of the main difficulties for product teams is maintaining this alignment over time. In this respect, another concept in these frameworks is often under-exploited, if not completely forgotten, by organizations: product values.

Jeff Weiner, Executive Chairman at LinkedIn, particularly emphasized the importance of defining company values through the Vision to Values framework. LinkedIn defines values as “The principles that guide the organization’s day-to-day decisions; a defining element of your culture”. For example: “be honest and constructive”, “demand excellence”, etc.

Defining product values in addition to corporate values can be a great way for product teams to create this alignment over time, and this is exactly what we do with the Actian Data Intelligence Platform.

From Corporate Vision to Product Values: A Focus on a Data Catalog

Organization & Product Consistency

We have a shared vision – “Be the first step of any data journey” – and a clear mission – “To help data teams accelerate their initiatives by creating a smart & reliable data asset landscape at the enterprise level”.

We position ourselves as a data catalog pure-player and we share the responsibility of a single product between several Product Managers. This is why we have organized ourselves into feature teams. This way, each development team can take charge of any new feature or evolution according to the company’s priorities, and carry it out from start to finish.

Even though we prioritize the backlog and delivery by defining and adapting our strategy and organization according to these objectives, three problems remain:

  • How do we ensure that the product remains consistent over time when there are multiple pilots onboard the plane?
  • How do we favor one approach over another?
  • How do we ensure that a new feature is consistent with the rest of the application?

Indeed, each Product Manager has their own sensitivity and their own background. And even when the problems are clearly identified, there are usually several ways to solve them. This is where product values come into play…

Actian Data Intelligence Platform’s Product Values

While the vision and the mission help us answer the “why?”, the product values allow us to stay aligned on the “how?”. They are a precious tool for challenging the different possible approaches to meeting customer needs. Each Product Manager can refer to these common values to make decisions, prioritize or reject a feature, and ensure a unified, consistent user experience across the product.

Thus, each new feature is built with the following 5 product values as guides:

Simplicity

This value is at the heart of our convictions. The objective of a Data Catalog is to democratize data access. To achieve this, facilitating catalog adoption for end users is key. Simplicity is clearly reflected in the way each functionality is proposed. Many applications end up looking like Christmas trees with colored buttons all over the place that no one knows how to use; others require weeks of training before the first button is clicked. The use of the Data Catalog should not be reserved for experts and should therefore be obvious and fluid regardless of the user’s objective. This value was reflected in our decision to create two interfaces for our Data Catalog: one dedicated to search and exploration, and the other to the management and monitoring of the catalog’s documentation.

Empowering

Documentation tasks are often time-consuming and it can be difficult to motivate knowledgeable people to share and formalize their knowledge. In the same way, the product must encourage data consumers to be autonomous in their use of data. This is why we have chosen not to offer rigid validation workflows, but rather a system of accountability. This allows Data Stewards to be aware of the impacts of their modifications. Coupled with an alerting and auditing system after the fact, it ensures better autonomy while maintaining traceability in the event of a problem.

Reassuring

It is essential that end users can trust the data they consume. The product must therefore reassure the user by the way it presents its information. Similarly, Data Stewards who maintain a large amount of data need to be reassured about the operations for which they are responsible: have I processed everything correctly? How can I be sure that there are no inconsistencies in the documentation? What will really happen if I click this button? What if it crashes? The product must create an environment where the user feels confident using the tool and its content. This value translates into preventive messages rather than error reports, the type of language used in the interface, idempotent import operations, and more.
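
As an illustration of what import idempotency means in practice (a minimal sketch, not the platform’s actual implementation), the example below upserts catalog items by a stable key, so re-running the same import leaves the catalog unchanged:

```python
def import_items(catalog: dict, items: list) -> dict:
    """Idempotent import: re-running the same batch leaves the catalog unchanged."""
    for item in items:
        key = item["key"]  # stable identifier, e.g. "source.schema.table"
        # Upsert by key instead of appending, so duplicates cannot accumulate
        catalog[key] = {**catalog.get(key, {}), **item}
    return catalog

catalog = {}
batch = [{"key": "crm.orders", "owner": "sales", "description": "Order facts"}]
import_items(catalog, batch)
import_items(catalog, batch)  # running the same import again changes nothing
assert len(catalog) == 1
```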

Flexibility

Each client has their own business context, history, governance rules, needs, etc. The data catalog must be able to adapt to any context to facilitate its adoption. Flexibility is an essential value to enable the catalog to adapt to all current technological contexts and to be a true repository of data at enterprise level. The product must therefore adapt to the user’s context and be as close as possible to their uses. Our flat and incremental modeling is based on this value, as opposed to the more rigid hierarchical models offered on the market.

Deep Tech

This value is also very important in our development decisions. Technology is at the heart of our product and must serve the other values (notably simplicity and flexibility). Documenting, maintaining, and exploiting the value of enterprise-wide data assets cannot be done without the help of intelligent technology (automation, AI, etc.). The choice to base our search engine on a knowledge graph and our positioning in terms of connectivity are illustrations of this “deep tech” value.

The Take Away

Creating alignment around a product is a long-term task. It requires Product Managers – in synergy with all stakeholders – to define from the very beginning the vision, the mission, and the objectives of the company. This enables product management teams to effectively prioritize the work of their teams. However, to ensure the coherence of a product over time, the definition and use of product values are essential. With the Actian Data Intelligence Platform, our product values are simplicity, empowerment, reassurance, flexibility, and deep tech. They are reflected in the way we design and enhance our Data Catalog and allow us to ensure a better customer experience over time.

About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.
Data Security

Hybrid Cloud Security

Actian Corporation

March 7, 2022

One of the biggest fears around cloud adoption is the security of organizational data and information. IT security has always been an issue for all organizations, but the thought of not having total control over corporate data is frightening. Security concerns are one of the main reasons organizations do not move everything to the cloud and adopt a hybrid cloud approach instead. Hybrid cloud security architectures still carry the security risks of a public cloud; in fact, hybrid cloud risks are higher simply because there are more clouds to protect. With hybrid cloud architectures, the trust boundary extends beyond the organization for access to its essential, critical data.

Sensitive data can be kept off the public cloud to help manage risk. Doing so may be helpful, but hybrid cloud solutions are integrations between public and private clouds, and without the appropriate security this integration could still make your private cloud vulnerable to attacks originating from the public cloud. Secure hybrid clouds bring significant benefits to organizations today. Alongside those benefits come the challenges of securing the organization’s data, and these challenges are continually being addressed so that organizations can realize the full benefits that hybrid cloud architectures can provide.

What is Hybrid Cloud Security?

Organizational IT infrastructures have increased in complexity, especially with hybrid cloud implementations. This complexity, combined with cloud characteristics such as broad network access and on-demand access from anywhere, complicates how a hybrid cloud can be secured. The work of securing data, applications, and infrastructure, internally and externally, against both hackers’ malicious tactics and inadvertent, unintentional activities is compounded.

Many cloud vendors have adopted industry compliance and governance security standards, especially those created by the US government, to ease the security threats and risks that an organization may experience in the cloud. The Federal Risk and Authorization Management Program (FedRAMP) provides standards and accreditations for cloud services. The Security Requirements Guide (SRG) provides security controls and requirements for cloud services in the Department of Defense (DoD). These standards and others help cloud vendors and organizations improve their hybrid cloud security.

When securing the cloud, an organization should consider the cloud architecture components: applications, data, middleware, operating systems, virtualization, servers, storage, and networking. Security concerns are specific to the service type. With hybrid cloud security, organizations share responsibility for security with the cloud service provider.

The responsibility for hybrid cloud security should include specific disciplines. Some essential discipline areas for managing risk and securing hybrid cloud are:

  • Physical controls to deter intruders and create protective barriers to IT assets are just as important as cybersecurity for protecting assets.
    • Security perimeters, cameras, locks, alarms.
    • Physical controls can be seen as the first line of defense for protecting organizational IT assets, not only from security threats but also from environmental harm.
    • Biometrics (one or more fingerprints, possibly retina-scans) where system access ties to extremely sensitive data.
  • Technical controls.
    • Cloud patching fixes vulnerabilities in software and applications that are targets of cyber-attacks. Besides keeping systems up to date overall, patching helps reduce security risk in hybrid cloud environments.
    • Multi-tenancy security – each tenant or customer is logically separated in a cloud environment. Each tenant has access to the cloud environment, but the boundaries are purely virtual, and hackers can find ways to access data across virtual boundaries if resources are improperly assigned or data overflows from one tenant impinge on another. Data must be properly configured and isolated to avoid interference between tenants.
    • Encryption is needed for data at rest and data in transit. Data at rest sits in storage; data in transit moves across the network and the cloud layers (SaaS, PaaS, IaaS). Both have to be protected. More often than not, data at rest isn’t encrypted because encryption is an option that is not turned on by default (a minimal encryption sketch follows this list).
    • Automation and orchestration are needed to remove slow manual responses in hybrid cloud environments. Monitoring, compliance checking, appropriate responses, and implementations should be automated to eliminate human error. These responses should also be reviewed and continuously improved.
    • Access controls – People and technology accesses should always be evaluated and monitored on a contextual basis including date, time, location, network access points, and so forth. Define normal access patterns and monitor for abnormal patterns and behavior, which could be an alert to a possible security issue.
    • Endpoint security for remote access has to be managed and controlled. Devices can be lost, stolen, or hacked, providing an access point into a hybrid cloud and all of its data and resources. Local ports on devices that allow printing or USB drives would need to be locked for remote workers or monitored and logged when used.
  • Administrative controls to account for human factors in cloud security.
    • Zero trust architecture (ZTA) principles and policies continually evaluate trusted access to cloud environments and restrict access to the minimum privileges required. Allowing too much access to a person or technology solution can cause security issues. Adjustments to entitlements can be made in real time: is a user suddenly downloading far more documents? Are those documents outside his or her normal scope of work or access? Of course, this requires data governance that includes tagging and role-based access that maps entitlements to tagging.
    • Disaster recovery – performing business impact analysis (BIA) and risk assessments is crucial for planning disaster recovery and deciding how hybrid cloud architectures should be implemented. This includes concerns related to data redundancy and placement within a cloud architecture for service availability and rapid remediation after an attack.
    • Social engineering education and technical controls for phishing, baiting, etc. Social engineering is both an organizational issue and a personal issue for everyone. Hackers can steal corporate data and personal data to access anything for malicious purposes.
    • A culture of security is critical for organizations. The activities of individuals are considered one of the most significant risks to the organization. Hackers target any organization through its employees, as well as partners and even third-party software vendors and services contractors. Employees, contractors, and partners need continuous education to help avoid security issues that can be prevented with training and knowledge.
  • Supply chain controls.
    • Software, infrastructure, and platforms from third parties have to be evaluated for security vulnerabilities. Third-party software, once installed, could contain security vulnerabilities or have been compromised in ways that allow criminals complete access to an organization’s hybrid cloud environment. Be sure to check how all third-party software vendors approach and practice safe security controls over their products.
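
To make the encryption control concrete, here is a minimal sketch of encrypting records at rest with AES-GCM using the Python cryptography package; real deployments would manage and rotate keys through a KMS or HSM, and the key handling shown here is illustrative only.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)  # in practice, stored and rotated via a KMS/HSM
aesgcm = AESGCM(key)

def encrypt_record(plaintext: bytes, context: bytes) -> bytes:
    """Encrypt a record before it is written to storage.

    A fresh 12-byte nonce is prepended to the ciphertext; `context` is
    authenticated (not encrypted) so a record cannot be silently reused
    under another tenant or path.
    """
    nonce = os.urandom(12)
    return nonce + aesgcm.encrypt(nonce, plaintext, context)

def decrypt_record(blob: bytes, context: bytes) -> bytes:
    nonce, ciphertext = blob[:12], blob[12:]
    return aesgcm.decrypt(nonce, ciphertext, context)

blob = encrypt_record(b"customer PII", b"tenant=acme")
assert decrypt_record(blob, b"tenant=acme") == b"customer PII"
```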

Security in the cloud is a shared responsibility that becomes more complex as deployments are added. Shared Services are a way to deliver functions such as security, monitoring, authorization, backups, patching, upgrades, and more in a cost-effective, reliable way to all clouds. Shared services reduce management complexity and are essential to achieve a consistent security posture across your hybrid cloud security architecture.

Configuration Management and Hybrid Cloud Security

Hybrid cloud security architecture risks are higher simply because there are more clouds to protect. For this reason, here are a few extra items that you should put on your hybrid cloud security best practices list, including visibility, shared services, and configuration management. First, you can’t secure what you can’t see. Hybrid cloud security requires visibility across the data center and private and public cloud borders to reduce hybrid cloud risks resulting from blind spots.

Another area to focus on is configuration management since misconfigurations are one of the most common ways for digital criminals to land and expand in your hybrid cloud environments. Encryption isn’t turned on, and access hasn’t been restricted; security groups aren’t set up correctly, ports aren’t locked down. The list goes on and on. Increasingly, hybrid cloud security teams need to understand cloud infrastructure better to secure it better and will need to include cloud configuration auditing as part of their delivery processes.

One hybrid cloud security tool that can be utilized is a Configuration Management System (CMS), built on configuration management database (CMDB) technology, that helps organizations gain visibility into hybrid cloud configurations and the relationships between all cloud components. The first activity with a CMS is discovering all cloud assets, or configuration items, that make up the services being offered. At that point, a snapshot of the environment is taken with the essential details of the cloud architecture. Once they have discovered their hybrid cloud architecture, many organizations immediately look for security concerns that violate security governance.

Once the CMS is in place, other hybrid cloud security practices, such as drift management and monitoring changes in the cloud architecture, can alert on cloud attacks. Once unauthorized drift is detected, automation tools to correct and alert can be implemented to counter the attack. The CMS and the CMDB support cloud security operations and other service management areas, such as incident, event, and problem management, to help provide a holistic solution for the organization’s service delivery and service support.
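
As a simplified illustration of drift detection against a CMDB snapshot (a sketch, not any specific vendor’s tooling), the example below compares a baseline snapshot with the currently discovered configuration and reports added, removed, and changed items:

```python
def detect_drift(baseline: dict, current: dict) -> dict:
    """Compare two configuration snapshots keyed by configuration item (CI) name."""
    added = {ci: current[ci] for ci in current.keys() - baseline.keys()}
    removed = {ci: baseline[ci] for ci in baseline.keys() - current.keys()}
    changed = {
        ci: {"was": baseline[ci], "now": current[ci]}
        for ci in baseline.keys() & current.keys()
        if baseline[ci] != current[ci]
    }
    return {"added": added, "removed": removed, "changed": changed}

baseline = {"web-sg": {"port_22_open": False}, "db-bucket": {"encryption": "on"}}
current = {
    "web-sg": {"port_22_open": True},   # unauthorized change
    "db-bucket": {"encryption": "on"},
    "temp-vm": {"owner": "unknown"},    # asset that appeared outside the process
}

drift = detect_drift(baseline, current)
# drift["added"] and drift["changed"] would feed the alerting and remediation automation
```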

Conclusion

Security issues in hybrid cloud computing aren’t that different from security issues in cloud computing. You can review the articles on Security, Governance, and Privacy for the Modern Data Warehouse, Part 1 and Part 2, that provide a lot of pointers on how to protect your data and cloud services.

Hybrid cloud security risks and issues will be one of those IT organizational business challenges that will be around for a long time. Organizations need to stay informed and have the latest technologies and guidance for combating hybrid cloud security issues and threats. This includes partnering with hybrid cloud solution providers such as Actian. It is essential to the organization’s ability to function amid constantly changing cloud security needs.

About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.
Data Intelligence

Interview With Ruben Marco Ganzaroli – CDO at Autostrade per l’Italia

Actian Corporation

March 3, 2022

We are pleased to have been selected by Autostrade per l’Italia – a European leader among concessionaires for the construction and management of toll highways – to deploy the Actian Data Intelligence Platform’s data catalog at the group level. We took this opportunity to ask a few questions of Ruben Marco Ganzaroli, who joined the company in 2021 as Chief Data Officer to support the extensive Next to Digital program to digitally transform the company – a program with a data catalog as its starting point.

Q: CDOs are becoming critical to a C-level team. How important is data to the strategic direction of Autostrade per l’Italia?

Data is at the center of the huge Digital Transformation program started in 2021, called “Next to Digital”, which aims at transforming Autostrade per l’Italia into a Sustainable Mobility Leader. We wanted to protect whoever is traveling on our highways, to execute decisions faster, and to be agile and fluid. We not only want to react immediately to what is happening around us, but also to be able to anticipate events and take action before they occur. The company was started in the early 1950s, and we realized that all the data we have collected throughout the years could be a unique advantage and a strong lever to transform the company.

Q: What are the main challenges you want to address by implementing a data catalog in your organization?

We think that only the business functions of the Autostrade group can truly transform the company into a data-driven one. To do this, business functions need to be supported by the right tools – efficient and usable – and they must be fully aware of the data they have available. Ideas, and therefore value, are generated only if you have a clear idea of the environment in which you are moving and the objective you are aiming for. If, without knowing it, you have a gold bar under your mattress, you will sleep uncomfortably and realize that you could do something to improve your situation – probably by changing mattresses, for example. However, if you are aware that you have that gold bar, you will lift the mattress, take the bar, and turn it into a jewel – maximizing its value.

The data catalog builds the bridge between business and data at Autostrade. It is the tool that lets business users know that there are many gold bars available and where they can be found.

Q: What features were you looking for in a data catalog and that you found in the platform?

From a business perspective, a data catalog is the access point to all data. It must be fast, complete, easy to understand and user friendly, and represent a lever (not an obstacle). Business users must not be forced to spend the majority of their time on it. Whereas from an IT perspective, a data catalog must be agile, scalable, as well as quickly and continuously upgradeable as data is continuously being ingested or created.

Q: What is your vision of a data catalog in the data management solutions’ ecosystem?

We don’t think of the catalog as a tool, but as a part of the environment that we, as IT, need to make available to the business functions. This ecosystem naturally includes tools, but what’s also important is the mindset of its users. To lead this mindset change, business functions must be able to work with data, and that’s the reason Self-BI is our main goal for 2022 as the CDO Office. As mentioned previously, the catalog is the starting point for all of that. It is the door that lets the business into the data room.

Q: How will you drive catalog adoption among your data teams?

All leaders from our team – Leonardo B. for the Data Product, Fulvio C. for Data Science, Marco A. and Andrea Q. for Data Engineering, and Cristina M. as Scrum (super)Master – are focused on managing the program. This program foresees an initial training phase for business users, dedicated on-the-job support, and in-the-room support. Business users will participate in the delivery of their own analyses. We will onboard business functions incrementally, to focus the effort and maximize the effectiveness for each business function. The goal is to onboard all business functions by the end of 2022: it represents a lot of work, but it is made easier by knowing that there is a whole company behind us that supports us and strongly believes that we are going in the right direction.

About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.
Data Architecture

Enterprise Data Warehouse

Teresa Wingfield

March 2, 2022

Do you need a data warehouse or Enterprise Data Warehouse (EDW) in your organization today? An organization’s functional units, people, services, and other assets support its vision and mission. An organization has strategies, tactics, and operational activities that are performed to meet its overall vision and mission in service to its customers. Being high-performing today requires utilizing the power of data, supported by innovations in information technology. People and advanced technologies are making decisions every day to support the organization’s success. Data drives decisions.

Today, one of the essential tools is an Enterprise Data Warehouse, which supports effective decisions using a single source of truth across the organization. The organization has to work as a high-performing team, exchanging data, information, and knowledge for decisions, and the EDW plays a central role. As organizations move their IT to the cloud, the EDW is transforming and moving there as well, further improving the organization’s business decision-making.

What is an Enterprise Data Warehouse?

An Enterprise Data Warehouse is a central repository of data that supports a value chain of business interactions between all functional units within the organization. In an EDW, data is collected from multiple sources and normalized to support data-driven, insightful, analytical decisions across the organization about the services and products it delivers and supports for its customers. Data is transformed into information, information into knowledge, and knowledge into decisions for analytics and overall Business Intelligence (BI). Enterprise data warehouse reporting capabilities take advantage of the EDW to provide the organization with needed business and customer insights.

Enterprise Data Warehouse architecture makes use of an Extract, Transform, and Load (ETL) process to ingest, consolidate, and normalize data for organizational use. The data is modeled based on the decisions that stakeholders need to make and put in a consistent format for usage and consumption with integrated technologies and applications. The organization’s sales, marketing, and other teams can use the EDW for an end-to-end perspective of the organization and the customers being serviced. Running the EDW in the cloud is a plus because of the power of cloud technologies today in making data accessible anywhere and anytime.
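
As a highly simplified sketch of the ETL flow described above (the file names, columns, and target table are hypothetical), two sources are extracted, normalized into one customer shape, and loaded into a small SQLite database standing in for the EDW:

```python
import csv
import sqlite3

def extract(path: str) -> list:
    """Read one source file (e.g. a CRM or billing export) into a list of rows."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(crm_rows: list, billing_rows: list) -> list:
    """Normalize both sources into a single (customer_id, name, lifetime_value) shape."""
    spend = {}
    for row in billing_rows:
        spend[row["customer_id"]] = spend.get(row["customer_id"], 0.0) + float(row["amount"])
    return [(r["id"], r["name"].strip().title(), spend.get(r["id"], 0.0)) for r in crm_rows]

def load(rows: list, db_path: str = "edw.db") -> None:
    """Load the consolidated rows into the warehouse table (SQLite stands in here)."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer "
        "(customer_id TEXT PRIMARY KEY, name TEXT, lifetime_value REAL)"
    )
    con.executemany("INSERT OR REPLACE INTO dim_customer VALUES (?, ?, ?)", rows)
    con.commit()
    con.close()

# load(transform(extract("crm.csv"), extract("billing.csv")))
```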

The basic enterprise data warehouse requirements that make up an EDW system include, but are not limited to, the following components:

  • Data sources – databases, including a transactional database, and other files with various formats
  • Data transformation engine – ETL tools (typically an external third-party tool)
  • The EDW database repository itself
  • The database administration for creation, management, and deletion of data tables, views, and procedures
  • End-user tools to access data or perform analytics and business intelligence

Leveraging data from multiple data silos into one unified data repository that contains all business data is powerful. An EDW is a database platform of multidimensional business data that different parts of the organization can use. The EDW has current and historical information that can be easily modified, including the model, to support changes in business needs. EDWs can support additional data sources quickly without redesigning the system. As the organization learns how to use the data and gives feedback, the solution can transform rapidly to support the organization and the data stakeholders. As the organization matures, so does the data in the EDW.

Enterprise Data Warehouse vs. Data Mart

An Enterprise Data Warehouse becomes a single source of truth for organizational decisions that need collaboration between multiple functional areas in the organization. The EDW can be implemented as a one-tier architecture with all functional units accessing the data in the warehouse. An EDW can also be implemented with the addition of Data Marts (DM). The difference is that a data mart is much smaller and more focused than an enterprise data warehouse. Enterprise Data Warehouse services also serve the entire organization, whereas a Data Mart is usually for a single line of business within the organization.

Data Marts contain domain or unique functional data, such as only sales data or marketing data. The data mart can extend the usage of the EDW using a two-tier architecture leveraging on-premise and/or the cloud capabilities that use the EDW as a source of data for specific use cases. Data marts typically involve integration from a limited number of data sources and focus on a single line of business or functional unit. The size of a data mart is in gigabytes versus terabytes for an EDW. Data Marts do not have to use an EDW as a data source but can use other sources specific to needs.

Organizations may want to use a data mart or multiple data marts to help increase the security of the EDW by limiting access to only domain-specific data through the data mart if using a two-tier architecture. An organization may also use the data mart to reduce the complexity of managing access to EDW data for a single line of business.

Choosing between EDW and a data mart does not have to be one or the other. Both are valuable. Remember, the outcome is to provide data for high performing decision support within the organization. EDW helps bring the bigger organization perspective for delivering and supporting business services. Data marts can complement the EDW to optimize performance and data delivery. Overall, enterprise-wide performance for decisions, reporting, analytics, and business intelligence is best done with a solution that spans the organization. A complete end-to-end value view of customers, products, tactics, and operations that support the organizational vision and mission will benefit everyone in the organization, including the customers.

Data Marts are easier and quicker to deploy than an EDW and cost less. A line of business can derive value quickly with a solution that can be deployed faster with a limited scope, fewer stakeholders, less modeling, and integration complexity than an EDW. The data mart will be designed specifically for that line of business to support their ability to work in a coordinated, collaborative way within their function. This can help create a competitive advantage against competitors by enabling better data analytics for decision support within a specific line of business or functional unit.

Enterprise Data Warehouse and the Cloud

Cloud Enterprise Data Warehouse (EDW) takes advantage of the value of the cloud in the same manner as many other cloud services that are becoming the norm for many organizations. The EDW itself may be better suited to reside in the cloud instead of on-premise. The cloud provides:

  • The flexibility to build out and modify services in an agile manner.
  • The potential to scale almost infinitely.
  • The assurance of enhanced business continuity.
  • The ability to avoid capital expenditures (CapEx).

Organizations can still choose to architect hybrid-cloud solutions for the EDW that take advantage of on-premise organizational capabilities and vendor cloud capabilities. An EDW should be planned with expertise focused on organizational constraints and business objectives, so the long-term solution is easy to use and can be continuously improved. This includes using Data Marts in the solution where they bring maximum benefit to the organization.

Conclusion

EDW architectures can be challenging because they bring the organization’s data into one database, especially if done all at once. Organizations should design for the big picture and deploy incrementally, starting with specific business challenges or specific lines of business. This will create patterns of success for improving the next increment. It will also help deliver a solution that benefits the organization faster, without waiting for the complete solution to be finished.

In many instances, an organization can’t simply rely on silos of line-of-business data marts. They need enterprise data warehouse reporting to get a complete view of customers, products, operations, and more to make decisions that best benefit the whole company. Yes, enterprise data warehouse architectures can be painful. In most instances, you can deploy incrementally, starting with specific domains or business challenges. This will help you deliver value faster and evolve into the holistic purpose your EDW is intended to serve.

The power of having a cross-organizational repository of meaningful data to enable better decision-making and overall better service delivery and support for the customer outweighs the challenges of the architecture. An organization that does this successfully will gain improved marketability, sales, and overall better relationships with its customers. The business data insights will also enable the organization to position its internal assets more appropriately based on the improvements in data insights, analytics, and business intelligence.

Managing and utilizing data for an organization has to be done effectively, efficiently, and economically to create value. Data is the organization’s lifeblood and supports the long-term viability of the organization itself. An organization that is not informed and does not treat data as central to business service performance and decisions may find itself optional in the marketplace. An EDW can help with the organization’s current and future business decision needs.

About Teresa Wingfield

Teresa Wingfield is Director of Product Marketing at Actian, driving awareness of the Actian Data Platform's integration, management, and analytics capabilities. She brings 20+ years in analytics, security, and cloud solutions marketing at industry leaders such as Cisco, McAfee, and VMware. Teresa focuses on helping customers achieve new levels of innovation and revenue with data. On the Actian blog, Teresa highlights the value of analytics-driven solutions in multiple verticals. Check her posts for real-world transformation stories.
Data Management

5 Use Cases for Hybrid Cloud Data Management

Traci Curran

February 25, 2022

With the rise of cloud computing, many organizations are opting to use a hybrid approach to their data management. Even though many companies still rely on on-premises storage, the benefits of having cloud storage as a backup or disaster recovery plan can be significant. This post will give you five of the most popular use cases for hybrid cloud data management.

Why Hybrid Cloud Data Management?

Hybrid cloud data management isn’t a new concept, but it’s finally starting to hit its stride as a viable option for enterprise data management.  It utilizes a mixture of on-premises and cloud storage and cloud computing to handle all aspects of a company’s data needs. Often, it’s the merger of on-premises databases or enterprise data warehouses (EDW) with cloud storage, SaaS application data and/or a cloud data warehouse (CDW). The benefits of this hybrid approach are twofold: it provides a backup plan for disaster recovery situations, and it gives an organization the ability to scale up as needed without purchasing additional hardware.

Backup and Disaster Recovery

One of the most obvious benefits of hybrid cloud data management is that it provides a backup for your data. If your on-premises storage system fails or you lose some important data, you can rely on your cloud storage to get it back. It will act as an additional fail-safe plan in case anything happens to your on-site server.

Data Accessibility

Data is not just one homogeneous entity. Many companies can feel hampered by data access. They may not have the in-house expertise or budget to handle the IT demands of data storage and real-time access. Through a hybrid cloud environment, your business can access data and applications stored in both on-premises and off-site locations. Global companies can store data closer to applications or users to improve processing time and reduce latency without having to have local data centers or infrastructure.

Data Analytics

Currently, many businesses are combining internal data sources with external data sources from partners or public sources for improved data analytics. A hybrid data warehouse can allow data teams to combine this third-party data with internal data sources to gain greater insights for decision-making. Data engineers can reduce the amount of effort required to source and combine data needed for users to explore new analytical models.
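
As a hedged illustration of blending an internal source with third-party data (the data and column names below are made up), a short pandas sketch:

```python
import pandas as pd

# Internal sales figures (e.g. exported from the EDW) - values are made up
sales = pd.DataFrame({"region": ["EMEA", "NA", "APAC"], "revenue": [1.2, 3.4, 0.9]})

# Third-party market-size estimates obtained from a partner or public source
market = pd.DataFrame({"region": ["EMEA", "NA", "APAC"], "market_size": [10.0, 12.5, 8.0]})

# Combine both sources and derive a new metric for analysts to explore
combined = sales.merge(market, on="region", how="left")
combined["market_share"] = combined["revenue"] / combined["market_size"]
print(combined)
```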

Data Migration

When an organization migrates their storage to the cloud, they can take advantage of public, private, and hybrid cloud solutions. This means utilizing a host of services, including backup storage, disaster recovery solutions, analytics, and more. All while paying less money on infrastructure costs and avoiding large capital expenses.

Data Compliance

The adoption of a hybrid data warehouse can relieve some of the compliance burdens that can often accompany stored data. For example, retired systems may leave behind orphaned databases, often with useful, historic data. This can create a data gap for analytic teams, but it can also pose a security and compliance risk for the business. Cloud service providers have teams of experts that work with governments and regulators globally to develop standards for things such as data retention times and security measures. Additionally, leveraging the cloud for data storage can also help address the challenges of data residency and data sovereignty regulations, which can become complex as data moves across geographical boundaries.

Regardless of where you are on your cloud journey, data is the most valuable asset to any organization. The cloud is an increasingly important component as businesses look for ways to leverage their data assets to maintain competitive advantage. Learn more about how the Actian Data Platform is helping organizations unlock more value from their data.

About Traci Curran

Traci Curran is Director of Product Marketing at Actian, focusing on the Actian Data Platform. With 20+ years in tech marketing, Traci has led launches at startups and established enterprises like CloudBolt Software. She specializes in communicating how digital transformation and cloud technologies drive competitive advantage. Traci's articles on the Actian blog demonstrate how to leverage the Data Platform for agile innovation. Explore her posts to accelerate your data initiatives.
Data Leader

Does Your Organization Have a Data Platform Leader? It Could Soon.

Teresa Wingfield

February 17, 2022

There’s no one-size-fits-all solution for a modern data platform, and there likely never will be with the proliferation of multiple public and private cloud environments, entrenched on-premises data centers, and the exponential rise in edge computing – data sources are multiplying almost at the rate of data itself.

Today’s data platforms increasingly take a broad multi-platform approach that incorporates a wide range of data services (e.g., data warehouse, data lake, transactional database, IoT database, and third-party data services) and integration services that support all major clouds and on-premise platforms and the applications that run on and across these environments. Modern data platforms need a data fabric – technology that enables data distributed across different areas to be accessed in real time through a unifying data layer – to drive data flow orchestration, data enrichment, and automation. To meet the varied requirements of users across an organization, including data engineers, data scientists, business analysts, and business users, the platform should also incorporate shared management and security services, as well as support a wide range of application development and analytical tools.

However, these needs create a singular challenge: who’s going to manage the creation and maintenance of such a platform? That’s where the role of the platform leader comes in. Just as we’ve seen the creation of roles like Chief Data Officer and Chief Diversity Officer in response to critical needs, organizations require a highly skilled individual to manage the creation and maintenance of their platform(s). Enter the data platform leader – someone with a broad understanding of databases and streaming technologies, as well as a practical understanding of how to facilitate frictionless access to these data sources, how to formulate a new purpose, vision and mission for the platform and how to form close partnerships with analytics translators. We’ll get to those folks in a minute.

Developing a New Purpose, Vision and Mission

Why must a data platform leader develop a new purpose, vision and mission? Consider this: data warehouse users have traditionally been data engineers, data scientists and business analysts who are interested in complex analytics. These users typically represent a relatively small percentage of an organization’s employees. The power and accessibility of a data platform capable of running not just in the data center, but also in the cloud or at the edge, will invariably bring in a broader base of business users who will use the platform to run simpler queries and analytics to make operational decisions.

However, accompanying these users will be new sets of business and operational requirements. To satisfy this ever-expanding user base and their different requirements, the data platform leader will need to formulate a new purpose for the platform (why it exists), a new vision for the platform (what it hopes to deliver) and a new mission (how will it achieve the vision).

Facilitating Data Service Convergence

Knowledge of relational databases with analytics-optimized schemas and/or analytic databases has long been part of a data warehouse manager’s wheelhouse. However, the modern data platform extends access much further, enabling access to data lakes and transactional and IoT databases, and even streaming data. Increasing demand for real-time insights and non-relational data that can enable decision intelligence are bringing these formerly distinct worlds closer together. This requires the platform leader to have a broad understanding of databases and streaming technologies as well as a practical understanding of how to facilitate frictionless access to these data sources.

Enabling Frictionless Data Access

A data warehouse typically includes a semantic layer that represents data so end users can access that data using common business terms. A modern data platform, though, demands more. While a semantic layer is valuable, data platform leaders will need to enable more dynamic data integration than is typically sufficient to support a centralized data warehouse design. Enter the data fabric to provide a service layer that enables real-time access to data sourced from the full range of the data platform’s various services. The data fabric offers frictionless access to data from any source located on-premises and in the cloud to support the wide range of analytic and operational use cases that such a platform is intended to serve.
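
One way to picture a simple semantic layer is a mapping from business terms to physical tables and columns, used to generate queries so end users never deal with table internals. The sketch below is purely illustrative (the names are invented, and a real data fabric adds live connectivity, caching, and governance on top):

```python
# Illustrative semantic layer: business terms mapped to physical locations
SEMANTIC_LAYER = {
    "customer": {"table": "crm.dim_customer", "column": "customer_id"},
    "monthly revenue": {"table": "finance.fact_invoice", "column": "amount"},
}

def resolve(term: str) -> dict:
    """Translate a business term into its physical table and column."""
    entry = SEMANTIC_LAYER.get(term.lower())
    if entry is None:
        raise KeyError(f"'{term}' is not defined in the semantic layer")
    return entry

def sum_by(metric: str, dimension: str) -> str:
    """Build a toy aggregate query from business terms (joins are simplified)."""
    m, d = resolve(metric), resolve(dimension)
    return (
        f"SELECT {d['column']}, SUM({m['column']}) "
        f"FROM {m['table']} JOIN {d['table']} USING ({d['column']}) "
        f"GROUP BY {d['column']}"
    )

print(sum_by("monthly revenue", "customer"))
```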

Working With Analytics Translators

I mentioned earlier that data platform leaders would need the ability to form close partnerships with analytics translators. Let’s start with what an analytics translator does and then we’ll get to why a close relationship is important.

According to McKinsey & Company, the analytics translator serves the following purpose:

“At the outset of an analytics initiative, translators draw on their domain knowledge to help business leaders identify and prioritize their business problems, based on which will create the highest value when solved. These may be opportunities within a single line of business (e.g., improving product quality in manufacturing) or cross-organizational initiatives (e.g., reducing product delivery time).”

I expect the analytics translator and the data platform leader will become important partners. The analytics translator will be invaluable in establishing data platform priorities, and the platform leader will provide the analytics translator with key performance indicators (KPIs) on mutually-agreed-upon usage goals.

In conclusion, the data platform leader has many soft and hard skillset requirements in common with a data warehouse manager, but there are a few fundamental and significant differences. The key differences include developing a new purpose, vision, and mission; having expertise in new data services and data fabrics; knowing how best to access those services; and possessing the ability to form close partnerships with analytics translators.

About Teresa Wingfield

Teresa Wingfield is Director of Product Marketing at Actian, driving awareness of the Actian Data Platform's integration, management, and analytics capabilities. She brings 20+ years in analytics, security, and cloud solutions marketing at industry leaders such as Cisco, McAfee, and VMware. Teresa focuses on helping customers achieve new levels of innovation and revenue with data. On the Actian blog, Teresa highlights the value of analytics-driven solutions in multiple verticals. Check her posts for real-world transformation stories.
Data Intelligence

What Makes a Data Catalog “Smart”? #5 – User Experience

Actian Corporation

February 16, 2022

A data catalog harnesses enormous amounts of very diverse information, and its volume will grow exponentially. This will raise 2 major challenges:

  • How to feed and maintain the volume of information without tripling (or more) the cost of metadata management?
  • How to find the most relevant datasets for any specific use case?

A data catalog should be Smart to answer these 2 questions, with smart technological and conceptual features that go beyond the mere integration of AI algorithms.

In this respect, we have identified 5 areas in which a data catalog can be “Smart” – most of which do not involve machine learning:

  1. Metamodeling
  2. The data inventory
  3. Metadata management
  4. The search engine
  5. User experience

A data catalog should also be smart in the experience it offers to its different pools of users. Indeed, one of the main challenges with the deployment of a data catalog is its level of adoption from those it is meant for: data consumers. And user experience plays a major role in this adoption.

User Experience Within the Data Catalog

The underlying purpose of user experience is the identification of personas whose behavior and objectives we are looking to model in order to provide them with a slick and efficient graphic interface. Pinning down personas in a data catalog is challenging – it is a universal tool that provides added value for any company regardless of its size, across all sectors of activity anywhere in the world.

Rather than attempting to model personas that are hard to define, it’s possible to handle the situation by focusing on the issue of data cataloging adoption. Here, there are two user populations that stand out:

  • Metadata producers who feed the catalog and monitor the quality of its content – this population is generally referred to as Data Stewards.
  • Metadata consumers who use the catalog to meet their business needs – we will call them Users.

These two groups are not totally unrelated to each other of course: some Data Stewards will also be Users.

The Challenges of Enterprise-Wide Catalog Adoption

The real value of a data catalog resides in large-scale adoption by a substantial pool of (meta) data consumers, not just the data management specialists.

The pool of data consumers is very diverse. It includes data experts (engineers, architects, data analysts, data scientists, etc.), business people (project managers, business unit managers, product managers, etc.), and compliance and risk managers. More generally, all operational managers are likely to leverage data to improve their performance.

Data Catalog adoption by Users is often slowed down for the following reasons:

  • Data catalog usage is sporadic. They will log on from time to time to obtain very specific answers to specific queries. They rarely have the time or patience to go through a learning curve on a tool they will only use periodically – weeks can go by between catalog usage.
  • Not everyone has the same stance on metadata. Some will focus more on technical metadata, others will focus heavily on the semantic challenges, and others might be more interested in the organizational and governance aspects.
  • Not everybody will understand the metamodel or the internal organization of the information within the catalog. They can quickly feel put off by an avalanche of concepts that feel irrelevant to their day-to-day needs.

The Smart Data Catalog attempts to jump these hurdles in order to accelerate catalog adoption. Here is how the Actian Data Intelligence Platform meets these challenges.

How the Actian Data Intelligence Platform Facilitates Catalog Adoption

The first solution is the graphic interface. The Users’ learning curve needs to be as short as possible. Indeed, the User should be up and running without the need for any training. To make this possible, we made a number of choices.

The first choice was to provide two different interfaces, one for the Data Stewards and one for the Users:

Studio: The management and monitoring tool for the catalog content – an expert tool solely for the Data Stewards.

Explorer: For the Users, it provides them with the simplest search and exploration experience possible.

Our approach is aligned with the user-friendly principles of marketplace solutions – the recognized specialists in catalog management (in the general sense). These solutions usually have two applications on offer. The first, a “back office” solution, which enables the staff of the marketplace (or its partners) to feed the catalog in the most automated manner possible and control its content to ensure its quality. The second application, for the consumers, usually takes the form of an e-commerce website and enables end-users to find articles or explore the catalog. Studio and Explorer reflect these two roles.

The Information is Ranked in Accordance With the Role of the User Within the Organization

Our second choice is still at the experimental stage and consists in dynamically adapting the information hierarchy in the catalog according to User profiles.

This information hierarchy challenge is what differentiates a data catalog from a marketplace type catalog. Indeed, a data catalog’s information hierarchy depends on the operational role of the user. For some, the most relevant information in a dataset will be technical: location, security, formats, types, etc. Others will need to know the data semantics and their business lineage. Others still will want to know the processes and controls that drive data production – for compliance or operational considerations.

The Smart Data Catalog should be able to dynamically adjust the structure of the information to adapt to its different prisms. 
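
A toy sketch of that idea (illustrative only, since this capability is described above as experimental): the same dataset metadata is reordered so that each profile sees its most relevant sections first.

```python
# Dataset page sections ordered per user profile (both structures are illustrative)
SECTION_ORDER = {
    "data_engineer": ["technical", "security", "lineage", "semantics", "governance"],
    "business_analyst": ["semantics", "lineage", "governance", "technical", "security"],
    "compliance_officer": ["governance", "security", "lineage", "semantics", "technical"],
}

def rank_sections(profile: str, sections: dict) -> list:
    """Return the dataset's metadata sections in the order most relevant to the profile."""
    order = SECTION_ORDER.get(profile, list(sections))
    return [(name, sections[name]) for name in order if name in sections]

dataset = {
    "technical": {"format": "parquet", "location": "s3://lake/sales/"},
    "semantics": {"glossary": ["net revenue", "customer"]},
    "governance": {"owner": "CDO office", "retention": "5 years"},
}
for name, content in rank_sections("business_analyst", dataset):
    print(name, content)
```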

The last remaining challenge is the manner in which the information is organized in the catalog in the form of exploration paths by theme (something similar to shelving in a marketplace). It is difficult to find a structure that agrees with everybody. Some will explore the catalog along technical lines (systems, applications, technologies, etc.). Others will explore the catalog from a more functional perspective (business domains), others still from a semantic angle (through business glossaries, etc.).

The challenge of having everyone agree on a sole universal classification seems (to us) insurmountable. The Smart Data Catalog should be adaptable and should not ask Users to understand a classification that makes no sense to them. Ultimately, user experience is one of the most important success factors for a data catalog.

For more information on how user experience makes a Data Catalog Smart, download our eBook: “What is a Smart Data Catalog?”.

About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.
Data Intelligence

What Makes a Data Catalog “Smart”? #4 – The Search Engine

Actian Corporation

February 16, 2022

A data catalog harnesses enormous amounts of very diverse information, and its volume will grow exponentially. This will raise 2 major challenges:

  • How to feed and maintain the volume of information without tripling (or more) the cost of metadata management?
  • How to find the most relevant datasets for any specific use case?

We think that a data catalog should be Smart to answer these 2 questions, with smart technological and conceptual features that go beyond the mere integration of AI algorithms.

In this respect, we have identified 5 areas in which a data catalog can be “Smart” – most of which do not involve machine learning:

  1. Metamodeling
  2. The data inventory
  3. Metadata management
  4. The search engine
  5. User experience

A Powerful Search Engine for an Efficient Exploration

Given the enormous volumes of data involved in an enterprise catalog, we consider the search engine the principal mechanism through which users can explore the catalog. The search engine needs to be easy to use, powerful, and, most importantly, efficient – the results must meet user expectations. Google and Amazon have raised the bar very high in this respect, and the search experience they offer has become a reference in the field.

This second-to-none search experience can be summed up thus:

  • I write a few words in the search bar, often with the help of a suggestion system that offers frequent associations of terms to help me narrow down my search.
  • The near-instantaneous response provides results in a specific order and I fully expect to find the most relevant one on page one.
  • Should this not be the case, I can simply add terms to narrow the search down even further or use the available filters to cancel out the non-relevant results.

Alas, the best the data cataloging market currently offers in terms of search capabilities seems limited to capable indexation, scoring, and filtering systems. This approach is satisfactory when the user has a specific idea of what they are looking for (high intent search) but can prove disappointing when the search is more exploratory (low intent search) or when the idea is simply to spontaneously suggest relevant results to a user (no intent).

In short, simple indexation is great for finding information whose characteristics are well known, but it falls short when the search is more exploratory. The results often include false positives, and the ranking tends to over-represent exact matches.

A Multidimensional Search Approach

We decided from the get-go that a simple indexation system would prove limited and would fall short of providing the most relevant results for users. We therefore chose to isolate the search engine in a dedicated module on the platform and to turn it into a powerful innovation (and investment) zone.

We naturally took an interest in the work of Google’s founders on PageRank. Google’s ranking takes into account several dozen aspects (called features), among them the density of relations between graph objects (hypertext links in the case of web pages), the linguistic processing of search terms, and the semantic analysis of the knowledge graph.

Of course, we do not have the means Google has, nor its expertise in search result optimization. But we have integrated into our search engine several features that provide highly relevant results, and those features are constantly evolving.

We have integrated the following core features:

  • Standard, flat indexation of all the attributes of an object (name, description, and properties), weighted according to the type of property.
  • A Natural Language Processing (NLP) layer that accounts for near misses (typos or spelling errors).
  • A semantic analysis layer that relies on the processing of the knowledge graph.
  • A personalization layer that currently relies on a simple user classification according to their uses, and will in the future be enriched by individual profiling.
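
To make these layers more concrete, here is a minimal sketch of how weighted flat indexation, near-miss matching, and a simple personalization boost might be combined into a single relevance score. The field weights, boost factor, and asset structure below are illustrative assumptions, not the platform’s actual implementation.

```python
from difflib import SequenceMatcher

# Assumed field weights - the real per-property weighting is not public.
FIELD_WEIGHTS = {"name": 3.0, "description": 1.5, "properties": 1.0}

def fuzzy(a, b):
    """Tolerant match score in [0, 1], standing in for the NLP near-miss layer."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def score(asset, query, user_domains=None):
    """Combine flat weighted indexation, fuzzy matching, and a personalization boost."""
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        text = str(asset.get(field, "")).lower()
        for term in query.lower().split():
            if term in text:                          # exact hit in this field
                total += weight
            else:                                     # near miss (typo, spelling error)
                total += weight * 0.5 * max(fuzzy(term, w) for w in text.split() or [""])
    if user_domains and asset.get("domain") in user_domains:
        total *= 1.2                                  # simple usage-based personalization
    return total

assets = [
    {"name": "customer_invoices_2021", "description": "Monthly invoicing per customer", "domain": "finance"},
    {"name": "iot_sensor_readings", "description": "Raw time series from factory sensors", "domain": "operations"},
]
print(sorted(assets, key=lambda a: score(a, "custmer invoicing", {"finance"}), reverse=True)[0]["name"])
```

A real engine would of course index ahead of time rather than score on the fly, but the principle of layering signals on top of a flat index is the same.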

Smart Filtering to Contextualize and Limit Search Results

To complete the search engine, we also provide what we call a smart filtering system. Smart filtering is something we often find on e-commerce websites (such as Amazon, booking.com, etc.); it consists of providing contextual filters to narrow down the search results. These filters work in the following way:

  • Only those properties that help reduce the list of results are offered in the list of filters – non-discriminating properties do not show up.
  • Each filter shows its impact – meaning the number of residual results once the filter has been applied.
  • Applying a filter refreshes the list of results instantaneously.
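
As a purely illustrative sketch (the property names and result structure are invented), the facet logic described above can be reduced to counting values over the current result set and hiding any property that does not discriminate:

```python
from collections import Counter

def smart_filters(results, properties):
    """Build contextual facets: keep only properties whose values actually split
    the current result set, and report how many results each value would leave."""
    facets = {}
    for prop in properties:
        counts = Counter(r.get(prop) for r in results if r.get(prop) is not None)
        if len(counts) > 1:          # a single-valued property cannot reduce the list
            facets[prop] = dict(counts)
    return facets

results = [
    {"name": "sales_2021", "format": "parquet", "domain": "finance"},
    {"name": "sales_2020", "format": "parquet", "domain": "finance"},
    {"name": "sensor_log", "format": "csv", "domain": "operations"},
]
print(smart_filters(results, ["format", "domain"]))
# {'format': {'parquet': 2, 'csv': 1}, 'domain': {'finance': 2, 'operations': 1}}
```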

With this combination of multi-dimensional search and smart filtering, we feel that we offer a superior search experience to any of our competitors. And our decoupled architecture enables us to explore new approaches continuously, and rapidly integrate those that seem efficient.

For more information on how a Smart search engine enhances a Data Catalog, download our eBook: “What is a Smart Data Catalog?”.

About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.
Data Intelligence

What Makes a Data Catalog “Smart”? #3 – Metadata Management

Actian Corporation

February 16, 2022

A data catalog harnesses enormous amounts of very diverse information, and its volume will grow exponentially. This will raise 2 major challenges:

  • How to feed and maintain the volume of information without tripling (or more) the cost of metadata management?
  • How to find the most relevant datasets for any specific use case?

A data catalog should be Smart to answer these 2 questions, with smart technological and conceptual features that go beyond the mere integration of AI algorithms.

In this respect, we have identified 5 areas in which a data catalog can be “Smart” – most of which do not involve machine learning:

  1. Metamodeling
  2. The data inventory
  3. Metadata management
  4. The search engine
  5. User experience

It is in the field of metadata management that the notion of the Smart Data Catalog is most commonly associated with algorithms, machine learning, and AI.

How is Metadata Management Automated?

Metadata management is the discipline of assigning values to the metamodel attributes for the inventoried assets. The workload required is usually proportional to the number of attributes in the metamodel and the number of assets in the catalog.

The role of the Smart Data Catalog is to automate this activity as much as possible, or at the very least to help the human operators (Data Stewards) do so in order to ensure greater productivity and reliability.

As seen in our last article, a smart connectivity layer enables the automation of part of the metadata, but this automation is restricted to a limited subset of the metamodel – mostly technical metadata. A complete metamodel, even a modest one, also contains dozens of attributes that cannot be extracted from the source systems’ registries (because they are not there to begin with).

To solve this equation, several approaches are possible:

Pattern Recognition

The most direct approach consists of identifying patterns in the catalog in order to suggest metadata values for new assets.

Put simply, a pattern will include all the metadata of an asset and the metadata of its relations with other assets or other catalog entities. Pattern recognition is typically done with the help of machine learning algorithms.

The difficulty in implementing this approach lies in representing the information assets in a numerical form that can feed the algorithms and surface the relevant patterns. A simple structural analysis is not enough: two datasets can contain identical data but in different structures. Relying on the data values themselves isn’t efficient either: two datasets can carry the same kind of information but with different values. For example, 2020 client invoicing in one dataset, 2021 client invoicing in the other.

To solve this problem, the Actian Data Intelligence Platform relies on a technology called fingerprinting. To build the fingerprint, we extract 2 types of features from our clients’ data:

  • A group of features adapted to the numerical data (mostly statistical indicators).
  • Data emanating from word embedding models (word vectorization) for the textual data.

Fingerprinting is at the heart of our intelligent algorithms.
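
The fingerprinting technology itself is not described publicly, so the following is only a hedged sketch of the two feature families mentioned above: statistical indicators for numeric columns, and averaged word vectors for textual columns. The feature choices and the toy embedding are assumptions for illustration.

```python
import numpy as np

def numeric_features(values: np.ndarray) -> np.ndarray:
    """Statistical indicators for a numeric column (an assumed feature set)."""
    return np.array([values.mean(), values.std(), np.median(values), values.min(), values.max()])

def text_features(values, embed) -> np.ndarray:
    """Average word vectors over a textual column; `embed` maps a token to a vector."""
    vectors = [embed(token) for cell in values for token in cell.split()]
    return np.mean(vectors, axis=0)

def fingerprint(columns: dict, embed) -> np.ndarray:
    """Concatenate per-column features into a single dataset fingerprint."""
    parts = []
    for values in columns.values():
        arr = np.asarray(values)
        if np.issubdtype(arr.dtype, np.number):
            parts.append(numeric_features(arr.astype(float)))
        else:
            parts.append(text_features(list(values), embed))
    return np.concatenate(parts)

def toy_embed(token: str, dim: int = 8) -> np.ndarray:
    """Toy hash-seeded vectors standing in for a real word-embedding model."""
    rng = np.random.default_rng(abs(hash(token)) % (2**32))
    return rng.normal(size=dim)

fp = fingerprint({"amount": [10.0, 12.5, 9.9], "label": ["invoice client A", "invoice client B"]}, toy_embed)
print(fp.shape)  # (13,): 5 statistical features + 8 embedding dimensions
```

In practice, fingerprints would also need a consistent dimensionality (for instance by aggregating over columns) so that datasets with different schemas remain comparable; this sketch only illustrates the two feature families.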

The Other Embedded Approaches in a Suggestion Engine

While pattern recognition is indeed an efficient approach for suggesting the metadata of a new asset in a catalog, it rests on an important prerequisite: in order to recognize a pattern, there has to be one to recognize. In other words, this only works once there is a sufficient number of assets in the catalog (which is obviously not the case at the start of a project).

And it’s precisely in these initial phases of a catalog project that the metadata management load is the highest. It is, therefore, crucial to include other approaches likely to help the Data Stewards in these initial phases, when a catalog is more or less empty.

The Actian Data Intelligence Platform suggestion engine, which provides intelligent algorithms to assist metadata management, also includes other approaches (which we enrich regularly).

Here are some of these approaches:

  • Structural similarity detection.
  • Fingerprint similarity detection.
  • Name approximation.
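
To give a feel for how such approaches might combine (an illustrative sketch only, not the platform’s actual engine), metadata for a newly inventoried asset could be suggested from its closest already-documented neighbor, blending fingerprint similarity with name approximation. The 0.7/0.3 weights and the asset structure here are assumptions.

```python
from difflib import SequenceMatcher
import numpy as np

def name_similarity(a: str, b: str) -> float:
    """Name approximation: character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def fingerprint_similarity(fp_a: np.ndarray, fp_b: np.ndarray) -> float:
    """Cosine similarity between two dataset fingerprints."""
    return float(np.dot(fp_a, fp_b) / (np.linalg.norm(fp_a) * np.linalg.norm(fp_b)))

def suggest_metadata(new_asset: dict, documented: list) -> dict:
    """Suggest metadata from the closest documented neighbor, blending
    fingerprint similarity and name approximation (the weights are assumptions)."""
    def combined(candidate):
        return (0.7 * fingerprint_similarity(new_asset["fingerprint"], candidate["fingerprint"])
                + 0.3 * name_similarity(new_asset["name"], candidate["name"]))
    return max(documented, key=combined)["metadata"]

documented = [
    {"name": "customer_invoices_2020", "fingerprint": np.array([1.0, 0.2, 0.1]),
     "metadata": {"domain": "finance", "owner": "billing team"}},
    {"name": "sensor_readings", "fingerprint": np.array([0.0, 0.9, 0.8]),
     "metadata": {"domain": "operations", "owner": "plant IT"}},
]
new_asset = {"name": "customer_invoices_2021", "fingerprint": np.array([0.9, 0.25, 0.12])}
print(suggest_metadata(new_asset, documented))  # {'domain': 'finance', 'owner': 'billing team'}
```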

This suggestion engine, which analyzes the catalog content in order to determine the probable metadata values for newly integrated assets, is an ongoing subject of experimentation. We regularly add new approaches, sometimes very simple and sometimes much more sophisticated. In our architecture, it is a dedicated service whose performance improves as the catalog grows and as we enrich our algorithms.

The Actian Data Intelligence Platform team has chosen lead time as its main metric for Data Steward productivity (the ultimate objective of smart metadata management). Lead time is a notion that stems from lean management; in a data catalog context, it measures the time elapsed between the moment an asset is inventoried and the moment all its metadata has been filled in.
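
Computed naively (the field names below are assumptions for illustration), lead time per asset is simply the elapsed time between those two moments, which can then be averaged or tracked over time:

```python
from datetime import datetime
from statistics import mean

# Hypothetical catalog records: when an asset was inventoried and when its
# documentation was completed by the Data Stewards.
assets = [
    {"inventoried_at": datetime(2022, 1, 3), "fully_documented_at": datetime(2022, 1, 10)},
    {"inventoried_at": datetime(2022, 1, 5), "fully_documented_at": datetime(2022, 1, 7)},
]

lead_times = [(a["fully_documented_at"] - a["inventoried_at"]).days for a in assets]
print(f"average lead time: {mean(lead_times):.1f} days")  # average lead time: 4.5 days
```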

For more information on how Smart metadata management enhances a Data Catalog, download our eBook: “What is a Smart Data Catalog?”.

About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.
Data Intelligence

What Makes a Data Catalog “Smart”? #2 – The Data Inventory

Actian Corporation

February 16, 2022

A data catalog harnesses enormous amounts of very diverse information, and its volume will grow exponentially. This will raise 2 major challenges:

  • How to feed and maintain the volume of information without tripling (or more) the cost of metadata management?
  • How to find the most relevant datasets for any specific use case?

A data catalog should be Smart to answer these 2 questions, with smart technological and conceptual features that go beyond the mere integration of AI algorithms.

In this respect, we have identified 5 areas in which a data catalog can be “Smart” – most of which do not involve machine learning:

  1. Metamodeling
  2. The data inventory
  3. Metadata management
  4. The search engine
  5. User experience

The second way to make a data catalog “smart” is through its inventory. A data catalog is essentially a thorough inventory of information assets, enriched with metadata that helps harness the information as efficiently as possible. Setting up a data catalog therefore depends first of all on an inventory of the assets from the different systems.

Automating the Inventory: The Challenges

A declarative approach to building the inventory doesn’t strike us as particularly smart, however well thought out it may be. It involves a lot of work both when launching the catalog and when keeping it up to date – and in a fast-changing digital landscape, the initial effort quickly becomes obsolete.

The first step in creating a smart inventory is of course to automate it. With a few exceptions, enterprise datasets are managed by specialized systems (distributed file systems, ERPs, relational databases, software packages, data warehouses, etc.), which already maintain all the metadata required for them to work properly. There is no need to recreate this information manually: you just need to connect to the different registries and synchronize the catalog content with the source systems.

In theory, this should be straightforward, but putting it into practice is actually rather difficult. The fact is, there is no universal standard that the different technologies conform to for accessing their metadata.

The Essential Role of Connectivity to the System Sources

A smart connectivity layer is a key part of the Smart Data Catalog. For a more detailed description of the Actian Data Intelligence Platform’s connectivity technology, I recommend reading our previous eBook, The 5 Technological Breakthroughs of a Next-Generation Catalog, but its main characteristics are:

  • Proprietary – We do not rely on third parties so as to maintain a highly specialized extraction of the metadata.
  • Distributed – In order to maximize the reach of the catalog.
  • Open – Anyone looking to enrich the catalog can develop their own connectors with ease.
  • Universal – It can synchronize any source of metadata.
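
The interface below is invented for illustration (it is not the platform’s actual connector API), but it shows how a connectivity layer with these characteristics might expose a uniform contract regardless of the underlying technology, so that anyone can add a new source by implementing the same operations.

```python
from abc import ABC, abstractmethod

class Connector(ABC):
    """Hypothetical connector contract: one implementation per source technology,
    all exposing the same synchronization entry points to the catalog."""

    @abstractmethod
    def list_datasets(self) -> list:
        """Enumerate the datasets known to the source system's registry."""

    @abstractmethod
    def read_metadata(self, dataset: str) -> dict:
        """Return the technical metadata the source system already maintains."""

class PostgresConnector(Connector):
    def __init__(self, dsn: str):
        self.dsn = dsn  # connection string to the source system

    def list_datasets(self) -> list:
        # A real connector would query the database's information_schema here.
        return ["public.customers", "public.invoices"]

    def read_metadata(self, dataset: str) -> dict:
        return {"technology": "postgresql", "dataset": dataset, "columns": []}

def synchronize(connector: Connector) -> list:
    """Pull registry metadata from any source and hand it to the catalog."""
    return [connector.read_metadata(ds) for ds in connector.list_datasets()]

print(synchronize(PostgresConnector("postgresql://catalog@localhost/crm")))
```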

This connectivity layer can not only read and synchronize the metadata contained in the source registries; it can also produce metadata.

This production of metadata requires more than simple access to the source system registries. It also requires access to the data itself, which will be analyzed by our scanners in order to enrich the catalog automatically.

To date, we produce 2 types of metadata:

  • Statistical analysis: To build a profile of the data – value distribution, rate of null values, top values, etc. (the nature of the metadata depends obviously on the native type of the data being analyzed).
  • Structural analysis: To determine the operational type of specific textual data (email, postal address, social security number, client code, etc. – the system is scalable and customizable).
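
As a hedged illustration of these two kinds of produced metadata (the detection rule and the profile fields are invented for the example), a scanner might profile a column statistically and infer an operational type from simple structural patterns:

```python
import re
from collections import Counter

EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def profile_column(values):
    """Statistical analysis: null rate, distinct count, and top values."""
    non_null = [v for v in values if v is not None]
    return {
        "null_rate": 1 - len(non_null) / len(values),
        "distinct": len(set(non_null)),
        "top_values": Counter(non_null).most_common(3),
    }

def operational_type(values):
    """Structural analysis: guess the operational type of a textual column."""
    non_null = [str(v) for v in values if v is not None]
    if non_null and all(EMAIL.match(v) for v in non_null):
        return "email"
    return "unknown"

column = ["ana@example.com", "bob@example.com", None, "cara@example.com"]
print(profile_column(column))
print(operational_type(column))  # email
```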

The Inventory Mechanism Must Also Be Smart

Our inventory mechanism is also smart in several ways:

  • Dataset detection relies on extensive knowledge of the storage structures, particularly in a Big Data context. For example, an IoT dataset made up of thousands of files of time-series measurements can be identified as a single dataset (the number of files and their location being only metadata).
  • The inventory is not integrated into the catalog by default to prevent the import of technical or temporary datasets that would be of little use (either because the data is unexploitable, or because it is duplicated data).
  • The selection process for the assets that should be imported into the catalog also benefits from some assistance – we strive to identify the most appropriate objects for integration in the catalog (with a variety of additional approaches to make this selection).

For more information on how Smart Data Inventorying enhances a Data Catalog, download our eBook: “What is a Smart Data Catalog?”.

About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.
Data Intelligence

What Makes a Data Catalog “Smart”? #1 – Metamodeling

Actian Corporation

February 16, 2022

A data catalog harnesses enormous amounts of very diverse information, and its volume will grow exponentially. This will raise 2 major challenges:

  • How to feed and maintain the volume of information without tripling (or more) the cost of metadata management?
  • How to find the most relevant datasets for any specific use case?

We think that a data catalog should be Smart to answer these 2 questions, with smart technological and conceptual features that go beyond the mere integration of AI algorithms.

In this respect, we have identified 5 areas in which a data catalog can be “Smart” – most of which do not involve machine learning:

  1. Metamodeling
  2. The data inventory
  3. Metadata management
  4. The search engine
  5. User experience

A Universal and Static Metamodel Cannot be Smart

At an enterprise scale, the metadata required to harness the informational assets in any meaningful way can be considerable. Moreover, metadata is specific to each organization, and sometimes even to different populations within an organization. For example, a business analyst won’t necessarily seek the same information as an engineer or a product manager might.

Attempting to create a universal metamodel therefore does not seem very smart to us. Indeed, such a metamodel would have to adapt to a plethora of different situations and would inevitably fall victim to one of the 3 pitfalls below:

  • Excessive simplicity which won’t cover all the use cases needed.
  • Excessive levels of abstraction with the potential to adapt to a number of contexts at the cost of arduous and time-consuming training – not an ideal situation for an enterprise-wide catalog deployment.
  • Levels of abstraction lacking depth, ultimately leading to a multiplicity of concrete concepts born of notions drawn from a variety of contexts – many of which will be useless in any specific context, rendering the metamodel needlessly complicated and potentially incomprehensible.

In our view, smart metamodeling should ensure a metamodel that adapts to any context and can be enriched as use cases or maturity levels develop over time.

The Organic Approach to a Metamodel

A metamodel is a knowledge model, and the formal structure of a knowledge model is referred to as an ontology.

An ontology defines a range of object classes, their attributes, and the relationships between them. In a universal model, the ontology is static – the classes, the attributes, and the relations are predefined, with varying levels of abstraction and complexity.

The Actian Data Intelligence Platform chose not to rely on a static ontology but rather on a scalable knowledge graph.

The metamodel is therefore voluntarily simple at the start – there are only a handful of types, representing the different classes of information assets (data sources, datasets, fields, dashboards), each with a few essential attributes (name, description, contacts).

This metamodel is fed automatically by the technical metadata extracted from the datasources which vary depending on the technology in question (the technical metadata of a table in a data warehouse differs from the technical metadata of a file in a data lake).

This organic metamodeling is the smartest way to handle the ontology issue in a data catalog. Indeed, it offers several advantages:

  • The metamodel can adapt to each context, often relying on a pre-existing model, integrating the in-house nomenclature and terminology without the need for a long and costly learning curve;
  • The metamodel does not need to be fully defined before using the data catalog – you only need to focus on a few classes of objects and the few attributes necessary to cover the initial use cases. You can then enrich the model as catalog adoption progresses over time;
  • User feedback can be integrated progressively, improving catalog adoption, and as a result, ensuring return on investment for the metadata management.
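
To illustrate this organic approach (the class and attribute names below are invented), the metamodel can be thought of as a small, mutable ontology whose classes, attributes, and relations are declared and enriched at runtime rather than fixed once and for all:

```python
class Metamodel:
    """Minimal dynamic ontology: asset classes and their attributes can be
    added at any time, so the model grows with the use cases."""

    def __init__(self):
        self.classes = {}

    def add_class(self, name, attributes):
        self.classes[name] = {"attributes": set(attributes), "relations": set()}

    def add_attribute(self, class_name, attribute):
        self.classes[class_name]["attributes"].add(attribute)

    def add_relation(self, source, target):
        self.classes[source]["relations"].add(target)

# Deliberately simple starting point: a handful of asset classes...
model = Metamodel()
model.add_class("dataset", ["name", "description", "contacts"])
model.add_class("field", ["name", "description"])
model.add_relation("dataset", "field")

# ...enriched later, as maturity and use cases develop.
model.add_attribute("dataset", "confidentiality_level")
print(model.classes["dataset"]["attributes"])
```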

Adding Functional Attributes to the Metamodel in Order to Facilitate Searching

There are considerable advantages to this metamodeling approach, but also one major inconvenience: since the metamodel is completely dynamic, it is difficult for the engine to understand the structure, and therefore difficult for it to help users feed the catalog and use the data (two core components of a Smart Data Catalog).

Part of the solution relates to the metamodel and the ontology attributes. Usually, metamodel attributes are defined by their technical types (date, number, character string, list of values, etc.). With the Actian Data Intelligence Platform, the attribute type library does of course include these technical types.

But it also includes functional types – quality levels, confidentiality levels, personal data flags, etc. These functional types enable the platform engine to better understand the ontology, refine its algorithms, and adapt how the information is represented.
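
A minimal sketch of what functional typing could look like (the type names are examples, not the platform’s actual list): each metamodel attribute carries a technical type plus a functional type the engine can reason about.

```python
from dataclasses import dataclass
from enum import Enum

class TechnicalType(Enum):
    DATE = "date"
    NUMBER = "number"
    STRING = "string"
    VALUE_LIST = "value_list"

class FunctionalType(Enum):
    QUALITY_LEVEL = "quality_level"
    CONFIDENTIALITY_LEVEL = "confidentiality_level"
    PERSONAL_DATA_FLAG = "personal_data_flag"
    PLAIN = "plain"

@dataclass
class AttributeDefinition:
    name: str
    technical_type: TechnicalType
    functional_type: FunctionalType = FunctionalType.PLAIN

# Because the engine knows 'confidentiality' is a confidentiality level (not just a
# value list), it can, for example, adapt how results are filtered or displayed.
confidentiality = AttributeDefinition("confidentiality", TechnicalType.VALUE_LIST,
                                      FunctionalType.CONFIDENTIALITY_LEVEL)
print(confidentiality)
```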

For more information on how Smart Metamodeling enhances a Data Catalog, download our eBook: “What is a Smart Data Catalog?”.

About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.
Data Management

Data Democratization: Promise vs. Reality

Teresa Wingfield

February 10, 2022

Enabling universal access to data can create opportunities to generate new revenue and drive operational efficiencies throughout an organization. Even more importantly, data democratization, as it’s known, is crucial to business transformation. For that reason, vendors have made a lot of promises about enabling data democratization—and not all have panned out. For instance, various vendors have touted data for the masses through self-service analytics for many years. The objective has been to make information accessible to non-technical users without requiring IT involvement. Vendors have focused their efforts on shielding users from underlying data complexities, making analytics tools easier to use, and expanding reach to users in any location throughout the world via the cloud.

However, even with simplified access to data, organizations still haven’t made the progress they would like to when it comes to democratizing data. While it has become more common for non-technical users to access data on their own, for the most part, they can only do so in certain situations. Barriers still stand in the way, making it difficult for users to access all the data they need for decision-making.

Here are the top four barriers to data democratization that organizations must overcome in 2022 – and why new data platform approaches can help reduce cost and complexity.

1. Users Can’t Access Data in Silos

Organizations typically store data for analytics and decision-making in a centralized data warehouse or similar repository optimized for analytics. But that’s only a subset of all the data that might be useful. Much of it remains sequestered in disparate data silos that most users cannot access. To run the analytics they want and gain insights to inform new programs and processes, users need access to transactional databases, IoT databases, data lakes, streaming data, and more—data that may be spread across multiple data centers and multiple clouds. Several use cases come to mind, including automated personalized e-commerce offers, supply chain optimization, real-time quotes for insurance, credit approval and portfolio management.

2. Today’s Semantic Layers Aren’t Enough

A semantic layer is a business representation of data that helps users access data without IT assistance. Although semantic layers are great at shielding users from the underlying complexities of data, they are designed to represent the data in only one database at a time. Today’s users need a semantic layer that is more ubiquitous to connect to and interact with multiple data sources across multiple locations. As Gartner puts it, users need frictionless access to data—from any source located on-premises and in the cloud.

Data fabrics and data meshes are emerging data architecture designs that can make data more accessible, available, discoverable, and interoperable than a singularly-focused semantic layer can. A data fabric acts as a distributed semantic layer connecting multiple sources of data across multiple locations. A data mesh goes a step further, treating data as a product that is owned by teams who best understand the data and its uses.

3. Lack of Shared Services

Indirectly impacting data democratization is a lack of shared services. The absence of such services means that too much time and resources are spent on separate efforts to manage, maintain, and secure data, which leaves less time to focus on enabling data access and delivering business value to end users. Plus, inconsistencies in security, controls, upgrades, patches, and more—across multiple deployments—often result in time-consuming and costly consequences.

4. Weak Tool Support

The purpose of and value delivered by different types of analytical tools vary greatly, so different users—including data engineers, data scientists, business analysts, and business users—need different tools. Many data warehouse vendors, though, fail to provide flexible analytic and development tool integration, which limits the utility of the tools to users and limits the variety of use cases that a data warehouse can serve.

How to Progress Data Democratization Efforts

To overcome these data democratization challenges, organizations must ensure that business-critical systems can analyze, transact, and connect at their very best using the right tool for the right job. As we head into 2022, now is the time to consider if your data democratization platform is exceeding your expectations and fulfilling your business needs. Actian is leading the way with our data platform approach. The data platform must bring together a wide range of data processing and analytic capabilities that focus on easier access to data and less management overhead. As organizations tackle these challenges, they will be able to generate new revenue and drive operational efficiencies to truly transform their business.

This article was originally published on vmBlog.

About Teresa Wingfield

Teresa Wingfield is Director of Product Marketing at Actian, driving awareness of the Actian Data Platform's integration, management, and analytics capabilities. She brings 20+ years in analytics, security, and cloud solutions marketing at industry leaders such as Cisco, McAfee, and VMware. Teresa focuses on helping customers achieve new levels of innovation and revenue with data. On the Actian blog, Teresa highlights the value of analytics-driven solutions in multiple verticals. Check her posts for real-world transformation stories.