Databases

Database Migration to the Cloud: Things You Should Consider

Teresa Wingfield

December 29, 2021

Did you know that 63% of organizations are actively migrating data from on-premises databases to the cloud? So says an IDC survey, which also found that another 29% plan to start a database migration within three years.

Carl Olofson, IDC’s Vice President of Research specializing in data management software, recently participated as a guest speaker in an Actian webinar where he discussed survey results, including the wide-ranging benefits that companies are realizing from their database migration to the cloud. The benefits include higher availability, improved security, better scalability, and much more.

At the same time, he cautioned that moving mission-critical databases to the cloud is not a decision to be taken lightly and shared great advice on determining whether you should continue with the technology that you know or try something different. Then, if you are ready to try something different, there are other questions to ask. Key among them: whether the database that drives your enterprise will be well-supported in the cloud.

Here’s his list of requirements that you need to look for in a managed database cloud service:

  • Provides transparent maintenance of database software.
  • Ensures uninterrupted operation with even performance.
  • Enables easy spin-up of database instances (this is key for development projects).
  • Provides data security policies that align with your data security policies.
  • Grows and shrinks database resources as needed.
  • Maintains database infrastructure with experienced, trained professionals who are experts in the database technology in question.
  • Provides the capability for dynamic failover and disaster recovery and may include multi-region failover support.

The webinar at which Carl Olofson spoke is available on demand. You can hear Carl discuss these requirements and other factors you should take into account before shifting to the cloud. If you’re an Ingres or OpenROAD customer, I would encourage you to stay to the webinar’s end when Emma McGrattan, Senior Vice President of Engineering at Actian, discusses Ingres NeXt, Actian’s strategy for shifting to the cloud with virtually no conversion pain.

About Teresa Wingfield

Teresa Wingfield is Director of Product Marketing at Actian, driving awareness of the Actian Data Platform's integration, management, and analytics capabilities. She brings 20+ years in analytics, security, and cloud solutions marketing at industry leaders such as Cisco, McAfee, and VMware. Teresa focuses on helping customers achieve new levels of innovation and revenue with data. On the Actian blog, Teresa highlights the value of analytics-driven solutions in multiple verticals. Check her posts for real-world transformation stories.
Data Integration

Does FHIR Fully Address Healthcare Data Interoperability Silo Challenges?

Actian Corporation

December 9, 2021

Let’s face it, information withholding is common in many industries, but it’s been especially rampant in the healthcare space. Payers and providers have avoided scrutiny of the cost of care delivered and the reimbursements they were collecting. Consequently, information blocking has become one of the key impediments to shifting from a fee-for-service (FFS) model to a more transparent value-based care (VBC) model. Of course, information blocking isn’t absolute or explicit. Instead, it manifests itself through infrastructural barriers such as the slow transition from paper-based records, manual processes, and the lingering need for one-off, point-to-point integrations between proprietary platforms. Sometimes, information blocking is a side effect of HIPAA and Meaningful Use, where well-meaning guidelines can prevent or discourage information sharing when the technology or process is not granular or flexible enough to enable sharing without sacrificing security and privacy.

Congress crafted the Cures Act of 2016 to address these barriers to information sharing and bring more transparency to healthcare costs and reimbursements. Further, the Office of the National Coordinator for Health Information Technology (formerly abbreviated ONCHIT, now ONC) defined specific data sharing standards (interoperability), guidelines for use, and mandated implementation timelines for payers and providers to get their act together. The resulting interoperability standard, Fast Healthcare Interoperability Resources (FHIR, pronounced “Fire”), is the latest update to Health Level Seven (HL7), in effect HL7 v4. By 2024, the plan is to replace the current de facto standard HL7v2 (HL7v3 did not see widespread implementation in the US) with FHIR. With over 95% of healthcare organizations using HL7v2, you may want to understand better how this may impact your healthcare business.

Why FHIR Over HL7v2?

Why bother? HL7v2 has several drawbacks; the key ones are as follows:

  • First and foremost, HL7v2 doesn’t provide a way for a human to view the contents within each message passed.
  • Secondly, healthcare messages can be extensive, consisting of several pieces of information that one may want to send or receive or view separately; HL7v2 doesn’t provide an easy way to do this.
  • Also, HL7v2 is limited to the last generation of standards for semi-structured data formatting and transfer, XML and SOAP.
  • And finally, HL7v2 didn’t start as a free open standard and didn’t become one until 2013, and most of the major early adopters’ versions predate this shift. The result is multiple implementations with backward compatibility and cross-vendor integration challenges.

FHIR solves for each of these HL7v2 deficiencies in the following ways:

  • FHIR is a free open standard from inception, designed for use by any developer – not just those in the healthcare space.
  • FHIR messages are based on XML, RDF, and JSON data formats, using RESTful APIs to expose the data as a set of web services (see the sketch after this list).
  • Exactly what data is exchanged is also standardized to adhere to the United States Core Data for Interoperability (USCDI), which must be stored and downloadable in a Consolidated Clinical Document Architecture (CCDA) document. When it comes to regulated standards, adoption depends on providing an agreed-upon set of data in a format that all systems can read.
  • FHIR messages are readable by humans and are modular, enabling specific elements to be shared and the ability to eyeball whether you’ve received what you expected.
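
As a rough illustration of the RESTful access pattern mentioned above, the Python sketch below fetches a single Patient resource as JSON. The base URL and resource ID are placeholders for illustration only, not references to any particular FHIR server.

```python
import json
import urllib.request

# Hypothetical FHIR endpoint; replace with your organization's server.
FHIR_BASE = "https://fhir.example.org/r4"

def get_patient(patient_id: str) -> dict:
    """Fetch a single FHIR Patient resource as JSON over the RESTful API."""
    req = urllib.request.Request(
        f"{FHIR_BASE}/Patient/{patient_id}",
        headers={"Accept": "application/fhir+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    patient = get_patient("example-123")
    # FHIR resources are plain JSON, so ordinary tooling can inspect them.
    print(patient["resourceType"], patient.get("id"))
```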

There are more extensive comparisons of FHIR and HL7v2 and detailed descriptions of USCDI and CCDA on the government’s HealthIT website. The key takeaway should be that the new FHIR standard provides far more simplicity, granularity, and standardization of what data should be shared and how it should be formatted to enable that sharing. While FHIR removes interoperability barriers, it does not guarantee open and accessible healthcare data sharing if you don’t address the other data integration challenges surrounding healthcare data silos. Without pairing it with the right changes to your data analytics strategy and implementations, silos will remain. By extension, it will only marginally bolster the shift from FFS to VBC.

Interoperability is Only the Starting Point for Integration

FHIR interoperability will undoubtedly make it easier for providers to share their clinical data amongst themselves and payers. It will also make incorporating external clinical data (such as state and federal population health data repositories) easier by exposing these external repositories as web services. Of course, this will require your IT teams to create more point-to-point connections. And, while you can reuse these point-to-point connections, there is a better way to optimize the use of FHIR: embed it in a data hub.

With data hub functionality, you can ingest, prepare, and combine complementary financial and operational data from provider and payer administrative systems with FHIR clinical data. Typically, payer data – and data sent to and from payers – is in EDI formats such as the X12 8xx transaction sets. Much of the social determinants of health (SDOH) data originates outside the healthcare arena in still other formats. Data from FHIR resources must be unpacked and transformed into EDI or other formats, and vice versa. The combination of clinical, SDOH, and financial data is often necessary, making point-to-point connections suboptimal.
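
To make that unpack-and-transform step a little more concrete, here is a minimal Python sketch that flattens a FHIR Patient resource into a flat record that could then be mapped onto an EDI transaction or a table. The field selection is purely illustrative; real X12 mappings involve many more segments and validation rules.

```python
def flatten_patient(patient: dict) -> dict:
    """Flatten a FHIR Patient resource into a flat record that could feed an
    EDI or tabular mapping (illustrative fields only)."""
    name = (patient.get("name") or [{}])[0]
    address = (patient.get("address") or [{}])[0]
    return {
        "patient_id": patient.get("id"),
        "family_name": name.get("family"),
        "given_name": " ".join(name.get("given", [])),
        "birth_date": patient.get("birthDate"),
        "gender": patient.get("gender"),
        "postal_code": address.get("postalCode"),
    }

# A trimmed-down Patient resource as it might arrive from a FHIR API.
sample = {
    "resourceType": "Patient",
    "id": "example-123",
    "name": [{"family": "Doe", "given": ["Jane", "Q"]}],
    "birthDate": "1980-04-01",
    "gender": "female",
    "address": [{"postalCode": "94016"}],
}
print(flatten_patient(sample))
```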

Instead, a virtual, central location to connect all possible data is a more efficient and flexible way to deal with the variety and diversity of healthcare data. This is the concept behind having a data hub. Historically, healthcare data hubs have been very complex data integration platforms used by data engineers and other IT integration specialists. But, in the spirit of data democratization that the Cures Act and FHIR are promoting, the “to be” healthcare data hub of the future should include the ability to avoid hard-coding and one-off scripts. A drag-and-drop, no-code/low-code approach avoids overly taxing healthcare IT teams and gives analysts and other power users self-service access to the data. Further, healthcare data hubs should have built-in transformation templates since the USCDI and CCDA structures and EDI formats such as 837 are well defined. This removes one of the largest time sinks for IT and empowers non-IT super users to work with the data directly.

Increased Data Sharing Doesn’t Automatically Mean Higher Data Quality

Another issue that FHIR doesn’t address, and if anything may exacerbate, is data quality. With broader and hopefully more frequent data sharing, there will always be conversion errors inherent in sending specific, separate resources instead of the monolithic records used with HL7v2. Furthermore, the need for data transformations will remain and grow as more diverse and disparate sets of data are shared. A healthcare data hub must be fluent in FHIR, EDI X12 8xx, legacy HL7 versions, and other standards to perform data ingestion, make the transformations between standards, and provide data quality functions such as deduplication and the ability to set up pattern recognition for common conversion errors. Finally, the standards for data formats within each FHIR resource may change over time. A healthcare data hub must be aware of changes to how specific data must be transformed.

All this data sharing between various providers and payers takes place between applications and data repositories. Point-to-point integrations represent the data connections between specific operational and business processes and support automated data analysis and decision support for care delivery and knowledge workers. Automating data ingestion and egress to and from the various applications, web services, and data repositories, along with preparation, transformation, and unification, is also a critical factor in successful data sharing. Again, FHIR facilitates – but doesn’t complete – the solution.

Conclusions

The Patient Protection and Affordable Care Act of 2010 was a good start toward broadening access to healthcare for millions of citizens without resorting to a single-payer system. Further, the sections of the law that promoted outcome-focused Accountable Care Organizations partially addressed – or at least acknowledged – the need to shift away from a fee-for-service model. The Cures Act, the ONC’s interpretation of it, and support for FHIR as a tool for data sharing will undoubtedly accelerate the shift to value-based care by creating a level playing field for data sharing. A more comprehensive data integration strategy is still required and should focus on combining and improving the quality of healthcare and non-healthcare data.

A comprehensive data integration strategy with a data hub is the most effective means of ensuring all relevant and necessary data is brought together, but on its own it cannot answer questions about what needs adjusting in any given clinical, operational, or financial process or decision to improve outcomes. A cloud data warehouse and analytics tools need to be integrated with the data hub to analyze the data and extract the insights that drive the shift from FFS to VBC. In our last blog, we described the Actian Healthcare Data Analytics Hub, a more efficient and effective way to extract insights for value-based care. In the next blog, we’ll look at some use cases where analytics drive the shift from FFS to VBC.

About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.
Data Intelligence

The Top 5 Data Trends to Look Out for in 2022

Actian Corporation

December 6, 2021

The role of data is central to corporate strategies, no matter the industry. However, the challenges related to data governance, management, exploitation, and value creation are constantly evolving, and companies must adopt the most promising data and analytics methods as soon as possible. This article gives an overview of the main data trends that will help you tackle 2022 under the best possible conditions.

For organizations, the year 2021 brought two major challenges: the need for economic recovery after the disruptions brought by the pandemic and the realization that data exploitation is essential for being more predictive than reactive.

Each year, Gartner publishes a list of technological and strategic trends likely to accelerate the growth and digital transformation of companies. For 2022, five of the twelve key trends identified by Gartner concern data.

Trend 1 – Data Fabric: Flexibility and Agility

According to Gartner, the lever for optimal data exploitation is a Data Fabric. It is designed as a coherent environment for reconciling all types of data, from all types of sources, and can reduce data management efforts by nearly 70%.

Promising flexibility and agility, a Data Fabric avoids data siloing and simplifies the integration of data into companies’ decision-making and strategic processes. A Data Fabric facilitates the use of data, even for non-technical users, and contributes to the development of your organization’s data culture.

Trend 2 – Cloud-Native Platforms: Scalability, Adaptability, Profitability

The second major data trend identified by Gartner is the unavoidable place of Cloud-native platforms in the data ecosystem. These platforms, which promise scalability and adaptability, are a response to both performance and cost control.

Gartner points out that Cloud-native platforms, which exploit the basic capabilities of cloud computing to provide scalable and elastic IT capabilities “as a service”, are expected to form the basis of 95% of companies’ digital transformation projects by 2025, compared with 40% in 2021.

Trend 3 – Hyperautomation to Favor Human Intelligence

Faced with the acceleration of time-to-market and the need to return to strong and rapid economic growth, companies are looking for solutions that limit human intervention. The challenge is to win the race against time and refocus human intelligence on value-added tasks.

According to Gartner, hyperautomation will be one of the key trends in 2022. Hyperautomation translates into the massive use of advanced technologies, including artificial intelligence and machine learning, to automate processes and augment human capabilities.

In its report, Gartner states that “the most successful hyper-automation teams focus on three key priorities: improving the quality of work, accelerating business processes and increasing decision-making agility.”

Trend 4 – Business Intelligence for Performance

Business Intelligence is based on a set of technologies that enable real-time, granular analysis of data to clarify and facilitate decision-making. The discipline is expanding beyond large enterprises and into the world of SMEs. It relies on a wide range of applications, solutions, and methodologies combined to collect data from internal systems and external sources and integrate it into decision-making processes.

According to Gartner’s Data 2022 trends report, “over the next two years, one-third of large organizations will use BI for structured decision-making to improve their competitive advantage”.

Trend 5 – AI Engineering: The Ultimate Lever for Growth

AI Engineering addresses the critical mission of automating updates to data, models, and applications to streamline the use of AI in data analysis. AI Engineering services create the data platforms needed to deliver operational AI solutions.

AI Engineering is the key lever for value generation by 2025, according to Gartner, which predicts that “the 10% of companies that will establish AI engineering best practices will generate at least three times more value than the 90% of companies that will not.”

Data Intelligence

An Efficient Permission Management System for a Data Catalog

Actian Corporation

December 2, 2021

An organization’s data catalog enhances all available data assets by relying on two types of information – on the one hand, purely technical information that is automatically synchronized from their sources; and on the other hand, business information that comes from the work of Data Stewards. The latter is updated manually and thus brings its share of risks to the entire organization.

A permission management system is therefore essential to define and control the access rights of catalog users. In this article, we detail the fundamental characteristics and the possible approaches to building an efficient permission management system, as well as the solution implemented by the Actian Data Intelligence Platform Data Catalog.

Permission Management System: An Essential Tool for the Entire Organization

For data catalog users to trust in the information they are viewing, it is essential that the documentation of cataloged objects is relevant, of high quality, and, above all, reliable. Your users must be able to easily find, understand, and use the data assets at their disposal.

The Origin of Catalog Information and Automation

A data catalog generally integrates two types of information. On the one hand, there is purely technical information that comes directly from the data source. This information is synchronized in a completely automated and continuous way between the data catalog and each data source to guarantee its veracity and freshness. On the other hand, the catalog contains all the business or organizational documentation, which comes from the work of the Data Stewards. This information cannot be automated; it is updated manually by the company’s data management teams.

A Permission Management System is a Prerequisite for Using a Data Catalog

To manage this second category of information, the catalog must include access and input control mechanisms. Indeed, it is not desirable that any user of your organization’s data catalog can create, edit, import, export or even delete information without having been given prior authorization. A user-based permission management system is therefore a prerequisite; it plays the role of a security guard for the access rights of users.

The 3 Fundamental Characteristics of a Data Catalog’s Permission Management System

The implementation of an enterprise-wide permission management system is subject to a number of expectations that must be taken into account in its design. Among them, we have chosen in this article to focus on three fundamental characteristics of a permission management system: its level of granularity and flexibility, its readability and auditability, and its ease of administration.

Granularity and Flexibility

First of all, a permission management system must have the right level of granularity and flexibility. Some actions should be available to the entire catalog for ease of use. Other actions should be restricted to certain parts of the catalog only. Some users will have global rights related to all objects in the catalog, while others will be limited to editing only the perimeter that has been assigned to them. The permission management system must therefore allow for this range of possibilities, from global permission to the fineness of an object in the catalog.

Our clients are of all sizes, with very heterogeneous levels of maturity regarding data governance. Some are start-ups, others are large companies. Some have a data culture that is already well integrated into their processes, while others are only at the beginning of their data acculturation process. The permission management system must therefore be flexible enough to adapt to all types of organizations.

Readability and Auditability

Second, a permission management system must be readable and easy to follow. During an audit or a review of the system’s permissions, an administrator who explores an object must be able to quickly determine who has the ability to modify it. Conversely, when an administrator looks at the details of a user’s permission set, they must quickly be able to determine the scope assigned to that user and their authorized actions on it.

This simply ensures that the right people have access to the right perimeters, and have the right level of permission for their role in the company.

Have you ever found yourself faced with a permission system so complex that it was impossible to understand why a user was allowed to access certain information – or, on the contrary, why they were not?

Simplicity of Administration

Finally, a permission management system must be resilient in the face of growing catalog volume. We know today that we live in a world of data: 2.5 exabytes of data were generated per day in 2020, and it is estimated that 463 exabytes of data will be generated per day in 2025. New projects, new products, new uses: companies must deal with the explosion of their data assets on a daily basis.

To remain relevant, a data catalog must evolve with the company’s data. The permission management system must therefore be resilient to changes in content or even to the movement of employees within the organization.

Different Approaches to Designing a Data Catalog Permission Management System

There are different approaches to designing a data catalog permission management system, which more or less meet the main characteristics expected and mentioned above. We have chosen to detail three of them in this article.

Crowdsourcing

First, the crowdsourcing approach – where the collective is trusted to self-correct. A handful of administrators can moderate the content and all users can contribute to the documentation. An auditing system usually complements this setup to make sure that no information is lost through mistake or malice. In this case, there is no control before documenting, but a collective correction afterwards. This is typically the system chosen by online encyclopedias such as Wikipedia. These systems depend on the number of contributors and their knowledge to work well, as self-correction can only be effective through the collective.

This system perfectly meets the need for readability – all users have the same level of rights, so there is no question about the access control of each user. It is also simple to administer – any new user has the same level of rights as everyone else, and any new object in the data catalog is accessible to everyone. On the other hand, there is no way to manage the granularity of rights. Everyone can do and see everything.

Permission Attached to the User

The second approach to designing the permission management system is using solutions where the scope is attached to the user’s profile. When a user is created in the data catalog, the administrators assign a perimeter that defines the resources that they will be able to see and modify. In this case, all controls are done upstream and a user cannot access a resource inadvertently. This is the type of system used by an OS such as Windows for example.

This system has the advantage of being very secure: there is no risk that a new resource will be visible or modifiable by people who do not have the right to access it. This approach also meets the need for readability: for each user, all the accessible resources are easy to find. The expected level of granularity is also good, since it is possible to grant access resource by resource.

On the other hand, administration is more complex – each time a new resource is added to the catalog, it must be added to the perimeters of the relevant users. It is possible to overcome this limitation by creating dynamic scopes. To do this, you can define rules that assign resources to users, for example, that all PDF files will be accessible to a given user. But contradictory rules can easily appear, complicating the readability of the system.

Permission Attached to the Resource

The last major approach to designing a data catalog’s permission management system is to use solutions where the authorized actions are attached to the resource to be modified. For each resource, the possible permissions are defined user by user. Thus, it is the resource that has its own permission set. By looking at the resource, it is then possible to know immediately who can view or edit it. This is, for example, the type of system used by UNIX-like operating systems.

The need for readability is perfectly fulfilled – an administrator can immediately see the permissions of different users when viewing the resource. The same goes for the need for granularity – this approach allows permissions to be given at the most macro level through an inheritance system, or at the most micro level directly on the resource. Finally, in terms of ease of administration, it is necessary to attach each new user to the various resources, which is potentially tedious. However, there are group systems that can mitigate this complexity.
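
As a rough sketch of the resource-attached approach (and only a sketch, not any product’s actual implementation), each resource below carries its own permission set and falls back to its parent when a user is not listed, which is how an inheritance system keeps administration manageable.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Resource:
    """A catalog resource that carries its own permission set."""
    name: str
    parent: Optional["Resource"] = None
    permissions: dict = field(default_factory=dict)  # user -> set of actions

    def allowed(self, user: str, action: str) -> bool:
        """Check the resource itself, then fall back to its parent (inheritance)."""
        if action in self.permissions.get(user, set()):
            return True
        return self.parent.allowed(user, action) if self.parent else False

# Permissions granted at the glossary level are inherited by its terms.
glossary = Resource("glossary", permissions={"super_steward": {"view", "edit"}})
term = Resource("term:churn_rate", parent=glossary,
                permissions={"analyst": {"view"}})

print(term.allowed("analyst", "view"))        # True: set on the resource itself
print(term.allowed("super_steward", "edit"))  # True: inherited from the parent
print(term.allowed("analyst", "edit"))        # False
```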

The Data Catalog Permission Management Model: Simple, Readable and Flexible

Among these approaches, let’s detail the one chosen by the Actian Data Intelligence Platform and how it is applied.

The Resource Approach was Preferred

Let’s summarize the various advantages and disadvantages of each of the approaches discussed above. In both resource-based and user-based permission management systems, the need for granularity is well addressed – these systems allow permissions to be assigned resource by resource. In contrast, in the case of crowdsourcing, the basic philosophy is that anyone can access anything. Readability is clearly better in crowdsourcing systems or in systems where permissions are attached to the resource. It remains adequate in systems where permissions are attached to the user, but often at the expense of simplicity of administration. Finally, simplicity of administration is best in the crowdsourcing approach; for the other two, it depends on what you will modify most often – the resources or the users.

Since the need for granularity is not met in the crowdsourcing approach, we eliminated it. We were then left with two options: resource-based permission or user-based permission models. Since the readability is a bit better with resource-based permission, and since the content of the catalog will evolve faster than the number of users, the user-based permission option seemed the least relevant.

The option we have chosen at the Actian Data Intelligence Platform was therefore the third one: user permissions are attached to the resource.

How the Data Catalog Permission Management System Works

In the Actian Data Intelligence Platform Data Catalog, it is possible to define for each user if they have the right to manipulate the objects of the whole catalog, one or several types of objects, or only those of their perimeter. This allows for the finest granularity, but also for more global roles. For example, “super-stewards” could have permission to act on entire parts of the catalog, such as the glossary.

We then associate a list of Curators with each object in the catalog, i.e., those responsible for documenting that object. Thus, simply by exploring the details of the object, one can immediately know who to contact to correct or complete the documentation, or to answer a question about it. The system is therefore readable and easy to understand. The users’ scope of action is precisely determined through a granular system, right down to the object in the catalog.

When a new user is added to the catalog, it is then necessary to define their scope of action. For the moment, this configuration is done through the bulk editing of objects. To simplify management even further, it will soon be possible to define specific groups of users, so that when a new collaborator arrives, there is no longer any need to add them by name to each object in their scope. Instead, they simply need to be added to the group, and their scope will be assigned to them automatically.
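
The curator-and-group mechanism described above can be pictured with a short, purely illustrative Python sketch; the object names, group names, and helper function are made up and do not reflect the platform’s internals.

```python
# Purely illustrative sketch of group-based curation; not the platform's code.
groups = {"finance-stewards": {"alice", "bob"}}

catalog_objects = {
    "dataset:revenue_2021": {"curators": {"group:finance-stewards", "carol"}},
}

def can_document(user: str, object_name: str) -> bool:
    """A user may edit an object's documentation if listed directly as a
    curator or if they belong to a curator group."""
    curators = catalog_objects[object_name]["curators"]
    if user in curators:
        return True
    group_names = (c[len("group:"):] for c in curators if c.startswith("group:"))
    return any(user in groups.get(g, set()) for g in group_names)

# Adding a new collaborator to the group grants the whole scope at once.
groups["finance-stewards"].add("dave")
print(can_document("dave", "dataset:revenue_2021"))   # True, via the group
print(can_document("carol", "dataset:revenue_2021"))  # True, listed directly
print(can_document("erin", "dataset:revenue_2021"))   # False
```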

Finally, we have deliberately chosen not to implement a documentation validation workflow in the catalog. We believe that team accountability is one of the keys to the successful adoption of a data catalog. This is why the only control we put in place is the one that determines the user’s rights and scope. Once these two elements have been determined, the people responsible for the documentation are free to act. The system is rounded out with an event log of modifications for complete auditability, as well as a discussion system on the objects that allows everyone to suggest changes or report errors in the documentation.

If you would like to learn more about our permission management model, or to get more information about our Data Catalog, please reach out to us.

Data Integration

Developing Your Data DNA

Traci Curran

November 29, 2021

In a world where everything is changing quickly, the evolution toward agile, interoperable data services is a welcome development. Data can now be delivered as a service, without the need for costly investments in data centers and the resources needed to manage them. As more companies embrace the cloud, data integration and data quality need to become more important considerations.

As a result, organizations are focusing on delivering products and services at a faster pace, and to achieve this, operational analytics is more critical than ever. Today, organizations are reliant on using their data, along with external data, to make better decisions.

And just as the cloud alleviated the expense and expertise needed to manage infrastructure, data is also seeing accelerated value from the cloud. Data lakes and cloud data warehouses make it affordable and easy to store and use all your data. So why are companies still struggling to maximize their data potential?

It’s probably due to one of these 3 culprits:

  1. You Failed to Align Stakeholders and Create a Data-Driven Culture. This is, by far, the primary reason why most data projects fail. In fact, according to a 2021 survey of Fortune 1000 companies, “executives report that cultural challenges – not technology challenges – represent the biggest impediment to successful adoption of data initiatives and the biggest barrier to realizing business outcomes.” For any data project to succeed, there needs to be strong leadership at the top of the organization and a data culture that permeates throughout the organization.
  2. Your Data is – Literally – Everywhere. I’m sure I’m not telling you anything new, but it really can’t be overstated – your data is living in places you don’t know about. It’s in third-party systems, spreadsheets on personal devices, and public online repositories. It’s also in legacy systems, which can pose a significant challenge since they are often proprietary and not always the most cooperative when you need to retrieve data regularly. These older systems are often also considered mission-critical, so if you don’t create a data-driven culture, there may be resistance from application owners. As you put together your rockstar team of stakeholders, this is a good time to audit the systems in use by every department. This leads me to my last point on what is limiting data insights…
  3. Your Data Quality Sucks. While it stands to reason that your data isn’t going to be perfect, it should be as accurate and consistent as possible to drive better business decisions. At a minimum, data quality requires the following (see the sketch after this list):
    • Discovery and Profiling. Know where your data lives and what it does. Understand the accuracy and completeness of your data and use that as a baseline. Data quality is like laundry: it never ends.
    • Standard, Conformant and Clean Data. Once you’ve done the work to understand your data, it’s important to define what “good” looks like and create rules that maintain that definition going forward. If you have a team that is focused on this today, understanding what those rules are and why they exist is a critical component of a successful data project.
    • Deduplicated Data. While no one wants to forecast revenue twice, with many databases and storage residing in the cloud, duplicate data can cause more than incorrect reports. Cloud costs can easily spiral if you’re storing and analyzing duplicate data.
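
Here is a minimal pandas sketch of these three steps on made-up customer records: basic profiling, a simple standardization rule, and deduplication. The columns and rules are hypothetical and only meant to show the shape of the work.

```python
import pandas as pd

# Made-up customer records with typical problems: casing drift and duplicates.
df = pd.DataFrame({
    "customer_id": [101, 102, 102, 103],
    "email": ["a@example.com", "B@Example.COM", "b@example.com", None],
    "region": ["us-east", "US-East", "us-east", "eu-west"],
})

# 1. Discovery and profiling: types and completeness as a baseline.
print(df.dtypes)
print(df.isna().mean())  # share of missing values per column

# 2. Standard, conformant, clean data: enforce simple formatting rules.
df["email"] = df["email"].str.lower()
df["region"] = df["region"].str.lower()

# 3. Deduplication: keep one row per customer after standardization.
df = df.drop_duplicates(subset=["customer_id", "email"])
print(df)
```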

Today, more organizations than ever are facing the challenge of increasing data and technological complexity, but few are seeing a significant return.  To thrive in the digital era, organizations must embrace new thinking. Infusing data obsession into the corporate DNA will allow data to start driving better decisions and better results. Check out how Actian’s DataConnect integration platform can help with your data quality goals.

About Traci Curran

Traci Curran is Director of Product Marketing at Actian, focusing on the Actian Data Platform. With 20+ years in tech marketing, Traci has led launches at startups and established enterprises like CloudBolt Software. She specializes in communicating how digital transformation and cloud technologies drive competitive advantage. Traci's articles on the Actian blog demonstrate how to leverage the Data Platform for agile innovation. Explore her posts to accelerate your data initiatives.
Data Architecture

Hybrid Cloud Data Warehouses: Don’t Get Stuck on One Side or the Other

Teresa Wingfield

November 26, 2021

Hybrid clouds seem to be the way the wind is blowing. According to the Enterprise Strategy Group, the number of organizations committed to or interested in a hybrid cloud strategy has increased from 81% in 2017 to 93% in 2020. But what exactly is a hybrid cloud? Turns out, there are a lot of definitions. I’ll share a definition from Deloitte that I like:

“Hybrid cloud is cloud your way. It’s integrating information systems—from on-premises core systems to private cloud, public cloud, and edge environments—to maximize your IT capabilities and achieve better business outcomes. It’s designing, building, and accelerating your possible.”

Why Hybrid Cloud?

Data warehouse deployments on-premises and in the public cloud play equally important roles in a hybrid cloud strategy. The Enterprise Strategy Group found that 89% of organizations still expect to have a meaningful on-premises footprint in three years. At the same time, Gartner predicts that public cloud services will be essential for 90% of data and analytics innovation by 2022. Accordingly, organizations are adopting a hybrid cloud strategy to leverage the right mix of locations to meet their needs.

Consider: The cloud provides the flexibility to build out and modify services in an agile manner, the potential to scale almost infinitely, the assurance of enhanced business continuity, and the ability to avoid capital expenditures (CapEx)—all of which continue to accelerate the adoption of cloud-based data warehouses. But data warehouses running on-premises in your own data center deliver their advantages:

  • Data Gravity: Sometimes, data is hard to move to public clouds since there’s so much of it and/or the data has interdependencies with other systems and databases.
  • More Control Over Governance and Regulatory Compliance: You know where and under what geographic or other restrictions your data is operating.
  • More Control Over Deployment Infrastructure: You may want to use hardware, operating systems, databases, applications, and tools you already have.
  • Avoiding High Operational Expenditure (OpEx): Consumption-based pricing models in public clouds can lead to high OpEx when usage is frequent – particularly if that data is fluid, moving between public clouds and on-premises locations. 

Hybrid Cloud Evaluation Criteria

To get optimal benefits from a hybrid cloud data warehouse, though, you’ll need a solution that can drive better business outcomes while using fewer resources. For starters, you’ll want a single-solution architecture that can operate in both public cloud and on-premises environments. Solutions from many data warehouse vendors either don’t do this well or don’t do this at all. Many vendors’ data warehouse solutions run either in the public cloud or on-premises, and their “hybrid” versions have been cobbled together to meet the increase in demand. However, without the same data and integration services on-premises and in the cloud, the same data model, the same identity, and the same security and management systems, these solutions effectively saddle you with two siloed data warehouse deployments.

Why are common data and integration services, the same data model, the same identity, and the same security and management systems important? Let me tell you:

Same Data Services

It is essential that your data warehouse supports the same data services for public cloud and on-premises data warehouses. Without this, you will wind up with data redundancy and data consistency issues, duplications of effort and resources (human and technical), increased costs, and an inability to provide seamless access across environments.

Same Data Model

A data model determines how data is organized, stored, processed, and presented. Having a single data model for the on-premises and cloud-based sides of your data warehouse eliminates incongruencies across source systems. It also strengthens data governance by ensuring that data is created and maintained in accordance with company standards, policies, and business rules. As data is transformed within the data warehouse—on-premises or in the cloud—it continues to adhere to data definitions and integrity constraints defined in the data model.

Same Identity Authentication

Your users should be able to sign on to on-premises and cloud data warehouses using the same login ID and password. Data warehouse support for single sign-on (SSO) access helps eliminate password fatigue for users and, more importantly, can help you ensure that your organization’s identity policies are extended to protect both data warehouse locations.

Same Security and Management Services

Shared security services are also critical for hybrid cloud data warehouses. I’ve already written two blog posts that provide details on security, governance, and privacy requirements for the modern data warehouse, one on database security and one on cloud service security, so I will refer you to those for more details. But I would like to point out in this discussion that you will need integrated security services across your on-premises and public cloud environments to ensure a strong and consistent security posture for your hybrid data warehouse.

Finally, shared services for management tasks offer clear advantages in terms of cost, control, and simplicity:

    • You’ll need fewer staff members to develop, maintain, and monitor the components in your hybrid deployment.
    • You’ll improve control through consistent upgrades, patches, and backups.
    • You’ll simplify metering and licensing requirements across environments.

Actian Data Platform

It should come as no surprise that a data warehouse that meets all these criteria does exist: the Actian Data Platform can be deployed on-premises as well as in multiple clouds, including AWS, Azure, and Google Cloud. You can read more about the Actian solution here.

Data Management

Bloor Spotlight Highlights How Actian’s Ingres NeXt Avoids Pitfalls

Teresa Wingfield

November 22, 2021

Digital transformation requires use of the latest technologies. However, as you probably already know, modernizing a mission-critical database and the applications that interact with it can be risky and expensive, often turning into a long disruptive journey. But I have good news! According to a recent Bloor Spotlight report, Actian’s Ingres NeXt strategy for modernizing Ingres and OpenROAD applications either avoids or proactively addresses these potential pain points.

Bloor Senior Analyst Daniel Howard comments:

“Ingres NeXt is worth paying attention to because it acknowledges both the massive need and desire for digital transformation and modernization as well as the difficulties and shortcomings of conventional approaches to them, then takes steps to provide the former while mitigating the latter.”

Let’s look at the top four obstacles that stand in the way of modernization:

It’s Risky

Less than half of modernization projects are successful. Complex dependencies among databases, applications, operating systems, hardware, data sources, and other structures increase the likelihood that something will go wrong. In addition, organizations are likely to make poor decisions at some point since there are few modernization best practices to guide the way.

It’s Expensive

Modernization typically requires Capital Expenditure (CapEx) justification. Although modernization can potentially save money and increase revenue in the long run, it can be difficult to prove that this will significantly outweigh the costs of maintaining your legacy systems over time. It can also be challenging to get a modernization initiative approved as part of an innovation budget. Innovation budgets are often quite small. According to Deloitte’s analysis, the average IT department invests more than half of its technology budget on maintaining business operations and only 19% on building innovative new capabilities.

It’s a Long Journey

Modernization can involve replacing thousands of hours’ worth of custom-developed business logic. Code may be stable, but it is perceived as brittle if it cannot be changed without great pain. Missing documentation, third-party applications, and libraries that are often no longer available can add time and complexity to a modernization project. Plus, many developers are simply unaware of conversion tools for updating “green screen” ABF applications and creating web and mobile versions.

It’s Disruptive

Mission-critical databases and applications require near 100% availability, so modernization requires careful planning and execution. Plus, technical staff and business users will need to be retrained and upskilled to make the most of new technologies.

How Exactly Does Ingres NeXt Avoid or Address These Pain Points?

The report discusses how automated migration utilities, asset reuse, and a high degree of flexibility and customization—among other things—result in a solution that can streamline your organization’s path to a modern data infrastructure.

Data Intelligence

Data Sampling: Create Subsets for a More Fluid Data Analysis

Actian Corporation

November 21, 2021

Your data culture is growing! But if the amount of data at your disposal is exploding, you may find it difficult to handle these colossal volumes of information. At that point, you will have to work from a sample that is as representative as possible. This is where Data Sampling comes in.

As the range of your data expands and your data assets become more massive, you may one day be faced with a volume of data that makes it impossible for your query to succeed. The reason: insufficient memory and processing power. A paradox, given that all the efforts made up to now have been aimed at guaranteeing excellence in the collection of voluminous data.

But don’t be discouraged. At this point, you will need to resort to Data Sampling. Data Sampling is a statistical analysis technique used to select, manipulate, and analyze a representative subset of data points. This technique allows you to identify patterns and trends in the larger data set.

Data Sampling: How it Works

Data Sampling enables data scientists, predictive modelers, and other data analysts to work with a small, manageable amount of data on a statistical population.

The goal: to build and run analytical models faster while producing accurate results. The principle: refocus analyses on a smaller sample to be more agile, fast, and efficient in processing queries.

The subtlety of data sampling lies in the representativeness of the sample. Indeed, it is essential to apply the most suitable method to reduce the volume of data to be taken into consideration in the analysis without degrading the relevance of the results obtained.

Sampling is a method that allows you to obtain information based on the statistics of a subset of the population without having to investigate every individual. Because it works on subsets rather than the entire volume of available data, Data Sampling saves you valuable time. This time saving translates into cost savings and, therefore, a faster ROI.

Finally, thanks to Data Sampling, you make your data project more agile, and can then consider a more frequent recourse to the analysis of your data.

The Different Methods of Data Sampling

The first step in the sampling process is to clearly define the target population. There are two main types of sampling: probability sampling and non-probability sampling.

Probability sampling is based on the principle that each element of the data population has a known, non-zero chance of being selected. This results in a high degree of representativeness of the population. Alternatively, data scientists can opt for non-probability sampling. In this case, some data points will have a better chance of being included in the sample than others. Within these two main families, there are different types of sampling.

Among the most common probability sampling techniques is simple random sampling. In this case, each individual is chosen at random, and each member of the population or group has an equal chance of being selected.

With systematic sampling, on the other hand, the first individual is selected at random, while the others are selected using a fixed sampling interval. In other words, the sample is created by stepping through the larger data population at a fixed interval from a random starting point.

Stratified sampling consists of dividing the elements of the data population into different subgroups (called strata), linked by similarities or common factors. The major advantage of this method is that it is very precise with respect to the object of study.

Finally, the last type of probability sampling is cluster sampling, which divides a large set of data into groups or sections according to a determining factor, such as a geographic indicator.
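
To make these methods concrete, the following Python sketch applies simple random, systematic, and stratified sampling to a made-up customer population using pandas and NumPy; it is an illustration, not a statistical recipe.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# A made-up population: 10,000 customers spread across four regions.
population = pd.DataFrame({
    "customer_id": np.arange(10_000),
    "region": rng.choice(["north", "south", "east", "west"], size=10_000),
})

# Simple random sampling: every row has an equal chance of selection.
simple = population.sample(n=500, random_state=42)

# Systematic sampling: a random starting point, then a fixed interval.
interval = len(population) // 500
start = int(rng.integers(0, interval))
systematic = population.iloc[start::interval]

# Stratified sampling: draw the same fraction from each region (stratum).
stratified = population.groupby("region").sample(frac=0.05, random_state=42)

print(len(simple), len(systematic), len(stratified))
```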

In all cases, whether you choose probabilistic or non-probabilistic methods, keep in mind that to reach its full potential, data sampling must be based on sufficiently large samples! The larger the sample size, the more accurate your inferences about the population will be. So, ready to get started?

Databases

Vector 6.2 Delivers Even Faster and More Secure Analytics

Teresa Wingfield

November 17, 2021

Actian Vector is a vectorized columnar analytics database designed to deliver extreme performance with a high level of security. But who wouldn’t want analytics that are even faster and more secure? For those who do, the new Vector 6.2 release, available as of November 10, 2021, is a winner, continuing to push the limits of what is possible in operational data warehousing. Here’s a summary of just a few of Vector’s most important new features.

Vector 6.2 is Faster

Vector has long been the industry’s fastest analytics database. It’s designed for speed and efficiency using column-based storage and vector processing. With the incorporation of query result caching and queue-based workload management in the 6.2 release, Vector just got another performance boost.

Query Result Caching

If you’ve previously run a query and the data hasn’t changed since the last run, Vector doesn’t need to run the full query again. Instead, it leverages query result caching to retrieve the previous result instantaneously from the cache. This substantially reduces the time and system resources required to obtain the insights you seek.
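
Conceptually (and this is only a conceptual sketch, not Vector’s internal implementation), a query result cache can be keyed on the query text plus a data version that is advanced whenever the underlying data changes:

```python
# Conceptual sketch of query result caching; not Vector's implementation.
cache = {}          # (query_text, data_version) -> result
data_version = 0    # advanced whenever the underlying data changes

def run_query(query_text: str, execute) -> list:
    """Return a cached result when the data is unchanged; otherwise execute and cache."""
    key = (query_text, data_version)
    if key not in cache:
        cache[key] = execute(query_text)
    return cache[key]

def on_data_change() -> None:
    """Any write invalidates cached results by advancing the data version."""
    global data_version
    data_version += 1

# The second identical call is served from the cache without re-execution.
query = "SELECT region, SUM(sales) FROM orders GROUP BY region"
first = run_query(query, execute=lambda q: [("north", 120), ("south", 95)])
second = run_query(query, execute=lambda q: [("north", 120), ("south", 95)])
print(first is second)  # True: the cached result was reused
```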

Queue-Based Workload Management

Queue-based workload management dynamically adjusts workload queues based on available resources and resource quotas. Key benefits include the following (a conceptual sketch follows this list):

  • Prevents system overload and workload starvation by limiting resource usage per the database administrator’s configuration.
  • Allows system administrators to quickly run high-priority reports even on a fully loaded system.
  • Enables the user to have very small queries answered independently of system load.
  • Manages usage by prioritizing resources appropriately across people, groups, and applications, even enabling specific profiles to run at different priorities during specified time slots.
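
For intuition only, the sketch below models queue-based admission control with per-queue priorities and concurrency quotas. It is a simplified illustration of the general idea, not Vector’s actual workload manager.

```python
import heapq

# Conceptual sketch of queue-based workload management; not Vector's internals.
# Each queue has a priority (lower is more urgent) and a concurrency quota.
queues = {
    "reports":   {"priority": 0, "quota": 2, "running": 0},
    "analytics": {"priority": 1, "quota": 4, "running": 0},
    "batch":     {"priority": 2, "quota": 1, "running": 0},
}
waiting = []   # heap of (priority, arrival order, queue name, query)
arrival = 0

def submit(queue_name: str, query: str) -> None:
    global arrival
    heapq.heappush(waiting, (queues[queue_name]["priority"], arrival, queue_name, query))
    arrival += 1

def dispatch() -> list:
    """Admit waiting queries in priority order while respecting each queue's quota."""
    admitted, deferred = [], []
    while waiting:
        prio, order, name, query = heapq.heappop(waiting)
        if queues[name]["running"] < queues[name]["quota"]:
            queues[name]["running"] += 1
            admitted.append((name, query))
        else:
            deferred.append((prio, order, name, query))
    for item in deferred:  # anything over quota waits for the next dispatch
        heapq.heappush(waiting, item)
    return admitted

submit("batch", "nightly ETL")
submit("reports", "high-priority dashboard")
submit("batch", "archive scan")
print(dispatch())  # the report is admitted first; the second batch job must wait
```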

Vector 6.2 is More Secure

Re-Keying Encryption

Previous releases of Vector have enabled organizations to maintain tight security for sensitive data, with support for encryption at rest and in transit as well as dynamic data masking of fields containing personally identifiable information and other sensitive data. Vector 6.2 extends security further, providing the ability to re-key the database with new encryption keys as recommended by NIST guidelines. This feature is valuable since it enables an enterprise to limit the amount of time a bad actor can use a stolen key to access your data.

Secure User-Defined Functions

This new release enhances security by executing Python, JavaScript, and NumPy UDFs within a container that is sandboxed from the rest of the database. The NumPy UDF support is new and enables vectorized execution over numeric data.

Learn More About Vector

Visit our website to learn more about analytics database Vector’s extensive performance optimization, features, and use cases.
