Data Platform

Actian Data Platform on Google Cloud: For the Data-Driven Enterprise

Actian Corporation

April 27, 2021


Today, Actian is excited to announce the delivery of the Actian Data Platform on Google Cloud Marketplace. Actian is initially available in the us-east1 region, with availability expanding to the US West, Germany, and Ireland cloud regions in the coming months. If you are interested in trying out the Actian Data Platform on Google Cloud, please check out our listing here.

Actian is the highest-performance warehouse available on Google Cloud and beats alternatives on price-performance by a factor of 8-12x. This price-performance advantage builds on the fact that Google Cloud has the best backhaul and high-bandwidth networking infrastructure of any cloud provider, ensuring lightning-fast data access even across regions. Leveraging these superior networking and storage capabilities in Google Cloud, Actian delivers 20% better query performance compared to other cloud providers.

We have been collaborating closely with Google Cloud’s architects and storage teams over the past few months. From the standpoints of ease of use, integrated connectivity, and real-time decision-making, this collaboration ensures that Actian will deliver the best cloud data warehouse experience available—with the lowest TCO among all cloud data warehouses. Organizations in the midst of digital transformation can gain access to the real-time insights needed to make the operational analytics decisions that competitive advantage demands.

Actian is deployed as containers and microservices on the latest compute nodes, leveraging Google Kubernetes Engine (GKE) and Google Cloud Storage (GCS). Actian supports agile data processing by enabling rapid ingestion of data into the data warehouse. It can also query data stored in Google Cloud data lakes via external tables.
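
As a rough sketch of what querying lake data via an external table can look like, the snippet below registers a Parquet dataset held in a GCS bucket and queries it over ODBC. The DSN, credentials, bucket path, and the exact external-table DDL are illustrative assumptions; consult the Actian documentation for the syntax supported by your version.

```python
import pyodbc

# Hypothetical DSN and credentials for an Actian Data Platform warehouse.
conn = pyodbc.connect("DSN=actian_dw;UID=analytics_user;PWD=********", autocommit=True)
cur = conn.cursor()

# Illustrative external-table DDL over a GCS-hosted Parquet dataset;
# exact clause names depend on the product version.
cur.execute("""
    CREATE EXTERNAL TABLE clickstream_ext (
        event_time TIMESTAMP,
        user_id    VARCHAR(64),
        url        VARCHAR(1024)
    ) USING SPARK
    WITH REFERENCE='gs://my-data-lake/clickstream/*.parquet',
         FORMAT='parquet'
""")

# Once registered, the lake data is queried with ordinary SQL.
cur.execute("SELECT url, COUNT(*) AS hits FROM clickstream_ext GROUP BY url")
for url, hits in cur.fetchall():
    print(url, hits)

conn.close()
```

Because the external table behaves like any other table, the same query can also join lake data against warehouse-resident tables.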

Multi-Cloud and Hybrid Deployment

Because Actian can be used in a multi-cloud configuration, organizations can configure Actian to operate across multiple cloud providers. This enables organizations to finally realize the true potential of hybrid by bringing the compute power of Actian to the place where their data resides. In addition, Actian can also be deployed on-premises, allowing organizations to leverage the same database engine, the same physical data model, the same ETL/ELT tools, and the same BI tools of their choosing both in the cloud and on-prem.

Built for Google Cloud

Actian is well integrated into the Google ecosystem. Looker has been a long-time visualization partner of Actian and, now as part of Google Cloud, provides a great option for customers looking for a native BI tool on Google Cloud. We have also preconfigured connectors to pull data from Google Cloud Storage and Dataproc into Actian. These complement more than 200 built-in data and application sources that Actian can pull data from.

Over the next few months, we will be adding further integrations with Google Cloud tools such as Kubeflow and Cloud Data Fusion.


About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.
Data Intelligence

What is the BCBS 239?

Actian Corporation

April 26, 2021


In order for banks to have complete visibility into their risk exposure, the Basel Committee defined 14 key principles that were made into a standard called BCBS 239.

Its objective? Give banks access to reliable and consolidated data. Let’s get into it.

In 2007, the world economy began teetering on the brink of collapse, and many supposedly stable banking institutions were pushed to the edge of bankruptcy, culminating in the 2008 failure of the American bank Lehman Brothers. In response to a crisis of unprecedented severity, a wave of regulation swept across the world, giving birth to BCBS 239, also known as the Basel Committee on Banking Supervision’s standard number 239.

Published in 2013, the Basel Committee’s standard number 239 was intended to create the conditions for transparency in banking institutions by defining a clear framework for the aggregation of financial risk data. In practice, its objective is to enable financial and banking institutions to produce precise reports on the risks to which they are exposed. BCBS 239 is a binding framework that contributes to the stability of the global financial system, which was severely tested during the 2007-2008 financial crisis.

BCBS 239: A Little History

The Basel Committee was created in 1974 at the instigation of the G10 central bank governors. Since 2009, the organization has had 27 member countries and is dedicated to strengthening the safety and soundness of the financial system and establishing standards for prudential supervision.

BCBS 239 is one of the Basel Committee’s most emblematic standards because it acts as a barrier against the excesses that led to the 2007-2008 crisis.

Indeed, the growth and diversification of the activities of banking institutions, as well as the multiplication of subsidiaries within the same group, created a certain opacity that generated inaccuracies in the banks’ reporting.

Once accumulated, these inaccuracies could represent billions of dollars of uncertainty, hindering quick and reliable decision-making by managers. The critical size reached by financial institutions made it necessary to guarantee reliable decision-making based on consolidated, high-quality data. This is the very purpose of BCBS 239.

The 14 Founding Principles of BCBS 239

Although BCBS 239 was published in 2013, the thirty or so G-SIBs (global systemically important banks) that had to comply had until January 1, 2016 to do so. Domestic systemically important banks (also called D-SIBs) had three more years to comply.

Since January 1, 2019, G-SIBs and D-SIBs must therefore comply with the 14 principles set out in BCBS 239. 

Eleven of them apply primarily to banking institutions; the other three are addressed to supervisory authorities. The 14 principles of BCBS 239 can be classified into four categories: governance and infrastructure, risk data aggregation capabilities, reporting capabilities, and prudential supervision.

Governance and Infrastructure

In the area of governance and infrastructure, there are two principles. The first is the deployment of a data quality governance system to improve financial communication and produce more accurate and relevant reports, making decision-making processes faster and more reliable.

The second principle affects the IT infrastructure and requires banks to put in place a data architecture that enables the automation and reliability of the data aggregation chain.

Risk Data Aggregation Capabilities

The section on risk data aggregation capabilities brings together four key principles: data accuracy and integrity, completeness, timeliness, and adaptability.

These are four pillars that enable decisions to be based on tangible, reliable, and up-to-date information.

Reporting Capabilities

The third component of BCBS 239 concerns the improvement of risk reporting practices.

This is an important part of the standard, bringing together five principles: the accuracy and precision of information; the completeness of information on the risks incurred, to guarantee a true and fair view of the institution’s risk exposure; the clarity and usefulness of reporting; the frequency of updates; and the integrity of distribution.

These reports must be transmitted to the persons concerned.

Supervision

The last three principles apply to the control and supervisory authorities. They set out the conditions for monitoring banks’ compliance with the first 11 principles. They also provide for the implementation of corrective actions and prudential measures and set the framework for cooperation between supervisory authorities.

Thanks to BCBS 239, data becomes one of the levers of stability in a globalized economy.

Data Intelligence

Data Governance Framework | S01-E02 – Data Strategy

Actian Corporation

April 23, 2021


This is the second episode of our series “The Effective Data Governance Framework”.

The series is split into three seasons; this first season focuses on Alignment: understanding the context, finding the right people, and preparing an action plan for your data-driven journey.

This second episode will give you the keys to putting in place an efficient enterprise data strategy through the setting up of Objectives and Key Results (OKRs).

Season 1: Alignment

Evaluate your Data maturity

Specify your Data strategy

Get sponsors

Build a SWOT analysis

Season 2: Adapting

Organize your Data Office

Organize your Data Community

Create Data Awareness

Season 3: Implementing Metadata Management With a Data Catalog

The importance of metadata

6 weeks to start your data governance journey

In our previous episode, we addressed the data maturity of your company from different angles.

In the form of a workshop, we shared our Data Governance Maturity Audit, which enables you to establish your starting point through a Kiviat diagram.

In this episode, we help you define your company Data Strategy effectively.

What is the First Step in Defining Your Data Strategy?

We recommend that you use the OKR (Objectives and Key Results) framework to build your data strategy efficiently.

Before stepping into the topic itself, let’s delve into what OKRs are and how they are built, and then share some useful tips with you.

What Exactly are OKRs?

Here, an “Objective” is something you want to achieve that is aspirational for all employees. A “Key Result” is how you plan to measure progress toward that objective quantitatively.

We recommend limiting the number of Key Results per Objective to three.

There are many benefits to putting in place enterprise-wide OKRs. Their 5 key benefits are:

  • More focus.
  • More accountability.
  • More engagement.
  • Better alignment.
  • More transparency.

In the Effective Data Governance Framework, OKRs are cascaded, flowing from the executives involved in the data strategy down to the individuals involved at an operational level. While the Actian Data Intelligence Platform believes in a “bottom-up” approach, the OKR-setting exercise is a “top-down” one.

It is very important that, at each level, any one individual is able to understand the OKRs at the upper levels and how his or her OKRs contribute to the overall company Data Strategy.

We recommend you set a reasonable deadline for each OKR. By proceeding this way, all derived OKRs will be consistent with the deadlines set at the highest levels. We also recommend you constantly share, display, and explain the OKR map to all the stakeholders.

This way, you will ensure engagement, alignment, and transparency.

We suggest you negotiate the OKRs, especially their deadlines, rather than imposing them.

An Example of Setting up OKRs in Your Company

You can start with CEO OKRs on the data strategy if they are involved. At the highest level, each OKR will result in one dedicated OKR map.

On the lower levels, you can have several key results per team or employee.

For example, let’s take a CEO with 3 OKRs that impact the Data Strategy as shown below:

Then, working from the top level OKRs, you will be able to deduce the OKRs for CXOs and Top Executives like the Chief Data Officer, the Chief Information Officer, the Chief Product Officer, the VP of Sales, and so on.

For each executive, there will be OKRs assigned to those reporting directly to them (such as heads of Analytics, heads of IT Architecture, heads of HR, etc.), followed by OKRs for teams (data governance, data/IT architecture, analytics, business intelligence, data science, etc.) and finally, OKRs carried out by individuals, as shown.

Now take OKR1 from the CEO, which relates to increasing online sales by 2% by June 30, 2021.

This OKR map shows the cascade of related OKRs carried out by C Levels and executives, teams and individuals resulting from the CEO OKR1.

As you can see in the OKR map above, we take into account the deadlines at all levels, resulting in a monthly overview of individual OKRs.

As described above, the CEO’s OKR1 generates an OKR1 for the CDO, which consists of the following:

  • Objective: Have the data catalog ready for the Data Lake
  • Key Result: Have 100% of the data assets coming from the Data Lake governed
  • Deadline: March 30th, 2021

At the level below, a data steward carries the following OKR1:

  • Objective: Have all of the data assets from the Data Lake documented
  • Key Result: Have 100% of the data assets available for the analytics teams
  • Deadline: March 30th, 2021
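
To make the cascade concrete, here is a minimal sketch in Python of the OKR map above; the owners, key results, and the deadline check are illustrative assumptions drawn from the example, not a prescribed tool.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from datetime import date


@dataclass
class OKR:
    owner: str
    objective: str
    key_results: list[str]        # keep to three or fewer per objective
    deadline: date
    children: list[OKR] = field(default_factory=list)

    def check_deadlines(self) -> list[str]:
        """Flag any child OKR whose deadline falls after its parent's."""
        issues = []
        for child in self.children:
            if child.deadline > self.deadline:
                issues.append(f"{child.owner} OKR ends after {self.owner} OKR")
            issues.extend(child.check_deadlines())
        return issues


# Hypothetical cascade mirroring the example above
ceo_okr1 = OKR(
    owner="CEO",
    objective="Increase online sales by 2%",
    key_results=["Online revenue up 2%"],
    deadline=date(2021, 6, 30),
    children=[
        OKR(
            owner="CDO",
            objective="Have the data catalog ready for the Data Lake",
            key_results=["100% of Data Lake data assets governed"],
            deadline=date(2021, 3, 30),
            children=[
                OKR(
                    owner="Data Steward",
                    objective="Have all Data Lake data assets documented",
                    key_results=["100% of data assets available to analytics teams"],
                    deadline=date(2021, 3, 30),
                ),
            ],
        ),
    ],
)

print(ceo_okr1.check_deadlines())  # [] -> the cascade's deadlines are consistent
```

Keeping the map in a structure like this makes it easy to display to stakeholders and to verify that no derived OKR slips past the deadline of the OKR it supports.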

Tips on How to Best Set up Your OKRs in the Long Run

We recommend you review OKRs every quarter at levels 1 and 2, and then more frequently at the team and individual levels.

Any change in the deadlines may have an impact at a higher level. Rather than letting a change ripple through the chain of OKRs, we suggest reducing the scope of the affected OKR to a minimum viable product (MVP) as much as possible in order to keep the pace.

Some other tips include:

  • Select one OKR at the CEO (or a lower) level and practice before generalizing the OKR practice,
  • Consider the OKR practice as an OKR in itself and monitor it,
  • Appoint one person in charge of the implementation of the OKRs to make sure the team follows the agreed-upon OKR practices. That person will coach the team on the OKR processes and will administer the OKR tools (you can find some here).

The Actian Data Intelligence Platform Customer Success Team and Professional Services will help you initialize the OKR Map best suited to your Data Strategy. You will benefit from our expertise in data-related topics, especially in data governance/cataloging.

Typically, a data governance project in which the platform is involved may generate between 2 and 10 workshops (each lasting between two hours and half a day) in order to draft and initiate the corporate data strategy for the first 3 to 6 months.

Data Intelligence

Data Governance Framework | S01-E01 – Evaluate Your Maturity

Actian Corporation

April 14, 2021


This is the first episode of our series “The Effective Data Governance Framework”. The series is split into three seasons; this first season focuses on Alignment: understanding the context, finding the right people, and preparing an action plan for your data-driven journey. This first episode gives you the keys to evaluating the maturity of your company’s data strategy so you can visualize where your efforts should lie in your data governance implementation.

Data is the Oil of the 21st Century

With GAFA (Google, Apple, Facebook, and Amazon) paving the way, data has in recent years become a crucial enterprise asset and has taken a substantial place in the minds of key data and business people alike.

The importance of data has been amplified by new digital services and uses that disrupt our daily lives. Traditional businesses who lag behind in this data revolution are inevitably put at a serious competitive disadvantage.

To be sure, all organizations and all sectors of activity are now impacted by data’s new role as a strategic asset. Most companies now understand that in order to keep up with innovative startups and powerful web giants, they must capitalize on their data.

This shift in the digital landscape has led to widespread digital transformations the world over with everybody now wanting to become “Data-Driven”.

The Road to Becoming Data-Driven

In order to become data-driven, one has to look at data as a business asset that needs to be mastered first and foremost, and then exploited.

The data-driven approach is a means to collect, safeguard and maintain data assets of the highest quality whilst also tackling the new data security issues that come with the territory. Today, data consumers must have access to accurate, intelligible, complete, and consistent data in order to detect potential business opportunities, minimize time-to-market, and undertake regulatory compliance.

The road to the promised land of data innovation is full of obstacles.

Legacy data, with its heavy silos and the all too often tribal nature of data knowledge, rarely bodes well for the overall quality of data. The advent of Big Data has also reinforced the perception that the life cycle of any given piece of data must be mastered in order to find your way through the massive volume of the enterprise’s stored data.

It’s a challenge that encompasses numerous roles and responsibilities, processes and tools.

The implementation of data governance is, therefore, a chapter that any data-driven company must write.

However, our belief that the approaches to data governance from recent years have not kept their promises is borne out by our own field experience along with numerous and ongoing discussions with key data players.

We strongly believe in adopting a different approach to maximize the chances of success. Our Professional Services and Customer Success teams provide our customers with the expertise they need to build effective data governance, through a more pragmatic and iterative approach that can adapt to a constantly changing environment.

We call it the Effective Data Governance Framework.

Our Beliefs on Data

Building awareness of the importance of data is a long journey that every company has to make. But each journey is different: company data maturity varies a lot, and expectations and obligations can also vary widely.

Overall success will come about through a series of small victories over time.

We have organized our framework into three steps.

Alignment

Evaluate your Data maturity

Specify your Data strategy

Get sponsors

Build a SWOT analysis

Adapting

Organize your Data Office

Organize your Data Community

Create Data Awareness

Implementing Metadata Management With a Data Catalog

The importance of metadata

6 weeks to start your data governance journey

Season 1, Episode 1: Alignment

This first season is designed to help your organization align itself with your data strategy by ensuring an understanding of the overall context.

What follows will help you, and all the key sponsors, identify the right stakeholders from the get-go. This first iteration will help you evaluate the data maturity of your organization from different angles.

In the form of a workshop, our Data Governance Maturity Audit will help you visualize, through a Kiviat Diagram, your scores as shown below:
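
If you want to reproduce this kind of Kiviat (radar) view for your own audit results, here is a minimal matplotlib sketch; the dimensions and scores below are hypothetical placeholders, not prescribed values.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical maturity scores (0-5) for a handful of audit dimensions
dimensions = ["Organization", "Data Stewards", "Accountabilities",
              "Life Cycle", "Compliance", "Security", "Data Quality"]
scores = [3, 2, 4, 2, 3, 4, 2]

# One angle per dimension; repeat the first point to close the polygon
angles = np.linspace(0, 2 * np.pi, len(dimensions), endpoint=False).tolist()
scores_closed = scores + scores[:1]
angles_closed = angles + angles[:1]

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
ax.plot(angles_closed, scores_closed, linewidth=2)
ax.fill(angles_closed, scores_closed, alpha=0.25)
ax.set_xticks(angles)
ax.set_xticklabels(dimensions)
ax.set_ylim(0, 5)
ax.set_title("Data Governance Maturity Audit")
plt.show()
```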

Data Maturity Audit: Important Questions to Ask

Organization

Is an organizational structure with different levels of governance (exec, legal, business, …) in place? Are there roles and responsibilities at different specified levels (governance committees, tech leaders, data stewards, …)?

Data Stewards

Are the data stewards in charge of coordinating data governance activities identified and assigned to each area or activity?

Accountabilities

Have the roles, responsibilities and accountability for decision-making, management and data security been clearly defined and communicated (to the data stewards themselves, but also to everyone involved in the business)?

The Means

Do data stewards have sufficient authority to quickly and effectively correct data problems while ensuring that their access does not violate personal or sensitive data policies?

The Requirements

Have policy priorities affecting key data governance rules and requirements been defined? Is there an agreement (formal agreement or verbal approval) on these priorities by the key stakeholders (sponsors, policy makers, exec)?

Life Cycle Management

Have standard policies and procedures for all aspects of data governance and data management lifecycle, including collection, maintenance, use and dissemination, been clearly defined and documented?

Compliance

Are policies and procedures in place to ensure that all data is collected, managed, stored, transmitted, used, and destroyed in such a way that confidentiality is maintained in accordance with the security standards in place (the GDPR, for example)?

Feedback

Has an assessment been conducted to ensure the long-term relevance and effectiveness of the policies and procedures in place, including the assessment of staffing, tools, technologies and resources?

Process Visions

Do you have a mapping describing the processes used to monitor compliance with your established policies and procedures?

Transparency

Have the policies and procedures been documented and communicated in an open and accessible way to all stakeholders, including colleagues, business partners and the public (e.g., via a publication on your website)?

Overview
Does your organization have an inventory of all the data sources (from software packages, internal databases, data lakes, local files, …)?

Managing Sensitive Information
Does your organization have a detailed, up-to-date inventory of all data that should be classified as sensitive (i.e., at risk of being compromised or corrupted by unauthorized or inadvertent disclosure), personal, or both?

Level of Risks
Has your data been organized according to the level of risk of disclosure of personal information potentially contained in the records?

Documentation Rules
Does your organization have a written and established rule describing what should be included in a data catalog? Is it clear how, when and how often this information is written and by whom?

Information Accessibility
Does your organization give everyone concerned with data access to the data catalog? Is the data they need indexed in the catalog?

Global Communication
Does your organization communicate internally on the importance data can play in its strategy?

Communication Around Compliance
Does your organization communicate with its employees (at least those who are directly involved in using or manipulating data) about current regulatory obligations related to data?

Working for the Common Good
Does your organization promote the sharing of datasets (those that are harder to find and/or only used by a small group for example) via different channels?

Optimizing Data Usage
Does your organization provide the relevant people with training on how to read, understand and use the data?

Promoting Innovation
Does your organization value and promote the successes and innovations produced (directly or not) by the data?

Collecting & Storing Data
Does your organization have clear information on the reason for capturing and storing personal data (operational need, R&D, legal, etc.)?

Justification Control
Does your organization have a regular verification procedure to ensure the data collected is consistent with the information mentioned above?

Anonymization
Have anonymization or pseudonymization mechanisms been put in place for personal data, whether direct or indirect?

Detailed Procedure
Has the organization established and communicated policies and procedures on how to handle records at all stages of the data life cycle, including the acquisition, maintenance, use, archiving or destruction of records?

Data Quality Rules
Does the organization have policies and procedures in place to ensure that the data is accurate, complete, up-to-date and relevant to the users’ needs?

Data Quality Control
Does the organization conduct regular data quality audits to ensure that its quality control strategies are up-to-date and that corrective actions taken in the past have improved the quality of the data?

Data Access Policy
Are there policies and procedures in place to restrict and monitor access to data in order to limit who can access what data (including assigning differentiated access levels based on job descriptions and responsibilities)?

Are these policies and procedures consistent with local, national, … privacy laws and regulations (including the GDPR)?

Data Access Control
Have internal procedural controls been put in place to manage access to user data, including security controls, training and confidentiality agreements required by staff with personal data access privileges?

General Framework
Has a comprehensive security framework been defined, including administrative, physical, and technical procedures to address data security issues (such as access and data sharing restrictions, strong password management, regular selection and training of staff, etc.)?

Risk Assessment
Has a risk assessment been undertaken?

Does this risk assessment include an assessment of the risks and vulnerabilities related to both intentional and malicious misuse of data (e.g., hackers) and inadvertent disclosure by authorized users?

Risk Mitigation Plan
Is there a plan in place to mitigate the risks associated with intentional and unintentional data breaches?

Prevention
Does the organization monitor or audit data security on a regular basis?

Recovery Plan
Have policies and procedures been established to ensure the continuity of data services in the event of a data breach, loss, or another disaster (this includes a disaster recovery plan)?

Flow Regulation
Are policies in place to guide decisions on data exchange and reporting, including sharing data (in the form of individual records containing personal information or anonymized aggregate reports) internally with business profiles, analysts/data scientists, decision-makers, or externally with partners?

Usage Contracts and Legal Commitment
When sharing data, are appropriate procedures, such as sharing agreements, in place to ensure that personal information remains strictly confidential and protected from unauthorized disclosure? Note that data sharing agreements must fall in line with all applicable regulations, such as the GDPR.

These agreements can only take place if data sharing is permitted by law.

Control of Product Derivatives
Are appropriate procedures, such as obfuscation or deletion, in place to ensure that information is not inadvertently disclosed in general reports and that the organization’s reporting practices remain in compliance with the laws and regulations in force (for example, GDPR)?

Stakeholder Information
Are stakeholders, including the individuals whose data are kept, regularly informed about their rights under the applicable laws or regulations governing data confidentiality?

Our interactive toolkit will allow you to visualize where your efforts should lie when implementing a data governance strategy.

Data Intelligence

What is a Data Product Manager?

Actian Corporation

April 8, 2021


Data Product Management has been a regular topic of discussion among Data Science, Engineering, and Product Management teams over the last few years, particularly when Data Science products and Machine Learning are involved.

The role of the Data Product Manager has many similarities to that of a Software Product Manager in that a keen understanding of customer business requirements is crucial. There are, however, some key differences in their respective responsibilities and the skill sets needed.

In What Business Environment Does a Data Product Manager Usually Navigate?

It is fair to say that Machine Learning-dependent products impact our daily lives. Social media platforms (LinkedIn, Facebook, Twitter), Google, Uber, and Airbnb have all developed highly sophisticated ML algorithms to improve the quality of their products.

Today, however, Data Science products are by no means the exclusive preserve of the top tech companies. They have also become a common feature in a variety of enterprise domains such as predictive analytics, supply chain management, crime detection, fraud prevention, and high staff turnover prevention, to mention but a few.

Data Product Managers are often called upon when Data Science products are involved, in other words when the core business value in the spotlight depends on Machine Learning and Artificial Intelligence.

What Does a Data Product Manager Do?

Again, the role of the Data Product Manager is analogous to most Product Management roles in that it is geared towards developing the best possible product for the customers/users. That remains the key focus for the Data Product Manager.

There are, however, some subtle differences when it comes to the remit of the Data Product Manager.

The range of people the Data Product Manager caters to is often wide and can include Data Scientists, Data Engineers, Data Analysts, Data Architects, and even developers and testers. Such a diverse pool of expectations requires a solid understanding of each of these fields in order for the Data Product Manager to understand the use case for each stakeholder, not to mention strong people skills to navigate these different universes unscathed.

To demonstrate the diverse range of skills involved in this new role, the ideal Data Product Manager will have a broad understanding of Machine Learning algorithms, Artificial Intelligence, and statistics. They will have some coding experience (enough to dip their toes in if needed), be good at math, understand Big Data technologies, and have second-to-none communication skills.

The Data Product Manager can even be assigned the responsibility of centralizing access to Data at the enterprise level*.

Here, they might be asked to come up with new ways to manage, collect, and exploit data in order to improve the usability and quality of the information. This part of the job may involve choosing suitable Data Management software to centralize and democratize access to the data sets for all parties mentioned above, breaking down silos between teams and facilitating data access for all.

They may then choose a Data Catalog platform with a powerful knowledge graph and a simple search engine…such platforms do exist.

*How, in this instance, does the role of the Data Product Manager differ from that of the Data Steward, you may ask? After all, isn’t it up to the Data Steward to curate, manage, handle permissions for, and make the data available to the data consumers? One way to consider the distinction between the two roles is to see the Data Steward as the custodian of the data of the present and the Data Product Manager as the custodian and innovator of the data of the future.

Data Platform

Compute and Storage Resources With Actian Data Platform on GKE

Actian Corporation

March 31, 2021


On-Premise, You’re Grounded

The emergence of the Hadoop Distributed File System (HDFS) and the ability to create a data lake of such unprecedented depths – on standard hardware no less! – was such a breakthrough that the administrative pain and the hardware costs involved with building out an HDFS-based analytic solution were acceptable casualties of innovation. Today, though, with an analytic tool like the Actian Data Platform (formerly known as Avalanche) containerized, running in the cloud, and taking advantage of Google Kubernetes Engine (GKE), there’s no reason to put up with those pains. Indeed, because Actian on GKE treats compute and storage as separate resources, organizations can gain access to the power of Actian — to meet all their analytic needs, on both a day-to-day and peak-season basis — more easily and cost-effectively than ever before.

Consider: When Hadoop first appeared, the cloud was not considered an option for data analytics. Building out an HDFS-based data lake involved adding servers and storage resources on-premises — which also meant investments in ancillary infrastructure (networks, load balancers, and so on) as well as on-site personnel to manage and maintain the growing number of cabinets taking over the data center. The cost of analytic insight was driven still higher by the fact that all these compute and storage resources had to be deployed with an organization’s peak processing demands in mind. No matter that those peaks only occurred occasionally — at the end of the quarter or during the busy holiday shopping season — the cluster performing the analytics needed to be ready to support those demands when they arrived. Was much of that CPU power, RAM, and storage space idle during the non-peak periods? Yes, but that was the price to be paid for reliable performance during periods of peak demand.

But peak-period performance was not the only element driving up the cost of an on-prem, HDFS-based data lake. If the organization needed to store large amounts of data, the distributed nature of HDFS required that organizations deploy more compute resources to manage the additional storage — even if there was already excess compute capacity within the broader analytic cluster. Additionally, no one added just a little storage when expanding capacity. Even if you only needed a few GB of additional storage, you’d deploy a new server with multiple terabytes of high-speed storage, even if that meant you’d be growing into that storage space over quite a long time. Further, every organization had to figure this out for itself, which meant devoting significant skilled IT resources that could have been used elsewhere.

Unbinding the Ties on the Ground

Actian has broken the link between compute and storage. Running in the cloud on GKE, Actian scales compute and storage independently, creating great opportunities and potentially great cost savings for organizations seeking flexible, high-performance, cloud-based analytical solutions.

We’ve already talked about the administrative advantages of running the Actian Data Platform as a containerized application on GKE. Actian can be deployed faster and more easily on GKE because all the components are ready to go: there are no configuration scripts to run and no application stacks to build in the wrong order. What we didn’t mention (or at least expand upon) in our last blog on the topic is that you don’t have to configure Actian on GKE to meet those peak-performance spike demands. You can deploy Actian with just your day-to-day performance needs in mind. Nor did we mention that you don’t need to provision storage for each worker node in the cluster.

How is this possible, you ask? Because Google’s cloud services are highly elastic, something one cannot say about an on-premises infrastructure. The compute resources initially allocated to an Actian cluster (measured in Actian Units, or AUs) are sufficient to support daily operational workloads, but they will not be sufficient to deliver the desired compute performance during demand peaks; they are, after all, configured to support day-to-day traffic demands. The elasticity of the Google Cloud infrastructure is such that additional AUs can be added to the cluster when they’re needed. All you need to do is scale the AUs to match the desired performance levels, and the Google compute infrastructure will take care of the rest: scaling the AUs up or down adds or removes cores as needed. Yes, as you use more compute power during those peak periods you’ll pay more for the use of those resources, but one big advantage of the cloud is that you ultimately pay only for the compute resources you actually use. Once the peak has passed, the extra AUs can be removed, and your costs will drop back to the levels associated with your day-to-day processing demands.
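
Purely to illustrate the underlying pattern of compute scaling independently of storage on GKE (this is not Actian’s management interface, which adjusts capacity for you when you change AUs), here is a sketch using the official Kubernetes Python client; the deployment and namespace names are hypothetical placeholders.

```python
from kubernetes import client, config

# Load local kubeconfig (e.g., created by `gcloud container clusters get-credentials`)
config.load_kube_config()
apps = client.AppsV1Api()


def scale_compute(replicas: int) -> None:
    """Scale only the compute tier; data in GCS is untouched."""
    apps.patch_namespaced_deployment_scale(
        name="warehouse-compute",        # hypothetical compute deployment
        namespace="analytics",           # hypothetical namespace
        body={"spec": {"replicas": replicas}},
    )


scale_compute(8)   # quarter-end peak: more cores, same storage
scale_compute(3)   # back to day-to-day capacity once the peak has passed
```

Scaling back down is the same call with a smaller number, which is what makes the pay-for-what-you-use model work in practice.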

Similarly, with storage, the Google cloud infrastructure will allocate as much storage space as your data requires. If you add or remove data from the system, Google increases or decreases the amount of storage allocated for your needs — instantly and automatically.

Serving Up Satisfaction

This storage elasticity becomes an even more obvious benefit when you realize that you don’t need to deploy additional HDFS worker nodes just to manage this data, even if you’re expanding your database by an extra 4, 40, or 400 TB. As with added compute cores, you’ll pay more for more storage space (it’s the same pay-for-what-you-use model), but because the storage and compute components have been separated, you are not required to add a dedicated server to manage storage for every TB you add. GKE will always ensure that Actian has the compute resources to deliver the performance you need; you can increase and decrease the number of AUs based on your performance expectations, not the limitations of a runtime architecture built with on-prem constraints in mind.

In the end, separation of compute and storage offers a huge advantage to anyone interested in serious analytics. Large companies can reduce their costs by not having to overbuild their on-prem infrastructures to accommodate the performance demands that they know will be arriving. Smaller companies can build out an analytics infrastructure that might have been unaffordable before because they don’t have to configure for peak performance demands either. For both large and small companies, Google delivers the resources that your analytics require — no more and no less — enabling Actian on Google Cloud Platform to deliver the analytical insights you require without breaking the bank.

Data Management

5 Tips for Extracting More ROI From Your CRM and Marketing Tech Stacks

Actian Corporation

March 28, 2021


Tech stacks are getting more complicated by the day. Marketing Operations, Revenue Operations, Sales Operations, IT, Analytics, and Executives—we are all doing business using digital automations, including integrations, that allow us to target and interact with our customers and prospects in more meaningful and rewarding ways than ever before.

Take a moment to consider. Are you using a single unified app or platform to strategically grow your revenue? Or are your teams still operating in silos and spreadsheets while losing out on opportunities to make a bigger impact?

If you are like many strategic marketing and revenue leaders, your various specialized sales and marketing technology platforms (MarTech)—Salesforce.com, Marketo, ZenDesk, Sales Loft, and so many more—generate a lot of last-mile data analytics. You have more data and insights, but it can be a struggle to unify it into one big picture. Teams spend so much time doing their best to get that last mile of data to load or to maintain expected week-over-week growth. This leads to more IT projects, longer lead times, more business resources doing integrations or manually inputting spreadsheets, and ultimately burnout or growth slowdown.

You already know bad data can sabotage your business. But good data buried under layers of apps and reports can be just as damaging. Time and resources currently spent on compiling and reporting last-mile data can prevent your business from reaching its full potential and focus your most talented people on poorly identified and prioritized opportunities instead of driving real new revenue channels and targeting the right accounts, roles, and decision-makers.

Here are five tips to find revenue hidden in your tech stacks.

1. Do Not Buy That New Point App or CRM Module Before Getting Your House in Order

Make sure you can adequately answer the following questions before purchasing a new sales or marketing application:

  • Is your data squeaky clean, validated, and in the right marketing campaign?
  • Are sales teams able to prioritize real leads?
  • What is your Ideal Customer Profile?
  • Which Job Titles are responding to marketing and sales outreach, then taking meetings?
  • Which Job Titles are converting to opportunities? Can you see this in real time across marketing and sales data?
  • Do you have a single view of your customer and prospects, including the ability to see customer journey and experience, as a combined view of marketing and sales outreach, and engagement?
  • How frequently are you communicating with your top prospects across all channels—email, phone, chat, social media, etc.? Can you analyze that data by touchpoint and across nurture tracks?
  • Is the customer and prospect data in your CRM and MarTech systems well-understood, clean, and optimized to match your go-to-market (GTM)?
  • Can you measure your KPIs? Are they accurate? And are they monitored automatically, easily visualized, and reportable to all revenue/marketing leaders so that they can focus on decision-making and you can focus on actions?

If your analysts and operations teams are spending a large percentage of time on manual workloads and entries, such as updating standalone spreadsheets in Microsoft Excel, it is a sure sign that there are opportunities you should pursue to improve your operations before investing in more point applications—such as automating manual work and ensuring the optimization of existing processes inside your CRM and MarTech platforms. That said, it’s true that optimizing your CRM and MarTech stacks can only take you so far. Undoubtedly, some data will never be unified and there will always be a requirement for an outside view. But there is a huge opportunity for revenue leaders to unify customer data in a modern cloud data analytics platform—mapped to your KPIs and GTM—to deliver more revenue.

2. See If You Can Save on CRM or Marketing Automation Platform Fees

Once your operational house is in order, look for opportunities to remove unnecessary services and products, such as:

  • CRM storage fees for older data, or data you do not need. Offload to your unified analytics platform, where storage is typically much less expensive in self-service cloud-based utilities.
  • CRM platform consulting fees and platform fees. Avoid these costs with self-service analytics, using a unified analytics platform.
  • MarTech platform and other app cost reduction or avoidance due to optimized automation and management of customer data.

3. Double Down on One Big Thing

Focus on one big thing that will have the largest impact across your people, your processes, and how you go to market using your MarTech stack. For example, you may be able to make a larger impact with an end-to-end program which includes data cleansing, data validation, tight personas, and a customer journey mapped for the new program/sales experience.

4. Feed Your CRM and MarTech Properly

That means good data, real-time leads, and integrated information so frontline sales and customer engagement teams have a prioritized daily list of activities, including lead and account scores that allow simple sorting in CRM reports. Share persona-mapped leads and have the Program Priority, or ‘Sales Play,’ categorized for easy handling. A centralized Revenue Operations or Marketing Operations analyst or team running automations can eliminate duplicated efforts and ensure the best data is routed to the correct territory and appropriate sales representative.

5. Redirect Your Resources

Now that you know your ideal customer and are saving time, money, and effort by streamlining CRM, MarTech platforms, tech services, data gathering, and analytics, it is time to redirect your resources to future revenue generation. Secure strategic funding by presenting your new revenue operations plan based on what is working in the market, supported by your enhanced command of 360-degree data. Continue to measure, improve, and act upon what is most important to your current and prospective customers.

Tackling all these can seem like a huge task. However, it is well worth the effort to ensure your business is ready to take advantage of future opportunities. In the next blog entry in this series, I’ll give you a detailed prescription of how best to address these issues and to streamline your ability to acquire, retain, and expand your customer base in pursuit of revenue optimization. However, if you’re short on patience or time, take a look at our Customer360 Revenue Optimization solution.

Data Intelligence

Data Quality Management: The Ingredients to Improve Your Data

Actian Corporation

March 26, 2021


Having large volumes of data is useless if the data is of poor quality. The challenge of Data Quality Management is a major priority for companies today. As a decision-making tool used for managing innovation as well as customer satisfaction, monitoring data quality requires a great deal of rigor and method.

Producing data for the sake of producing data, because it’s trendy, because your competitors are doing it, because you read about it in the press or on the Internet: all that is in the past. Today, no business sector denies the eminently strategic nature of data.

However, the real challenge surrounding data is its quality. According to the 2020 edition of the Gartner Magic Quadrant for Data Quality Solutions, more than 25% of critical data in large companies is incorrect. This puts enterprises in a situation that generates direct and indirect costs: strategic errors, bad decisions, and various costs associated with data management. The average cost of poor data quality is 11 million euros per year.

Why is that? 

Simply because, from now on, all of your company’s strategic decisions are guided by the knowledge of your customers, your suppliers, and your partners. If we consider that data is omnipresent in your business, data quality becomes a priority issue.

Gartner is not the only one to underline this reality. At the end of 2020, IDC revealed in a study that companies are facing many challenges with their data. Nearly 2 out of 3 companies consider the identification of relevant data as a challenge, 76% of them consider that data collection can be improved, and 72% think that their data transformation processes for analysis purposes could be improved.

Data Quality Management: A Demanding Discipline

Just as in cooking, the more quality ingredients you use, the more your guests will appreciate your recipe. Because data are elements that must lead to better analyses and, therefore, to better decisions, it is essential to ensure that they are of good quality.

But what is quality data? Several criteria can be taken into account: the accuracy of the data (a complete telephone number), its conformity (the number is composed of 10 digits preceded by a national prefix), its validity (the number is still in use), its reliability (it allows you to reach your correspondent), and so on.

For efficient Data Quality Management, it is necessary to make sure that all the criteria you have defined for considering data to be of good quality are fulfilled. But be careful: data must be updated and maintained to ensure its quality over time and to avoid it becoming obsolete. Obsolete data, or data that is not updated, shared, or used, instantly loses its value because it no longer contributes effectively to your thinking, your strategies, and your decisions.
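
To make these criteria concrete, here is a small illustrative sketch; the conformity rule below is a made-up example of the kind of rule your organization might define, not a universal standard.

```python
import re

# Illustrative conformity rule only: a national prefix (+ and 1-3 digits)
# followed by 10 digits. Your own rules will differ.
PHONE_RULE = re.compile(r"^\+\d{1,3}\d{10}$")


def phone_quality(value):
    """Return a simple quality report for one phone number field."""
    return {
        "complete": bool(value),                             # the field is filled in
        "conform": bool(value and PHONE_RULE.match(value)),  # matches the agreed format
    }


print(phone_quality("+14155552671"))   # {'complete': True, 'conform': True}
print(phone_quality("4155552671"))     # {'complete': True, 'conform': False}
print(phone_quality(None))             # {'complete': False, 'conform': False}
```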

Data Quality Best Practices

To guarantee the integrity, coherence, accuracy, validity and, in a word, the quality of your data, you must act with the correct methodology. The first essential step of an efficient Data Quality Management project is to avoid duplication. Beyond acting as dead weight in your databases, duplicates distort analyses and can undermine the relevance of your decisions.
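
As a minimal illustration of that first step, here is a sketch using pandas with hypothetical column names; normalizing the matching key first keeps trivial formatting differences from hiding duplicates.

```python
import pandas as pd

# Hypothetical customer records containing a near-duplicate entry
customers = pd.DataFrame({
    "name":  ["Ada Lovelace", "Ada Lovelace", "Grace Hopper"],
    "email": ["ada@example.com", " ADA@example.com ", "grace@example.com"],
})

# Normalize the matching key before comparing
customers["email_key"] = customers["email"].str.strip().str.lower()

deduped = customers.drop_duplicates(subset=["email_key"], keep="first")
print(deduped[["name", "email"]])   # one row per distinct customer email
```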

If you choose a Data Quality Management tool, make sure it includes a module that automates the exploitation of metadata. Centralizing all the knowledge you have about your data within a single interface makes that data much easier to exploit. This is the second pillar of your Data Quality Management project.

Precisely defining your data and its taxonomy allows you to efficiently begin the quality optimization process. Then, once your data has been clearly identified and classified, it is a matter of putting it into perspective with the expectations of the various business lines within the company in order to assess its quality.

This work of reconciliation between the nature of the available data and its use by the business lines is a decisive element of Data Quality Management. But it is also necessary to go further and question the sensitivity of the data. Whether or not data is treated as sensitive depends on the choices you make in relation to the challenge of regulatory compliance.

Since the GDPR came into force in 2018, the consequences of risky choices in terms of data security have been severe, and not only from a financial point of view. Indeed, your customers are now very sensitive to the nature, use, and protection of the data they share with you.

By effectively managing Data Quality, you also contribute to maintaining trust with your customers… and customer trust is priceless.

 
Data Analytics

What is the Difference Between a Data Analytics Hub and a Lakehouse?

Actian Corporation

March 25, 2021


In the opening installment of this blog series, Data Lakes, Data Warehouses and Data Hubs: Do We Need Another Choice?, I explore why simply migrating these on-prem data integration, management, and analytics platforms to the cloud does not fully address modern data analytics needs. In comparing these three platforms, it becomes clear that all of them meet certain critical needs, but none of them meet the needs of business end-users without significant support from IT. In the second blog in this series, What is a Data Analytics Hub?, I introduce the term data analytics hub to describe a platform that takes the optimal operational and analytical elements of data hubs, lakes, and warehouses and combines them with cloud features and functionality to directly address the real-time operational and self-serve needs of business users (rather than exclusively IT users). I also take a moment to examine a fourth related technology, the analytics hub. Given the titular proximity of analytics hub to data analytics hub, it only made sense to clarify that an analytics hub remains as incomplete a solution for modern analytics as does a data lake, hub, or warehouse.

Why? Because, in essence, a data analytics hub takes the best of all these integration, management, and analytics platforms and combines them in a single platform. A data analytics hub brings together data aggregation, management, and analytics support for any data source with any BI or AI tool, visualization, reporting, or other destination. Further, a data analytics hub is built to be accessible to all users on a cross-functional team (even a virtual one). The diagram below shows the relationship between the four predecessors and the data analytics hub (it will look familiar to you if you read installment two of this series).

Wait…What About Data Lakehouse?

Last week, I had the privilege of hosting Bill Inmon, considered the father of data warehousing, for a webinar on modern data integration in cloud data warehouses. Needless to say, there were lots of questions for Bill, but there was one that I thought deserved focused discussion here: What is a data lakehouse, and how is it different from a data lake or data warehouse?

Let’s start with the most obvious, and a dead giveaway from the name: a data lakehouse combines the commodity hardware, open standards, and semi-structured and unstructured data handling capabilities of a data lake with the SQL analytics, structured schema support, and BI tool integration found in a data warehouse. This is important because the question is less how a data lakehouse differs from a data lake or data warehouse and more how it is more like one or the other. And that distinction is important because where you start in your convergence matters. In simple mathematical terms, if A + B = C, then B + A = C. But in the real world this isn’t entirely true. The starting point is everything when it comes to the convergence of two platforms or products, as that starting point informs your view of where you’re going, your perception of the trip, and your sense of whether or not you’ve ended up where you expected when you’ve finally arrived at the journey’s end.

Speaking of journeys, let’s take a little trip down memory lane to understand the challenges driving the idea of a data lakehouse.

Historically, data lakes were the realm of data scientists and power users. They supported vast amounts of data — structured and unstructured — for data exploration and complicated data science projects on open standard hardware. But those needs didn’t require access to active data such as that associated with the day-to-day operational business process. They often became science labs and, in some cases, data dumping grounds.

Contrast that with the historical needs of business analysts and other line of business (LOB) power users. They were building and running operational workloads associated with SQL analytics, BI, visualization, and reporting, and they required access to active data. For their needs, IT departments set up enterprise data warehouses, which traditionally leveraged a limited set of ERP application data repositories intimately tied to day-to-day operations. IT needed to intermediate between the data warehouse and the business analysts and LOB power users, but the data warehouse itself effectively provided a closed feedback loop that drove insights for better decision support and business agility.

As digital transformation has progressed, though, needs changed. Applications have become more intelligent and they permeate every aspect of the business. Expectations for data lakes and data warehouses have evolved. The demand for real-time decision support has reduced the data warehouse/ERP repository feedback loop asymptotically, to the point where it approaches real-time. And the original set of ERP repositories is no longer the only set of repositories of interest to business analysts and LOB power users – web clickstreams, IoT, log files, and other sources are also critical pieces of the puzzle. But these other sources are found in the disparate and diverse datasets swimming in data lakes and spanning multiple applications and departments. Essentially, every aspect of human interaction can be modelled to reveal insights that can greatly improve operational accuracy — so consolidating data from a diverse and disparate data universe and pulling it into a unified view has crystallized as a key requirement. This need is driving convergence in both the data lake and data warehouse spaces and giving rise to this idea of a data lakehouse.

Back to the present: two of the main proponents of data lakehouses are Databricks and Snowflake. The former approaches the task of platform consolidation from the perspective of a data lake vendor, the latter from the perspective of a data warehouse vendor. What their data lakehouse offerings share is this (see the sketch after the lists below):

  • Direct access to source data for BI and analytics tools (from the data warehouse side).
  • Support for structured, semi-structured, and unstructured data (from the data lake side).
  • Schema support with ACID compliance on concurrent reads and writes (from the data warehouse side).
  • Open standard tools to support data scientists (from the data lake side).
  • Separation of compute and storage (from the data warehouse side).

Key advantages shared include:

  • Removing the need for separate repositories for data science and operational BI workloads.
  • Reducing IT administration burden.
  • Consolidating the silos established by individual BI and AI tools creating their own data repositories.
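
As a deliberately simplified illustration of that shared feature set, here is a hedged sketch in Python of the “direct access to source data” pattern: structured Parquet and semi-structured JSON are read straight from object storage and analyzed together. The bucket path, file layout, and column names are hypothetical, and reading gs:// paths with pandas assumes the gcsfs package is installed.

```python
import pandas as pd

# Hypothetical object-storage locations; reading gs:// paths with pandas assumes gcsfs is installed.
SALES_PARQUET_DIR = "gs://example-bucket/curated/sales/"               # structured, schema-on-write
CLICKS_JSONL = "gs://example-bucket/raw/clickstream/2021-04-27.jsonl"  # semi-structured, schema-on-read

# Structured data read straight from the lake location -- no separate load into a warehouse copy.
sales = pd.read_parquet(SALES_PARQUET_DIR)

# Semi-structured events read from the same storage layer, one JSON document per line.
clicks = pd.read_json(CLICKS_JSONL, lines=True)

# A warehouse-style analysis over both, feeding the usual BI/reporting path.
report = (
    sales.merge(clicks, on="session_id", how="left")
         .groupby("product_category")["revenue"]
         .sum()
         .sort_values(ascending=False)
)
print(report.head(10))
```

In an actual lakehouse the same workload would run through a SQL engine with ACID tables and schema enforcement rather than an ad hoc script, but the shape of the work is the same: one storage layer, both data shapes, one analytical result.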

Emphasis is Everything

Improving the speed and accuracy of analysis on large, complex datasets isn’t a task for which the human mind is well suited; we simply can’t comprehend and find subtle patterns in truly large, complex sets of data (or, put another way: sorry, you’re not Neo and you can’t “see” the Matrix in a digital data stream). AI, however, is very good at finding patterns in complex multivariate datasets, as long as data scientists can design, train, and tune the algorithms needed to do so (tasks for which their minds are very well suited). Once the algorithms have been tuned and deployed as part of operational workloads, they can support decision-making done by humans (decision support based on situational awareness) or done programmatically (decision support automated and executed by machines as unsupervised machine-to-machine operations). Over time, any or all of these algorithms may need tweaking based on a pattern of outcomes or drift from expected or desired results. Again, and not just to put in a plug for our species, these are tasks for which the human mind is well suited.
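
The “drift from expected or desired results” mentioned above is usually caught by a simple statistical check long before anyone retunes a model. As a hedged illustration only (the synthetic distributions, bin count, and 0.2 threshold are common conventions, not anything prescribed by the vendors discussed here), a minimal drift check in Python might look like this:

```python
import numpy as np

def population_stability_index(baseline, live, bins=10):
    """Rough drift score: how differently are scores distributed now vs. when the model was tuned?"""
    edges = np.histogram_bin_edges(baseline, bins=bins)
    live = np.clip(live, edges[0], edges[-1])          # keep live scores inside the baseline bins
    base_pct = np.histogram(baseline, bins=edges)[0] / len(baseline)
    live_pct = np.histogram(live, bins=edges)[0] / len(live)
    base_pct = np.clip(base_pct, 1e-6, None)           # avoid log(0) for empty buckets
    live_pct = np.clip(live_pct, 1e-6, None)
    return float(np.sum((live_pct - base_pct) * np.log(live_pct / base_pct)))

rng = np.random.default_rng(42)
baseline_scores = rng.normal(0.50, 0.10, 10_000)  # scores observed when the model was tuned
live_scores = rng.normal(0.57, 0.12, 10_000)      # scores from the current operational workload

psi = population_stability_index(baseline_scores, live_scores)
# A common (but illustrative) rule of thumb: PSI above 0.2 suggests the model needs a second look.
print(f"PSI = {psi:.3f} -> {'investigate drift' if psi > 0.2 else 'stable'}")
```

The specific metric isn’t the point; the point is that this kind of monitoring lives inside the operational workload, which is exactly where a converged platform has to support it.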

But go back to the drive for convergence and consider where the data lakehouse vendors are starting. What is the vantage point from their perspective, and how does it color their view of what the converged destination looks like? Data lakes have historically been used by data scientists, aided by data engineers and other skilled IT personnel, to collect and analyze the data needed for the front end of the AI lifecycle, particularly for machine learning (ML). Extending that environment means making it easier to deploy their ML models into operational workloads. From that perspective, success is a converged platform that shortens the ML lifecycle and makes it more efficient. For business analysts, data engineers, and power users, though, playing with algorithms or creating baseline datasets for training and tuning is not part of the day job. For them, what matters is being able to run ML as part of their operational workloads, inclusive of the additional diverse and disparate datasets.

While data scientists and data engineers may not sit in IT departments proper, they are not the same as non-IT end users. Data lakes are generally complex environments to work in, with multiple APIs and significant amounts of coding, which is fine for data scientists and engineers but not fine at all for non-IT roles such as business and operational analysts or their equivalents in the various LOB departments. What those roles really need is convergence that expands a data warehouse to handle the operationalized ML components in their workloads on a unified platform, without expanding the complexity of the environment or requiring nights and weekends spent earning new degrees.

Are We Listening to Everyone We Need To?

I’ve been in product management and product marketing, and at the end of the day the voice that carries the furthest and loudest is the voice of your customers. They’re the ones who will always best define the incremental features and functionality of your products. For data lake vendors, that voice belongs to data scientists, data engineers, and IT; for data warehouse vendors, it’s IT. Logically, then, the confines of the problem domain are limited to these groups.

But guess what? This logic misses the most important group out there.

That group comprises the business and its representatives: the business and operational analysts and other power users outside of IT and engineering. The data lake and data warehouse vendors (and by extension the data lakehouse vendors) don’t talk to these users because IT is always standing in the middle, always intermediating. These users talk to the vendors of BI and analytics tools and, to a lesser extent, to the vendors offering data hubs and analytics hubs.

The real issue for all of these groups is getting data ingested into the repository, enriching it, running baseline in-platform analysis, and leveraging existing tools for further BI analysis, AI, visualization, and reporting without leaving the environment. The issue is more acute for the business side because they need self-service tools they currently don’t have outside of the BI and analytics tools, which often silo data within the tool or project instead of facilitating a unified view that all parties can see.
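
To make that ingest, enrich, analyze, and share loop concrete, here is a minimal, hedged sketch in Python. The table names, columns, and sample records are invented purely for illustration; in practice the same steps would run inside the platform with SQL and built-in connectors rather than a standalone script.

```python
import pandas as pd

# Hypothetical "ingested" operational records (e.g., freshly landed from an ERP extract).
orders = pd.DataFrame({
    "order_id": [1001, 1002, 1003, 1004],
    "region_code": ["NE", "NE", "SW", "SW"],
    "amount": [250.0, 90.0, 410.0, 130.0],
})

# Hypothetical reference data used to enrich the raw records.
regions = pd.DataFrame({
    "region_code": ["NE", "SW"],
    "region_name": ["Northeast", "Southwest"],
})

# Semi-structured events (think web clickstream) flattened into rows.
events = pd.json_normalize([
    {"order_id": 1001, "context": {"channel": "web"}},
    {"order_id": 1003, "context": {"channel": "mobile"}},
])

# Enrich, then run a baseline in-platform analysis.
unified = orders.merge(regions, on="region_code").merge(events, on="order_id", how="left")
summary = unified.groupby(["region_name", "context.channel"], dropna=False)["amount"].agg(["count", "sum"])
print(summary)

# The "unified view": one dataset that BI, visualization, and downstream ML can all read.
# (Writing Parquet requires pyarrow or fastparquet.)
unified.to_parquet("unified_view.parquet", index=False)
```

The business side’s complaint is that each BI or analytics tool tends to keep its own private copy of something like this unified view, which is precisely the siloing described above.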

Everyone agrees there needs to be a unified view of data that all parties can access, but that agreement will not satisfy all parties equally. A data lakehouse based on a data lake is a great way to improve the ML lifecycle and bring data scientists closer to the rest of the cross-functional team. However, much of that could be accomplished simply by moving the HDFS infrastructure to the cloud and using S3, ADLS, or Google Cloud Storage plus a modern cloud data warehouse. Such a solution would satisfy the vast majority of use cases that operationalize ML components within workloads. What’s really missing from both the data lake-originated and data warehouse-originated lakehouses is the functionality of the data hub and the analytics hub, both of which are built into a data analytics hub.

Conclusion: A Lakehouse Offers Only a Subset of the Functionality Found in a Data Analytics Hub

A data analytics hub consolidates the essential elements of a data lake, data warehouse, analytics hub, and data hub. That scope also exposes the shortsightedness of the data lakehouse approach: it’s not enough to merge only two of the four components users need for modern analytics, particularly when the development of this chimera is based on feedback from only a subset of the cross-functional roles that use the platform.

In the next blog, we’ll take a deeper look at the use cases driven by this broader group of users, and it will become clear why and how a data analytics hub better meets the needs of all parties, whether they are focused on ML-based optimizations or on day-to-day operational workloads.
