Data Platform

Compute and Storage Resources With Actian Data Platform on GKE

Actian Corporation

March 31, 2021


On-Premise, You’re Grounded

The emergence of the Hadoop Distributed File System (HDFS) and the ability to create a data lake of such unprecedented depths – on standard hardware no less! – was such a breakthrough that the administrative pain and the hardware costs involved with building out an HDFS-based analytic solution were acceptable casualties of innovation. Today, though, with an analytic tool like the Actian Data Platform (formerly known as Avalanche) containerized, running in the cloud, and taking advantage of Google Kubernetes Engine (GKE), there’s no reason to put up with those pains. Indeed, because Actian on GKE treats compute and storage as separate resources, organizations can gain access to the power of Actian — to meet all their analytic needs, on both a day-to-day and peak-season basis — more easily and cost-effectively than ever before.

Consider: When Hadoop first appeared, the cloud was not yet a realistic option for data analytics. Building out an HDFS-based data lake meant adding servers and storage resources on-premises — which also meant investments in ancillary infrastructure (networks, load balancers, and so on) as well as on-site personnel to manage and maintain the growing number of cabinets taking over the data center. The cost of analytic insight was driven still higher by the fact that all these compute and storage resources had to be deployed with an organization's peak processing demands in mind. No matter that those peaks occurred only occasionally — at the end of the quarter or during the busy holiday shopping season — the cluster performing the analytics needed to be ready to support them when they arrived. Was much of that CPU power, RAM, and storage space idle during non-peak periods? Yes, but that was the price paid for reliable performance during periods of peak demand.

But peak-period performance was not the only element driving up the cost of an on-prem, HDFS-based data lake. If an organization needed to store large amounts of data, the distributed nature of HDFS required it to deploy more compute resources to manage the additional storage — even if there was already excess compute capacity within the broader analytic cluster. And no one added just a little storage when expanding capacity: even if you only needed a few extra gigabytes, you'd deploy a new server with multiple terabytes of high-speed storage and grow into that space over a long time. Further, every organization had to figure all of this out for itself, which tied up skilled IT resources that could have been used elsewhere.

Unbinding the Ties on the Ground

Actian has broken the link between compute and storage. Running in the cloud on GKE, Actian scales compute and storage independently, creating great opportunities, and potentially great cost savings, for organizations seeking flexible, high-performance, cloud-based analytic solutions.

We've already talked about the administrative advantages of running the Actian Data Platform as a containerized application on GKE. Actian can be deployed faster and more easily on Google GKE because all the components are ready to go: there are no configuration scripts to run and no application stacks to build in the wrong order. What we didn't mention (or at least expand upon) in our last blog on the topic is that you don't have to configure Actian on GKE to meet those peak-performance spike demands. You can deploy Actian with just your day-to-day performance needs in mind. Nor did we mention that you don't need to provision storage for each worker node in the cluster.

How is this possible, you ask? Because Google's cloud services are highly elastic — something one cannot say about an on-premises infrastructure. The compute resources initially allocated to an Actian cluster (measured in Actian Units, or AUs) are sufficient to support daily operational workloads, but they will not be sufficient to deliver the desired performance during demand peaks — they are, after all, sized for day-to-day traffic. The elasticity of the Google cloud infrastructure is such that additional AUs can be added to the cluster when they're needed. All you need to do is scale the AUs to match the desired performance level, and the Google compute infrastructure will take care of the rest: cores are added — or subtracted — as the AU count changes. Yes, you'll pay more for the extra compute power you use during those peak periods, but one big advantage of the cloud is that you ultimately pay only for the compute resources you actually use. Once the peak has passed, the extra AUs can be removed, and your costs will drop back to the levels associated with your day-to-day processing demands.
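
To make that elasticity concrete, here is a minimal, hypothetical sketch of scaling worker compute on a Kubernetes cluster using the standard Kubernetes Python client. The namespace, deployment name, and replica counts are invented for illustration, and in practice Actian AU scaling is driven through the Actian console and service APIs rather than raw Kubernetes calls like these.

    # A minimal, hypothetical sketch of elastic compute on GKE: resize a worker
    # deployment for a peak period, then scale it back down afterwards.
    # Namespace, deployment name, and replica counts are invented; Actian AU
    # scaling is performed through the Actian console/APIs, not calls like these.
    from kubernetes import client, config

    def scale_workers(replicas: int,
                      deployment: str = "warehouse-workers",
                      namespace: str = "analytics") -> None:
        config.load_kube_config()              # authenticate with the local kubeconfig
        apps = client.AppsV1Api()
        apps.patch_namespaced_deployment_scale(
            name=deployment,
            namespace=namespace,
            body={"spec": {"replicas": replicas}},
        )

    scale_workers(12)   # quarter-end peak: add compute
    # ... once the peak passes ...
    scale_workers(4)    # back to day-to-day levels; stop paying for idle cores

The point of the sketch is the shape of the operation: a single declarative change to the desired capacity, with the cloud infrastructure doing the rest.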

Similarly, with storage, the Google cloud infrastructure will allocate as much storage space as your data requires. If you add or remove data from the system, Google increases or decreases the amount of storage allocated for your needs — instantly and automatically.

Serving Up Satisfaction

This storage elasticity becomes an even more obvious benefit when you realize that you don't need to deploy additional HDFS worker nodes just to manage this data — even if you're expanding your database by an extra 4, 40, or 400 TB. As with added compute cores, you'll pay more for more storage space — it's the same pay-for-what-you-use model — but because the storage and compute components have been separated, you are not required to add a dedicated server to manage every terabyte of storage you add. GKE will always ensure that Actian has the compute resources to deliver the performance you need, and you can increase and decrease the number of AUs based on your performance expectations, not the limitations of a runtime architecture built with on-prem constraints in mind.

In the end, separation of compute and storage offers a huge advantage to anyone interested in serious analytics. Large companies can reduce their costs by not having to overbuild their on-prem infrastructures to accommodate the performance demands that they know will be arriving. Smaller companies can build out an analytics infrastructure that might have been unaffordable before because they don’t have to configure for peak performance demands either. For both large and small companies, Google delivers the resources that your analytics require — no more and no less — enabling Actian on Google Cloud Platform to deliver the analytical insights you require without breaking the bank.

Data Analytics

5 Tips for Extracting More ROI From Your CRM and Marketing Tech Stacks

Actian Corporation

March 28, 2021


Tech stacks are getting more complicated by the day. Marketing Operations, Revenue Operations, Sales Operations, IT, Analytics, and Executives—we are all doing business using digital automations, including integrations, that allow us to target and interact with our customers and prospects in more meaningful and rewarding ways than ever before.

Take a moment to consider. Are you using a single unified app or platform to strategically grow your revenue? Or are your teams still operating in silos and spreadsheets while losing out on opportunities to make a bigger impact?

If you are like many strategic marketing and revenue leaders, your various specialized sales and marketing technology platforms (MarTech)—Salesforce.com, Marketo, ZenDesk, Sales Loft, and so many more—generate a lot of last-mile data and analytics. You have more data and insights than ever, but it can be a struggle to unify them into one big picture. Teams spend enormous amounts of time just getting that last mile of data loaded or keeping pace with expected week-over-week growth. This leads to more IT projects, longer lead times, more business resources doing integrations or manually inputting spreadsheets, and ultimately burnout or a growth slowdown.

You already know bad data can sabotage your business. But good data buried under layers of apps and reports can be just as damaging. The time and resources currently spent compiling and reporting last-mile data can keep your business from reaching its full potential, tying up your most talented people on poorly identified and prioritized opportunities instead of letting them drive real new revenue channels and target the right accounts, roles, and decision-makers.

Here are five tips to find revenue hidden in your tech stacks.

1. Do Not Buy That New Point App or CRM Module Before Getting Your House in Order

Make sure you can adequately answer the following questions before purchasing a new sales or marketing application:

  • Is your data squeaky clean, validated, and in the right marketing campaign?
  • Are sales teams able to prioritize real leads?
  • What is your Ideal Customer Profile?
  • Which Job Titles are responding to marketing and sales outreach, then taking meetings?
  • Which Job Titles are converting to opportunities? Can you see this in real time across marketing and sales data?
  • Do you have a single view of your customer and prospects, including the ability to see customer journey and experience, as a combined view of marketing and sales outreach, and engagement?
  • How frequently are you communicating with your top prospects across all channels—email, phone, chat, social media, etc.? Can you analyze that data by touchpoint and across nurture tracks?
  • Is the customer and prospect data in your CRM and MarTech systems well-understood, clean, and optimized to match your go-to-market (GTM)?
  • Can you measure your KPIs? Are they accurate? And are they monitored automatically, easily visualized, and reportable to all revenue and marketing leaders so that they can focus on decision-making and you can focus on actions?

If your analysts and operations teams are spending a large percentage of their time on manual workloads and entries, such as updating standalone spreadsheets in Microsoft Excel, it is a sure sign that there are opportunities you should pursue to improve your operations before investing in more point applications—such as automating manual work and optimizing existing processes inside your CRM and MarTech platforms. That said, it's true that optimizing your CRM and MarTech stacks can only take you so far. Undoubtedly, some data will never be unified there, and there will always be a requirement for an outside view. But there is a huge opportunity for revenue leaders to unify customer data in a modern cloud data analytics platform—mapped to your KPIs and GTM—to deliver more revenue.
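
As a small illustration of the kind of manual spreadsheet work that can be automated before buying another point application, here is a hedged sketch that joins a CRM export with a marketing-automation export into a single deduplicated view. The file names and column names are hypothetical, and the specific tools you export from will vary.

    # A hypothetical sketch: replace a weekly copy-and-paste exercise with a
    # small script that joins CRM and MarTech exports into one deduplicated
    # view. File names and column names are invented for illustration.
    import pandas as pd

    crm = pd.read_csv("crm_accounts.csv")          # e.g., accounts, owners, open opportunities
    martech = pd.read_csv("campaign_touches.csv")  # e.g., email and webinar engagement events

    # Normalize the join key and drop duplicate contact rows before merging.
    for df in (crm, martech):
        df["email"] = df["email"].str.strip().str.lower()
    crm = crm.drop_duplicates(subset="email")

    unified = crm.merge(
        martech.groupby("email", as_index=False).agg(touches=("campaign_id", "count")),
        on="email",
        how="left",
    ).fillna({"touches": 0})

    unified.to_csv("unified_customer_view.csv", index=False)

Even a sketch this small replaces hours of copy-and-paste and gives every team the same combined view of marketing and sales activity.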

2. See If You Can Save on CRM or Marketing Automation Platform Fees

Once your operational house is in order, look for opportunities to remove unnecessary services and products, such as:

  • CRM storage fees for older data, or data you do not need. Offload to your unified analytics platform, where storage is typically much less expensive in self-service cloud-based utilities.
  • CRM platform consulting fees and platform fees. Avoid these costs with self-service analytics, using a unified analytics platform.
  • MarTech platform and other app cost reduction or avoidance due to optimized automation and management of customer data.

3. Double Down on One Big Thing

Focus on one big thing that will have the largest impact across your people, your processes, and how you go to market using your MarTech stack. For example, you may be able to make a larger impact with an end-to-end program which includes data cleansing, data validation, tight personas, and a customer journey mapped for the new program/sales experience.

4. Feed Your CRM and MarTech Properly

That means good data, real-time leads, and integrated information, so frontline sales and customer engagement teams have a prioritized daily list of activities, including lead and account scores that allow simple sorting in CRM reports. Share persona-mapped leads and have the Program Priority, or ‘Sales Play,’ categorized for easy handling. A centralized Revenue Operations or Marketing Operations analyst or team running automations can eliminate duplicated effort and ensure the best data is routed to the correct territory and the appropriate sales representative.
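
To make the "prioritized daily list" idea concrete, here is a hedged sketch of a very simple lead score computed centrally and written back out for CRM import. The weights, field names, and tiers are entirely hypothetical; a real scoring model would be tuned to your own conversion data.

    # A hypothetical sketch of a simple, centrally computed lead score that a
    # Revenue Operations team could push back to the CRM for easy sorting.
    # Weights, field names, and tiers are invented for illustration only.
    import pandas as pd

    leads = pd.read_csv("new_leads.csv")  # assumed columns: title, opened_emails, visited_pricing

    TITLE_WEIGHTS = {"vp": 30, "director": 20, "manager": 10}

    def score(row) -> int:
        title = str(row["title"]).lower()
        title_points = next((pts for key, pts in TITLE_WEIGHTS.items() if key in title), 0)
        return (title_points
                + min(int(row["opened_emails"]), 5) * 5        # cap email engagement at 25 points
                + (25 if row["visited_pricing"] else 0))       # assumed boolean: high-intent page visit

    leads["lead_score"] = leads.apply(score, axis=1)
    leads["sales_play"] = pd.cut(leads["lead_score"],
                                 bins=[-1, 29, 59, 1000],
                                 labels=["Nurture", "Follow-up", "Hot"])
    leads.sort_values("lead_score", ascending=False).to_csv("scored_leads.csv", index=False)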

5. Redirect Your Resources

Now that you know your ideal customer and are saving time, money, and effort by streamlining CRM, MarTech platforms, tech services, data gathering, and analytics, it is time to redirect your resources to future revenue generation. Secure strategic funding by presenting your new revenue operations plan based on what is working in the market, supported by your enhanced command of 360-degree data. Continue to measure, improve, and act upon what is most important to your current and prospective customers.

Tackling all these can seem like a huge task. However, it is well worth the effort to ensure your business is ready to take advantage of future opportunities. In the next blog entry in this series, I’ll give you a detailed prescription of how best to address these issues and to streamline your ability to acquire, retain, and expand your customer base in pursuit of revenue optimization. However, if you’re short on patience or time, take a look at our Customer360 Revenue Optimization solution.

Data Intelligence

Data Quality Management: The Ingredients to Improve Your Data

Actian Corporation

March 26, 2021


Having large volumes of data is useless if that data is of poor quality. The challenge of Data Quality Management is a major priority for companies today. Because data is a decision-making tool, used for managing innovation as well as customer satisfaction, monitoring its quality requires rigor and method.

Producing data for the sake of producing data because it's trendy, because your competitors are doing it, or because you read about it in the press or on the Internet: all that is in the past. Today, no business sector denies the eminently strategic nature of data.

However, the real challenge surrounding data is its quality. According to the 2020 edition of the Gartner Magic Quadrant for Data Quality Solutions, more than 25% of critical data in large companies is incorrect. This puts enterprises in a situation that generates direct and indirect costs: strategic errors, bad decisions, the various costs associated with data management, and more. The average cost of bad data quality is 11 million euros per year.

Why is that? 

Simply because your company's strategic decisions are now guided by what you know about your customers, your suppliers, and your partners. If data is omnipresent in your business, Data Quality becomes a priority issue.

Gartner is not the only one to underline this reality. At the end of 2020, IDC revealed in a study that companies are facing many challenges with their data. Nearly 2 out of 3 companies consider the identification of relevant data as a challenge, 76% of them consider that data collection can be improved, and 72% think that their data transformation processes for analysis purposes could be improved.

Data Quality Management: A Demanding Discipline

Just as in cooking, the better the ingredients you use, the more your guests will appreciate your recipe. Because data must lead to better analyses and, therefore, to better decisions, it is essential to ensure that it is of good quality.

But what is quality data? Several criteria can be taken into account: the accuracy of the data (a complete telephone number), its conformity (the number is composed of 10 digits preceded by a national prefix), its validity (the number is still in service), its reliability (it actually reaches the intended contact), and so on.
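
As a minimal sketch of how such checks can be automated, the snippet below tests the conformity of a phone number field against an assumed format (10 national digits, optionally written with a +33-style country prefix). The format, like the field itself, is an illustration; validity and reliability would still need to be verified against live usage.

    # A minimal sketch of automated conformity checks for a phone number field.
    # The assumed format (10 national digits, optionally written with a
    # +33-style country prefix) is purely illustrative, not a universal rule.
    import re

    PHONE_PATTERN = re.compile(r"^(?:\+33|0)\d{9}$")  # e.g., +33612345678 or 0612345678

    def normalize(raw: str) -> str:
        """Strip the spaces, dots, and dashes commonly typed into phone fields."""
        return re.sub(r"[ .\-]", "", raw or "")

    def is_conformant(raw: str) -> bool:
        return bool(PHONE_PATTERN.match(normalize(raw)))

    for value in ["+33 6 12 34 56 78", "06.12.34.56.78", "12345"]:
        print(value, "->", "conformant" if is_conformant(value) else "non-conformant")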

For efficient Data Quality Management, you need to make sure that every criterion you have defined for good-quality data is actually met. But be careful: data must be updated and maintained to ensure its quality over time and to keep it from becoming obsolete. Obsolete data, or data that is not updated, shared, or used, instantly loses its value because it no longer contributes effectively to your thinking, your strategies, and your decisions.

Data Quality Best Practices

To guarantee the integrity, coherence, accuracy, validity and, in a word, the quality of your data, you must follow the right methodology. The essential first step of an efficient Data Quality Management project is to avoid duplication. Beyond acting as dead weight in your databases, duplicates distort analyses and can undermine the relevance of your decisions.
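
As a hedged sketch of what avoiding duplication can look like in practice, the snippet below builds a simple normalized matching key so that near-identical records (differing only in case, accents, or punctuation) collapse into one. The field names and normalization rules are assumptions for illustration; production-grade deduplication usually adds fuzzy matching and survivorship rules.

    # A hedged sketch of basic deduplication: build a normalized matching key
    # so near-identical records collapse into one. Field names and the
    # normalization rules are assumptions for illustration only.
    import unicodedata
    import pandas as pd

    def normalize(text: str) -> str:
        # Strip accents, lowercase, and keep only letters and digits.
        ascii_text = unicodedata.normalize("NFKD", str(text)).encode("ascii", "ignore").decode()
        return "".join(ch for ch in ascii_text.lower() if ch.isalnum())

    customers = pd.DataFrame({
        "name": ["Société Dupont", "SOCIETE DUPONT", "Durand SARL"],
        "city": ["Lyon", "lyon", "Paris"],
    })

    customers["match_key"] = customers["name"].map(normalize) + "|" + customers["city"].map(normalize)
    deduplicated = customers.drop_duplicates(subset="match_key", keep="first")
    print(deduplicated)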

If you choose a Data Quality Management tool, make sure it includes a module that automates the use of metadata. Centralizing everything you know about your data within a single interface makes that data much easier to work with. This is the second pillar of your Data Quality Management project.

Precisely defining your data and its taxonomy allows you to start the quality optimization process efficiently. Then, once your data has been clearly identified and classified, you can weigh it against the expectations of the various business lines within the company in order to assess its quality.

This reconciliation between the nature of the available data and its use by the business lines is a decisive element of Data Quality Management. But it is also necessary to go further and question the sensitivity of the data. Whether data is treated as sensitive depends on the choices you make around regulatory compliance.

Since the GDPR came into force in 2018, the consequences of risky choices in terms of data security have been severe, and not only from a financial point of view. Indeed, your customers are now very sensitive to the nature, use, and protection of the data they share with you.

By effectively managing Data Quality, you also contribute to maintaining trust with your customers… and customer trust is priceless.

 
Data Analytics

What is the Difference Between a Data Analytics Hub and a Lakehouse?

Actian Corporation

March 25, 2021

[Diagram: Data Analytics Ladder]

In the opening installment of this blog series — Data Lakes, Data Warehouses and Data Hubs: Do We Need Another Choice? — I explore why simply migrating these on-prem data integration, management, and analytics platforms to the cloud does not fully address modern data analytics needs. In comparing these three platforms, it becomes clear that all of them meet certain critical needs, but none of them meet the needs of business end-users without significant support from IT. In the second blog in this series — What is a Data Analytics Hub? — I introduce the term data analytics hub to describe a platform that takes the optimal operational and analytical elements of data hubs, lakes, and warehouses and combines them with cloud features and functionality to directly address the real-time operational and self-service needs of business users (rather than exclusively IT users). I also take a moment to examine a fourth related technology, the analytics hub. Given the titular proximity of analytics hub to data analytics hub, it only made sense to clarify that an analytics hub remains as incomplete a solution for modern analytics as a data lake, data hub, or data warehouse.

Why? Because, in essence, a data analytics hub takes the best of all these integration, management, and analytics platforms and combines them in a single platform. A data analytics hub brings together data aggregation, management, and analytics support for any data source with any BI or AI tool, visualization, reporting, or other destination. Further, a data analytics hub is built to be accessible to all users on a cross-functional team (even a virtual one). The diagram below shows the relationship between the four predecessors and the data analytics hub (it will look familiar to you if you read installment two of this series).

Wait…What About Data Lakehouse?

Last week, I had the privilege of hosting Bill Inmon, widely considered the father of data warehousing, for a webinar on modern data integration in cloud data warehouses. Needless to say, there were lots of questions for Bill, but there was one that I thought deserved focused discussion here: What is a data lakehouse, and how is it different from a data lake or data warehouse?

Let's start with the most obvious point, a dead giveaway from the name: a data lakehouse combines the commodity hardware, open standards, and semi-structured and unstructured data-handling capabilities of a data lake with the SQL analytics, structured schema support, and BI tool integration found in a data warehouse. This is important because the question is less how a data lakehouse differs from a data lake or data warehouse and more how it is more like one or the other. And that distinction matters because where you start your convergence matters. In simple mathematical terms, if A + B = C, then B + A = C. But in the real world this isn't entirely true. The starting point is everything when it comes to the convergence of two platforms or products, as that starting point informs your view of where you're going, your perception of the trip, and your sense of whether or not you've ended up where you expected when you finally arrive at the journey's end.

Speaking of journeys, let’s take a little trip down memory lane to understand the challenges driving the idea of a data lakehouse.

Historically, data lakes were the realm of data scientists and power users. They supported vast amounts of data — structured and unstructured — for data exploration and complicated data science projects on commodity hardware and open standards. But those needs didn't require access to active data, such as the data associated with day-to-day operational business processes. Data lakes often became science labs and, in some cases, data dumping grounds.

Contrast that with the historical needs of business analysts and other line-of-business (LOB) power users. They were building and running operational workloads associated with SQL analytics, BI, visualization, and reporting, and they required access to active data. For their needs, IT departments set up enterprise data warehouses, which traditionally leveraged a limited set of ERP application data repositories intimately tied to day-to-day operations. IT needed to intermediate between the data warehouse and the business analysts and LOB power users, but the data warehouse itself effectively provided a closed feedback loop that drove insights for better decision support and business agility.

As digital transformation has progressed, though, needs changed. Applications have become more intelligent and they permeate every aspect of the business. Expectations for data lakes and data warehouses have evolved. The demand for real-time decision support has reduced the data warehouse/ERP repository feedback loop asymptotically, to the point where it approaches real-time. And the original set of ERP repositories are no longer the only repositories of interest to business analysts and LOB power users – web clickstreams, IoT, log files, and other sources are also critical pieces to the puzzle. But these other sources are found in the disparate and diverse datasets swimming in data lakes and spanning multiple applications and departments. Essentially, every aspect of human interaction can be modelled to reveal insights that can greatly improve operational accuracy — so consolidating data from a diverse and disparate data universe and pulling it into a unified view has crystalized as a key requirement. This need is driving convergence in both the data lake and data warehouse spaces and giving rise to this idea of a data lakehouse.

Back to the present: Two of the main proponents of data lakehouses are Databricks and Snowflake. The former approaches the task of platform consolidation from the perspective of a data lake vendor and the latter from the perspective of a data warehouse vendor. What their data lakehouse offerings share is this (a brief sketch of what it looks like in practice follows the list):

  • Direct access to source data for BI and analytics tools (from the data warehouse side).
  • Support for structured, semi-structured and unstructured data (from the data lake side).
  • Schema support with ACID compliance on concurrent reads and writes (from the data warehouse side).
  • Open standard tools to support data scientists (from the data lake side).
  • Separation of compute and storage (from the data warehouse side).
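
As a rough illustration of several of these traits (direct SQL access to source data, support for both structured and semi-structured files, and compute that is separate from the storage it queries), here is a hedged sketch using DuckDB as a stand-in query engine; it is not one of the vendors named above, and the bucket paths and column names are invented.

    # A hedged sketch: query structured (Parquet) and semi-structured (JSON)
    # files sitting in object storage with plain SQL, without first loading
    # them into a separate warehouse. DuckDB is used only as a stand-in
    # engine; paths and columns are invented, and credential setup is omitted.
    import duckdb

    con = duckdb.connect()            # in-process compute, separate from the stored files
    con.execute("INSTALL httpfs;")    # enable reading from s3:// locations
    con.execute("LOAD httpfs;")

    top_spenders = con.execute("""
        SELECT o.customer_id,
               SUM(o.amount) AS total_spend,
               COUNT(c.event) AS click_events
        FROM read_parquet('s3://example-bucket/orders/*.parquet') AS o
        LEFT JOIN read_json_auto('s3://example-bucket/clicks/*.json') AS c
               ON o.customer_id = c.customer_id
        GROUP BY o.customer_id
        ORDER BY total_spend DESC
        LIMIT 10
    """).fetchdf()

    print(top_spenders)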

Key advantages shared include:

  • Removing the need for separate repositories for data science and operational BI workloads.
  • Reducing IT administration burden.
  • Consolidating the silos established by individual BI and AI tools creating their own data repositories.

Emphasis is Everything

Improving the speed and accuracy of analysis on large, complex datasets isn't a task for which the human mind is well suited; we simply can't comprehend and find subtle patterns in truly large, complex sets of data (or, put another way: sorry, you're not Neo, and you can't "see" the Matrix in a digital data stream). However, AI is very good at finding patterns in complex multivariate datasets — as long as data scientists can design, train, and tune the algorithms needed to do this (tasks for which their minds are very well suited). Once the algorithms have been tuned and deployed as part of operational workloads, they can support decision-making done by humans (decision support based on situational awareness) or done programmatically (decision support automated and executed by machines as unsupervised machine-to-machine operations). Over time, any or all of these algorithms may need tweaking based on a pattern of outcomes or drift from expected or desired results. Again (not that I feel the need to put in a plug for us humans), these are tasks for which the human mind is well suited.

But go back to the drive for convergence and consider where the data lakehouse vendors are starting. What’s the vantage point from their perspective? And how does that color the vendor’s view of what the converged destination looks like? Data lakes have historically been used by data scientists, aided by data engineers and other skilled IT personnel, to collect and analyze the data needed to handle the front end of the AI life cycle, particularly for Machine Learning (ML). Extending that environment means facilitating the deployment of their ML into the operational workloads. From that perspective, success would be a converged platform that shortens the ML lifecycle and makes it more efficient. For business analysts, data engineers, and power users, though, playing with algorithms or creating baseline datasets to train and tune is not part of their day job. For them, additively running ML as part of their operational workloads, inclusive of the additional diverse and disparate datasets, is what matters.
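
A hedged sketch of that hand-off, using scikit-learn purely as an illustration: a data scientist trains and tunes a model offline, and the operational side then runs it as one more step in a routine workload. The churn use case, feature names, and file paths are all invented.

    # A hypothetical sketch of operationalizing a model trained offline:
    # training stays with the data science team, while scoring runs inside a
    # scheduled operational job. The churn use case, feature names, and file
    # paths are invented for illustration only.
    import joblib
    import pandas as pd
    from sklearn.ensemble import GradientBoostingClassifier

    # --- data science side: design, train, tune, and hand off the model ---
    train = pd.read_parquet("training_snapshot.parquet")        # baseline dataset
    features = ["recency_days", "order_count", "avg_ticket"]
    model = GradientBoostingClassifier().fit(train[features], train["churned"])
    joblib.dump(model, "churn_model.joblib")                     # the hand-off artifact

    # --- operational side: score fresh records as part of a routine workload ---
    model = joblib.load("churn_model.joblib")
    fresh = pd.read_parquet("latest_customers.parquet")          # today's active data
    fresh["churn_risk"] = model.predict_proba(fresh[features])[:, 1]
    fresh.sort_values("churn_risk", ascending=False).to_csv("at_risk_accounts.csv", index=False)

The business analyst never touches the algorithm; they simply consume the churn-risk column alongside the rest of their operational data.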

While data scientists and data engineers may not be in IT departments proper, they are not the same as non-IT end-users. Data lakes are generally complex environments to work in, with multiple APIs and significant amounts of coding, which is fine for data scientists and engineers but not fine at all for non-IT roles such as business and operational analysts or their equivalents in various LOB departments. They really need convergence that expands a data warehouse to handle the operationalized ML components in their workloads on a unified platform — without expanding the complexity of the environment or adding in requirements for lots of nights and weekends getting new degrees.

Are We Listening to Everyone We Need To?

I’ve been in product management and product marketing and, at the end of the day, the voice that carries the furthest and loudest is the voice of your customers. They’re the ones who will always best define the incremental features and functionality of your products. For data lake vendors it’s the data scientists, engineers and IT; for data warehouse vendors, it’s IT. Logically, the confines of the problem domain are limited to these groups.

But guess what? This logic misses the most important group out there.

That group comprises the business and its representatives, the Business and Operational Analysts and other power users outside of IT and engineering. The data lake and data warehouse vendors — and by extension the data lakehouse vendors — don’t talk to these users because IT is always standing in the middle, always intermediating. These users talk to the vendors of BI and Analytics tools and, to a lesser extent, the vendors offering data hubs and analytics hubs.

The real issue for all these groups involves getting data ingested into the data repository, enriching it, running baseline in-platform analysis and leveraging existing tools for further BI analysis, AI, visualization and reporting without leaving the environment. The issue is more acute for the business side as they need self-service tools they currently don’t have outside of the BI and Analytics tools (which often silo data within the tool/project instead of facilitating the construction of a unified view that can be seen by all parties).

Everyone agrees there needs to be a unified view of data that all parties can access, but not every path to that unified view will satisfy all parties equally. A data lakehouse based on a data lake is a great way to improve the ML lifecycle and bring data scientists closer to the rest of the cross-functional team. However, that could be accomplished simply by moving the HDFS infrastructure to the cloud and using S3, ADLS, or Google Cloud Storage plus a modern cloud data warehouse. Such a solution would satisfy the vast majority of use cases operationalizing ML components in workloads. What's really missing from both the data lake- and data warehouse-originated lakehouses is the functionality of the data hub and analytics hub, which is built into the data analytics hub.

Conclusion: A Lakehouse Offers Only a Subset of the Functionality Found in a Data Analytics Hub

The diagram with which we started illustrates how a data analytics hub consolidates the essential elements of a data lake, data warehouse, analytics hub, and data hub. It also illustrates the shortsightedness of the data lakehouse approach. It's not enough to merge only two of the four components users need for modern analytics, particularly when the development of this chimera is based on feedback from only a subset of the cross-functional roles that use the platform.

In the next blog we’ll take a deeper look at the use cases driven by this broader group of users, and it will become clear why and how a data analytics hub will better meet the needs of all parties, regardless of whether they are focused on ML-based optimizations or day-to-day operational workloads.


About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.