Take full advantage of the separation of compute and storage resources with Avalanche on GKE

By Jeremy Hankinson, March 31, 2021

On-premises, you're grounded

The emergence of the Hadoop Distributed File System (HDFS) and the ability to create a data lake of such unprecedented depths (on standard hardware, no less!) was such a breakthrough that the administrative pain and the hardware costs involved in building out an HDFS-based analytic solution were accepted as casualties of innovation. Today, though, with an analytic tool like Actian Avalanche containerized, running in the cloud, and taking advantage of Google Kubernetes Engine (GKE), there is no reason to put up with those pains. Indeed, because Avalanche on GKE treats compute and storage as separate resources, organizations can gain access to the power of Avalanche, and meet all their analytic needs on both a day-to-day and peak-season basis, more easily and cost-effectively than ever before.

Consider: when Hadoop first appeared, the cloud was not yet a practical option for data analytics. Building out an HDFS-based data lake meant adding servers and storage on-premises, which in turn meant investing in ancillary infrastructure (networks, load balancers, and so on) as well as on-site personnel to manage and maintain the growing number of cabinets taking over the data center. The cost of analytic insight was driven still higher by the fact that all these compute and storage resources had to be deployed with an organization's peak processing demands in mind. No matter that those peaks occurred only occasionally, at the end of the quarter or during the busy holiday shopping season; the cluster performing the analytics needed to be ready to support those demands when they arrived. Was much of that CPU power, RAM, and storage space idle during non-peak periods? Yes, but that was the price to be paid for reliable performance during periods of peak demand.
But peak-period performance was not the only factor driving up the cost of an on-prem, HDFS-based data lake. If an organization needed to store large amounts of data, the distributed nature of HDFS required it to deploy more compute resources just to manage the additional storage, even if there was already excess compute capacity within the broader analytic cluster. And no one added just a little storage when expanding capacity: even if you needed only a few GB more, you would deploy a new server with multiple terabytes of high-speed storage, then grow into that space over a very long time. Further, every organization had to figure all this out for itself, tying up skilled IT resources that could have been put to better use elsewhere.

Unbinding the ties on the ground

Actian has broken the link between compute and storage. Actian Avalanche, running in the cloud on GKE, scales compute and storage independently, creating great opportunities, and potentially great cost savings, for organizations seeking flexible, high-performance, cloud-based analytical solutions.

We have already talked about the administrative advantages of running Actian Avalanche as a containerized application on GKE. Avalanche can be deployed faster and more easily on GKE because all the components are ready to go: there are no configuration scripts to run and no application stacks to assemble in just the right order. What we didn't mention (or at least didn't expand upon) in our last blog on the topic is that you don't have to configure Avalanche on GKE to meet those peak-performance spike demands; you can deploy Avalanche with just your day-to-day performance needs in mind. Nor did we mention that you don't need to provision storage for each worker node in the cluster. How is this possible, you ask? Because Google's cloud services are highly elastic, something one cannot say about an on-premises infrastructure.
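To make that decoupling concrete in generic Kubernetes terms, here is a rough, hypothetical sketch (not one of Avalanche's actual, pre-packaged manifests, and the resource name is made up): on GKE, a dynamically provisioned PersistentVolumeClaim requests capacity from the cloud without tying that capacity to any particular worker node, which is exactly the separation an HDFS data node could not give you.

```yaml
# Illustrative only; not an Avalanche manifest.
# On GKE, a claim against a dynamic StorageClass is provisioned
# on demand; no specific worker node owns the disk, so storage
# can grow without adding compute.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: analytics-data        # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 400Gi          # can be resized later, compute untouched
```

The point of the sketch is the shape of the request: capacity is a field you edit, not a server you rack.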
The compute resources initially allocated to an Avalanche cluster (measured in Avalanche Units, or AUs) are sufficient to support daily operational workloads, but they will invariably fall short of the desired performance during demand peaks; they are, after all, configured to support day-to-day traffic. The elasticity of the Google cloud infrastructure is such that additional AUs can be added to the cluster whenever they are needed. All you need to do is scale the AUs to match the desired performance level, and the Google compute infrastructure takes care of the rest: add AUs and more cores come online; remove AUs and those cores are released. Yes, you will pay more for the extra compute power you use during peak periods, but one big advantage of the cloud is that you ultimately pay only for the compute resources you actually use. Once the peak has passed, the extra AUs can be removed, and your costs drop back to the levels associated with your day-to-day processing demands.

Similarly with storage: the Google cloud infrastructure will allocate as much storage space as your data requires. If you add or remove data from the system, Google increases or decreases the amount of storage allocated for your needs, instantly and automatically.

Serving up satisfaction

This storage elasticity becomes an even more obvious benefit when you realize that you don't need to deploy additional HDFS worker nodes just to manage this data, even if you're expanding your database by an extra 4, 40, or 400 TB. As with added compute cores, you'll pay more for more storage space (it's the same pay-for-what-you-use model), but because the storage and compute components have been separated, you are not required to add a dedicated server to manage storage for every TB you add.
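The pay-for-what-you-use arithmetic above can be sketched with a few lines of code. All of the numbers here (AU counts, hours, and the per-AU-hour rate) are made up for illustration and are not Actian pricing; the point is the gap between bursting for a peak and provisioning for it around the clock.

```python
# Illustrative cost model for elastic vs. peak-provisioned compute.
# All figures are hypothetical, not Actian or Google pricing.

def elastic_cost(day_aus, peak_aus, peak_hours, total_hours, rate_per_au_hour):
    """Cost when the extra AUs run only during the peak hours."""
    base = day_aus * total_hours * rate_per_au_hour
    burst = (peak_aus - day_aus) * peak_hours * rate_per_au_hour
    return base + burst

def provisioned_for_peak_cost(peak_aus, total_hours, rate_per_au_hour):
    """Cost when the cluster is sized for the peak around the clock."""
    return peak_aus * total_hours * rate_per_au_hour

# A 720-hour month with a 48-hour quarter-end peak:
elastic = elastic_cost(day_aus=4, peak_aus=16, peak_hours=48,
                       total_hours=720, rate_per_au_hour=1.0)
fixed = provisioned_for_peak_cost(peak_aus=16, total_hours=720,
                                  rate_per_au_hour=1.0)
print(elastic)  # 4*720 + 12*48 = 3456
print(fixed)    # 16*720 = 11520
```

Under these made-up numbers, scaling AUs up for the peak and back down afterward costs well under a third of keeping a peak-sized cluster running all month, which is the on-prem alternative.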
Because GKE will always ensure that Avalanche has the compute resources to deliver the performance you need, you can increase and decrease the number of AUs based on your performance expectations, not on the limitations of a runtime architecture built with on-prem constraints in mind.

In the end, the separation of compute and storage offers a huge advantage to anyone interested in serious analytics. Large companies can reduce their costs by not having to overbuild their on-prem infrastructures to accommodate the performance demands they know will be arriving. Smaller companies can build out an analytics infrastructure that might previously have been unaffordable, because they don't have to configure for peak performance demands either. For large and small companies alike, Google delivers the resources your analytics require, no more and no less, enabling Avalanche on Google Cloud Platform to deliver the analytical insights you need without breaking the bank.

About Jeremy Hankinson

20-year Actian veteran. Director of Performance Engineering for Actian's Analytics Products, with fingers in more pies than he has fingers. Technology enthusiast. Lapsed physicist. Embattled father. Avid snowboarder. Englishman living in California.