Pssst .. Have you heard about VectorH?

Hello World! We’ve been busy building some innovative features into the Actian Vector in Hadoop (VectorH) product and we would love to tell you all about them.

So, here is the list of features and innovations we have recently added to VectorH… wait, do you even know what VectorH is? Yes, it’s a great CamelCase example, but we did not need a blog post for that.

Let me focus this post on what VectorH really is and the kinds of problems it is designed to solve. We’ll cover the new features in a subsequent post.

What’s VectorH?

VectorH is our columnar, high-performance, ACID-compliant, ANSI SQL:2003-compliant, distributed RDBMS that runs natively within an Apache Hadoop cluster. It uses HDFS or MapR-FS for storage and Hadoop YARN for resource management. VectorH has its roots in Vectorwise, the TPC-H record-setting database that pioneered vectorized processing.

At the heart of Vectorwise (and VectorH) is the x100 execution engine that originated from the research carried out at CWI (the Dutch National Research Institute for Mathematics and Computer Science).
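To make "vectorized processing" a little more concrete, here is a toy sketch of the general idea: operators work on fixed-size batches ("vectors") of column values instead of one row at a time. This is an illustration only, not the engine's actual code; the function names and the batch size of 1024 are assumptions made up for this example.

```python
from typing import Iterator, List

VECTOR_SIZE = 1024  # hypothetical batch size, chosen for illustration only


def scan_in_vectors(column: List[int], size: int = VECTOR_SIZE) -> Iterator[List[int]]:
    """Yield the column as consecutive fixed-size batches ("vectors")."""
    for start in range(0, len(column), size):
        yield column[start:start + size]


def select_greater_than(vector: List[int], threshold: int) -> List[int]:
    """Filter primitive: one call processes a whole vector of values."""
    return [value for value in vector if value > threshold]


def sum_vector(vector: List[int]) -> int:
    """Aggregate primitive: one call processes a whole vector of values."""
    return sum(vector)


def vectorized_sum_where(column: List[int], threshold: int,
                         size: int = VECTOR_SIZE) -> int:
    """SELECT SUM(x) WHERE x > threshold, evaluated one vector at a time."""
    total = 0
    for vector in scan_in_vectors(column, size):
        total += sum_vector(select_greater_than(vector, threshold))
    return total


def row_at_a_time_sum_where(column: List[int], threshold: int) -> int:
    """The same query evaluated one value at a time, for comparison."""
    total = 0
    for value in column:
        if value > threshold:
            total += value
    return total
```

Both functions return the same answer. The difference is that the vectorized version pays the per-call overhead of each operator once per batch rather than once per value, which is the property a real vectorized engine exploits in compiled code.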

How is VectorH different?

The “secret sauce” that makes VectorH unique is its mature, production-quality implementation of vectorized processing and Positional Delta Trees (which enable it to do very efficient transactional, real-time updates without impacting query times).

Updates? Yes, that’s right: we can do updates over HDFS despite it being an append-only file system. The industry is just starting to see some systems offer update capabilities on Hadoop, but VectorH has supported them for a while, and the capability has matured a lot since its conception.
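How can updates work when the stored data cannot be changed in place? The general idea is to leave the stored column untouched, record changes separately by row position, and merge them in while scanning. The sketch below is deliberately simplified and is not how VectorH implements it: a flat dictionary stands in for the tree, only in-place modifications and deletes are handled, and the class and method names are hypothetical.

```python
from typing import Dict, Iterator, List

_DELETED = object()  # sentinel marking a deleted row position


class DeltaOverlayColumn:
    """Immutable base column plus positional deltas merged at scan time."""

    def __init__(self, base: List[int]) -> None:
        self._base = tuple(base)              # never modified, like append-only storage
        self._deltas: Dict[int, object] = {}  # row position -> new value or _DELETED

    def _check(self, position: int) -> None:
        if not 0 <= position < len(self._base):
            raise IndexError(f"position {position} is out of range")

    def update(self, position: int, value: int) -> None:
        """Record a new value for a row without touching the base data."""
        self._check(position)
        self._deltas[position] = value

    def delete(self, position: int) -> None:
        """Record that a row is deleted without touching the base data."""
        self._check(position)
        self._deltas[position] = _DELETED

    def scan(self) -> Iterator[int]:
        """Yield current values: base data with the deltas applied on the fly."""
        for position, stored in enumerate(self._base):
            current = self._deltas.get(position, stored)
            if current is _DELETED:
                continue
            yield current
```

For example, starting from `DeltaOverlayColumn([10, 20, 30, 40])`, calling `update(1, 25)` and `delete(2)` makes `scan()` yield 10, 25, 40 while the base data stays exactly as it was written.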

VectorH includes a number of other innovations, such as lightweight compression methods, multi-core parallelization, intelligent HDFS block placement, and predictive buffer management.

These innovations result in record-breaking performance, and the performance characteristics of VectorH deserve a separate post. Performance is a very important factor in large-scale data processing, but it should not be the only factor when choosing the right solution for your implementation.

Should I be using VectorH?

Our VectorH customers have been able to use VectorH to address the following use cases:

  • Cost/complexity reduction: Some of our customers had separate Hadoop clusters and separate dedicated clusters for data warehousing. Data was transferred from the Hadoop cluster into a dedicated data warehouse cluster and then used for analytical processing or made available to BI tools. The data warehouse clusters were expensive to maintain and were not scaling to handle the increasing data volume and complexity. These customers were able to move to VectorH within their existing Hadoop clusters to get the same SQL functionality, get faster response times, serve their Business Intelligence users without having to rewrite their queries, and eliminate the separate, expensive data warehousing hardware and software.
  • Handling Enterprise workloads: There is a wide variety of SQL engines available for Hadoop. Innovative as these engines are, our customers found that either a) they lacked SQL maturity, so thousands of existing queries did not work and had to be rewritten, or b) they had stability issues and could not scale to handle production workloads with large numbers of concurrent queries. VectorH has proven to have enterprise-grade manageability, scalability, and integrity.
  • Meeting SLAs: A certain segment of our customers in the financial sector has very rigid requirements: certain tasks must finish on time so that business-critical reports and insights can be generated. This required faster performance from the underlying system as well as the ability to modify a subset of the data points (adjustments) without having to run the whole ETL task again. The Positional Delta Trees within VectorH were able to handle these incremental updates very well without impacting query times.

If your data volumes are larger than 5 TB or you face any of the three issues above, you should consider Actian VectorH to provide the scale and performance to address your business needs.

So there you have it: a very brief overview of what makes VectorH so special and able to solve complex enterprise data management use cases.

How can I try VectorH? 

If you can relate to the use cases discussed above, you should give VectorH a try. You can download a trial version of VectorH here and send an email to eval@actian.com to request a trial license key. Our new Getting Started Guide should help you with basic concepts and installation, and our Evaluation Guide should help you with more advanced topics.

We have recently published a Spark-Vector connector on GitHub that extends the capabilities of VectorH by integrating with the Spark ecosystem. The VectorH team is excited about making this feature available because it enables a variety of new use cases. A blog post about this is coming soon, so keep an eye out.

About Vishal Bagga

Vishal Bagga is a Product Manager at Actian and focuses on Actian’s Big Data products. He is passionate about all things Big Data, especially Distributed databases, Cloud computing, Hadoop and Big Data architectures. Before Actian, Vishal led the product development efforts at Versant for Versant Object Database where the team delivered key product enhancements, including a redesigned database kernel to exploit multi-core parallelism. Vishal is on Twitter as @vishalbagga.