Hadoop Analytics with Actian VectorH

Turbocharge Apache Spark and SQL Performance on Hadoop



Designed from the ground up for speed, Actian VectorH turns Hadoop and Spark into a high-performance analytics platform at a compelling price point. In an independent, industry-standard benchmark, VectorH ran up to 900x faster than Hive, Impala, Spark SQL, and HAWQ.



With native Spark access to Hadoop file formats, Actian VectorH lets you build on existing analytics investments and use open source technologies to add functionality such as machine learning, graph analytics, and streaming data.


Enterprise Grade

Actian VectorH delivers a unique combination of cutting edge innovation and mature database features that are proven in the enterprise. VectorH supports the latest ANSI SQL standards, is fully ACID compliant, and provides native DBMS security making Hadoop analytics secure, consumable, accessible and re-usable.

What Is VectorH?

Actian VectorH is a high-performance columnar SQL database that runs natively in Hadoop, exploiting vectorized query execution and multi-level in-memory data management to optimize analytic workloads. VectorH can power modern decision support systems and BI by enabling developers, data scientists and business analysts to query HDFS data for machine learning, advanced analytics, statistics and more.

The latest release features native file format integration through Apache Spark, providing direct access to Hadoop data file formats like Parquet and ORC, and supporting DataFrames for Spark SQL and Spark R applications.

State of the Art Innovation

VectorH is engineered for maximum performance, with optimizations at the chip level and across cluster nodes that deliver record-breaking analytical processing of the data that matters most to your applications and business.

Multi-core parallelism & MPP


Maximize utilization of chip cores across all cluster nodes

Vector Processing


Single Instruction Multiple Data

Exploiting CPU Chip Cache


Process data in the CPU cache rather than in RAM

2nd Gen Column Store


Positional Delta Trees enable online updates, minimize disk I/O, and are less CPU intensive

Vectorized Compression


10x Compression and column storage to reduce Hadoop storage
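
As a rough illustration of why column storage tends to compress well, the sketch below run-length encodes the columns of a small table. The table, the column names, and the choice of run-length encoding are assumptions made for this example only; it is not a description of VectorH's actual compression scheme, and the compression ratio it shows says nothing about the 10x figure.

```python
from itertools import groupby

def rle_encode(column):
    """Run-length encode a list of values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original list."""
    return [value for value, count in pairs for _ in range(count)]

# Hypothetical table of (country, status) rows, sorted by country.
rows = [("DE", "active")] * 4 + [("FR", "active")] * 3 + [("US", "closed")] * 5

# Columnar layout: the values of each column are stored next to each other,
# so repeated values form long runs that compress into a few pairs.
country_column = [country for country, _ in rows]
status_column = [status for _, status in rows]

encoded_country = rle_encode(country_column)
encoded_status = rle_encode(status_column)

print(encoded_country)  # [('DE', 4), ('FR', 3), ('US', 5)]
print(encoded_status)   # [('active', 7), ('closed', 5)]
```

Twelve stored values per column shrink to three and two pairs respectively. In a row layout the same values would be interleaved with other fields, which breaks up the runs.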

High Performance ETL and Data Quality

DF Flow

Actian DataFlow provides the fastest Hadoop ETL, data quality (DQ), and analytics for Actian Vector

Record-Breaking Speed

Imagine getting query results in seconds, not hours. VectorH delivers!


The results above compare the summed query execution times for all 22 TPC-H queries across popular SQL-on-Hadoop solutions. The TPC Benchmark™H (TPC-H) consists of a suite of business-oriented ad hoc queries and concurrent data modifications representative of a decision support workload. The queries and the data populating the database have been chosen to have broad industry-wide relevance. This benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions.


“As an innovative provider of leading network monitoring solutions to manage global transportation and mobile systems, Expandium pushes the edge of Big Data technologies. With explosive growth in mobile data, we’ve developed our new network intelligence platform on Actian VectorH to perform near real-time data ingestion in a production environment. We’re excited about employing Actian’s new native Spark integration to stream data to machine learning solutions to sustain our technical leadership.”

– Rodolphe Guillard, Software Team Leader, Expandium

VectorH Features


Spark Powered Direct Query Access

Directly access Hadoop data files stored in Parquet, ORC, or other standard formats

Realize performance benefit without converting to Vector file format first


Native Spark DataFrame Support

Direct connection to Spark functionality via DataFrames

VectorH can accelerate query performance for Spark SQL and Spark R applications


Scale-out Hadoop Performance

Linear scalability from small to large Hadoop clusters

Supported on popular Hadoop distributions from Hortonworks, Cloudera, and Apache


Zero-Penalty Real-Time Data Updates

Enables full create/read/update/delete capabilities on Hadoop

Tracks changes in memory and avoids any performance penalty for updates


Extensive SQL Support

Standard ANSI SQL enabling the use of existing SQL without rewrite

Advanced analytics, including cubing, grouping, and window functions
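
As a small illustration of the kind of window function mentioned above, the sketch below computes a per-region running total in standard SQL. It runs against Python's built-in SQLite (version 3.25 or later) purely as a stand-in engine; the `sales` table and its columns are hypothetical, and nothing here is specific to VectorH.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in SQL engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("east", 1, 100), ("east", 2, 150), ("west", 1, 200), ("west", 2, 50)],
)

# SUM(...) OVER (...) keeps every input row and adds a running total that
# restarts for each region and accumulates in month order.
query = """
    SELECT region, month, amount,
           SUM(amount) OVER (PARTITION BY region ORDER BY month) AS running_total
    FROM sales
    ORDER BY region, month
"""
running_totals = conn.execute(query).fetchall()

for row in running_totals:
    print(row)
# ('east', 1, 100, 100)
# ('east', 2, 150, 250)
# ('west', 1, 200, 200)
# ('west', 2, 50, 250)
```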

Performance Optimized

Mature Query Optimizer

Mature and proven cost-based query planner

Optimal use of all available resources, including node, memory, cache, and CPU


MPP Architecture

Leverages Hadoop to handle thousands of users, nodes, and petabytes of data

Exploits redundancy in HDFS to provide system-wide data protection



Compress the data by at least a factor of 10 to reduce the amount of Hadoop storage

Store the data in a columnar format for faster access



YARN for automated Hadoop cluster resource management

Web-based management console for monitoring analytic/query processing



Role-based security

Authentication through LDAP or Active Directory

Use Cases

VectorH enables developers, data scientists and business analysts to extract actionable insights from large and varied data sets stored in Hadoop. Data engineers can identify trends, correlations, and other patterns in seconds from weblogs, click-paths, demographic, psychographic, geographic, mobile and other kinds of data stored in Hadoop.

Granular, multi-channel, near real-time customer profile analytics can tell you about your customers, the best means to connect, the targeted offers that will resonate, their predilection to churn, and the best ways to personalize the entire customer experience to win more business and drive up loyalty levels.

To gain a more complete and accurate profile, mine all avenues of information, in any format, from any location or channel, structured or unstructured, from an endlessly growing number of sources, such as sales transactions, Web, social, mobile, purchase history, service history, and much more.

Examine the totality of your company’s customer data to identify, attract, and retain your most profitable customers. Once you have embraced data from all channels and sources, employ advanced data modeling to de-duplicate, identify common characteristics, and create customer clusters that provide a comprehensive, singular portrait of your customers’ purchasing habits.


Most companies doing segmentation use basic account information and demographics to find groups of customers based on high-level account and behavior metrics. Historical information is often so voluminous that companies are forced to work with samples. With Actian, you can connect to and mine all of your data, including big data sources, to get a detailed, holistic view of the customer.

By uncovering relationships between customers and key purchase drivers, and by predicting the value of each customer along thousands of customer attributes, you can find new segments that your competition isn’t thinking about yet, increasing conversions and gaining higher returns on your marketing investment.

Create a better customer experience with targeted offers, appropriate responses, and effective dialogue. True customer engagement that is built on a deep understanding of specific needs and wants leads to more satisfied customers and longer lasting relationships, increasing revenue and wallet share for your business.

Measure and maximize current and forecasted customer value across a number of products, segments, and time periods to design new programs that accentuate your best customers and provide you with a distinct business advantage.

Connect to all of your data, from account histories and demographics to mobile and social media interactions, and blend these disparate sources with speed and accuracy. Uncover key purchase drivers to understand why someone purchases or rejects your products. Assign customer value scores by correlating which characteristics and behaviors lead to value at various points of time in the future.

Generally, it is more cost effective to sell to existing customers than it is to acquire new ones. Optimize outbound marketing to give prominence to your high-value customers. Customize inbound customer touch centers by arming call centers with highly personalized customer scores. Increase customer lifetime values cost effectively with individual precision, improving both loyalty and profitability.


Maximize long-term customer value by not only predicting what a customer will do next, but influencing that action as well.

If you want specifics about customer behavior and spend, you need all data available to you, structured or unstructured, from traditional enterprise sources, social networks, customer service interactions, Web click streams, and any other touch points that may occur. Actian allows you to connect to all of your data to build complete customer profiles, regardless of format or location, which feed into your data science models.

Use micro-segmentation models to find and classify small clusters of similar customers. Customer value models predict the value of each customer to the business at various intervals. Combining the output of these two models into a personalized recommendation engine gives you the information you need to take action that gives you a distinct competitive advantage. You can optimize your supply chain, customize campaigns with confidence, and ultimately drive meaningful, personalized engagements.

Stand out in a crowded market and capture more wallet share using Actian to deploy effective, innovative, highly personalized campaigns through deep analysis.

Traditional campaign optimization models use limited samples of transactional data, which can lead to incomplete customer views. Actian allows you to connect to social media and competitor web sites in real time to learn which competitive offerings are gaining traction in the marketplace. Web purchasing patterns and call center text logs stored on Hadoop provide valuable insight into customer interactions. Marketing and campaign data ensure any recommended actions comply with company goals, rules, and regulations.

Actian helps you build, test, and deploy campaigns in rapid succession. Quickly learn when to make adjustments and adapt to changes in the market or your customer base. With your customer scores and optimized lists in hand, you can design innovative campaigns that allow you to create and sustain a competitive advantage.


Churn prediction models have been limited to account information and transactional history, a tiny fraction of available data. With Actian, increase the accuracy of churn predictions by combining and analyzing traditional transactional and account datasets with call center text logs, past marketing and campaign response data, competitive offers, social media, and a host of other data sources.

Discover customer classifications and assign customer lifetime and churn scores to understand which customers you can’t afford to lose. Generate raw churn predictions informed by individual customer profitability. Use the insights gained to transform the customer experience without spending more than necessary.

With Actian, share personalized recommendations with customer representatives, outbound marketing campaigns, and product and service supply planners. Create programs to retain your high-value customers and offload less profitable segments. As a result, you can boost average revenue per customer, improve customer satisfaction and loyalty, optimize supply chains, accurately price products, and plan for new releases.

Uncover your most profitable product groupings, learn which products benefit most from associations with other products, know optimal shelf arrangements, and better target marketing and promotions—all to increase retail revenue.

Market basket analysis models are typically limited to a small sample of historical receipt data, aggregated to a level where potential impact and insights are lost. With Actian, bring in additional sources, in varying formats, enabling discovery of critical patterns, at any product level, to create a competitive advantage.

Actian enables data science models and advanced analytics to go deeper into detailed associations on all product relationships, and segment customers and spending habits into similar groups to learn more about shoppers.

With detailed shopper segmentation and market basket analysis results, create a shelf optimization plan that identifies your highest performing product groups. Discovering buying patterns can lead to targeted promotions with greater impact on the customer experience to increase business value.
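
To make the idea of product associations concrete, the sketch below computes two common market basket measures, support and confidence, for product pairs. The baskets are made-up sample data, and the function names are chosen for this example; it is a minimal illustration of the technique under those assumptions, not a description of how Actian implements it.

```python
from collections import Counter
from itertools import combinations

# Hypothetical receipts: each set holds the products in one basket.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]

# How many baskets contain each item, and each unordered pair of items.
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)

def support(item_a, item_b):
    """Fraction of all baskets that contain both items."""
    return pair_counts[tuple(sorted((item_a, item_b)))] / len(baskets)

def confidence(antecedent, consequent):
    """Fraction of baskets with the antecedent that also hold the consequent."""
    both = pair_counts[tuple(sorted((antecedent, consequent)))]
    return both / item_counts[antecedent]

print(support("bread", "butter"))     # 0.6
print(confidence("bread", "butter"))  # 0.75
print(confidence("jam", "bread"))     # 1.0
```

Bread and butter appear together in three of the five baskets (support 0.6), and three of the four baskets with bread also contain butter (confidence 0.75).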



Global Bank Risk Group & Actian

Actian Platform applied at a Global Bank Risk Group: replatforming ~30 risk applications from Oracle to Actian.

The Goals and the Results

  • LOADING – Goal: 2 billion risk data points in 6 hours (~100k/sec). Result: 1 hour 40 min (333k/sec).
  • FILTERED AGGREGATION – Goal: 30 seconds. Result: 6 sec on a 5-node cluster; 2 sec on a 10-node cluster.
  • FULL DAY AGGREGATION – Goal: hierarchy dimension on 1 million data points in < 15 sec. Result: sub-second response time.
  • LARGE DATA VOLUMES – Goal: store 80 days (160 billion rows) of data. Result: store 100 days (200 billion rows) with linear scaling.
  • HORIZONTAL SCALABILITY – Goal: up to 10 billion rows per day. Result: textbook scalability as nodes are added to the cluster.
  • DRILL UP/DRILL DOWN – Goal: < 2 sec. Result: < 1 sec.
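
The loading rates quoted above follow directly from the row count and the two durations. The short calculation below reproduces them; the variable names are chosen for this example, and the inputs are the figures stated for the loading goal and result.

```python
ROWS = 2_000_000_000  # 2 billion risk data points

goal_seconds = 6 * 3600               # goal: 6 hours
result_seconds = 1 * 3600 + 40 * 60   # result: 1 hour 40 minutes

goal_rate = ROWS / goal_seconds       # rows per second needed to meet the goal
result_rate = ROWS / result_seconds   # rows per second actually achieved
speedup = goal_seconds / result_seconds

print(f"goal:    {goal_rate:,.0f} rows/sec")    # goal:    92,593 rows/sec
print(f"result:  {result_rate:,.0f} rows/sec")  # result:  333,333 rows/sec
print(f"speedup: {speedup:.1f}x")               # speedup: 3.6x
```

The goal works out to roughly 93k rows per second, which the source rounds to ~100k/sec, and the achieved rate matches the quoted 333k/sec.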

“Having the ability to run analytics queries that take advantage of research without analysts needing to be retrained as programmers should enable Actian Analytics Platform users to focus on the approach that delivers the most efficient use of their existing tools and skills.”

– Matt Aslett, 451 Research

Get the Actian Vector in Hadoop technical overview.


Accelerate Analytic Workflows with Actian DataFlow

Actian DataFlow eliminates performance bottlenecks in your data-intensive applications by delivering a comprehensive set of ETL and data quality capabilities for end-to-end data access, transformation, preparation, and predictive analysis.  It complements Actian Vector with a graphical interface to manage data workflows, orchestrating analytic functions and maximizing parallel work streams for faster execution.

  • Easy to Implement – No complex parallel processing issues; visual and API level interfaces
  • Scalable – Performance dynamically scales with increased core counts and increased nodes
  • High Throughput – Fast, deep analysis of large data sets with no limit on input data size
  • Extensible technology – Customize the platform so you can remain in control of development
  • Cost Efficient – Maximum performance from commodity multi-core SMP servers and clusters