Hadoop Analytics with Actian VectorH

Turbocharge Apache Spark and SQL Performance on Hadoop

Fast
Designed from the ground up for speed, Actian VectorH turns Hadoop and Spark into a high-performance analytics platform at a compelling price point. VectorH has been proven up to 900x faster than Hive, Impala, Spark SQL, and HAWQ using an independent industry standard benchmark.

Open
With native Spark access to Hadoop file formats, Actian VectorH enables broad access to leverage existing analytics investments and provides extensibility through open source technologies to additional functionality such as machine learning, graph analytics, and streaming data.

Enterprise Grade
Actian VectorH delivers a unique combination of cutting edge innovation and mature database features that are proven in the enterprise. VectorH supports the latest ANSI SQL standards, is fully ACID compliant, and provides native DBMS security, making Hadoop analytics secure, consumable, accessible and re-usable.

What Is VectorH?

Actian VectorH is a high-performance columnar SQL database that runs natively in Hadoop, exploiting vectorized query execution and multi-level in-memory data management to optimize analytic workloads. VectorH can power modern decision support systems and BI by enabling developers, data scientists and business analysts to query HDFS data for machine learning, advanced analytics, statistics and more. The latest release features native file format integration through Apache Spark, providing direct access to Hadoop data file formats like Parquet and ORC, and supporting DataFrames for Spark SQL and Spark R applications.
Explore

- White Paper / Report: SQL in Hadoop Buyer’s Guide
- White Paper / Report: Actian Vector in Hadoop: A Technical Overview
- Data Sheet: Actian Vector in Hadoop QuickStart Data Sheet
- White Paper / Report: Performance Troubleshooting Tips for Actian VectorH
- Data Sheet: Actian DataFlow Data Sheet

State of the Art Innovation

Engineered for maximum performance with optimizations at the chip level and across cluster nodes for record-breaking analytical processing of the data that matters most to your applications and business.

- Multi-core parallelism & MPP: Maximize utilization of chip cores across all cluster nodes
- Vector Processing: Single Instruction Multiple Data
- Exploiting CPU Chip Cache: Process data on CPU, not in RAM
- 2nd Gen Column Store: Positional Delta Trees enable online updates, minimize disk I/O, and are less CPU intensive
- Vectorized Compression: 10x compression and column storage to reduce Hadoop storage
- High Performance ETL and Data Quality: Actian DataFlow provides the fastest Hadoop ETL, DQ and Analytics for Actian Vector

Record-Breaking Speed

Imagine getting query results in seconds not hours. VectorH delivers!

The benchmark results compare the sums of query execution times for all 22 TPC-H queries using popular SQL on Hadoop solutions. The TPC Benchmark™ H (TPC-H) consists of a suite of business oriented ad-hoc queries and concurrent data modifications representative of a decision support workload. The queries and the data populating the database have been chosen to have broad industry-wide relevance. This benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions.

“As an innovative provider of leading network monitoring solutions to manage global transportation and mobile systems, Expandium pushes the edge of Big Data technologies.
With explosive growth in mobile data, we’ve developed our new network intelligence platform on Actian VectorH to perform near real-time data ingestion in a production environment. We’re excited about employing Actian’s new native Spark integration to stream data to machine learning solutions to sustain our technical leadership.”
– Rodolphe Guillard, Software Team Leader, Expandium

VectorH Features

Spark Powered Direct Query Access
- Directly access Hadoop data files stored in Parquet, ORC, or other standard formats
- Realize performance benefit without converting to Vector file format first

Native Spark DataFrame Support
- Direct connection to Spark functionality via DataFrames
- VectorH can accelerate query performance for Spark SQL and Spark R applications

Scale-out Hadoop Performance
- Linear scalability from small to large Hadoop clusters
- Supported on popular Hadoop distributions from Hortonworks, Cloudera, and Apache

Zero-Penalty Real-Time Data Updates
- Enables full create/read/update/delete capabilities on Hadoop
- Tracks changes in memory and avoids any performance penalty for updates

Extensive SQL Support
- Standard ANSI SQL enabling the use of existing SQL without rewrite
- Advanced analytics, including cubing, grouping, and window functions

Mature Query Optimizer
- Mature and proven cost-based query planner
- Optimal use of all available resources, including node, memory, cache, and CPU

MPP Architecture
- Leverages Hadoop to handle thousands of users, nodes, and petabytes of data
- Exploits redundancy in HDFS to provide system-wide data protection

Compression
- Compress the data by at least a factor of 10 to reduce the amount of Hadoop storage
- Store the data in a columnar format for faster access

Manageability
- YARN for automated Hadoop cluster resource management
- Web-based management console for monitoring analytic/query processing

Security
- Role-based security
- Authentication through LDAP or Active Directory

Use Cases

VectorH enables developers, data scientists and business
analysts to extract actionable insights from large and varied data sets stored in Hadoop. Data engineers can identify trends, correlations, and other patterns in seconds from weblogs, click-paths, demographic, psychographic, geographic, mobile and other kinds of data that are stored in Hadoop.

Customer Profile

Granular, multi-channel, near real-time customer profile analytics can tell you about your customers, the best means to connect, the targeted offers that will resonate, their predilection to churn, and the best ways to personalize the entire customer experience to win more business and drive up loyalty levels.

To gain a more complete and accurate profile, mine all avenues of information, in any format, from any location or channel, structured or unstructured, from an endlessly growing number of sources, such as sales transactions, Web, social, mobile, purchase history, service history, and much more. Examine the totality of your company’s customer data to identify, attract, and retain the most profitable customers.

Once you have embraced data from all channels and sources, employ advanced data modeling to de-duplicate, identify common characteristics, and create customer clusters that provide a comprehensive, singular portrait of your customers’ purchasing habits.

Micro-Segmentation

Most companies doing segmentation use basic account information and demographics to find groups of customers based on high-level account and behavior metrics. Historical information is often so voluminous that companies are forced to work with samples. With Actian, you can connect to and mine all of your data, including big data sources, to get a detailed, holistic view of the customer.
By uncovering relationships between customers and key purchase drivers and predicting the value of each customer along thousands of customer attributes, you can find new segments that your competition isn’t thinking about yet, increasing conversions and gaining higher returns on your marketing investment.

Create a better customer experience with targeted offers, appropriate responses, and effective dialogue. True customer engagement that is built on a deep understanding of specific needs and wants leads to more satisfied customers and longer lasting relationships, increasing revenue and wallet share for your business.

Customer Lifetime Value

Measure and maximize current and forecasted customer value across a number of products, segments, and time periods to design new programs that accentuate your best customers and provide you with a distinct business advantage.

Connect to all of your data, from account histories and demographics to mobile and social media interactions, and blend these disparate sources with speed and accuracy. Uncover key purchase drivers to understand why someone purchases or rejects your products. Assign customer value scores by correlating which characteristics and behaviors lead to value at various points of time in the future.

Generally, it is more cost effective to sell to existing customers than it is to acquire new ones. Optimize outbound marketing to give prominence to your high-value customers. Customize inbound customer touch centers by arming call centers with highly personalized customer scores. Increase customer lifetime values cost effectively with individual precision, improving both loyalty and profitability.

Next Best Action

Maximize long-term customer value by not only predicting what a customer will do next, but influencing that action as well.
If you want specifics about customer behavior and spend, you need all data available to you, structured or unstructured, from traditional enterprise sources, social networks, customer service interactions, Web click streams, and any other touch points that may occur.

Actian allows you to connect to all of your data to build complete customer profiles, regardless of format or location, which feed into your data science models. Use micro-segmentation models to find and classify small clusters of similar customers. Customer value models predict the value of each customer to the business at various intervals. Combining the output of these two models into a personalized recommendation engine gives you the information you need to take action that gives you a distinct competitive advantage. You can optimize your supply chain, customize campaigns with confidence, and ultimately drive meaningful, personalized engagements.

Campaign Optimization

Stand out in a crowded market and capture more wallet share using Actian to deploy effective, innovative, highly personalized campaigns through deep analysis.

Traditional campaign optimization models use limited samples of transactional data, which can lead to incomplete customer views. Actian allows you to connect to social media and competitor web sites in real time to learn which competitive offerings are gaining traction in the marketplace. Web purchasing patterns and call center text logs stored on Hadoop provide valuable insight into customer interactions. Marketing and campaign data ensure any recommended actions comply with company goals, rules, and regulations.

Actian helps you build, test, and deploy campaigns in rapid succession. Quickly learn when to make adjustments and adapt to changes in the market or your customer base. With your customer scores and optimized lists in hand, you can design innovative campaigns that allow you to create and sustain a competitive advantage.
Churn Analysis

Churn prediction models have been limited to account information and transactional history, a tiny fraction of available data. With Actian, increase the accuracy of churn predictions by combining and analyzing traditional transactional and account datasets with call center text logs, past marketing and campaign response data, competitive offers, social media, and a host of other data sources.

Discover customer classifications and assign customer lifetime and churn scores to understand which customers you can’t afford to lose. Generate raw churn predictions informed by individual customer profitability. Use the insights gained to transform the customer experience without spending more than necessary.

With Actian, share personalized recommendations with customer representatives, outbound marketing campaigns, and product and service supply planners. Create programs to retain your high-value customers and offload less profitable segments. As a result, you can boost average revenue per customer, improve customer satisfaction and loyalty, optimize supply chains, accurately price products, and plan for new releases.

Market Basket Analysis

Uncover your most profitable product groupings, learn which products benefit most from associations with other products, know optimal shelf arrangements, and better target marketing and promotions, all to increase retail revenue.

Market basket analysis models are typically limited to a small sample of historical receipt data, aggregated to a level where potential impact and insights are lost. With Actian, bring in additional sources, in varying formats, enabling discovery of critical patterns, at any product level, to create a competitive advantage. Actian enables data science models and advanced analytics to go deeper into detailed associations on all product relationships, and segment customers and spending habits into similar groups to learn more about shoppers.
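
To make the idea of product associations concrete, here is a minimal sketch of the standard support, confidence, and lift measures used in market basket analysis. It is plain Python and not an Actian API; the function name `pair_metrics` and the sample receipts are hypothetical and exist only for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_metrics(baskets):
    """Return {(a, b): (support, confidence, lift)} for every item pair bought together.

    support    = share of baskets containing both a and b
    confidence = share of baskets containing a that also contain b
    lift       = how much more often a and b co-occur than if they were independent
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # de-duplicate and fix the pair ordering
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    metrics = {}
    for (a, b), together in pair_counts.items():
        support = together / n
        confidence = together / item_counts[a]
        lift = together * n / (item_counts[a] * item_counts[b])
        metrics[(a, b)] = (support, confidence, lift)
    return metrics


# Hypothetical receipts, one list of products per transaction.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["milk", "eggs"],
    ["bread", "butter", "eggs"],
]

metrics = pair_metrics(baskets)
support, confidence, lift = metrics[("bread", "butter")]
print(f"bread -> butter: support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 suggests the two products are bought together more often than chance would predict, which is the kind of signal a shelf arrangement or promotion plan can build on. On real receipt volumes the same counts would normally be computed with SQL aggregation rather than in application code.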
With detailed shopper segmentation and market basket analysis results, create a shelf optimization plan that identifies your highest performing product groups. Discovering buying patterns can lead to targeted promotions with greater impact on the customer experience to increase business value.

CUSTOMER SUCCESS STORY

Global Bank Risk Group & Actian

Actian Platform applied at a Global Bank Risk Group: replatforming ~30 risk applications from Oracle to Actian.

LOADING
  Goal: 2 billion risk data points in 6 hours (~100k/sec)
  Result: 1 hour 40 min (333k/sec)

FILTERED AGGREGATION
  Goal: 30 seconds
  Result: 6 sec on 5 node cluster; 2 sec on 10 node cluster

FULL DAY AGGREGATION
  Goal: Hierarchy dimension on 1 million data points in < 15 sec
  Result: Sub-second response time

LARGE DATA VOLUMES
  Goal: Store 80 days (160 billion rows) of data
  Result: Store 100 days (200 billion rows) with linear scaling

HORIZONTAL SCALABILITY
  Goal: Up to 10 billion rows per day
  Result: Textbook scalability as nodes are added to the cluster

DRILL UP/DRILL DOWN
  Goal: < 2 sec
  Result: < 1 sec

“Having the ability to run analytics queries that take advantage of research without analysts needing to be retrained as programmers should enable Actian Analytics Platform users to focus on the approach that delivers the most efficient use of their existing tools and skills.”
– Matt Aslett, 451 Research

Accelerate Analytic Workflows with Actian DataFlow

Actian DataFlow eliminates performance bottlenecks in your data-intensive applications by delivering a comprehensive set of ETL and data quality capabilities for end-to-end data access, transformation, preparation, and predictive analysis. It complements Actian Vector with a graphical interface to manage data workflows, orchestrating analytic functions and maximizing parallel work streams for faster execution.
- Easy to Implement: No complex parallel processing issues; visual and API level interfaces
- Scalable: Performance dynamically scales with increased core counts and increased nodes
- High Throughput: Fast, deep analysis of large data sets with no limit on input data size
- Extensible technology: Customize the platform so you can remain in control of development
- Cost Efficient: Maximum performance from commodity multi-core SMP servers and clusters
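
The loading rates quoted in the Global Bank case study can be sanity-checked with simple arithmetic. The sketch below assumes the stated durations are wall-clock load times for the full 2 billion data points; the helper name `rows_per_second` is ours and is not part of any Actian tool.

```python
def rows_per_second(rows, hours=0, minutes=0):
    """Average ingest rate for a load that took the given wall-clock time."""
    seconds = hours * 3600 + minutes * 60
    return rows / seconds


ROWS = 2_000_000_000  # 2 billion risk data points, as quoted in the case study

goal_rate = rows_per_second(ROWS, hours=6)                 # target: 6 hours
result_rate = rows_per_second(ROWS, hours=1, minutes=40)   # achieved: 1 hour 40 min
speedup = result_rate / goal_rate

# Rows stored per day implied by the retention figures in the case study.
goal_rows_per_day = 160_000_000_000 / 80
result_rows_per_day = 200_000_000_000 / 100

print(f"goal:   {goal_rate:,.0f} rows/sec")
print(f"result: {result_rate:,.0f} rows/sec")
print(f"speedup: {speedup:.1f}x")
```

Under these assumptions the goal works out to roughly 93k rows per second, which the case study rounds to ~100k/sec, and the achieved rate matches the quoted 333k/sec. Both retention figures correspond to about 2 billion rows stored per day.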