I’m excited to announce that Actian Vector – Community Edition is now available on the Amazon Web Services (AWS) Marketplace.

The Community Edition is delivered as an Amazon Machine Image (AMI) and is compatible with the AWS 1-Click launch experience.

Why is This Important for You as a Developer?

If you are a developer of SQL analytics applications or solutions, and you are either planning a high-performing SQL analytics solution in the cloud or struggling with the cost, performance, or maintenance overhead of an existing one, consider Vector Community Edition as the SQL database that drives your application.

What Makes Vector Different?

Vector is different in a number of ways. It was built from the ground up to utilize modern hardware architecture in ways that no other product can. Vector includes a number of innovations that exploit available features in a modern CPU such as SIMD instructions, larger chip caches, superscalar execution, out-of-order execution, and hardware-accelerated string operations, to name a few.
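To make the idea concrete, here is a toy Python sketch of the difference between row-at-a-time and vector-at-a-time execution. This is an illustrative model only; the function names and block size are ours, not Vector’s actual engine.

```python
# Toy contrast between tuple-at-a-time and vector-at-a-time execution.
# (An illustration of the idea -- not Actian Vector's implementation.)

def scalar_sum(prices, taxes):
    # Tuple-at-a-time: interpretation overhead is paid once per row.
    total = 0.0
    for i in range(len(prices)):
        total += prices[i] * (1.0 + taxes[i])
    return total

VECTOR_SIZE = 1024  # process values in cache-resident blocks

def vectorized_sum(prices, taxes):
    # Vector-at-a-time: each primitive runs a tight loop over a block,
    # which compilers can map onto SIMD instructions.
    total = 0.0
    for start in range(0, len(prices), VECTOR_SIZE):
        p = prices[start:start + VECTOR_SIZE]
        t = taxes[start:start + VECTOR_SIZE]
        block = [pi * (1.0 + ti) for pi, ti in zip(p, t)]  # one primitive call
        total += sum(block)
    return total
```

Both functions compute the same result; the second amortizes per-call overhead over a whole block of values, which is the heart of the vectorized execution model.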

These innovations enable Actian Vector to achieve record performance and price/performance levels for the Transaction Processing Performance Council’s industry-standard TPC-H benchmark. Additionally, Vector’s innovations can significantly impact your application performance. The users or services that interact with your application can take action much more quickly, since you can provide them with insights much faster than before.

If doing fast SQL analytics is important to your application or service and you’re struggling to do it economically, Vector can help you.

Vector also requires minimal tuning. You load your large analytic datasets into it and run your queries to get back results almost instantaneously. This frees you from doing complex DBA tasks to focus on your application/service instead.

Could You Highlight Some Use Cases for Vector?

Vector can be used for different kinds of analytics applications such as Micro-Segmentation, Campaign Optimization, Market Basket Analysis, Churn Analysis, and so on. For more details you can look at some of the use cases at Actian.com.

What’s Special About This AWS Marketplace Version?

With integration into AWS Marketplace, we’ve greatly simplified Vector deployment. You don’t have to “find” a machine to “download” and “install” Vector anymore. You can deploy it on the AWS Cloud on your choice of hardware and in your selected region with literally one click.

We’ve also added a few nice touches such as automatically configuring Vector when you launch it. This ensures that Vector is tuned to the EC2 instance type that you have chosen to deploy on so that it provides great out-of-the-gate performance. We’ve also ensured that if you change the instance type, it will detect this and use the newly available resources (RAM, CPUs, etc.); this supports both a scale-up or scale-down scenario.

Last, but not least, we’ve provided a real-world sample airline dataset of 175 million rows from the Bureau of Transportation Statistics that enables you to analyze flight delays by running provided SQL queries. You will be impressed by the speed of these queries. If you’re feeling adventurous, you can use that analysis insight to influence booking your next flight 🙂

How Much Does it Cost?

Vector – Community Edition is free to use, and you are welcome to develop whatever you want. You pay only for the underlying AWS infrastructure. The AWS Marketplace presents this cost in a very transparent manner so that you can make a good upfront decision. You won’t be charged for computing resources if you stop your instance (for example, during idle time) and restart it later.

The Community Edition is limited to 250 GB of uncompressed data. If your requirements exceed this, we’d be happy to set you up with an Evaluation version that works with larger data sizes.


If you’re attending what’s billed as the United Kingdom’s largest data and analytics event, you already know that Big Data London at the Olympia promises to be a massive gathering of experts and analysts across the fields of Big Data, machine learning, AI, cloud technologies, and more. You’ll be able to learn from numerous pioneers and visionaries of the data community as well as get a unique look at the current state of the local data economy.

Over five thousand attendees are expected to visit this unique two-day event, which is open to everyone, starting on November 15. Big Data London will feature 80 exhibitors, over 100 speakers and use cases, and five theaters with live demos. A keynote presentation will mark the start of each day of the conference, with Neha Narkhede, co-founder and CTO of Confluent, speaking on the rise of the streaming platform and building large-scale operable data systems on November 15, and Amr Awadallah, CTO of Cloudera, discussing machine learning, AI and data analytics on November 16.

Be sure to visit Actian at booth #426 (located near the center of West Hall Level 1 next to the AI Lab Feature) and meet members of our technical and sales teams who can answer all of the questions you may have. In addition, Actian technologists will be giving two presentations during the conference:

  • November 15 at 11:10 AM – 11:40 AM in the Fast Data Theater – Mary Schulte, Senior Systems Engineer, will discuss scale-up and scale-out Big Data deployment options for common Enterprise use cases, with simple tried-and-true solutions.
  • November 16 at 3:10 PM – 3:40 PM in the Fast Data Theater – Keith Bolam, Engineering Solutions Manager, demonstrates how to quickly analyze a billion rows and covers the 5 W’s and H of interpreting fast and fresh data.

Note that while attendance is free, you’ll still need to register online to get in.

If you’re new to Actian products, here are some of the products in our portfolio we’ll be featuring at Big Data London:

  • Actian NoSQL accelerates agile development for complex object models at enterprise scale.
  • Actian Zen Embedded Database enables zero-admin, nano-footprint, hybrid NoSQL & SQL data management.
  • Actian Vector in-memory analytics database is a consistent performance leader on the TPC-H Decision Support Benchmark over the last 5 years.
  • Actian DataConnect provides lightweight, enterprise-class hybrid data integration.

We hope you have a fantastic time at the conference and we look forward to meeting all of you in person to learn more about Actian’s products, community and customers.

Follow us on Twitter, and on LinkedIn to stay connected with what we are up to. If you fancy a job to pursue your passion in data management, data integration, and data analytics, check out our careers page and come join our team – WE’RE HIRING!



The X-Vector


As an employee new to Actian I decided to dig into what makes Actian Vector a star performer. Three specific qualities, covered below, caught my eye as I reviewed the technical overview.

Vectorization: When I first heard this term, my memory went back to 30 years ago, when IBM offered my employer, Watson Calculating Services Limited, a free trial of the Vector Facility for our ES9000 mainframe. It was cool because we saw a massive improvement in our FORTRAN applications without having to rewrite them. A simple compiler directive was all that it took to take advantage of vectorization. So, how does this extend to what Vector does? Actian has applied the techniques developed from the acceleration of floating-point operations and high-performance computing using specialized hardware to accelerate database workloads. The result is 100x performance improvements without specialized hardware. Actian provides these performance improvements on industry-standard Intel x86 architecture server processors, transparently, without having to rewrite standard SQL queries.


Hybrid Column Store: Traditional relational databases store data optimized for row-at-a-time access. For fast analytics on a subset of columns, however, storing data in a compressed columnar format is the better fit: analytics workloads in traditional data warehouses tend to use de-normalized tables to optimize read performance, yet rarely analyze whole rows. Vector goes a step further by optimizing the in-memory block format to minimize cache misses, boosting memory access speeds to maximize performance.
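A toy sketch of the row-versus-column trade-off, in Python. This is illustrative only; the layouts and names are ours, not Vector’s on-disk format.

```python
# Toy contrast between row and column layouts (illustrative only).
rows = [(1, "alice", 120.0), (2, "bob", 80.0), (3, "carol", 200.0)]

# Row store: each record's fields are adjacent in memory, so scanning a
# single column drags every other field through the cache as well.
row_total = sum(r[2] for r in rows)

# Column store: each column is stored contiguously (and compresses well,
# since similar values sit next to each other).
columns = {
    "id":     [1, 2, 3],
    "name":   ["alice", "bob", "carol"],
    "amount": [120.0, 80.0, 200.0],
}
col_total = sum(columns["amount"])  # touches only the data the query needs
```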

Positional Delta Trees: Allowing incremental changes while maintaining transactional read consistency is a tough challenge for columnar databases. Actian Vector maintains full multi-version read consistency, so every new transaction sees all previously committed transactions, and you don’t have to rely on large bulk data loads alone for updates. Actian Vector’s Positional Delta Trees (PDTs) store small incremental inserts, updates, and deletes, so queries run lightning fast and results stay consistent even while changes occur during query execution.
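The positional-delta idea can be sketched in a few lines of Python. This is a deliberately simplified model: the real PDT is a tree structure with full multi-version concurrency control, whereas this toy just overlays updates on the base column by position.

```python
# Simplified sketch of the positional-delta idea (illustrative only).
base = [100, 200, 300, 400]          # immutable, bulk-loaded column
deltas = {1: 250, 3: 0}              # position -> updated value

def scan():
    # Merge the delta overlay into the base column at read time, so the
    # compressed column stays untouched while queries see fresh data.
    return [deltas.get(pos, val) for pos, val in enumerate(base)]
```

Because the base column is never rewritten for small changes, loads stay fast and the overlay remains tiny relative to the data it patches.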

In my judgment, these are some of the many qualities that make Vector stand out from the crowd.

Want to know more? Then visit us at Actian’s Hybrid Data Conference at London’s Amba Hotel on the 9th of November to talk in person with Actian engineers, executives, and customers. Check out the full agenda and register; it’s free to attend.



Walt Maguire to Lead Actian Pre-Sales Engineering


Walt Maguire has been a voracious consumer of science fiction novels from an early age, and now he loves getting hands-on experience with the technology of the future. He started his technology career taking apart clock radios to explore how they worked and disassembling the family washing machine. Needless to say, his parents had mixed feelings about his curiosity, which turned to happiness once computers came along. Walt got to play with bits in a CPU, and mom & dad got to keep their washer functional. In the years since then, Walt has spent his career doing nearly everything that can be done with data.

Today he leads the global pre-sales team for Actian and plans to adopt a very hands-on approach to demonstrating Actian performance and features to show how they can solve data challenges in today’s business environment. In his spare time, he channels his creative needs turning trees into furniture, and loves to travel. Please welcome Walt to Actian!



The Essential Guide to Actian Hybrid Data Conference 2017 in London, UK


You are cordially invited to Actian’s Hybrid Data Conference in central London, UK on November 9th, 2017. Unlike many tech conferences, this one is totally free, and is open to Actian customers, partners, and the broader tech community alike. Bring a friend, or bring a few friends; this is a great technology community networking event, and you won’t want to miss out.

The keynote speaker is Matt Aslett, from 451 Research. Matt will be discussing the Inevitability of Hybrid Data Management. “From cars to the cloud, the concept of hybrid is everywhere and is rapidly growing in momentum. What appears to be a choice today will soon become an inevitability. The same is true of hybrid data management, as the need for greater business agility mandates that enterprises rethink their traditional approaches to data management.”

There is also a fantastic lineup of Actian, Actian customer, and Actian partner technical speakers, with talks covering the advances, breakthroughs, and applications of data analytics, data management, data integration, graph, IoT embedded systems, and more. This is a conference hosted by data professionals, for data professionals.

When you register, be sure to mention the name(s) of the friend(s) you are bringing along, and you’ll each receive a limited edition Actian T-shirt and 10 entries into our door prize giveaways. Oops, we forgot to mention the door prize giveaways! Actian will be giving away some great prizes at the Hybrid Data Conference. Check in for your chance to win an iPhone X, Amazon Dot, or a Microsoft Xbox. See the main event site for full terms and conditions for the door prizes.

This year’s event is being held at the gorgeous Amba Hotel Charing Cross, just steps away from Trafalgar Square. If you are spending the night at the hotel, be sure to use reference ACTI081117 to snag some savings with our group discounted rate. You can call +44 800 330 8397 to reserve your room(s), or email bookcc@amba-hotel.com for more information.

Remember, this is London, and it is quite possible that you could experience four seasons in one day. It’s just how things are on a big island in the Northern Hemisphere. Check the latest weather forecast, and pack accordingly. Conference rooms always tend to be on the cool side, so bring a comfy jumper in your briefcase, just in case the venue is a tad chilly.

Travel in and around London is easier than you think, and there is a fantastic transport system. Rather than driving in, your best bet is to catch a river bus, bus, or train, and connect via the London Underground tube system to get to Charing Cross station or catch a taxi if you are not too far. The Transport for London website has all the latest transport information you will need to plan your journey.

If this is your first visit to London, you may want to get acquainted with some of the sights and things to do, in and around London, by checking out the Visit London official visitor guide.

If you’re new to Actian products, here are a few of the portfolio highlights that will be covered at the event:

  • Actian NoSQL accelerates Agile development for complex object models at Enterprise scale.
  • Actian Zen Embedded Database enables zero-admin, nano-footprint, hybrid NoSQL & SQL data management.
  • Actian Vector in-memory analytics database is a consistent performance leader on the TPC-H Decision Support Benchmark over the last 5 years.
  • Actian DataConnect provides lightweight, enterprise class hybrid data integration.

We look forward to meeting you in person, and hope you’ll join us for the day to learn more about Actian’s products, community, and customers.

Follow us on Twitter, and on LinkedIn to stay connected with what we are up to. If you fancy a job to pursue your passion in data management, data integration, and data analytics, check out our careers page and come join our team – WE’RE HIRING!



Delivering Real-Time Reporting at Speed and Scale


When a major UK logistics company wanted to improve reporting for its large accounts, they turned to Actian to design, implement and support the underlying database system (“LARS”) using Ingres, HVR and Vector products for its architecture.

The Brief

The customer had around 100 customer account representatives dedicated to large accounts, with each rep manually producing their own set of spreadsheet-based daily, twice-daily, and ad hoc reports for emailing to their account contacts, based on a range of daily extracts from an Ingres operational-level database.

The customer wanted to standardize the format of the reports and to automate their production in order to save reps’ time, to deliver reports to their accounts in a consistent and timely manner, and ultimately to make it feasible to outsource the function.

The challenge was not just to produce the volume of scheduled complex analytical reports (over 1,000 per day, tightly clustered around critical times in mid-morning and mid-afternoon) while simultaneously supporting ad hoc complex report production for 200 users with a response time of seconds. The system also had to a) do this without significant overhead on the source operational-level database, and b) reduce the need for the range of existing extracts from that database. An additional requirement was that it should be possible to ‘switch’ other existing applications from the operational-level database to this new database at a future stage, which mandated a database design as similar as possible to the existing operational-level database.

Because of delays to the start of the project (due to changes within the customer’s organization), there was considerable pressure to deliver the project in as short a timescale as possible.

The Architecture

To provide the user-visible front-end analytical and reporting facility a semi-customized package from a partner organization was chosen, based on the Logi Analytics product.

The database schema design was constrained by the source database schema design, which resulted in the need to provide a range of database views involving joins over 12 tables, with some of the tables having over 300 million rows. In order to provide interactive users with realistic response times whilst also servicing the needs of scheduled reports, Vector was chosen as the ideal DBMS for this database, due to its very high speed of processing complex retrieval queries and its ability to mirror the Ingres source database structure virtually unchanged.

Since the source Ingres database and the target Vector database had essentially similar schemas, HVR (High Volume Replicator) was chosen as the software solution for keeping the Vector database in line with the Ingres source. The HVR Capture process reads the Ingres source database transaction log and passes insert and update operations via the HVR Hub to the target machine, where the HVR Integrate process applies them as ‘upserts’ into the Vector database, placing very little load on the source database machine. (‘Deletes’ were suppressed within HVR, so that the regular purges of the source database would not also purge the target database.)
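The integrate step can be sketched roughly as follows. This is a toy Python model with a hypothetical change-record format of our own; it is not HVR’s actual interface.

```python
# Toy model of the integrate step: apply captured changes as upserts
# and suppress deletes (hypothetical change-record format, not HVR's).
def apply_changes(target, changes):
    for op, key, row in changes:
        if op in ("insert", "update"):
            target[key] = row   # 'upsert': insert if new, else overwrite
        elif op == "delete":
            pass                # suppressed: purges of the source database
                                # must not purge the reporting database
    return target

target = {1: "parcel-a"}
changes = [("update", 1, "parcel-a2"),
           ("insert", 2, "parcel-b"),
           ("delete", 1, None)]
apply_changes(target, changes)
```

After applying the stream, the target holds both rows: the delete from the source purge was intentionally ignored.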

The Implementation

The Ingres source database runs on an older HP-UX platform, so HVR was installed on a dedicated Linux server to act as its Hub. The Vector database sits on a separate dedicated Linux server. An HVR ‘capture’ component runs on the Ingres machine, captures the source database changes from the transaction log, and sends them via the HVR Hub to the HVR ‘integrate’ component running on the Vector server, which applies the same changes (via ‘upserts’) to the target Vector database.

To meet the customer’s need for reduced development timescales the project was delivered ready for user acceptance testing in 3 months from the start of development, thanks to Vector’s ability to mirror an Ingres schema with little change.

In order to reduce the number of table joins in the views from 12 down to a more manageable 9, a regularly-scheduled job (running every 10 minutes) was created to maintain a de-normalized table.

The denormalization update job, HVR’s ‘upsert’ job, the large number of scheduled reports, and the interactive users happily co-exist on the Vector server.

Vector Performance

It is often fairly meaningless to quote retrieval response times from a system since there are so many variables involved, but we can provide a flavour of the retrieval performance of the Vector database compared with its Ingres source database. A member of the customer’s IT staff needed to run an unreasonably heavy ad-hoc SQL query against the Ingres source database which ran for 10 minutes before she killed it as untenable. We ran the same SQL against the live Vector database, during ‘prime-time’ activity – it completed in 0.05 seconds. Although this is not a direct comparison since the two databases were running on different platforms and hardware configurations, it does illustrate the dramatic retrieval speed of which Vector is capable.

In fact, the performance of Vector was so impressive that it changed the requirements specified by the client-facing team. The envisioned working practice was to allow up to ~200 complex reports to run between 10:00 AM and 10:30 AM, but Vector was so fast and comfortable at scale that these reports now all run within 5 minutes of 10:00 AM, limited only by the resources (cores, memory, etc.) on the machine.

Customer Satisfaction

The customer was sufficiently impressed with the novel architecture of the LARS implementation that they commissioned a second more challenging Vector-based project to be fed from a continuous message stream. This will be the subject of a future blog entry.



Vector in Hadoop 5.0 – New Features You Should Care About


Today we announce the next release of Actian Vector in Hadoop, extending our support of Apache Spark to include direct access to native Hadoop file formats and tighter integration with Spark SQL and Spark R applications. In this release, we also incorporate performance improvements, integration with Hadoop security frameworks, and administrative enhancements. I’ll cover each of these in greater detail below.

Combine Native Hadoop Tables With Vector Tables

In previous releases, Vector in Hadoop required data to be stored in a proprietary format which optimized analytics performance and delivered great compression to reduce access latency. Vector in Hadoop 5.0 provides the ability to register Hadoop data files (such as Parquet, ORC, and CSV files) as tables in VectorH and to join these external tables with native Vector tables. Vector in Hadoop will provide the fastest analytics execution against data in these formats, even faster than their native query engines. However, query execution will never be as fast with external tables as with native Vector data. If performance matters, we suggest that you load that data into Vector in Hadoop using our high-speed loader.

This feature enables customers who have standardized on a particular file format, and who want to avoid copying data into a proprietary format, to still get the performance acceleration VectorH offers. The storage benchmark we conducted as part of our SIGMOD paper showed the Vector file format to be more efficient in both query performance (data read) and data compression. See our blog post from July 2016, which further explains that benchmark.

True Enterprise Hadoop Security Integration

A Forrester survey last year indicated that data security is the number one concern with Hadoop deployments. Vector in Hadoop natively provides the enterprise-grade security one expects in a mature EDW platform, i.e., discretionary access control (control over who can read, write, and update what data in the database), column-level data-at-rest encryption, data-in-motion encryption, security auditing with SQL-addressable audit logs, and security alarms. For the rest of the Hadoop ecosystem, these concerns have driven the development of Hadoop security frameworks through projects like Apache Knox and Apache Ranger. As we see these frameworks starting to appear on customer RFIs, we’ve provided documentation on how to configure VectorH for integration with Apache Knox and Apache Ranger.

Significant Performance Enhancements

The performance enhancements which resulted in Vector 5.0 claiming top performance in the TPC-H 3000GB benchmark for non-clustered systems are now available in Vector in Hadoop 5.0, where we typically see linear or better than linear scalability.

Automatic Histogram Generation

Database query execution plans rely heavily on knowledge of the underlying data; without data statistics, the optimizer has to make assumptions about data distribution. For example, it will assume that all zip codes have the same number of residents, or that customer last names are as likely to begin with an X as with an M. VectorH 5.0 includes an implementation of automatic statistics/histogram generation for Vector tables: histograms are automatically created and cached in memory when a query references a column in a WHERE, HAVING, or ON clause that has no explicitly created histogram (via optimizedb or CREATE STATISTICS).
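The idea behind histogram-based estimation can be sketched in Python. This is a toy equi-width histogram and a deliberately crude selectivity estimate of our own devising, not Vector’s actual statistics implementation.

```python
# Toy equi-width histogram and the selectivity estimate an optimizer
# might derive from it (illustrative only).
def build_histogram(values, buckets=4):
    lo, hi = min(values), max(values)
    width = (hi - lo) / buckets or 1   # guard against a single-value column
    counts = [0] * buckets
    for v in values:
        idx = min(int((v - lo) / width), buckets - 1)
        counts[idx] += 1
    return lo, width, counts

def estimate_le(hist, x):
    # Estimated fraction of rows with value <= x, counting whole buckets
    # whose upper bound falls at or below x (a coarse approximation).
    lo, width, counts = hist
    full = sum(c for i, c in enumerate(counts) if lo + (i + 1) * width <= x)
    return full / sum(counts)
```

With a histogram in hand, the planner can tell a predicate that matches 1% of rows from one that matches 50%, instead of assuming a uniform distribution.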

Accelerate Startup and Shutdown With Distributed Write Ahead Log

In earlier Vector in Hadoop releases, the write-ahead log file, which holds details of updates in the system, was managed on the VectorH Leader Node. This memory-resident log consumed a lot of the Leader Node’s memory and became a bottleneck at startup, since the log had to be replayed and that process could take several minutes. In VectorH 5.0 we have implemented a distributed Write Ahead Log (WAL), where each node keeps a local WAL. This relieves memory pressure, improves our startup times, and, as a side effect, also results in much faster COMMIT processing.

Speed Up Queries With Distributed Indexes

In earlier releases, the VectorH Leader Node was responsible for maintaining the automatic min-max indexes for all partitions. As a reminder, the min-max index keeps track of the minimum and maximum values stored within each data block; this internal index lets us quickly identify which blocks will participate in solving a query and which need not be read at all. The index is memory resident and is built at server startup. In VectorH 5.0, each node maintains its own portion of the index, which relieves memory pressure on the Leader Node, improves our startup times by distributing the work, and speeds up DML queries.
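The block-skipping idea behind a min-max index can be sketched in Python. This is an illustrative toy with a made-up block size, not VectorH’s implementation.

```python
# Toy min-max index: track per-block min/max and skip blocks that cannot
# contain matches for a range predicate (illustrative sketch only).
BLOCK = 4  # values per block; real block sizes are far larger

def build_minmax(column):
    return [(min(column[i:i + BLOCK]), max(column[i:i + BLOCK]))
            for i in range(0, len(column), BLOCK)]

def scan_range(column, index, lo, hi):
    hits, blocks_read = [], 0
    for b, (bmin, bmax) in enumerate(index):
        if bmax < lo or bmin > hi:
            continue                      # whole block skipped: no read at all
        blocks_read += 1
        hits += [v for v in column[b * BLOCK:(b + 1) * BLOCK] if lo <= v <= hi]
    return hits, blocks_read
```

On roughly sorted or clustered data, most blocks fall entirely outside the predicate’s range, so a scan touches only a small fraction of the table.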

Simplified Partition Management With Partition Specification

We found that a number of VectorH customers encountered performance problems because they didn’t know to include the PARTITION clause when creating tables, especially when using CREATE TABLE AS SELECT (CTAS). Say a customer had an existing table distributed across 15 partitions and created a new table based on it: the natural assumption was that the new table would also have 15 partitions, but that is not what the SQL standard intends, and in this case being true to the standard hurt us. To alleviate this, we have added a configuration parameter which can be set to require the use of either NOPARTITION or PARTITION= when creating a Vector table, whether explicitly or via CTAS.

Simplify Backup and Restore With Database Cloning

VectorH 5.0 introduces a new utility, clonedb, which enables users to make an exact copy of their database into a separate Vector instance, for example taking a copy of a production database into a development environment for testing purposes. This feature was requested by one of our existing customers but has been very well received across all Vector/VectorH accounts.

Faster Exports With Spark Connector Parallel Unload

The Vector Spark Connector can now be used to unload large data volumes in parallel across all nodes.

Simplified Loading With SQL Syntax for vwload

VectorH 5.0 includes the ability to utilize vwload with the SQL COPY statement for fast parallel data load from within SQL.

Simplified Creation of CSV Exports From SQL

VectorH 5.0 includes the ability to export data in CSV format from SQL using the following syntax:

INSERT INTO EXTERNAL CSV 'filename' SELECT ... [WITH NULL_MARKER='NULL', FIELD_SEPARATOR=',', RECORD_SEPARATOR='\n']

Next Steps

To learn more, request a demo or a trial version of VectorH to try within your Hadoop cluster. You can also explore the single-server version of Actian Vector running on Linux, distributed free as a community edition, available for download.



The Essential Guide to Gartner Catalyst 2017 in San Diego, CA


If you’re headed to Gartner Catalyst 2017 in San Diego, CA August 21 – 24, or if this is your first time at a Gartner event, here’s your essential guide to get the most out of the upcoming conference. We hope you find it useful.

This is a conference for technical professionals. You’ll have plenty of opportunities to meet with your peers across all disciplines including CIOs, CTOs, solution architects, developers, database admins, data scientists, data engineers, business analysts and DevOps, amongst others.

Gartner has lined up a host of hot topics and session tracks, so be sure to check out the official session calendar to build your personalized schedule that you can access from the Gartner Events Navigator. They also have a mobile app (Android, iOS and Windows) that you can use after you have registered, and this will come in handy as you move around at the conference between sessions, roundtables, meetups, networking, and breakfast/lunches.

Actian CTO Mike Hoskins will be sharing Actian’s hybrid data vision as part of his talk titled “Actian: Drowning in Data? How to Bridge the Gap to Business Insights.” The talk will be held in the TechZone Theatre/Harbor Ballroom, Second Level on Wednesday, August 23 @ 1:30 PM PT, so be sure to add this one to your calendar. This talk is in the same area as, and during, the coffee/desserts break, so seats tend to fill up fast… don’t be late or you’ll be left standing!

Remember that the event is in San Diego, California and not in San Diego, Texas! The nearest airport is San Diego International Airport (SAN), formerly known as Lindbergh Field, which is located a short distance by car/taxi from the event hotel. Most local school districts have either just started or are starting their new school year, so the event is perfectly timed to miss the peak summer vacations for many US tourists. Be sure to check out the local weather forecast before you pack your suitcase. Remember to find some time to stretch your legs and explore the nearby Gaslamp District and Seaport Village. Check out Gartner’s latest event-related venue and travel information, as there are some travel alerts to be aware of.

The Actian team will be there, and we look forward to meeting you in person at the Actian Booth #108 in the Harbor Ballroom, second level of the Manchester Grand Hyatt San Diego.

We’ll be sharing our hybrid data vision and will have subject matter experts available onsite to walk you through our portfolio of hybrid data-management, analytics and integration products and services for Technology Professionals like you.

If you’re new to Actian products, here are a few of the portfolio highlights:

  • Actian NoSQL accelerates agile development for complex object models at enterprise scale.
  • Actian Zen Embedded Database enables zero-admin, nano-footprint, hybrid NoSQL & SQL data management.
  • Actian Vector in-memory analytics database is a consistent performance leader on the TPC-H Decision Support Benchmark over the last 5 years.
  • Actian DataConnect provides lightweight, enterprise-class hybrid data integration.

We hope you’ll stop by to say “Hi” to the team and learn about Actian’s products, community, and customers.

Follow us on Twitter, and on LinkedIn to stay connected with what we are up to. If you fancy a job to pursue your passion in data management, data integration, and data analytics, check out our careers page and come join our team – WE’RE HIRING!


One important trend in database management is integrating location data better to improve insights about events and activities that matter to your business.

“…Interest in analyzing geospatial/location data has increased over the past four years from 26% to 36%.”
Source: Gartner Survey Analysis: Big Data Investments

Tracking customer location can be critical for offering location-based services, particularly for travelers (think Uber matching cars to riders, or restaurants making offers to customers nearby) and for shoppers (to optimize shelf locations for popular items and perhaps make real-time offers). Tracking and managing assets by location can not only improve response time to failures but also capture interactions that help predict future failures.

Actian Ingres has supported geospatial data for a few years now, recognizing location as a data type to improve the validity, accuracy, and processing of location data. Earlier this year, we extended that support in Ingres by introducing a plugin for ESRI ArcGIS 10.x users to view and manipulate geospatial data. ArcGIS is ESRI’s geographic information system (GIS) for creating and working with maps and geographic information.

The ESRI plugin supports two of the tools, ArcMap and ArcCatalog, in versions 10.x of ArcGIS on Windows, and Actian supports the plugin on Ingres 10S and 10.2. ArcMap is the primary application used in ArcGIS for mapping, editing, analysis, and data management. With the ESRI plugin and ArcMap, users can access geospatial data to create maps and to visualize, filter, summarize, analyze, compare, and interpret spatial data. ArcCatalog lets users store and organize geospatial data (like a Windows Explorer for geospatial data).

Actian is working with a couple of partners to help our customers get the most out of Ingres and ArcGIS:

  • Critigen provides implementation services with ESRI expertise to develop and deploy geospatial applications.
  • Safe Software supports Ingres through their FME integration tool and complements Actian DataConnect.

The ESRI plugin and documentation are available to existing Actian customers for download at esd.actian.com. To find out more about geospatial features, go to docs.actian.com.

Download the ESRI plugin and let us know what you think!


The Age of Data has arrived, with new data sources, targets and processing models proliferating madly across enterprises of all sizes. While data has never been more valuable to a business — it now informs the who, what, where, when and how of decision-making — this new hybrid data landscape introduces new challenges. We anticipate the following innovative efforts in data management, integration and analytics to address these challenges.

The Rise of HTAP – Best of Both Worlds in Data Management

One of the most exciting trends for the balance of this decade will be HTAP (Hybrid Transactional/Analytical Processing), which is a Gartner-coined term representing a hybrid, converged software infrastructure that can handle both traditional transactional data management workloads AND modern analytic data management workloads.

Every business is struggling to find tools and techniques to effectively analyze the volume, variety and velocity of data. A new generation of columnar analytic SQL databases (like Actian Vector) will be critical to delivering on the promise of data-driven decisions. At the same time, organizations are familiar with, and trying to preserve, their investment in traditional transactional SQL databases (like Actian Ingres) that represent the backbone of data management in most organizations. How to marry those two data management needs?

What if you could have both capabilities in the same database? What if you could have the best of both worlds? Robust, enterprise-class OLTP database capabilities that leverage a 30+ year history of pioneering work in data management. And then add the world’s highest-performance columnar analytic database engine (with vector processing) into the same database infrastructure. One database, one security model, one SQL, one vendor – providing an innovative hybrid of operational and analytic processing that covers the entire spectrum of data management! With the ability to deploy to the cloud or on-premise. Now that is something to get excited about.
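As a toy illustration of the “one database, one SQL” idea, the same table can serve both a transactional write workload and an analytic aggregate query. The sketch below uses SQLite purely as a stand-in for illustration, not as an actual HTAP engine, and the `orders` table and its data are hypothetical:

```python
import sqlite3

# In-memory database standing in for a hybrid OLTP/analytic store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

# Transactional workload: individual order inserts.
orders = [("east", 120.0), ("west", 75.5), ("east", 200.0)]
conn.executemany("INSERT INTO orders (region, amount) VALUES (?, ?)", orders)
conn.commit()

# Analytic workload: an aggregate over the same table, in the same SQL dialect,
# with the same security model -- no ETL step between the two.
rows = conn.execute(
    "SELECT region, COUNT(*), SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('east', 2, 320.0), ('west', 1, 75.5)]
```

The point of an HTAP engine like the hybrid described above is that the analytic half of this workload runs against a columnar, vectorized execution path rather than the row store, without the application having to know the difference.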

The Rise of Edge Databases for IoT Data Management

The emerging IoT stacks and solutions are missing one important element of scalable architectures – an elastic middle tier that can sit at the “edge” of the network and deliver robust processing services to the onboarding and analysis of IoT data. Most conventional IoT architectures focus simply on the two main end-points – the sensors themselves, spitting out low-level data, and the cloud, where sensor events should eventually “land” for analysis.

The sheer volume and repetition of sensor data make it impractical to imagine “landing” all sensor data in the cloud. The smarter IoT architectures will provide an intelligent middle tier – a kind of gateway function that resides near the sensors, at the edge. This layer is intended for early capture, processing and local analysis of the sensor data before only vital information is sent to the cloud.

The natural technology to deploy at the onboarding “edge” of the network is a bullet-proof embedded IoT edge database. Apart from the obvious advantages of deploying an embedded IoTDB at the “edge” of the network (persistence, security, etc.), you could also apply crucial local filtering (e.g. duplicates, errors, steady states, etc.) and data operations (e.g. sorts, aggregates, model application and local analytics) on the data prior to “landing” the data in the cloud – a much more efficient and productive setup for cloud-based analytics of sensor data.
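The local filtering and aggregation described above can be sketched in a few lines. This is a minimal, vendor-neutral illustration, not Actian code; the sensor IDs, the `epsilon` steady-state threshold, and the batch structure are all hypothetical:

```python
from statistics import mean

def filter_and_aggregate(readings, last_sent=None, epsilon=0.01):
    """Drop steady-state/duplicate readings at the edge, then aggregate the rest.

    readings:  list of (sensor_id, value) tuples from local sensors
    last_sent: dict of sensor_id -> last value forwarded to the cloud
    epsilon:   change threshold below which a reading counts as steady state
    """
    last_sent = last_sent or {}
    kept = []
    for sensor_id, value in readings:
        previous = last_sent.get(sensor_id)
        # Local filtering: skip readings that barely changed since the last send.
        if previous is not None and abs(value - previous) < epsilon:
            continue
        last_sent[sensor_id] = value
        kept.append((sensor_id, value))

    # Local aggregation: summarize per sensor before "landing" data in the cloud.
    summary = {}
    for sensor_id, value in kept:
        summary.setdefault(sensor_id, []).append(value)
    return {sid: {"count": len(vals), "mean": mean(vals), "max": max(vals)}
            for sid, vals in summary.items()}

batch = [("well-1", 70.0), ("well-1", 70.001), ("well-1", 75.2), ("well-2", 10.0)]
print(filter_and_aggregate(batch))
```

In a real edge deployment the filtered readings would be persisted to the embedded database first, so that a dropped cloud connection never loses data; only the compact summaries travel upstream.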

The Rise of Hybrid Integration Platforms

It seems that regardless of how much we invest, integration remains an unsolved problem – permanently atop the priority list in all IT shops and organizations. The diversity of IT systems guarantees a baseline of integration challenges. An uncountable number of new end-points every year exacerbates the situation. Factor in that old and new end-points are changing constantly, and you multiply the problem further. Add the requirement for different integration patterns and delivery models and you begin to see the many intimidating dimensions of the integration problem.

Is there hope? Yes, tools that surpass the limited nature of today’s typical integration offerings are making their way into the market. Instead of focusing on one dimension of today’s integration problem – legacy on-premises ETL, heavy EAI tooling, or lightweight cloud services – we will see customers turn to hybrid integration platforms – modern, dynamic, cloud-based solutions – to tackle all dimensions. Whether it is the variety of end-points (cloud, mobile, or on-premises), the variety of patterns (A2A via APIs or B2B via data), the variety of skills (IT expert to LoB practitioner), or the variety of delivery models (cloud or on-premises), a modern hybrid integration platform like the Actian DataCloud will enable customers to adapt to today’s data integration needs.

The Rise of Graph Analytics in the Cloud

Neo4J, the leading commercial provider of on-premises graph database technology, recently raised a funding round of $36 million. This funding establishes graph databases (and the associated graph analytics space) as first class citizens in the pantheon of modern analytic techniques.

Why graph? In the now-immortal words of Donald Rumsfeld, there are “known knowns” (handled via BI and reporting), there are “known unknowns” (handled via predictive analytics to get a grip on a known analytic challenge such as fraud), and then there are “unknown unknowns.” These are the questions you never knew to ask, the queries you never knew to write. What are the unknown/unseen patterns hidden away in your data, and how do you find them? This is one of the great analytic challenges in datasets – what are the inherent (but unseen) relationships in the data – what objects are “close” to what other objects? What objects are “outliers”? What heretofore seemingly unrelated events share space and time?

It is exactly for this reason that graph is an important new analytic weapon. The cloud is the ideal implementation platform for graph analytics, and we expect to see offerings that let you transfer your data into the cloud, load it into a back-end graph datastore like Actian Versant, and then “graph it” to see patterns inherent in the data (and even see new patterns emerge spontaneously as you add more data).
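The “closeness” and “outlier” questions above can be sketched over a plain adjacency list. This is an illustrative toy, not a graph database query; the records and the rule that an edge means “two records share an attribute” are entirely hypothetical:

```python
from collections import deque

# Hypothetical relationship graph: an edge means two records share an attribute.
graph = {
    "alice": ["bob", "carol"],
    "bob": ["alice"],
    "carol": ["alice", "dave"],
    "dave": ["carol"],
    "mallory": [],  # shares no attributes with anyone
}

def neighbors_within(graph, start, max_hops):
    """Breadth-first search: which objects are 'close' to start, and how close?"""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # don't expand past the hop limit
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    seen.pop(start)
    return seen  # node -> hop distance

print(neighbors_within(graph, "alice", 2))       # bob and carol at 1 hop, dave at 2
print([n for n, edges in graph.items() if not edges])  # isolated outliers
```

A graph datastore performs the same kind of traversal declaratively and at scale, which is what makes previously unseen relationships – the “unknown unknowns” – queryable at all.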


We are proud to introduce the latest release of Actian DataConnect. We listened to our customers and adopted a ‘back to basics’ approach with the new product architecture. A lightweight desktop installation for design and a flexible SDK and CLI for run-time are the core components which will plug into any existing job management infrastructure you may have already built around a previous version. For users who want to take advantage of our out-of-the-box, robust and secure cloud infrastructure, DataCloud is the preferred deployment option.

Backward compatibility with Actian DataConnect versions 9 and 10 is another core theme of the version 11 release. Pervasive Data Integrator version 9 users can skip Version 10 altogether and upgrade directly to Version 11. Their maps, schemas, processes, and other artifacts can be imported into Version 11 without the need to perform a migration. Simply import and use them.

The design environment focuses on developer productivity and integration architecture simplification with a small-footprint, desktop IDE installation. It includes all the familiar mapping and event features that were available in Data Integrator version 9. We’ve also added more development tools that speed up iterating on new integration projects and modifying existing ones.

We wanted to get this release into the field so our version 9 users could begin to take advantage of it immediately. Version 10 users will be able to upgrade in a subsequent release targeted later in 2017.

What’s New and Different About DataConnect 11?

Architecture:

  • Lightweight desktop design interface built on a widely adopted extensible open-source IDE framework.
  • Ability to import, rather than migrate, integration artifacts from prior DataConnect versions.
  • Full support for Data Integrator Version 9 Events and Actions for backward compatibility.
  • Open, file system-based metadata repository that enables use of your existing source control systems.
  • Flexible software development kit (SDK) and command line interface (CLI) to support your custom job management infrastructure.
  • DataCloud deployment option: manage in the cloud, run on-premises via agents.

Integration Features:

  • REST Invoker 3.0: Easy-to-use and standardized approach to RESTful web service APIs.
  • Engine execution profiler provides immediate, interactive performance feedback.
  • Built-in XML and Text editors for power users to directly modify metadata.
  • Content assist in the script editor (aka code completion).
  • Reject connection tab for improved ease of use.
  • Optional support for macro sets and encrypted values.
  • Improved “Search and Replace” functionality and Help system.

Want to Learn More?

For the hands-on users, here is a short series of videos showing the new user interface in action.

Download data sheet and whitepaper: click here



Architecting Next-Generation Data Management Solutions

Hybrid for the Data Driven Enterprise

This is part 2 of our conversation with Forrester analyst Michele Goetz. Please click here to read the first post: Rethink Hybrid for the Data-Driven Enterprise.

After a recent Actian webinar featuring Forrester Research, John Bard, senior director of product marketing at Actian, asked Forrester principal analyst Michele Goetz more about next-generation data management solutions. Here is the second part of that conversation:

John Bard, Actian:  What are the key business imperatives forcing a greater priority on query-processing speed for systems of insight?

Michele Goetz, Forrester:  More and more businesses are becoming digital. Retailers are creating digital experiences in their brick-and-mortar stores. Oil and gas companies are placing thousands of sensors on wells to get information on production and equipment states in real-time. And the mobile mind shift is driving more and more consumer and business engagement through mobile apps. Everything is in real-time, delivered through a web of microservices, and increasingly sophisticated analytics are embedded in streams and processes. This places a significant demand on systems that have to hit high-performance levels on massively orchestrated data services to get insight on demand, make decisions quickly, take action quickly, and achieve outcomes that meet business goals.

JB:  How important is it for operational data and systems of insight to be tightly linked? What are some applications/use cases driving that integration?

MG:  More and more, transactional systems have to operate on insight and not just as entry points to capture a transactional event. Analytics are running on streams of data and individual transactions such as purchases and business process events and transactions. These analytics provide suggestions and instructions to inform pricing, offers, next best action, and security/fraud patterns, along with automating manual processes. Today’s modern data platform has to run analytic and operational workloads side by side to not only enable a process but also capitalize on opportunities and threats as they occur.

JB:  How does an enterprise strike a balance between best-in-class solutions that often require integration versus all-in-one platforms that often force compromises?

MG:  For each business process, customer engagement, automated process, and partner engagement, there are different service-level needs for data and analytics. Data and data services have to be more personalized to the tasks at hand and desired outcomes. Upstream in-development applications are designed with specific requirements for data, insights, and the cadence for when data and insight are needed. These requirements manifest within the data and application APIs that drive microservices and business services. A monolithic all-in-one platform creates rigidity as a purpose-built system that is inflexible to business changes. The cost to purchase and maintain is significant and has an impact on the ability to modernize, thus building up technical debt. Additionally, for every new capability, a new silo is built, further fragmenting data and inhibiting insight. Companies need to move toward a hybrid approach that takes into account the cloud, data variety, service levels, best-in-class technologies, and open source for innovation. Hybrid systems allow flexibility and adaptability to drive service-oriented data toward business value without the cost and delivery bottlenecks that one-size-fits-all systems create.

JB:  What is the best design approach to accelerate development to achieve faster deployment to production and therefore business value?

MG:  Start with what the solution is supporting and the service levels it requires. Have an understanding of how that fits into specific data architecture patterns: data science for advanced analytics and visualization, intelligent transactional data, or analytic and BI workspaces. These patterns guide the choices for database, integration, and cloud while also helping to establish governance that guides trusted sources, repeatable and reusable data APIs and services, and the management of security policies.

JB:  What sort of new applications and services can be created from these new hybrid data architectures?

MG:  Hybrid data management is about putting the right data services and systems to the task and outcome at hand. It provides more freedom to introduce modern data technologies to quickly take advantage of capabilities to scale, get to insights you couldn’t see because of lack of data access, and deliver data and insight in real time without the lag from nightly batch processing and reconciliation. Additionally, hybrid data management has better administrative layers to help manage the peaks and valleys across the ecosystem and avoid performance bottlenecks, as well as right cost data service levels between cloud and on-premises systems. Going hybrid means getting access to all the data to create customer 360s that take personalization to the next level. It allows analytics to mature toward machine learning, advanced visualizations, and AI by providing a better data infrastructure backbone. And apps and products become more intelligent as hybrid systems create engagement that is insightful and adaptive to the way the solutions are used.