Data Management

Automatically Synchronize Your OLTP and OLAP Tables in Ingres 11

Alex Hanshaw

February 9, 2018


A key feature of the latest update to our hybrid database Ingres 11 is the inclusion of the X100 analytics engine and support for X100 tables. Adding a world-class analytics engine to Ingres’ enterprise OLTP (OnLine Transaction Processing) capability enables a new class of applications that can mix OLTP and analytic queries, with each query being directed to the appropriate execution engine.

Actian engineers have just delivered an enhancement to Ingres which adds support for X100 table references in database procedures. This allows database procedures and rules to be used to trigger updates, inserts and deletes against X100 tables when inserts, updates and deletes are applied to Ingres tables. This means you can automatically synchronize your OLTP (Ingres) and OLAP (X100) tables, adding new transactional data to your analytics data.

The following example will walk you through the creation of two tables (airport and airport_X100) and the creation of the database procedures and rules that will mirror inserts, updates and deletes from the Ingres table to the X100 table. Take a look and let us know what you think!

This enhancement is delivered via a patch that can be downloaded from esd.actian.com.

Create an OLTP table (airport) and an OLAP (OnLine Analytical Processing) target (airport_x100):

CREATE TABLE airport ( ap_id INT, ap_iatacode NCHAR(3) NOT NULL, ap_place NVARCHAR(30), ap_name NVARCHAR(50), ap_ccode NCHAR(2))\g
commit;\g
CREATE TABLE airport_x100 ( ap_id INT, ap_iatacode NCHAR(3) NOT NULL, ap_place NVARCHAR(30), ap_name NVARCHAR(50), ap_ccode NCHAR(2)) with structure=x100\g
commit\g

First we’ll create a procedure that can be used to mirror inserts from airport into airport_x100:

CREATE PROCEDURE proc_ins_airport_x100 (apid INT NOT NULL NOT DEFAULT, apiatacode NCHAR(3) NOT NULL NOT DEFAULT, applace NVARCHAR(30) NOT NULL NOT DEFAULT, 
     apname NVARCHAR(50) NOT NULL NOT DEFAULT, apccode NCHAR(2) NOT NULL NOT DEFAULT)
     AS BEGIN
          INSERT INTO airport_x100 ( ap_id, ap_iatacode, ap_place, ap_name, ap_ccode)
          VALUES ( :apid, :apiatacode, :applace, :apname, :apccode);
     END\g

Now we create a rule that will automatically trigger on an insert into airport and fire the procedure proc_ins_airport_x100 with each parameter containing the column values that were part of the insert into airport:

CREATE RULE trg_ins_airport_x100 AFTER INSERT INTO airport
     EXECUTE PROCEDURE proc_ins_airport_x100 (apid = new.ap_id, apiatacode = new.ap_iatacode, applace = new.ap_place, apname = new.ap_name, apccode = new.ap_ccode)\g

We insert 4 rows into the OLTP table airport:

INSERT INTO airport VALUES (50000, 'AAA', 'Vector', 'Aomori', 'AU')\g
INSERT INTO airport VALUES (50001, 'AA1', 'Vector', 'Tegel', 'AU')\g
INSERT INTO airport VALUES (50002, 'AA2', 'Vector', 'Tegel', 'NL')\g
INSERT INTO airport VALUES (50003, 'AA3', 'Vector', 'Tegel', 'NL')\g

Now we see if our target OLAP table (airport_x100) has been updated:

select ap_id, ap_iatacode as iatacode, cast(ap_place as varchar(20)) as ap_place, cast(ap_name as varchar(20)) as ap_name, ap_ccode from airport_x100;\g

+-------------+--------+--------------------+--------------------+--------+
|ap_id        |iatacode|ap_place            |ap_name             |ap_ccode|
+-------------+--------+--------------------+--------------------+--------+
|        50000|AAA     |Vector              |Aomori              |AU      |
|        50001|AA1     |Vector              |Tegel               |AU      |
|        50002|AA2     |Vector              |Tegel               |NL      |
|        50003|AA3     |Vector              |Tegel               |NL      |
+-------------+--------+--------------------+--------------------+--------+
(4 rows)

So the inserts were automatically propagated. Now we take a look at updates. Multiple rules fired by a single condition do not have a defined order of execution, so we take care to ensure either that only one rule fires or that the order does not affect the end result.

First off, we create a procedure that will update airport_x100’s ap_id column with the value of the apid parameter wherever airport_x100’s ap_iatacode column matches the apiatacodeold parameter value:

CREATE PROCEDURE proc_upd_airport_x100_01 ( apid INT NOT NULL NOT DEFAULT, apiatacodeold NCHAR(3) NOT NULL NOT DEFAULT)
     AS BEGIN
          UPDATE airport_x100 SET ap_id = :apid WHERE ap_iatacode = :apiatacodeold;
     END\g

Now we create the rule that triggers the above procedure; it fires only if the ap_id column in the table airport is updated. It is possible to pass the new and/or old values from an update into the called procedure:

CREATE RULE trg_upd_airport_x100_01 AFTER UPDATE(ap_id) of airport
     EXECUTE PROCEDURE proc_upd_airport_x100_01 ( apid = new.ap_id, apiatacodeold = old.ap_iatacode)\g

Now we create a second update procedure that will similarly, conditionally update airport_x100’s ap_iatacode column with the supplied value. As well as updating airport_x100, this procedure generates pop-up messages when executed, which shows that we are not limited to single insert, delete or update statements:

CREATE PROCEDURE proc_upd_airport_x100_02 (apiatacode NCHAR(3) NOT NULL NOT DEFAULT, apiatacodeold NCHAR(3) NOT NULL NOT DEFAULT)
     AS
     DECLARE
          str VARCHAR(30) not null;
     BEGIN
          str = 'apiatacode = ' + :apiatacode;
          MESSAGE :str;
          str = 'apiatacodeold = ' + :apiatacodeold;
          MESSAGE :str;
          UPDATE airport_x100 SET ap_iatacode = :apiatacode WHERE ap_iatacode = :apiatacodeold;
     END\g

We now create a rule that will fire the above procedure if the ap_iatacode in the table airport is updated:

CREATE RULE trg_upd_airport_x100_02 AFTER UPDATE(ap_iatacode) of airport
     EXECUTE PROCEDURE proc_upd_airport_x100_02 ( apiatacode = new.ap_iatacode, apiatacodeold = old.ap_iatacode)\g

We go on to create procedures and rules that apply updates to the other columns in airport_x100. Splitting the work up this way is not a requirement; multiple columns can be updated in a single procedure if required:

CREATE PROCEDURE proc_upd_airport_x100_03 (applace NVARCHAR(30) NOT NULL NOT DEFAULT, apiatacodeold NCHAR(3) NOT NULL NOT DEFAULT)
     AS BEGIN
          UPDATE airport_x100 SET ap_place = :applace WHERE ap_iatacode = :apiatacodeold;
     END\g

CREATE RULE trg_upd_airport_x100_03 AFTER UPDATE(ap_place) of airport
     EXECUTE PROCEDURE proc_upd_airport_x100_03 ( applace = new.ap_place, apiatacodeold = old.ap_iatacode)\g

CREATE PROCEDURE proc_upd_airport_x100_04 (apname NVARCHAR(50) NOT NULL NOT DEFAULT, apiatacodeold NCHAR(3) NOT NULL NOT DEFAULT)
     AS BEGIN
          UPDATE airport_x100 SET ap_name = :apname WHERE ap_iatacode = :apiatacodeold;
     END\g

CREATE RULE trg_upd_airport_x100_04 AFTER UPDATE(ap_name) of airport
     EXECUTE PROCEDURE proc_upd_airport_x100_04 ( apname = new.ap_name, apiatacodeold = old.ap_iatacode)\g

CREATE PROCEDURE proc_upd_airport_x100_05 (apccode NCHAR(2) NOT NULL NOT DEFAULT, apiatacodeold NCHAR(3) NOT NULL NOT DEFAULT)
     AS BEGIN
          UPDATE airport_x100 SET ap_ccode = :apccode WHERE ap_iatacode = :apiatacodeold;
     END\g

CREATE RULE trg_upd_airport_x100_05 AFTER UPDATE(ap_ccode) of airport
     EXECUTE PROCEDURE proc_upd_airport_x100_05 ( apccode = new.ap_ccode, apiatacodeold = old.ap_iatacode)\g

Now we will perform updates to our transactional data in the airport table to show how the above procedures and rules automatically apply changes to our analytics table airport_x100:

update airport set ap_id = 99999 where ap_id = 50000\g

The select shows trg_upd_airport_x100_01 was triggered and executed the procedure proc_upd_airport_x100_01. The single column update has been mirrored into the airport_x100 table:

select ap_id, ap_iatacode as iatacode, cast(ap_place as varchar(20)) as ap_place, cast(ap_name as varchar(20)) as ap_name, ap_ccode from airport_x100;\g

+-------------+--------+--------------------+--------------------+--------+
|ap_id        |iatacode|ap_place            |ap_name             |ap_ccode|
+-------------+--------+--------------------+--------------------+--------+
|        99999|AAA     |Vector              |Aomori              |AU      |
|        50001|AA1     |Vector              |Tegel               |AU      |
|        50002|AA2     |Vector              |Tegel               |NL      |
|        50003|AA3     |Vector              |Tegel               |NL      |
+-------------+--------+--------------------+--------------------+--------+
(4 rows)

The next update will trigger the procedure proc_upd_airport_x100_02, which also displays a message when fired. We’ll apply a number of updates to the airport table and then select from the analytics table airport_x100 to show how the above procedures and rules automatically apply changes to our analytics data based on transactions that affect our transactional data:

update airport set ap_iatacode = 'AA9' where ap_iatacode = 'AA1'\g

     MESSAGE 0: apiatacode = AA9
     MESSAGE 0: apiatacodeold = AA1
     (1 row)
select ap_id, ap_iatacode as iatacode, cast(ap_place as varchar(20)) as ap_place, cast(ap_name as varchar(20)) as ap_name, ap_ccode from airport_x100;\g

+-------------+--------+--------------------+--------------------+--------+
|ap_id        |iatacode|ap_place            |ap_name             |ap_ccode|
+-------------+--------+--------------------+--------------------+--------+
|        99999|AAA     |Vector              |Aomori              |AU      |
|        50001|AA9     |Vector              |Tegel               |AU      |
|        50002|AA2     |Vector              |Tegel               |NL      |
|        50003|AA3     |Vector              |Tegel               |NL      |
+-------------+--------+--------------------+--------------------+--------+
(4 rows)
update airport set ap_place = 'VectorUPD' where ap_place = 'Vector'\g

(4 rows)
select ap_id, ap_iatacode as iatacode, cast(ap_place as varchar(20)) as ap_place, cast(ap_name as varchar(20)) as ap_name, ap_ccode from airport_x100;\g

+-------------+--------+--------------------+--------------------+--------+
|ap_id        |iatacode|ap_place            |ap_name             |ap_ccode|
+-------------+--------+--------------------+--------------------+--------+
|        99999|AAA     |VectorUPD           |Aomori              |AU      |
|        50001|AA9     |VectorUPD           |Tegel               |AU      |
|        50002|AA2     |VectorUPD           |Tegel               |NL      |
|        50003|AA3     |VectorUPD           |Tegel               |NL      |
+-------------+--------+--------------------+--------------------+--------+
(4 rows)
update airport set ap_name = 'TegelUPD' where ap_name = 'Tegel'\g

(3 rows)
select ap_id, ap_iatacode as iatacode, cast(ap_place as varchar(20)) as ap_place, cast(ap_name as varchar(20)) as ap_name, ap_ccode from airport_x100;\g

+-------------+--------+--------------------+--------------------+--------+
|ap_id        |iatacode|ap_place            |ap_name             |ap_ccode|
+-------------+--------+--------------------+--------------------+--------+
|        99999|AAA     |VectorUPD           |Aomori              |AU      |
|        50001|AA9     |VectorUPD           |TegelUPD            |AU      |
|        50002|AA2     |VectorUPD           |TegelUPD            |NL      |
|        50003|AA3     |VectorUPD           |TegelUPD            |NL      |
+-------------+--------+--------------------+--------------------+--------+
(4 rows)

Now we create a procedure and rule to mirror deletes. In many cases we would want to retain historical data in our analytics tables but we may need to delete rows for compliance reasons. The next procedure and rule we create will delete data from airport_x100 automatically when rows are deleted from the airport table:

CREATE PROCEDURE proc_del_airport_x100_01 (apiatacodeold NCHAR(3) NOT NULL NOT DEFAULT)
     AS BEGIN
          DELETE from airport_x100 WHERE ap_iatacode = :apiatacodeold;
     END\g
CREATE RULE trg_del_airport_x100_01 AFTER DELETE FROM airport
     EXECUTE PROCEDURE proc_del_airport_x100_01 (apiatacodeold = old.ap_iatacode)\g
delete from airport where ap_iatacode = 'AA9'\g

(1 row)

This select shows that the row deleted from airport has been automatically deleted from airport_x100:

select ap_id, ap_iatacode as iatacode, cast(ap_place as varchar(20)) as ap_place, cast(ap_name as varchar(20)) as ap_name, ap_ccode from airport_x100;\g

+-------------+--------+--------------------+--------------------+--------+
|ap_id        |iatacode|ap_place            |ap_name             |ap_ccode|
+-------------+--------+--------------------+--------------------+--------+
|        99999|AAA     |VectorUPD           |Aomori              |AU      |
|        50002|AA2     |VectorUPD           |TegelUPD            |NL      |
|        50003|AA3     |VectorUPD           |TegelUPD            |NL      |
+-------------+--------+--------------------+--------------------+--------+
(3 rows)
delete from airport where ap_name = 'TegelUPD'\g

(2 rows)

This select shows that the two rows deleted from airport have been automatically deleted from airport_x100:

select ap_id, ap_iatacode as iatacode, cast(ap_place as varchar(20)) as ap_place, cast(ap_name as varchar(20)) as ap_name, ap_ccode from airport_x100;\g

+-------------+--------+--------------------+--------------------+--------+
|ap_id        |iatacode|ap_place            |ap_name             |ap_ccode|
+-------------+--------+--------------------+--------------------+--------+
|        99999|AAA     |VectorUPD           |Aomori              |AU      |
+-------------+--------+--------------------+--------------------+--------+
(1 row)

These examples show basic mirroring of inserts, updates and deletes from transactional tables to analytics tables to get you started. Much more sophisticated database procedures can be built, and the changes applied to your analytics data do not need to be a mirror of the changes to your transactional data. For example, you may want to trigger an insert into your analytics table after a row is deleted from your transactional data to establish an audit trail.

We’d love to hear how you are seamlessly and automatically generating your analytics data from your existing transactional tables. Please post a reply in our community forums to let us know about the cool things you are doing with the Ingres Hybrid Database.


About Alex Hanshaw

Alex Hanshaw is Director of Engineering at Actian, drawing on 20+ years of hands-on experience developing solutions for complex customer challenges. Alex joined the Ingres Sustaining Engineering team in 1997 and has continually prioritized resolving customer issues across Actian Ingres and Vector. He has overseen multiple product enhancements that reduce downtime and simplify maintenance. Alex frequently collaborates with industry peers at conferences and user groups (e.g., Ingres Community meetups) to share best practices. On the Actian blog, Alex covers topics such as database migration strategies, performance optimization, and troubleshooting. Check out his recent posts for actionable tips on keeping your data systems running smoothly.
Data Management

Python & Btrieve 2 on Windows: Accessing Zen Data With NoSQL

Actian Corporation

February 2, 2018


In my previous blog (“Programming the Easy Way: Accessing a PSQL Zen Database with Python and ODBC“), I showed how to easily access an Actian Zen database with Python and ODBC. My next project was to use the new Btrieve 2 feature with Python to directly interface with the data files without using any SQL.

Long-time Btrieve/PSQL/Zen customers are familiar with the old Btrieve function, a classic 3GL programming interface that hasn’t really changed in over 30 years. It provides direct access to the engine so that you can manipulate data files with options like Create, Open, Insert, Update, Delete, GetFirst, GetNext, GetEqual, etc. Now, with Actian Zen v13 we have Btrieve 2, which offers a simplified, more intuitive object-oriented interface covering the full set of Btrieve calls to the database engine. This interface is provided for C/C++ developers, but we also supply SWIG (Simplified Wrapper & Interface Generator) files so the interface can be used by Python, Perl, and PHP developers.

Getting set up to use Btrieve 2 with Python requires a few steps:

  1. Prepare your Windows environment:
    • Install the Actian Zen v13 database engine
    • Install Python for Windows – I used v3.6 64-bit from python.org
    • Download & unzip Swig for Windows – I used Swigwin-3.0.12 from swig.org
    • Download & unzip the Btrieve 2 SDK
    • Visual Studio 2015 or later is also required; you can install the free Community edition from Microsoft if you don’t already have it.
  2. Create a MyPrograms folder for your Python code, and copy in the following files from the Btrieve 2 SDK:
    • btrieveCpp.lib & btrieveC.lib from win64
    • btrieveC.h & btrieveCpp.h from include
    • btrievePython.swig & btrieveSwig.swig from swig
  3. Open a command prompt and change to your MyPrograms folder, and enter the following command:
    <swigwin_location>\swig -c++ -D_WIN64 -python btrievePython.swig
    This creates the following files:  btrievePython.py  &  btrievePython_wrap.cxx
  4. Create a text file in MyPrograms called setup.py containing the following:
    from distutils.core import setup, Extension
    btrievePython_module = Extension('_btrievePython', sources=['btrievePython_wrap.cxx'],
         library_dirs=['<MyPrograms location>'], libraries=['btrieveCpp'],  )
    setup (name='btrievePython', version='1.0', author='Actian',
         description="""Compile Btrieve 2 Python module""",
         ext_modules=[btrievePython_module], py_modules=["btrievePython"], )
  5. In a command prompt, in the MyPrograms folder, run this command to build the btrievePython module:
    python setup.py build_ext --plat-name="win-amd64"
    This creates the compiled Python file build\lib.win-amd64-3.6\_btrievePython.cp36-win_amd64.pyd
  6. Rename the .pyd file to just _btrievePython.pyd and copy it to MyPrograms or the DLLs folder under your Python installation.

Once these steps are completed, you should be able to “import btrievePython” in a Python environment or program.
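
A quick way to confirm the module loads correctly is a short sanity check like the following (run it from the MyPrograms folder, or anywhere the .pyd is importable). The troubleshooting hints in the comments are general observations rather than anything from the Btrieve 2 documentation:

# Minimal sanity check that the SWIG-generated wrapper can be imported.
# If the import fails, the usual suspects are a 32/64-bit mismatch between
# the .pyd and your Python, or Zen client libraries missing from the PATH.
import btrievePython as btrv

print(btrv.BtrieveClient)                    # wrapper class is available
print(btrv.Btrieve.STATUS_CODE_NO_ERROR)     # engine constants are exposed too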

Once the btrievePython import library is set up, you are ready to write Btrieve 2 applications in Python! Here’s an example that performs the same database operations as the ODBC version from my previous blog – create the same file, insert records, and retrieve the total count inserted. This code is about twice as long as the ODBC version, but it accesses the file directly without going through the SQL layer. (Note – only minimal error checking has been included):

import os
import sys
import struct
import btrievePython as btrv

btrieveFileName = "Test_Table.mkd"
recordFormat = "<iB32sBBBH"
recordLength = 42
keyFormat = "<i"

# Create a session:
btrieveClient = btrv.BtrieveClient(0x4232, 0) # ServiceAgent=B2

# Specify FileAttributes for the new file:
btrieveFileAttributes = btrv.BtrieveFileAttributes()
rc = btrieveFileAttributes.SetFixedRecordLength(recordLength)
# Specify Key 0 as an autoinc:
btrieveKeySegment = btrv.BtrieveKeySegment()
rc = btrieveKeySegment.SetField(0, 4, btrv.Btrieve.DATA_TYPE_AUTOINCREMENT)
btrieveIndexAttributes = btrv.BtrieveIndexAttributes()
rc = btrieveIndexAttributes.AddKeySegment(btrieveKeySegment)
rc = btrieveIndexAttributes.SetDuplicateMode(False)
rc = btrieveIndexAttributes.SetModifiable(True)

# Create the file:
rc = btrieveClient.FileCreate(btrieveFileAttributes, btrieveIndexAttributes,
btrieveFileName, btrv.Btrieve.CREATE_MODE_OVERWRITE)
if (rc == btrv.Btrieve.STATUS_CODE_NO_ERROR):
     print('\nFile "' + btrieveFileName + '" created successfully!')
else:
     print('\nFile "' + btrieveFileName + '" not created; error: ', rc)

# Allocate a file object:
btrieveFile = btrv.BtrieveFile()
# Open the file:
rc = btrieveClient.FileOpen(btrieveFile, btrieveFileName, None, btrv.Btrieve.OPEN_MODE_NORMAL)
if (rc == btrv.Btrieve.STATUS_CODE_NO_ERROR):
     print('File open successful!\n')
else:
     print('File open failed - status: ', rc, '\n')

# Insert records:
iinserting = True
while iinserting:
     new_name = input('Insert name (Q to quit): ' )
     if new_name.lower() == 'q':
          iinserting = False
     else:
          record = struct.pack(recordFormat, 0, 0, new_name.ljust(32).encode('UTF-8'), 0, 22, 1, 2018)
          rc = btrieveFile.RecordCreate(record)
          if (rc == btrv.Btrieve.STATUS_CODE_NO_ERROR):
               print(' Insert successful!')
          else:
               print(' Insert failed - status: ', rc)

# Get Record count:
btrieveFileInfo = btrv.BtrieveFileInformation()
rc = btrv.BtrieveFile.GetInformation(btrieveFile, btrieveFileInfo)
print('\nTotal Records inserted =', btrieveFileInfo.GetRecordCount())

# Close the file:
rc = btrieveClient.FileClose(btrieveFile)
if (rc == btrv.Btrieve.STATUS_CODE_NO_ERROR):
     print('File closed successful!')
else:
     print('File close failed - status: ', rc)

Since the above example doesn’t actually do any data reads, I went back and added a little more code before closing the file to demonstrate a file scan that looks for a name:

# Look up record by name
ireading = True
while ireading:
     find_name = input('\nFind name (Q to quit): ' )
     if find_name.lower() == 'q':
          ireading = False
     else:
          foundOne = False
          record = struct.pack(recordFormat, 0, 0, ' '.ljust(32).encode('UTF-8'), 0, 0, 0, 0)
          readLength = btrieveFile.RecordRetrieveFirst(btrv.Btrieve.INDEX_NONE, record, 0)
          while (readLength > 0):
               recordUnpacked = struct.unpack(recordFormat, record)
               if (recordUnpacked[2] == find_name.ljust(32).encode('UTF-8')):
                    print(' Matching record found: ID:', recordUnpacked[0], ' Name:', recordUnpacked[2].decode())
                    foundOne = True
               readLength = btrieveFile.RecordRetrieveNext(record, 0)
          if (foundOne == False):
               print(' No record found matching "'+find_name+'"')
          status = btrieveFile.GetLastStatusCode()
          if (status != btrv.Btrieve.STATUS_CODE_END_OF_FILE):
               print(' Read error: ', status, btrv.Btrieve.StatusCodeToString(status))

The simplicity of Python programming combined with the new Btrieve 2 development interface will allow for the fast turn-around of new Zen applications!

If you have any questions about Zen or other Actian products please feel free to ask in our community forums.


About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale, streamlining complex data environments and accelerating the delivery of AI-ready data. The Actian data intelligence approach combines data discovery, metadata management, and federated governance to enable smarter data usage and enhance compliance. With intuitive self-service capabilities, business and technical users can find, understand, and trust data assets across cloud, hybrid, and on-premises environments. Actian delivers flexible data management solutions to 42 million users at Fortune 100 companies and other enterprises worldwide, while maintaining a 95% customer satisfaction score.
Data Management

Simplifying Development With OpenROAD JSON-RPC

Actian Corporation

January 17, 2018


With the release of the new patches at the end of December for version 6.2, Actian OpenROAD (our rapid application development and flexible deployment solution) is providing new functionality in the form of JSON support and JSON-RPC support for OpenROAD Server.

Those familiar with OpenROAD will know that an application server has been available since version 4.1. This made it possible to place the business logic on the server side while a thin client dealt solely with the user interface. The thin clients could be OpenROAD applications or even web clients, but they had to have the appropriate shared libraries installed. Managing that overhead was sometimes too complicated.

The JSON-RPC interface provides the advantage of allowing connectivity without that overhead. A simple client, such as an HTML page with JavaScript, can connect to the OpenROAD Server via JSON-RPC with no additional libraries or plugins/add-ons. Of course, there are many other options, such as PHP or .Net, all without the need to register or install anything.

Any legacy client applications will work the same way. They can be upgraded from older versions with the Property Changer or deployed using OpenROAD LoadNRun. They could also easily be migrated to use a thin web client or any other type of client that supports connectivity this way.

Here is how simple the code could be, using the jQuery ajax method in this example.

$.ajax({
   type: 'POST',
   url: baseUrl,
   data: JSON.stringify({
      "jsonrpc": "2.0",
      "method": methodName,
      "id": 3,
      "params": methodParams }),
   async: false,
   dataType: 'json',
   contentType: 'application/json',
   success: function (data) {
      // Your SUCCESS code
   },
   error: function (data) {
      // Your ERROR code
   }
});

Here we make a standard AJAX call in which we pass the JSON-RPC parameters such as method name and parameter list.
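
The same call can be made from almost any language. As a point of comparison, here is a minimal Python sketch using the requests library; the gateway URL, method name, and parameters are placeholders you would replace with your own:

import requests

# Hypothetical values - substitute your OpenROAD Server JSON-RPC gateway URL,
# the registered method name, and its parameters.
base_url = "http://myserver/openroad-jsonrpc"
payload = {
    "jsonrpc": "2.0",
    "method": "myMethod",
    "id": 3,
    "params": {"p_name": "value"},
}

response = requests.post(base_url, json=payload, timeout=30)
response.raise_for_status()
result = response.json()

if "error" in result:
    print("JSON-RPC error:", result["error"])     # your ERROR code
else:
    print("Result:", result.get("result"))        # your SUCCESS code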

The JSON-RPC feature in OpenROAD will certainly enable developers to deploy a variety of client applications that connect to OpenROAD Server applications much more easily than before.

See examples and demos for different client implementations at Actian OpenROAD JSON-RPC Examples. For more information about OpenROAD, please visit our product page, and if you have any questions feel free to ask our active community.

Data Architecture

Integrating Python With Vector or Actian Ingres

Actian Corporation

January 12, 2018


Introduction

Today we’re going to look at how to get started with Actian Vector, our high-performance in-memory analytics database, and Python, one of the languages most often used by data scientists in recent times. Using the techniques below, you’ll be able to get started with Python using Vector ODBC and JDBC. (You’ll be able to use some of these techniques with Actian Ingres as well; see the References section at the bottom for more information.) We’ll also cover the basic CRUD operations (Create, Read, Update, and Delete) using Python.

The following code is tested with Python 2.7 32-bit. If you’re new to Vector, this setup should take 2-3 hours, and a bit less than that if you’re familiar with both Vector and Python.

Modules Needed for ODBC and Vector

  1. Pyodbc (which you can get at https://mkleehammer.github.io/pyodbc/)
  2. Vector ODBC driver

Note: pypyodbc should also work for the same configuration; however, the following code was tested with pyodbc.

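If you do want to keep pypyodbc as a fallback, the rest of the code can usually stay unchanged because both modules follow the same DB-API style of interface. Here is a small sketch of that idea; the DSN name is a placeholder, and DSN setup is covered in the next section:

# Prefer pyodbc, fall back to pypyodbc if it is not installed.
# pypyodbc compatibility is an assumption based on its pyodbc-like API;
# the examples in this article were tested with pyodbc only.
try:
    import pyodbc as pdb
except ImportError:
    import pypyodbc as pdb

conn = pdb.connect("dsn=MYDSN")
print("Connected via", pdb.__name__)
conn.close()
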
ODBC Connection Using DSN

In this section we will discuss the basics of creating a DSN on both Linux and Windows for use by the ODBC connection.

Linux

To get a DSN working on Linux, the environment needs to be set up correctly. Along with the other Vector environment variables that you source using source .ingVWsh (if the installation code is VW), the following needs to be exported, or added to .bashrc, to get ODBC running.

ODBCSYSINI=$II_SYSTEM/ingres/files
export ODBCSYSINI

Note: please set II_ODBC_WCHAR_SIZE if needed as per your ODBC driver manager.

The ‘iiodbcadmin’ Ingres utility can be used on Linux to create a DSN, as in the example shown below.

Actian Vector VI Ingres Screenshot

Note: If you are familiar with the Ingres ODBC driver and its settings, you could also edit odbc.ini directly, as in the example given below. The details added will be reflected in the ‘iiodbcadmin’ utility.

Example:

[ODBC Data Sources]
MYDSN=Ingres

[MYDSN]
Driver=/opt/Actian/VectorVI/ingres/lib/libiiodbcdriver.1.so
Description=
Vendor=Actian Corporation
DriverType=Ingres
HostName=(local)
ListenAddress=VI
Database=test
ServerType=Ingres
user=actian
password=actian

For more details on this check Configure a Data Source (Linux)

Windows

If you are using 32-bit Python, you need the 32-bit client runtime to make use of a 32-bit ODBC DSN. Likewise, for 64-bit Python you need the 64-bit client runtime and a 64-bit ODBC DSN.

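If you are not sure which Python you are running, a quick check from Python itself (this is standard Python, not anything Vector-specific) is:

import struct
import platform

# Prints 32 for 32-bit Python and 64 for 64-bit Python - match this to the
# client runtime and ODBC DSN bitness you install.
print(struct.calcsize("P") * 8, "bit Python")
print(platform.architecture())
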
More details can be found at Configure a Data Source (Windows).

Here is a screenshot of a 32-bit DSN:

Ingres VT ODBC Screenshot

ODBC Connection in Python

You can use DSN to connect:

import pyodbc as pdb
conn = pdb.connect("dsn=MYDSN")

Or with a username and password:

conn = pdb.connect("dsn=TestDSN;uid=username;pwd=password" )

Here’s another example:

conn = pdb.connect("dsn=TestDSN;uid=actian;pwd=actian123" )

Or you can connect directly without any DSN, using the various parameters as shown in the example below.

conn =  pdb.connect("driver=Ingres;servertype=ingres;server=@localhost,tcp_ip,VW;uid=actian;pwd=actian123;database=test")

Modules Needed for JDBC Connection

  1. jaydebeapi (https://pypi.python.org/pypi/JayDeBeApi/)
  2. Vector JDBC driver ( installed using clientruntime from ESD)

JayDeBeApi can be easily installed using ‘pip install JayDeBeApi’. However, if for some reason you cannot use pip, you can also install it manually: download the source code from https://github.com/baztian/jaydebeapi, unzip it, and run ‘python setup.py install’. Note that you will need the development tools installed as root (yum install gcc-c++) to build it manually.

JDBC Connection in Python

import jaydebeapi as jdb
conn = jdb.connect('com.ingres.jdbc.IngresDriver','jdbc:ingres://localhost:VI7/test' ,driver_args={'user': 'actian', 'password': 'actian'}, jars='iijdbc.jar')

cursor = conn.cursor()

Example: Create Table and Insert Values (the Create in CRUD)

A Simple Insert

cursor = conn.cursor()

#Drop table:

cursor .execute("DROP TABLE IF EXISTS customer")

#Create table:

cursor.execute("CREATE TABLE customer(customer_no INT NOT NULL PRIMARY KEY,last_name VARCHAR(40),first_name CHAR(40))")

conn.commit()

#Insert rows into customer table:

cursor .execute("INSERT INTO customer VALUES (1, 'Harry', 'Potter')")

cursor .execute("INSERT INTO customer VALUES (2, 'Ron','Weasley')")

cursor .execute("INSERT INTO customer VALUES (3, 'Draco', 'Malfoy')")

To execute single statements, .execute() is used.

Insert using Bind Variables

To insert rows using bind variables (parameter markers), you can do the following:

#Insert rows into customer table:

cursor .execute("INSERT INTO customer VALUES (?, ?, ?)", 4,'A', 'B')

Inserting multiple rows using .executemany()

data = [
     (5, 'Harry', 'Potter'),
     (6, 'Ron', 'Weasley'),
     (7, 'Draco', 'Malfoy')]

cursor.executemany("INSERT INTO customer VALUES (?, ?, ?)", data)

#In case you have to insert data only to certain columns,

data = [
     (8, 'A'),
     (9, 'B'),
     (10, 'C')]

cursor.executemany("insert into customer(customer_no, first_name) VALUES(?, ?)", data)

Another Example to Insert Many Rows

Sometimes the values for a column need to be generated over a range. In the following example, the testid column of table test is populated with values from 0 to 99:

cursor .execute("DROP TABLE IF EXISTS test")

#CREATING TABLE to insert many rows using executemany()

cursor .execute("CREATE TABLE test(testid varchar(100))")

data_to_insert = [(i,) for i in range(100)]

cursor.executemany("INSERT INTO test (testid) VALUES (?)", data_to_insert)

conn.close();

Update Data (the Update in crUd)

Updating a Single Row

updatedata= ('X', 'Y', 10)

sql = 'UPDATE customer SET last_name = ? , first_name=? where customer_no =? '

cursor.execute(sql, updatedata)

Fetching Data (the Read in cRud)

Fetching One Row

There are several functions for fetching data, such as fetchone(), fetchall(), etc.:

cursor.execute("select count(*) from customer")

result=cursor.fetchone()

print(result[0])

cursor.close()

conn.close()

Fetching Many Rows

cursor.execute("select First 3 * from customer")

for row in cursor:

print ("Cust_no: ",row[0]," First Name: ",row[2]," Last Name: ",row[2])

cursor.close()

conn.close()

Results will be displayed as:

Cust_no:  1  First Name:  Potter                                 Last Name:  Harry

Cust_no:  2  First Name:  Weasley                                Last Name:  Ron

Cust_no:  3  First Name:  Malfoy                                 Last Name:  Draco

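The fetchall() and fetchmany() functions mentioned above work in a similar way; here is a short sketch, assuming the same open connection and cursor as in the examples above:

#Fetch the whole result set at once with fetchall():
cursor.execute("select * from customer")
all_rows = cursor.fetchall()
print(len(all_rows), "rows fetched")

#Fetch in batches with fetchmany(n), which is useful for large result sets:
cursor.execute("select * from customer")
while True:
    batch = cursor.fetchmany(50)
    if not batch:
        break
    for row in batch:
        print("Cust_no: ", row[0], " First Name: ", row[2], " Last Name: ", row[1])
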
Deleting Data (the Delete in cruD)

Deleting a Single Row

sql = 'DELETE from customer where customer_no =9'

cursor.execute(sql)

Deleting Multiple Rows

#Delete multiple rows using ‘in’:

id_list = [1,2,3]

query_string = "delete from customer where customer_no in (%s)" % ','.join(['?'] * len(id_list))

cursor.execute(query_string, id_list)

Complete Code for Both ODBC and JDBC

import pyodbc as pdb

conn = pdb.connect("dsn=TestDSN;uid=vidisha;pwd=vidisha" )

#conn = pdb.connect("driver=Ingres;servertype=ingres;server=@localhost,tcp_ip,VW;uid=vidisha;pwd=vidisha;database=test")

# Python 2.7

conn.setdecoding(pdb.SQL_CHAR, encoding='utf-8')
conn.setdecoding(pdb.SQL_WCHAR, encoding='utf-8')
conn.setencoding(str, encoding='utf-8')
conn.setencoding(unicode, encoding='utf-8')
conn.autocommit= True
cursor = conn.cursor()

print("DROPPING TABLE")
cursor .execute("DROP TABLE IF EXISTS customer")
conn.commit

print("nCREATING TABLE ")
cursor .execute("CREATE TABLE customer(customer_no INT NOT NULL PRIMARY KEY,last_name VARCHAR(40),first_name CHAR(40))")

print("INSERTING ROWS TO TABLE customer")
cursor.execute("INSERT INTO customer VALUES (1, 'Harry', 'Potter')")
cursor.execute("INSERT INTO customer VALUES (2, 'Ron','Weasley')")
cursor.execute("INSERT INTO customer VALUES (3, 'Draco', 'Malfoy')")

data = [
     (5, 'Harry', 'Potter'),
     (6, 'Ron', 'Weasley'),
     (7, 'Draco', 'Malfoy')]

cursor.executemany("INSERT INTO customer VALUES (?, ?, ?)", data)

#or

data = [
     (8, 'A'),
     (9, 'B'),
     (10, 'C')]

cursor.executemany("insert into customer(customer_no, first_name) VALUES(?, ?)", data)

print("DROPPING TABLE")
cursor.execute("DROP TABLE IF EXISTS test")
print("CREATING TABLE to insert many rows using executemany()")
cursor.execute("CREATE TABLE test(testid varchar(100))")
data_to_insert = [(i,) for i in range(100)]
print("Insert multiple data to test")
cursor.executemany("INSERT INTO test (testid) VALUES (?)", data_to_insert)

print("Fetching COUNT OF TABLE")

cursor.execute("select count(*) from customer")
result=cursor.fetchone()
print(result[0])

print("FETCHING MANY ROWS")

cursor.execute("select First 3 * from customer")
results=cursor.fetchall()
for row in results:
    print ("Cust_no: ",row[0]," First Name: ",row[2]," Last Name: ",row[2])

print("UPDATING SINGLE ROW")
updatedata= ('X', 'Y', 10)
sql = 'UPDATE customer SET last_name = ? , first_name=? where customer_no =? '
cursor.execute(sql, updatedata)

print("DELETING A ROW")

sql = 'DELETE from customer where customer_no =9'
cursor.execute(sql)

print("DELETING MULTIPLE ROWS USING 'IN'")
id_list = [1,2,3]
query_string = "delete from customer where customer_no in (%s)" % ','.join(['?'] * len(id_list))
cursor.execute(query_string, id_list)
#close connection
cursor.close()
conn.close()

For JDBC the code is the same; just the connection string is different:

import jaydebeapi as jdb
conn = jdb.connect('com.ingres.jdbc.IngresDriver','jdbc:ingres://localhost:VW7/test' ,driver_args={'user': 'vidisha', 'password': 'vidisha'},jars='iijdbc.jar')
cursor = conn.cursor()

print("DROPPING TABLE")
cursor .execute("DROP TABLE IF EXISTS customer")
conn.commit

print("nCREATING TABLE ")
cursor .execute("CREATE TABLE customer(customer_no INT NOT NULL PRIMARY KEY,last_name VARCHAR(40),first_name CHAR(40))")

print("INSERTING ROWS TO TABLE customer")
cursor.execute("INSERT INTO customer VALUES (1, 'Harry', 'Potter')")
cursor.execute("INSERT INTO customer VALUES (2, 'Ron','Weasley')")
cursor.execute("INSERT INTO customer VALUES (3, 'Draco', 'Malfoy')")

Additional References:

 

Insights

When BIG Data Doesn’t Have to Be

Mary Schulte

January 9, 2018


Using a Vector FARM to Make Big Data Analytics Fast and Easy

I’ve been working in the IT industry for a long time. What goes around comes around. My favorite example is how centralized mainframe timesharing gave birth to decentralized client-server computing. Now we are going back to timesharing, but in the cloud. Technology does evolve. Sometimes new ideas seem like good ideas, but they end up being not-so-great, so we go back and reinvent older, but proven ideas.

Take Data Warehousing and Big Data, for example. We’re all amassing big data; it doesn’t easily fit all together on one small computer, so we go and create a giant computer or, even better, a cluster of small computers that looks like a giant computer to handle the workload.

Managing Computer Clusters IS HARD

cluster solution big data

On paper, as in this diagram, the cluster looks simple, but in reality clusters are complex to set up and hard to keep balanced and running, they introduce a bunch of new problems like sharding skew and workload management, and they are actually not that easy to expand… it’s nuts! The whole ecosystem is fragile and difficult.

Unless you’re one of those few corporations that truly have a big, singular dataset, the ideal of the giant centralized Data Warehouse of everything, from a practical perspective, is just not that productive or necessary. Why make things harder than they need to be?

What makes more sense is a performant, easy-to-understand, easy-to-set-up, easy-to-manage, easy-to-change environment. For analytics, I propose you take a step back and consider individual servers, but performant and easy-to-manage ones.

Consider a Vector Farm to easily harvest business objectives.

Vector Farms can be very flexible. You could have a Homogenous Vector Farm, in which you have a collection of individual servers that look alike, each with the same Actian Vector setup. Management is easy because each server is independent (so there’s no cluster complexity), each one has the same database setup, and it’s easy to add or delete servers without affecting the others. Plus, Actian Vector itself requires little to no tuning, so there is little individualized setup. You can even introduce slight variations, giving you a Heterogenous Vector Farm; in that case, even though the application setup might be a little different on each server, Actian Vector still provides extreme performance and easy administration without the complexity of a cluster. Here are examples:

Independent Groups of Like-Users Using the Same Application but Needing Separate Data

This scenario is the simplest. The Vector Farm is especially useful because it easily allows you to separate user data. An example of this could be a multi-national corporation that legally needs to keep European data on-premises and separate from Canadian and United States data, etc. Notice here that everything about the database server setup is the same, including the database name. Management of all these servers is the same. Actian Vector provides tremendous query performance to all users.

homogenous vector farm

Independent Groups of Different-Size Users Using the Same Application but Needing Separate Data

In reality, most groups of users are not the same. Take a software-as-a-service (SaaS) provider, for example. A SaaS provider might provide a cloud service to a variety of customers, big and small, over the internet. In this case, the provider certainly would not want to dedicate an entire server (virtual or on-premises) to very small customers. Instead, using schemas (a method of creating separate ownership in a singular database), the provider can group small-customer usage on one Vector instance while servicing others with their own instances. In cases where there is a particularly large customer, they could make that server bigger while still maintaining the same structure. Management of all these servers is still the same, using schemas. Actian Vector provides tremendous query performance to all users.

schemas with vector farm

A VERY Large Group of Demanding Users

Another scenario is when the analytics user-base is big and these users need reliable, fast performance. An example of this is a financial services firm that offers a trading application. Traders need very fast, complex analysis on real-time data. A complex cluster, with its overhead of many moving parts, physically cannot provide this. In this scenario, a Homogenous Vector Farm can be used as a pool to service all users. A real-time service bus or message queue can be used to synchronize the multiple servers in real time. Users are balanced to any one of the available servers in the Farm. Again, management is easy because all the servers are exactly the same. Actian Vector (THE WORLD’S FASTEST ANALYTIC DATABASE) provides tremendous response times.

vector farm large
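
To make the idea concrete, here is a rough Python sketch of fanning one real-time event out to every member of the farm over ODBC. It is purely illustrative: the DSN and table names are made up, and a local in-process queue stands in for the real service bus or message queue:

import queue
import pyodbc

# Hypothetical DSNs, one per server in the Homogenous Vector Farm.
FARM_DSNS = ["VECTOR_NODE1", "VECTOR_NODE2", "VECTOR_NODE3"]

# A local queue stands in for the durable message bus used in production.
trade_events = queue.Queue()
trade_events.put((1001, "ACME", 250.75))

def apply_everywhere(event):
    """Apply one real-time event to every server in the farm."""
    trade_id, symbol, price = event
    for dsn in FARM_DSNS:
        conn = pyodbc.connect("DSN=" + dsn)
        try:
            conn.execute(
                "INSERT INTO trades (trade_id, symbol, price) VALUES (?, ?, ?)",
                trade_id, symbol, price)
            conn.commit()
        finally:
            conn.close()

while not trade_events.empty():
    apply_everywhere(trade_events.get())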

Independent Users and Applications

The final scenario is a truly Heterogenous Vector Farm. In this scenario, users are not necessarily segmented; there are different analytics applications with different database structures. Because the applications are different, there is no operational need to keep data in a central location on a complex cluster. And because Actian Vector is so performant and so easy to set up and manage, a Vector Farm is an effective way to support these users and applications.

independent users vector farm

Conclusion

Why torture yourself with a complex cluster if you don’t have to? Because of its performance, ease of administration, and little to no custom tuning, it is easy to take advantage of individual servers in an Actian Vector Farm. Because you don’t have to go through the complex analysis of building and maintaining a cluster, and because Vector requires little tuning, you can spin up components of your Vector Farm quickly and start “harvesting” business value right away!

More About Actian Vector

Want to learn more about Actian Vector?

You can also download and try Actian Vector yourself (or you can try Vector Community Edition on AWS without having to get an evaluation license). You won’t be disappointed, and if you need help figuring things out, just ask the Community.


About Mary Schulte

Mary Schulte is Senior Sales Engineer at Actian, drawing upon decades of experience with powerhouse database vendors like Informix and Netezza. She has written thousands of lines of Informix 4GL and ESQL/C for global clients, including American Airlines' groundbreaking datablade implementation. Mary has delivered countless training sessions, helping organizations optimize their database environments. Her posts on the Actian blog center on query performance, analytics databases like Vector, and practical tips for leveraging Informix. Browse her articles for expert guidance.
Data Management

Accessing a PSQL Zen Database With Python and ODBC

Actian Corporation

January 4, 2018


As an old-school programmer from a few decades ago, I decided to see how hard it would be to access a Zen (formerly known as PSQL) database from my newly acquired knowledge of Python. In case you’re not familiar with it, Zen is Actian’s embedded zero-admin, nano-footprint hybrid NoSQL and SQL database software, and Python is a high-level programming language frequently used in data science and internet applications (among many other things).

Already armed with some knowledge of Actian Zen, all I had to do was complete an EdX introductory course in Python programming, then download and install Python 3.6 along with the corresponding pyodbc library.

The simple program to connect Zen with Python shown below is about 30 lines of code!  It connects to the “demodata” database, allocates a cursor, and executes a series of SQL statements to DROP TABLE, CREATE TABLE, INSERT, and SELECT.  How easy is that? I even managed to prompt for input and use a parameter in the INSERT for the user-provided value.  Here is the code:

import os
import sys
import pyodbc

def main():
   conn_str = 'Driver={Pervasive ODBC Interface};server=localhost;DBQ=demodata'
   db = pyodbc.connect(conn_str)
   c = db.cursor()
   c.execute("DROP TABLE IF EXISTS test_table")
   c.execute("CREATE TABLE test_table (id identity, name char(32), create_date date)")

   isql_dml = """INSERT INTO test_table VALUES (0, ?, CURRENT_DATE())"""
   iinserting = True
   while iinserting:
      new_name = input('Enter name to insert, Q to quit: ' )
      if new_name.lower() == 'q':
         iinserting = False
      else:
         if new_name == '':
            print(' Please enter a non-empty string.')
         else:
            print(' Inserting:', new_name)
            c.execute(isql_dml, (new_name,))
            c.commit()

   c.execute("SELECT COUNT(*) FROM test_table")
   row = c.fetchone()
   if row:
      print('You inserted', row[0], 'name(s)')
   return 0

if __name__ == "__main__":
   sys.exit(main())

It’s easy to see how simple yet powerful this access is, and how easily it could be used for testing and customer support scenarios.
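
For instance, a support engineer could reuse the same connection pattern for quick ad hoc checks against the data. Here is a small sketch that queries the test_table created above; the query itself is only an illustration:

import pyodbc

conn_str = 'Driver={Pervasive ODBC Interface};server=localhost;DBQ=demodata'
db = pyodbc.connect(conn_str)
c = db.cursor()

# Spot-check the rows inserted today (create_date was set to CURRENT_DATE()).
c.execute("SELECT id, name, create_date FROM test_table WHERE create_date = CURRENT_DATE()")
for row in c.fetchall():
    print(row[0], row[1].strip(), row[2])

db.close()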

Download the Actian Zen trial today and write your first Python ODBC app! 30-day trial versions are available on the Actian website. If you have any questions about Zen or other Actian products please feel free to ask in our community forums.

Data Integration

Building Actian DataConnect Integrations That Work

Actian Corporation

December 29, 2017


Recently I wrote an article entitled “Why Actian DataConnect Goes Head to Head with the Big Boys”. I thought it might be beneficial to show a use case for why this is important.

As a developer and data integrator, many concepts are important when designing a product. Some of these are repeatability, scalability, stability, reliability, usability, and more. I have two great examples where DataConnect meets this mark.

The first use case was a project for a customer that used an ERP solution for government contractors called Deltek Costpoint. In late 2008, I built 5 integrations that fed data into Costpoint, or pulled data out of Costpoint and sent it outbound to a 3rd party company using Maximus. These integrations went through customer acceptance testing with few changes needed during the first three months, and very minor changes since then. The great thing about these integrations is that they have been running consistently and reliably for more than 6 years without any maintenance. This speaks highly of the product in terms of stability and reliability.

Actual Case: 

The process below takes an inbound file from IBM Maximo (enterprise software used to track the operation, maintenance, and disposal of assets) and builds out a work order in Deltek’s Costpoint.

inbound file imb maximo

Ralph Huybrechts, CFO of the Keta Group, LLC, had this to say: “We contracted with Deltek, the software supplier, who assigned David Byrd to write several Pervasive integrations between the prime and subcontractor’s accounting and timekeeping software and Maximo. David wrote, tested and finalized these integrations in a 45-day phase-in period prior to the start of our large base operations support contract with the Army. This contract requires 350 employees and handles 5,000 service orders per month. The integrations have performed flawlessly since the start of the contract in 2010.”

The second example is one I recently wrote about in another article, “Web-Services Best Practice: Using parallel queueing to streamline web-service data loads”, which explains why this matters. For this use case, I designed an integration that would run Accumulator web-services messages, as well as others, that were stored in a database table. These messages were created by multiple integrations and fed into the table. Here is the cool part: a single integration picks up these different types of messages, sets the connection parameters on the fly from data stored in the table alongside the message (such as the URL endpoint and the user and password credentials), seamlessly processes the web service call, and stores the response in the table for later processing. Simply put, this integration connects to multiple web-service endpoints without hardcoding the required parameters in the integration, thereby demonstrating scalability.

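To make the pattern concrete, here is a rough sketch of the same queue-driven flow in Python. It is purely illustrative: the real implementation is a DataConnect process, not Python, and the table, column, and status names here are made up:

import sqlite3      # stands in for the real staging database
import requests

def process_queue(queue_no, db_path="staging.db"):
    db = sqlite3.connect(db_path)
    rows = db.execute(
        "SELECT msg_id, endpoint_url, username, password, payload "
        "FROM ws_messages WHERE status = 'NEW' AND queue_no = ?", (queue_no,))
    for msg_id, url, user, pwd, payload in rows.fetchall():
        # Connection parameters come from the row itself - nothing is hardcoded.
        resp = requests.post(url, data=payload, auth=(user, pwd), timeout=60)
        # Store the response alongside the message for later processing.
        db.execute("UPDATE ws_messages SET status = 'DONE', response = ? WHERE msg_id = ?",
                   (resp.text, msg_id))
        db.commit()
    db.close()

# Each queue number can be run as its own process, mirroring the parallel .bat files.
if __name__ == "__main__":
    import sys
    process_queue(int(sys.argv[1]))
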
Actual Case: 

The process below reads a database table for messages to process, takes each message and the required credentials, submits the message to a web service, waits for the response, and stores it back in a table. In addition, the process is run from the command line with a queue number passed in. Using similar .bat files, this process has run 75 queues at the same time.

webservices bat files

bat files 75 queues

But it goes further: the same design concept can be taken a step further using Oracle CX endpoints. One might consider a similar integration to load Oracle Sales Cloud from data provided by exports from CRM On Demand; however, this could run into some issues with CRM On Demand attachments, especially large ones. The bulk export out of CRM On Demand provides all the small file attachments, but not the large ones. It can provide their Attachment IDs, which you feed into bulk CRM Attachment Export requests, storing each attachment from the response in the database. So, in this case, data is not only fed into a web service for loading but also fetched out of a web service, all through this single integration. The best part of this integration design is that the integration does not care where the XML message came from or is going to; it just takes the message, connects, sends it, and then stores the response. This is the height of repeatability.

Actual Case: 

The process below is very similar to the process described above; in fact, it pretty much does the same thing. The first exception is that the different types of messages are not going into HealthEdge (a solution for the healthcare payor market); they are used in a data migration project that takes data exported from Oracle CRM On Demand and builds it into messages that are loaded into Oracle Sales Cloud. The second exception is that this was built in Actian DataConnect v10.

 

The Bat file below is very similar to the above process. It shows the differences in running a v10 process from the command line compared to DataConnect v9.

bat file dataconnect

task scheduler bat

Chris Fuller-Wigg, Director of Sales Automated Services, stated, “The efficiency gains we experienced when loading data in parallel is kind of unreal, almost 10x faster than serial. We found ourselves losing a whole day for Accounts to load, only to push the button to load Contacts the following day. Cutting out the wait time and letting the system process multiple loads at once allows us to load data 1.5 weeks earlier on average.”

Lawrence Chan, Sr. Sales Automated Services Consultant, added, “The value of this solution is not only limited to the incredible improvements in data migration speed. With one click, we can have your system’s data up to date the day before going live with one click of a button. With proper planning in place, those late nights getting your data up to date will be a thing of the past.”

The fact that Actian DataConnect fulfills the meaning of all these terms makes for the ultimate customer experience. The customer here is two-fold: the first is the developer, who is able to define and build a trusted, flexible integration, and the second is the end-customer, who gets the data to work the way they want it. This is a win for Actian, a win for the developer and a win for the end-customer.

You can read more about Actian DataConnect on our website, and if you have any questions feel free to ask the DataConnect Community.

Data Architecture

High-Performance Real-Time Analytics on Hadoop Data

Mary Schulte

December 27, 2017


The Challenge

I have spent many years working with Actian’s customers on database solutions, and thought it’d be useful to discuss a recent customer experience at a large media company (I will call it “XYZCo”). This experience is similar to what I continue to see with other customers, and the lessons learned apply to both Hadoop and non-Hadoop use cases alike.

In this instance, the Big Data Analytics team at XYZCo came to Actian because they needed high-performance real-time analytics for their Hadoop data, and their current systems were not able to meet business expectations. They were under tremendous pressure to enhance the productivity of their data analysts as analytics on some large data sets would never finish, and they needed to reduce data processing overhead and delays to enable real-time data warehouse analytics and reporting.

The Situation

The customer had two very large fact and dimension tables in their database. To answer business questions about customer churn, lifetime value, and market, they needed to join the two together to run queries.

Because legacy databases and Hadoop solutions are so inefficient at joining data (sometimes, large joins never finish), XYZCo had adopted a practice of joining the tables together and creating an intermediate pre-joined/materialized table, against which subsequent queries were run. This methodology functioned, but it had several problems: the pre-joined materialization was HUGE, far larger than table1 plus table2 (not just copies of table1 plus table2); and immediately after creation, the materialization was “stale”: it could not be continuously updated from the ever-changing table1 and table2, but remained frozen at the values that existed at the time of creation. Furthermore, the query performance against this materialization was not that great, and they had some queries that just wouldn’t work at all… the data was just too big.

In addition, this customer had large volumes of structured data in a Hadoop data store. They were using Hive along with SQL-based analytics tools, including Tableau. The customer was accustomed to pre-joining data across multiple source tables for the “perceived” benefit of simpler/faster querying by other tools and processes downstream. Most of these aggregates/joins were then exported to CSV for input into other tools, or materialized for “perceived” simpler SQL, because the schema became one flattened source table.
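As a minimal sketch of that old pattern, using hypothetical column names since the actual XYZCo schema is not shown here, the workflow looked something like this:

-- Old approach (sketch, hypothetical columns): pre-join the fact table
-- (table1) and the dimension table (table2) into a materialized copy,
-- then run all downstream reporting (and CSV exports) against that copy.
CREATE TABLE prejoin_materialization AS
SELECT t1.cust_id, t1.event_date, t1.amount,
       t2.region, t2.segment
FROM   table1 t1
JOIN   table2 t2 ON t2.cust_id = t1.cust_id;

-- Subsequent queries read the large, already-stale copy.
SELECT region, SUM(amount) AS total_amount
FROM   prejoin_materialization
GROUP  BY region;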

Comparing Solutions

Old-School Materialization Test

XYZCo asked us to also perform the old-school style materialization test using Actian VectorH, Actian’s high-performance columnar SQL database that runs natively in Hadoop, with the idea that this would be a direct comparison to Hive. We did so.

A large part of any materialization is the time spent doing I/O to write out the large, materialized result set. Although VectorH was much faster than Hive on the join itself, the I/O portion of the materialization process dwarfed the join component, fast or slow. As a result, VectorH performed the entire materialization in about the same timeframe as Hive, because most of the time went to writing out the result set, which was the same size for both products.

Once the materialization was built, we ran the queries against it; VectorH was about twice as fast as the other product. Remember for later that the materialization is far larger than the source tables.

Although impressive, these results pale in comparison to what I will tell you next about the Actian on-demand approach.

Real-Time Analytics With Actian VectorH On-Demand

After performing the old-school intermediate materialization test above, we wanted to show the customer a better way. We demonstrated that with VectorH, running queries against the base data, with the join done on demand and no intermediate materialization step, was orders of magnitude faster.
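As a minimal sketch, using the same hypothetical names as before, the on-demand approach simply moves the join into the query itself, so no intermediate copy is ever built:

-- On-demand approach (sketch, hypothetical names): join the base tables at
-- query time, so results always reflect current data and each query scans
-- only the columns and rows it needs.
SELECT t2.region, SUM(t1.amount) AS total_amount
FROM   table1 t1
JOIN   table2 t2 ON t2.cust_id = t1.cust_id
WHERE  t1.event_date >= DATE '2017-01-01'
GROUP  BY t2.region;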

actian vectorh on demand approach

Here is the raw data captured on the customer’s Hive and VectorH systems for single-run results. Note that the VectorH tests were run on a cluster half the size of the Hive cluster. VectorH exhibits excellent linear-scalability characteristics, so we normalized the results to an equivalent platform.

As mentioned before, the materialization itself took both products about the same time to build, and VectorH was about twice as fast on queries against it. What is more interesting, however, are the on-demand join scenarios, which Hive was unable to complete at all and in which Actian VectorH showed tremendous speed.

customer evidence actian vectorh

The above benefits are for single runs. What happens in real-life situations, with hundreds of customer queries and ongoing updates to the data?

Below is a table that shows what happens with just 100 and 200 ensuing customer queries, plus one refresh of customer data. Notice that, given the speed of VectorH, the old solution, even using pre-materializations, is always slower, incurs time for re-materialization, and simply never catches up. The Actian VectorH approach crushes the competitor in performance, and that is on top of the other benefits: results on current data rather than stale data, saved disk space, less administration, and so forth.

serial query example actian

Please also note that these results don’t even account for concurrency. VectorH handles concurrency extremely well; the numbers above would be even better for VectorH had we captured them in a concurrent workload.

Conclusion

In the on-demand solution, query results ALWAYS reflected current data rather than a stale materialization, which allowed for real-time analytics and more accurate results.

Analysts no longer had to wait around for a snapshot to be created, and their joins could be done ad hoc.

Net performance was significantly faster than against a giant pre-aggregation, because each query scans only the data it needs rather than every row of a gigantic materialization, resulting in less expensive I/O and no downtime for rebuilding the intermediate materialization.

Views were used to hide the join logic from end users, thereby simplifying their queries (a minimal example follows below). Since there is no duplication of data via materialization, storage space is kept to a minimum.
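For illustration only, a view over the same hypothetical base tables used in the sketches above might look like this; end users query the view as if it were a single flattened table, while the join runs on demand underneath:

-- Sketch with hypothetical names: the view hides the join logic from end users.
CREATE VIEW v_customer_activity AS
SELECT t1.cust_id, t1.event_date, t1.amount,
       t2.region, t2.segment
FROM   table1 t1
JOIN   table2 t2 ON t2.cust_id = t1.cust_id;

-- End users then write simple queries against the view:
SELECT region, COUNT(*) AS activity_count
FROM   v_customer_activity
GROUP  BY region;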

Without a fast JOIN engine like VectorH, this approach is usually impractical for very large data sets.

Next Steps

Want to learn more about Actian VectorH?

You can also download and try Actian Vector yourself. You won’t be disappointed, and if you need help figuring things out, just ask the Community or contact eval@actian.com to request a 30-day evaluation of VectorH with free Enterprise Support.

mary schulte headshot

About Mary Schulte

Mary Schulte is Senior Sales Engineer at Actian, drawing upon decades of experience with powerhouse database vendors like Informix and Netezza. She has written thousands of lines of Informix 4GL and ESQL/C for global clients, including American Airlines' groundbreaking datablade implementation. Mary has delivered countless training sessions, helping organizations optimize their database environments. Her posts on the Actian blog center on query performance, analytics databases like Vector, and practical tips for leveraging Informix. Browse her articles for expert guidance.
Data Architecture

Fast. Faster. Fastest.

Actian Corporation

December 22, 2017

I was fortunate enough to attend the Actian Hybrid Data Conference, at which Craig Strong, Chief Technology and Product Officer at Hubble, talked about how they provide real-time corporate performance management (CPM) through high-speed analytics dashboards. You can watch a video recording of this presentation on the linked conference sessions page.

For me, the highlight of the presentation was when Craig shared the results of the performance tests they had conducted across a selection of databases including SQL Server, MemSQL, SAP, Presto, Spark, and Redshift. I was blown away by just how much faster Actian Vector was in comparison to the other databases. In the chart below I have organized the results so you can see the fastest query times at the top:

Hubble Results

Hubble used complex queries against an ERP data set. I would be interested to see similar tests against other data sets, as the CPU optimization technology Vector uses is equally applicable to a broad range of other analytical use cases.

actian avatar logo

About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale, streamlining complex data environments and accelerating the delivery of AI-ready data. The Actian data intelligence approach combines data discovery, metadata management, and federated governance to enable smarter data usage and enhance compliance. With intuitive self-service capabilities, business and technical users can find, understand, and trust data assets across cloud, hybrid, and on-premises environments. Actian delivers flexible data management solutions to 42 million users at Fortune 100 companies and other enterprises worldwide, while maintaining a 95% customer satisfaction score.
Data Management

How I Analyzed 165 Million Flight Records in Seconds on My Laptop

Actian Corporation

December 19, 2017

blue background depicting flights

It was surprisingly easy to analyze 165 million flight records on my laptop. It took me just an afternoon following the Actian Evaluation guide, which you can download from here.

Over the years, scientists working with Intel needed to bring down the cost of high-performance computing. The key vector-processing capability they needed was the ability to operate on large arrays of data in a single CPU instruction cycle. Actian has accelerated standard SQL database requests to take advantage of this vectorization: Actian Vector translates standard SQL into relational algebra, so queries can often respond in a hundredth of the time they would take on a standard relational database. Since joining Actian, I have seen demonstrations and heard customers rave about Actian Vector, so I jumped at the idea of trying it for myself and creating a how-to video. The evaluation guide stepped me through the database install and sample data load, and provided queries to run against the 165-million-row data set of historic airline flight records.

My laptop has a multi-core 64-bit Intel processor and the 106 GB of available disk space needed to try Vector for myself. It took me just an afternoon to run through the process of downloading the software and the raw flight data, creating the database, and installing, loading, and running the six supplied queries. Unzipping the more than 300 CSV files of raw data was the longest step. The supplied load scripts create a fact table and a single dimension table. I didn’t create any indexes or perform any tuning; I simply created the tables and generated statistics to inform the query optimizer about the data.
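The actual DDL, load scripts, and queries all come from the evaluation guide; purely as an illustrative sketch (the table and column names here are simplified and hypothetical), the work has roughly this shape:

-- Simplified, hypothetical sketch of a flight-records fact table and a
-- typical analytic query (the real schema and queries are in the guide).
CREATE TABLE flights (
    flight_date  DATE,
    carrier      CHAR(2),
    origin       CHAR(3),
    dest         CHAR(3),
    dep_delay    INTEGER,
    arr_delay    INTEGER
);

-- Example query over all 165 million rows: average arrival delay per carrier.
SELECT carrier, AVG(arr_delay) AS avg_arr_delay
FROM   flights
GROUP  BY carrier
ORDER  BY avg_arr_delay DESC;

Statistics for the optimizer can then be generated with the standard Ingres/Vector tooling (for example, the optimizedb utility).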

I have installed many databases over the years, including the relational databases Oracle, DB2, and SQL/DS. Never has getting to this kind of performance been so easy. I recorded the whole process and edited it down to a seven-minute video, so you can see every step for yourself by clicking here.

actian avatar logo

About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale, streamlining complex data environments and accelerating the delivery of AI-ready data. The Actian data intelligence approach combines data discovery, metadata management, and federated governance to enable smarter data usage and enhance compliance. With intuitive self-service capabilities, business and technical users can find, understand, and trust data assets across cloud, hybrid, and on-premises environments. Actian delivers flexible data management solutions to 42 million users at Fortune 100 companies and other enterprises worldwide, while maintaining a 95% customer satisfaction score.
Databases

Database History in the Making: Ingres Meets X100

Emma McGrattan

December 19, 2017

background for strata data new york

Doug Inkster is an Actian Fellow with a long history in the database market, starting with IDMS in the 1970s and moving to Ingres in the 1990s.  We asked him to reminisce about the most exciting times in his long career.  Here are some of his thoughts:

In my 40+ years of working with and developing database management software, one of the best days was meeting Peter Boncz and Marcin Żukowski for the first time. I was in Redwood City leading query optimizer training for the performance engineering team at Ingres Corp. (now Actian), and Peter and Marcin were in the Bay Area to give a lecture at Stanford University.

Dan Koren, director of performance engineering, invited them to discuss the MonetDB/X100 technology, which was the subject of Marcin’s Ph.D. research under Peter’s guidance. Dan was a great fan of the MonetDB research done largely by Peter at CWI (the Dutch government-funded centre for research in mathematics and computer science) in Amsterdam, and X100 was a follow-on from MonetDB.

The day started with just the four of us in a conference room at Ingres headquarters, and Marcin kicked it off with a quick overview of their Stanford presentation. Peter and Marcin had experimented by comparing a variety of row-store DBMSs running the first query of the TPC-H benchmark against a hand-written C program equivalent to the same query. The hand-written program was far faster than the fastest DBMS, a result that led to the X100 research project (thus named because of their “modest” goal of beating current database performance by a factor of 100).

X100 Performance Graph

Their research quickly concluded that the overhead of row stores is not limited to their representation on disk. Processing row-store data once it is in the memory of a database server is still highly complex, and the complexity of the code defeats the hardware’s attempts to take advantage of cache locality and in-order instruction execution. In fact, some column stores suffer the same processing problems because they convert the column format back to rows once the data is in server memory. Addressing the columns in question and then performing the operations cell by cell consumes many machine cycles.

They had already addressed some of these issues with MonetDB, but it was still bound by problems of query complexity and scalability. X100 introduced the idea of processing “vectors” of column data at a time and streaming them from operator to operator. Rather than computing expressions or comparing column values one row at a time, the X100 execution engine applies operators to vectors of column values with a single invocation of its expression-handling routines. These routines take the vectors as parameters and consist of simple loops that process all the values in the supplied vectors. This type of code compiles very well for modern computer architectures, taking advantage of loop pipelining, benefitting from locality of reference and, in some cases, using SIMD (single instruction, multiple data) instructions that operate on all values of an input vector at the same time.

The result was a simultaneous reduction in instructions per tuple and cycles per instruction, leading to a massive improvement in performance. It reminded me of the old scientific computers of the 1970s (CDC, Cray, etc.), which could also execute certain instructions on vectors of data simultaneously. Back in the day, however, those techniques were reserved for highly specialized scientific processing – weather forecasting and so forth. Even the modern re-introduction of such hardware features was directed more towards multimedia applications and computer games. The fact that Peter and Marcin had leveraged them to solve age-old database processing problems was brilliant!

Of course, there was more to their research than just that. A major component of X100 was the idea of using the memory hierarchy – disk to main memory to cache – as effectively as possible. Data is compressed (lightly) on disk and only decompressed when the vectors of values are about to be processed. Vector sizes are tuned to balance I/O throughput against cache capacity. But for me, the excitement (and, at the same time, the amusement) was in seeing that hardware designed for streaming movies and playing Minecraft could be used so effectively in such a fundamental business application as database management.

The subsequent uptake of the X100 technology by Ingres quickly led to record-breaking (record-smashing was more like it) TPC-H performance and some of the most enjoyable years of my professional career.

Note: Ingres is still a popular row-oriented RDBMS supporting mission-critical applications, while X100 delivers industry-leading query performance in both the Actian Vector analytic database and the Actian X hybrid database, a combination of Ingres and X100 technologies capable of handling both row-based and column-based tables.

emma mcgrattan blog

About Emma McGrattan

Emma McGrattan is CTO at Actian, leading global R&D in high-performance analytics, data management, and integration. With over two decades at Actian, Emma holds multiple patents in data technologies and has been instrumental in driving innovation for mission-critical applications. She is a recognized authority, frequently speaking at industry conferences like Strata Data, and she's published technical papers on modern analytics. In her Actian blog posts, Emma tackles performance optimization, hybrid cloud architectures, and advanced analytics strategies. Explore her top articles to unlock data-driven success.
Data Management

Connecting to Actian Ingres With PHP and NGINX

Actian Corporation

December 12, 2017

circles depicting data processing

Not long ago I spoke at the Actian Hybrid Data Conference in London about connecting Actian Ingres, Actian’s combination of industry-leading transactional and high-speed analytic databases, to the web. This post shows how to use NGINX, a very popular web server (and reverse proxy, load balancer, etc.) on Linux, to do just that: setting up NGINX with the Ingres ODBC driver and a generic PHP ODBC connector on CentOS. Adapting these steps to other Linux distributions, such as Ubuntu or SUSE, should not be too difficult.

Setting Up PHP With NGINX

Instructions on setting up NGINX can be found online in multiple places. One good set of instructions is How To Install the LEMP Stack On CentOS 7. You only need NGINX and PHP (there is no need for MySQL – obviously you would use Actian Ingres), and, additionally, you need Ingres installed. Once PHP and NGINX are completely set up, you can proceed to the next step.

Setting Up Ingres ODBC With PHP

A generic ODBC package is required for the Ingres ODBC driver to work with PHP. One popular choice is the php-odbc extension. It doesn’t come out of the box with the php package, but it is usually available on all major Linux distributions as an add-on and can be easily installed. On CentOS you would run:

yum install php-odbc

Note: The package names depend on the PHP version. For example, php may be the package name for PHP 5.4, but if you want PHP 7.0 you would install php70w. The names of the additional packages differ in the same way (e.g. php-odbc vs. php70w-odbc).

Another common ODBC option for PHP is PHP Data Objects (PDO).

NGINX Configuration

Those of you who are familiar with Apache will notice that setting up NGINX is a little more complex, because two interconnected pieces are needed to run PHP for the web: the NGINX engine itself and a PHP process manager. This is why there are two sets of settings, one for the NGINX server and one for php-fpm (the PHP process manager). For the Ingres ODBC driver, only php-fpm needs to be configured.

As discussed, this example is for CentOS, but it works similarly for other distributions, though the location of the configuration file may be different. To find out what that is, I suggest checking the instructions for setting up php-fpm for the desired distribution.

Edit the php-fpm configuration file (/etc/php-fpm.d/www.conf). Add the II_SYSTEM directory, the LD_LIBRARY_PATH value, and the ODBCSYSINI location to this file as environment parameters, as shown in the example below.

env[II_SYSTEM] = /opt/Actian/IngresII
env[LD_LIBRARY_PATH] = /lib:/usr/lib:/opt/Actian/IngresII/ingres/lib:/opt/Actian/IngresII/ingres/lib/lp32
env[ODBCSYSINI] = /opt/Actian/IngresII/files

A restart of the php-fpm service would be required after making these configuration changes.

Other Resources

Detailed instructions on setting up PHP with the Ingres ODBC driver along with examples are available at Actian Knowledge Base – Ingres ODBC with PHP.

actian avatar logo

About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale, streamlining complex data environments and accelerating the delivery of AI-ready data. The Actian data intelligence approach combines data discovery, metadata management, and federated governance to enable smarter data usage and enhance compliance. With intuitive self-service capabilities, business and technical users can find, understand, and trust data assets across cloud, hybrid, and on-premises environments. Actian delivers flexible data management solutions to 42 million users at Fortune 100 companies and other enterprises worldwide, while maintaining a 95% customer satisfaction score.