Data Management

How I Analyzed 165 Million Flight Records in Seconds on My Laptop

Actian Corporation

December 19, 2017


It was surprisingly easy to analyze 165 million flight records on my laptop. It took me just an afternoon, following the Actian Evaluation guide that you can download from here.

Over the years, scientists at Intel have worked to bring down the cost of high-performance computing. The key technology they needed was vector processing: the ability to analyze large arrays of data in a single CPU instruction cycle. Actian has accelerated standard SQL database requests to take advantage of vectorization. Actian Vector translates standard SQL into vectorized relational algebra, so your queries can often respond in a hundredth of the time they would take on a standard relational database. Since joining Actian, I have seen demonstrations and heard customers rave about Actian Vector, so I jumped at the idea of trying it for myself so I could create a how-to video. The evaluation guide stepped me through the database install and sample data load, and provided queries to run against the 165-million-row data set of historic airline flight records.

My laptop has a multi-core 64-bit Intel processor and the 106 GB of free disk space needed to try Vector for myself. It took me just an afternoon to run through the process of downloading the software and the raw flight data, installing, creating the database, loading it, and running the six supplied queries. Unzipping the more than 300 CSV files of raw data was the longest step. The supplied load scripts create a fact table and a single dimension table. I didn't create any indexes or perform any tuning; I simply created the tables and generated statistics to inform the query optimizer about the data.

I have installed many databases over the years, including the relational databases Oracle, DB2, and SQL/DS. Never has getting this kind of performance been so easy. I recorded the whole process and edited it down to a seven-minute video, so you can see every step for yourself by clicking here.


About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale, streamlining complex data environments and accelerating the delivery of AI-ready data. The Actian data intelligence approach combines data discovery, metadata management, and federated governance to enable smarter data usage and enhance compliance. With intuitive self-service capabilities, business and technical users can find, understand, and trust data assets across cloud, hybrid, and on-premises environments. Actian delivers flexible data management solutions to 42 million users at Fortune 100 companies and other enterprises worldwide, while maintaining a 95% customer satisfaction score.
Databases

Database History in the Making: Ingres Meets X100

Emma McGrattan

December 19, 2017


Doug Inkster is an Actian Fellow with a long history in the database market, starting with IDMS in the 1970s and moving to Ingres in the 1990s. We asked him to reminisce about the most exciting times of his long career. Here are some of his thoughts:

In my 40+ years of working with and developing database management software, one of the best days was meeting Peter Boncz and Marcin Żukowski for the first time. I was in Redwood City leading query optimizer training for the performance engineering team at Ingres Corp. (now Actian), and Peter and Marcin were in the Bay Area to give a lecture at Stanford University.

Dan Koren, director of performance engineering, invited them to discuss the MonetDB/X100 technology, which was the subject of Marcin’s Ph.D. research under Peter’s guidance. Dan was a great fan of the MonetDB research done largely by Peter at CWI (the Dutch government-funded centre for research in mathematics and computer science) in Amsterdam and X100 was a follow-on from MonetDB.

The day started with just the four of us in a conference room at Ingres headquarters, and Marcin kicked it off with a quick overview of their Stanford presentation. Peter and Marcin had experimented by comparing a variety of row-store DBMSs running the first query of the TPC-H benchmark against a hand-written C program equivalent to the same query. The hand-written program was far faster than the fastest DBMS, and that observation led to the X100 research project (so named because of their "modest" goal of beating current database performance by a factor of 100).

X100 Performance Graph

Their research quickly concluded that the complexity of row stores is not limited to their representation on disk. The processing of row-store data, once it is in the memory of a database server, is still highly complex. The complexity of the code defeats the cache's attempts to take advantage of locality and in-order instruction execution. In fact, some column stores suffer the same processing problems, because they convert the column-store format back to rows once the data is in server memory. Addressing the columns in question, then performing the operations cell by cell, consumes many machine cycles.

They had already addressed some of these problems with MonetDB, but it was still bound by issues of query complexity and scalability. X100 introduced the idea of processing "vectors" of column data at a time and streaming them from operator to operator. Rather than computing expressions on the columns of one row at a time, or comparing column values from single rows at a time, the X100 execution engine runs operators on vectors of column values with a single invocation of the expression-handling routines. The routines take the vectors as parameters and consist of simple loops that process all the values in the supplied vectors. This type of code compiles very well for modern computer architectures, taking advantage of loop pipelining, benefiting from locality of reference and, in some cases, using SIMD (single instruction, multiple data) instructions, which can operate on all values of an input vector at the same time.
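As a rough sketch of the idea (illustrative Python of my own, not X100 code): the difference is between invoking the expression interpreter once per row versus once per vector of column values.

```python
# Tuple-at-a-time: one interpreter invocation per row -- the
# per-value overhead that row engines pay for every expression.
def add_tuple_at_a_time(rows):
    out = []
    for (a, b) in rows:
        out.append(a + b)
    return out

# Vector-at-a-time: one invocation per batch of column values --
# a tight loop over contiguous data that compilers can pipeline
# (and, in C, auto-vectorize into SIMD instructions).
def add_vector_at_a_time(col_a, col_b):
    return [a + b for a, b in zip(col_a, col_b)]

rows = [(1, 10), (2, 20), (3, 30)]
col_a, col_b = [1, 2, 3], [10, 20, 30]
assert add_tuple_at_a_time(rows) == add_vector_at_a_time(col_a, col_b) == [11, 22, 33]
```

Both produce the same answer; the vectorized form simply amortizes the interpretation overhead over a whole batch of values.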

The result was a simultaneous reduction in instructions per tuple and cycles per instruction, leading to a massive improvement in performance. I remembered the old scientific computers of the 1970s (CDC, Cray, etc.), which could also execute certain instructions on vectors of data simultaneously. Back then, however, those techniques were reserved for highly specialized scientific processing, such as weather forecasting. Even the modern reintroduction of such hardware features was directed more toward multimedia applications and computer games. The fact that Peter and Marcin had leveraged them to solve ancient database processing problems was brilliant!

Of course, there was more to their research than just that. A major component of X100 was the idea of using the memory hierarchy (disk to main memory to cache) as effectively as possible. Data is lightly compressed on disk and only decompressed when the vectors of values are about to be processed. Vector sizes are chosen to balance I/O throughput against cache capacity. But for me, the excitement (and, at the same time, the amusement) was in seeing that hardware designed for streaming movies and playing Minecraft could be used so effectively in such a fundamental business application as database management.
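The cache-sizing trade-off can be sketched with simple arithmetic (my own illustration, not X100's actual tuning formula): a vector should be long enough to amortize per-batch overhead, but short enough that the columns an operator touches stay resident in cache.

```python
# Back-of-the-envelope vector sizing: how many values per vector
# fit in cache if an operator touches several columns at once?
def vector_length(cache_bytes, column_widths):
    # Bytes consumed per row position across all active columns.
    working_set_per_row = sum(column_widths)
    return cache_bytes // working_set_per_row

# e.g. a 256 KiB L2 cache and four 8-byte columns:
print(vector_length(256 * 1024, [8, 8, 8, 8]))  # 8192 values per vector
```

Real engines also reserve cache space for intermediate results, so the practical vector size would be smaller, but the principle is the same.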

The subsequent uptake of the X100 technology by Ingres quickly led to record-breaking (record-smashing is more like it) TPC-H performance and some of the most enjoyable years of my professional career.

Note: Ingres is still a popular row-oriented RDBMS supporting mission-critical applications, while X100 delivers industry-leading query performance in both the Actian Vector analytic database and the Actian X hybrid database, a combination of Ingres and X100 technologies capable of handling both row-based and column-based tables.


About Emma McGrattan

Emma McGrattan is CTO at Actian, leading global R&D in high-performance analytics, data management, and integration. With over two decades at Actian, Emma holds multiple patents in data technologies and has been instrumental in driving innovation for mission-critical applications. She is a recognized authority, frequently speaking at industry conferences like Strata Data, and she's published technical papers on modern analytics. In her Actian blog posts, Emma tackles performance optimization, hybrid cloud architectures, and advanced analytics strategies. Explore her top articles to unlock data-driven success.
Data Management

Connecting to Actian Ingres With PHP and NGINX

Actian Corporation

December 12, 2017


Not long ago I spoke at the Actian Hybrid Data Conference in London about connecting Actian Ingres, Actian's combination of industry-leading transactional and high-speed analytic databases, to the web. Here we look at how to use NGINX, a very popular web server (and reverse proxy, load balancer, etc.) on Linux, to do just that. Here's how you can set up NGINX with the Ingres ODBC driver and a generic PHP ODBC connector on CentOS. Adapting these steps to other Linux distributions, such as Ubuntu or SUSE, should not be too difficult.

Setting Up PHP With NGINX

Instructions for setting up NGINX can be found in multiple places online. One good set of instructions is How To Install the LEMP stack On CentOS 7. You only need NGINX and PHP (there is no need for MySQL, since you would use Actian Ingres instead), and you will additionally need Ingres installed. Once PHP and NGINX are completely set up, you can proceed to the next step.

Setting Up Ingres ODBC With PHP

A generic ODBC package is required for the Ingres ODBC driver to work with PHP. One popular choice is the php-odbc extension. It doesn't come out of the box with the php package, but it is usually available as an add-on on all major Linux distributions and can be easily installed. On CentOS you would run:

yum install php-odbc

Note: The PHP version may differ, and the package names differ accordingly. For example, php may be the package name for PHP 5.4, but if you want PHP 7.0 you would install php70w. The names of the additional packages differ in the same way (e.g., php-odbc vs. php70w-odbc).

Another common ODBC PHP extension is PHP Data Objects (PDO).

NGINX Configuration

Those of you who are familiar with Apache will note that setting up NGINX is a little more complex, since two interconnected pieces are needed to run PHP for the web. The NGINX engine is one; a PHP process manager is the other. This is why there are two sets of settings: one for the NGINX server and one for php-fpm (the PHP process manager). For the Ingres ODBC driver, only php-fpm needs to be configured.

As discussed, this example is for CentOS, but it works similarly on other distributions, though the location of the configuration file may differ. To find out where it is, check the instructions for setting up php-fpm on the desired distribution.

Edit the php-fpm configuration file (/etc/php-fpm.d/www.conf). Add the II_SYSTEM directory and the value of the LD_LIBRARY_PATH environment variable to this file as environment parameters, as shown in the example below.

env[II_SYSTEM] = /opt/Actian/IngresII
env[LD_LIBRARY_PATH] = /lib:/usr/lib:/opt/Actian/IngresII/ingres/lib:/opt/Actian/IngresII/ingres/lib/lp32
env[ODBCSYSINI] = /opt/Actian/IngresII/files

A restart of the php-fpm service is required after making these configuration changes (on CentOS 7, for example: sudo systemctl restart php-fpm).

Other Resources

Detailed instructions on setting up PHP with the Ingres ODBC driver along with examples are available at Actian Knowledge Base – Ingres ODBC with PHP.

Insights

Fast Load From Amazon S3 to Actian Vector via Apache Spark

Actian Corporation

December 6, 2017


One of the questions we get asked about Vector Cloud deployments is how to load data from Amazon S3 into Vector in a fast and convenient way. This blog should help answer some of your questions with a step-by-step guide.

S3 is a popular object store for different types of data – log files, photos, videos, static websites, file backups, exported database/CRM data, IoT data, etc. To perform meaningful analytics on this data, you must be able to move it quickly and directly into your choice of an analytic database for rapid insights into that data.

For the purposes of this blog, we are going to use our recently announced Vector Community Edition AMI on the AWS Marketplace. This free AMI gives the developer community a 1-Click deployment option for Vector and is the fastest way to get it running in the AWS Cloud.

Different vendors offer different solutions for loading data and we wanted to deliver a parallel, scalable solution that uses some of the best open-source technologies to provide direct loading from S3 into Vector.

In this blog, we introduce the Spark Vector loader. It has been built from the ground up to enable Spark to write data into Vector in parallel. You don't need to be an expert on Apache Spark to follow the instructions in this blog; you can just copy the steps and learn as you go!

NOTE: If you're familiar with Vector, you'll know vwload, Vector's native utility for loading data into Vector in parallel; it's one of the fastest ways to get data in. vwload currently supports a local filesystem or HDFS for reading input files. The Spark Vector loader has three advantages. First, you can load directly from filesystems such as S3, Windows Azure Storage Blob, Azure Data Lake, and others. Second, you can achieve parallelism even within a single file, since Spark automatically partitions a single file across multiple workers for a high degree of read parallelism; with vwload, you need to split the files manually and provide the splits as input. Third, the Spark loader selects file partitions based on the number of machine cores, so data loading scales with the number of cores even with a single input file. vwload scales with more cores too, but you need to increase the number of source input files to see this benefit.
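To make the manual-splitting contrast concrete, here is a hypothetical sketch (plain Python of my own, not an Actian utility) of the kind of pre-splitting vwload expects: cutting one large CSV into one chunk per core, without breaking rows, so each chunk can be supplied as a separate input file.

```python
# Split a CSV into N roughly equal, line-aligned parts -- the manual
# step the Spark Vector loader performs for you automatically.
def split_csv(path, parts):
    with open(path) as f:
        lines = f.readlines()
    # Ceiling division so every line lands in exactly one chunk.
    chunk = (len(lines) + parts - 1) // parts
    names = []
    for i in range(parts):
        name = f"{path}.part{i}"
        with open(name, "w") as out:
            out.writelines(lines[i * chunk:(i + 1) * chunk])
        names.append(name)
    return names

# e.g. split_csv("big.csv", 8) -> ["big.csv.part0", ..., "big.csv.part7"]
```

A real split of an 18 GB file would stream rather than read everything into memory, but the idea is the same: one split per unit of load parallelism.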

Step 1: Access to a Vector Instance

Go ahead and spin up a Vector instance using the Vector Community Edition on the AWS Marketplace. For this demonstration, we recommend launching the instance in the US East (N. Virginia) region and specifying at least an m4.4xlarge instance (8 physical cores).

NOTE: For performance reasons, you would want to have the EC2 instance in the same region as the S3 bucket where your data resides. In this tutorial, our S3 data resides in US East (N. Virginia).

Step 2: Log in to the Vector Instance

After your Vector instance is running, ssh into it as user actian, using your private key and the public DNS of the EC2 instance:

ssh -i <your .pem file> actian@<public DNS of the EC2 instance>

NOTE: For more information about connecting to the Vector instance, see Starting the Vector Command Line Interface.

Step 3: Download Spark

After you are logged in to the Vector instance, create a directory to store the temporary files you will be working with and switch to it:

mkdir ~/work
cd ~/work

Download and extract the pre-built version of Apache Spark:

wget https://www.namesdir.com/mirrors/apache/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.7.tgz

If the previous wget command does not work or is too slow, point your browser to https://www.apache.org/dyn/closer.lua/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.7.tgz and replace the URL parameter for wget above with one of the mirrors on that page.

Extract the downloaded Spark archive:

tar xvf spark-2.2.0-bin-hadoop2.7.tgz

Step 4: Set Up the JRE in Your PATH

A Java Runtime is required to run the Spark Vector loader.

Vector includes a bundled JRE. Set your PATH to include it:

export PATH=/opt/Actian/VectorVW/ingres/jre/bin:${PATH}

Step 5: Download the Spark Vector Loader

Get the Spark Vector loader for Spark 2.x and extract it:

wget https://esdcdn.actian.com/Vector/spark/spark_vector_loader-assembly-2.0-2.tgz

tar xvzf spark_vector_loader-assembly-2.0-2.tgz

Step 6: Set Up Database and Create the Schema

Create the vectordb database that you will use to load data into:

createdb vectordb

Connect to the database using the sql tool:

sql vectordb

Now you will enter a few SQL commands in the interactive shell to create the schema that matches the demo on-time data you are about to load.

Copy the commands below and paste them into the shell:

create table ontime(
year integer not null,
quarter i1 not null,
month i1 not null,
dayofmonth i1 not null,
dayofweek i1 not null,
flightdate ansidate not null,
uniquecarrier char(7) not null,
airlineid integer not null,
carrier char(2) default NULL,
tailnum varchar(50) default NULL,
flightnum varchar(10) not null,
originairportid integer default NULL,
originairportseqid integer default NULL,
origincitymarketid integer default NULL,
origin char(5) default NULL,
origincityname varchar(35) not null,
originstate char(2) default NULL,
originstatefips varchar(10) default NULL,
originstatename varchar(46) default NULL,
originwac integer default NULL,
destairportid integer default NULL,
destairportseqid integer default NULL,
destcitymarketid integer default NULL,
dest char(5) default NULL,
destcityname varchar(35) not null,
deststate char(2) default NULL,
deststatefips varchar(10) default NULL,
deststatename varchar(46) default NULL,
destwac integer default NULL,
crsdeptime integer default NULL,
deptime integer default NULL,
depdelay integer default NULL,
depdelayminutes integer default NULL,
depdel15 integer default NULL,
departuredelaygroups integer default NULL,
deptimeblk varchar(9) default NULL,
taxiout integer default NULL,
wheelsoff varchar(10) default NULL,
wheelson varchar(10) default NULL,
taxiin integer default NULL,
crsarrtime integer default NULL,
arrtime integer default NULL,
arrdelay integer default NULL,
arrdelayminutes integer default NULL,
arrdel15 integer default NULL,
arrivaldelaygroups integer default NULL,
arrtimeblk varchar(9) default NULL,
cancelled i1 default NULL,
cancellationcode char(1) default NULL,
diverted i1 default NULL,
crselapsedtime integer default NULL,
actualelapsedtime integer default NULL,
airtime integer default NULL,
flights integer default NULL,
distance integer default NULL,
distancegroup i1 default NULL,
carrierdelay integer default NULL,
weatherdelay integer default NULL,
nasdelay integer default NULL,
securitydelay integer default NULL,
lateaircraftdelay integer default NULL,
firstdeptime varchar(10) default NULL,
totaladdgtime varchar(10) default NULL,
longestaddgtime varchar(10) default NULL,
divairportlandings varchar(10) default NULL,
divreacheddest varchar(10) default NULL,
divactualelapsedtime varchar(10) default NULL,
divarrdelay varchar(10) default NULL,
divdistance varchar(10) default NULL,
div1airport varchar(10) default NULL,
div1airportid integer default NULL,
div1airportseqid integer default NULL,
div1wheelson varchar(10) default NULL,
div1totalgtime varchar(10) default NULL,
div1longestgtime varchar(10) default NULL,
div1wheelsoff varchar(10) default NULL,
div1tailnum varchar(10) default NULL,
div2airport varchar(10) default NULL,
div2airportid integer default NULL,
div2airportseqid integer default NULL,
div2wheelson varchar(10) default NULL,
div2totalgtime varchar(10) default NULL,
div2longestgtime varchar(10) default NULL,
div2wheelsoff varchar(10) default NULL,
div2tailnum varchar(10) default NULL,
div3airport varchar(10) default NULL,
div3airportid integer default NULL,
div3airportseqid integer default NULL,
div3wheelson varchar(10) default NULL,
div3totalgtime varchar(10) default NULL,
div3longestgtime varchar(10) default NULL,
div3wheelsoff varchar(10) default NULL,
div3tailnum varchar(10) default NULL,
div4airport varchar(10) default NULL,
div4airportid integer default NULL,
div4airportseqid integer default NULL,
div4wheelson varchar(10) default NULL,
div4totalgtime varchar(10) default NULL,
div4longestgtime varchar(10) default NULL,
div4wheelsoff varchar(10) default NULL,
div4tailnum varchar(10) default NULL,
div5airport varchar(10) default NULL,
div5airportid integer default NULL,
div5airportseqid integer default NULL,
div5wheelson varchar(10) default NULL,
div5totalgtime varchar(10) default NULL,
div5longestgtime varchar(10) default NULL,
div5wheelsoff varchar(10) default NULL,
div5tailnum varchar(10) default NULL,
lastCol varchar(10) default NULL
)
\g

create table carriers(ccode char(2) collate ucs_basic, carrier char(25) collate ucs_basic )
\g

INSERT INTO carriers VALUES ('AS','Alaska Airlines (AS)'), ('AA','American Airlines (AA)'), ('DL','Delta Air Lines (DL)'), ('EV','ExpressJet Airlines (EV)'), ('F9','Frontier Airlines (F9)'), ('HA','Hawaiian Airlines (HA)'), ('B6','JetBlue Airways (B6)'), ('OO','SkyWest Airlines (OO)'), ('WN','Southwest Airlines (WN)'), ('NK','Spirit Airlines (NK)'), ('UA','United Airlines (UA)'), ('VX','Virgin America (VX)')
\g

Now that you've set up the schema, exit the sql shell. Enter:

\q

You are back in the Linux shell.

Step 7: Get and Set AWS Keys

To access the demo data on S3, you must provide the AWS access keys associated with your IAM user. These are two values: the access key ID and the secret access key.

If you are not familiar with IAM access keys, please read https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html#Using_CreateAccessKey to learn how to retrieve or create access keys.

After you have retrieved your access keys, please set them in your environment as follows:

export AWS_ACCESS_KEY_ID=<Your Access Key ID>
export AWS_SECRET_ACCESS_KEY=<Your Secret Access Key>

Step 8: Run Spark-Submit to Perform the Actual Load

Now you're ready to run the Spark loader. The demo data is supplied in four CSV files. Each part is about 18 GB and contains approximately 43 million rows.

Run the following command to load Part 1:

spark-2.2.0-bin-hadoop2.7/bin/spark-submit --packages org.apache.hadoop:hadoop-aws:2.7.2 --class com.actian.spark_vector.loader.Main /home/actian/work/spark_vector_loader-assembly-2.0.jar load csv -sf "s3a://esdfiles/Vector/actian-ontime/On_Time_Performance_Part1.csv" -vh localhost -vi VW -vd vectordb -tt ontime -sc "," -qc '"'

This runs a Spark job that uses the Spark Vector loader to load the data from On_Time_Performance_Part1.csv into Vector.

On my m4.4xlarge instance in the US East (N. Virginia) region, this took about 4 minutes and 23 seconds.

Once the load completes, you will see an INFO message in the console log:

INFO VectorRelation: Loaded 43888241 records into table ontime

Repeat for the other three parts:

spark-2.2.0-bin-hadoop2.7/bin/spark-submit --packages org.apache.hadoop:hadoop-aws:2.7.2 --class com.actian.spark_vector.loader.Main /home/actian/work/spark_vector_loader-assembly-2.0.jar load csv -sf "s3a://esdfiles/Vector/actian-ontime/On_Time_Performance_Part2.csv" -vh localhost -vi VW -vd vectordb -tt ontime -sc "," -qc '"'

spark-2.2.0-bin-hadoop2.7/bin/spark-submit --packages org.apache.hadoop:hadoop-aws:2.7.2 --class com.actian.spark_vector.loader.Main /home/actian/work/spark_vector_loader-assembly-2.0.jar load csv -sf "s3a://esdfiles/Vector/actian-ontime/On_Time_Performance_Part3.csv" -vh localhost -vi VW -vd vectordb -tt ontime -sc "," -qc '"'

spark-2.2.0-bin-hadoop2.7/bin/spark-submit --packages org.apache.hadoop:hadoop-aws:2.7.2 --class com.actian.spark_vector.loader.Main /home/actian/work/spark_vector_loader-assembly-2.0.jar load csv -sf "s3a://esdfiles/Vector/actian-ontime/On_Time_Performance_Part4.csv" -vh localhost -vi VW -vd vectordb -tt ontime -sc "," -qc '"'

Step 9: Run Queries on the Loaded Data

Let’s quickly verify that the data was loaded into the database.

Connect with the terminal monitor:

sql vectordb

In the sql shell, enter:

\rt

All query times henceforth will be recorded and displayed.

Get a count of the rows in the table:

SELECT COUNT(*) FROM ontime\g

This will report a count of about 175 million rows.

Run another query, which lists by year the number of flights delayed by more than 10 minutes per 1,000 flights:

SELECT t.year, c1/c2 FROM (select year,count(*)*1000 as c1 from ontime WHERE DepDelay>10 GROUP BY Year) t JOIN (select year,count(*) as c2 from ontime GROUP BY year) t2 ON (t.year=t2.year)\g

You will see the query results as well as the time Vector took to execute the query. You can now also run the additional sample analytic queries listed at https://docs.actian.com/vector/AWS/index.html#page/GetStart%2FMoreSampleQueries.htm%23 and observe the query times.
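If it helps to see the arithmetic the delay query performs, here is a toy Python sketch (my own illustration, with made-up data) of the same computation: counting delayed flights per year, scaling by 1,000, and dividing by the yearly total with integer division, just as c1/c2 does in the SQL.

```python
from collections import Counter

def delayed_per_mille(rows, threshold=10):
    # rows: (year, departure delay in minutes) pairs
    delayed, total = Counter(), Counter()
    for year, delay in rows:
        total[year] += 1
        if delay > threshold:
            delayed[year] += 1
    # Mirrors c1/c2: delayed*1000 integer-divided by the yearly total.
    return {y: delayed[y] * 1000 // total[y] for y in total}

flights = [  # toy data, not the real on-time data set
    (2016, 5), (2016, 25), (2016, 12), (2016, 0),
    (2017, 40), (2017, 3),
]
print(delayed_per_mille(flights))  # {2016: 500, 2017: 500}
```

So a result of, say, 215 for a year means roughly 215 of every 1,000 flights that year departed more than 10 minutes late.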

To Delete vectordb

To delete the demo database, enter the following command in the Linux shell:

destroydb vectordb

Summary

That concludes the demo of how you can quickly load S3 data into Vector using the Spark Vector loader.

On a side note, if you would like to transform the data before loading it into Vector, you can do the necessary transformations in Spark and then load the data into Vector using our Spark Vector Connector.

If you have any further comments or questions visit the Actian Vector forum and ask away! We’ll try to get your post answered as soon as we can. The Knowledge Base is also a great source of information should you need it.

Actian Life

Jana Whitcomb Joins as VP, Channel Partnerships & Business Development

Actian Corporation

December 5, 2017


I am excited to be at Actian and working with a Bay Area company again to help build our partner and business development organization! While driving the 101 (with all its crazy traffic), I'm reminded of my first job out of college: sales at Otis Elevator. Yes, people actually sell elevators; I helped build out the Cisco campus and the Crossroads buildings. It was while selling elevators in the Bay Area and living in Palo Alto that I got to wondering what everyone was up to here. There seemed to be a lot of growth and nice cars, so I asked my neighbor, "What do you do?" and he said, "Sell software." Long story short, I ended up at my first software sales job at Oracle.

With over 20 years in the business now, it seems like a long time since my first day rollerblading on University Avenue. Born and raised in Kirkland, Washington, I moved back pretty soon after I arrived in the Bay Area and now live in Sammamish, Washington, with my wonderful husband and stud of a dog, Fred. I enjoy driving around in Nanabug, running, golfing, and hanging out with friends. And rooting for the Huskies – go DAWGS!

Databases

Insight to Insights

Actian Corporation

November 28, 2017


A fellow recent hire at Actian, Walter Maguire, told me that many organizations get frustrated with having to wait until late morning for decision-support databases to be loaded and indexed, or for OLAP cubes to be populated with transactional data, before sales analysis can be performed. This is not a new problem. I remember when I worked for British Telecom: our biggest worry was completing the overnight batch updates to our mainframe CA-IDMS database so we could start the IBM CICS transaction processing service, allowing employees to accept bill payments and check balances.

In the retail business, knowing how products sell is critical. In the days before in-store POS systems sent daily updates to HQ, I worked at Coppernob, which owned 126 Top Shop stores. Every Saturday night, couriers collected Kimble tags containing bar codes that we scanned on Sunday to create reports showing sales across the UK. The Ingres database would index the sales tables to work out which designs were hot that week.

Fast-forward to the present, and modern fashion retailers like Kiabi are leveraging Actian Vector for in-depth analysis of sales data, using predictive analytics to fine-tune their marketing strategies, with a particular focus on markdowns. They can now efficiently track and optimize their marketing promotions, pinpointing the most effective markdowns to drive sales. Kiabi's adoption of Actian Vector has transformed the speed at which they can extract valuable insights, accelerating their decision-making. The difference between their traditional Oracle RDBMS and Actian Vector is starkly illustrated in the chart below:

Kiabi’s performance test of Actian Vector query acceleration compared to standard Oracle

Being able to bypass the bulk update and indexing process gives organizations more time to gain insights.

Actian Life

The Big Show: Visit Actian at AWS re:Invent 2017 in Las Vegas

Actian Corporation

November 27, 2017


AWS re:Invent, the largest cloud computing conference of 2017, is here, and it's taking over Las Vegas in a big way. An estimated 40,000 engineers, product leads, marketers, technical architects, and expert users from around the world will descend on The Strip, attending keynotes, boot camps, demos, hackathons, and in-depth hands-on training sessions at the Aria, Venetian, Mirage, MGM Grand, and other venues from November 27 to December 1.

Actian is a partner of AWS re:Invent 2017 and everyone is invited to visit us in the Expo Hall of The Venetian at Booth #1538 (as you enter the Hall from the front, we are in a center column of booths near the AWS Village at the far end). We’ll be at the booth (and the Welcome Reception) on these dates and times:

  • Tuesday, November 28: 10:30 AM – 3:00 PM and 5:00 PM – 7:00 PM (Welcome Reception)
  • Wednesday, November 29: 10:30 AM – 6:00 PM
  • Thursday, November 30: 10:30 AM – 6:00 PM

If you’re new to Actian products, stop by the booth and we’ll be happy to walk you through our portfolio.

We’ll be liveblogging our experience at the show with our Instagram account that you can follow here (or visit @actiancorp when you get the chance). You’ll get to see AWS re:Invent from a unique perspective and learn a bit about Actian along the way. If you happen to visit and post on Instagram (or other social media), please be sure to tag us with #ActianCorp!

Not sure what to do or expect to see at AWS re:Invent when you’re not visiting the Actian booth? You can check out the Campus page to get info about what to see at each venue and how to get between venues (either through the shuttle bus or walking). You can also learn about all of the Keynotes, Bootcamps, Sessions, or just have some fun at the Tatonka Challenge or the Robocar Rally.

Along with the aforementioned Instagram account, you can follow us on Twitter and on LinkedIn to stay connected with what we are up to. If you fancy a job to pursue your passion in data management, data integration, and data analytics, check out our careers page and come join our team – WE’RE HIRING!

We hope you have a fantastic time at AWS re:Invent and we look forward to meeting all of you in person to learn more about Actian’s products, community and customers.

Product Launches

Actian Vector – Community Edition Support for Mac OS X via Docker

Actian Corporation

November 21, 2017

Apple and Docker logos

We’re pleased to announce that Actian Vector 5.0 Community Edition now includes support for Mac OS X via Docker. Over the last few years Mac OS X has become increasingly popular with developers, as has Docker, the virtualization platform that allows you to bundle applications along with their native OS and run them in lightweight containers across a variety of platforms, including OS X. Deploying Actian Vector on a Mac using Docker dramatically reduces setup time and complexity compared to prior versions that relied on Linux virtual machines.

Actian Vector Community Edition Mac OS X users will also benefit from the latest release of Actian Director, which now also includes Mac support via a native Mac installer. Actian Director is a desktop application that makes it easier for Actian Vector users to manage databases, tables, servers (and their components), administer security (users, groups, roles, and profiles), and create, store, and execute queries.

Getting Started With Actian Vector and Docker for Mac OS X

To get started, first register and download the Actian Vector Community Edition.

Then create a work directory, and copy the .tgz file downloaded into that location.

Next, download the zip from the Actian Github repository, extract the Dockerfile and copy that to the work location. You should now have a work directory that looks something like this:

mymac:Docker hanje04$ pwd
/Users/hanje04/Projects/Docker
mymac:Docker hanje04$ ls
Dockerfile
actian-vector-5.0.0-412-community-linux-x86_64.tgz

If you haven’t already done so, download and install Docker and Kitematic (optional, but quite handy if you don’t want to just use Docker from the command line).

Open a Terminal window (Applications->Utilities->Terminal) and run the following command, which will download a minimal CentOS 7 machine image and then install Actian Vector into it:

mymac:Docker hanje04$ docker build -t actian/vector5.0:community .
Sending build context to Docker daemon  31.74kB
Step 1/17 : FROM centos:7

<loads and loads of output>

Successfully built 7cb12d07e583
Successfully tagged actian/vector5.0:community

If all goes well, a new image will be created:

mymac:Docker hanje04$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
actian/vector5.0    community           7cb12d07e583        29 seconds ago      1.92GB

Now we can launch a container from this image.

mymac:Docker hanje04$ docker run --name vector actian/vector5.0:community
Actian Vector (5.0.0)

Vector/ingstart

Checking host "localhost" for system resources required to run Vector...

A new container gets created, and Vector will be started. Running the container this way will show you the Vector startup details, and continue running in the foreground. To stop it, hit Ctrl-C and Vector and the container will shut down.

You can also launch the image in the background by using the -d flag to run it as a daemon:

mymac:Docker hanje04$ docker run --name vector -d actian/vector5.0:community
ef050ac8643cdb8ed04f909c622b1c3b4c49fcc08e731e3c2bbc6e774f260752

mymac:Docker hanje04$ docker ps
CONTAINER ID        IMAGE                        COMMAND             CREATED             STATUS              PORTS                                        NAMES
ef050ac8643c        actian/vector5.0:community   “dockerctl”         20 hours ago        Up 20 hours         16902/tcp, 27832/tcp, 27839/tcp, 44223/tcp   vector

If you issue a second “docker run” command using the same name, you will get an error because the container already exists. After the container has been created, it can be stopped and restarted without losing its data:

mymac:Docker hanje04$ docker stop vector
vector
hanje04-osx:tmp hanje04$ docker start vector
vector

If you need to recreate the container, you must stop and remove it first:

mymac:Docker hanje04$ docker stop vector
vector
hanje04-osx:tmp hanje04$ docker rm vector
vector

NOTE: This will completely destroy the container along with any data or databases in it.

Once we have a running container we can login and use the instance:

mymac:Docker hanje04$ docker exec -it vector bash
[root@1118133200b1 /]# createdb mydb
Creating database 'mydb' . . .

Creating DBMS System Catalogs . . .
Modifying DBMS System Catalogs . . .
Creating Standard Catalog Interface . . .
Creating Front-end System Catalogs . . .

Creation of database 'mydb' completed successfully.
[root@1118133200b1 /]# sql mydb
TERMINAL MONITOR Copyright 2016 Actian Corporation
Vector Linux Version VW 5.0.0 (a64.lnx/412) login
Thu Nov 16 12:18:15 2017
Enter \g to execute commands, "help help\g" for general help,
"help tm\g" for terminal monitor help, \q to quit

continue
*

Getting Started With Actian Director for Mac OS X

You may also want to connect to this instance from an external application, e.g. via Actian Director, which has also recently been made available for OS X. To do so, you need to map the corresponding ports in the image to those on the host machine. For Director these are 16902 and 44223.

hanje04-osx:Docker hanje04$ docker run --name vector -d -p 16902:16902 -p 44223:44223 actian/vector5.0:community
be6b51f7a2e7d1997b94e370c44d93d1a099761bee20cd2cceea0cb76c349e15
hanje04-osx:Docker hanje04$ docker port vector
44223/tcp -> 0.0.0.0:44223
16902/tcp -> 0.0.0.0:16902

For JDBC connections, port 27839 needs to be mapped; for ODBC or Net connections, port 27832.
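Putting the port mappings together, a container intended to serve Director, JDBC, and ODBC/Net clients at once would publish all four ports at launch. The sketch below (a hypothetical helper, not part of the official instructions) builds the full command and prints it for review rather than running it, so you can check it before starting the container:

```shell
# Build the "docker run" command that maps every client port:
# 16902/44223 for Actian Director, 27839 for JDBC, 27832 for ODBC/Net.
# Printed rather than executed so you can review it first;
# to actually launch the container, run: eval "$DOCKER_CMD"
DOCKER_CMD="docker run --name vector -d \
  -p 16902:16902 -p 44223:44223 \
  -p 27839:27839 -p 27832:27832 \
  actian/vector5.0:community"
printf '%s\n' "$DOCKER_CMD"
```

Remember that changing these mappings later recreates the container, so it is worth deciding up front which clients you will need.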

NOTE: If you change these mapped ports after the container is started (e.g. via Kitematic), the container will be re-created from scratch, destroying any data you may have loaded into the database.

Before you can connect, a database password must be set in the container. To do this, connect to the container as before and run the following:

hanje04-osx:Docker hanje04$ docker exec -it vector bash
[root@be6b51f7a2e7 /]# sql iidbdb
TERMINAL MONITOR Copyright 2016 Actian Corporation
Vector Linux Version VW 5.0.0 (a64.lnx/412) login
Thu Nov 16 16:13:56 2017
Enter \g to execute commands, "help help\g" for general help,
"help tm\g" for terminal monitor help, \q to quit

continue
* alter user actian with password=newdbpassword\g
Executing . . .

continue
*
Your SQL statement(s) have been committed.

Vector Version VW 5.0.0 (a64.lnx/412) logout

After installing and launching Actian Director, the Vector instance running in Docker will appear to Director as a local instance. Simply click the “Connect to instance” button, enter “localhost” as the instance, “actian” as the Authenticated User, and the password you just set, then hit Connect.


Experience the Power of Vector Analytics

Now that you are all set up, it’s time to experience the power of Vector Analytics. You can refer to the Actian Vector 5.0 user guide to get familiar with configuring and managing Vector, important concepts, usage scenarios, and developing applications.

Please check the Actian Vector forum and Knowledge Base if you get stuck or have any questions.

Product Launches

Actian Vector – Community Edition Launches on AWS Marketplace

Actian Corporation

November 16, 2017

Find Actian on AWS Marketplace

I’m excited to announce that Actian Vector – Community Edition is now available on the Amazon Web Services (AWS) Marketplace.

The Community Edition is delivered as an Amazon Machine Image (AMI) and is compatible with the AWS 1-Click launch experience.

Why is This Important for You as a Developer?

If you are a SQL analytics application or solution developer who is considering building a high-performing SQL analytics solution in the cloud, or who is struggling with the cost, performance, or maintenance overhead of an existing cloud-based SQL analytics application, then consider Vector Community Edition as the SQL database that drives your application.

What Makes Vector Different?

Vector is different in a number of ways. It was built from the ground up to utilize modern hardware architecture in ways that no other product can. Vector includes a number of innovations that exploit available features of a modern CPU, such as SIMD instructions, larger chip caches, super-scalar functions, out-of-order execution, and hardware-accelerated string-based operations, to name a few.

These innovations enable Actian Vector to achieve record performance and price/performance levels on the Transaction Processing Performance Council’s industry-standard TPC-H benchmark. Additionally, Vector’s innovations can significantly improve your application performance: the users or services that interact with your application can take action much more quickly, since you can provide them with insights much faster than before.

If doing fast SQL analytics is important to your application or service and you’re struggling to do it economically, Vector can help you.

Vector also requires minimal tuning. You load your large analytic datasets into it and run your queries, getting results back almost instantaneously. This frees you from complex DBA tasks so you can focus on your application or service instead.
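To illustrate how little ceremony is involved, here is the kind of ad-hoc query you might run against a flight-delay table. The table and column names (ontime, origin, depdelay) are hypothetical, not from the sample dataset’s actual schema; the statement is stored in a variable and printed for review, to be pasted into Vector’s sql terminal monitor against your own database:

```shell
# Hypothetical example: an ad-hoc aggregate over a flight-delay fact table.
# Table and column names are illustrative; adapt them to your own schema.
# Paste the printed statement into Vector's "sql" terminal monitor to run it
# ("\g" is the terminal monitor's execute command).
QUERY="SELECT origin, AVG(depdelay) AS avg_delay
FROM ontime
GROUP BY origin
ORDER BY avg_delay DESC\g"
printf '%s\n' "$QUERY"
```

No index creation or tuning step appears anywhere in this workflow, which is the point: load, query, done.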

Could You Highlight Some Use Cases for Vector?

Vector can be used for different kinds of analytics applications such as Micro-Segmentation, Campaign Optimization, Market Basket Analysis, Churn Analysis, and so on. For more details you can look at some of the use cases at Actian.com.

What’s Special About This AWS Marketplace Version?

With integration into AWS Marketplace, we’ve greatly simplified Vector deployment. You don’t have to “find” a machine to “download” and “install” Vector anymore. You can deploy it on the AWS Cloud on your choice of hardware and in your selected region with literally one click.

We’ve also added a few nice touches such as automatically configuring Vector when you launch it. This ensures that Vector is tuned to the EC2 instance type that you have chosen to deploy on so that it provides great out-of-the-gate performance. We’ve also ensured that if you change the instance type, it will detect this and use the newly available resources (RAM, CPUs, etc.); this supports both a scale-up or scale-down scenario.

Last, but not least, we’ve provided a real-world sample airline dataset of 175 million rows from the Bureau of Transportation Statistics that enables you to analyze flight delays by running provided SQL queries. You will be impressed by the speed of these queries. If you’re feeling adventurous, you can use that analysis insight to influence booking your next flight 🙂

How Much Does it Cost?

Vector – Community Edition is free to use, and you are welcome to develop whatever you want. You pay only for the underlying AWS infrastructure. The AWS Marketplace presents this cost in a very transparent manner so that you can make a good upfront decision. You won’t be charged for computing resources if you stop your instance (for example, during idle time) and restart it later.

The Community Edition is limited to 250 GB of uncompressed data. If your requirements exceed this, we’d be happy to set you up with an Evaluation version that works with larger data sizes.

How Can I Try the Community Edition?

Head over to the AWS Marketplace listing and proceed with deployment. All you need is an AWS account.

Actian Life

Sparking a Revolution at Big Data London

Actian Corporation

November 15, 2017

The essential guide to Actian Hybrid Data Conference 2017 in London, UK

If you’re visiting what’s billed as the United Kingdom’s largest data and analytics event, you know that Big Data London at the Olympia promises to be a massive gathering of experts and analysts across the fields of Big Data, machine learning, AI, cloud technologies, and more. You’ll be able to learn from numerous pioneers and visionaries of the data community as well as get a unique look at the current state of the local data economy.

Over five thousand attendees are expected to visit this unique two-day event, which is open to everyone, starting on November 15. Big Data London will feature 80 exhibitors, over 100 speakers and use cases, and five theaters with live demos. A keynote presentation will mark the start of each day of the conference, with Neha Narkhede, co-founder and CTO of Confluent, speaking on the rise of the streaming platform and building large-scale operable data systems on November 15, and Amr Awadallah, CTO of Cloudera, discussing machine learning, AI and data analytics on November 16.

Be sure to visit Actian at booth #426 (located near the center of West Hall Level 1 next to the AI Lab Feature) and meet members of our technical and sales teams, who can answer any questions you may have. In addition, Actian technologists will be giving two presentations during the conference:

  • November 15 at 11:10 AM – 11:40 AM in the Fast Data Theater – Mary Schulte, Senior Systems Engineer, will discuss scale-up and scale-out Big Data deployment options for common Enterprise use cases, with simple tried-and-true solutions.
  • November 16 at 3:10 PM – 3:40 PM in the Fast Data Theater – Keith Bolam, Engineering Solutions Manager, demonstrates how to quickly analyze a Billion rows, and covers the 5W’s and H of Interpreting Fast and Fresh Data.

Note that while attendance is free, you’ll still need to register online to get in.

If you’re new to Actian products, here are some of the products in our portfolio we’ll be featuring at Big Data London:

  • Actian NoSQL accelerates agile development for complex object models at enterprise scale.
  • Actian Zen Embedded Database enables zero-admin, nano-footprint, hybrid NoSQL & SQL data management.
  • Actian Vector in-memory analytics database is a consistent performance leader on the TPC-H Decision Support Benchmark over the last 5 years.
  • Actian DataConnect provides lightweight, enterprise-class hybrid data integration.

We hope you have a fantastic time at the conference and we look forward to meeting all of you in person to learn more about Actian’s products, community and customers.

Follow us on Twitter, and on LinkedIn to stay connected with what we are up to. If you fancy a job to pursue your passion in data management, data integration, and data analytics, check out our careers page and come join our team – WE’RE HIRING!

Data Platform

The X-Vector

Actian Corporation

November 1, 2017


As an employee new to Actian I decided to dig into what makes Actian Vector a star performer. Three specific qualities, covered below, caught my eye as I reviewed the technical overview.

Vectorization: When I first heard this term, my memory went back to 30 years ago, when IBM offered my employer, Watson Calculating Services Limited, a free trial of the Vector Facility for our ES9000 mainframe. It was cool because we saw a massive improvement in our FORTRAN applications without having to rewrite them. A simple compiler directive was all that it took to take advantage of vectorization. So, how does this extend to what Vector does? Actian has applied the techniques developed from the acceleration of floating-point operations and high-performance computing using specialized hardware to accelerate database workloads. The result is 100x performance improvements without specialized hardware. Actian provides these performance improvements on industry-standard Intel x86 architecture server processors, transparently, without having to rewrite standard SQL queries.


Hybrid Column Store: Relational databases traditionally store data in a format optimized for row-at-a-time access. For fast analytics on a subset of columns, however, storing data in a compressed columnar format is the way to go: analytics workloads in traditional data warehouses tend to use de-normalized tables to optimize read performance, but rarely analyze whole rows. Vector goes a step further by optimizing the in-memory block format to minimize cache misses, boosting memory access speeds to maximize performance.

Positional Delta Trees: Allowing incremental changes while maintaining transactional read consistency is a tough challenge for columnar databases. Actian Vector maintains full multi-version read consistency, so every new transaction sees all previously committed transactions, and you don’t have to rely on large bulk data loads alone for updates. Actian Vector’s Positional Delta Trees (PDTs) store small incremental inserts, updates, and deletes, so queries run lightning fast and calculations remain consistent despite changes that occur while a query executes.
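To make that concrete, here is a hypothetical sequence of the kind of small DML statements PDTs are designed to absorb without touching the compressed column store. The table name sales and its columns are purely illustrative; the statements are collected into a variable and printed for review, to be fed to the sql terminal monitor against a real database:

```shell
# Hypothetical illustration of small incremental changes that land in PDTs
# rather than the compressed column store. Table "sales" is illustrative.
# Feed the printed statements to Vector's "sql" terminal monitor to run them
# ("\g" is the terminal monitor's execute command).
DML="INSERT INTO sales VALUES (1001, 'EMEA', 250.00)\g
UPDATE sales SET amount = 260.00 WHERE id = 1001\g
DELETE FROM sales WHERE id = 900\g
SELECT region, SUM(amount) FROM sales GROUP BY region\g"
printf '%s\n' "$DML"
```

A query started concurrently with these changes would still see a single consistent snapshot, which is exactly the read-consistency guarantee described above.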

In my judgment, these are some of the many qualities that make Vector stand out from the crowd.

Want to know more? Then visit us at Actian’s Hybrid Data Conference at London’s Amba Hotel on November 9 to talk in person with Actian engineering, executives, and customers. Check out the full agenda and register. It’s free to attend.

Actian Life

Walt Maguire to Lead Actian Pre-Sales Engineering

Actian Corporation

October 26, 2017


Walt Maguire has been a voracious consumer of science fiction novels from an early age, and now he loves getting hands-on experience with the technology of the future. He started his technology career taking apart clock radios to explore how they worked and disassembling the family washing machine. Needless to say, his parents had mixed feelings about his curiosity, which turned to happiness once computers came along. Walt got to play with bits in a CPU, and mom & dad got to keep their washer functional. In the years since then, Walt has spent his career doing nearly everything that can be done with data.

Today he leads the global pre-sales team for Actian and plans to take a very hands-on approach to demonstrating Actian performance and features, showing how they can solve data challenges in today’s business environment. In his spare time, he channels his creativity into turning trees into furniture, and he loves to travel. Please welcome Walt to Actian!
