How I Analyzed 165 Million Flight Records in Seconds on My Laptop

December 19, 2017

It was surprisingly easy to analyze 165 million flight records on my laptop. It took me just an afternoon, following the Actian Evaluation guide that you can download from here.

Over the years, scientists at Intel needed to bring down the cost of high-performance computing. The key vector processing capability they needed was the ability to operate on many data values with a single CPU instruction. Actian has accelerated standard SQL database requests to take advantage of this vectorization. Actian Vector translates standard SQL into relational algebra, so your queries can often respond in one-hundredth of the time they would take with a standard relational database. Since joining Actian, I have seen demonstrations and heard customers rave about Actian Vector, so I jumped at the idea of trying it for myself and creating a how-to video. The evaluation guide stepped me through the database install and the sample data load, and provided queries to run against the 165-million-row data set of historic airline flight records.
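To make the idea of vectorization a little more concrete, here is a minimal sketch in Python. It is not how Actian Vector is implemented, and plain Python does not issue vector CPU instructions. It only contrasts the shape of the two loops: one visits whole rows one at a time, the other walks a single tightly packed column in batches, which is the layout that lets a vectorized engine hand many values to the CPU at once. The `delay` column, the function names, and the batch size are all made up for illustration.

```python
from array import array


def row_at_a_time_total(rows):
    """Sum one field by visiting every row object individually."""
    total = 0
    for row in rows:
        total += row["delay"]
    return total


def column_batches_total(delays, batch_size=1024):
    """Sum the same field from a packed column, one batch at a time."""
    total = 0
    for start in range(0, len(delays), batch_size):
        batch = delays[start:start + batch_size]  # contiguous slice of one column
        total += sum(batch)                       # one call handles the whole batch
    return total


# Hypothetical data: 10,000 flights with delays of 0..9999 minutes.
rows = [{"carrier": "AA", "delay": d} for d in range(10_000)]
delays = array("i", (row["delay"] for row in rows))  # column of packed C ints

assert row_at_a_time_total(rows) == column_batches_total(delays)
```

Both functions return the same total. The difference is that the second one never touches the columns it does not need and works on contiguous blocks of values.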

My laptop has a multi-core 64-bit Intel processor and the 106 GB of available disk space needed to try Vector for myself. It took me just an afternoon to run through the whole process: downloading the software and the raw flight data, installing, creating the database, loading the data and running the six supplied queries. Unzipping the more than 300 CSV files of raw data was the longest step. The supplied load scripts create a fact table and a single dimension table. I didn’t create any indexes or perform any tuning. I only created the tables and generated statistics to inform the query optimizer about the data.
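If you have not worked with a fact table and a dimension table before, the sketch below shows the general pattern on a toy scale. It uses Python's built-in sqlite3 module rather than Actian Vector, and the table names, column names, and sample rows are invented for illustration. They are not the schema or the queries from the evaluation guide. SQLite's ANALYZE statement stands in here for the statistics step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One small dimension table and one fact table (hypothetical schema).
conn.executescript("""
    CREATE TABLE carriers (code TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE flights (
        flight_date  TEXT,
        carrier_code TEXT REFERENCES carriers(code),
        dep_delay    INTEGER
    );
""")

conn.executemany(
    "INSERT INTO carriers VALUES (?, ?)",
    [("AA", "Example Air A"), ("BB", "Example Air B")],
)
conn.executemany(
    "INSERT INTO flights VALUES (?, ?, ?)",
    [
        ("2017-01-01", "AA", 5),
        ("2017-01-02", "AA", 15),
        ("2017-01-02", "BB", 30),
    ],
)

# Gather statistics so the query planner knows something about the data.
conn.execute("ANALYZE")

# A typical analytic query: join fact to dimension, then aggregate.
result = conn.execute("""
    SELECT c.name, COUNT(*) AS flight_count, AVG(f.dep_delay) AS avg_delay
    FROM flights AS f
    JOIN carriers AS c ON c.code = f.carrier_code
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

for name, flight_count, avg_delay in result:
    print(name, flight_count, avg_delay)
```

No indexes are created on the fact table in this sketch, which mirrors the no-tuning approach described above.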

I have installed many databases over the years, including the relational databases Oracle, DB2 and SQL/DS. Never has getting to this kind of performance been so easy. I recorded the whole process and edited it down to a seven-minute video, so you can see every step for yourself by clicking here.

