Data Warehouse

Delivering Real-Time Reporting at Speed and Scale

Emma McGrattan

September 21, 2017

Delivering Real-Time Reporting at Speed and Scale

When a major UK logistics company wanted to improve reporting for its large accounts, they turned to Actian to design, implement and support the underlying database system (“LARS”) using Ingres, HVR and Vector products for its architecture.

The Brief

The customer had around 100 customer accounts representatives dedicated to large accounts, with each rep manually producing their own set of spreadsheet-based daily, twice-daily and ad hoc reports for emailing to their account contacts, based on a range of daily extracts from an Ingres operational-level database.

The customer wanted to standardize the format of the reports and to automate their production in order to save reps’ time, to deliver reports to their accounts in a consistent and timely manner, and ultimately to make it feasible to outsource the function.

The challenge was not just to provide the capability of producing the volume of scheduled complex analytical reports (over 1000 per day, tightly clustered around critical times in mid-morning and mid-afternoon) and simultaneously supporting ad hoc complex report production for 200 users with a response time of seconds, but also a) to do this without significant overhead on the source operational-level database and b) reduce the need for the range of existing Extracts from the operational-level database. An additional requirement was that it should be possible to ‘switch’ other existing applications from the operational-level database to this new database at a future stage, thus mandating the database design to be as similar as possible to the existing operational-level database.

Because of delays to the start of the project (due to changes within the customer’s organization), there was considerable pressure to deliver the project in as short a timescale as possible.

The Architecture

To provide the user-visible front-end analytical and reporting facility a semi-customized package from a partner organization was chosen, based on the Logi Analytics product.

The database schema design was constrained by the source database schema design, which resulted in the need to provide a range of database views involving joins over 12 tables, with some of the tables having over 300 million rows. In order to provide interactive users with realistic response times whilst also servicing the needs of scheduled Reports, Vector was chosen as the ideal DBMS for this database, due to its very high speed of processing complex retrieval queries and its ability to mirror the Ingres source database structure virtually unchanged.

Since the source Ingres database and the target Vector database had essentially similar schemas, HVR (High Volume Replicator) was chosen as the software solution to keeping the Vector database in-line with the source Ingres database. The HVR Capture process reads the Ingres source database transaction log, passes insert and update operations via the HVR Hub to the target machine where the HVR Integrate process reflects the inserts and updates as ‘upserts’ into the Vector database (‘deletes’ were suppressed within HVR, to avoid the regular purges of the source database also resulting in purges of the target database), placing very little load on the source database machine.

The Implementation

Ingres source database runs on an older HP-UX platform, so HVR was installed on a dedicated Linux server to act as its Hub. The Vector database sits on a separate dedicated Linux server. An HVR ‘capture’ component runs on the Ingres machine, captures the source database changes from the transaction log and sends them via the HVR Hub to the HVR ‘integrate’ component running on the Vector server which applies the same changes (via ‘upserts’) to the target Vector database.

To meet the customer’s need for reduced development timescales the project was delivered ready for user acceptance testing in 3 months from the start of development, thanks to Vector’s ability to mirror an Ingres schema with little change.

In order to reduce the number of table joins in the views from 12 down to a more manageable 9, a regularly-scheduled job (running every 10 minutes) was created to maintain a de-normalized table.

The denormalization update job, HVR’s ‘upsert’ job, the large number of scheduled reports, and the interactive users happily co-exist on the Vector server.

Vector Performance

It is often fairly meaningless to quote retrieval response times from a system since there are so many variables involved, but we can provide a flavour of the retrieval performance of the Vector database compared with its Ingres source database. A member of the customer’s IT staff needed to run an unreasonably heavy ad-hoc SQL query against the Ingres source database which ran for 10 minutes before she killed it as untenable. We ran the same SQL against the live Vector database, during ‘prime-time’ activity – it completed in 0.05 seconds. Although this is not a direct comparison since the two databases were running on different platforms and hardware configurations, it does illustrate the dramatic retrieval speed of which Vector is capable.

In fact the performance of Vector was so impressive it changed the specified requirements from the client facing team. The envisioned work practice was to allow up to ~200 complex reports to run between 10AM and 10:30 but Vector was so fast and comfortable at scale that these reports are now all run within 5 minutes of 10AM and that was only limited by the resources (cores, memory, etc.) on the machine.

Customer Satisfaction

The customer was sufficiently impressed with the novel architecture of the LARS implementation that they commissioned a second more challenging Vector-based project to be fed from a continuous message stream. This will be the subject of a future blog entry.

Emma McGrattan headshot

About Emma McGrattan

As SVP of Engineering at Actian, Emma leads research and development for the Actian Vector, Actian Vector in Hadoop, Actian X and Ingres teams. A recognized authority in DBMS and Big Data technologies, Emma is a sought-after speaker at industry conferences. Emma has recently celebrated over 25 years in Ingres and Actian Engineering. Educated in Ireland, Emma holds a Bachelors of Electrical Engineering degree from Dublin City University.