Performance Testing Production Workloads for Ingres and Vector

Sean Paton

November 9, 2021

One of the main concerns of Ingres and Vector customers whose systems need to undergo a significant change, such as a major version upgrade, an OS migration, a major software release, or an increase in usage volumes, is how to test the database properly under realistic conditions in a test environment.

Some customers have heard at user meetings about the Query Recording and Playback (QRP) tool developed by Actian Professional Services. The tool aims to test an Ingres or Vector database under near-to-production conditions by recording a production workload and then playing it back in a test environment. The rest of this post describes the tool and its usage in more detail.

Query Recording

The Query Recording piece of QRP uses the “SC930” tracepoint, which produces a recording of all queries run against the database during the time the tracepoint was turned on. The output from SC930 is essentially a series of text files, containing the queries, parameters, data types, data, etc. – i.e. the load we are aiming to test against. Usage of the tracepoint, and details of the output format that it produces, are further described here:

https://github.com/ActianCorp/Dynamic_Playback/wiki/Ingres-DBMS-Server-Query-Tracing

https://github.com/ActianCorp/Dynamic_Playback/wiki/SC930-Output-Page
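
As a minimal sketch of what recording looks like in practice, the Python snippet below enables SC930 for a single session over ODBC and runs a statement that will then appear in the trace output. It assumes pyodbc and a working Ingres ODBC data source called proddb (a hypothetical name); for a full QRP recording, the trace point is normally enabled server-wide, as the wiki pages above describe.

    # Minimal sketch: enable SC930 query tracing for one session via ODBC.
    # Assumes pyodbc and an Ingres ODBC data source named "proddb" (hypothetical);
    # a full QRP recording normally enables the trace point server-wide instead.
    import pyodbc

    conn = pyodbc.connect("DSN=proddb")
    cur = conn.cursor()

    # Turn the trace point on for this session; statements run from here on
    # are written to the server's SC930 trace files.
    cur.execute("SET TRACE POINT SC930")

    # Any workload run now becomes part of the recording.
    cur.execute("SELECT COUNT(*) FROM my_table")  # my_table is a placeholder
    print(cur.fetchone()[0])

    conn.close()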

Trace points are usually an undocumented way of amending the behavior of Ingres and Vector, but this one is now officially supported and documented in the product manuals. In fact, extra syntax has been added in the latest releases to make it even easier to work with (look for “SET SERVER_TRACE” in the manuals), so you can use it yourself for ad-hoc query analysis. For example, you can now enable query tracing only for specific users running queries against a specific database that take longer than a certain time to run, making its use for diagnostics much easier – see the product documentation for Ingres and for Vector v5 for more details.

Using the SC930 trace point means that the query recording part of the operation can work on older versions of Ingres and Vector that don’t yet support server_trace.

Query Playback

Playback is a separate set of tools, written initially by the Professional Services team at Actian and described in the Actian GitHub repository (binary releases are also published to GitHub, while the main source code is maintained in SVN). It is designed to replay each query against your test system in the same order, and with the same relative timing, as it was recorded.

“Query” in this context means any SQL statement that can be run against the Ingres or Vector database. In fact, even if a “query” is invalid SQL, Playback will still replay it, and it will fail just as it did when it was originally recorded.

Playback is made up of two core pieces:

  • Preprocess:
    • Preprocess reformats the SC930 output files, splitting them into individual sessions and ordering every session, connection, and query by timestamp. This ensures that each session can be played back as a separate thread at the correct time, regardless of what is happening in other threads.
    • Preprocess is written in C# and runs under Mono or .NET.
    • It only needs to be run once per recording.
    • The output files can be produced on one server and played back elsewhere.
  • Playback:
    • A basic playback job opens the individual files produced by Preprocess and starts a separate thread for each session, in the original order and timing. Each thread then runs all the queries in its file in the original order and timing (a conceptual sketch follows this list).
    • Playback is written in C calling standard Ingres Embedded SQL library functions, which is arguably the fastest way to submit a query to an Ingres or Vector database either locally or over a network.
    • This can be run as many times as you choose.
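
To make the threading and timing model concrete, here is a minimal conceptual sketch in Python. It is not the real tool (Playback itself is C code calling the ESQL libraries), and the per-line file format used here (a tab-separated offset and SQL statement) is an assumption for illustration, not the format Preprocess actually emits. The SPEED constant marks where an accelerated run, of the kind described below, would plug in.

    # Conceptual sketch of a playback job: one thread per recorded session,
    # each query fired at its original offset from the start time.
    # The "offset<TAB>sql" file format is hypothetical; the real tool reads
    # the files produced by Preprocess.
    import threading
    import time
    import pyodbc

    SPEED = 1.0  # 2.0 would replay at twice the recorded speed

    def replay_session(path: str, start: float) -> None:
        conn = pyodbc.connect("DSN=testdb")  # hypothetical test-system DSN
        cur = conn.cursor()
        with open(path) as f:
            for line in f:
                offset, sql = line.rstrip("\n").split("\t", 1)
                # Wait until this query's (speed-adjusted) offset is reached.
                delay = start + float(offset) / SPEED - time.monotonic()
                if delay > 0:
                    time.sleep(delay)
                try:
                    cur.execute(sql)
                except pyodbc.Error as exc:
                    # Invalid SQL fails here, just as it failed when recorded.
                    print(f"{path}: query failed: {exc}")
        conn.close()

    session_files = ["session_0001.txt", "session_0002.txt"]  # from Preprocess
    start = time.monotonic()
    threads = [threading.Thread(target=replay_session, args=(p, start))
               for p in session_files]
    for t in threads:
        t.start()
    for t in threads:
        t.join()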

Playback can help test the consequences of adding load to the system; of hardware changes, such as reducing cores or memory; of increases in data volumes or user counts; and of upgrades or patches measured against a performance baseline. For example, we can run playback at twice the recorded speed to test the impact of greater contention for system resources. Greater load can unearth issues with the system, application, or database configuration so that these can be addressed before the cut-over.

Playback Considerations

Earlier I said the tool could re-create “near-to-production conditions”, because the reproduction is never going to be perfect. Firstly, hardware inconsistencies, even between servers with apparently identical specifications, can cause differences. The database can also be accessed in a variety of ways and from a variety of languages, each with their own quirks. There are also some interesting timing effects around query submission in a contended environment. Finally, the application is often housed on one or more separate servers or client machines, so its queries arrive over the network in a host of ways.

Because of this, formal benchmarking that delivers consistently repeatable results is not straightforward. We have found that running multiple playbacks of the same data files can produce different runtime results. Each session should start in the same order as the original, at the same interval from the overall start time, and each query should run in the original order and, all things being equal, start at the same offset. In practice, however, this is rarely the case on big playback jobs.

After many hours working with the tool, we believe that even very small changes, which are often out of our control, can have a significant influence on the overall runtime. A “slow” or faulty disk, shared resources in a VM environment, minor differences in memory or core speeds, and similar factors all play a part. Remember that many playback queries run in a tenth or a hundredth of a second on powerful machines, and at times 100+ threads can be running concurrently. If a query gets out of sync with other threads, the results can be unpredictable: queries acting on empty tables (because the query ran before the data was loaded), or long-running queries that started before the optimizer statistics were generated. When running performance tests, Actian recommends performing three or four runs and taking an average.

Playback also has features to measure individual query performance, row counts, sessions, and query counts. The query performance stats can obviously help identify long-running queries, although, for the reasons given above, you should focus only on results that recur consistently across multiple sessions or replays. Every playback job produces a set of CSV files, which can be loaded into a spreadsheet to drill down into this information; the short script below shows one way to do the same programmatically.
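
Because consistency across runs is what matters, a short script can be handier than a spreadsheet when comparing replays. The sketch below averages each query’s elapsed time across several runs and lists only the queries that appear in every run. The file names and column names (query_text, elapsed_ms) are assumptions for illustration, not the tool’s actual CSV schema, so check the wiki for the real layout.

    # Sketch: average per-query elapsed times across several playback runs
    # and list queries that recur in every run. The column names used here
    # ("query_text", "elapsed_ms") are assumptions, not the real schema.
    import csv
    from collections import defaultdict

    runs = ["playback_run1.csv", "playback_run2.csv", "playback_run3.csv"]
    timings = defaultdict(list)

    for path in runs:
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                timings[row["query_text"]].append(float(row["elapsed_ms"]))

    # Only trust queries that show up consistently in every run.
    consistent = {q: s for q, s in timings.items() if len(s) == len(runs)}
    for query, samples in sorted(consistent.items(),
                                 key=lambda kv: -sum(kv[1]) / len(kv[1])):
        avg = sum(samples) / len(samples)
        print(f"{avg:10.1f} ms  {query[:70]}")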

Actian’s Services team has used the tool in quite a few customer engagements and has had very good results with it. In one recent platform migration project, where a customer was moving to a new and larger server, we used QRP to configure the new server to better support the actual production workload. The result was a much smoother go-live for the users than would otherwise have been possible, and a better experience than any previous upgrade project had delivered.

In summary, QRP is really useful and quite straightforward to use. It is a community-led project rather than a licensed product, however, so some hands-on engineering will be needed (e.g., building it from the latest source code), and you should have someone with development skills on hand during the setup. There is some good information about it on the Actian Community pages, such as prerequisites and platform availability, which I recommend you read before attempting it.

Detailed information on Playback setup and usage can be found here:

https://github.com/ActianCorp/Dynamic_Playback/wiki/Dynamic-Playback-set-up-and-usage-page

Of course, new features and bug fix contributions from our community are very welcome! Many improvements are possible, such as a user interface to monitor progress as it happens, and I’m sure other features will suggest themselves as you use it.

If you need any in-depth help, then contact me or one of the team at services@actian.com. We would be very happy to assist you in getting started with the tool, or in running an upgrade project with it, and so in delivering a smooth and predictable outcome.

About Sean Paton

Sean Paton is Delivery Manager for the UK Professional Services team, where he has led the delivery of many platform migration and upgrade engagements over a number of years. He has over 20 years of experience working with Ingres.

Will Cloud Data Warehouses Really Help You Cut Costs?

Teresa Wingfield

November 8, 2021

If the first question you consider when thinking about a cloud data warehouse involves how much money you will save by moving your on-premises solution to the cloud, you may not like my answer: There’s no guarantee that you will realize any cost savings at all. But let me suggest, however humbly, that this may be the wrong question. Even if you could save money—and you could—cost reduction shouldn’t be driving your cloud data warehouse strategy. The strategic benefits that arise from moving to a cloud data warehouse should be driving it, and those benefits mean your organization can gain valuable operational and strategic advantages even if you don’t reduce your costs.

Caveat Emptor

Several Actian customers have told me that they believe promises of cost reductions in the cloud have proven difficult, if not impossible, to realize. Here are a few of their top reasons:

  • Our entire data center needs to stay on-premises anyway, so the incremental cost of an on-premises data warehouse is minimal.
  • We won’t actually save money by spinning down cloud services during idle processing times because our data warehouse must be up and running at all times.
  • We are early in our hardware refresh cycle. Even if we could reduce costs in the cloud, that won’t offset the cost of hardware we just purchased.
  • The argument that moving to the cloud can lower costs by enabling us to avoid over-provisioning capacity on-premises is overhyped. Over-provisioning isn’t just an on-premises phenomenon. Expensive “reserved instances” in the cloud can also be over-provisioned and are often larger—and therefore more costly—than they really need to be. Conversely, unplanned spikes in demand can lead to paying premium prices for on-demand resources (see the back-of-the-envelope sketch after this list).
  • When cloud data warehouses are used frequently, usage-based billing agreements can quickly rack up expenses greater than we would incur on-premises.
  • We would frequently need to move large volumes of data to and from the cloud, which would lead to significant costs that we do not encounter in an on-premises configuration.
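
To see why the over-provisioning argument cuts both ways, here is a back-of-the-envelope comparison in Python. Every number in it is purely illustrative; none come from any real provider’s price list.

    # Back-of-the-envelope reserved-vs-on-demand comparison.
    # All rates are purely illustrative, not real provider pricing.
    HOURS_PER_MONTH = 730

    reserved_rate = 4.00    # $/hour for an oversized reserved instance
    right_size_rate = 2.50  # $/hour for the capacity actually needed
    on_demand_rate = 5.00   # $/hour premium rate during unplanned spikes
    spike_hours = 60        # hours per month served on demand

    oversized = reserved_rate * HOURS_PER_MONTH
    right_sized = (right_size_rate * HOURS_PER_MONTH
                   + on_demand_rate * spike_hours)

    print(f"Oversized reserved capacity:    ${oversized:,.0f}/month")
    print(f"Right-sized + on-demand spikes: ${right_sized:,.0f}/month")
    # Over-provisioning wastes money in the cloud just as it does
    # on-premises; the trade-off is premium on-demand rates during spikes.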

Your organization’s needs may enable you to discount each of the points made above—or at least enough of them to justify a migration to the cloud based on anticipated cost savings. Or you might be able to counter these assertions by identifying ways in which a cloud data warehouse would indeed help you cut costs, such as through the use of less expensive storage and the avoidance of hardware and OS maintenance and support costs.

If you can do any of these, consider it the icing on the cake. But the cake itself isn’t cost savings. The cake is the ability of a cloud data warehouse to help your organization gain advantages that are out of reach in an on-premises environment.

The Real Benefits of a Cloud Data Warehouse

So how could a cloud data warehouse have a significantly more positive impact on your business than an on-premises data warehouse? Let us count the ways:

Enable Greater Agility

A cloud data warehouse combines the immediate availability of infrastructure with unlimited scalability. And, with the help of automation and orchestration in the cloud, provisioning, de-provisioning, and re-deploying resources can be simpler and far faster than on-premises. Simply put, you can do more—and you can do it more quickly.

Overcome Resource Constraints

Fully-managed cloud data warehouses are particularly useful when you don’t have a deep bench of database administrators and other in-house resources to manage a data warehouse.

Accelerate Innovation

The pace of innovation by cloud service providers such as Amazon, Microsoft, and Google is amazing. Rich suites of developer tools and a compelling user experience, coupled with the ability to run DevOps for rapid iterations of features and functionality, are key reasons that the cloud has become the primary incubator for countless new initiatives. In addition, ongoing advancements in analytics, data services, security, and integration are at your disposal immediately when you migrate to the cloud.

Ensure Availability and Seamless Access

Servers, storage, data servers, or an entire network can easily recover and continue operations in the cloud. The level of resiliency is far greater than that of physical facilities, which are more vulnerable to adverse events and can take longer to recover from them.

Expand Your Reach

The cloud makes it easier to facilitate connectivity among locations spread across the world, which in turn empowers more users to derive business value from the data warehouse.

Summary

I’m not stealthily trying to say that the cloud is going to be more expensive than you think or that you won’t really save money. I’m simply trying to say the real value of the cloud lies beyond questions of cost reduction—so don’t let yourself get distracted. For most companies, a mixture of both on-premises and cloud environments will be the norm for the foreseeable future. If you want to learn more about the business and operational benefits to be gained through use of the Actian Data Platform, click here.

About Teresa Wingfield

Teresa Wingfield is Director of Product Marketing at Actian, driving awareness of the Actian Data Platform's integration, management, and analytics capabilities. She brings 20+ years in analytics, security, and cloud solutions marketing at industry leaders such as Cisco, McAfee, and VMware. Teresa focuses on helping customers achieve new levels of innovation and revenue with data. On the Actian blog, Teresa highlights the value of analytics-driven solutions in multiple verticals. Check her posts for real-world transformation stories.