Amazon recently announced a new feature for AWS RDS databases called Provisioned IOPS. In short, this feature is designed to reliably guarantee the amount of I/O operations per second for your RDS instance. Prior to its public release, Shine Technologies was able to work with the Amazon RDS team to give this new feature a test-run with one of our products.
After some initial tweaking, the results were ultimately very rewarding. In this post I’ll talk about the product we tested against, what performance gains we achieved with Provisioned IOPS, and how we got there.
Shine Technologies has several products in the energy, telecommunications, finance and retail industries. For this exercise we decided to target our largest energy-industry product: Network Billing Validation (NBV), which manages billions of dollars per year in energy invoicing charges. Our clients for NBV include the largest electricity and gas retailers in Australia.
Architecturally, NBV is a fairly traditional three-tier Enterprise application developed in Java. We have an Oracle database, which in several cases exceeds 500GB worth of existing data. We also use Ubuntu EC2 instances to host our application server and a batch-processing engine that churns through large amounts of data every day.
NBV is over 10 years old and still incorporates many batch processes that were designed for hardware from that era. So we were asking ourselves at the outset: is this application going to be able to benefit, or is it just too old to have its batch-engine throughput improved by something like Provisioned IOPS?
Using pre-release software doesn’t just mean getting advance access to great new features. It also means dealing with the standard quota of rough edges. Setting up an RDS instance with Provisioned IOPS was not possible through the AWS console at the time, so we had to use the AWS command-line tool. However, whilst the documentation still needed some polish, setup proved mostly painless for those familiar with the tool.
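For readers who want to try this themselves: the sketch below shows roughly what such a request looks like using the modern boto3 SDK (we used Amazon's pre-release command-line tool at the time, so the exact invocation differed). The instance identifier, class and credentials are placeholders, not our actual configuration; the storage and IOPS values match what we used in our tests.

```python
# Illustrative sketch only: requesting an RDS instance with Provisioned IOPS.
# All names and credentials below are placeholders.
params = {
    "DBInstanceIdentifier": "nbv-piops-test",  # hypothetical identifier
    "Engine": "oracle-ee",
    "DBInstanceClass": "db.m1.xlarge",         # illustrative instance class
    "AllocatedStorage": 1024,                  # 1 TB, as in our tests
    "StorageType": "io1",                      # Provisioned IOPS storage
    "Iops": 10000,                             # dialled up to the maximum
    "MasterUsername": "admin",
    "MasterUserPassword": "change-me",
}

def create_instance(client=None):
    """Build the request. Only calls AWS if a client is supplied,
    e.g. client = boto3.client("rds"); otherwise it is a dry run."""
    if client is None:
        return params  # dry run: show what would be sent
    return client.create_db_instance(**params)

print(create_instance())  # dry run, no AWS account required
```

With real credentials you would pass `boto3.client("rds")` in place of the default `None`; the dry-run form simply lets you inspect the request.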
Both pre- and post-release, Provisioned IOPS comes with the caveat that it cannot be retrofitted onto existing snapshots. For us, this meant several days’ worth of exporting an existing database and re-importing it into a blank RDS instance. Amazon does, however, promise that modification of existing instances will be available in the near future.
Our first attempts to utilize Provisioned IOPS were based on one of the performance tests that we execute for every release. This test processes 3.7 million records, approximately one month’s worth of market data, giving us a solid benchmark against which to measure whether we could actually get something extra from Provisioned IOPS.
The test comprises three main phases:
- Phase 1: Lots of inserts. Effectively the initial load components from a flat CSV file.
- Phase 2: Data-validation phase. Large numbers of reads and some inserts of calculated results.
- Phase 3: Simple calculation-and-insert phase. It inserts less data than Phase 1 and involves less decision-making than Phase 2.
We used a 1TB RDS instance so that we could dial up the settings of Provisioned IOPS to the full 10,000 IOPS. In a production environment you might choose to be more precise with these settings, but we were trying to get something going as quickly as possible.
Our first result was…no improvement whatsoever. After confirming that we’d actually switched Provisioned IOPS on correctly, we contacted the Amazon RDS team for their expert assistance. The verdict was that our application was not actually pushing the database instance hard enough. This might sound strange, but NBV was originally built to work in the database world of 10 years ago, in which databases were much easier to overwhelm.
Luckily, NBV can be configured to use more resources – for example, worker threads – so we increased them considerably. To be honest, it was something of a sledgehammer approach, as we wanted a result in a small amount of time. Unfortunately, it made no difference: the results were still comparable to the standard benchmarks.
Next we tried upgrading to a very large EC2 server, the theory being that if we could get our batch-processing engine working harder, it would have the effect of pushing the database harder. But once again, this proved not to be the case and there was still no improvement.
For our final attempt, we increased the size of the RDS instance type to the largest available for our purposes – 4 x X-Large. The results were immediately impressive. Our existing benchmarks for Phase 1 were around 110 records/second. With the 4 x X-Large instance and Provisioned IOPS, we were getting over 500 records/sec – nearly five times our previous rate.
To determine the degree to which Provisioned IOPS had contributed to this improvement (as opposed to the increase in the size of the RDS instance type), we turned Provisioned IOPS off in the 4 x X-Large RDS environment and ran the test again. Our initial benchmarks were still exceeded, but not by as much – without Provisioned IOPS, we were getting an average of around 350 records/sec.
So we finally had conclusive evidence that Provisioned IOPS could indeed help our application gain an improvement for Phase 1. In this case, it was around 45% on top of what we gained when we used a 4 x X-Large RDS instance. This would shave hours off our longest-running processes!
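A quick back-of-the-envelope check of the Phase 1 numbers quoted above (using the round figures of 110, 350 and 500 records/second; our measured averages sat slightly above these, which is why we quote roughly 45% rather than the 43% the round numbers give):

```python
# Sanity-checking the Phase 1 throughput figures from the tests above.
baseline = 110        # original benchmark, records/sec
big_instance = 350    # 4 x X-Large, Provisioned IOPS off
big_with_piops = 500  # 4 x X-Large, Provisioned IOPS on

overall_speedup = big_with_piops / baseline  # "nearly 5 times"
piops_contribution = (big_with_piops - big_instance) / big_instance

print(f"Overall speedup:             {overall_speedup:.1f}x")
print(f"PIOPS on top of 4 x X-Large: {piops_contribution:.0%}")
```

This separates the two effects cleanly: the bigger instance type accounts for most of the raw speedup, and Provisioned IOPS adds a further 40-plus percent on top of it.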
Here’s a chart showing how the write IOPS compared through the file-loading period. The blue line is with Provisioned IOPS, the green line is without:
What about the other phases?
Phase 3 also jumped in speed, but not as much – around 30%. As for Phase 2, we still observed no discernible difference.
We believe this variation in findings can be explained by a difference in software architectures. Phases 1 and 3 use the same architectural foundation, whilst Phase 2 was added later and uses an architectural approach that is not as easy to configure to push the database harder.
Just For Kicks
Given the success of our initial tests, we decided to conduct some more using a slightly different dataset. This dataset was for a customer that receives one very large batch of data over a few days, as opposed to over the course of an entire month.
We didn’t consider this dataset suitable for our initial tests as it was fairly atypical. However, the customer was curious as to just how fast we could get it loaded. We took their challenge – without them knowing we now had a secret weapon.
The dataset contained around 5.3 million records, spread across 106 flat CSV files. For the last three monthly loads, this customer had only managed 90 records per second at best. We could not resist seeing how Provisioned IOPS could help them.
The results were amazing. For Phase 1 of the application, we reached an average of 654 records/second. From a business perspective, this had the potential to shave approximately 20 hours off their processing time! Achieving this sort of performance on traditional server hardware would be prohibitively expensive.
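To put the "approximately 20 hours" in context, here is the arithmetic for Phase 1 alone (the remainder of the saving comes from the downstream phases also speeding up):

```python
# Load-time saving for the customer's batch, Phase 1 only.
records = 5_300_000  # records in the batch
old_rate = 90        # records/sec, customer's previous best
new_rate = 654       # records/sec with Provisioned IOPS

old_hours = records / old_rate / 3600  # ~16.4 hours
new_hours = records / new_rate / 3600  # ~2.3 hours
saved = old_hours - new_hours          # ~14 hours for Phase 1 alone

print(f"Before: {old_hours:.1f} h, after: {new_hours:.1f} h, saved: {saved:.1f} h")
```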
Even with some rudimentary changes, we have managed to find significant, measurable benefits for our customers by using Provisioned IOPS – and we haven’t even pushed it to its full extent. With more testing and a few more smarts, our ten-year-old application may go even faster. What’s more, new applications and components that we develop can be designed from the ground up to take further advantage of Provisioned IOPS.
Those who opt to use AWS along with features like Provisioned IOPS will face significantly lower costs than those using traditional server hardware. We look forward to bringing these savings to our customers – and speeding things up too!