10 May 2024
Replicating Oracle to Aurora PostgreSQL using DMS
Recently, I got involved in a cloud application uplift project. As part of the project we wanted to use an Aurora PostgreSQL database to offload all read activity from an RDS Oracle Database Standard Edition instance.
Since this was such an exciting and educational experience, I wanted to share it with you. Summarising my journey and collecting recommendations might help you get through a similar one.
How hard can it be?
How hard can it be? This was my initial, very naive, thought.
AWS has done an amazing job of reducing the complexity of managing databases and database migration projects.
AWS offers various services, like the AWS Database Migration Service (DMS) to manage the data migration, and tools like the AWS Schema Conversion Tool (SCT) to review an existing data structure, analyse the compatibility between the old and new database engines, and prepare the target database.
Before starting, I suggest doing your research. I know it is easy to complain about missing or incomplete documentation, but the documentation from the AWS Database team is outstanding.
AWS provides comprehensive step-by-step guides, migration playbooks, and best practices for using AWS DMS. They detail various use cases, and are very helpful in understanding the target state you are trying to achieve, and the journey to reach it.
The documentation for AWS SCT is outstanding as well. It helps you understand how to use the tool, both to review the source data structure, and to prepare the target database.
A collection of important documentation:
- AWS Schema Tool Documentation
- AWS Schema Tool User Interface Documentation
- Additional Resource to configure the AWS Schema tool
- Database Migration Best Practices
- Database Migration Playbooks
- Database Migration Step-By-Step guides
- AWS Database Blog
- (Video) Deep Dive on Amazon Aurora PostgreSQL Performance Tuning
Be familiar with the Source Database
The AWS Schema Conversion Tool (SCT) is a very important part of your migration journey. It supports your analysis of the source database, and helps you to familiarise yourself with the source data structure, and to prepare the target database.
A handful of things that helped me while using AWS SCT and preparing the target database …
- Information is only ever read from the source database, never written
- All actions, apart from Apply to Database, happen within the AWS SCT Tool
- Read the database migration assessment report thoroughly, paying special attention to any action items
- It’s best to test the converted schema by applying it to an empty database
- Make use of the extension packs provided by AWS SCT to support source database functions not supported in the target database.
- You can store the target schema as a SQL text file and apply the prepared schema after the replication has been completed.
- Translate database configurations, such as memory configuration, from the source database to the target database platform
In my case, because I decided to do a DROP_AND_CREATE full load replication, I stored the prepared schema as a SQL file and applied it manually after the replication had completed. I also translated memory-related configuration parameters of the Oracle database engine, like large_pool_size, pga_aggregate_target and pga_aggregate_limit, to their PostgreSQL equivalents, and applied them to the target database.
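There is no one-to-one mapping between the two engines, so treat any translation as a starting point. Below is a minimal sketch of how such a translation could be captured; the parameter names are real Oracle and PostgreSQL settings, but the values and the split ratios are hypothetical placeholders, not recommendations.

```python
# Hypothetical sketch: translating Oracle memory settings to rough
# Aurora PostgreSQL equivalents. The values below are placeholders,
# not recommendations -- derive yours from the source database and
# the target instance class.

# Values read from the source Oracle database (in bytes), e.g. via
# "SELECT name, value FROM v$parameter WHERE name LIKE 'pga%'".
oracle_params = {
    "pga_aggregate_target": 4 * 1024**3,   # 4 GiB
    "pga_aggregate_limit": 8 * 1024**3,    # 8 GiB
    "large_pool_size": 512 * 1024**2,      # 512 MiB
}

# PostgreSQL has no direct PGA equivalent; working memory is split
# across per-operation settings such as work_mem and
# maintenance_work_mem. The split below is an assumption for this sketch.
postgres_params = {
    # work_mem applies per sort/hash operation, so it must be far smaller
    # than the total PGA target (RDS expects the value in KB).
    "work_mem": str((oracle_params["pga_aggregate_target"] // 64) // 1024),
    "maintenance_work_mem": str(oracle_params["large_pool_size"] // 1024),
}

for name, value in postgres_params.items():
    print(f"{name} = {value}  # KB")
```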
Configure the Target Database
Once you are familiar with the source database and the converted schema, the next step is to create the target database. Create a parameter group and configure the configuration parameters you identified on the source database, such as the memory settings.
I cannot provide specific recommendations for configuration parameters, but I can share that following the AWS documentation, and James Finnerty’s recommendations from his 2019 talk Deep Dive on Amazon Aurora PostgreSQL Performance Tuning, significantly enhanced the performance of my Aurora PostgreSQL server. Furthermore, ensure you have Amazon RDS Performance Insights enabled, as it provides you with detailed monitoring information and helps you to identify performance problems.
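If you script your environment, the parameter group and Performance Insights can be set up with a few boto3 calls. The sketch below is only an illustration; the group name, parameter group family, instance identifier and parameter values are assumptions you would replace with your own.

```python
import boto3

rds = boto3.client("rds")

# Hypothetical names and values -- adjust to your engine version and workload.
rds.create_db_parameter_group(
    DBParameterGroupName="aurora-pg-replica-params",
    DBParameterGroupFamily="aurora-postgresql14",
    Description="Settings translated from the source Oracle database",
)

rds.modify_db_parameter_group(
    DBParameterGroupName="aurora-pg-replica-params",
    Parameters=[
        # work_mem and maintenance_work_mem take their values in KB.
        {"ParameterName": "work_mem", "ParameterValue": "65536",
         "ApplyMethod": "pending-reboot"},
        {"ParameterName": "maintenance_work_mem", "ParameterValue": "524288",
         "ApplyMethod": "pending-reboot"},
    ],
)

# Performance Insights is enabled per DB instance, not per cluster.
rds.modify_db_instance(
    DBInstanceIdentifier="aurora-pg-replica-instance-1",
    EnablePerformanceInsights=True,
    ApplyImmediately=True,
)
```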
Replicate the Tables
Once you are familiar with the source database, the converted schema, and have configured the target database, you can proceed with the data migration. When you are at that stage, you should already be familiar with best practices, migration playbooks, and limitations for your migration use-case.
As shared above, there is plenty of documentation, including step-by-step guides, available from AWS to support you.
A few recommendations which helped me create the target database:
- Don’t replicate all tables; replicate only the tables you really need
- Create multiple replication tasks
  - Create tasks for different DMS Task configurations
  - Create tasks to group tables based on table size
- Monitor the resource consumption of the Replication Instance and adjust the instance type
- Limit the table replications per Task
  - I allowed each DMS Task to replicate one table at a time
- If your target database is an Aurora PostgreSQL instance, make sure to set the DB in replica mode during the replication (see the sketch after this list)
- Identify the maximum column size (KB) of each LOB type column and adjust the LOB settings in the Target metadata task settings
- Make sure to adjust CommitRate in the Full-load task settings
- Be familiar with the Full-Load task settings and the different table preparation modes.
- Enable Task logging to monitor the replication process and review logs in AWS CloudWatch.
- Enable Data validation
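On the replica-mode point above: one way to achieve it is to have DMS switch its own target sessions into replica mode through the PostgreSQL endpoint’s AfterConnectScript setting, which disables foreign key enforcement and triggers while DMS loads data; another is to set session_replication_role to replica in the target’s parameter group for the duration of the load. A minimal sketch of the first option (the endpoint ARN is a placeholder):

```python
import boto3

dms = boto3.client("dms")

# Hypothetical target endpoint ARN. The AfterConnectScript runs on every
# connection DMS opens against the target, so its sessions load data with
# foreign key checks and triggers disabled.
dms.modify_endpoint(
    EndpointArn="arn:aws:dms:eu-west-1:123456789012:endpoint:TARGET",
    PostgreSQLSettings={
        "AfterConnectScript": "SET session_replication_role = 'replica'",
    },
)
```

Whichever route you pick, remember to revert it once the full load has finished, so that triggers and foreign key enforcement are active again.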
I ended up creating multiple DMS replication tasks, and separated the tables based on the table sizes and LOB type column sizes. Thankfully, Oracle makes it easy to determine the sizes of tables and columns.
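For illustration, the queries I mean look roughly like the following; the schema, table and column names are hypothetical, and I am running them through the python-oracledb driver only to keep the examples in one language.

```python
import oracledb  # the python-oracledb driver

# Hypothetical connection details.
conn = oracledb.connect(user="dms_reader", password="***", dsn="oracle-host/ORCLPDB1")
cur = conn.cursor()

# Table sizes, largest first (requires SELECT on DBA_SEGMENTS).
cur.execute("""
    SELECT segment_name, ROUND(bytes / 1024 / 1024) AS size_mb
      FROM dba_segments
     WHERE owner = 'MYSCHEMA'
       AND segment_type = 'TABLE'
     ORDER BY bytes DESC
""")
for table_name, size_mb in cur:
    print(f"{table_name}: {size_mb} MB")

# Largest LOB value in a specific column, in KB -- this is the number
# to compare against LobMaxSize in the Target metadata task settings.
cur.execute("""
    SELECT CEIL(MAX(DBMS_LOB.GETLENGTH(document_blob)) / 1024) AS max_kb
      FROM myschema.documents
""")
print("max LOB size (KB):", cur.fetchone()[0])
```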
The largest tables (>10 million rows) were grouped together into one task. To replicate tables with LOB type columns, I decided to follow the AWS recommendation and use the Limited LOB mode. Tables whose LOB columns exceeded the default value of LobMaxSize in the Target metadata task settings were grouped together in multiple tasks. Tasks with a LobMaxSize greater than 63 KB also had an adjusted CommitRate in the Full-load task settings, to avoid out-of-memory errors. Last, but not least, I had two tasks to replicate all other tables. To reduce the load on the database server, I only allowed them to replicate one table at a time per replication task, by setting the Full-load task setting MaxFullLoadSubTasks to 1.
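Put together, one of those replication tasks could be defined roughly like this with boto3. The ARNs, schema and table names are placeholders and the values shown are illustrative; the keys (TargetTablePrepMode, LobMaxSize, CommitRate, MaxFullLoadSubTasks) are the actual DMS task-setting names.

```python
import json
import boto3

dms = boto3.client("dms")

# Only replicate the tables this task is responsible for
# (hypothetical schema and table names).
table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-large-lob-tables",
            "object-locator": {"schema-name": "MYSCHEMA", "table-name": "DOCUMENTS"},
            "rule-action": "include",
        }
    ]
}

task_settings = {
    "TargetMetadata": {
        "SupportLobs": True,
        "LimitedSizeLobMode": True,   # Limited LOB mode, as AWS recommends
        "LobMaxSize": 128,            # KB -- derived from the max LOB size query
    },
    "FullLoadSettings": {
        "TargetTablePrepMode": "DROP_AND_CREATE",
        "CommitRate": 2000,           # lowered to avoid out-of-memory errors
        "MaxFullLoadSubTasks": 1,     # one table at a time per task
    },
    "Logging": {"EnableLogging": True},
    "ValidationSettings": {"EnableValidation": True},
}

dms.create_replication_task(
    ReplicationTaskIdentifier="full-load-large-lob-tables",
    SourceEndpointArn="arn:aws:dms:eu-west-1:123456789012:endpoint:SOURCE",
    TargetEndpointArn="arn:aws:dms:eu-west-1:123456789012:endpoint:TARGET",
    ReplicationInstanceArn="arn:aws:dms:eu-west-1:123456789012:rep:INSTANCE",
    MigrationType="full-load",
    TableMappings=json.dumps(table_mappings),
    ReplicationTaskSettings=json.dumps(task_settings),
)
```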
Wait, that’s it?
Congratulations, you have hopefully managed to replicate an existing database instance to a new database. Now it’s time to run your first queries. This is the most thrilling part of the journey: you are about to discover whether all the effort was worthwhile. Firing off a few queries should give you assurance about the data integrity and performance.
This was the point where I discovered something was not quite right. Data integrity was fine, but performance was horrendous compared to the Oracle database. Queries that took a few milliseconds on the Oracle database were taking minutes on the Aurora PostgreSQL database. After further analysis I discovered that PostgreSQL was generating bad query plans. Enabling the Aurora PostgreSQL query plan management extension solved my performance issue; how you can encourage Aurora PostgreSQL to generate better plans is something I will explain in detail in my next blog post.
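I will save the details for that post, but as a rough sketch, enabling the extension comes down to a parameter group change plus a CREATE EXTENSION. I am assuming here that rds.enable_plan_management lives in the instance-level parameter group; the group name, host and credentials are placeholders.

```python
import boto3
import psycopg2

rds = boto3.client("rds")

# Plan management is a static setting, so the instance needs a reboot
# after this change (parameter group name is a placeholder).
rds.modify_db_parameter_group(
    DBParameterGroupName="aurora-pg-replica-params",
    Parameters=[
        {"ParameterName": "rds.enable_plan_management", "ParameterValue": "1",
         "ApplyMethod": "pending-reboot"},
    ],
)

# After the reboot, install the extension in the target database
# (connection details are placeholders).
conn = psycopg2.connect(
    host="aurora-pg-replica.cluster-xyz.eu-west-1.rds.amazonaws.com",
    dbname="appdb", user="postgres", password="***",
)
conn.autocommit = True
with conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS apg_plan_mgmt")
```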
Conclusion
AWS offers some powerful tools to support you along the way to creating a replica of your database and keeping it in sync with your source database. The AWS Schema Conversion Tool helps you analyse and understand the source database structure, and it helps you prepare the target database. It generates a database migration assessment report to identify potential issues, provides suggestions to make the schema compatible with the target database engine, and offers extension packs to emulate database engine specific functions not present in your selected target database engine.
The AWS Database Migration Service allows you to:
- configure replication tasks to replicate tables and views across databases
- enable logging to track the replication process and watch for replication errors
- conduct comprehensive testing of the new database to validate the data integrity and performance.
In my upcoming blog post, I will provide insights into diagnosing and resolving query performance problems on Aurora PostgreSQL. Until then, happy migration!