Managing Recurring Batch Tasks with Spring Scheduler
02 Oct 2024
Introduction
In a recent project for a client, I needed to design a system to synchronise updates from one database to another, within a brittle architecture that already saw heavy daily traffic. This is how I set up a recurring batch job to perform the synchronisation outside of business hours, without interfering with normal traffic.
Problem
The problem we started with was that user identities were stored in one database, while most business operations were performed in another. The business database still required most of the PII from the identity database, however, meaning that synchronisation was required. This mostly travelled in one direction, from the identity database to the business one, but on certain occasions changes made to the business database needed to be synced back to the identity one. The existing solution had some issues, described below, which needed to be remedied.

Existing Solution
The existing solution involved an extra table in the business database, populated by SQL triggers, which contained information about changed records. A scheduling service would periodically fetch these records and place them on a queue, from which a batch job service picked them up and attempted to sync them all with the identity database at once. The syncing was done via a microservice which interfaced with the identity database.
This approach naturally resulted in numerous 429 errors from the identity database, which had a comically low limit on API calls. This was a likely cause of the weekly outages on the identity database, which disrupted normal user traffic.
Spring Scheduler Solution
Our new solution had two main requirements. The first was that the calls to the identity database would be made out of hours, to avoid disrupting user traffic. The second was that the API calls should be made at a rate of no more than two per second, which was the limit set by the identity provider.
We initially considered using some cloud infrastructure, involving lambdas and schedulers, SQS queues and so on. These would ultimately all pass the handling of the requests to the microservice which interfaced with the identity provider.
Since this service was still going to handle all the requests, we ended up deciding to use this service to handle the entire process, which massively simplified the architecture. Now all that was required was to design the code that would manage this process.

Overview
The high level overview of our solution is as follows:
- The Spring Scheduler library uses our configuration to run a job on a schedule, much like a cron job
- This job gets locked, using the Shedlock library, to ensure that only one running instance can perform this job at a time
- The Spring Retry library provides the ability to retry API calls with exponential backoff in case the API gets overloaded
- The Google Guava library is used to provide simple rate limiting, to ensure that the API limits are observed
- The business logic is contained in the class managed by Spring Scheduler, which fetches the records from the business database table and syncs them with the identity database
What follows is a detailed description of the usage of each of these libraries. A demonstration of the code in this project can also be found here. This demo only logs messages to the command line, but it illustrates most of the same principles.
Scheduler
Since the microservice in question was now going to handle the entire process of fetching records from the business database and syncing them with the identity database, the service needed to manage all the other project requirements as well. For scheduling the tasks to run outside of business hours, the Spring Framework provides a library which perfectly fulfils this need.
Spring Scheduler conveniently supports property placeholders and SpEL in its annotations, allowing the user to specify cron expressions in properties files. (The same applies to most of the other libraries used in this project.) The Spring Guides contain a helpful tutorial walking the user through scheduling a task.
Scheduling must be switched on with the @EnableScheduling annotation, placed on a configuration class. Typically this is the Application class, especially if scheduling will be performed in multiple classes.
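As a minimal sketch, enabling scheduling on the application class looks like this:
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.scheduling.annotation.EnableScheduling;

@SpringBootApplication
@EnableScheduling
public class Application {

    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }
}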
The cron expressions supported by the @Scheduled annotation allow for 6 fields, as opposed to the standard 5 of a Unix cron job, which means precision can be specified in seconds instead of just minutes. Details can be found here. (Alternatives such as a fixed rate or an initial delay are available too.) For our use case, we needed the job to run every five minutes from 10pm to 6am, leading to the expression "0 */5 22-23,0-5 * * *". The full annotation can use property placeholders, with the cron statement and timezone stored in a properties file, like this:
@Scheduled(cron = "${scheduling.tasks.schedule}", zone = "${scheduling.timezone}")
The job runs from 10pm to 12am, and then from 12am to 6am. The notation here might look confusing if you’re less familiar with cron, as I was when I started this project, but simply writing 22-5 will cause an error, because the end hour is less than the start hour; that is why the list is needed. The job will run during the full hour specified as the end hour, meaning that the last run will kick off at 5:55am and run to completion, after which no further runs will start. If you instead provide 0-6, using the hour you want the job to finish as the end hour, a new run will kick off at 6am, leading to the job running for an hour longer than desired.
It would be advisable to run this job overnight in a non-production environment with logging, to ensure that the expression functions as expected.
The other important consideration in setting up the scheduling was picking up the right number of records to process every five minutes. If we picked too few, the service would sit idle for the remainder of each period. If we picked too many, the job would not finish within the five minutes, the scheduled method would not return, and the scheduler would skip the next iteration, leading to further idle time. As we will see when we discuss Shedlock, there would also be a risk of two instances starting the job in parallel, placing excessive load on the external API.
The scheduled method, which runs every five minutes in our case, is the one which handles fetching records from the business database. The method fetches a parameterised number of records, and processes them one at a time, invoking the retryable method which actually calls the identity database. More on that later.
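As a rough sketch of that class (SyncScheduler, ChangedRecordRepository, ChangedRecord, fetchBatch and the batch size are all hypothetical stand-ins, and RecordSyncService is the class holding the retryable method described later):
import java.util.List;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class SyncScheduler {

    private static final int BATCH_SIZE = 100; // hypothetical value; parameterised in the real project

    private final ChangedRecordRepository repository; // hypothetical repository over the changed-records table
    private final RecordSyncService syncService;      // contains the retryable method described later

    public SyncScheduler(final ChangedRecordRepository repository, final RecordSyncService syncService) {
        this.repository = repository;
        this.syncService = syncService;
    }

    @Scheduled(cron = "${scheduling.tasks.schedule}", zone = "${scheduling.timezone}")
    public void runTask() {
        // Fetch a batch of changed records and sync them one at a time
        List<ChangedRecord> records = repository.fetchBatch(BATCH_SIZE);
        for (ChangedRecord record : records) {
            syncService.syncRecord(record);
        }
    }
}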
Shedlock
Shedlock is a library which solves the problem of concurrent execution in distributed services. For our purposes, we were able to use it to ensure that only one instance of our service would start the job at a time.
Shedlock works by writing an entry to a pre-configured database table, containing the name of the method being locked, the time at which the job was started, and the length of time the method should be locked for. When an instance attempts to start a job, it first checks this table, and will only begin if the lock_until value is earlier than the current time.
For our purposes, this led to the following annotation on each locked method:
@SchedulerLock(name = "runTask",
        lockAtLeastFor = "${scheduling.shedlock.lockAtLeastFor}",
        lockAtMostFor = "${scheduling.shedlock.lockAtMostFor}")
The name provided in the annotation identifies the lock entry in the table, and must be unique for each locked task; we simply matched it to the name of the method it annotates, which contains the business logic of calling the external API.
This article provides a very detailed explanation of how the values in this annotation are used by Shedlock. The short version is that when the instance running the job finishes, it updates the lock_until value in the table to the current time (or to the lockAtLeastFor boundary, if that is later), allowing the job to be started again by any instance. Otherwise, all other instances must wait until the lockAtMostFor time has elapsed.
It’s very important to select a large enough period of time for the lockAtMostFor value; otherwise the lock could expire while the job is still running, and multiple instances could end up executing the same job at the same time. On the other hand, this timeframe should not be too long: if the running instance crashes for some reason, another instance will only take over after the lockAtMostFor value has elapsed, leading to some idle time. Further discussion can be found in the Shedlock README.
The time values in the Shedlock configuration are duration strings. The README notes that they can be given either as a number followed by a unit (e.g. 5ms, 10s, 10m, 1d) or as an ISO-8601 duration (e.g. PT10M).
As with the @EnableScheduling annotation, an enabling annotation is required on a configuration class, like this:
@EnableSchedulerLock(defaultLockAtMostFor = "10m")
The default value in this annotation can be used in case no specific value is configured.
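Putting the pieces together, the scheduled method ends up carrying both annotations, along these lines:
@Scheduled(cron = "${scheduling.tasks.schedule}", zone = "${scheduling.timezone}")
@SchedulerLock(name = "runTask",
        lockAtLeastFor = "${scheduling.shedlock.lockAtLeastFor}",
        lockAtMostFor = "${scheduling.shedlock.lockAtMostFor}")
public void runTask() {
    // fetch and sync records, as sketched in the Scheduler section
}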
LocalStack
For the purposes of testing Shedlock locally, a DynamoDB table can be set up using LocalStack. Obviously in a production environment you will need to connect to an AWS account. Shedlock supports a number of other database providers as well; the details can be found in the README.
What follows is an example configuration for a Shedlock lock provider, using DynamoDB:
@Bean
public DynamoDbClient dynamoDbClient(
        final @Value("${aws.dynamodb.accessKey}") String accessKey,
        final @Value("${aws.dynamodb.secretKey}") String secretKey) {
    AwsCredentialsProvider credentials = StaticCredentialsProvider
            .create(AwsBasicCredentials.create(accessKey, secretKey));
    return DynamoDbClient.builder()
            .region(Region.AP_SOUTHEAST_2)
            .credentialsProvider(credentials)
            // Point the client at LocalStack rather than real AWS
            .endpointOverride(
                    URI.create("https://localhost.localstack.cloud:4566"))
            .build();
}
@Bean
public LockProvider lockProvider(DynamoDbClient dynamoDbClient) {
    // "Shedlock" is the name of the DynamoDB table holding the locks
    return new DynamoDBLockProvider(dynamoDbClient, "Shedlock");
}
As can be seen, one bean needs to provide the connection to DynamoDB, and the other bean needs to provide a LockProvider. The LockProvider is used automatically by Shedlock in the background. Once a method has been annotated, nothing else needs to be done to ensure that it gets used. The only other step required is to create a DynamoDB table.
The LocalStack CLI makes it easy to get a DynamoDB table provisioned and running. The getting-started docs contain all the instructions for installing and starting LocalStack; you will need Docker installed and running, but LocalStack credentials should not be needed. Once LocalStack is running, install awslocal, LocalStack’s CLI tool, which uses the same interface as the AWS CLI and allows you to run the same commands against LocalStack.
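For reference, the setup amounts to a handful of commands (assuming Python’s pip is available):
# Install and start LocalStack (requires Docker to be running)
pip install localstack
localstack start -d

# Install awslocal, the LocalStack wrapper around the AWS CLI
pip install awscli-local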
Once awslocal is installed, I used the following command to create the required table:
awslocal dynamodb create-table --table-name Shedlock \
--key-schema AttributeName=_id,KeyType=HASH \
--attribute-definitions AttributeName=_id,AttributeType=S \
--billing-mode PAY_PER_REQUEST \
--region ap-southeast-2
Unfortunately the documentation for Shedlock leaves a few gaps here: I needed to figure out the required schema through some googling, and only found mention of the partition key, _id, in a StackOverflow question. Once this table is created, Shedlock should function in the application without any issues.
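If you want to confirm that the lock is being written, you can inspect the table while a job is running:
awslocal dynamodb scan --table-name Shedlock --region ap-southeast-2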
Retryable
There are a couple of other issues that need to be addressed in setting up this process. The first is retrying an API call if the first attempt fails. For this we can use the Spring Retry library, which ships separately from the core framework but is maintained under the spring-projects organisation and is a widely used solution.
Spring Retry can be configured using annotations, as described in the project’s README. The annotations allow for SpEL configuration, meaning that the parameters could all be controlled from the properties file. An annotation might look like this:
@Retryable(retryFor = Exception.class,
        maxAttemptsExpression = "${scheduling.retry.maxAttempts}",
        backoff = @Backoff(
                delayExpression = "${scheduling.retry.delayMs}",
                multiplierExpression = "${scheduling.retry.backoffMultiplier}"
        ))
The type of exception can be set in the annotation, meaning that calls could be retried only for a specific type of HTTP error, for example. The rest of the annotation controls the number of attempts and the backoff behaviour. The default number of attempts is 3, which worked for our use case, but I prefer to make the configuration explicit: it allows us to tweak the values easily, and avoids any negative impact if the library default changes in future. The values for these parameters could even be set in the AWS Parameter Store if so desired, allowing developers to adjust them and reload without having to redeploy.
The other important thing to note about this annotation is that it works using Spring AOP. Since the retry logic is applied through a proxy, which does not intercept calls a class makes to its own methods, the @Retryable method needs to live in a separate class from the Scheduler. Retry support also needs to be enabled explicitly, as shown below.
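A minimal configuration class switching retry support on looks like this:
import org.springframework.context.annotation.Configuration;
import org.springframework.retry.annotation.EnableRetry;

@Configuration
@EnableRetry
public class RetryConfig {
}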
One other annotation is available from this library, namely @Recover. A method annotated with @Recover will run if maxAttempts is exceeded and the main method does not return successfully. This allows us to create the following method in addition to the annotated Retryable method:
@Recover
public void recover(final Exception e) {
    log.error("Failed to sync record", e);
}
In the previous solution to our problem, failed records were immediately written back to the business database with a failed status. With Spring Retry, each record is attempted multiple times, allowing for the occurrence of 429 HTTP errors, and the exponential backoff gives the external service more time to recover from them. Only once all the retries have failed is our recovery method invoked, making this solution much more robust.
RateLimiter
The final piece in our solution was the rate limiter, provided by Google’s Guava library. The Java docs can be found here. The RateLimiter class is marked as beta, and has been for a long time, but it has worked without issue in our solution. The rate limiter from Resilience4j may have suited this purpose equally well, but we already used Guava, so I didn’t research alternatives at the time.
The use of this library is simple, at least in our use case. Firstly a bean is created which returns a RateLimiter:
@Bean
public RateLimiter loggingRateLimiter(
        final @Value("${scheduling.rateLimiter.maxRatePerSecond}") int maxRatePerSecond) {
    return RateLimiter.create(maxRatePerSecond);
}
This doesn’t have to be a bean. The RateLimiter could be created in the class it is used in, but a bean allows for reuse.
Obviously the maxRatePerSecond value, which RateLimiter.create accepts as a double, should not exceed the maximum allowed rate for the external API being called. Again, setting this in the AWS Parameter Store allows for easy tweaking if the service is getting overwhelmed.
Once a RateLimiter has been injected into the class using it, usage is as simple as calling rateLimiter.acquire();. This call needs to be made inside the method annotated with @Retryable, which is where all business logic takes place.
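Putting the retry and rate limiting pieces together, the service class looks roughly like this (IdentityClient and ChangedRecord are hypothetical stand-ins for the real client and record type; the record is added to the @Recover signature so the failed record is available there too):
import com.google.common.util.concurrent.RateLimiter;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.retry.annotation.Backoff;
import org.springframework.retry.annotation.Recover;
import org.springframework.retry.annotation.Retryable;
import org.springframework.stereotype.Service;

@Service
public class RecordSyncService {

    private static final Logger log = LoggerFactory.getLogger(RecordSyncService.class);

    private final RateLimiter rateLimiter;
    private final IdentityClient identityClient; // hypothetical client wrapping the identity API

    public RecordSyncService(final RateLimiter rateLimiter, final IdentityClient identityClient) {
        this.rateLimiter = rateLimiter;
        this.identityClient = identityClient;
    }

    @Retryable(retryFor = Exception.class,
            maxAttemptsExpression = "${scheduling.retry.maxAttempts}",
            backoff = @Backoff(
                    delayExpression = "${scheduling.retry.delayMs}",
                    multiplierExpression = "${scheduling.retry.backoffMultiplier}"))
    public void syncRecord(final ChangedRecord record) {
        rateLimiter.acquire(); // blocks until a permit is available, keeping us under the API limit
        identityClient.sync(record); // hypothetical call to the identity database
    }

    @Recover
    public void recover(final Exception e, final ChangedRecord record) {
        // Invoked once the retries are exhausted
        log.error("Failed to sync record", e);
    }
}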
Configuration
Before wrapping up, I want to provide two final snippets of configuration, to make it clear how this project works. Firstly, the dependencies in the build.gradle file look like this:
dependencies {
    implementation 'org.springframework.boot:spring-boot-starter'
    implementation 'net.javacrumbs.shedlock:shedlock-spring'
    implementation 'net.javacrumbs.shedlock:shedlock-provider-dynamodb2'
    implementation 'software.amazon.awssdk:dynamodb-enhanced'
    implementation 'com.google.guava:guava'
    implementation 'org.springframework.retry:spring-retry'
    implementation 'org.springframework:spring-aspects'
    testImplementation 'org.springframework.boot:spring-boot-starter-test'
    testRuntimeOnly 'org.junit.platform:junit-platform-launcher'
}
Be sure to define the version numbers either in the dependencies section or in another section of gradle configuration. The latest version numbers can be found in the Maven Repository.
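For example, the libraries not managed by Spring Boot can have their versions pinned inline (illustrative version numbers; check the Maven Repository for the latest):
implementation 'net.javacrumbs.shedlock:shedlock-spring:5.16.0'
implementation 'com.google.guava:guava:33.2.1-jre'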
Finally, the application.yml file looks like this:
aws:
  dynamodb:
    accessKey: "access key"
    secretKey: "secret key"
scheduling:
  tasks:
    schedule: "0 */5 22-23,0-5 * * *"
  timezone: "Australia/Melbourne"
  rateLimiter:
    maxRatePerSecond: 1
  retry:
    maxAttempts: 3
    delayMs: 1000
    backoffMultiplier: 2
  shedlock:
    lockAtLeastFor: "5m"
    lockAtMostFor: "9m"
Obviously don’t actually store AWS access keys in plain text; this was only done for use with LocalStack. These values can all be stored in the AWS Parameter Store (or Secrets Manager, for the access keys), or an equivalent service, allowing for easier adjustment without redeployment.
Conclusion
To recap, we needed a solution which would synchronise records only after hours, with exponential backoff, only calling the API as often as permitted, and ensuring that only one instance ran this job at a time.
The Spring Scheduling library allowed us to use a cron expression to run the job after hours. Shedlock ensured that only one instance would run the job at a time. Spring Retry handled retrying with exponential backoff and a recovery method. And the Google Guava library provided easy rate limiting.
The scheduling class fetches the records and invokes a method on the retrying class, which calls the external API with rate limiting and exponential backoff.