Improving cold start times of Java AWS Lambda functions using GraalVM and native images


TL;DR

We’ve all heard that Lambdas written in Java are slow, but there’s more that we developers can do to improve Java Lambda execution times, especially from a cold start. Significant improvements are possible from a warm start too.

Java’s cold start time of roughly 0.65s isn’t as bad as it once was, but compared to JavaScript’s 0.16s it is still a touch slow, especially when chaining several Lambdas. We can narrow the gap between Lambdas written in Java and JavaScript by using the Quarkus framework to compile our Java Lambda project Ahead Of Time (AOT) into a native image with GraalVM, and running it on a custom runtime. This can yield cold start times of around 0.31s, eclipsing a Java Lambda running on Amazon’s runtime, and significantly closing the gap to JavaScript Lambdas.

In short, it’s worth checking out if you’re interested in writing Lambdas in Java.


Introduction

Java has been around for a long time, and is a trusted language for use in large scale server applications due to its secure compiled code, ease of coding and extensibility. Over the years however, we have seen a rise in the popularity of JavaScript for use in server applications, with Node.js and, more recently, the advent of Serverless Functions (or Lambdas in AWS speak). It’s even easier to code in, and many say it runs quicker than Java.

Memory

It seems the main reasons for the popularity of JavaScript in Lambdas are its speed of execution, especially the cold start times of Serverless Functions, and its low overall memory consumption. And let’s face it, Java can be a pig when it comes to the way that it handles memory. Its JAR files alone can be huge, and all of those dependencies have to be loaded before the function even runs. Reducing the memory allocated to a Java Serverless Function can slow it down or, in extreme circumstances, prevent it from loading at all!

Serverless Functions are also charged for every GB-second consumed, so a small memory footprint and a fast execution time are key to reducing costs, and JavaScript covers both these cases just fine.

Execution time

Increasing the memory allocated to a Serverless Function can have the side effect of reducing the time it takes to execute (AWS Lambda allocates CPU power in proportion to the configured memory), but it doesn’t necessarily make it cheaper because cost is a function of both time and memory.

JavaScript seems to have the edge on Java in both aspects here.
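
To make the trade-off between memory and duration concrete, here is a minimal sketch of the GB-second arithmetic. The class and method names are made up for illustration, and the memory sizes and durations are hypothetical values rather than measurements from this article.

```java
// Hypothetical helper for illustration; not part of any AWS SDK.
class GbSeconds {

    // GB-seconds = allocated memory in GB multiplied by billed duration in seconds.
    static double gbSeconds(int memoryMb, double billedMs) {
        return (memoryMb / 1024.0) * (billedMs / 1000.0);
    }

    public static void main(String[] args) {
        // Assumed scenario: doubling the memory happens to halve the duration.
        double small = gbSeconds(256, 200.0);
        double large = gbSeconds(512, 100.0);
        System.out.println("256 MB for 200 ms: " + small + " GB-s");
        System.out.println("512 MB for 100 ms: " + large + " GB-s");
    }
}
```

In this made-up scenario both configurations consume the same number of GB-seconds, so the faster one costs no more. If doubling the memory shaved off less than half of the duration, the larger configuration would end up costing more.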

Compiler and runtimes

Java uses a Just In Time (JIT) compiler which, as its name suggests, initially compiles the bare minimum of what is needed to start the application, and then loads and compiles additional classes as needed while it runs. This lets it load only the classes it actually uses, and allows the use of reflection. The flip side of this flexibility is that it slows the application down on its first run. Let’s just reiterate that: the application is slow when it executes a section of code for the first time, because that code has to be loaded and compiled before it can run at full speed. On subsequent runs, Java is much faster.
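
The warm-up effect can be observed with a small experiment. The following is only a rough sketch with made-up names, and a naive timing loop like this is no substitute for a proper benchmark: the exact numbers depend entirely on the machine and the JVM.

```java
// Illustrative micro-experiment; timings vary by machine and JVM.
class WarmUpDemo {

    // Results are written to a field so the JIT cannot discard the work as unused.
    static volatile long sink;

    // Some arbitrary CPU-bound work.
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) i * i % 7;
        }
        return sum;
    }

    // Times a single call to work(n), in nanoseconds.
    static long timeNanos(int n) {
        long start = System.nanoTime();
        sink = work(n);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Run the same work repeatedly and print how long each round took.
        for (int round = 1; round <= 10; round++) {
            System.out.println("round " + round + ": " + timeNanos(1_000_000) + " ns");
        }
    }
}
```

On a typical HotSpot JVM the first round or two tend to be noticeably slower than the later ones, which is the same effect a Lambda sees on its first invocation.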

JavaScript, on the other hand, has no separate compilation step. Node.js loads the source and starts executing it through an interpreter almost immediately (its V8 engine only compiles frequently used code later), so there is very little up-front cost.

Let’s put this in the context of Serverless Functions. These are small snippets of code that are run on compute capability somewhere in the cloud. You don’t need to manage it, you don’t need to keep it running, and you don’t get charged for a server that isn’t running. Pretty neat, huh? Yes, one of the biggest perks of Serverless Functions is that you only pay for the time that they actually run.

Lambda lifecycle

However, Serverless Functions don’t necessarily hang around waiting to be called. They get loaded onto, and unloaded from, any available server as needed. Loading a function into memory before executing it is called a “Cold Start”. Once a Serverless Function has been loaded, subsequent calls only need to execute it and don’t suffer the loading time again. This is known as a “Warm Start”.

This sounds awfully like Java!
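
The difference between the two kinds of start can be sketched in plain Java. This is a simulation with made-up names, not AWS code: the static initialiser stands in for the one-off loading work of a cold start, and handle() stands in for the work done on every invocation.

```java
// Simulation only; the names are hypothetical and no AWS classes are used.
class LifecycleDemo {

    static int initCount = 0;
    static int invocationCount = 0;

    // Stands in for expensive start-up state (loaded libraries, clients, config).
    static final String CONFIG;

    // Runs once, when the class is first loaded: the "cold start" cost.
    static {
        initCount++;
        CONFIG = "loaded";
    }

    // Runs on every call: the only cost paid by a "warm start".
    static String handle(String name) {
        invocationCount++;
        return CONFIG + ":" + name;
    }

    public static void main(String[] args) {
        handle("first");
        handle("second");
        System.out.println("initialisations: " + initCount);
        System.out.println("invocations: " + invocationCount);
    }
}
```

However many times handle() is called, the initialisation work happens only once for as long as the class stays loaded.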

Example problem

We’ll develop a Serverless Function that accepts two parameters in a body, a name and a greeting, and returns a message in the format “<greeting> <name>”, unless the name is “Stuart” (apologies to all the Stuarts out there, it’s nothing personal). If the name is “Stuart”, the message “Can only greet nicknames” is returned instead.
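
As a rough sketch, the core logic described above might look like the following in plain Java. The class and method names are illustrative, not taken from the project. In a real Lambda this logic would sit behind a handler class (for example one implementing the RequestHandler interface from AWS’s Java Lambda library), which is left out here to keep the example free of dependencies.

```java
// Plain-Java sketch of the example's logic; names are illustrative.
class Greeter {

    static final String REJECTED_NAME = "Stuart";

    // Returns "<greeting> <name>", or the refusal message for "Stuart".
    static String greet(String greeting, String name) {
        if (REJECTED_NAME.equals(name)) {
            return "Can only greet nicknames";
        }
        return greeting + " " + name;
    }

    public static void main(String[] args) {
        System.out.println(greet("Hello", "Darrow")); // prints "Hello Darrow"
        System.out.println(greet("Hello", "Stuart")); // prints "Can only greet nicknames"
    }
}
```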

The goal is to see which language and runtime will execute Serverless Functions the quickest and most efficiently, ultimately costing us the least.

Preliminary results

If we were to write this Serverless Function in both JavaScript and Java, and deploy them each to AWS Lambda, we might see response times like this:

| Runtime | Initialisation (ms) | Cold Start Duration (ms) | Warm Start Duration (ms) | Max Memory (MB – Cold/Warm) |
| --- | --- | --- | --- | --- |
| JavaScript | 150.228 | 8.654 | 1.151 | 65/66 |
| Java 11 – Vanilla AWS (Amazon Corretto RT) | 197.768 | 456.914 | 1.714 | 92/97 |

JavaScript vs Java Lambda Timings

Averaged over 5 test executions, the Java and JavaScript Lambdas actually run quite similarly when warm, though Java uses about 50% more memory. But Java Lambdas really struggle when they’re cold. This is a well-known problem that often puts people off choosing Java when writing Lambdas.

The results above also demonstrate the differences between the runtimes: JavaScript interprets the code and executes it immediately, whereas Java requires a long time to load all its libraries and compiles the code in a JIT fashion. Once the Java Lambda has warmed up (i.e. fully loaded and compiled), it’s barely 500µs slower on average than JavaScript.

A note to take into account about the Lambdas: these have been written using the instructions and libraries as directed by AWS.

In a warm production environment, 500µs is likely an acceptable delay, but it’s the cold start times that just don’t compare. About 650ms isn’t too bad for a cold start, but Java still has a long way to go if it’s going to beat our Hare.

Time for JavaScript to have a little break. It’s got this race in the bag!

GraalVM

So we know that Java can just about keep up with JavaScript when they’re both warm. But how can we speed things up from a cold start?

Enter GraalVM.

GraalVM is a high-performance JDK distribution designed to accelerate the execution of applications written in Java and other JVM languages along with support for JavaScript, Ruby, Python, and a number of other popular languages.

graalvm.org/docs/introduction

Yep, that’s great, but how does it help poor Java speed up?

That’s where its Native Image Runtime Mode comes to the fore. GraalVM has the ability to compile Java code into a standalone binary executable. The Java bytecode that is processed during the native image build includes all application classes, dependencies, third party dependent libraries, and any JDK classes that are required. A self-contained native executable is generated, specific to each individual operating system and machine architecture, that does not require a JVM to run.

So essentially, it bundles what is required into its own binary rather than bytecode, thus eliminating the Java runtime. It does this by compiling the application, in our case a Serverless Function, Ahead Of Time (AOT) so there is no further compilation required at runtime.

But how do we incorporate GraalVM into our Lambda preparation?

Coding using an alternative framework: Quarkus

Let’s try an alternative framework to improve our Lambda execution times. Quarkus looks like it fits the bill, with its claim to be “Supersonic Subatomic Java”, its super fast stats and charts, and Developer Joy! Look no further!!!

Let’s take a closer look at Quarkus briefly just to understand how it can help the Java Lambda achieve Lambda superiority.

Quarkus is a Kubernetes Native Java stack tailored for GraalVM & OpenJDK HotSpot, crafted from the best of breed Java libraries and standards. It is also focused on developer experience, making things just work with little to no configuration.

No longer will Java be the laughing stock of the Lambda world!

Project setup

Quarkus comes with a Maven archetype to scaffold a very simple starting Lambda project.

mvn archetype:generate \
    -DarchetypeGroupId=io.quarkus \
    -DarchetypeArtifactId=quarkus-amazon-lambda-archetype \
    -DarchetypeVersion=2.1.1.Final

If we have a closer look inside, we will see that there is a simple starter project that looks exactly like our example!!! Developer joy indeed!

Let’s build the project.

mvn clean package

This will create your deployment package, along with a manage.sh script for deploying and updating your application, in the target/ directory. The manage.sh script is regenerated each time a build occurs.

Tweaks to the manage.sh

Before we go ahead and deploy this Lambda to the cloud, let’s make a copy of the manage.sh script and make some changes to it. This will make deploying the Lambda easier later.

cp target/manage.sh .

Create a Role for executing Lambda Functions and copy its ARN, or copy the ARN of an existing Role. At the top of the script (make sure you edit your copy, and not the generated script in target/, or you’ll lose your changes at the next build), define a variable called LAMBDA_ROLE_ARN and set it to the ARN you copied.

LAMBDA_ROLE_ARN="arn:aws:iam::1234567890:role/lambda-role"

In the middle of the script are some variables that the script uses for the Lambda Function name (FUNCTION_NAME), handler (HANDLER), runtime (RUNTIME) and zip file location and name (ZIP_FILE). Update the Function name to something more appropriate for the project: GreetingLambdaGraal.

Scroll to near the bottom and update the Function name that will be used for the native implementation.

FUNCTION_NAME=${FUNCTION_NAME}Native

Let’s make the memory allocated to the Lambda easily configurable by adding another variable at the top, and using it in the cmd_create() function in the script.

MEMORY_SIZE=256
...
--timeout 15 \
--memory-size ${MEMORY_SIZE} \
${LAMBDA_META}

Make sure that your script has execute capabilities.

chmod u+x manage.sh

And finally, ensure you have a payload.json file with your test parameters in the root directory of your project.

{
    "name": "Darrow",
    "greeting": "Hello"
}

Feel free to check the repository for a complete manage.sh file.

Now we’re set!

Execution and results

Firstly, we need to create the Lambda Function in the cloud.

./manage.sh create

From here on in, every time we build this function, we will redeploy it with the following command:

./manage.sh update

Then we can invoke the Lambda from the command line.

./manage.sh invoke

You can also just run the Lambda from the AWS Console, where you will need to ensure that the function parameters are passed appropriately.

After execution, let’s review the logs. We got the following:

REPORT RequestId: 71fd5d84-dcbd-4a95-8991-f0814cf53620
	Duration: 177.15 ms
	Billed Duration: 178 ms
	Memory Size: 256 MB
	Max Memory Used: 164 MB
	Init Duration: 3112.37 ms
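
If you want to collect these numbers over several runs, the REPORT output above is easy to pick apart. The following is a small sketch with made-up class and method names. It assumes the fields are separated by tabs or newlines and formatted as "Name: value", as they are in the output shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of any AWS tooling.
class ReportLine {

    // Splits a REPORT entry into "Name: value" fields, keyed by name.
    static Map<String, String> parse(String report) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String part : report.split("[\\t\\r\\n]+")) {
            String trimmed = part.trim();
            int colon = trimmed.indexOf(": ");
            if (colon > 0) {
                fields.put(trimmed.substring(0, colon), trimmed.substring(colon + 2));
            }
        }
        return fields;
    }

    // Reads a field such as "177.15 ms" as a number of milliseconds.
    static double millis(Map<String, String> fields, String key) {
        return Double.parseDouble(fields.get(key).replace(" ms", ""));
    }

    public static void main(String[] args) {
        String report = "REPORT RequestId: 71fd5d84-dcbd-4a95-8991-f0814cf53620\n"
                + "\tDuration: 177.15 ms\n"
                + "\tBilled Duration: 178 ms\n"
                + "\tMemory Size: 256 MB\n"
                + "\tMax Memory Used: 164 MB\n"
                + "\tInit Duration: 3112.37 ms\n";
        Map<String, String> fields = parse(report);
        double total = millis(fields, "Init Duration") + millis(fields, "Duration");
        System.out.println("Cold start total (ms): " + total);
    }
}
```

Adding the initialisation time to the duration gives the full cost of a cold start, which for this run is a little under 3.3 seconds.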

After running it a few times and getting the average values, let’s add it to our results table.

| Runtime | Initialisation (ms) | Cold Start Duration (ms) | Warm Start Duration (ms) | Max Memory (MB – Cold/Warm) |
| --- | --- | --- | --- | --- |
| JavaScript | 150.228 | 8.654 | 1.151 | 65/66 |
| Java 11 – Vanilla AWS (Amazon Corretto RT) | 197.768 | 456.914 | 1.714 | 92/97 |
| Java 11 Quarkus (Amazon Corretto RT) | 3158.056 | 151.228 | 1.302 | 164/164 |

The addition of the timings of the Quarkus developed Lambda running on Amazon’s Corretto Runtime

Averaged over 5 test executions, the Quarkus Lambda runs much closer to the JavaScript Lambda when warm, with only a 150µs difference. Even its cold start duration is only a third of the vanilla Java Lambda’s. But look at the initialisation duration!! Over 3 seconds! What happened?! The most likely explanation is that the libraries used to produce the Quarkus Lambda package are either:

  • Too large
  • Inefficient
  • Not optimised for Amazon Corretto Java Runtime
  • A combination of any/all of the above

And look at the memory consumption! It uses 100MB more memory than the JavaScript Lambdas. Though Quarkus has shown some promise, it hasn’t quite delivered.

HEY! Why IS it so slow?! Wasn’t GraalVM supposed to speed things up?!


D’OH! We forgot to ask for a native image, so our Lambda has been running on the regular JVM all along. We have left out the key ingredient: GraalVM. The race is back on.

Docker GraalVM build

Before we start, make sure that you have Docker running AND that there is enough memory allocated to Docker to do a build (4-8GB should be more than enough). It isn’t enough to pass a max memory value to Maven via the command line, as it’s not the Maven process that needs the memory, it’s Docker that needs it. We will build our image in Docker so that we can compile the native image specific to the environment we will be deploying it to. We plan on running this in a Linux environment, so we’ll use a Linux Docker image.

mvn clean install -Pnative -Dquarkus.native.container-build=true

A word of warning: this process can take several minutes! Even on subsequent builds. Making native images takes time, and Ahead Of Time compilation also takes time. We have to compile the whole application, including all the dependencies. No JIT for us! And your fans (if you have them) may start whirring like crazy. Don’t worry, it’ll return to normal soon.

This command tells Maven to build a native image, and to build it in a Docker container. When it completes, it will have compiled and created a native executable image, along with a generated zip file in target/function.zip. This zip file contains the native executable image renamed to bootstrap, which is a requirement of the AWS Lambda Custom (Provided) Runtime.

HANG ON! Let’s just digest that for a minute. Not only are we creating a binary executable, but we are also providing a custom runtime to run our Lambda function in! Double Supercharge!! JavaScript won’t see us coming!

A friendly note here: if we were developing this in a Linux environment, we wouldn’t need Docker at all, and could’ve just created the native executable image directly on our machines, and wouldn’t need the pesky quarkus.native.container-build=true parameter. But as I am developing on a Mac, I need this additional parameter.

So let’s create a new lambda function, one for the native image.

./manage.sh native create

Note the inclusion of the ‘native’ parameter in the command above.

Final results

Now invoke the native Lambda function…

./manage.sh native invoke

And look at the output…

REPORT RequestId: 1c34d93d-8de4-425d-9297-d4921405ae50
	Duration: 2.82 ms
	Billed Duration: 320 ms
	Memory Size: 256 MB
	Max Memory Used: 76 MB
	Init Duration: 316.57 ms

WHOA!!! That’s fast. Let’s add it to our table.

| Runtime | Initialisation (ms) | Cold Start Duration (ms) | Warm Start Duration (ms) | Max Memory (MB – Cold/Warm) |
| --- | --- | --- | --- | --- |
| JavaScript | 150.228 | 8.654 | 1.151 | 65/66 |
| Java 11 – Vanilla AWS (Amazon Corretto RT) | 197.768 | 456.914 | 1.714 | 92/97 |
| Java 11 Quarkus (Amazon Corretto RT) | 3158.056 | 151.228 | 1.302 | 164/164 |
| Java 11 Quarkus / GraalVM native (Custom RT) | 308.480 | 2.756 🔥 | 0.838 🔥 | 76/76 |

The addition of the timings of the Quarkus native image Lambda running on a custom GraalVM Runtime
Cold Start times, including initialisation and execution times

Wow! The run durations are incredible, outrunning JavaScript in both the cold and warm starts. We can barely see the execution durations on the chart! But we do still take a bit of a hit during initialisation, with the native image taking twice as long to initialise as JavaScript. The whole cold start, initialisation plus execution, still comes to less than a third of a second, so that sounds pretty good to me. That is a little under half the time a standard Java Lambda takes from cold.
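
For anyone who wants to check the arithmetic, here is a small sketch that adds the initialisation and cold start duration columns of the results table together and compares each total with vanilla Java. The class and method names are made up for illustration. The figures are the averages from the table.

```java
// Arithmetic on the averaged results from the table; names are illustrative.
class ColdStartTotals {

    // Total cold start = initialisation + cold start duration (both in ms).
    static double total(double initMs, double coldMs) {
        return initMs + coldMs;
    }

    public static void main(String[] args) {
        String[] names = {"JavaScript", "Java 11 (Corretto)", "Quarkus (Corretto)", "Quarkus native"};
        double[] init = {150.228, 197.768, 3158.056, 308.480};
        double[] cold = {8.654, 456.914, 151.228, 2.756};

        // Use the vanilla Java Lambda as the baseline for the comparison.
        double javaTotal = total(init[1], cold[1]);
        for (int i = 0; i < names.length; i++) {
            double t = total(init[i], cold[i]);
            System.out.printf("%-20s %9.3f ms  (%.2fx vanilla Java)%n", names[i], t, t / javaTotal);
        }
    }
}
```

The totals come to roughly 159ms for JavaScript, 655ms for vanilla Java, 3309ms for Quarkus on Corretto and 311ms for the native image.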

Warm start execution times

Look at that warm start! Less than 1ms!!! That’s INSANE!!! It beats all of the tested Lambdas hands down.

Memory usage

All of this is also achieved using just 76MB of memory, which is only 15% more than JavaScript. Still a bargain to run.

Conclusion

With the addition of GraalVM and native images, with their AOT compilation and custom runtime, Java Lambdas can be fast, even from cold. The initialisation needs a little longer than JavaScript’s, but the difference is small. AOT compilation is something that benefits most other compiled languages, such as C/C++ and Rust, and makes them seem faster than Java. But with AOT, Java can start up quickly too.

Even Vanilla Java Lambdas are not the sloths everyone makes them out to be. Amazon has worked hard at achieving some form of speed enhancement. Cold starts used to be measured in seconds (a bit like the non-native Quarkus Lambda above), whereas now we are talking in fractions of a second, though we can still provide some help to get those fractions even smaller.

Other frameworks

Quarkus isn’t the only framework out there that integrates well with GraalVM. Other micro-frameworks, such as Helidon, also allow the creation of GraalVM native images. Larger frameworks, such as our trusty friend Spring, have ways of achieving this integration as well (see the Spring Native project).

Final thoughts

I acknowledge that this was a very trivial example, but I feel it demonstrates the benefits of using GraalVM, native images and AOT compilation.

One last word on AOT. Reflection hasn’t been completely removed from Java applications, but a little more work is required to allow it to happen. You can provide a configuration file with a list of all the classes and all the available methods for use in a reflective manner, so that any existing uses of reflection do not break. We have always been told that reflection is slow, so this is one of those times where you really need to think about whether you really need reflection, especially when you see the benefits of AOT compilation of your code and getting it optimised for production.
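
To illustrate the kind of code this affects, here is a minimal sketch of a reflective call, with hypothetical class and method names. The class and method are looked up by name at runtime, so an ahead-of-time compiler cannot always tell from the code alone that they are needed. This is the sort of usage that has to be declared in the reflection configuration.

```java
import java.lang.reflect.Method;

// Illustrative only; class and method names are hypothetical.
class ReflectionDemo {

    public static class Greeting {
        public String hello(String name) {
            return "Hello " + name;
        }
    }

    // Looks up a class and a method by name at runtime, then invokes the method.
    static String callReflectively(String className, String methodName, String arg) {
        try {
            Class<?> type = Class.forName(className);
            Object instance = type.getDeclaredConstructor().newInstance();
            Method method = type.getMethod(methodName, String.class);
            return (String) method.invoke(instance, arg);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Reflective call failed", e);
        }
    }

    public static void main(String[] args) {
        // In real code the class name would typically come from configuration or input.
        String className = Greeting.class.getName();
        System.out.println(callReflectively(className, "hello", "Darrow")); // prints "Hello Darrow"
    }
}
```

On a regular JVM this works without any extra effort. In a native image, a lookup like this can only succeed if the class and its methods were registered for reflection when the image was built.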

branko.minic@shinesolutions.com

I'm primarily a backend developer, with an appreciation of making frontends pretty and usable. I have a passion for Java, Kotlin, Spring and other JVM based frameworks, but will also dabble in JS/node.js, Dart/Flutter, C/C++, and Rust when given the opportunity.
