
03 Jan 2017 Highly concurrent applications with Java and Akka
There are applications that execute such complex tasks that if they didn’t use a concurrent processing model, they would be so slow as to be unusable. This group of applications includes data analytics, real-time games and recommendation systems.
Even with modern programming languages that support concurrency, we are faced with the task of coordinating multiple threads, handling synchronisation and the constant possibility of race conditions. These make it difficult to write, test and maintain code, discouraging many developers from implementing better and faster solutions for their problems.
In this blog entry we are going to take a quick look at the Akka toolkit, its main concepts and some code examples in Java. For further information about this topic, please check the official documentation at http://akka.io/
What is Akka?
Akka is an event-driven middleware toolkit for building high performance and reliable distributed applications in Java and Scala. Akka decouples business logic from low-level mechanisms such as threads, locks and non-blocking IO. Our Scala or Java program logic lives in lightweight actor objects which send, receive and process messages. Thanks to Akka, we can easily configure how actors will be created, destroyed, scheduled and restarted upon failure.
The Akka team describes the toolkit as follows:
“We believe that writing correct distributed, concurrent, fault-tolerant and scalable applications is too hard. Most of the time it’s because we are using the wrong tools and the wrong level of abstraction. Akka is here to change that. Using the Actor Model we raise the abstraction level and provide a better platform to build scalable, resilient and responsive applications—see the Reactive Manifesto for more details. For fault-tolerance we adopt the “let it crash” model which the telecom industry has used with great success to build applications that self-heal and systems that never stop. Actors also provide the abstraction for transparent distribution and the basis for truly scalable and fault-tolerant applications.” Source: What is akka?
What is an Actor?
Key to Akka is the concept of an ‘Actor’. An actor is basically an object that receives messages from another actor (the sender), and processes them in sequence. All of them – the message, sender and actor – are completely decoupled from each other.
Every message is sent to a reference to an actor (called an ActorRef, which we will see more of later). Each actor maintains a mailbox, which works like a message queue; in fact, we can think of an actor as a small programmable message queue. Messages are held in the actor’s mailbox until the actor processes them, in the same order they arrived.
Actors are lightweight components that do not map directly to VM threads. Akka’s documentation states that the toolkit can handle around 2.7 million actors in just 1 GB of memory, whereas on a typical machine we could only create around 4096 threads. In a standard Akka application it is common to create millions of actors, although many of them will not do anything until a message is sent to them.
Akka actors receive messages one at a time and, unlike traditional message queues, they can send messages to other actors (known as cross-actor communication). Everything executed by an actor is executed in an asynchronous way, so most of the senders can fire a message without waiting for a response.
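To make the mailbox idea concrete, here is a minimal plain-Java sketch of an actor as a queue plus a handler. It is a toy model and not Akka's implementation: the names MailboxSketch, ToyActor and processMailbox are invented for this illustration, and in Akka it is the dispatcher, not the caller, that decides when and on which thread a mailbox is drained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

// Toy model only: NOT Akka's API. All names here are hypothetical.
class MailboxSketch {

    static class ToyActor<M> {
        private final ConcurrentLinkedQueue<M> mailbox = new ConcurrentLinkedQueue<>();
        private final Consumer<M> behaviour;

        ToyActor(Consumer<M> behaviour) {
            this.behaviour = behaviour;
        }

        // Enqueue the message and return immediately (fire and forget).
        void tell(M message) {
            mailbox.add(message);
        }

        // Drain the mailbox in arrival order, one message at a time.
        // Returns how many messages were processed.
        int processMailbox() {
            int processed = 0;
            M message;
            while ((message = mailbox.poll()) != null) {
                behaviour.accept(message);
                processed++;
            }
            return processed;
        }
    }

    // Sends three messages, then lets the actor process them.
    static List<String> demo() {
        List<String> seen = new ArrayList<>();
        ToyActor<String> actor = new ToyActor<>(seen::add);
        actor.tell("first");
        actor.tell("second");
        actor.tell("third");
        actor.processMailbox();
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [first, second, third]
    }
}
```

Note that tell() does no work beyond enqueueing, which is why a sender never has to wait for the receiver.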
When an actor receives a message, it can trigger one of the following behaviours:
- Execute an action based on the type of the message. For example, it could store data, calculate a value or call an external system.
- Forward the message to another actor
- Instantiate a new actor based on some specified logic, and then forward the message to it
- Stash it for later processing
- Just ignore the message
A Simple Akka Actor
Now that we know what an actor is, what does an actor actually look like in Java? Here’s an example:
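import akka.actor.UntypedActor;

public class MyUntypedActor extends UntypedActor {

    @Override
    public void onReceive(Object message) throws Exception {
        if (message instanceof PerformCalculation) {
            // Use the message payload to calculate some values
            calculateValues((PerformCalculation) message);
        } else {
            // Execute the default behaviour for un-handled messages
            unhandled(message);
        }
    }

    // PerformCalculation is an application-defined message class (not shown here)
    private void calculateValues(PerformCalculation message) {
        // Business logic goes here
    }
}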
All normal actors extend the UntypedActor class, which provides a set of methods with which we can define our actor’s behaviour.
In addition to onReceive(), you can override the lifecycle methods preStart() and postStop(). You also have the following methods available for use:
- getSender(): Returns an ActorRef of the actor that sent the currently processed message.
- getContext(): Returns the context of the actor, which gives us some information about it and allows us to create child actors under it.
- getSelf(): Returns an ActorRef of this actor.
Messages
In addition to actors, the other fundamental piece of the system is the message. A message can be just a simple String object, or a more complex one defining its own behaviour. It’s a good idea to keep messages as basic as possible. Generally speaking, we don’t want to put business logic in messages. Instead, we just want them to hold the data we need to transport between actors.
public class MyMessage {

    private String property;

    public MyMessage(String property) {
        this.property = property;
    }

    public String getProperty() {
        return property;
    }
}
In order to understand the whole Akka ecosystem, we need to comprehend both the basic structure of an actor and the environment it lives in. This environment is called the Actor System.
What is the ‘Actor System’?
In Akka an actor can create other actors, but who creates the first actor?
The answer is the Actor System. The first thing any Akka application must do is create the Actor System, and then a few high-level actors that handle the first incoming messages and build the rest of the actor hierarchy. To keep the hierarchy easy to understand, it is considered good practice to create just one high-level actor and to keep the top of the hierarchy as simple as possible. The Actor System provides the infrastructure that actors use to communicate and interact with each other.
Cross-actor communication can only happen through an ActorRef, which is a handle that lets other objects send messages to an actor without giving them direct access to the actor’s internal state. In order to send a message to an ActorRef, we can use one of the following calls:
- tell: This is the preferred way of sending messages. It sends a message and returns immediately, without blocking to wait for a reply (fire and forget). This gives the best concurrency and scalability characteristics.
- ask: The ask pattern involves actors as well as Futures, hence it is offered as a usage pattern rather than a method on an ActorRef. Basically, it sends a message and returns a Future object that represents a possible response.
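The difference between the two styles can be sketched in plain Java. This is an analogy under stated assumptions and not Akka's API: the class TellVsAskSketch and its methods are invented for the illustration, a single-threaded executor stands in for the actor, and a CompletableFuture stands in for the Future that Akka's ask pattern returns.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Plain-Java analogy only: NOT Akka's API. All names here are hypothetical.
class TellVsAskSketch {
    // One worker thread stands in for the actor: messages are handled in order.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final AtomicInteger handled = new AtomicInteger();

    private String handle(String message) {
        handled.incrementAndGet();
        return "reply to " + message;
    }

    // "tell" style: hand the message over and return at once; any reply is dropped.
    void tell(String message) {
        worker.execute(() -> handle(message));
    }

    // "ask" style: also returns at once, but with a future for the eventual reply.
    CompletableFuture<String> ask(String message) {
        return CompletableFuture.supplyAsync(() -> handle(message), worker);
    }

    int handledCount() {
        return handled.get();
    }

    void shutdown() {
        worker.shutdown();
    }

    static String demo() {
        TellVsAskSketch actor = new TellVsAskSketch();
        try {
            actor.tell("fire and forget");
            // join() blocks here only so the demo can return the reply;
            // real code would normally attach a callback instead.
            return actor.ask("ping").join();
        } finally {
            actor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints reply to ping
    }
}
```

Both calls return immediately; the only difference is whether the caller keeps a handle on a possible reply.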
Remember, tell will fire and forget, so no return value is expected. For example, this actor will process any message of type “MyMessage” and send it back to the sender of the message.
import akka.actor.UntypedActor;

public class MyUntypedActor extends UntypedActor {

    @Override
    public void onReceive(Object message) throws Exception {
        if (message instanceof MyMessage) {
            // Returns the message to the sender
            getSender().tell(message, getSelf());
        } else {
            // Anything else is reported as un-handled
            unhandled(message);
        }
    }
}
An actor can keep state in ordinary instance variables and use it while processing a message. Behind the scenes, Akka runs sets of actors on sets of threads: typically many actors share one thread, and subsequent invocations of the same actor may end up being processed on different threads. However, an actor only ever processes one message at a time, so its state can be safely preserved without any additional code such as method synchronisation, latches or semaphores to avoid race conditions.
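The following plain-Java sketch illustrates why processing one message at a time makes locks unnecessary. It is a simplification and not Akka code: ConfinedStateSketch and its methods are invented names, and the sketch pins the "actor" to a single dedicated thread, whereas Akka may move an actor between threads while still processing only one of its messages at a time.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.IntStream;

// Plain-Java analogy only: NOT Akka's API. All names here are hypothetical.
class ConfinedStateSketch {
    // Ordinary mutable field: no volatile, no locks, no synchronized.
    private int count = 0;

    // A single worker thread guarantees one message is processed at a time.
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();

    // Any thread may send; the increment itself runs only on the worker.
    void tellIncrement() {
        mailbox.execute(() -> count++);
    }

    // Reads the state on the worker thread too, after all earlier messages.
    int askCount() {
        return CompletableFuture.supplyAsync(() -> count, mailbox).join();
    }

    void shutdown() {
        mailbox.shutdown();
    }

    static int demo() {
        ConfinedStateSketch actor = new ConfinedStateSketch();
        try {
            // Many sender threads fire messages concurrently.
            IntStream.range(0, 4000).parallel().forEach(i -> actor.tellIncrement());
            return actor.askCount();
        } finally {
            actor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 4000
    }
}
```

If the senders incremented the field directly instead of sending messages, updates could be lost; routing every change through the mailbox is what removes the race.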
A bit of code
The best way to wrap up this whole idea is with a piece of code that uses the toolkit, so I’ve written a very small Java application that analyses a server log to count the number of requests per IP address. The idea is to access the file concurrently using actors, count the number of appearances of the same IP and display a table as a result.
The demo code is available at: https://github.com/Fabbbrrr/file-reader-akka
I’ve also integrated the same demo application with Spring. That version can be found here: https://github.com/Fabbbrrr/file-reader-akka-spring
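For readers who only want the shape of the computation, here is a small sketch of the counting step. It is not the code from the repositories above: it uses a parallel stream instead of actors, the class and method names are invented, and it assumes that the client IP is the first whitespace-separated token on each log line.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Illustration only: uses parallel streams, not actors. Names are hypothetical.
class IpCountSketch {

    // Assumption: the IP address is the first whitespace-separated token.
    static String extractIp(String logLine) {
        return logLine.trim().split("\\s+")[0];
    }

    // Splits the work across threads, counts per IP, then merges the partial maps.
    static Map<String, Long> countRequestsPerIp(List<String> logLines) {
        return logLines.parallelStream()
                .filter(line -> !line.trim().isEmpty())
                .map(IpCountSketch::extractIp)
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Made-up sample lines, not taken from a real server log.
        List<String> lines = Arrays.asList(
                "10.0.0.1 - - \"GET /index.html\" 200",
                "10.0.0.2 - - \"GET /about.html\" 200",
                "10.0.0.1 - - \"POST /login\" 302");
        countRequestsPerIp(lines)
                .forEach((ip, requests) -> System.out.println(ip + "\t" + requests));
    }
}
```

In an actor-based version, each chunk of lines would be a message to a worker actor and the merge would happen in an aggregating actor, but the split, count and merge steps stay the same.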
My experience with Akka
Broadly speaking, the problems I faced whilst developing a real-world application with Akka mostly resulted from my poor understanding of the documentation, a lack of examples for the particular case I was solving, and the high complexity of the business rules I was implementing. There were a few important lessons I learnt along the way, which I will try to outline here.
Understanding the problem
First things first: I cannot stress enough how important it is to understand the problem as a whole before you start adding tools and frameworks to solve parts of it. Whilst this is good advice for software development in general, it’s even more important if you are going to use something like Akka, which relies heavily on you doing things properly.
In short, actors are like little workers and you have to assign to them a very specific job to do. If you stray outside of this model, you will run into trouble.
RxJava or Akka?
In the early stages of development, we made some bad decisions about the tooling we should use to solve the problem, which led us to very inconsistent, difficult to understand and hard to maintain code. Specifically, we mixed concepts from both RxJava (http://reactivex.io/ and https://github.com/ReactiveX/RxJava) and Akka in the same layer of the application.
In the end, the team decided to take just one approach (Akka), which greatly simplified the code and, surprisingly, improved the performance of the app by a good margin. In our case, it seemed that using RxJava observables required much more memory than an actor model when dealing with millions of actors/observables.
This does not mean that RxJava is bad in general. It just means that for the problem the team was trying to solve, only one of the toolkits was necessary. They provide similar tools to solve similar problems, but in our case Akka was the best fit.
The lesson here was to keep it simple and pick the right weapon for the job. A long bow and a sling are difficult to handle at the same time :).
Memory issues
We found that under a medium load (5k concurrent users), our application sometimes ran out of memory for no apparent reason. After some days of research, the team discovered that it was a backpressure problem (“whoa, mate, slow down with all that, I can’t take it that fast.”).
The actors’ mailboxes were constantly increasing due to the lack of available threads to process all the messages for all actors. As the mailboxes increased so did the memory consumption, until at some point we just ran out of memory.
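One general way to keep a mailbox from growing without limit is to give it a fixed capacity and let senders find out when it is full. The sketch below shows that idea in plain Java with a bounded queue. It is an illustration only and not what the team did in this project: BoundedMailboxSketch and its methods are invented names. Akka does offer bounded mailbox types, but their configuration is not shown here.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Plain-Java illustration only: NOT Akka's API. All names here are hypothetical.
class BoundedMailboxSketch {
    private final BlockingQueue<String> mailbox;
    private final AtomicInteger rejected = new AtomicInteger();

    BoundedMailboxSketch(int capacity) {
        this.mailbox = new ArrayBlockingQueue<>(capacity);
    }

    // Non-blocking send: a full mailbox refuses the message instead of growing.
    boolean tell(String message) {
        boolean accepted = mailbox.offer(message);
        if (!accepted) {
            rejected.incrementAndGet(); // the sender can now slow down, retry or drop
        }
        return accepted;
    }

    int queued() {
        return mailbox.size();
    }

    int rejectedCount() {
        return rejected.get();
    }

    // A slow consumer that has processed nothing yet: 5 sends into 3 slots.
    static int[] demo() {
        BoundedMailboxSketch actor = new BoundedMailboxSketch(3);
        for (int i = 0; i < 5; i++) {
            actor.tell("message-" + i);
        }
        return new int[] { actor.queued(), actor.rejectedCount() };
    }

    public static void main(String[] args) {
        int[] result = demo();
        // prints queued=3 rejected=2
        System.out.println("queued=" + result[0] + " rejected=" + result[1]);
    }
}
```

The trade-off is that memory use becomes predictable, but the sender has to decide what to do with a rejected message.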
Logs
Too many logs can be as bad as none at all. Several times our system failed purely for log-related reasons. It ran out of disk space, experienced high CPU usage, or remote logging just became unresponsive.
The causes were varied. Sometimes we forgot to turn off debug logging in the load-testing environment. Sometimes we were using string concatenation to assemble log entries, which can have a noticeable effect when the code is being executed by hundreds of thousands of actors. Sometimes we were just logging at an inappropriate level.
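The cost of string concatenation can be seen in a small sketch. It uses java.util.logging purely to stay dependency-free, which is an assumption: the logging library used in the project is not stated here. The class name, logger name and describe() helper are invented for the illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch using java.util.logging; the logger name and helper are hypothetical.
class LazyLoggingSketch {
    private static final Logger LOGGER = Logger.getLogger("LazyLoggingSketch");

    // Counts how many log strings were actually assembled.
    private static final AtomicInteger built = new AtomicInteger();

    static String describe(int actorId) {
        built.incrementAndGet();
        return "actor-" + actorId + " finished processing";
    }

    // Returns {strings built by eager calls, strings built by lazy calls}.
    static int[] demo() {
        LOGGER.setLevel(Level.INFO); // FINE (debug) output is switched off

        built.set(0);
        for (int id = 0; id < 1000; id++) {
            // Eager: the argument is concatenated before fine() can check the level.
            LOGGER.fine(describe(id));
        }
        int eager = built.get();

        built.set(0);
        for (int id = 0; id < 1000; id++) {
            final int actorId = id;
            // Lazy: the supplier runs only if FINE is enabled, so nothing is built.
            LOGGER.fine(() -> describe(actorId));
        }
        int lazy = built.get();

        return new int[] { eager, lazy };
    }

    public static void main(String[] args) {
        int[] result = demo();
        // prints eager=1000 lazy=0
        System.out.println("eager=" + result[0] + " lazy=" + result[1]);
    }
}
```

Other logging libraries offer the same saving through parameterised messages; the point is only that a disabled log statement should not pay for building its text.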
Superfluous logging was also a problem for us. For example, logging entry to two branches of a binary conditional:
LOGGER.info("Checking var value");
if (var) {
    LOGGER.info("In branch 1");
    ...
} else {
    LOGGER.info("In branch 2");
    ...
}
Supervision strategies
Mixing business logic with failure handling makes both harder to reason about, so Akka keeps them apart. Actor hierarchies and the supervision strategy of a parent actor are the means to achieve this separation. A supervision strategy receives the exception that caused a child actor to fail and then decides what to do next. Specifically, it decides whether to:
- Resume the child, keeping its state but losing the faulty message
- Restart the child, losing any child state
- Stop the child permanently
- Escalate the exception to its own supervisor (which then decides what to do about the parent itself, and not only about its child)
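The decision itself is essentially a function from an exception to one of these four outcomes. The plain-Java sketch below models only that function. It is not Akka's API: the Directive enum and DECIDER are stand-ins invented for this illustration (in Akka the role is played by a supervisor strategy such as OneForOneStrategy), and the particular mapping of exceptions to outcomes is an arbitrary example.

```java
import java.util.function.Function;

// Plain-Java model only: NOT Akka's API. All names here are hypothetical.
class SupervisionSketch {

    enum Directive { RESUME, RESTART, STOP, ESCALATE }

    // Example mapping, chosen only for illustration.
    static final Function<Throwable, Directive> DECIDER = failure -> {
        if (failure instanceof ArithmeticException) {
            return Directive.RESUME;   // drop the faulty message, carry on
        }
        if (failure instanceof IllegalStateException) {
            return Directive.RESTART;  // start again with fresh state
        }
        if (failure instanceof IllegalArgumentException) {
            return Directive.STOP;     // give up on this child
        }
        return Directive.ESCALATE;     // let the parent's own supervisor decide
    };

    public static void main(String[] args) {
        System.out.println(DECIDER.apply(new ArithmeticException("/ by zero"))); // prints RESUME
        System.out.println(DECIDER.apply(new RuntimeException("unknown")));      // prints ESCALATE
    }
}
```

Keeping this mapping in the parent means the child's message-handling code can stay free of try/catch blocks for failures it cannot sensibly recover from.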
Thanks to the preRestart, postRestart and postStop lifecycle hooks, an actor can do cleanup work when it is restarted or stopped. With these hooks, cleaning up resources is fairly straightforward and generic enough to let developers focus on more important tasks around the business logic.
Conclusion
In this article I have tried to demonstrate how Akka can help us to develop better concurrent Java (and Scala!) applications. However, I have only scratched the surface of what Akka has to offer. There is a lot more to learn, for example Mailbox Prioritisation, Typed Actors, Scheduling, and the Actor lifecycle, among many other topics.
Personally, it was a really nice learning experience for me, and I would consider Akka in the future without any doubt. Understanding how a problem can be easily solved with the right tool – i.e. one that was ‘meant for it’ – gives developers great power. I feel that Akka was just the right tool for our particular problem, and look forward to using it again.
References
Sample code: https://github.com/Fabbbrrr/file-reader-akka
http://doc.akka.io/docs/akka/snapshot/intro/what-is-akka.html
http://doc.akka.io/docs/akka/2.4.16/java.html
https://en.wikipedia.org/wiki/Actor_model
http://www.reactivemanifesto.org
http://www.slideshare.net/RoyRusso1/introduction-to-akka-atlanta-java-users-group
Dmitry Pokidov
Posted at 06:25h, 04 January
Very nice article. Will definitely consider akka for high concurrent application. How did you solve memory issues (slowed down akka) at the end?
Fabricio Milone
Posted at 07:56h, 04 January
Thanks Dmitry!
The memory issues were related to ever-growing mailboxes. The actors received more messages than they could process in an acceptable time and, as we were using the default mailbox, which has no size limit, memory consumption grew as more and more messages were stored waiting to be processed. Imagine that scaled up to millions of actors and you will have a memory problem.
The solution was to increase the number of threads that Akka makes available to the actors. This used more CPU (usage had been low, around 20%) but solved the problem of the growing mailboxes.
Dmitry Pokidov
Posted at 06:53h, 05 January
Thanks for a quick reply. Increasing number of threads to load CPU sounds like a good idea!