A leading online learning provider recently engaged us to add social features to their website so that students could more easily collaborate, and thus be more successful with their study. One of the main social features required was providing students with the ability to form ‘buddy’ relationships, and then to initiate persistent chat sessions between each other. In this blog post, I will describe how we used the Atmosphere Asynchronous WebSocket/Comet framework and the Hazelcast In-Memory Data Grid to build this chat implementation.
“We are the choices we make”
― Patrick Ness, The Knife of Never Letting Go
Before we began developing the chat component, we outlined what would be required both technically and functionally. It was decided that the component needed to:
- Provide real-time messaging,
- Provide a complete history of all chat messages,
- Provide a custom interface that matched the look and feel of the existing site,
- Integrate seamlessly with existing authentication and authorisation mechanisms,
- Be easy to implement, and
- Be scalable.
It was also decided that the chat component should not require any additional server software, hardware, or runtime – in effect, that it would form an integrated part of the existing Spring Framework website rather than a bolted-on feature. After some research, we found that we had a number of choices for a chat starting point, including XMPP (e.g. OpenFire), Node.js, Vert.x, and Atmosphere. In isolation, choosing Atmosphere may not seem like the best option. However, once we factored in the above requirements, it soon became clear that Atmosphere ticked all the boxes.
“The secret of getting ahead is getting started.”
– Mark Twain
The project timeline was tight, so we got started right away with a quick spike to make sure the technology choice was sound. Using the Atmosphere Spring chat example, we were soon able to perform simple chat messaging between two browsers via a server, and so began developing our final solution.
There are a number of ways to provide persistent connections over HTTP. Web Sockets and Long Polling were the two main approaches appropriate to our project. With the Web Socket approach, both the browser and server need to provide support, and the web socket is kept open for two-way communication between client and server. With the Long Polling approach, the browser issues a GET request that the server holds open until it has a message to send (or until a set time has elapsed), after which the browser issues a new request. This approach only requires special support on the server.
Due to existing customer requirements around supported browser versions, it was decided that chat should only support long-polling. This required us to use an HTTP GET request from the browser for server-to-browser messaging, and an HTTP POST request for browser-to-server messaging. Essentially, any messages sent by the client are POSTs to the server, while the long poll request from the client is used by the server to send messages to the client.
The chat system needed to handle two different message types: Presence messages and Instant messages. Presence messages are fired to a student’s buddies whenever that student goes offline or comes online. Instant messages are text messages sent between students.
We used a simple JSON format for the message payload, and Jackson object mapping to parse messages on the server side. Once we have a message object, either Presence or Instant, a handler is resolved for it. In the case of the Presence message, a single handler was required to simply set the user’s status to online or offline, and to notify all their buddies. In the case of the Instant message, each one was passed through a series of message handlers:
- StudyBuddyCheckMessageHandler: checks that the sender and receiver are actually study buddies
- EscapingMessageHandler: removes unwanted and unsafe characters
- ProfanityFilterMessageHandler: rejects any message that is flagged as containing profanity
- PersistingMessageHandler: stores the message in the database
- ForwardingMessageHandler: broadcasts the message to the receiver if they are online
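To make the chain idea concrete, here is a minimal sketch in plain Java. Only the five handler names come from our solution. The MessageHandler interface, the InstantMessage fields, the buildChain and process helpers, and the toy implementations (an in-memory buddy set, a banned-word set, and lists standing in for the database and the outgoing broadcaster) are assumptions made purely for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class HandlerChainSketch {

    /** Stand-in for the parsed instant message; the field names are assumptions. */
    static final class InstantMessage {
        final String from;
        final String to;
        String text;

        InstantMessage(String from, String to, String text) {
            this.from = from;
            this.to = to;
            this.text = text;
        }
    }

    /** A handler returns false to reject the message and stop the chain. */
    interface MessageHandler {
        boolean handle(InstantMessage message);
    }

    /** Builds toy versions of the five handlers, in the order listed above. */
    static List<MessageHandler> buildChain(Set<String> buddyPairs, Set<String> bannedWords,
                                           List<InstantMessage> database, List<InstantMessage> outbox) {
        // StudyBuddyCheckMessageHandler: sender and receiver must be buddies.
        MessageHandler studyBuddyCheck = m -> buddyPairs.contains(m.from + "->" + m.to);
        // EscapingMessageHandler: neutralise HTML-significant characters.
        MessageHandler escaping = m -> {
            m.text = m.text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
            return true;
        };
        // ProfanityFilterMessageHandler: reject if any word is on the banned list.
        MessageHandler profanityFilter = m -> {
            for (String word : m.text.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (bannedWords.contains(word)) {
                    return false;
                }
            }
            return true;
        };
        // PersistingMessageHandler and ForwardingMessageHandler: lists stand in
        // for the database and the broadcaster (List.add always returns true).
        MessageHandler persisting = m -> database.add(m);
        MessageHandler forwarding = m -> outbox.add(m);
        return Arrays.asList(studyBuddyCheck, escaping, profanityFilter, persisting, forwarding);
    }

    /** Runs the message through each handler in order; stops at the first rejection. */
    static boolean process(List<MessageHandler> chain, InstantMessage message) {
        for (MessageHandler handler : chain) {
            if (!handler.handle(message)) {
                return false;
            }
        }
        return true;
    }
}
```

Because the checks run before the persisting and forwarding handlers, a rejected message is never stored or delivered, which is the main reason for ordering the handlers this way.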
All browser-server interaction was handled via a single Spring Controller. The POST method provided the message handling as described above, and the GET method provided the long-polling functionality. The GET method involved the following steps:
- Use Meteor to wrap the incoming HTTP Request and retrieve an
- Get the student’s ID and send an online presence if they are currently offline
- Add a listener for the onDisconnect event to set the user’s presence to offline and to clean up resources and broadcasters
- Get the broadcaster for the ID and attach the current resource to it
- Suspend the resource using resumeOnBroadcast so that the GET will complete if a message is sent from the server
- Resume any other connected resources as these are now out of date
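The suspend/resume behaviour in the steps above can be modelled without any framework. The sketch below is not Atmosphere code and does not use its API: LongPollSketch, its method names, and the SUPERSEDED marker are all hypothetical, and a CompletableFuture stands in for a suspended GET response. It only illustrates the idea that each student has at most one parked request, that a broadcast completes it, and that a newer request resumes an older one.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

public class LongPollSketch {

    /** Value an out-of-date GET is completed with so that it can return. */
    static final String SUPERSEDED = "";

    /** One parked (suspended) GET per student ID. */
    private final Map<String, CompletableFuture<String>> suspended = new ConcurrentHashMap<>();

    /** Models the GET: park a response until a message arrives for this student. */
    CompletableFuture<String> suspend(String studentId) {
        CompletableFuture<String> current = new CompletableFuture<>();
        CompletableFuture<String> previous = suspended.put(studentId, current);
        if (previous != null) {
            // Resume the older request, which is now out of date.
            previous.complete(SUPERSEDED);
        }
        return current;
    }

    /** Models a server-side broadcast; returns true if a parked GET received it. */
    boolean broadcast(String studentId, String json) {
        CompletableFuture<String> waiting = suspended.remove(studentId);
        return waiting != null && waiting.complete(json);
    }

    /** Models the disconnect listener: release whatever is parked for the student. */
    void disconnect(String studentId) {
        CompletableFuture<String> waiting = suspended.remove(studentId);
        if (waiting != null) {
            waiting.cancel(false);
        }
    }
}
```

Note that once a broadcast completes the parked request, nothing is waiting until the browser issues its next GET. This sketch simply reports such a message as undelivered, whereas a real implementation has to deal with messages that arrive between two polls.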
“We are not cisterns made for hoarding, we are channels made for sharing.”
– Billy Graham
As the final solution was to be run in a cluster, we needed a way for a message sent from a user on one node to be received by a user on another node. The client was already using Hazelcast in their production environment, so we piggybacked on it by using the Atmosphere Hazelcast Broadcaster. Each time an operation requests a broadcaster for an ID, our custom factory returns a Hazelcast broadcaster that is listening on a Hazelcast ITopic<String> attached to the unique student ID.
“As far as the customer is concerned, the interface is the product.”
– Jef Raskin
The left hand pane contains a list of the user’s buddies, showing name, avatar, and online/offline indicators for each. When a buddy is selected, the right hand pane is populated with an ‘infinite scroll’ list of the chat history. New messages to and from the selected buddy are instantly updated into the chat window. If a message is received from an unselected buddy, an indicator with the number of unseen messages is shown for that buddy.
For our solution, the main event we were interested in is onMessage, which means that a message has been received from the server. When our function is called, we parse the JSON from the server into one or more of our Backbone Message objects, and then handle these depending upon their type. If the message is a presence message, we update the status of the buddy that the message is from. If the message is an instant message, we:
- Update our counters,
- Store the message in our local cache, and
- If the buddy is selected, add the message to the bottom of the right hand chat pane
To send a message, when the user clicks the ‘Send’ button we create a Backbone InstantMessage object and populate the message text from the text field. We then simply perform an AJAX POST request containing the JSON representing the message.
“Harpists spend 90 percent of their lives tuning their harps and 10 percent playing out of tune.”
– Igor Stravinsky
Once the solution went into production, some fine tuning was needed. After a few days, we found that the production servers were becoming unstable after a period of time. Specifically:
- The server became unresponsive when accessed via the web
- The application server (Tomcat) was responsive when accessed directly
- When viewed via a JMX console, the number of AJP connections had reached the maximum limit
- Errors were observed in the Apache log
After investigation, the cause was found to be that the AJP connection pool was becoming exhausted due to the large number of suspended chat connections. While it is normal that Atmosphere holds a connection for each connected client, the number of connections was far larger than the number of connected clients, which indicated that there were stale connections that were not being timed out and released.
As the AJP pool could no longer make connections, new web requests could not be processed. We were not able to determine exactly what caused the AJP chat connections to become stale, but we suspect that it was due to either an older unsupported browser type or a network disconnection from the client side.
We noticed that the server configuration was using the AjpProtocol connector to make connections between Apache and Tomcat. This is a blocking protocol, which means the thread will wait until either processing completes or it times out. By replacing this protocol with the newer AjpNioProtocol, which is a non-blocking protocol, we were able to resolve the issue.
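For reference, a change of this kind is made on the AJP connector element in Tomcat's server.xml. The fragment below is a sketch rather than our production configuration: the fully qualified class names are Tomcat's, while port 8009 (Tomcat's default AJP port) is an assumption and any other connector attributes are omitted.

```xml
<!-- Before (blocking connector):
     <Connector port="8009" protocol="org.apache.coyote.ajp.AjpProtocol" /> -->

<!-- After (non-blocking NIO connector) -->
<Connector port="8009" protocol="org.apache.coyote.ajp.AjpNioProtocol" />
```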
We also needed to tune the number of threads used by Atmosphere. Initially, we were using a shareable thread pool for the broadcasters (shareableThreadPool=true), and were limiting the number of Async Write threads to 20 (maxAsyncWriteThreads=20). However, we found that over time, messages were not being delivered to clients.
The root cause was that the broadcasters were unable to get a thread from the limited pool. We changed the Async Write threads to be unbounded (maxAsyncWriteThreads=-1) and stopped using the shared thread pool (shareableThreadPool=false). This allowed each broadcaster to have its own set of threads, which resolved the issue.
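The effect of a bounded write pool can be reproduced with plain java.util.concurrent executors. The sketch below does not use Atmosphere at all. WritePoolSketch and deliversWhileOthersBlocked are hypothetical names, and the sketch assumes that the pool's threads were tied up by writes that never finish (for example, writes to stale connections), which is our guess rather than something we confirmed. It also models only the bounded versus unbounded aspect, not the sharing of one pool between broadcasters.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class WritePoolSketch {

    /**
     * Submits `stuck` writes that block, then one normal write, and reports
     * whether the normal write completes within the timeout.
     * The pool is shut down before returning.
     */
    static boolean deliversWhileOthersBlocked(ExecutorService pool, int stuck, long timeoutMillis)
            throws Exception {
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < stuck; i++) {
                // Simulates a write that hangs until the latch is released.
                pool.submit(() -> {
                    release.await();
                    return null;
                });
            }
            Future<String> delivery = pool.submit(() -> "delivered");
            try {
                delivery.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                return false; // no free thread picked up the write in time
            }
        } finally {
            release.countDown(); // let the blocked writes finish
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Bounded pool of 20 threads, all 20 occupied: the new write is starved.
        System.out.println(deliversWhileOthersBlocked(Executors.newFixedThreadPool(20), 20, 200));
        // Unbounded pool: a new thread is created, so the write goes through.
        System.out.println(deliversWhileOthersBlocked(Executors.newCachedThreadPool(), 20, 2000));
    }
}
```

The trade-off with an unbounded pool is that the number of threads is no longer capped, so it relies on stale connections eventually being cleaned up.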
“Every choice you make has an end result.”
– Zig Ziglar
By choosing Atmosphere to underpin our chat implementation, we were able to deliver a powerful, robust, real-time chat solution in a very short amount of time. In addition, leveraging Hazelcast allowed us to distribute messages across a cluster with a minimal amount of effort.