14 May 2007 JavaOne: Cameron Purdy & ‘The Top 10 Ways to Botch Enterprise Java Scalability and Reliability’
JavaOne’s over now, but there are still a couple of presentations I want to talk about.
My favorite presenter of the week was Cameron Purdy, CEO of Tangosol. His presentation topic was ‘The Top 10 Ways to Botch Enterprise Java Technology-Based Application Scalability and Reliability’. Tangosol make Coherence, a distributed cache, and were recently acquired by Oracle.
Purdy has done this presentation at a number of conferences, including JavaOne last year, where it was apparently a great hit. For me, his droll humour and technical depth-of-knowledge made for an engaging and informative presentation. However, I know that Mark didn’t get as much from it, although he still found Purdy to be an amusing speaker.
Purdy almost immediately began by urging us to take everything he said with a grain of salt and to always use common sense, for ‘there will be real-world situations in which the principles from this presentation will be wrong’. Indeed, many of the items on his list were actually pairs of opposing viewpoints, each equally wrong.
So what were the top ten?
10. Optimize performance assuming that it will translate to scalability/Ignore the potential impact of performance on scalability (and vice-versa).
I’ve been guilty of confusing performance with scalability in the past and Purdy summarized the difference as follows: performance is about making the single-user scenario faster, whilst scalability is about supporting many users. He presented some fascinating examples of how increasing available resources won’t necessarily achieve the expected performance or scalability increase, especially when you have operations that need to be executed serially. Furthermore, he explained how there are hard physical and mathematical limits on the sort of increases you can expect.
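To make the serial-operations point concrete, here’s a rough sketch of my own (Purdy didn’t show code for this): it’s essentially Amdahl’s Law, and the 95% parallel fraction is just a made-up number for illustration.

```java
public class AmdahlSketch {
    // Speedup achievable when a fraction 'parallel' of the work can be spread
    // across 'n' processors; the remaining (1 - parallel) must run serially.
    static double speedup(double parallel, int n) {
        return 1.0 / ((1.0 - parallel) + parallel / n);
    }

    public static void main(String[] args) {
        // Even with 95% of the work parallelisable, the serial 5% caps the gain.
        System.out.printf("10 CPUs:   %.1fx%n", speedup(0.95, 10));    // ~6.9x
        System.out.printf("100 CPUs:  %.1fx%n", speedup(0.95, 100));   // ~16.8x
        System.out.printf("1000 CPUs: %.1fx%n", speedup(0.95, 1000));  // ~19.6x
    }
}
```

In other words, a hundred-fold increase in hardware buys less than a seventeen-fold improvement if even a small slice of the work has to run serially.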
9. Assume you are smarter than the infrastructure/Follow the rules blindly
This includes crimes like sharing mutable state amongst threads when it is difficult to do correctly, not thinking about the transactional implications of using files, building your own connection-pooling functionality and, finally, not using frameworks like Struts, Hibernate or Spring because you reckon you can do a better job of it yourself (and yes, there are still people out there who do that sort of thing). Conversely, he reminded us that at the end of the day, application servers are not magic and sometimes it may be necessary to break the rules to accomplish what you need. An example he gave was the common use of FTP for application integration, with all of its attendant demands on network and file IO resources.
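Going back to the shared-mutable-state crime, here’s a minimal illustration of my own (not Purdy’s) of the classic lost-update race that makes rolling your own concurrency harder than it looks:

```java
import java.util.concurrent.atomic.AtomicLong;

public class SharedStateSketch {
    static long unsafeCount = 0;                          // plain field: updates can be lost
    static final AtomicLong safeCount = new AtomicLong(); // the JVM handles atomicity for us

    public static void main(String[] args) throws InterruptedException {
        Runnable work = new Runnable() {
            public void run() {
                for (int i = 0; i < 100000; i++) {
                    unsafeCount++;               // read-modify-write race between threads
                    safeCount.incrementAndGet(); // correct under concurrency
                }
            }
        };
        Thread a = new Thread(work), b = new Thread(work);
        a.start(); b.start();
        a.join(); b.join();
        // The unsafe counter usually comes up short of 200000; the atomic one never does.
        System.out.println("unsafe: " + unsafeCount + ", safe: " + safeCount.get());
    }
}
```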
8. Abuse the database/Avoid the database
The ubiquity of relational databases in the enterprise space means that sometimes they are used for things that they just weren’t designed for. We’ve certainly seen this in our experience as consultants at Shine. Given the cost and limitations of scaling a database, Purdy urged us to avoid using them for what he called ‘high-volume, low-value’ tasks like logging, storing conversational state or doing things that a file system would be better off doing. And probably the most important new thing I took from the whole presentation was his rule of thumb about when to use a database: only do it if the things you need to store have to be both persistent and transactional.
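Applied to something like conversational state, that rule of thumb suggests a sketch along these lines (an entirely hypothetical servlet of mine, assuming the standard servlet API is on the classpath): keep the in-progress stuff in the session or a cache, and save the database for data that genuinely needs persistence and transactions.

```java
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;

// Hypothetical example: a user's in-progress search filters are conversational,
// not transactional, so the session (or a distributed cache) is a better home
// for them than a SEARCH_STATE table hit on every request.
public class SearchFilterServlet extends HttpServlet {
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) {
        HttpSession session = req.getSession(true);
        session.setAttribute("searchFilters", req.getParameter("filters"));
        // Only the confirmed order (persistent AND transactional) would go to the database.
    }
}
```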
7. Introduce a single point of bottleneck/Introduce a single point of failure
Pretty straightforward really, so there isn’t much to say other than that prime bottleneck-points can include databases, message queues and web-servers.
6. Abuse abstractions/Avoid abstractions
Purdy warned of the ability of abstractions to hide real performance costs and to make simple things impossible. One example he gave that I think many of us at Shine could relate to was ORMs: in particular, the N+1 query problem and the inability to execute stored procedures. However, he conceded that there are many useful abstractions that allow orthogonal concerns like caching, security and auditing to be introduced while minimizing coding and maintenance. Furthermore, he argued that because method invocation is now so cheap, it is wrong to oppose such abstractions solely on the basis that the mechanism for introducing them will have an impact on performance.
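For anyone who hasn’t been bitten by it, here’s roughly what the N+1 problem looks like in Hibernate, sketched with hypothetical Order and Customer entities of my own (this wasn’t code from the talk):

```java
import java.util.List;
import org.hibernate.Session;

// Order and Customer are assumed to be mapped entities with a lazy
// many-to-one association from Order to Customer.
public class OrderReport {

    // N+1: one query for the orders, then one extra SELECT per order
    // the moment the lazy 'customer' association is touched in the loop.
    List<Order> naive(Session session) {
        List<Order> orders = session.createQuery("from Order").list();
        for (Order o : orders) {
            System.out.println(o.getCustomer().getName()); // triggers a query per row
        }
        return orders;
    }

    // One round trip: fetch the association in the same query.
    List<Order> better(Session session) {
        return session.createQuery("from Order o join fetch o.customer").list();
    }
}
```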
5. Assume DR can be added when it becomes necessary
At first, disaster recovery seemed a little off-topic, but given that it’s something most of us have to deal with, and that it’s usually a requirement that DR have no impact on performance, I can start to see its relevance. Furthermore, he gave some amusing examples of legal DR requirements that were literally impossible to meet due to fundamental physical constraints (like the speed of light).
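A quick back-of-the-envelope calculation (my numbers, not Purdy’s) shows why: light in fibre travels at roughly 200,000 km/s, so a synchronous commit to a DR site 1,000 km away pays at least a 10 ms round trip before any disk or CPU work even starts.

```java
public class DrLatencySketch {
    public static void main(String[] args) {
        double distanceKm = 1000;            // assumed distance to the DR site
        double fibreSpeedKmPerMs = 200;      // roughly 2/3 of the speed of light in a vacuum
        double roundTripMs = 2 * distanceKm / fibreSpeedKmPerMs;
        System.out.printf("Minimum round trip: %.1f ms%n", roundTripMs);          // 10.0 ms
        System.out.printf("Max synchronous commits/sec on one stream: %.0f%n",
                1000 / roundTripMs);                                              // 100
    }
}
```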
4. Use a one-size-fits-all architecture
Anyone with any common sense knows that cookie-cutter architecture can be a disaster, especially if they’ve ever been forced to implement a solution grossly unsuited to a problem. Purdy explained how this issue can be exacerbated by unrealistic or poorly thought-out requirements – for example, those containing phrases like ‘24×7’, ‘99%’ or just ‘make it like Google’.
3. Use big JVM heaps
Purdy began this one with the following unattributed quote: ‘In theory, there is no difference between theory and practice’. He went on to examine a phenomenon that we’ve seen before at Shine: a JVM that takes 30 seconds to GC a 1GB heap when in a lab it’s only supposed to take a fraction of a second. He outlined some of the causes for a problem such as this, including an example of how introducing a single finalizer method into his product managed to knock 15% off its performance. And finally, he articulated something that we probably all know in our heart of hearts: when it comes to GC, sometimes we just don’t know what’s going on.
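For what it’s worth, the finalizer trap looks something like this (a hypothetical class of my own, not Tangosol’s code): any class that overrides finalize() forces the collector to queue every instance for the finalizer thread, which typically means at least one extra GC cycle before the memory is actually reclaimed.

```java
public class PooledBuffer {
    private byte[] data = new byte[8192];

    // Merely declaring this method changes how the JVM treats every instance:
    // each one must be registered, queued and processed by the finalizer thread
    // before its memory can be reclaimed, delaying collection of the object graph.
    @Override
    protected void finalize() throws Throwable {
        try {
            data = null; // "cleanup" better done explicitly, e.g. via a close() method
        } finally {
            super.finalize();
        }
    }
}
```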
2. Assume the network works
A trap I’ve fallen into in the past when designing a distributed system is to make unrealistic assumptions about the reliability and speed of the network. Purdy explained that not only is a network of servers only as strong as its weakest link, but that when you’ve got a bus being shared by many servers concurrently, each of them can only use a portion of the maximum theoretical bandwidth on that bus. This is a problem for everything from databases through to blade-frame backplanes.
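One small, practical consequence of not assuming the network works (again, my sketch rather than Purdy’s): put explicit timeouts on connects and reads, so a dead peer costs you a few seconds rather than a permanently blocked thread. The host and port here are placeholders.

```java
import java.net.InetSocketAddress;
import java.net.Socket;

public class CautiousClient {
    public static Socket open(String host, int port) throws Exception {
        Socket socket = new Socket();
        socket.connect(new InetSocketAddress(host, port), 2000); // fail fast: 2-second connect timeout
        socket.setSoTimeout(5000); // reads throw SocketTimeoutException after 5 seconds of silence
        return socket;
    }
}
```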
1. Avoid proprietary features/Believe product claims
Whilst many people fear vendor lock-in, it’s not very often that you have to do a wholesale shift from one vendor to another. And whilst standardization undoubtedly has its benefits, it can also cause a lot of trouble for little benefit (consider, for example, EJB2 vs Hibernate+Spring). That said, Purdy made the point that a vendor is naturally going to pitch their product in the most favorable light, and consequently, the onus falls on us to test those things that matter most to us.
Purdy finished up his presentation by making a point that, whilst perhaps obvious to the more seasoned developers out there, always bears repeating in one form or another: ‘Don’t do good things just because you think they are good things’. I think that as Java developers, our adventures over the last few years with prescriptively-applied patterns (which may become anti-patterns), disastrously over-complicated blueprints and just-plain-horrible EJB 2 would make us particularly receptive to this message when applied to scalability and reliability.
Mark J
Posted at 10:22h, 15 May
I also found his presentation very entertaining, and really the only reason I didn’t get more out of it was that it was the end of a long week. Ben has captured the presentation really well and there are some good points.
Cameron also had this nice quote in the version of the presentation that I saw:
“There is never time to do it right, but there is always time to do it over”.