Wednesday, November 11, 2009

Something more scalable

Warning, this is a geeky post.

At work, we use JBoss 4, EJB3, and Hibernate on three servers in replicated transactional cluster mode (meaning that any time a database object changes, it gets replicated to each of the servers). It has been working very well for the past couple of years. However, there have been a few things that have been bugging me. Any time we had a clustered API change, we had to shut down the entire cluster, deploy the changes, and then bring it back up again. The minute of downtime was acceptable enough to the company that we didn't bother with two identical clusters and switching the load balancer between them.

We also had some issues with deploying multiple .ear files in each of the app servers. The .ear files all shared the same classloader (ear isolation was off). This means that the classes in the jar files of the first .ear file loaded become the classes that the JVM uses for all of them. So, if you share a .jar file across the .ear files, you may run into version compatibility issues unless you update that .jar file in every one of the .ear files. Lame.
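When chasing this kind of "which copy of the class won?" problem, it can help to ask the JVM where a class actually came from. The following is a minimal sketch using only standard JDK calls; the class name `ClassOrigin` and the method `describe` are made up for illustration and are not part of our code base or of JBoss.

```java
import java.security.CodeSource;

public class ClassOrigin {
    /** Describes which jar/directory and which classloader supplied the given class. */
    public static String describe(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        // JDK classes loaded by the bootstrap loader report no code source.
        String location = (source == null || source.getLocation() == null)
                ? "(bootstrap or unknown)"
                : source.getLocation().toString();
        // The bootstrap loader is represented as null.
        ClassLoader loader = type.getClassLoader();
        String loaderName = (loader == null) ? "(bootstrap)" : loader.toString();
        return type.getName() + " loaded from " + location + " by " + loaderName;
    }

    public static void main(String[] args) {
        // A JDK class: no jar location, bootstrap loader.
        System.out.println(describe(String.class));
        // An application class: shows the jar or directory it was loaded from.
        System.out.println(describe(ClassOrigin.class));
    }
}
```

Calling something like `describe` on a shared library class from inside each deployed application would show whether they all resolve to the same jar, which is what you would expect to see with isolation turned off.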

This has been ok because we have workarounds, but not optimal because there is a better way of doing things. It just takes a bit of work and a deep understanding of the system to make it all happen. So, about a week ago, I got a bug up my butt to finally fix all of this. I first turned on ear isolation. However, this had the side effect of screwing with the clustering, because Hibernate and the clustering code live higher in the classloader tree than our public API code that was being serialized across the network. Thus, Hibernate couldn't find our classes and epic classloader fail ensued. This made me realize that we should also move away from the painfully slow JBoss Cache 1.x (which comes with JBoss 4) and switch to Ehcache. Ehcache would live in the ear classloader, so if there was serialization, it wouldn't be a problem. Ehcache has the following benefits over JBoss Cache: it uses an invalidation model (which mostly negates the serialization problems), it is massively faster than JBoss Cache 1.x, it is easily configured with Hibernate, it is singleton based, and it has extensive JMX statistics.
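To show why an invalidation model sidesteps most of the serialization trouble, here is a toy simulation. It is only a sketch of the idea and does not use Ehcache's API: `InvalidationSketch`, `Node`, and the in-memory `database` map are all hypothetical stand-ins. The point is that when one node writes, only the key travels to its peers, which drop their stale copy and reload from the database on the next read.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InvalidationSketch {
    /** Stand-in for the shared database that every node reads from. */
    public static final Map<String, String> database = new HashMap<>();

    /** One app server with its own local cache. */
    public static class Node {
        public final Map<String, String> localCache = new HashMap<>();
        public final List<Node> peers = new ArrayList<>();
        public int databaseHits = 0;

        public String read(String key) {
            String cached = localCache.get(key);
            if (cached != null) {
                return cached;
            }
            // Cache miss: go to the database and remember the result locally.
            databaseHits++;
            String value = database.get(key);
            if (value != null) {
                localCache.put(key, value);
            }
            return value;
        }

        public void write(String key, String value) {
            database.put(key, value);
            localCache.put(key, value);
            // Only the key is sent to peers; the object itself is never shipped.
            for (Node peer : peers) {
                peer.invalidate(key);
            }
        }

        public void invalidate(String key) {
            localCache.remove(key);
        }
    }

    public static void main(String[] args) {
        Node a = new Node();
        Node b = new Node();
        a.peers.add(b);
        b.peers.add(a);

        a.write("user:1", "alice");
        System.out.println(b.read("user:1")); // alice (loaded from the database)
        a.write("user:1", "alicia");          // b's cached copy is evicted
        System.out.println(b.read("user:1")); // alicia (reloaded from the database)
        System.out.println(b.databaseHits);   // 2
    }
}
```

The trade-off is visible in `databaseHits`: invalidation costs an extra database read after each remote change, whereas replication would push the new value to every node but has to serialize the object to do it.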

Needless to say, after a few trials and failures (mostly related to runtime dependency issues and other oddities), I've now got all of our servers running cleanly on this redesigned system. We no longer have to take down the entire cluster, speed has increased greatly, and as I tune the cache (now that I have real stats on what is going on), database hits will go down. Sure, it was a very complicated procedure to execute, but I'm happy it is done now and we can move forward on a simplified platform.