Wednesday, November 11, 2009

Something more scalable

Warning, this is a geeky post.

At work, we use JBoss 4, EJB3, and Hibernate on three servers in replicated transactional cluster mode (meaning any time a database object changes, it gets replicated to each of the servers). It has been working very well for the past couple of years. However, there have been a few things that have been bugging me. Any time we had a clustered API change, we had to shut down the entire cluster, deploy the changes, and then bring it back up again. The minute of downtime was acceptable enough to the company that we didn't bother with two identical clusters and switching the load balancer between them.

We also had some issues with deploying multiple .ear files in each of the app servers. The .ear files all shared the same classloader (ear isolation was off). This means that whichever .ear file loads first supplies the classes the JVM uses for every .ear loaded after it. So, if you had a shared .jar file across the .ear files, you could run into version compatibility issues if you didn't update the .jar file in each of the .ear files. Lame.

This has been OK because we have workarounds, but not optimal because there is a better way of doing things. It just takes a bit of work and a deep understanding of the system to make it all happen. So, about a week ago, I got a bug up my butt to finally fix all of this. I first turned on ear isolation. However, this had the side effect of screwing with the clustering because Hibernate and the clustering code live higher in the classloader tree than our public API code that was being serialized across the network. Thus, Hibernate couldn't find our classes and epic classloader fail ensued.

This made me realize that we should also move away from the painfully slow JBoss Cache 1.x (which comes with JBoss 4) and switch to Ehcache. Ehcache would live in the ear classloader, so if there was serialization, it wouldn't be a problem. Ehcache has the following benefits over JBoss Cache: it uses an invalidation model (which mostly negates the serialization problems), it is massively faster than JBoss Cache 1.x, it is easily configured with Hibernate, it is singleton based, and it exposes extensive JMX statistics.
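
To make the difference between replication and invalidation concrete, here is a minimal sketch in plain Java. It is a toy model, not Ehcache's or JBoss Cache's actual API: the CacheNode class, its method names, and the "user:1" key are all made up for illustration. The point is only that replication ships the value to every peer (so each peer must be able to load the value's class), while invalidation ships just the key and lets the peer reload from the database on its next read.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a cluster node with a local cache; hypothetical, not a real cache API. */
class CacheNode {
    private final Map<String, String> local = new HashMap<>();
    private final List<CacheNode> peers = new ArrayList<>();

    void addPeer(CacheNode peer) { peers.add(peer); }

    /** Local read; a miss (null) means the caller would reload from the database. */
    String get(String key) { return local.get(key); }

    /** Replication model: the new value itself is shipped to every peer. */
    void putReplicated(String key, String value) {
        local.put(key, value);
        for (CacheNode peer : peers) {
            // In a real cluster the peer has to deserialize the value's class here.
            peer.local.put(key, value);
        }
    }

    /** Invalidation model: only the key is shipped; peers drop their stale copy. */
    void putInvalidating(String key, String value) {
        local.put(key, value);
        for (CacheNode peer : peers) {
            // No domain object crosses the wire, only the key.
            peer.local.remove(key);
        }
    }
}

class InvalidationDemo {
    public static void main(String[] args) {
        CacheNode a = new CacheNode();
        CacheNode b = new CacheNode();
        a.addPeer(b);
        b.addPeer(a);

        a.putReplicated("user:1", "v1");
        System.out.println(b.get("user:1")); // prints "v1"

        a.putInvalidating("user:1", "v2");
        System.out.println(b.get("user:1")); // prints "null" (stale copy evicted)
        System.out.println(a.get("user:1")); // prints "v2"
    }
}
```

The trade-off in this sketch is that node b pays one extra database read after an invalidation, in exchange for never having to deserialize another node's objects.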

Needless to say, after a few trials and failures (mostly related to runtime dependency issues and other oddities), I've now got all of our servers running cleanly on this redesigned system. We no longer have to take down the entire cluster, speed has increased greatly, and as I tune the cache (now that I have real stats of what is going on) database hits will go down. Sure, it was a very complicated procedure to execute, but I'm happy it is done now and we can move forward on a simplified platform.


SteveL said...

Surely that's the JBoss "everything is shared" classloader at work? You can turn that off somehow and are left with the problem of getting commons-logging back ends in at the level of commons-logging, JDBC drivers in high enough up for the DB engine to find them...

Jon Scott Stevens said...

JBoss has a multi-tiered classloader. I'm fine with it being shared across all of my .ear files (server/default/lib/*), but the big issue is that Hibernate (and JGroups and JBoss Cache) live in there.

So, if you are trying to replicate your domain objects in a cluster, hibernate (et al) can't find them because it is in a classloader higher up the chain than your own objects.
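
The visibility rule behind this can be sketched with nothing but the JDK. This is an illustration of parent-first classloader delegation in general, not of JBoss's loader repository specifically; DomainObject and LoaderVisibilityDemo are hypothetical names. A loader can see classes owned by its parents, but a parent cannot see classes owned by a child, which is the position Hibernate is in relative to the ear's classes.

```java
/** Stand-in for a domain class that lives in the application's own classloader. */
class DomainObject {}

class LoaderVisibilityDemo {
    /** Returns true if the given loader (or one of its parents) can load the named class. */
    static boolean canSee(ClassLoader loader, String className) {
        try {
            // initialize=false: we only care about visibility, not static initializers.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader app = DomainObject.class.getClassLoader();
        ClassLoader parent = app.getParent(); // one level "higher up the chain"
        String name = DomainObject.class.getName();

        System.out.println("own loader sees it:    " + canSee(app, name));
        System.out.println("parent loader sees it: " + canSee(parent, name));
    }
}
```

On a standard JVM launch the first check should succeed and the second should fail, which is the same failure a library in server/default/lib hits when it tries to deserialize a class that only exists inside an ear.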

The end solution is to put the caching layer into your own classpath so that each ear file has its own cache instance and to turn on invalidation so that there isn't any replication going on.

There is one more side issue which I need to resolve, which is that I'd like to be able to turn on disk-based caching as well for my Hibernate objects. We use @Enumerated enums in Hibernate and by default Hibernate serializes the enums as objects instead of the toString() representation. Since Hibernate is in a higher classloader, it can't rehydrate the objects from a disk-based cache because the objects aren't in the same classloader. If I define my own @Type(type="enum") and implement my own hydration using toString/valueOf, that will solve this problem.
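
A rough sketch of the string-based hydration described above, using only the JDK. OrderStatus and EnumNameCodec are hypothetical names; in a real custom type these two helpers would be called from the nullSafeSet and nullSafeGet methods of a Hibernate UserType, which is left out here so the example runs without Hibernate on the classpath. It also uses name() rather than toString(), on the assumption that valueOf needs the exact constant name and toString() may be overridden.

```java
/** Hypothetical domain enum, used only for illustration. */
enum OrderStatus { PENDING, SHIPPED, CANCELLED }

class EnumNameCodec {
    /** Dehydrate: store the constant's name instead of a serialized enum object. */
    static <E extends Enum<E>> String toStored(E value) {
        return value == null ? null : value.name();
    }

    /** Rehydrate: resolve the name against enumType, using whichever loader owns that class. */
    static <E extends Enum<E>> E fromStored(Class<E> enumType, String stored) {
        return stored == null ? null : Enum.valueOf(enumType, stored);
    }

    public static void main(String[] args) {
        String stored = toStored(OrderStatus.SHIPPED);
        System.out.println(stored); // prints "SHIPPED"

        OrderStatus restored = fromStored(OrderStatus.class, stored);
        System.out.println(restored == OrderStatus.SHIPPED); // prints "true"
    }
}
```

Because only a String is written to the cache, nothing in a parent classloader has to deserialize the enum class itself; the lookup happens in code that already holds a reference to the enum type.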

Galder Zamarreño said...

Jon, you can always isolate the classloader for your app and use a different JGroups/JBoss Cache or even Hibernate version. Here's one such example:

Btw, maybe time to move to JBoss 5? :)

Jon Scott Stevens said...

Unfortunately, I can't because we use the 'supported' EAP version of JBoss. You aren't supposed to change components at all in that version, otherwise you lose your support contract from Red Hat.

We are running with full classloader isolation on and call by value. That was another recent change I made. It has vastly improved things for us because now ears with different jar files don't conflict.

As for JBoss 5, that is a direction we are moving in at work, but it will take quite a bit of porting of our code (classpath changes, etc.) to make it work. Part of the porting issue was that we were injecting the TreeCache into our beans and you can no longer do that with 5 (lame).

I'm *very* happy with Ehcache now that I have it all set up and working well. We have JMX monitoring of it in Cacti, it is very fast, and it works well in our cluster. All of the issues we were having before with JBoss Cache are gone now.

Yev said...

Jon, have you experimented with memcached instead of Ehcache? I don't have the statistics on hand at the moment, but I believe that would provide you with an even bigger performance boost than Ehcache.

Jon Scott Stevens said...

I seriously doubt it given how we use it. It also serves a different purpose.