Wednesday, April 6, 2011

The Subversion Mistake

At my workplace, when I first got here, we were doing the waterfall method of development. We would do 3 months' worth of hard work, with thousands of commits and tons of features. Then we would hand things off to QA to test and spend a few more weeks fixing bugs. Once things were 'certified' by QA, we would spend a whole weekend (and the following week) doing the release and fixing bugs in production. Then the cycle would repeat. Ugly.

Now, based on feedback I gave, we use an iterative approach to development. It is extremely flexible and has allowed us to increase both the rate of releases to our customers and the stability of our production environment. We did about 9 iteration releases in 3 months, our customers got features more quickly, and we have fewer midweek critical bug fix patches. Everyone from the developers all the way up to the marketing department loves that we have become more agile as a company.

In order to support this model of development, we had to change the way we use Subversion. Before, we had a branch for the version in production and did all of our main work on trunk. We would then copy any bug fixes to that branch and do a release from there. I reversed and expanded that model.
  • Trunk is now what is in production.
  • Main development and bug fixes happen on numbered iteration branches (iteration-001, iteration-002, etc.).
  • Features happen on branches named after the feature (foobranch, barbranch, etc.).
  • Each iteration branch is based off of the previous iteration. If a check-in happens in iteration-001, it is merged into iteration-002. (ex: cd iteration-002; svn merge ^/branches/iteration-001 .)
  • No commits happen directly to trunk, only merges. For midweek releases, we cherry-pick individual commits from an iteration branch to trunk. (ex: cd trunk; svn merge -c23423 ^/branches/iteration-001 .)
  • Feature branches are based off of an iteration.
Unfortunately, we are quickly learning that Subversion does not support this model of development at all, and I've had to become a Subversion merge expert.
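For reference, here is a rough sketch of the day-to-day branch and merge commands the model above implies. The function names, the run helper and the assumption that every branch (feature branches included) lives under ^/branches/ and is checked out into a directory named after it are mine, purely for illustration. By default it only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: function names and the ^/branches/<name> layout are assumptions.
# With DRY_RUN=1 (the default) commands are printed, not executed.
# Printed commands are joined with spaces, so quoting of -m messages is not shown.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Create a new iteration (or feature) branch from an existing branch.
# Run this inside a working copy so that ^/ resolves to the repository root.
branch_from() {
  src="$1"; dst="$2"
  run svn copy "^/branches/$src" "^/branches/$dst" -m "Create $dst from $src"
}

# Carry check-ins forward: merge everything new on the older iteration
# into a checkout of the newer one, then commit the result.
merge_forward() {
  older="$1"; newer="$2"
  run cd "$newer" &&
  run svn merge "^/branches/$older" . &&
  run svn ci -m "Merge $older into $newer"
}
```

Calling branch_from iteration-001 iteration-002 followed by merge_forward iteration-001 iteration-002 prints the commands for review; setting DRY_RUN=0 would actually run them.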

One reason for this is that every time we cherry-pick a change from an iteration to trunk, we also need to merge trunk back into the iteration. This is so that the mergeinfo is recorded properly and so that --reintegrate works when we decide to 'close' the iteration. Once iteration-001 is closed, I run cd iteration-002; svn merge --record-only ^/trunk to 'reset' the pointer to iteration-002. If I don't do this, trunk quickly gets out of sync with the iteration and Subversion makes merging a nightmare of conflicts. It shouldn't be this way, but it is.
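Spelled out as a sketch, the cherry-pick round trip and the 'close' step look something like the following. The function names and the run helper are hypothetical, the checkouts are assumed to sit side by side in directories named trunk and iteration-NNN, and I am assuming that closing an iteration means a --reintegrate merge into trunk followed by the record-only merge described above. By default nothing is executed; the commands are only printed.

```shell
#!/bin/sh
# Sketch only: names and checkout layout are assumptions for illustration.
# With DRY_RUN=1 (the default) commands are printed, not executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Cherry-pick one revision from an iteration to trunk, then merge trunk
# back into the iteration so the mergeinfo on both sides stays in step.
cherry_pick_to_trunk() {
  rev="$1"; iter="$2"
  ( run cd trunk &&
    run svn merge "-c$rev" "^/branches/$iter" . &&
    run svn ci -m "Cherry-pick r$rev from $iter" ) &&
  ( run cd "$iter" &&
    run svn merge ^/trunk . &&
    run svn ci -m "Merge trunk back into $iter" )
}

# Close an iteration: reintegrate it into trunk, then record trunk as
# already merged into the next iteration without applying any changes.
close_iteration() {
  closing="$1"; next="$2"
  ( run cd trunk &&
    run svn merge --reintegrate "^/branches/$closing" . &&
    run svn ci -m "Reintegrate $closing" ) &&
  ( run cd "$next" &&
    run svn merge --record-only ^/trunk &&
    run svn ci -m "Record trunk as merged into $next" )
}
```

For example, cherry_pick_to_trunk 23423 iteration-001 prints the six commands for a midweek fix, and close_iteration iteration-001 iteration-002 prints the close sequence.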

Another reason is that any feature branch that spans more than one iteration does not have its mergeinfo tracked properly. For example, I have a branch called 'foo'. It is based off of iteration-001 and kept up to date with development (cd foo; svn merge ^/branches/iteration-001 .). At some point, iteration-002 is created and people are committing to it. Also, iteration-001 is routinely merged into iteration-002.

The issue is that the mergeinfo for my branch foo knows nothing about the mergeinfo contained in iteration-002. Thus, if I try to 'upgrade' my branch foo to iteration-002, Subversion will try to merge from the start of iteration-002 all the way to HEAD. This obviously won't work, because it will try to re-apply changes that foo already picked up from iteration-001 and that reached iteration-002 through the routine merges, and conflict madness will ensue.

The only solution I've been able to come up with for this problem is to use explicit revision numbers when doing that first merge of iteration-002 into my branch foo, which defeats the whole point of having merge tracking. I take the revision where trunk was 'reset' into iteration-002 and do the merge manually from that revision. After that, I also need to clean up the mergeinfo to make sure it is solid and that future tracked merges will work. (cd foo; svn merge -r2334:HEAD ^/branches/iteration-002 .; svn merge --record-only ^/branches/iteration-002; svn ci)
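As a sketch, that workaround can be wrapped up like this. The function name, its arguments and the run helper are hypothetical; the reset revision (2334 in my example) has to be looked up by hand, and the feature branch is assumed to be checked out in a directory named after it. By default the commands are only printed.

```shell
#!/bin/sh
# Sketch only: names are assumptions for illustration.
# With DRY_RUN=1 (the default) commands are printed, not executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Move a feature branch onto a newer iteration:
#   1. merge only the revisions after the point where trunk was 'reset'
#      into the new iteration,
#   2. record the whole iteration as merged so later tracked merges work,
#   3. commit.
upgrade_feature() {
  feature="$1"; reset_rev="$2"; iter="$3"
  ( run cd "$feature" &&
    run svn merge "-r$reset_rev:HEAD" "^/branches/$iter" . &&
    run svn merge --record-only "^/branches/$iter" &&
    run svn ci -m "Move $feature onto $iter" )
}
```

Running upgrade_feature foo 2334 iteration-002 prints the same sequence as the one-liner above, which makes it easy to double check the revision number before running it for real with DRY_RUN=0.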

I've watched the video from Linus about git, and I'm a total convert. The Subversion developers really screwed the pooch with the choices they made. (Sidenote: I worked at CollabNet during this period and was actually at some of the discussions.) The issue for me now is that in a corporate environment with 25+ developers distributed all around the world, just switching to git is not an easy task. It isn't like I can just take a 2 GB repository of files, import it into git and tell people to switch. Training people who barely know Subversion to use git is a daunting prospect. We also have a lot of existing integration with Subversion, such as our issue tracking, Review Board, commit emails, commit hooks, etc., that would all need to be redone.

So, for now, I keep documenting all of these Subversion gotchas in our wiki while exploring and learning more, in my not-so-spare time, about how to migrate to git. I'm writing this post in the hope that if someone else reads it in time, they will choose git instead of making the Subversion mistake.