Wednesday, April 6, 2011

The Subversion Mistake

At my workplace, when I first got here, we were doing the waterfall method of development. We would do 3 months' worth of hard work, with thousands of commits and tons of features. Then we would hand things off to QA to test and spend a few more weeks fixing bugs. Then, when things were 'certified' by QA, we would spend a whole weekend (and the next week) doing the release and fixing bugs in production. Then the cycle would repeat again. Ugly.

Now, based on feedback I gave, we use an iterative approach to development. It is extremely flexible and has allowed us to increase the rate of releases to our customers as well as the stability of our production environment. We did about 9 iteration releases in 3 months, our customers got features more quickly, and we have fewer midweek critical bug-fixing patches. Everyone from the developers all the way up to the marketing department loves that we have become more agile as a company.

In order to support this model of development, we had to change the way we use subversion. Before, we would have a branch for the version in production and do all of our main work on trunk. We would then copy any bug fixes to that branch and do a release from there. I reversed and expanded that model.
  • Trunk is now what is in production.
  • Main development and bug fixes happen on numbered iteration branches (iteration-001, iteration-002, etc.).
  • Features happen on branches named after the feature (foobranch, barbranch, etc.).
  • Each iteration branch is based off of the previous iteration. If a checkin happens in iteration-001, it is merged into iteration-002. (ex: cd iteration-002; svn merge ^/branches/iteration-001 .)
  • No commits happen directly to trunk, only merges. For midweek releases, we cherry pick individual commits from an iteration branch to trunk. (ex: cd trunk; svn merge -c23423 ^/branches/iteration-001 .)
  • Feature branches are based off of an iteration.
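
As a rough sketch, the model above maps onto commands like the ones below. The run_in helper, the function names, and the working-copy directory names (iteration-001, iteration-002, trunk) are hypothetical and only there for illustration. The script prints each command by default and only executes when DRY_RUN=0 is set, and it assumes each directory is a checkout of the matching branch so the ^/ URLs resolve.

```shell
# Dry-run by default: print each command; set DRY_RUN=0 to execute it.
# Commands run inside DIR so that the ^/ repository-relative URLs resolve.
run_in() {
    dir=$1; shift
    if [ "${DRY_RUN:-1}" != "1" ]; then
        (cd "$dir" && "$@")
        return
    fi
    line="cd $dir &&"
    for arg in "$@"; do
        case $arg in
            *" "*) line="$line '$arg'" ;;   # quote arguments containing spaces
            *)     line="$line $arg" ;;
        esac
    done
    printf '%s\n' "$line"
}

# Branch NEW (an iteration or a feature branch) from the iteration BASE.
# WC is any existing working copy of the repository.
new_branch() {       # usage: new_branch WC BASE NEW
    run_in "$1" svn copy "^/branches/$2" "^/branches/$3" -m "Create $3 from $2"
}

# Carry commits from the previous iteration into a working copy of the next.
# The merge only changes the working copy; it still has to be committed.
sync_iteration() {   # usage: sync_iteration NEXT_WC PREV
    run_in "$1" svn merge "^/branches/$2" .
}

# Midweek release: cherry-pick one revision from an iteration into trunk.
cherry_pick() {      # usage: cherry_pick TRUNK_WC REV ITERATION
    run_in "$1" svn merge -c "$2" "^/branches/$3" .
}

new_branch iteration-001 iteration-001 iteration-002
new_branch iteration-001 iteration-001 foobranch
sync_iteration iteration-002 iteration-001
cherry_pick trunk 23423 iteration-001
```

Keeping every merge behind a single helper makes it easy to read the exact command before it touches a working copy.
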
Unfortunately, we are quickly learning that subversion does not support this model of development at all, and I've had to become a subversion merge expert.

One reason for this is that every time we cherry pick a change from an iteration to trunk, we also need to merge trunk back into the iteration. This is so that the mergeinfo is recorded properly and so that --reintegrate works when we decide to 'close' the iteration. When iteration-001 is closed, I then run cd iteration-002; svn merge --record-only ^/trunk to 'reset' the pointer so that it now sits at iteration-002. If I don't do this, trunk quickly gets out of sync with an iteration and subversion makes merging a nightmare of conflicts. It shouldn't be this way, but it is.
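
Spelled out step by step, that bookkeeping looks roughly like the sketch below. The run_in helper, the function names, and the directory names are hypothetical. Each step is a single merge that I would review and commit (svn ci) before moving on to the next. The script only prints the commands unless DRY_RUN=0 is set.

```shell
# Dry-run by default: print the command; set DRY_RUN=0 to run it inside DIR.
run_in() {
    dir=$1; shift
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "cd $dir && $*"
    else
        (cd "$dir" && "$@")
    fi
}

# Step 1: cherry-pick one revision from the iteration into trunk.
cherry_pick_to_trunk() {    # usage: cherry_pick_to_trunk TRUNK_WC REV ITERATION
    run_in "$1" svn merge -c "$2" "^/branches/$3" .
}

# Step 2: merge trunk back so the iteration's mergeinfo knows about step 1.
merge_trunk_back() {        # usage: merge_trunk_back ITERATION_WC
    run_in "$1" svn merge ^/trunk .
}

# Step 3: close the iteration by reintegrating it into trunk.
# (--reintegrate is the flag from the Subversion releases of this era;
# later releases are expected to detect this case on their own.)
reintegrate_iteration() {   # usage: reintegrate_iteration TRUNK_WC ITERATION
    run_in "$1" svn merge --reintegrate "^/branches/$2" .
}

# Step 4: mark trunk as already merged on the next iteration.
# --record-only updates mergeinfo without changing any files.
reset_next_iteration() {    # usage: reset_next_iteration NEXT_ITERATION_WC
    run_in "$1" svn merge --record-only ^/trunk .
}

cherry_pick_to_trunk trunk 23423 iteration-001
merge_trunk_back iteration-001
reintegrate_iteration trunk iteration-001
reset_next_iteration iteration-002
```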

Another reason is that any feature branch that spans more than one iteration does not have its mergeinfo tracked properly. For example, I have a branch called 'foo'. It is based off of iteration-001 and kept up to date with development (cd foo; svn merge ^/branches/iteration-001 .). At some point, iteration-002 is created and people are committing to it. Also, iteration-001 is routinely merged into iteration-002.

The issue is that the mergeinfo for my branch foo knows nothing about the mergeinfo contained in iteration-002. Thus, if I try to 'upgrade' my branch foo to iteration-002, subversion will try to merge from the start of iteration-002 all the way to HEAD. This obviously won't work because it will try to re-apply changes that have already been applied to iteration-002 and conflict madness will ensue.
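
One way to see the problem before it bites is to ask subversion which revisions it still considers unmerged, and to preview the merge without touching the working copy. This is only a sketch: the run_in helper, the function names, and the foo directory are hypothetical, and the script prints the commands unless DRY_RUN=0 is set.

```shell
# Dry-run by default: print the command; set DRY_RUN=0 to run it inside DIR.
run_in() {
    dir=$1; shift
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "cd $dir && $*"
    else
        (cd "$dir" && "$@")
    fi
}

# List the revisions of BRANCH that merge tracking would still try to merge
# into the working copy WC.
eligible_revs() {    # usage: eligible_revs WC BRANCH
    run_in "$1" svn mergeinfo --show-revs eligible "^/branches/$2" .
}

# Show what a merge would do without modifying the working copy.
preview_merge() {    # usage: preview_merge WC BRANCH
    run_in "$1" svn merge --dry-run "^/branches/$2" .
}

eligible_revs foo iteration-002
preview_merge foo iteration-002
```

If the eligible list reaches back to the start of iteration-002, the tracked merge is about to re-apply changes that foo already has.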

The only solution I've been able to come up with for this problem is to use explicit revision numbers when doing that first merge of iteration-002 into my branch foo, which is completely counterintuitive when you have merge tracking. I take the point where trunk was 'reset' into iteration-002 and do the merge manually with revision numbers. After that, I also need to clean up the mergeinfo to make sure it is solid and future tracked merges will work. (cd foo; svn merge -r2334:HEAD ^/branches/iteration-002 .; svn merge --record-only ^/branches/iteration-002; svn ci)
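
The same workaround, broken into its two merges, might look like the sketch below. The run_in helper and the function names are hypothetical, and 2334 is just the example revision from above. The script prints the commands unless DRY_RUN=0 is set, and the final svn ci is left as a manual step after reviewing the result.

```shell
# Dry-run by default: print the command; set DRY_RUN=0 to run it inside DIR.
run_in() {
    dir=$1; shift
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "cd $dir && $*"
    else
        (cd "$dir" && "$@")
    fi
}

# First merge of a new iteration into a long-lived feature branch, using an
# explicit range that starts at the revision where trunk was 'reset'.
merge_from_reset() {   # usage: merge_from_reset FEATURE_WC RESET_REV ITERATION
    run_in "$1" svn merge -r "$2:HEAD" "^/branches/$3" .
}

# Record the iteration as merged (mergeinfo only, no file changes) so that
# later tracked merges do not go back to the start of the iteration.
record_iteration() {   # usage: record_iteration FEATURE_WC ITERATION
    run_in "$1" svn merge --record-only "^/branches/$2" .
}

merge_from_reset foo 2334 iteration-002
record_iteration foo iteration-002
```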

I've watched the video from Linus about git. I'm a total convert; the subversion developers really screwed the pooch with the choices they made. (Sidenote: I worked at CollabNet during this period and was actually at some of the discussions.) The issue for me now is that in a corporate environment with 25+ developers distributed all around the world, just switching to git is not an easy task. It isn't like I can just take a 2-gig repository of files, import it into git, and tell people to switch. Training people who barely know subversion on how to use git is a daunting task. We also have a lot of prior integration with subversion, such as our issue tracking, reviewboard, commit emails, commit hooks, etc., that would all need to be re-integrated.

So, for now, I keep documenting all of these subversion gotchas in our wiki while exploring and learning more in my not-so-spare time about how to migrate to git. I'm writing this post with the hope that if someone else reads it in time, they will choose to use git instead of making the subversion mistake.

12 comments:

Thomas Koch said...

For nice integration of code review and git, have a look at Gerrit:

http://alblue.bandlem.com/2011/02/gerrit-git-review-with-jenkins-ci.html

Stephen Haberman said...

My sympathies. FWIW, I transitioned an enterprise team to git and wrote various scripts/hooks to facilitate the move and partially recreate the "central" aspect that is inherent for an internal team working against one master copy of the code (e.g. adding commit numbers as aliases for hashes).

You might find them interesting, if anything as examples to copy/paste from in your own integration:

https://github.com/stephenh/git-central/tree/master/server

The most fun one, to me, was an update hook that would enforce that "stable" (the trunk in your new svn model) could only move by a merge from a feature branch, and that those merges could not introduce any changes. E.g. the feature branch had to be all the way up to date with stable (and, by implication, QA'd at that state) before it could be released.

Unfortunately, I'm in a job using hosted svn these days, so not actively using git-central. Thank god for git-svn.

Adam Monsen said...

Oi, good luck. I feel your pain.

You've heard all the rest of this before, I'm sure. Anyway...

I migrated a large Subversion repository (15k or so commits) used actively by about 15 people in about 7 countries in disparate timezones.

I found migrating svn 1.5+ mergeinfo and svnmerge.py metadata non-starters. Once I gave that up, migration was easy. I migrated 2 important release maintenance branches and trunk, creating three new git repositories. Git makes it easy to copy changes between repositories, so having 3 separate repositories wasn't an issue. The 2 release maintenance branches (now repositories) eventually became dormant anyway.

Yes, there's a steep learning curve with git. Particularly: the dVCS paradigm and the git command line interface. Everything else is pretty much standard SCM. The Mifos developers are capable and patient, so they did well with the migration. Several even preferred git and welcomed the change. They certainly loved the blazing speed and smaller client-side disk footprint.

I did the documentation first, then a few test migrations, then pulled the trigger. I performed the migration during heavy development, without interrupting development. I did not support a "svn compatibility layer", I just said we needed Git to move faster as a team. And it truly helped us do so! After migrating, our commits/month went from less than 100 to over 500 (just discovered that with gitstats).

Now I think I'll just rub it in and tell you how, once we were on git, every day is like my birthday. It's fast and fun like any dVCS, but I found it to have an extremely large, active, helpful community too. The documentation is great: shipped manpages and Pro Git FTW. Git is very scriptable. There are integrations galore. It is consistent and secure: it helped me find a bug in the encrypted filesystem I use for my homedir. You can get as creative as you want with your workflow.

The end.

Yuval A said...

We've had the exact same issues with SVN, and recently completed a successful migration to git.

Not looking back :)

Gert said...

What about Mercurial? (See http://mercurial.selenic.com/). Not as super-powerful as git, but most likely does fulfil your branching and merging needs as described in your post. But a lot easier to learn than git, especially for subversion developers.

Joel Spolsky wrote a perfect beginner's (convert's) guide at hginit.com.

I've been using Mercurial now for over a year and I've become a real fan :-).

Ian said...

What Gert said. Another vote for Mercurial as really easy to learn coming from Subversion.

aspnair said...

You should definitely take a look at agile methodologies especially scrum. They are especially for these types of scenarios. Try implementing it for your project.
http://en.wikipedia.org/wiki/Scrum_(development)

http://www.goodagile.com/scrumprimer/scrumprimer.pdf

sashang said...

I read the first few paragraphs, understood what your workflow was and pre-empted your conversion to git.

Dutch Rapley said...

I know what you're saying. We use svn at work, but lately I've switched to git locally. I'm using git-svn to handle updates and commits. I have local git branches that are tied to remote svn branches.

If the same person does all the merging every time (you?), then you could use git locally to handle this. When you're cherry picking your commits, you could use git in the meeting and maybe it'll pique some interest.

And if some take an interest, they can learn by doing with http://gitimmersion.com/

elarson said...

Have you thought about doing iterative development in another system? Instead of moving everything whole hog, you could instead try one iteration or one feature using something like git or mercurial. If you are the one managing the other devs, it might be worthwhile to talk to some subset or team and see if they would be willing to be guinea pigs. They could probably work on updating most of the workflow-related aspects (commit hooks, tickets, etc.) and start getting documentation in place so it is not a matter of you having to teach the rest of the team.

Merill said...

Moving to Mercurial is a lot easier though.

Foudres said...

If migration and training are a major concern, I would consider using mercurial instead of git. It has better Windows integration and is far simpler to use, especially if you come from the subversion world.