Thursday, November 17, 2011

Contributing to Open Source

I've been working on various open source projects since around 1993, long before I even really thought of it as open source. It just seemed natural to me to make the fixes I needed and contribute them back. It was always a bit of a challenge to figure out how to get my fixes to the developers. Obviously, they didn't know me, so they weren't going to just let me write the files directly. So I ended up sending patches via email or some other means.

Over the years, the process for contributing to projects has gotten easier. More recently, it has improved by leaps and bounds thanks to GitHub.

Case in point: I've been using the Twitter Bootstrap project for parts of the design of my new company, Voost. I like the project a lot. Like millions of other projects, it is hosted on GitHub.

Yesterday, I noticed a small bit of documentation was missing. I forked the project by clicking a button on the website, created a branch to work on (git checkout -b docadditions), edited the documentation, committed and published my changes, and then created a pull request, which tells the developers of Bootstrap that I have something to contribute:

https://github.com/twitter/bootstrap/pull/647
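
For anyone who hasn't tried this, here is a rough sketch of the command-line half of that workflow. It is not a transcript of what I typed: to keep it runnable without a network connection or a GitHub account, local repositories stand in for the upstream project and for the fork, and the file names, commit messages and identity are placeholders I made up. Only the branch name docadditions comes from the steps above.

```shell
#!/bin/sh
# Sketch only: local repositories stand in for GitHub (assumption).
set -e

# Placeholder identity so the commits work on a fresh machine.
export GIT_AUTHOR_NAME="Example Contributor" GIT_AUTHOR_EMAIL="contributor@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)

# Stand-in for the upstream project.
git init --quiet "$work/upstream"
cd "$work/upstream"
echo "Project documentation" > README
git add README
git commit --quiet -m "Initial commit"

# Stand-in for clicking "Fork" on the website: a bare copy of upstream.
git clone --quiet --bare "$work/upstream" "$work/fork.git"

# Clone the fork and create a branch to work on.
git clone --quiet "$work/fork.git" "$work/clone"
cd "$work/clone"
git checkout --quiet -b docadditions

# Edit the documentation (placeholder change), then commit it.
echo "Describe the missing option here." >> README
git commit --quiet -am "Add missing documentation"

# Publish the branch to the fork. The pull request itself is then
# created from the website, not from the command line.
git push --quiet origin docadditions

# Show that the branch now exists on the fork.
git ls-remote --heads origin
```

With a real fork, the two local stand-ins go away: you clone your fork's URL and push the branch to it, and the rest is the same.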

Mark, one of the developers, who I've never met in my life, was able to take my contributions and combine them with his code by simply clicking a button on the website. Yes, it was that easy.

I also had an enhancement request... so I created an issue...

https://github.com/twitter/bootstrap/issues/646

It was resolved in a few hours with just a small bit of effort. I can then merge his changes into my local fork of the project with a couple of easy commands. We stay in perfect sync.
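
The "couple of easy commands" would be something along the lines of a fetch followed by a merge. The sketch below assumes a remote named upstream pointing at the original project, which is a common convention rather than anything GitHub sets up for you. As before, local repositories stand in for GitHub so the script runs anywhere, and the file contents, commit messages and identity are placeholders.

```shell
#!/bin/sh
# Sketch only: local repositories stand in for GitHub (assumption).
set -e

# Placeholder identity so the commits work on a fresh machine.
export GIT_AUTHOR_NAME="Example Developer" GIT_AUTHOR_EMAIL="developer@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)

# Stand-in for the upstream project.
git init --quiet "$work/upstream"
cd "$work/upstream"
echo "v1" > README
git add README
git commit --quiet -m "Initial commit"
branch=$(git symbolic-ref --short HEAD)   # whatever the default branch is called

# Stand-in for my fork on the website, plus my local clone of it.
git clone --quiet --bare "$work/upstream" "$work/fork.git"
git clone --quiet "$work/fork.git" "$work/clone"

# Meanwhile, the upstream developer resolves the issue.
cd "$work/upstream"
echo "v2" > README
git commit --quiet -am "Resolve enhancement request"

# One-time setup in my clone: name the original project "upstream".
cd "$work/clone"
git remote add upstream "$work/upstream"

# The actual sync: fetch the new commits and merge them.
git fetch --quiet upstream
git merge --quiet "upstream/$branch"

# Optionally push the result so the fork on the website catches up too.
git push --quiet origin "$branch"

cat README
```

Because my clone has no commits of its own on that branch, the merge here is a simple fast-forward. If I had local changes as well, git would create a merge commit instead.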

Bam. That is how collaborative development should work.

As a comparison, in the past, I've done a huge amount of work for the Apache Software Foundation. They have a great open source license and a huge following. But they don't use GitHub.

With the ASF, it feels like 1993 again. For each project I want to contribute to, it feels like I'm making a lifetime commitment to that project.

I have to go to the project website and navigate around to figure out how to join a mailing list. This takes several context switches into an email client. I need to make sure to set up a mail filter to deal with a potentially insane amount of email that I really don't care about. Then, I email a patch to the list (or put it up on gist / pastebin)... and I hope one of the developers might be watching for my carefully crafted subject line. Chances are that nobody will respond or the email will get lost, so I have to keep nagging people because everyone is busy...

I don't really contribute to the ASF nearly as much anymore.

13 comments:

Karl Fogel said...

Totally agree about ease-of-contribution being key. Projects that impose unnecessary overhead on contributors, especially on drive-by contributors, are going to have worse survival characteristics in the long term than projects that use good infrastructure.

While GitHub has done a great job, this ease isn't unique to GitHub, though. Gitorious, for example, is exactly as easy and in the same way.

(I suppose there is still a positive feedback argument: that most developers are already likely to have an account on GitHub, so if all the projects they contribute to are there too, then there's no extra one-time registration overhead. But in practice I've found it easy to register on the small number of sites needed to be able to contribute anywhere -- e.g., https://gitorious.org/statusnet/mainline/merge_requests/168. It's just not a blocker.)

It's kind of analogous to people who think they're Git zealots when really they're distributed version-control zealots who happen to have only tried Git (but not, say, Mercurial). What's great about GitHub is what it does, not the fact that it's GitHub per se.

I actually mildly prefer Gitorious just on the grounds that their system is entirely available as open source, so I know that no matter what happens to them as a hosting platform, I can preserve all the project structure if needed and move somewhere else. This isn't a strong preference, though, because GitHub's APIs -- including git itself, obviously, but also things like the issue tracker API -- make it pretty easy to keep a running local backup and move one's data elsewhere if needed, at not too high a cost. The "freedom penalty" is pretty minimal, even though GitHub itself isn't, alas, free software.

Anyway, thanks for the post. I think you make an important point.

Thomas Koch said...

It gets even more insane if you want to go the whole nine yards and
- create an account in JIRA
- export the patch to your file system and manually attach it to the JIRA issue with your browser
- find out how you need to name the file so that Jenkins gets triggered
- create an account on review.apache.org
- manually upload the patch to Review Board
- find out how you need to fill in the Review Board inputs so that Review Board posts to JIRA
- do the same dance again if you want to polish something in your patch

Me and others have been pitching Gerrit to the ASF since 2009. But it seems no others feel the same pain as those who are already used to Git...

Jon Scott Stevens said...

Of course, since the ASF is a volunteer organization, if you want to do the work of fixing it, go right ahead. Join a few more mailing lists, have a year or two worth of discussion and if you survive all of that, then maybe you can get to work.

With the recent announcement of the ASF taking the source code for Flex, I think that it truly has become the place where projects go to die. As far as I'm concerned, there is no longer any use for such a large organization to handle source code for active projects. The red tape has gotten too thick.

Developers aren't effective in an environment like that.

Karl Fogel said...

Still, overall what this says to me is:

The ASF should make it as easy to contribute to a project at the ASF as it is to contribute to a project on GitHub, but the ASF doesn't necessarily have to use GitHub itself as the infrastructure for doing that.

IMHO it would be better to use one of the near-clones that are open source. The ASF should not get locked in to a proprietary provider (though at least the JIRA bug tracker has XML export, and there are tools, e.g., code.google.com/p/projport, for parsing that).

All just talk, though. I haven't been volunteering on the ASF infrastructure committee.

Jon Scott Stevens said...

Well, we used to have this discussion back in the days of CollabNet. The whole free vs. open discussion.

After all these years, I'm really not worried about it anymore. I think it is a non-issue partly because of the way git works and partly because I'm sure that GitHub has a way to export data (just like JIRA). In the end, use the tool which everyone else is using and that tool will persist. As soon as that tool is no longer relevant, deprecate it and move on to the next tool.

The ASF got into the situation it is in now by sitting on the sidelines while everyone else was moving in a different direction. It happens all the time in tech. I'd argue that by drawing a line in the sand and worrying about free vs. open, you are only perpetuating things further.

Karl Fogel said...

What is this "free vs open" dichotomy, by the way? Not a rhetorical question; I keep hearing that phrase, and I've not yet heard an explanation (that I could understand) of what it means.

I don't see what I wrote as drawing lines in the sand; sure you're not reading something that's not there into what I wrote? There are times when it's appropriate to draw such lines, of course, but saying that one prefers Gitorious because the overhead of resuming in the event of provider droppage is less than it would be with GitHub doesn't seem to qualify to me. It's a practical decision, explained purely in practical terms.

I mean, it's not that I don't have ideology, but I didn't even deploy it here :-).

Jon Scott Stevens said...

I don't think I'm doing a good job of explaining myself, so bear with me while I try again.

You are assuming that if Github goes belly up, they won't just donate their source code to a foundation. Clearly the ASF is a likely choice since they have gained a reputation now for being the place to dump dead projects. ;-)

When I say free vs. open, that is effectively saying GPL (free) vs. ASF license (open). Free means that no matter what, the code will always be free. Open means that it may or may not be available in the future. I'm using this term as an analogy to your comment about preferring gitorious over github because you can have the source code to gitorious right now, but not github.

I don't think gitorious is a good solution regardless of the source code being available right now or not. I see gitorious as a dead end. The battle for best service was won by github because the vast majority of people are using it. Much like facebook beat out myspace, tribe.net and friendster by a factor of hundreds of millions of people (and growing). Diaspora is a nice idea, but will it be around in 5 years or be a major contender? Probably not.

Also, say that gitorious does go belly up (which I think will more likely be the case since they are fighting on the losing side right now), are *you* going to work on the source code any further? I suspect the answer is no or not a lot, I'm sure that we all have better things to do with our time. As a result, now that you've sat on that platform, you are again stuck with a dead end solution.

I'd rather go with a hosted solution that is *not* free because at least there is some sort of guarantee that it'll be around. If CollabNet, which has taken a *ton* of investment, has survived for all these years, I find it hard to believe Github, which has taken zero investment, won't be around.

Thus, what real gain do you have by picking the 'free' and arguably less popular solution now?

Unknown said...

I was going to leave a very interesting comment here, but the Blogger login failed with "Input error: Memcache value is null for FormRestoration" on my first attempt, and I lost my comment as a result.

I don't know what to think. Either it means that the barrier is too high for me to contribute to your blog, or that you're right that high barriers to contributions are a bad thing.

(I'll copy this comment to my clipboard before attempting to pass that blogger hurdle a second time).

Karl Fogel said...

"When I say free vs. open, that is effectively saying GPL (free) vs. ASF license (open)."

The usual terms for that distinction are "copyleft" and "non-copyleft", since "free software" is synonymous with "open source software".

In other words: GPL'd software is open source; Apache-licensed software is "free".

For some reason, people get this idea that "free" is a synonym for "copyleft", but it's not -- at least not as used by the FSF, OSI, ASF, CC, and every other organization I can think of that pays non-trivial attention to software licensing. Rather, copyleft software is a subset of free software (and therefore a subset of open source software as well). Same with non-copyleft software.

Anyway, that's all just about the terminology; now you know why I was confused and not seeing the distinction.

Going beyond terminology...

"Free means that no matter what, the code will always be free. Open means that it may or may not be available in the future."

Why would a copyleft vs non-copyleft license affect whether or not the software that powers a site (e.g., GitHub) stays available? Neither kind can be proprietized after the fact. The copyright owners can proprietize future versions, but all the old copies are still out there with the old license on them; anyone can fork from those.

I think the rest of your point -- about long-term survival of a service site being dependent on many factors, not all of which have to do with its underlying software's availability under a free license -- stands. But it's totally unrelated to this "free vs open" non-distinction :-).

One answer to that point is: we see proprietary services get pulled, or made more expensive, all the time. Remember the CDDB debacle, anyone?

The lossage scenario for GitHub is that someone makes them an offer they can't refuse, they take it, and a few years down the road the new owners decide they want to end free hosting for open source projects. Or maybe they take the feature set in a direction most projects don't want to go. Or maybe they shut it down because it's competitive with their other service.

Meanwhile, I have a somewhat higher estimate of the activity level and breadth of support for Gitorious, and rate the project's survival characteristics higher than you do.

But these are questions of judgment, about which people might disagree based simply on the amount of information they have, or (more likely) their sensitivity to certain risks over others. There isn't one right answer. If you're a social tech startup, GitHub might be right for you; if you're a medical device manufacturer, Gitorious might be right, because the chances that (say) some regulation will kick in and require you to do your own hosting for auditing purposes are much higher -- and then you'll be glad you can just grab the Gitorious code and do that, without disturbing anyone's workflow.

Karl Fogel said...

By the way, the reason I'm going into so much analysis is that this isn't academic for the ASF (where this discussion started).

If the choice were between some special arrangement with GitHub, whereby ASF can piggyback on their infrastructure, versus achieving GitHub-style functionality by using Gitorious's free software, I'd certainly prefer the latter. IMHO the ASF should avoid closed-source infrastructure dependencies, for practical reasons as well as philosophical ones.

I'm pretty sure you'd go the other way on that specific choice, though :-).

Shane Curcuru said...

The real point isn't technology, it's community.

The ASF truly is about community over code. So when you say "drive-by contributors", I say "nice, but not the prime mover of what I want to see".

Yes, I agree, the overall processes should be easier, and especially should be documented/described much better for participating in various projects at the ASF. But personally, I'm not focused on the latest and greatest possible thing that someone, somewhere has come up with. I'm focused on communities who care about their projects and want them to last - for a long time.

So please, let's just agree to disagree, and we can each use our own favorite tools, OK?

Jon Scott Stevens said...

Thanks for your feedback Shane. Sure, I agree to disagree. That's why I don't contribute to the ASF as much anymore.

The attitude that the ASF is trying to 'build community' by creating barriers to entry is exactly the type of thing that drove me and several other early members away from the ASF. IMHO, that attitude comes from a bit too much of a corporate mentality.

Enabling people to contribute easily is what builds community. Lower the barrier to entry. We are all busy. We don't have the time or energy to invest in jumping through hoops, and adding some more documentation or cleaning up the websites isn't enough.

That said, using better tools is the real solution. You don't give someone a hand saw to build a house, because that is just like going back to the days before we had electricity. You give someone an electric saw because that is a lot more efficient.

This is the Facebook generation. There are a lot of smart kids, who will be replacing us eventually, doing a lot of development. It is no wonder that there really isn't a Ruby presence in the ASF like there was for Java so long ago. Kids these days don't want to be encumbered by the red tape of the ASF when they have a nice shiny toy over in GitHub land.

WillGH said...

As a small dev shop making a switch from svn to git, we've been facing the same issue. Git by itself was a mess. I could never figure out where the right branch of code was hosted. The learning curve for new devs is high. We've tried both Gerrit and Gitorious. That helps, but neither has the same ease of use as GitHub. (Right now we are on Gitorious.) For now, I send new users to GitHub's docs.