Blogging was where we began and how we built our company, so we have preserved this archive to show how our thinking developed over a decade of developing the use of social technology inside organisations.

Coding at Headshift: our experience with Version Control Systems

by Riccardo Cambiassi

This is the first post we’d like to dedicate to the exploration and sharing of the basic technical setup and the development workflow here at Headshift.
This article is dedicated to Git, our version control system of choice and primary backbone of our development workflow.

Version Control

Version Control refers to the management of changes to documents, programs, and other information stored as computer files. It is a fairly common (and highly recommended) practice in software development, where a team of people may change the same files. Change sets are associated with a timestamp and the person making the change, so that it’s possible to play back the story of a document and properly attribute the contributions. Revisions can be compared, restored and, depending on the types of files, merged.

You may recognize this idea and the related practices (browsing history, versions and authors) from experience with wiki applications (and, to an extent, blog engines); as a matter of fact, the founding concepts are the same.

Version Control Systems for software development can be either centralized or distributed. When I joined Headshift, several years ago, we were using Subversion, a pretty common example of the former sort. Although it got the job done, over time we accumulated a number of concerns that pushed us toward a distributed alternative.

Distributed Version Control Systems are relatively new, taking a peer-to-peer approach, as opposed to the client-server approach of centralized systems.
They have a number of clear advantages and, at first glance, a few potential disadvantages when compared to the centralized approach; what follows is a short list of the characteristics that convinced us to transition to DVCS, and to Git specifically. We have been using Git for 2 years now, and it’s a choice we haven’t regretted.

Safe for offline or remote work

Every collaborator involved in a project has a copy of the repository that is self-contained and independent from the others. This means that users can work productively even when they are not connected to a network, or cannot reach the workplace VPN. This was paramount for us, since we can’t always count on having all members of the team at the HQ. Moreover, it gives us the option of making London’s usually lengthy commutes somewhat productive.

Improved speed due to less network traffic

This was not a deal breaker given the average size of our projects, but it is certainly nice to have. Because committing is a local operation, it also encourages small, atomic commits, which in the long run improve the project’s maintainability.

Allows private (but still versioned) work

This feature was highly desirable, as it allowed individuals to pursue speculative development without having to worry about locking resources from other team members. At the same time, all work done locally is still versioned, so it’s always possible to roll back easily when hitting a dead end.

Avoids relying on a single physical machine

It’s very nice to know that you can use any local working repository to reconstitute the master in case of system failure, or in the rare cases when a broadband failure cuts you off from the central repository.

Still permits centralized control of the “release version” of the project

We liked the idea of being able to shape the workflow around our specific needs (rather than the other way around) through the use of conventions.

With these expectations set, we had a look at the available options and picked Git.

Why Git?

There are a few DVCS available out there. Git is one, while other common choices include Bazaar and Mercurial. They all share similar characteristics and are available for free, so I think it’s important to point out why we chose this tool above the others.

For a quick intro to Git, you can take a look at learn.github.com, while for a more high level rationale you may want to watch this video of a talk by Linus Torvalds on the subject.
Linus Torvalds is actually to be blamed for git in the first place, as he started its development as a tool to manage the Linux Kernel.
With such a scope in mind, you may argue that git would be overkill for managing web applications. As it turns out, its speed and flexibility fit our development needs very well. Moreover, reviewing our experience so far, I reckon that what really sold us on it were a few delightful design details and reassuring human factors.

Support for svn “switchers”

The Git – SVN crash course was probably the first webpage I read about git.
At the time I was a bit worried about the changes involved in moving from one system (Subversion) to the other. That page, in a few lines, demonstrated that, no matter how different the two systems were from philosophy to implementation, the transition would be pretty smooth and the learning curve not that hard (mind you, becoming a pro at Git is a totally different matter). Also, Git provides a bidirectional flow of changes between a Subversion and a Git repository, which was reassuring. As a matter of fact, we never felt the need to use that feature. Ever.

Branching affordance

Creating new branches and merging back feels so lightweight compared to other systems that I immediately embraced that practice. The relief of being able to experiment safely at no expense of time is priceless.
Also, the visualization tools available for git made it easy and fun to trace the status of the working repositories, giving an immediate feeling of the pace and direction of development.

Staging area

One of the unique features of Git is that it provides an intermediate step between the working files (where work is done) and the repository (where files are safely stored and versioned). This is called the “index” or “staging area”. It is useful as a way to collect references to the files you’ve been working on as soon as they are ready, and thus to keep control over what you eventually commit to the repository (you can even stage specific lines instead of the whole file if you’re into that level of detail).
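To make the three areas concrete, here is a minimal sketch in Python. It is a toy model only: the `ToyRepo` class, its method names and the file names are invented for illustration, and it does not reflect how Git stores data internally. It only shows the idea that a commit records what has been staged, not everything you happen to be editing.

```python
class ToyRepo:
    """Toy model of working files, index and repository (hypothetical, not Git's internals)."""

    def __init__(self):
        self.working = {}   # path -> content currently being edited
        self.index = {}     # path -> content that the next commit will record
        self.commits = []   # list of (message, snapshot) pairs, oldest first

    def edit(self, path, content):
        # Change a working file; nothing is versioned yet.
        self.working[path] = content

    def stage(self, path):
        # Comparable in spirit to `git add <path>`: copy the file into the index.
        self.index[path] = self.working[path]

    def commit(self, message):
        # Comparable in spirit to `git commit`: record the index as a snapshot.
        snapshot = dict(self.index)
        self.commits.append((message, snapshot))
        return snapshot


repo = ToyRepo()
repo.edit("app.rb", "puts 'hello'")
repo.edit("notes.txt", "half-finished idea")
repo.stage("app.rb")                    # only app.rb is ready to go
snapshot = repo.commit("Add greeting")

print(sorted(snapshot))                 # files recorded by the commit
print(sorted(repo.working))             # files still present in the working area
```

In this sketch `notes.txt` stays in the working area untouched and unversioned until it is staged, which is the control the staging area is meant to give you.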

Ruby friendliness

Over the last few years Git became the VCS of choice of many rubyists, also thanks to the growing GitHub community. Most of the tools and projects we follow and use every day are actually hosted there (Ruby on Rails, Capistrano, Cucumber just to name a few), so we happily ended up having to deal with just one version control system, which feels good.
For a more reasoned and scientific defence of Git’s virtues, have a look at Why Git is Better than X.

Git at Headshift

So, now that we have it, how do we use Git at Headshift? As you’ll read in the next few paragraphs, it became central and influenced (for the better) the way we do things, supporting and encouraging better practices and a leaner, more sound workflow.
It all starts with the management of our code repository.

Headshift Code Repository

All our project codebases (and something more) are kept in a central repository, guarded by voodoo magic, math crunching zombies and cryptographic demons that only our brave system administrators dare to tame.
Code distribution to the developers happens safely encrypted over ssh, and access to the individual repositories is granted only to recognised identities.
For those of you who played with it, the system is not unlike GitHub’s, although less dramatic and fancy: we use Gitosis to manage authorization and access to the code.
The cool thing about Gitosis is that it “behaves” just like yet another Git project on your server; this means that any user whose identity has sufficient rights can clone it and then configure new projects, or grant or remove access to other users, all without leaving their editor and, more importantly, without needing privileged access to the server machine. We even have a web interface to browse the repositories: that’s a pretty plain Gitweb install that gives us just the right level of detail to keep a sense of what’s going on, without having to clone and explore the code in depth.
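As an illustration of what “configuring access from your editor” looks like, the sketch below reads a Gitosis-style configuration in Python. The group, user and repository names (`webapp-team`, `alice`, `bob`, `client-site`) are made up for the example and are not our real setup; the sketch also assumes only the basic `members` and `writable` options and ignores other Gitosis features such as nested groups.

```python
import configparser

# Hypothetical gitosis.conf content; every name here is invented for illustration.
GITOSIS_CONF = """
[gitosis]

[group gitosis-admin]
members = alice
writable = gitosis-admin

[group webapp-team]
members = alice bob
writable = client-site
"""


def writers_for(conf_text, repo_name):
    """Return the sorted list of members of any group that may write to repo_name."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    writers = set()
    for section in parser.sections():
        if not section.startswith("group "):
            continue  # skip the global [gitosis] section
        writable = parser.get(section, "writable", fallback="").split()
        if repo_name in writable:
            writers.update(parser.get(section, "members", fallback="").split())
    return sorted(writers)


client_site_writers = writers_for(GITOSIS_CONF, "client-site")
admin_writers = writers_for(GITOSIS_CONF, "gitosis-admin")
print(client_site_writers)
print(admin_writers)
```

Granting a new colleague access then amounts to editing one `members` line in the cloned admin repository, committing, and pushing the change back.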

A note on workflow

From a workflow perspective, development always happens locally and we commit tested, working code to the master branch on the shared repository.
From there, we deploy (using a library of custom Capistrano recipes) to staging and production servers.
It is a very simple workflow, but it has proved good enough in most cases. There are exceptions, of course; I write about them under Open discussion below.

Along with project codebases, we are experimenting with Git and Capistrano as a way to manage system configuration files.
Again, the cool thing here is to be able to tweak the webserver configuration without having to leave the project context (i.e. the code editor) and without the need for privileged access to the application server.
Other tools tied to the Code Repository that we’ll try to explore in future articles are Pivotal Tracker (which we use for story-based project planning) and the continuous integration server.

Personal development environment

Although each of us has a slightly different setup, to better fit their programming style, most of us make use of the following tools when dealing with git. You will find them to be quite Mac centric:

    • Git TextMate Bundle – makes the TextMate editor talk to Git repositories.
    • GitX – a very nice Git GUI with extra eye candy and smooth integration with the Mac OS X look’n’feel.
    • Git command line aliases – this is where things get personal.
      These are my current aliases. I also like to show the current branch in my prompt.

Open discussion

I’d like to be able to say that the workflow we follow at Headshift is rock solid and that no question has been raised about it in more than two years. Actually, there are a few topics that we’re still discussing and for which no standard convention has been agreed yet:

      • Branch layout – as written above, we like to reference all our development in the master branch, and at the same time keep it always tested and ready for deployment. That just feels right. However, this approach gave us a little headache when, in the past, we had some already developed features in the master that could not be deployed to our staging environment, and yet a hotfix had to be developed and applied straight away.
      • Use of tags – they’re cute, lightweight and descriptive, but I can’t say we’ve been using many of them.
      • Rebase vs. Merge – This is one of the big ideological arguments in the Git community. Merge supporters like being able to keep track of every atomic, incremental addition to the repository, while rebase supporters like having tidy, minimal, aggregated changesets descending from “contained” development efforts.
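For readers new to the rebase vs. merge argument, the difference in the resulting history can be sketched with a toy commit graph in Python. This is a simplified model, not Git itself: the `Commit` class, the helper functions and the commit messages are all hypothetical, conflicts are ignored, and the rebase shown is a plain replay without squashing.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Commit:
    message: str
    parents: Tuple["Commit", ...] = ()


def merge(ours, theirs, message):
    # A merge commit has two parents, so both lines of history are preserved.
    return Commit(message, (ours, theirs))


def rebase(branch_commits, onto):
    # Replay the branch commits (oldest first) as new commits on top of `onto`.
    tip = onto
    for commit in branch_commits:
        tip = Commit(commit.message, (tip,))
    return tip


def first_parent_log(tip):
    # Walk back through first parents only, newest commit first.
    log = []
    while tip is not None:
        log.append(tip.message)
        tip = tip.parents[0] if tip.parents else None
    return log


base = Commit("base")
master_tip = Commit("master: hotfix", (base,))
feature_1 = Commit("feature: step 1", (base,))
feature_2 = Commit("feature: step 2", (feature_1,))

merged = merge(master_tip, feature_2, "Merge feature")
rebased = rebase([feature_1, feature_2], master_tip)

print(first_parent_log(merged))
print(first_parent_log(rebased))
```

In the merged history the feature work hangs off the second parent of the merge commit, so the branch remains visible as a branch; in the rebased history the same two steps appear as a single straight line on top of the hotfix.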

If you feel like contributing your point of view, please do so in the comments below.

Resources

Finally, here are a few more pointers for those looking for more specific info about git

3 Responses to Coding at Headshift: our experience with Version Control Systems

  1. By Panos Kontopoulos on March 10, 2010 at 8:14 pm

    Great post!
    Quick question: How do you manage projects incorporating a database repository like a Drupal CMS project, where half of the action happens in the database.
    Do developers also host local databases (sync?) or work with local files connected to a shared central remote database?
    Again thanx for sharing all this valuable stuff !

  2. By Chris Adams on March 11, 2010 at 3:26 pm

    Hi Panos
    Drupal’s habit of storing so much stuff in the database was a bone of contention when we started out, and every now and then it still does trip us up – what we’ve mainly found ourselves doing is a mixture of the two approaches you mentioned.
    When we work on Drupal projects we’ll normally work in same room as each other, on an internal development database, in regular contact as we work, but if we want to try something new that we think might cause problems with syncing, we’ll change tack slightly.
    Normally, we’ll pull down a local dump of the database and work on that, until we’re fairly sure it won’t break the app – we try where possible to create functionality as swappable themes or modules, so once it works locally, we can then activate it on our shared testing setup.
    We’ve had fairly good results with using Capistrano to deploy Drupal and WordPress sites too, and since Drush became popular in the Drupal community, writing more involved deployment recipes is much less daunting.
    If you’re interested in using Capistrano to automate deployments, it’s worth checking out this blog post by Stuart Eccles at Made by Many, and this one by Duncan Robertson.

  3. By Panos Kontopoulos on March 11, 2010 at 9:36 pm

    Once again, thanx for the great info you shared!
    Keep up the good work :-)