This is the first post we’d like to dedicate to the exploration and sharing of the basic technical setup and the development workflow here at Headshift.
This article is dedicated to Git, our version control system of choice and primary backbone of our development workflow.
Version Control refers to the management of changes to documents, programs, and other information stored as computer files. It is a fairly common (and highly recommended) practice in software development, where a team of people may change the same files. Change sets are associated with a timestamp and the person making the change, so that it’s possible to play back the story of a document and properly attribute the contributions. Revisions can be compared, restored and, depending on the types of files, merged.
You may recognize this idea and the related practices (browsing history, versions and authors) from experience with wiki applications (and, to an extent, blog engines); as a matter of fact, the founding concepts are the same.
Version Control Systems for software development can be either centralized or distributed. When I joined Headshift, several years ago, we were using Subversion, a pretty common example of the former sort. Although it got the job done, over time we accumulated a number of concerns that pushed us toward a distributed alternative.
Distributed Version Control Systems are relatively new, taking a peer-to-peer approach, as opposed to the client-server approach of centralized systems.
They have a number of clear advantages and, at first glance, a few potential disadvantages when compared to the centralized approach; what follows is a short list of the characteristics that convinced us to transition to a DVCS, and to Git specifically. We have been using Git for two years now, and it’s a choice we haven’t regretted.
Safe for offline or remote work
Every collaborator involved in a project has a copy of the repository that is self-contained and independent of the others. This means that users can work productively even when they are not connected to a network or cannot reach the workplace VPN. This was paramount for us, since we can’t always count on having all members of the team at HQ. Moreover, it gives us the option of making London’s usually lengthy commutes somewhat productive.
Improved speed due to less network traffic
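This was not a deal breaker given the average size of our projects, but it is certainly nice to have. Because most operations (committing, diffing, browsing history) run against the local repository rather than over the network, they are fast, which in turn encourages small, atomic commits that improve the project’s maintainability in the long run.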
Allows private (but still versioned) work
This feature was highly desirable, as it allows individuals to pursue speculative development without having to worry about locking resources needed by other team members. At the same time, all work done locally is still versioned, so it’s always easy to roll back when hitting a dead end.
Avoids relying on a single physical machine
It’s very nice to know that you can use any local working repository to reconstitute the master in case of system failure, or keep working in the rare cases when a broadband failure cuts you off from the central repository.
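As a rough illustration of that recovery scenario, the sketch below simulates losing the central repository and repopulating a fresh one from a developer's clone. Everything in it is an assumption made for the demo: the temporary directory, the `central.git` and `dev` names, and the use of a local bare repository in place of a real server. A clone can only restore the branches and tags it actually holds.

```shell
set -e
work=$(mktemp -d) && cd "$work"

# A local bare repository stands in for the central server (assumption).
git init -q --bare central.git

# A developer clones it and commits some work.
git clone -q central.git dev 2>/dev/null
cd dev
git config user.name "Demo User"
git config user.email "demo@example.com"
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
echo "hello" > README
git add README
git commit -q -m "Initial commit"
git push -q origin master
cd ..

# Simulate losing the central machine.
rm -rf central.git

# Recreate an empty central repository and repopulate it from the clone.
git init -q --bare central.git
git -C dev push -q --all origin    # every local branch
git -C dev push -q --tags origin   # every local tag

restored=$(git -C central.git rev-parse master)
local_head=$(git -C dev rev-parse master)
echo "central: $restored"
echo "local:   $local_head"
```

Both lines should show the same commit id, since the rebuilt central repository now holds the history that survived in the clone.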
Still permits centralized control of the “release version” of the project
We liked the idea of being able to shape the workflow around our specific needs (rather than the other way around) through the use of conventions.
With these expectations set, we had a look at the available options and picked Git.
There are a few DVCS available out there. Git is one, while other common choices include Bazaar and Mercurial. They all share similar characteristics and are available for free, so I think it’s important to point out why we chose this tool above the others.
For a quick intro to Git, you can take a look at learn.github.com, while for a more high level rationale you may want to watch this video of a talk by Linus Torvalds on the subject.
Linus Torvalds is actually to blame for Git in the first place, as he started its development as a tool to manage the Linux kernel.
With such a scope in mind, you may argue that git would be overkill for managing web applications. As it turns out, its speed and flexibility fit our development needs very well. Moreover, reviewing our experience so far, I reckon that what really sold us on it were a few delightful design details and reassuring human factors.
Support for svn “switchers”
The Git – SVN crash course was probably the first webpage I read about git.
I was at the time a bit worried about the changes involved in moving from one system (Subversion) to the other. That page demonstrated, in a few lines, that no matter how different the two systems were, from philosophy to implementation, the transition would be pretty smooth and the learning curve not that steep (mind you, becoming a Git pro is a totally different matter). Also, Git provides a bidirectional flow of changes between a Subversion and a Git repository (through git svn), which was reassuring. As a matter of fact, we never felt the need to use that feature. Ever.
Creating new branches and merging back feels so lightweight compared to other systems that I immediately embraced that practice. The relief of being able to experiment safely at no expense of time is priceless.
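A minimal sketch of that branch-and-merge cycle, run in a throwaway repository. The file name, branch name and commit messages are made up for the example.

```shell
set -e
work=$(mktemp -d) && cd "$work"
git init -q demo && cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master

echo "stable" > app.txt
git add app.txt
git commit -q -m "Stable baseline"

# Start an experiment on its own branch; master is untouched.
git checkout -q -b experiment
echo "risky idea" >> app.txt
git commit -q -am "Try a risky idea"

# Happy with the result: merge it back and delete the branch.
git checkout -q master
git merge -q experiment
git branch -d experiment

# Had it been a dead end, `git branch -D experiment` would simply discard it.
merged_line=$(tail -n 1 app.txt)
echo "last line on master: $merged_line"
```

Creating the branch only records a new pointer to a commit, which is why the whole cycle is local and practically instantaneous.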
Also, the visualization tools available for git made it easy and fun to trace the status of the working repositories, giving an immediate feeling of the pace and direction of development.
One of the unique features of Git is that it provides an intermediate step between the working files (where work is done) and the repository (where files are safely stored and versioned). This is called the “index” or “staging area”. It is useful as a way to collect changes to the files you’ve been working on as soon as they are ready, and thus keep control over what eventually gets committed to the repository (you can even stage specific lines instead of the whole file if you’re into that level of detail).
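The effect of the staging area can be sketched as follows: two files are edited, but only one of them is staged and committed. The repository and file names are invented for the example.

```shell
set -e
work=$(mktemp -d) && cd "$work"
git init -q demo && cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "v1" > feature.txt
echo "v1" > notes.txt
git add feature.txt notes.txt
git commit -q -m "Initial commit"

# Edit both files in the working tree.
echo "v2" >> feature.txt
echo "scratch" >> notes.txt

# Stage only the change that is ready; notes.txt stays out of the index.
git add feature.txt
# (`git add -p feature.txt` would instead let you pick individual hunks interactively.)

git diff --cached --name-only   # lists what the next commit will contain
git commit -q -m "Extend feature"

# Files touched by the last commit, and files still modified but uncommitted.
committed=$(git diff-tree --no-commit-id --name-only -r HEAD)
pending=$(git diff --name-only)
echo "committed: $committed"
echo "pending:   $pending"
```

The commit contains only `feature.txt`, while the edit to `notes.txt` remains in the working tree, ready for a later commit or to be thrown away.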
Over the last few years Git has become the VCS of choice for many Rubyists, thanks in part to the growing GitHub community. Most of the tools and projects we follow and use every day are actually hosted there (Ruby on Rails, Capistrano and Cucumber, just to name a few), so we happily ended up having to deal with just one version control system, which feels good.
For a more reasoned and scientific defence of Git’s virtues, have a look at Why Git is Better than X.
Git at Headshift
So, now that we have it, how do we use Git at Headshift? As you’ll read in the next few paragraphs, it became central and influenced (for the better) the way we do things, supporting and encouraging better practices and a leaner, more sound workflow.
It all starts with the management of our code repository.
Headshift Code Repository
All our project codebases (and something more) are kept in a central repository, guarded by voodoo magic, math crunching zombies and cryptographic demons that only our brave system administrators dare to tame.
Code distribution to the developers happens safely encrypted over ssh, and access to the individual repositories is granted only to recognised identities.
For those of you who played with it, the system is not unlike GitHub’s, although less dramatic and fancy: we use Gitosis to manage authorization and access to the code.
The cool thing about Gitosis is that it “behaves” just like any other Git project on your server; this means that any user whose identity has sufficient rights can clone it and then configure new projects, or grant or remove access to other users, all without leaving their editor and, more importantly, without needing privileged access to the server machine. We even have a web interface to browse the repositories: that’s a pretty plain Gitweb install that gives us just the right level of detail to keep a sense of what’s going on, without having to clone and explore the code in depth.
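To give an idea of what such a change looks like, here is a sketch of granting a user write access to a project. It follows the usual Gitosis layout (a `gitosis.conf` file plus one public key per user under `keydir/`), but the group, user, project and host names are hypothetical, the key is a placeholder, and a local throwaway repository stands in for the real `gitosis-admin` clone so that the snippet runs anywhere.

```shell
set -e
work=$(mktemp -d) && cd "$work"

# Stand-in for: git clone git@git.example.com:gitosis-admin.git
# (hypothetical host; a real clone already contains gitosis.conf and keydir/)
git init -q gitosis-admin && cd gitosis-admin
git config user.name "Demo User"
git config user.email "demo@example.com"
mkdir keydir

# Add the collaborator's public SSH key as keydir/<name>.pub.
# A placeholder string is used here instead of a real key.
echo "ssh-rsa AAAA...placeholder alice@example.com" > keydir/alice.pub

# Declare a group, its members, and the repositories they may write to.
cat > gitosis.conf <<'EOF'
[gitosis]

[group webteam]
members = alice
writable = newproject
EOF

git add gitosis.conf keydir/alice.pub
git commit -q -m "Grant alice write access to newproject"

# On a real installation, `git push` at this point is what makes
# Gitosis pick up the new configuration on the server.
```

Because access rules travel as ordinary commits, every permission change is itself versioned and attributable.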
A note on workflow
From a workflow perspective, development always happens locally, and we commit tested, working code to the master branch on the shared repository.
From there, we deploy (using a library of custom Capistrano recipes) to staging and production servers.
It is a very simple workflow, but it has proved good enough in most cases. There are exceptions, of course; I wrote about them under Open Discussions below.
Along with project codebases, we are experimenting with Git and Capistrano as a way to manage system configuration files.
Again, the cool thing here is to be able to tweak the webserver configuration without having to leave the project context (i.e. the code editor) and without the need for privileged access to the application server.
Other tools tied to the Code Repository that we’ll try to explore in future articles are Pivotal Tracker (which we use for story-based project planning) and the continuous integration server.
Personal development environment
Although each of us has a slightly different setup to better fit our own programming style, most of us use the following tools when dealing with Git. You will find them to be quite Mac-centric:
- Git TextMate Bundle – makes the TextMate editor talk to Git repositories
- GitX – very nice git GUI with extra eye-candy and smooth integration with Mac OS X look’n’feel.
- Git command line aliases – this is where things get personal.
These are my current aliases. I also like to show the current branch in my prompt.
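As an illustration of the kind of setup meant here, the sketch below defines a handful of aliases and a small prompt helper. These particular aliases and the `current_branch` function are hypothetical examples, not necessarily the ones referred to above. They are set in a throwaway repository; `git config --global` would make them available everywhere.

```shell
set -e
work=$(mktemp -d) && cd "$work"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master

# Hypothetical aliases (repository-local here; add --global for real use).
git config alias.st "status --short"
git config alias.co checkout
git config alias.ci commit
git config alias.lg "log --oneline --graph --decorate"

# Print the current branch name, or nothing outside a repository.
current_branch() {
  git symbolic-ref --short HEAD 2>/dev/null
}

# Bash prompt showing the working directory and the current branch.
PS1='\w ($(current_branch)) \$ '

touch new-file.txt
git st   # same as: git status --short
```

The single quotes around the `PS1` value matter: they delay the call to `current_branch` until each prompt is drawn, so the branch name stays current after every checkout.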
Open Discussions
I’d like to be able to say that the workflow we follow at Headshift is rock solid and that no question has been raised about it in more than two years. Actually, there are a few topics that we’re still discussing and for which no standard convention has been agreed yet:
- Branch layout – as written above, we like to keep all our development on the master branch, and at the same time keep it always tested and ready for deployment. That just feels right. However, this approach gave us a bit of a headache when, in the past, we had some already developed features in master that could not yet be deployed to our staging environment, and a hotfix had to be developed and applied straight away.
- Use of tags – they’re cute, lightweight and descriptive, but I can’t say we’ve been using many of them.
- Rebase vs. Merge – This is one of the big ideological arguments in the Git community. Merge supporters like being able to keep track of every atomic, incremental addition to the repository, while rebase supporters like having tidy, minimal, aggregated changesets descending from “contained” development efforts.
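To make the rebase versus merge difference concrete, the following sketch applies both strategies to the same situation: a feature branch with one commit, and a master branch that has moved on in the meantime. Branch and file names are made up for the example, and it only shows the resulting shape of the history, not a recommendation for either side.

```shell
set -e
work=$(mktemp -d) && cd "$work"
git init -q demo && cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master

echo "base" > base.txt && git add base.txt && git commit -q -m "Base"

# Two identical feature branches, each holding the same single commit.
git checkout -q -b feature-merge
echo "feature" > feature.txt && git add feature.txt && git commit -q -m "Add feature"
git branch feature-rebase

# Meanwhile, master moves on.
git checkout -q master
echo "fix" > fix.txt && git add fix.txt && git commit -q -m "Fix on master"

# Option 1: merge. Both lines of development are kept, joined by a merge commit.
git checkout -q -b merged master
git merge -q --no-edit feature-merge

# Option 2: rebase. The feature commit is replayed on top of master: linear history.
git checkout -q feature-rebase
git rebase -q master

git log --oneline --graph merged
git log --oneline --graph feature-rebase

merge_commits=$(git rev-list --merges --count merged)
rebase_merges=$(git rev-list --merges --count feature-rebase)
merged_total=$(git rev-list --count merged)
rebased_total=$(git rev-list --count feature-rebase)
```

Both branches end up with the same files, but the merged history has one extra commit recording where the two lines joined, while the rebased history reads as if the feature had been written after the fix.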
If you feel like contributing your point of view, please do so in the comments below.
Finally, here are a few more pointers for those looking for more specific info about git