Git with the Program

by Ostatic Staff - Apr. 04, 2008

Many open-source projects are switching from CVS and Subversion to newer, distributed version-control systems. One of those is Git, written by none other than Linus Torvalds, originally for use on the Linux kernel. Git is becoming increasingly popular for use on other projects, as well. Why?

The Internet has, of course, been essential to the growth of the open source movement. Most communication among the developers of an open source project takes place via e-mail, IRC, or even instant messaging.

The Internet has also made it possible for developers to parallelize their work, with each programmer on a project handling a different part of it. But even in highly modularized projects, there are times when you need to ensure that two programmers don't change the same file -- and more importantly, that they don't change it in mutually incompatible ways.

The solution is to use a version-control system, which tracks every change made to a file, adding notes indicating when the change was made, by whom, and even why. Version control is like a backup system for programmers, ensuring that they can always revert to an earlier version if something goes wrong. It also functions as an early-warning system, telling programmers that they must resolve a conflict with someone else before moving forward.
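To make that idea concrete, here is a minimal sketch in Python of what such a system records for a single file. It is a toy model, not how CVS, Subversion, or Git actually store data, and the names `Change`, `FileHistory`, `commit`, and `revert_to` are invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Change:
    """One recorded change to a file (hypothetical structure)."""
    content: str    # the file's full content after the change
    author: str     # by whom the change was made
    message: str    # why the change was made
    when: datetime  # when the change was made


@dataclass
class FileHistory:
    """Toy history for a single file; every change is kept."""
    changes: list = field(default_factory=list)

    def commit(self, content, author, message):
        # Record the new content together with who, why, and when.
        now = datetime.now(timezone.utc)
        self.changes.append(Change(content, author, message, now))

    def current(self):
        # The latest recorded content is the current version.
        return self.changes[-1].content

    def revert_to(self, index, author):
        # A revert is stored as a new change, so no history is lost.
        old = self.changes[index]
        self.commit(old.content, author, f"Revert to change {index}")


history = FileHistory()
history.commit("print('hello')", "alice", "Initial version")
history.commit("print('helo')", "bob", "Tweak greeting (introduces a typo)")
history.revert_to(0, "alice")  # something went wrong, so go back
```

Note that the revert does not erase Bob's change; it adds a third entry, so the record of what happened stays complete.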

Version control systems also make it possible to work with "branches" of code, so that a programmer can make a change to version 1.1 without affecting the code for the soon-to-be-released version 2.0. Branching and merging are typically among the hardest things for programmers to understand in a version control system, which means that the software's interface for such functions is crucial.

For years, the most common version-control system among open-source projects was CVS, the Concurrent Versions System. CVS worked well enough for many projects, but it had a very large number of problems. To begin with, it wouldn't let you move or rename files; instead, you had to pretend that the old file was deleted, and that a new one was created. This, of course, meant that the renamed file lost its connection to its earlier history. There were also issues with international (Unicode) text files, as well as binary (e.g., image) files, which had to be saved in a particular manner.

CVS was problematic enough that a new version-control system, named Subversion, was designed with the explicit claim of being "a better CVS." This was largely true, in that Subversion fixed many problems with CVS. But it still used the same fundamental model as CVS, with multiple clients and a single server, all connected via the Internet. In such an environment, each programmer has only their own version of the software "checked out" on their computer's filesystem. Information about previous versions, other branches, and so forth is available only on the server -- and thus only when the programmer is connected to the Internet.

In the last few years, a new style of version-control system has emerged, in which each member has a complete copy of the version repository, including all branch, tag, and log information. While this means that each person's disk contains a greater amount of information, it also gives the programmer added flexibility and power.
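The difference can be sketched with a toy model in Python. This is only an illustration of the idea that a clone carries the whole repository with it; the `Repository` class and its `log`, `branches`, and `tags` attributes are hypothetical and do not reflect how Git or any other real system stores its data.

```python
import copy


class Repository:
    """Toy repository (hypothetical): a log plus branch and tag pointers."""

    def __init__(self):
        self.log = []       # every commit message, oldest first
        self.branches = {}  # branch name -> index of its latest commit
        self.tags = {}      # tag name -> index of the tagged commit

    def commit(self, branch, message):
        # Append to the log and move the branch pointer to the new commit.
        self.log.append(message)
        self.branches[branch] = len(self.log) - 1

    def clone(self):
        # A distributed clone copies everything, not just the latest files.
        return copy.deepcopy(self)


server = Repository()
server.commit("master", "Initial import")
server.commit("master", "Fix build")
server.tags["v1.0"] = server.branches["master"]

laptop = server.clone()
server = None  # simulate working with no connection to the server

# The full log, branches, and tags are still available locally,
# and new work can be committed without the server.
laptop.commit("experiment", "Try a new idea")
```

In a centralized system, the equivalent of `laptop` would hold only the checked-out files, and anything involving the log or other branches would need the server.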

Some commercial version-control systems use this same technique, as well. Most famously, BitKeeper was used to manage the Linux kernel for several years, after Linus Torvalds explicitly stated that CVS and Subversion weren't even close to powerful or useful enough for his purposes. BitKeeper might not be open source software, but Linus said that it was the first system that he found good enough -- and fast enough -- for him to use on a regular basis. The license under which BitKeeper was made available to open source developers was quite restrictive, however, and eventually resulted in a dispute between the head of BitMover (the company that makes BitKeeper) and the kernel team.

This left Linus forced to choose between no version control (which is how things were actually done for a long time), version control that he and others hated (e.g., CVS and Subversion), a commercial package (which would be impractical and would be treated with hostility by the rest of the community), or a new tool of his own. Linus decided to build a new tool, which he called Git, and which incorporates many of the features of BitKeeper.

Git has been around for nearly three years, and is now in constant use by the Linux kernel team. In the last few months, though, I have seen a number of open-source projects switch from Subversion to Git -- most notably Ruby on Rails, whose team announced the switch earlier this week.

There have been some complaints that Git is less compatible with Windows than with other operating systems. But despite some roadblocks, it is a cross-platform system. And according to Linus, he built it to handle version control not only reliably but also quickly, a claim that other version-control systems cannot make.

Moreover, there is now a growing number of Git hosting solutions, such as GitHub, which is currently in beta testing. People who want to get started with Git, but don't have a server that can serve as a master repository, can thus do so without much trouble.

We thus see that Git was born of frustration, and was originally designed for use on the Linux kernel. But it is now in regular use by other projects as well -- and if frustration with Subversion continues to grow, then we might just see Git become the open source version-control system of choice. For more on Git, see Mike's post.

How do you handle version control?