Decentralized File Version Control

I’ve been using Subversion for years to store all of my files. It works well for both binary and text files, and it supports file locking, which is an important feature for binary CAD files. I know that lots of people tell you not to put binary files in version control software, but it does work if you use the right system. Right now there are tens of gigabytes in my work repository, with a working copy of about 2GB, and there haven’t been any problems so far. The advantages of using Subversion are:

  • Data security: All your data is in Subversion, which means it’s easy to go back in time if necessary. If you set up a separate server you also get redundancy, since there are now multiple copies of the data.
  • File locking: This is important for organizations where multiple people work on the same files. By enforcing file locking rules, Subversion makes sure that only one person can edit a file at a time. Nothing is more frustrating than having your changes overwritten by someone else.
  • Remote file access: You can easily access Subversion over HTTPS or SSH.
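
The file locking cycle can be sketched as a handful of svn commands. This is only a minimal sketch, assuming the svn command-line client is installed; the helper names and the file bracket.dwg are made up for illustration. Setting the svn:needs-lock property keeps a file read-only in every working copy until someone takes the lock, which is one common way to make the locking rule hard to forget.

```python
import subprocess


def needs_lock_command(path):
    """Command that marks a file as requiring a lock before editing."""
    # svn:needs-lock makes the working-copy file read-only until it is locked.
    return ["svn", "propset", "svn:needs-lock", "*", path]


def lock_command(path, message):
    """Command that takes the exclusive lock on a file."""
    return ["svn", "lock", "-m", message, path]


def commit_command(path, message):
    """Command that commits the file; by default this also releases the lock."""
    return ["svn", "commit", "-m", message, path]


def edit_workflow(path, message):
    """Commands for one lock, edit, commit cycle (hypothetical helper)."""
    return [lock_command(path, "Editing " + path), commit_command(path, message)]


def run(commands):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Calling run(edit_workflow("bracket.dwg", "Widen slot")) would take the lock and commit in one go; in practice the editing happens between those two steps, so the commands would be run separately.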

But nothing is free. Here are some of the disadvantages:

  • Workflow complexity: Programmers are used to version control software. Other people are not, and it can cause problems when they have to do more than just edit and save files. Everyone now has to manually lock files and commit them.
  • Centralized server: A centralized server solves all the file locking issues for an organization. But if it goes down, everything stops, just as it would with a file server. It’s also an issue if remote users can’t remain constantly connected.
  • Big repositories: Most Subversion servers use one repository for everything. That can cause issues with directory structures and user permissions.
  • Administration: Setting up a Subversion server properly involves a bit of a learning curve.
  • Performance: Subversion is really slow on big files. It’s just the way it is and there’s not much you can do about it.

For me the biggest benefit of Subversion is better data reliability, because every client computer has a copy of the server’s workspace. I don’t believe in RAID no matter what the level, especially since I once lost all the data in an array. I also now believe in offsite copies after a natural event happened here a few years ago, so I believe in backing up Subversion to an offsite service like Amazon S3. If you are using large files, I don’t recommend Subversion cloud hosting. It’s just too slow.
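
As a rough illustration of the offsite backup idea, the sketch below dumps a repository with svnadmin and compresses it into a dated file. It is a sketch under assumptions, not my actual backup script: the function names, the repository path, and the file naming scheme are all hypothetical, and the upload to the offsite service is left out because it depends on which tool you use.

```python
import gzip
import shutil
import subprocess
from datetime import date


def dump_command(repo_path):
    """Command that writes the full repository history to standard output."""
    return ["svnadmin", "dump", "--quiet", repo_path]


def backup_name(repo_name, day):
    """Dated file name for the compressed dump (hypothetical naming scheme)."""
    return "{}-{}.svndump.gz".format(repo_name, day.isoformat())


def dump_repository(repo_path, target_file):
    """Stream the dump through gzip so the raw dump never sits on disk."""
    process = subprocess.Popen(dump_command(repo_path), stdout=subprocess.PIPE)
    with gzip.open(target_file, "wb") as archive:
        shutil.copyfileobj(process.stdout, archive)
    if process.wait() != 0:
        raise RuntimeError("svnadmin dump failed for " + repo_path)
```

Something like dump_repository("/srv/svn/work", backup_name("work", date.today())) would produce one file per day, which can then be copied offsite with whatever client the storage service provides.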

This setup has worked for years, but lately it’s causing problems with my workflow because sometimes I want to work disconnected from the network on a laptop. There isn’t really a way to do this with the current version of Subversion unless you buy WANdisco Subversion MultiSite. Unless you have lots of users, it’s actually cheaper to buy your employees cars. Seriously. Subversion can push changes to a remote read-only server, but all writes have to go back to the main one. That only works if you don’t want to use version control when you’re disconnected and if you don’t use file locks. Neither applies to me.

At first I thought I could use offline files in Windows. That wasn’t an option since I’d have to start upgrading Windows licenses. I also looked at Microsoft’s new synchronization tools, but they no longer work on Windows XP and they aren’t compatible with a Samba server. The only product that came close to meeting my needs was Dropbox, but the costs can add up quickly. So then I started to look at distributed version control systems, and I quickly learned that they aren’t designed to deal with large binary files very well. The best solution I could find was Plastic SCM, which advertised good binary file support. They even provide a free 15 user license, but I just couldn’t get it running, and what I did see didn’t impress me.

That left me with few options. I use Windows, so that leaves out Git, and I chose Mercurial. Right now I’m doing some testing and it looks like it can meet most of my needs. It can handle binary files, and with an extension module it can handle files bigger than 10MB quite well. And the really nice thing is that because every workspace is a full repository, offsite hosting services like Bitbucket and Kiln are options. It actually doesn’t make sense to host your own repositories, because sites like those are free for small teams. I’m planning on using Kiln for projects with lots of binary files since it supports them better than Bitbucket. However, I also intend to use Bitbucket for more open source type projects since it has more traction in the open source community. But the best part is that the software on my end remains the same no matter what service I use. Both sites also provide bug tracking and documentation systems.
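
To give an idea of what enabling a large-file extension looks like, here is a minimal sketch that generates the relevant hgrc settings. It assumes the largefiles extension that is bundled with Mercurial 2.0 and later, which may not be the extension module mentioned above; the function names and the file patterns are made up for illustration.

```python
import configparser
import io


def largefiles_config(min_size_mb=10, patterns=("**.dwg", "**.zip")):
    """Build hgrc settings that enable largefiles (assumed extension)."""
    config = configparser.ConfigParser()
    # An empty value is how hgrc enables an extension bundled with Mercurial.
    config["extensions"] = {"largefiles": ""}
    config["largefiles"] = {
        # New files of roughly this size (in megabytes) or larger become largefiles.
        "minsize": str(min_size_mb),
        # Files matching these patterns become largefiles regardless of size.
        "patterns": " ".join(patterns),
    }
    return config


def render(config):
    """Return the settings as text that can be pasted into an hgrc file."""
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()
```

Printing render(largefiles_config()) gives an [extensions] section and a [largefiles] section that can be added to the repository’s .hg/hgrc. The size threshold and patterns only affect files added after the settings are in place.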

Unfortunately, all this goodness comes at the expense of file locking. That’s manageable for small teams, but I doubt it’s workable for larger ones. For now I intend to deal with it by using smaller repositories (one per project). That way, if it becomes necessary, I can manually make the repository read-only on the main server or make sure that edits are being communicated to other team members (like what you would do on any file server). Another option is to have all writes to the main repository go through one person. I intend to try Mercurial on some smaller projects first to get a feel for whether the lack of file locking is a deal breaker, but for now I don’t think it is.