Tuesday, August 01, 2006

Backupy Goodness

I'm sort of paranoid about losing data. More than once I've lost files I wanted, either because I did something stupid or because hardware failed. Because of this, I'm pretty religious about redundancy for my data. All of my data sits on a RAID-1 array, and important files are mirrored daily on both my and Holly's workstations. I also have a safe deposit box at the bank in which I frequently deposit backup DVDs.

With all of this data, I've run into a problem lately. Since I changed jobs about a year and a half ago, I don't make it to the bank nearly as often as I used to. What was once a monthly trip has stretched to once every five, six, even seven months. Also, the bulk of my data isn't changing. I could just swap out the discs that hold changing data, but the rest will eventually degrade anyway, and it seems wasteful to burn seven DVDs for photos that are the same as last time. Because of these problems, I've decided to move some of my backups online.

I'm now backing up data to S3, which is a data storage service from Amazon. Rather than use their APIs to write custom file handling code, I'm using Jungle Disk to handle file transfers between my machine and S3.

I had two goals for the backup service. I wanted to transfer only necessary files to cut down on bandwidth costs, and I wanted the data to be encrypted in an easily recoverable way.

For security, Jungle Disk does support RC4 encryption. I imagine they went with a stream cipher because it avoids the padding overhead you'd get with a block cipher (which saves storage space and money). But I decided against using one tool's specific encryption scheme, because I didn't want to be locked into Jungle Disk if I ever decided to switch things around. Instead, I went with gpg for my encryption needs.
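To give a rough idea of the gpg step, here is a minimal sketch of building an encrypt-and-sign invocation; the file name and recipient IDs are placeholders I made up, not the actual setup:

```python
# Sketch: build a gpg command that encrypts a file to several
# recipients and signs it with the default secret key.
# Key IDs and paths below are illustrative placeholders.
def gpg_encrypt_command(src, recipients, sign=True):
    """Return the argv list for encrypting src to every recipient."""
    cmd = ["gpg", "--batch", "--yes", "--output", src + ".gpg"]
    for r in recipients:
        cmd += ["--recipient", r]
    if sign:
        cmd.append("--sign")
    cmd += ["--encrypt", src]
    return cmd

cmd = gpg_encrypt_command("photos.tar", ["me@example.com", "holly@example.com"])
# When gpg is installed, run it with: subprocess.run(cmd, check=True)
```

Encrypting to both public keys means either person can decrypt the backup with their own secret key, which is what makes the data easily recoverable.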

For the bandwidth issues, I needed a tool that would sync only changed files. I could use something like rsync, robocopy, or SyncToy, but those rely on the source files matching the destination files, and here the destination copies are encrypted. Unless I kept a local cache of the gpg-encrypted files, I wouldn't be able to do this easily.

To accomplish these goals, I ended up writing my own tool (surprise, surprise). It uses gpg for encryption, and it decides when to upload a file by comparing the MD5 sum of each source file against a manifest it keeps. I can define which directories to back up and whether to sign them, and it takes it from there. It can be run from a command line, and it has a GUI for easy configuration. I'm pretty happy with it. Files are signed by my gpg key, and they are encrypted with both Holly's and my public keys.

So far, I've only been testing it out with a few hundred megabytes of data. After a month of using it, I think I'm going to scale it up to back up my photo collection as well.


Travis said...

That's hot. I have a half-written S3 client that does AES encryption. I got busy with work and never got around to implementing the UI. I'll have to check out this JungleDrive.

Anonymous said...

Your utility is quite in-line with what I'm trying to accomplish with backups and S3. Is your utility GPL? Do you have a download for it? Thanks.

mail (at) james [dot] crocker [dot] name

Tim said...

Might not be up your alley but I'm using encfs locally for anything that's sensitive and/or needs backing up to S3