Backing up to Amazon Glacier
I built my business on the premise that data is valuable. My most profitable websites are the result of hundreds of hours of gathering and organizing data, so backups are important to me. I have a daily cron job that gathers all my databases, configuration files, keys, backup scripts, and a few other random bits of information into a compressed archive. The compression ratio is pretty good: about 1.8GB of data compresses to about 112MB, roughly 16:1.
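For anyone who wants to build a similar archive, here is a minimal sketch of the archiving step in Python. It is not my actual backup script: the function name is made up for illustration, and gzip-compressed tar is an assumption about the archive format.

```python
import os
import tarfile


def build_backup_archive(source_paths, archive_path):
    """Pack files and directories into one gzip-compressed tarball.

    Returns the size of the finished archive in bytes.
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in source_paths:
            # Store each item under its final path component so the
            # archive does not embed absolute server paths.
            name = os.path.basename(path.rstrip(os.sep))
            tar.add(path, arcname=name)
    return os.path.getsize(archive_path)
```

Calling it with a list of database dumps and config directories (whatever paths apply on your server) produces a single file that is ready to upload.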
For the past couple of years I’ve been using Dropbox to store the backups. I have the command line version of Dropbox installed, so the backup script just has to copy the backup archive into a folder; Dropbox then uploads it and pushes it out to my home computers. This has been working great for the most part, but it has a few downsides. First, it is sort of expensive. I can’t fit all the backups I want plus my own personal files into a free Dropbox account, so I’ve been paying $99/year for Dropbox Pro ($8.25 per month). Second, it isn’t terribly secure. Dropbox employees have access to your data, which isn’t really unexpected, but it still isn’t something I’m thrilled about. Last, it makes all my home computers download a 100MB+ archive every morning, which is particularly annoying on my laptop while traveling.
So I decided to switch to Amazon Glacier. They charge $0.01 per GB per month for long-term storage. The pricing model is very wonky, with lots of random little fees here and there depending on what you do, but as best I can tell I’ll be paying less than $0.40 per month to store a rolling 100 days of backups.
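The storage part of that estimate is easy to check. Here is a rough sketch that assumes one 112MB archive per day, all 100 kept at once, billed at the $0.01 per GB per month quoted above. It covers storage only and does not try to model any of the other fees.

```python
def monthly_storage_cost(archive_mb, days_kept, price_per_gb_month=0.01):
    """Estimate the steady-state monthly storage charge in dollars."""
    # Total data held once the rolling window is full, converted MB -> GB.
    stored_gb = archive_mb * days_kept / 1024.0
    return stored_gb * price_per_gb_month


cost = monthly_storage_cost(archive_mb=112, days_kept=100)
print("about $%.2f per month" % cost)  # about $0.11 per month
```

That works out to roughly 11GB in storage and about 11 cents a month, comfortably under the $0.40 figure.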
To do the actual heavy lifting I decided to use glacier-cli, which is a Python tool that provides a pretty easy interface to Glacier and is easy to use in a cron job or bash script. It requires Python 2.7 but my server only had Python 2.6. I was able to follow the instructions at Too Much Data to install Python 2.7 next to my existing installation. Next, I followed the installation instructions for glacier-cli to clone the required scripts. I also set up environment variables to hold my AWS access key ID and secret access key. Finally, I was ready to go. Or so I thought. I ran a simple command to list my vaults:

    /usr/local/bin/python2.7 /root/glacier-cli/glacier.py vault list
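One thing to watch for when commands like the vault listing above run from cron: cron jobs get a minimal environment, so variables exported in your login shell may not be there. Here is a small sketch of a pre-flight check. It assumes the standard variable names that boto (the AWS library glacier-cli is built on) reads; the helper name is made up.

```python
import os

# Standard environment variable names read by boto (assumed here).
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_credentials(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

If the returned list is not empty, the backup script can bail out with a clear message instead of failing somewhere inside the upload.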
Egh. It didn’t work. I was missing some of the Python packages it depends on. I then spent a fair amount of time installing everything I needed with the easy_install-2.7 command. I was also missing the SQLite headers, which I installed using yum.
I then tried again with glacier.py and it worked! Yay! I created a vault, uploaded a file, and finally modified my backup script to use Glacier instead of Dropbox.
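For reference, here is a sketch of what the upload step could look like if the backup script were written in Python. It reuses the interpreter and script paths from the command earlier in this post and glacier-cli's "archive upload" subcommand. The helper names and the vault name in the usage note are hypothetical, not what I actually use.

```python
import subprocess

# Paths taken from the vault list command shown earlier in this post.
PYTHON = "/usr/local/bin/python2.7"
GLACIER = "/root/glacier-cli/glacier.py"


def glacier_command(*args):
    """Build the argument list for one glacier-cli invocation."""
    return [PYTHON, GLACIER] + list(args)


def upload_backup(vault, archive_path, run=subprocess.check_call):
    """Upload one archive to a vault; raises if glacier-cli exits non-zero."""
    # 'run' is injectable so the command can be inspected without executing it.
    run(glacier_command("archive", "upload", vault, archive_path))
```

A one-time glacier_command("vault", "create", "server-backups") creates the vault, and after that the daily job only needs upload_backup("server-backups", path_to_archive).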
Anyway, I’m pretty excited. I’ve canceled my Dropbox Pro subscription and completely removed Dropbox from my server. It has freed up a pile of space on my computer, saved me some unneeded bandwidth usage, and I’m now storing backups longer than I ever have before. It is so inexpensive that I’ll likely start a weekly backup of all the files for my websites, too (about 7.4GB uncompressed if I leave out Fake Name Generator order files). This data is almost entirely in git on at least one other server, so I’m not terribly worried about losing it, but it doesn’t hurt to have some redundancy in backups.