Easy GitHub backup with curl

We just deleted an old private GitHub repository. Before we did, I wanted a copy of the issues. Turns out, it’s super simple with curl. Of course, you can back up the repository itself (and its wiki) very easily with:

git clone git@github.com:user/repo.git
git clone git@github.com:user/repo.wiki.git

To get the issues, I first created a new personal API token on GitHub. It took me a minute to figure out, but you use the token as the username, with either a blank password or a password of x-oauth-basic. The curl command then looks like:

curl -u 'access_token:x-oauth-basic' https://api.github.com/repos/user/repo/issues

If you got everything right (which I didn’t the first 5 or 10 times…) you should get a JSON document containing the issues of that repo. There will probably be multiple pages. Personally, I ran curl -sSi, piped the output to less, and spotted the links for pages 2 and 3 in the response headers. I repeated the process by hand and dumped all three pages into .json files.

There are tools available to automate this, but the whole thing took me less than 15 minutes by hand.
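If you would rather not repeat the requests by hand, the page-following step can be sketched in a few lines of shell. This is a rough sketch, not a polished tool. The function names, the GITHUB_TOKEN variable and the issues-N.json file names are my own placeholders. It assumes the next page is advertised in a Link response header with rel="next", and that the URLs contain no commas.

```shell
#!/bin/sh
# Print the URL marked rel="next" in the HTTP response headers read from
# stdin, or print nothing when there is no further page.
next_link() {
  tr -d '\r' \
    | grep -i '^link:' \
    | tr ',' '\n' \
    | sed -n 's/.*<\([^>]*\)>; *rel="next".*/\1/p' \
    | head -n 1
}

# Save every page of issues for user $1, repo $2 as issues-1.json,
# issues-2.json, ... GITHUB_TOKEN is a placeholder for your API token.
backup_issues() {
  url="https://api.github.com/repos/$1/$2/issues"
  page=1
  while [ -n "$url" ]; do
    # -D writes the response headers to a file, -o writes the JSON body.
    curl -sS -u "$GITHUB_TOKEN:x-oauth-basic" \
      -D headers.tmp -o "issues-$page.json" "$url"
    url=$(next_link < headers.tmp)
    page=$((page + 1))
  done
  rm -f headers.tmp
}
```

You would call it as backup_issues user repo. As far as I know the issues endpoint only returns open issues by default, so for a full backup you probably want to append ?state=all to the starting URL.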


Advanced CrashPlan backup strategy

Some lessons I’ve learned in my year and a half with CrashPlan. Please note, this is an advanced guide to CrashPlan. You have been warned. I assume you’re already familiar with CrashPlan and understand their backup sets, etc.

This post covers several topics (and I might update it later if I remember or discover more).

  • cron vs anacron
  • backup speed
  • file selection verification

Cron vs Anacron

Personally, I consider this a bug, and one that CrashPlan ought to have fixed a long time ago. When configuring “Verify selection every”, if you choose a number of days and a time of day (which is the default), your backup verification will only happen if your computer is on at the scheduled time, à la cron.

However, if you choose a number of hours < 24 and your computer is off at the scheduled time, the backup verification will run as soon as the computer is next switched on, à la anacron.

Bottom line, for your most important data, set the verification to run every 23 hours and accept that it’ll happen at inconvenient times of the day.

Backup speed

For a long time I felt like CrashPlan took forever to run backups. Eventually, more than a year into using the service, I decided to investigate. I found some excellent articles.

tl;dr Change the advanced settings. If you’re backing up compressed media, turn off compression. For backup sets that rarely change, change “Data de-duplication” to minimal.

I discovered this after I decided to add ~500GB of media on a USB disk to my backup sets. After making these changes, the backup took about 6 weeks instead of 3 months! I regularly saw upload speeds of >6Mbps on connections that would support it. (I was moving around a lot during the 6-week upload period!)
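To put those durations in perspective, here is a quick back-of-the-envelope calculation of the average upload rate they imply. It is only a sketch: the avg_mbps helper is a name I made up, and it assumes decimal units (1 GB = 8000 megabits), 6 weeks = 42 days and 3 months ≈ 90 days.

```shell
#!/bin/sh
# Average rate in Mbps needed to upload $1 gigabytes in $2 days.
# Assumes decimal units: 1 GB = 1000 MB = 8000 megabits.
avg_mbps() {
  awk -v gb="$1" -v days="$2" \
    'BEGIN { printf "%.2f\n", gb * 8 * 1000 / (days * 86400) }'
}

avg_mbps 500 42   # ~6 weeks:  prints 1.10
avg_mbps 500 90   # ~3 months: prints 0.51
```

So the 6-week upload averaged roughly 1.1Mbps around the clock, well below the >6Mbps peaks. That fits with the laptop spending a lot of time offline or on slower connections.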

File selection verification

This is an optimisation I’m only now figuring out, nearly 2 years into my CrashPlan adventure. If you’re backing up a large folder of very infrequently changing data, put it into its own backup set. For example, I back up ~500GB of audio and ebooks. I almost never add to the collection.

By putting this into a separate backup set from my photos, I can run a manual file verification of the photos without also waiting for the verification of thousands of book files which I know have not changed. My advice is the more backup sets the better.

Note that if you split one backup set into multiple smaller sets, you will lose the history, including any deleted files, previous versions, etc. Best to set this up from the beginning. But remember CrashPlan is a backup system, and it should not be confused with external storage.

Conclusion

CrashPlan’s Java app is horrible. It’s slow, ugly, and a PITA to use. If I could find a better alternative, I’d switch in a heartbeat; I have zero loyalty to CrashPlan. Having said that, as of my last research, CrashPlan is simply the only contender in the market. The defining characteristics for me are:

  • Client side encryption with key that is unknown to my backup provider.
  • Sensible pricing (unlimited space, 10 computer family plan for $150/yr).
  • Indefinite retention of external drive backups (Backblaze, for example, deletes these after 30 days, or after 6 months if your computer is off, even while you continue paying. Completely outrageous).
  • Cross-platform support. Even though I only actually use OS X, being able to also back up a Linux-based server is a necessity with a 10-machine plan.

Find me another service that has these features and I’m there. In the meantime, I continue to use CrashPlan and endure its peculiarities and shortcomings.

Rsync.net gets cheaper

My online backup service, rsync.net, has just dropped their prices. They’re now $1.20 per GB per month, unlimited bandwidth. Pretty reasonable I reckon. Plus they’ve added a couple of Windows clients to make things easier for poor souls not yet enlightened to the power of Linux. :-p

While others are talking doom and gloom it would seem rsync.net are on the up and up. Glad to be with them.

Goodbye Rsync.net, Hello Amazon S3

Update 27-Nov-2008: In the end I stayed with rsync.net.

Today I’ve decided I’m fed up with my current backup provider, rsync.net. The service they provide is pretty solid, and I’ve been using it for a few months now. The main reason I chose them at $1.80 per GB instead of Amazon at around $0.30 per GB is the support. They guarantee to have a real, live, intelligent engineer answer my questions. That’s worth more than a few bucks a month.

However, the service of late has been abysmal. As soon as my questions got beyond “How do I plug my computer in”, it took 5 days to get a response, and that only told me there’s a problem with their system which should be fixed soon. Another five days later, there’s still no response to my question “Will you tell me when it’s working?”.

Given that the support I thought I was getting is apparently a myth, it’s time to switch, I think. I also discovered that they won’t automatically expand my account. So if I need more space, I have to email them to ask for it. Bah.

Goodbye rsync.net, I’m afraid it’s been a little disappointing.