Wednesday, July 30, 2008

An optimal backup+mirroring system.

Backing up to something other than a hard drive has the disadvantage that you can't back everything up to the minute.
Backing up with RAID 1 or another form of mirroring has the disadvantage that it reduces the logical hard drive space to a fraction of the combined physical size.

You could get the best of both worlds by:
a) backing up new data to DVD-R or maybe DVD-RW periodically, for example once a day
b) mirroring what *isn't* yet backed up on other hard drive(s)

The filesystem would keep track of everything that's backed up and automatically mirror any sectors or files that are added or changed, until they're backed up. The mirrored data would exist as separate files, or as one large file, on the (partially) mirrored hard drive(s), so that it can grow and shrink without needing the entire hard drive or even a fixed-size partition on it.
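
As a rough illustration of the bookkeeping described above, here is a minimal in-memory sketch. The class and method names (`PendingMirror`, `on_write`, `on_backup_complete`) are made up for this example, and a real version would live in the filesystem and write to another drive rather than a dict.

```python
class PendingMirror:
    """Tracks data that is mirrored only because it is not yet backed up."""

    def __init__(self):
        # path -> latest content that has not reached backup media yet.
        # (Stand-in for the mirror file(s) on another hard drive.)
        self.mirror = {}

    def on_write(self, path, data):
        # Any added or changed file is mirrored immediately.
        self.mirror[path] = data

    def on_backup_complete(self, backed_up):
        # backed_up: path -> content that was actually written to the backup.
        # Only drop entries whose mirrored content matches what was backed up,
        # so a file that changed during the backup stays mirrored.
        for path, data in backed_up.items():
            if self.mirror.get(path) == data:
                del self.mirror[path]
```

After a daily backup, only the files changed since the backup snapshot was taken remain in the mirror, so the mirror stays small.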

If the mirroring works by sectors, then the one large file could have an allocation table of sectors. Since they're all the same size, when one is removed the last sector in the file can be copied into its location and the file shortened by one sector. For efficiency, if this is implemented at the file system level, the sectors being mirrored could sit squarely on sectors of the file system they're being mirrored on.

In that sector-aligned case you don't even have to copy a sector from the end when one is deleted: just change the FS's record of which sectors the file uses. But the kind of file system fragmentation that would cause might defeat the purpose.
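
The copy-the-last-sector-into-the-gap idea for the one large file can be sketched like this. It's only a model held in Python lists, with hypothetical names (`SectorMirrorFile`, `put`, `remove`), not a real on-disk format.

```python
class SectorMirrorFile:
    """Model of the one large mirror file holding fixed-size sectors."""

    def __init__(self):
        self.slots = []   # slot index -> sector data (the file's contents)
        self.owner = []   # slot index -> source sector number stored there
        self.table = {}   # allocation table: source sector number -> slot index

    def put(self, sector_no, data):
        # Overwrite in place if already mirrored, otherwise append at the end.
        if sector_no in self.table:
            self.slots[self.table[sector_no]] = data
        else:
            self.table[sector_no] = len(self.slots)
            self.slots.append(data)
            self.owner.append(sector_no)

    def remove(self, sector_no):
        # Called once the sector has been backed up.
        slot = self.table.pop(sector_no)
        last = len(self.slots) - 1
        if slot != last:
            # Copy the last sector into the freed slot and fix the table.
            self.slots[slot] = self.slots[last]
            moved = self.owner[last]
            self.owner[slot] = moved
            self.table[moved] = slot
        # Shrink the file by one sector.
        self.slots.pop()
        self.owner.pop()
```

Because every sector is the same size, removal costs one sector copy no matter how big the file is, and the file never has holes in it.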

If, instead of files or sectors, the mirroring works on arbitrarily small chunks of files, then the one large file could be a database, which should be periodically compressed.

Any hard drive should be usable as a partial mirror, whether it's on the same system or on any other system on the network, such as with Chiron FS.

Each system used as a mirror should also be able to have its own data mirrored, except of course for the data that's already acting as a mirror. That way, for example, if its hard drive crashes you won't have to reinstall the OS: just sync a new hard drive with a mirror (and/or DVD backup) and then install that drive.

I'm not sure how incremental DVD backups are usually handled, so that you don't have to go back 10 years and put in 100 different DVDs in series to reconstruct the original data, but I'm sure this question has already been handled. But if not, I have some ideas.

1) When possible, erase the entire DVD and rewrite it. Don't rewrite a DVD so many times that it stops working; have the software keep track of how many times each one has been rewritten. Also use multisession for incremental changes. Used in combination with erasing, a multisession DVD wouldn't have to be erased until incremental additions fill it up.

2) Have a limit on how many backup DVDs a given filesystem state can depend on. The software should know exactly what's on which DVD so it can automatically enforce this limit. When there are too many, it can start over from the beginning and back up to new DVDs or, better, erase the old DVDs and write over them. Or it can rewrite just enough to keep the number of backup DVDs below the limit.
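
The "rewrite just enough" variant of idea 2 might look something like the sketch below. It assumes the software keeps a catalog mapping each file to the disc holding its latest copy, and it ignores disc capacity entirely. The function name `consolidate` and the disc labels are invented for the example.

```python
from collections import Counter


def consolidate(catalog, limit, new_disc):
    """Return a new catalog that depends on at most `limit` discs.

    catalog: path -> id of the disc holding the latest copy of that file.
    Files on the least-used discs are re-backed-up onto `new_disc` until
    the number of distinct discs is within the limit.
    """
    catalog = dict(catalog)  # don't modify the caller's catalog
    while len(set(catalog.values())) > limit:
        # Count live files per old disc (the new disc is never a victim).
        live = Counter(d for d in catalog.values() if d != new_disc)
        if not live:
            break  # everything is already on new_disc
        # Retire the disc with the fewest live files (ties broken by id).
        victim = min(live, key=lambda d: (live[d], d))
        for path, disc in catalog.items():
            if disc == victim:
                catalog[path] = new_disc
    return catalog
```

For example, with four files spread over three discs and a limit of two, only the files on the two nearly empty discs get rewritten; the disc holding most of the data is left alone. The retired discs can then be erased and reused.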

--I've been informed that tape backup is better than DVD for large-scale servers. So replace DVD with tape backup in the above, and remove 1 and 2, unless DVD is still an economical solution at small scale.
