December 1, 2010:

I've been learning new things about POSIX filesystem reliability guarantees (or lack thereof) in the event of an unclean shutdown. Fun.

This isn't as bad as it sounds - I plugged my new terabyte drive into my box last night. I'm taking advantage of the opportunity to switch away from reiserfs (primarily because I really don't enjoy seeing a murderer's egowank every time I type 'mount', thankyouverymuch; secondarily, because the increased complexity adds more room for bugs; tertiarily, because last I checked (this may have been fixed) the default Debian GRUB configuration was unbootable on reiserfs root drives after an unclean shutdown), and so I wanted to figure out what filesystem would be best.

Unsurprisingly, that's quite the flamewarbait.

Things I've discovered:
Data from a partition 150 gigs or so into the disc can be read almost twice as fast as data from a partition near the end of the disc. (I haven't bothered testing the very beginning, because at the moment that's occupied by a devtest workstation install that I don't want to wipe - ultimately, I intend to try to move the old install onto my old disc, but for now I'd rather have two copies of my main system and one copy of the workstation than one copy of each.)

fd = open("",O_WRONLY|O_CREAT);
is wrong. And not just because I left out all the error checking for clarity and forgot O_EXCL - POSIX provides no guarantee as to the order of operations ON DISK, so a crash that happens before the disk is completely synced can wind up leaving "file" with neither its old contents NOR the new contents. Sigh.
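
For what it's worth, the workaround I keep seeing recommended is: write the new contents to a scratch file, fsync it, rename it over the real file, then fsync the directory. Here's a rough sketch of that, not a guarantee. The function name, the ".tmp" suffix and the 0644 mode are all made up for illustration, and it still relies on the filesystem and the drive actually honouring fsync.

```c
#include <errno.h>
#include <fcntl.h>
#include <libgen.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: replace `path` with `len` bytes from `buf`, aiming to
 * leave either the complete old contents or the complete new contents on
 * disk after a crash. Returns 0 on success, -1 on error. */
int replace_file_atomically(const char *path, const void *buf, size_t len)
{
    char tmp[4096];
    char dir[4096];
    int fd, dfd;

    /* Assumption: "<path>.tmp" is free for us to use as scratch space. */
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* write() may write less than asked, so loop until everything is out. */
    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t written = write(fd, p, left);
        if (written < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        p += written;
        left -= (size_t)written;
    }

    /* Push the new data to disk BEFORE the rename makes it visible. */
    if (fsync(fd) < 0)
        goto fail;
    if (close(fd) < 0) {
        fd = -1;
        goto fail;
    }
    fd = -1;

    /* Swap the new file into place under the real name. */
    if (rename(tmp, path) < 0)
        goto fail;

    /* Best effort: sync the containing directory so the rename itself
     * reaches the disk. Errors here are ignored in this sketch. */
    strcpy(dir, tmp); /* same size buffer, so this fits */
    dfd = open(dirname(dir), O_RDONLY);
    if (dfd >= 0) {
        fsync(dfd);
        close(dfd);
    }
    return 0;

fail:
    if (fd >= 0)
        close(fd);
    unlink(tmp);
    return -1;
}
```

Calling it looks like replace_file_atomically("notes.txt", data, data_len), where the filename and variables are again just placeholders. The ordering is what matters: the data is synced before the rename, so the name should never point at a half-written file.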

I also did an assortment of benchmarking tests on various filesystems, which consisted of untarring and then rm -rfing kernel source on them. It's not all that scientific - I didn't take steps to flush either OS-level or hardware-level caches between tests, and I wasn't working on all that large a dataset. Integer figures were obtained using date;(command);date; floating point figures were obtained using time (command); each command was usually tested thrice, starting with a newly-created filesystem on sdb14. Since I was benchmarking anyhow, I decided to also test filesystems that I'd ruled out for one reason or another.


Filesystem    tar xfj /usr/src/linux-    rm -rf linux-
bfs           could not mount new fs
btrfs         16, 16, 17                 1, 1, 1
ext2          13, 13.3, 13.6             0.4, 0.2, 0.3
ext3          13, 15, 13                 <1, <1, 1
ext4          15, 13, 13                 1, <1, <1
jfs           32, 28, 29                 10, 6, 6
jffs2         29, 28, 27                 6, 8, 5
nilfs         18, 17, 27                 1, <1, <1
ntfs          18, 18, 18                 1, 1, 1
reiserfs      14.3, 16.4, 17.2           0.9, 0.9, 0.9
vfat (sdb9)   16.8, 14.5, 15.0           2.4, 12.5, 0.5
xfs           48                         >50 *
zfs (sdb9)    26.2, 26.1, 26.1           7.7, 7.3, 6.7

So: the differences between ext[234] are well within my margin of error.

Reiserfs is lagging slightly but could be within the margin of error. My benchmark does, at least, make me feel confident that moving to ext[34] won't give me a major performance hit.

The poor performance of xfs and jfs on my tests surprises me, given that they seem to be intended as high-performance filesystems; the rm in xfs was interrupted partway through, since I didn't feel like waiting for it, and the reports of data corruption after power failures in xfs were enough that I didn't intend to use it for anything unless it was drastically faster than other filesystems.

NTFS surprised me twice - first by offering the most consistent timings of any of the filesystems, and second by lagging noticeably when I unmounted the filesystem. Given the bizarre behaviour of ntfs systems when mounted from rescue CDs (I've frequently seen the filesystem itself apparently lock up, and found deleted files still appear to be present after a reboot), it wouldn't be my choice even if it appeared to outperform everything else by a factor of ten, though.

bfs just plain refused to mount a new filesystem. (I suspect, given the 512 inode maximum, that it probably couldn't handle the 1 gig volume I tried to put it on.)

UDF's slowness was not entirely surprising - I've mounted udf disc images in the past, and never been happy with them.

I'm not entirely sure that I got the zfs pool set up properly, and it's on a different partition than the other benchmarks. However, the partition I put it on is nearer the fast end of the disc than the one I've been benchmarking everything else on, so the location shouldn't be what's slowing it down - and being confused by the filesystem setup is probably a sign that I shouldn't be using it.

If I wrote a benchmark that better simulated a real load, I'd want to revisit btrfs, nilfs, and the various exts and see how they performed; as it is, I suspect that I'll wind up using ext4.

