10 Oct 2002 22:35
Compressed filesystem (was Re: CryptoGraphic Disk.)
Todd Vierling <tv <at> pobox.com>
2002-10-10 20:35:07 GMT
2002-10-10 20:35:07 GMT
On Sun, 6 Oct 2002, Hubert Feyrer wrote: : Uhu. I understand writing esp. in the middle of files is interesting. : But for a start, a read-only filesystem that was "pre-compressed" would be : enough - what I have in mind is a CD that has more than 650MB. Warning: compression internals discussion ahead. Watch for falling bits. The biggest problem with any on-the-fly mechanism, even for reading, is seeking *backwards*. Most compression algorithms are highly directional (towards the end of the stream). Without some mechanism for handling backwards motion, any such seek would require a rewind() and restart of decompression from the beginning. That would be really ugly and obnoxiously slow for any file > 256KB or so. Now, this could be accomplished with gzip/zlib compression, provided you have some layer to intercept lseek(), through crafty finagling of zlib internals (by stashing the decompressor's state machine and Lempel-Ziv window periodically as "smart restore points"). That's memory intensive depending on the compression level (-z9 is a 32KB LZ window, I believe), and how your backoff algorithm chooses when to throw away saved state. The other alternative, which has been used in some compression schemes, is to partition the uncompressed blocks so that there's a "clean" state machine at known points throughout the stream. Typically this consists of independently compressed blocks of a fixed uncompressed size, with some sort of periodic index of the blocks (such as a N-block table between each N data blocks, or a pointer to the next block at the start of each, or similar). Obviously, the compression achieved here is far less than you'd get from a complete-stream gzip pass, but it doesn't have nearly the requirements of changing horses^W^Wrestarting gzip decompression in the middle of a stream. The former case is a candidate for a LD_PRELOAD overlay of libc, and would never belong in kernel space at all, though it has the highest degree of compatibility (given that it would handle plain gzipped files). The latter case lends itself much better to a layered filesystem, and could be kernelized depending on the memory requirements of the compression scheme used (LZO, for instance, is very low on memory consumption). Oh, I haven't even addressed writing here, yet. All this is read-only.-- -- -- Todd Vierling <tv <at> pobox.com>
RSS Feed