View Single Post
Old 03 December 2004, 19:22   #5
Photon
Moderator
 
Photon's Avatar
 
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,602
Quote:
Originally Posted by girv
@Photon: what algorithm do you use? Just curious - data compression/encryption is a pet interest of mine (sad but true )
It's basically a block/pattern recognizer. Most such crunchers can't compete with LZW type algorithms but this one has special optimizations. It scans the whole file first to find how frequent each byte value is. Then it uses 12 special tokens to denote a crunched sequence of bytes. These tokens work like semi-intelligent code, decrunching its own pattern or block. The "intelligence" is applied when crunching, that's why it's almost as slow as the Titanics cruncher (right now at least).

By 'special optimizations' I mean, when I decided what these tokens would be, I took stats of files and studied them. I had to throw out a few tokens that I thought "these tokens will gain lots of bytes for sure!", when it turned out they didn't! The stats also told me how to balance the "bytes gained/bytes used in output file" ratio.

Right now the cruncher uses a scan range (for patterns) of up to 2048 bytes. Maybe a few bytes could be shaved off if >4-byte patterns are found far away, in that case it's easy to add a token. But I would have to implement the pointertable speedup first, since scanning a 350K file with a range of 65536 bytes takes *forever*!

I think the reason I get better results than other block/pattern crunchers is because I took those stats and used them to save BITS too, not just bytes. The bits are of course saved in the output of the crunched file, I don't go into the bit level when I'm reading the file!


I would love to get some comparisons of crunched file ratios, since I can't run all crunchers on my setup! It's made to auto-decrunch executables (like Imploder), not data files, so it would probably not be as good for crunching .IFF files and the like, compared to LHA, for example. But I haven't compared many data files yet, I'm coding on my demo now.

Last edited by Photon; 03 December 2004 at 19:24. Reason: speling eror
Photon is offline  
 
Page generated in 0.07948 seconds with 11 queries