Before I forget, some comments on the Ancient source code...
Pack.z
Code:
uint8_t levelCounts[24];
levelCounts[maxLevel-1]+=2;
That looks buggy. The +=2 is there because the last entry can be in the range 2..100h, which does not fit into a uint8_t array entry. Situations where the 8bit overflow could occur:
1) 256 codes with 8bit codesize (ie. end code and 255 different bytes).
2) 1 code with 1bit codesize, plus 256 codes with 9bit codesize.
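To make the two cases concrete, here is a small sketch in Python (not the Ancient code itself). It assumes that levelCounts[n-1] is meant to hold the number of codes with n-bit codesize; the names tally_code_lengths and kraft_complete are made up for this example. Both cases are complete prefix codes, so a valid file could in principle contain them.

```python
from fractions import Fraction

def tally_code_lengths(lengths, max_len=24):
    """Count codes per bit length: exact counts, and counts wrapped to 8 bits."""
    exact = [0] * max_len
    for n in lengths:
        exact[n - 1] += 1
    wrapped = [c & 0xFF for c in exact]   # what a uint8_t counter would hold
    return exact, wrapped

def kraft_complete(lengths):
    """True if the code lengths describe a complete prefix code (Kraft sum = 1)."""
    return sum(Fraction(1, 2 ** n) for n in lengths) == 1

case1 = [8] * 256            # 256 codes with 8bit codesize
case2 = [1] + [9] * 256      # 1 code with 1bit, plus 256 codes with 9bit codesize
for case in (case1, case2):
    exact, wrapped = tally_code_lengths(case)
    print(kraft_complete(case), max(exact), wrapped[exact.index(max(exact))])
```

In both cases the exact count is 256 (100h) while the 8bit counter wraps to 0.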
TDCS and LZW5
Those are somewhat duplicated, they could just as well be merged into a single file. The only differences are:
1) adding +2 or +3 (in case 1 and 2)
2) end code is case=1/disp=0000h in LZW5, and case=3/disp=0000h in TDCS
3) whether or not using XOR for negating the disp values makes no difference at all
LOB
There is a name for that "something lz": it's ByteKiller, although with several minor changes:
1) with 2 leading uncompressed bytes
2) with disp+1, instead of disp+0
3) with forwards streams, instead of backwards
4) with readbyte, instead of readbits(8)
5) with 8bit lzlen+4, instead of lzlen+1
Code:
MSS (lzss style packer)
That's almost the same as TurboPacker (but with inverted flag bits for compressed/uncompressed bytes). Anyways, that might be coincidence since they are both standard LZSS variants.
ByteKiller
In case you want to also support that format, here's what I've found out about it.
Code:
ByteKiller is an old compression tool with Question-and-Answer based User
Interface (the ANC Cruncher variant is commandline based, and crashes when not
specifying src and dst filenames).
ByteKiller was made by Lord Blitter (various hacks were released by other
people, some of them based on other hacks from yet other people).
The exact same ByteKiller compressed bitstream format & checksum calculation is
used in various games and demos (often with customized headers/footers).
Standard ByteKiller Format
This format is used by the original ByteKiller tool (v1.2plus, v1.3, v2.0,
v2.03, v2.05, v3.0, v3.0win, v3.0b are all using the format; except, Pro 1.0 is
appending an extra ID word).
  000h 4    Compressed Size (Filesize-0Ch) (or Pro 1.0: Filesize-10h)  ;\
  004h 4    Uncompressed Size                                          ; Head
  008h 4    Compressed Checksum (all compressed 32bit words XORed)     ;/
  00Ch ..   Compressed Data (32bit words, processed backwards)         ;-Data
  ...  (-)  Nothing here ;<-- normal versions with [000h]=Filesize-0Ch ;\Foot
  ...  (4)  ID ("data")  ;<-- Pro 1.0 version with [000h]=Filesize-10h ;/
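As a rough illustration of the layout above, the following Python sketch splits a standard (or Pro 1.0) file into its parts and verifies the XOR checksum. The function name split_bytekiller_file is made up here, and the header fields are assumed to be big-endian like the data words (the table doesn't state that explicitly).

```python
import struct

def split_bytekiller_file(blob: bytes):
    """Return (data, uncompressed_size, checksum, variant) for the standard layout."""
    if len(blob) < 0x0C:
        raise ValueError("too short for a ByteKiller header")
    # Assumption: header fields are big-endian 32bit values.
    comp_size, uncomp_size, checksum = struct.unpack_from(">III", blob, 0)
    data = blob[0x0C:0x0C + comp_size]
    tail = blob[0x0C + comp_size:]
    if len(data) != comp_size or comp_size % 4:
        raise ValueError("bad compressed size")
    if tail not in (b"", b"data"):
        raise ValueError("size is neither Filesize-0Ch nor Filesize-10h")
    xor = 0
    for (word,) in struct.iter_unpack(">I", data):
        xor ^= word                       # all compressed 32bit words XORed
    if xor != checksum:
        raise ValueError("checksum mismatch")
    return data, uncomp_size, checksum, "pro 1.0" if tail else "standard"
```

The returned checksum is the value that the decompression routine takes as its initial chksum (it must end up as zero after XORing all words).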
Alternate ACE! Variant
  000h 4    ID ("ACE!")                                             ;\
  004h 4    Compressed Size+4 (Filesize-0Ch)                        ; Head
  008h 4    Uncompressed Size                                       ;
  00Ch 4    Compressed Checksum (all compressed 32bit words XORed)  ;/
  010h ..   Compressed Data (32bit words, processed backwards)      ;-Data
Alternate FVL0/MD10/MD11/PVC! Variants
  000h 4    ID ("FVL0", "MD10", "MD11", "PVC!")                            ;\
  004h (4)  Compressed Size+8 (Filesize-8)   ;-ID="FVL0" (ANC Cruncher)    ;
  004h (-)  Nothing here                     ;-ID="MD10" (McDisk mag 1-2)  ; Head
  004h (2)  Uncompressed Size XOR "MD"       ;\ID="MD11" (McDisk mag 3)    ;
  006h (2)  Uncompressed Size XOR "11"       ;/                            ;
  004h (4)  Uncompressed Size, as in footer  ;\ID="PVC!" (VideoTracker)    ;
  008h (4)  Compressed Size+14h (Fsize-0..2) ;/                            ;/
  ...  ..   Compressed Data (32bit words, processed backwards)             ;-Data
  ...  4    Compressed Checksum (all compressed 32bit words XORed)         ;\
  ...  4    Uncompressed Size                                              ; Foot
  ...  (-)  Nothing here                     ;-normal case                 ;
  ...  (..) Appended stuff (see below)       ;-ID="PVC!" (VideoTracker)    ;/
VideoTracker "PVC!" files may have following data appended:
Player\* --> appends 1-byte
Routine\*.rot --> appends 2-bytes, plus another PVC! file, also with 2-bytes
Video\* --> appends nothing
DecompressByteKiller(src,dst,src_len,dst_len,chksum):
  ;This is for the plain Data part (after stripping the headers/footers)
  src=src_start+src_len, dst=dst_start+dst_len
  src=src-4, collected=BigEndian32bit[src], chksum=chksum XOR collected
  if collected=0 then error   ;first word contains 0..31 bits, plus end flag
 @@decompress_lop:
  if dst<=dst_start then goto @@decompress_done
  if GetBits(1)=0 then
    if GetBits(1)=0 then rawlen=GetBits(3)+1, goto @@copy_raw       ;00
    else lzlen=2, disp=GetBits(8)                                   ;01
  else
    if GetBits(1)=0 then
      if GetBits(1)=0 then lzlen=3, disp=GetBits(9)                 ;100
      else lzlen=4, disp=GetBits(10)                                ;101
    else
      if GetBits(1)=0 then lzlen=GetBits(8)+1, disp=GetBits(12)     ;110
      else rawlen=GetBits(8)+9, goto @@copy_raw                     ;111
  if disp=0 then error
  for i=1 to lzlen, dst=dst-1, [dst]=[dst+disp]     ;copy lz
  goto @@decompress_lop
 ;---
 @@copy_raw:
  for i=1 to rawlen, dst=dst-1, [dst]=GetBits(8)    ;copy raw
  goto @@decompress_lop
 ;---
 @@decompress_done:
  if dst<>dst_start or src<>src_start or collected<>00000001h then error
  if chksum<>0 then error
  ret
 ;---
GetBits(n):
  val=0, for i=1 to n, val=val*2+GetBit     ;<-- LEFT shift (!)
  return val
 ;---
GetBit:
  collected=collected SHR 1                 ;<-- RIGHT shift (!)
  if collected=00000000h then
    src=src-4, x=BigEndian32bit[src], chksum=chksum XOR x
    collected=(x+100000000h) SHR 1
  return carry   ;carry-out from above SHR shift's
Decompression supports 12bit disp aka 1000h byte Dictionary size (although the
compressors may use less for faster compression; ByteKiller v1.3, v2.0 and Pro
v1.0 are only allowing to use max 800h bytes).
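For anyone who prefers something runnable, here is a direct Python port of the pseudocode above. It is only a sketch: the bounds checks (output overrun, displacement pointing beyond the output buffer, running out of source words) are my additions and not part of the pseudocode, and pack_bits is a made-up helper that merely builds tiny test streams by hand.

```python
def bytekiller_decompress(data: bytes, dst_len: int, chksum: int) -> bytes:
    """Decode the bare bitstream (headers/footers stripped), working backwards."""
    if len(data) < 4 or len(data) % 4:
        raise ValueError("data must be a non-empty multiple of 4 bytes")
    src = len(data) - 4
    collected = int.from_bytes(data[src:src + 4], "big")
    chksum ^= collected
    if collected == 0:
        raise ValueError("last word lacks the end flag")
    out = bytearray(dst_len)
    dst = dst_len

    def get_bit() -> int:
        nonlocal src, collected, chksum
        carry = collected & 1
        collected >>= 1                      # RIGHT shift
        if collected == 0:                   # only the end flag was left: reload
            if src == 0:
                raise ValueError("ran out of compressed data")
            src -= 4
            x = int.from_bytes(data[src:src + 4], "big")
            chksum ^= x
            carry = x & 1
            collected = (x + 0x100000000) >> 1
        return carry

    def get_bits(n: int) -> int:
        val = 0
        for _ in range(n):
            val = val * 2 + get_bit()        # LEFT shift, first bit is the MSB
        return val

    while dst > 0:
        rawlen = lzlen = disp = 0
        if get_bit() == 0:
            if get_bit() == 0:
                rawlen = get_bits(3) + 1             # 00
            else:
                lzlen, disp = 2, get_bits(8)         # 01
        elif get_bit() == 0:
            if get_bit() == 0:
                lzlen, disp = 3, get_bits(9)         # 100
            else:
                lzlen, disp = 4, get_bits(10)        # 101
        elif get_bit() == 0:
            lzlen = get_bits(8) + 1                  # 110
            disp = get_bits(12)
        else:
            rawlen = get_bits(8) + 9                 # 111
        if rawlen + lzlen > dst:
            raise ValueError("output overrun")
        if lzlen and (disp == 0 or dst - 1 + disp >= dst_len):
            raise ValueError("bad displacement")
        for _ in range(rawlen):
            dst -= 1
            out[dst] = get_bits(8)                   # copy raw
        for _ in range(lzlen):
            dst -= 1
            out[dst] = out[dst + disp]               # copy lz
    if src != 0 or collected != 1 or chksum != 0:
        raise ValueError("trailing data or checksum mismatch")
    return bytes(out)

def pack_bits(bits: str) -> bytes:
    """Hypothetical test helper: pack bits given in the order the decoder reads them."""
    r = len(bits) % 32
    words = [int(bits[:r][::-1] or "0", 2) | (1 << r)]   # partial word + end flag
    for i in range(r, len(bits), 32):
        words.append(int(bits[i:i + 32][::-1], 2))
    return b"".join(w.to_bytes(4, "big") for w in reversed(words))
```

A hand-made stream such as pack_bits("00" "001" "01000010" "01000001" "101" "0000000010") encodes two raw bytes followed by a 4-byte lz copy with disp=2; with dst_len=6 and chksum set to the XOR of its two words, it should unpack to "ABABAB".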
IDs for ByteKiller based formats
  ID            Offset Used in
  Size-0Ch      0      Packer: ByteKiller 1.2+, 1.3, 2.0, 2.0x, 3.0
  "ACE!"        0      Mag: Resident #1
  "ARP3"        0      Tool: Action Replay III freeze file, Game demo: RedZone
  "ARPF"        0      Tool: Action Replay II freeze file, Game demo: WarZone
  "CRND"        0      X-Out, Killing Game Show
  "CRUN"        0      Dominium
  "DAVE"        0      Myth - History in the Making
  "FVL0"        0      Packer: ANC Cruncher (reportedly=FLV0, actually=FVL0)
  "FUCK";"MARC" 0;EOF  Sensible Soccer V1.1, Alien World, Yogi's Big Clean Up
  "GR20"        0      Games created with GRAC V2.00
  "MD10"        0      Mag: McDisk #1, McDisk #2
  "MD11"        0      Mag: McDisk #3
  "PVC!"        0      Tool: VideoTracker
  "xVdg"        0      AMOS: Used in compiled AMOS programs
  "xVgd"        0      AMOS: AMOS Updater V1.36, EasyAMOS installation disks
  "FLA0"        EOF    X-it
  "JEK!"        EOF    AtariST: Jek Packer V1.2(d) - V1.3, JamPacker V1
  "JEK!"        EOF    The seven gates of Jambala (other hdr than Jek Packer)
  "MARC"        EOF    Mercenary III, Flimbo's Quest
  "OND<"        EOF    Shadow of the Beast ">DATA COMPACTION (C)1989 A.R.BOND<"
  "TXIC"        EOF    Chariots Of Wrath, Striker Number 9, Fighter Command
  "data"        EOF    Packer: ByteKiller Pro 1.0
Wishlist: I need more test files with hacked IDs.
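The ID table could be turned into a quick first-pass detector along these lines. This is only a sketch: sniff_bytekiller is a made-up name, "EOF" is assumed to mean the last 4 bytes of the file (which won't hold for files with extra bytes appended, like some PVC! files), and a hit says nothing about the header layout, which still differs per ID.

```python
START_IDS = {b"ACE!", b"ARP3", b"ARPF", b"CRND", b"CRUN", b"DAVE", b"FVL0",
             b"FUCK", b"GR20", b"MD10", b"MD11", b"PVC!", b"xVdg", b"xVgd"}
END_IDS = {b"FLA0", b"JEK!", b"MARC", b"OND<", b"TXIC", b"data"}

def sniff_bytekiller(blob: bytes) -> list:
    """Return a list of hints that blob may be a ByteKiller-based file."""
    hits = []
    # Headerless standard format: first word is Filesize-0Ch (big-endian assumed).
    if len(blob) >= 0x0C and int.from_bytes(blob[:4], "big") == len(blob) - 0x0C:
        hits.append("standard")
    if blob[:4] in START_IDS:
        hits.append("start:" + blob[:4].decode("ascii"))
    if len(blob) >= 8 and blob[-4:] in END_IDS:
        hits.append("end:" + blob[-4:].decode("ascii"))
    return hits
```

For example, a file starting with "FUCK" and ending with "MARC" would report both hints, matching the 0;EOF row in the table.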
LZX
Code:
There is no good spec on the LZX header content -> lots of unknowns here
The format is more or less well documented (at the end of the "unlzx.c" source code).
Here is what I've extracted from it:
Code:
LZX Archive Format (1995-1997)
LZX archives can contain single files (separately compressed) and merged files
(compressed together). For example,
  Archive Header
  File List entry for 1st file, with Compressed Size=0   ;\
  File List entry for 2nd file, with Compressed Size=0   ; Merged (Flags=1)
  File List entry for 3rd file, with Compressed Size<>0  ;
  Compressed data for 1st-3rd file                       ;/
  File List entry for 4th file, with Compressed Size=0   ;\
  File List entry for 5th file, with Compressed Size<>0  ; Merged (Flags=1)
  Compressed data for 4th-5th file                       ;/
  File List entry for 6th file, with Compressed Size<>0  ;\Single (Flags=0)
  Compressed data for 6th file                           ;/
  File List entry for 7th file, with Compressed Size<>0  ;\Single (Flags=0)
  Compressed data for 7th file                           ;/
Archive Header:
000h 3 ID ("LZX")
003h 1 Flags (00h) (bit0=DamageProtect, bit1=Locked)
004h 1 Unknown (00h) (or 0Ch)
005h 1 Unknown (00h)
006h 1 Unknown (0Ah) ;maybe version? or hdrsize? or Amiga?
007h 1 Unknown (00h) (or 04h)
008h 1 Unknown (00h)
009h 1 Unknown (00h)
File List entries:
000h 1 Attr (0Fh) (bit0-7=Read,Write,Delete,Exec,Archive,Hold,Script,Pure)
001h 1 Unused (00h)
002h 4 Uncompressed Size ;LittleEndian...
006h 4 Compressed Size (or 0 for non-last merged files)
00Ah 1 Machine Type (0=MSDOS, 1=Windows, 2=OS2, 0Ah=Amiga, 14h=Unix)
00Bh 1 Method (02h) (or 00h) (0=Stored, 2=LzxCompressed, 20h=EOF=???)
00Ch 1 Flags (00h) (or 01h) (0=Single, 1=Merged)
00Dh 1 Unused (00h)
00Eh 1 Comment Length (00h) (for Amiga: 0..79)
00Fh 1 Version needed to extract (0Ah)
010h 1 Unused (00h)
011h 1 Unused (00h)
012h 4 Timestamp (similar to MSDOS format, but 6bit year and 6bit second)
016h 4 CRC32 on Uncompressed Data
01Ah 4 CRC32 on File List entry & name/comment (with this field zeroed)
01Eh 1 Filename Length
01Fh .. Filename ("path/filename")
... .. Comment (if any)
... .. Compressed Data (if any)
Unknown if there are any fields for UID/GID.
Unknown what Method=20h=EOF means (it's never used, neither 20h nor N+20h).
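The header layout above is enough to walk an archive and group merged files without touching the LZX bitstream itself. The Python sketch below does that under a few assumptions: all multi-byte fields are little-endian, the header CRC is the ordinary CRC32 as computed by zlib.crc32, and a single file may have Compressed Size=0 when it is empty. The name parse_lzx and the dictionary keys are made up for the example.

```python
import struct
import zlib

# Attr, (unused), Uncompressed Size, Compressed Size, Machine, Method, Flags,
# (unused), Comment Length, Version, (2 unused), Timestamp, Data CRC,
# Header CRC, Filename Length = 1Fh bytes
ENTRY = struct.Struct("<BxIIBBBxBBxxIIIB")

def parse_lzx(blob: bytes):
    """Return a list of (entries, compressed_data) groups; merged files share a group."""
    if blob[:3] != b"LZX":
        raise ValueError("not an LZX archive")
    pos, pending, groups = 10, [], []        # skip the 10-byte archive header
    while pos < len(blob):
        (attr, usize, csize, machine, method, flags, comment_len,
         version, timestamp, data_crc, hdr_crc, name_len) = ENTRY.unpack_from(blob, pos)
        end = pos + ENTRY.size + name_len + comment_len
        raw = bytearray(blob[pos:end])
        raw[0x1A:0x1E] = bytes(4)            # header CRC is computed with this field zeroed
        name = bytes(raw[ENTRY.size:ENTRY.size + name_len]).decode("latin-1")
        pending.append({"name": name, "usize": usize, "method": method,
                        "merged": bool(flags & 1),
                        "hdr_ok": zlib.crc32(raw) == hdr_crc})
        pos = end
        if csize or not (flags & 1):         # last file of a group, or a single file
            groups.append((pending, blob[pos:pos + csize]))
            pending = []
            pos += csize
    if pending:
        raise ValueError("merged group without compressed data")
    return groups
```

Each group then carries the file list entries in order plus the one block of compressed data they share, which mirrors the Merged/Single example layout shown earlier.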
LZX Bug
The 68020 and 68040 versions are occasionally creating corrupted LZX archives.
As workaround, always use the 68000 version for compression. In particular, LZX
seems to store garbage in the last 2-3 bytes of some files:
http://telparia.com/fileFormatSamples/archive/lzx/CXHandler3.8.LZX - corrupt
http://aminet.net/package/util/cdity/CXHandlerV38 - intact copy of above files
Apart from the Unknown/Unused fields, the biggest unknown is that Method=20h=EOF thing.
LHLB (lh.library)
Code:
different distance/count logic
Yeah, but not so different.
The length/count logic is exactly the same. Except that the datalen tree contains two extra length codes with len=1 and len=2 (these are probably unused dummy codes, to avoid having to add +2 to the length values) (or maybe len=2 is actually used, but len=1 is certainly useless).
The different distance logic just gives the exact same results as in LZHUF; you could just as well use the normal LZHUF logic, instead of the huge distanceHighBits[256] table.