15 September 2021, 16:17 | #21 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,491
|
|
15 September 2021, 19:53 | #22 |
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,655
|
Nice thread, getting the itch as always, a few thoughts.
Leonard: Floppy speed is a 'decent' reference but also relative, I've apparently used 28936 b/s as definition (don't remember the calculation but it includes MFM decoding.) However if performance is desired, it's not good to settle for floppy speed, because then you have little time for 'action' (Tai-Pan/Phalanx definition ) For such needs I would put 'good enough' at twice floppy speed at least or 58K/s. Sometimes performance isn't a big deal (example: onefiler-on-floppy) and then any decompression speed is good, slower than floppy speed could even save buffers if you risk it. So this is why I think floppy speed is a decent reference but not necessarily the goal of a competitive decruncher. This was the reasoning behind creating Nibbler (new algorithm). a/b: Like the initiative but Shrinkler is at 0 bytes/s? If you could check the axis, feel free to place Nibbler somewhere. My chart has quite few data points and was measured before all these legacy algorithms were ported and explored. (Though old, they can reach great ratios if run exhaustively, and the same is true if some features are removed to improve decompression speed, so the fastest versions of them should not be discounted but run a million times to make the most of them with modern tech 35y later.) It would be better with more datapoints and categorized by type of content (sorry, was too lazy to add all at the time), because algorithm and setting can affect ratio depending on it. Often I see this "ratio!" with no concern for the type of content. There is nothing that says you shouldn't use multiple crunchers in a single release, or over separate releases, but there's some desire there to just "make it smaller and never change my tools". It hasn't been possible yet, and maybe there's a lesson there to keep exploring I have a burning desire to finish my improvements to Nibbler, but the stats-running+analysis is very time-consuming, and I must finish previous obligations first. 
Last edited by Photon; 15 September 2021 at 20:03. |
15 September 2021, 20:18 | #23 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,062
|
It's not my image; I just found it elsewhere and decided to post it because it looked interesting and relevant, showing zx0's position compared to some of the other known algorithms.
BTW, the horizontal axis is inverted: the right side is copy speed (ldir=1), and the left side is 25x+ the copy time, which is where Shrinkler resides. |
15 September 2021, 21:02 | #24 |
Registered User
Join Date: May 2013
Location: Grimstad / Norway
Posts: 852
|
|
15 September 2021, 22:39 | #25 |
Registered User
Join Date: Apr 2013
Location: paris
Posts: 133
|
zx0 has really nice properties. I took the time to pack my AmigAtari demo with it, as this is the most challenging data to fit on a single floppy. Although zx0 did a very good job, it didn't succeed in making AmigAtari fit on the disk. Here is the original arjm7 AmigAtari version:
Code:
boot.bin 310 310 (100%) [---] Off:$000000 (00/0/01:$000) (user arg=0)
dirkernel.tmp 9080 6720 ( 74%) [AR4] Off:$000136 (00/0/01:$136) (user arg=0)
logo_fade.bin 131160 27284 ( 20%) [AR7] Off:$001b76 (00/1/03:$176) (user arg=0)(C:128KiB F: 1KiB)
main.bin 331380 122194 ( 36%) [AR7] Off:$00860a (03/0/02:$00a) (user arg=0)(C:422KiB F:234KiB)
ym7Pack0.bin 18672 12402 ( 66%) [AR7] Off:$02635c (13/1/09:$15c) (user arg=0)
ym7Pack1.bin 202162 113136 ( 55%) [AR7] Off:$0293ce (14/1/11:$1ce) (user arg=0)
ym7Pack2.bin 200448 112806 ( 56%) [AR7] Off:$044dbe (25/0/01:$1be) (user arg=0)
ym7Pack3.bin 199646 107798 ( 53%) [AR7] Off:$060664 (35/0/02:$064) (user arg=0)
ym7Pack4.bin 203764 106350 ( 52%) [AR7] Off:$07ab7a (44/1/03:$17a) (user arg=0)
ym7Pack5.bin 174760 91400 ( 52%) [AR7] Off:$094ae8 (54/0/02:$0e8) (user arg=0)
ym7Pack6.bin 128800 58362 ( 45%) [AR7] Off:$0aaff0 (62/0/04:$1f0) (user arg=0)
CosoPackLz4.bin 218912 142048 ( 64%) [AR7] Off:$0b93ea (67/0/08:$1ea) (user arg=0)
----------------------------------------------------------------
Saving AmigAtari.adf:
Disk contains 12 files, packing ratio: 49%
1777KiB packed to 880KiB ( 1819094 to 900810 bytes )
1KiB left ( 310 bytes )
Code:
boot.bin 310 310 (100%) [---] Off:$000000 (00/0/01:$000) (user arg=0)
dirkernel.tmp 9180 6800 ( 74%) [AR4] Off:$000136 (00/0/01:$136) (user arg=0)
logo_fade.bin 131160 27364 ( 20%) [AR7] Off:$001bc6 (00/1/03:$1c6) (user arg=0)(C:128KiB F: 1KiB)
main.bin 331380 123368 ( 37%) [AR7] Off:$0086aa (03/0/02:$0aa) (user arg=0)(C:422KiB F:234KiB)
ym7Pack0.bin 18672 14052 ( 75%) [AR7] Off:$026892 (14/0/01:$092) (user arg=0)
ym7Pack1.bin 202162 129864 ( 64%) [AR7] Off:$029f76 (15/0/06:$176) (user arg=0)
ym7Pack2.bin 200448 125772 ( 62%) [AR7] Off:$049abe (26/1/07:$0be) (user arg=0)
ym7Pack3.bin 199646 130882 ( 65%) [AR7] Off:$06860a (37/1/11:$00a) (user arg=0)
ym7Pack4.bin 203764 125114 ( 61%) [AR7] Off:$08854c (49/1/02:$14c) (user arg=0)
ym7Pack5.bin 174760 108342 ( 61%) [AR7] Off:$0a6e06 (60/1/05:$006) (user arg=0)
ym7Pack6.bin 128800 66602 ( 51%) [AR7] Off:$0c153c (70/0/07:$13c) (user arg=0)
CosoPackLz4.bin 218912 160446 ( 73%) [AR7] Off:$0d1966 (76/0/05:$166) (user arg=0)
ERROR: Don't fit on the disk.

I'm still looking for another packer that could fit AmigAtari on a floppy.... |
15 September 2021, 22:48 | #26 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,491
|
|
15 September 2021, 23:12 | #27 |
Registered User
Join Date: Apr 2013
Location: paris
Posts: 133
|
That's not that simple... one of the key features of zx0 is that there is no entropy coding, so it can brute-force all combinations of literal runs and length/offset pairs (and it also supports runs of literals instead of the 9 bits per literal of standard LZxx).
If you add entropy coding, it becomes extremely hard to brute-force the search space. zx0 is a really powerful packer for small files and small platforms. All the energy is spent at the compression stage. Such a good packing ratio for such a simple depacker is beautiful. |
15 September 2021, 23:33 | #28 | ||
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,655
|
Quote:
The image is bad because I see an axis starting at 0 bytes per "frame" and ending at 2500 bytes per "frame", and 0 bytes per anything is 0 bytes per second. Please don't put Nibbler on this image; I've already stressed the importance of the type of content before strictly committing to a single cruncher, if ever. Quote:
Again, as per my "loading scheme" paragraphs, much can be done for presentation by staging the decompression and using the right tool for each stage. |
||
15 September 2021, 23:41 | #29 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,491
|
|
15 September 2021, 23:55 | #30 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,039
|
Easy: use LZMA only for the CosoPack file. Add a nice jingle while that file spends a few minutes depacking, and use another packer for the rest of the files. Or use LZMA only for the packed main (big) file (like on disk 1 of BC Kid) and add a jingle with nice music and the text "Please wait, loading and depacking." People then only have to wait at the beginning; the rest of the files will be depacked quickly.
|
16 September 2021, 08:00 | #31 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
|
Personally, I would reconsider using your own encoded soundchip output instead of the original music code and data. For example, COSO/TFMX songs are quite small, and you could reuse the player code where appropriate.
|
16 September 2021, 10:43 | #32 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,062
|
@Thcm optimize can be a bit faster with this:
Code:
int elias_gamma_bits(int value)
{
#if defined __GNUC__
    /* written this way to cancel out a xor inside __builtin_clz() */
    return 1 + ((__builtin_clz(value) ^ (sizeof(int) * 8 - 1)) << 1);
#elif defined _MSC_VER
    unsigned long bits;                /* might need <intrin.h> */
    _BitScanReverse(&bits, value);
    return 1 + ((int)bits << 1);
#else
    int bits = 1;
    while (value > 1) { bits += 2; value >>= 1; }
    return bits;
#endif
} |
16 September 2021, 11:26 | #33 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,039
|
Or you can depack the LZ4-packed files and run zx0 directly on them. Some packers don't like packing already-packed files, and LZ4 seems like an average packer to me. A second option is to split the big files into smaller parts, like 30KB or 60KB, and check whether zx0 packs those better. Anyway, double-packed files are never a good option for me: a good packer must always pack the original (unpacked) file better than one already packed with another packer.
|
16 September 2021, 14:02 | #34 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,491
|
I see 3 possible improvements/optimizations that can be applied to the ZX0 68k depacker (apart from the obvious micro-optimizations), naturally by modifying the bitcoding structure.
One that I used in nrv2x, one that I used in aplibx, and one that concerns the structure of the raw stream itself (it has a really cool feature: if I'm not wrong, it's an 'even' encoding!*).

Also, the 'end token' would be better moved into the literal-run stream decoding, for two reasons: it probably allows a less invasive check (at the cost of a few more end bits, but who cares) and it can solve the >64k literal run, which is plain wrong in the current decoder.

I think we can gain a decent amount of decompression speed compared to the available code (which, to tell the truth, I haven't tried yet ). Tonight I'll work on it a bit.

*probably in the 68k case an 'odd' encoding is better; this means that the 'startup' byte should be different (and a single bit gained) |
16 September 2021, 14:03 | #35 |
Registered User
Join Date: Apr 2013
Location: paris
Posts: 133
|
Oh, of course that would make the data a lot smaller! But that's a totally different issue; I just used the AmigAtari demo as a size benchmark. I could use the "De Profundis" demo too, which contains a lot of Amiga demo data (but packing 2 disks with zx0 could take hours).
|
16 September 2021, 15:37 | #36 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,062
|
What I changed in nrv2b: I inverted all the match offset bits, and the rest was minor patching of bits and bytes at the end to reduce overrun and size a tiny bit. However, I didn't have any interest in in-place depacking, and I was happy with the depacker's size and speed, so I didn't mess with the bitcode any further.
I see a not.b d0 (offset hi-byte) in the depacker, plus an lsl.w #8, so there is definitely potential there :P. |
16 September 2021, 16:04 | #37 | ||
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,491
|
Well, actually there is a big interest in in-place depacking (backward or forward ones).
For one simple reason: many coders don't want to know anything about offsets, safety bytes, pre-set buffers and arcane requirements. They want to allocate the original memory and tell the unpacker to do the job like a blackbox. Quote:
(and I'm not talking about micro-optimizations, which everyone can do). Quote:
|
||
16 September 2021, 20:44 | #38 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,491
|
Ok, new provisional bitcode defined (EDIT: AZX0? name to be decided ).
The unpacker code is 120 bytes with no subroutines (it's unrolled as much as possible) and contains some accelerators; it fully supports >16-bit offsets and protects against literal runs >64k. It does not yet contain the surrounding code for in-place decompression and some details.

So far it works only in my head, because I haven't even encoded a synthetic stream to try it on (so for sure it will have to be fixed here and there ). Fortunately the stream is really simple, so it didn't take me that long; I think I used all the tricks I could (not many, because the code is so small..).

If I haven't made huge conceptual errors (which could well be), I'll try to generate the new bitcode shortly. I expect good speed.

Last edited by ross; 16 September 2021 at 20:53. |
17 September 2021, 17:47 | #39 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,491
|
More code unrolled, more accelerated paths; now I'm near 150 bytes.
Maybe time to stop, as I'm at the short-branch limit. And I haven't tried it with a synthetic stream yet, so nothing might work.

I now use the original bitcode much more, so the conversion should be simplified. |
22 September 2021, 17:27 | #40 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,491
|
Ok, now I have a working packer and unpacker for the 'new' ZX0 stream, optimized and friendly to the 68k.
I've tested it with "cobra.scr" mentioned on GitHub (a little file of 6912 bytes, packed to 2294 bytes).

Impressive results (they went beyond my wildest expectations ):
- original stream and original 68k decoder: 387600 cycles
- 'new' stream and my optimized decoder: 275700 cycles

This is a whopping 40% speed increase! Those who know decompressors know it is a remarkable achievement. I would also have done an absolute speed comparison with other (de)packers, but as long as there is no support for large offsets it makes no sense. |