15 September 2021, 16:17 | #21 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,491
|
|
15 September 2021, 19:53 | #22 |
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,655
|
Nice thread, getting the itch as always, a few thoughts.
Leonard: Floppy speed is a 'decent' reference but also relative, I've apparently used 28936 b/s as definition (don't remember the calculation but it includes MFM decoding.) However if performance is desired, it's not good to settle for floppy speed, because then you have little time for 'action' (Tai-Pan/Phalanx definition ) For such needs I would put 'good enough' at twice floppy speed at least or 58K/s. Sometimes performance isn't a big deal (example: onefiler-on-floppy) and then any decompression speed is good, slower than floppy speed could even save buffers if you risk it. So this is why I think floppy speed is a decent reference but not necessarily the goal of a competitive decruncher. This was the reasoning behind creating Nibbler (new algorithm). a/b: Like the initiative but Shrinkler is at 0 bytes/s? If you could check the axis, feel free to place Nibbler somewhere. My chart has quite few data points and was measured before all these legacy algorithms were ported and explored. (Though old, they can reach great ratios if run exhaustively, and the same is true if some features are removed to improve decompression speed, so the fastest versions of them should not be discounted but run a million times to make the most of them with modern tech 35y later.) It would be better with more datapoints and categorized by type of content (sorry, was too lazy to add all at the time), because algorithm and setting can affect ratio depending on it. Often I see this "ratio!" with no concern for the type of content. There is nothing that says you shouldn't use multiple crunchers in a single release, or over separate releases, but there's some desire there to just "make it smaller and never change my tools". It hasn't been possible yet, and maybe there's a lesson there to keep exploring I have a burning desire to finish my improvements to Nibbler, but the stats-running+analysis is very time-consuming, and I must finish previous obligations first. 
Last edited by Photon; 15 September 2021 at 20:03. |
15 September 2021, 20:18 | #23 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,062
|
It's not my image; I just found it elsewhere and decided to post it because it looked interesting and relevant, showing zx0's position compared to some of the other known algorithms.
BTW, the horizontal axis is inverted: the right side is copy speed (ldir=1), and the left side is 25x+ the copy time, which is where Shrinkler resides. |
15 September 2021, 21:02 | #24 |
Registered User
Join Date: May 2013
Location: Grimstad / Norway
Posts: 852
|
|
15 September 2021, 22:39 | #25 |
Registered User
Join Date: Apr 2013
Location: paris
Posts: 133
|
zx0 has really nice properties. I took the time to pack my AmigAtari demo with it, as this is the most challenging data to fit on a single floppy. Although zx0 did a very good job, it didn't succeed in making AmigAtari fit on the disk. Here is the original arjm7 AmigAtari version:
Code:
boot.bin 310 310 (100%) [---] Off:$000000 (00/0/01:$000) (user arg=0)
dirkernel.tmp 9080 6720 ( 74%) [AR4] Off:$000136 (00/0/01:$136) (user arg=0)
logo_fade.bin 131160 27284 ( 20%) [AR7] Off:$001b76 (00/1/03:$176) (user arg=0)(C:128KiB F: 1KiB)
main.bin 331380 122194 ( 36%) [AR7] Off:$00860a (03/0/02:$00a) (user arg=0)(C:422KiB F:234KiB)
ym7Pack0.bin 18672 12402 ( 66%) [AR7] Off:$02635c (13/1/09:$15c) (user arg=0)
ym7Pack1.bin 202162 113136 ( 55%) [AR7] Off:$0293ce (14/1/11:$1ce) (user arg=0)
ym7Pack2.bin 200448 112806 ( 56%) [AR7] Off:$044dbe (25/0/01:$1be) (user arg=0)
ym7Pack3.bin 199646 107798 ( 53%) [AR7] Off:$060664 (35/0/02:$064) (user arg=0)
ym7Pack4.bin 203764 106350 ( 52%) [AR7] Off:$07ab7a (44/1/03:$17a) (user arg=0)
ym7Pack5.bin 174760 91400 ( 52%) [AR7] Off:$094ae8 (54/0/02:$0e8) (user arg=0)
ym7Pack6.bin 128800 58362 ( 45%) [AR7] Off:$0aaff0 (62/0/04:$1f0) (user arg=0)
CosoPackLz4.bin 218912 142048 ( 64%) [AR7] Off:$0b93ea (67/0/08:$1ea) (user arg=0)
----------------------------------------------------------------
Saving AmigAtari.adf:
Disk contains 12 files, packing ratio: 49%
1777KiB packed to 880KiB ( 1819094 to 900810 bytes )
1KiB left ( 310 bytes )
Code:
boot.bin 310 310 (100%) [---] Off:$000000 (00/0/01:$000) (user arg=0)
dirkernel.tmp 9180 6800 ( 74%) [AR4] Off:$000136 (00/0/01:$136) (user arg=0)
logo_fade.bin 131160 27364 ( 20%) [AR7] Off:$001bc6 (00/1/03:$1c6) (user arg=0)(C:128KiB F: 1KiB)
main.bin 331380 123368 ( 37%) [AR7] Off:$0086aa (03/0/02:$0aa) (user arg=0)(C:422KiB F:234KiB)
ym7Pack0.bin 18672 14052 ( 75%) [AR7] Off:$026892 (14/0/01:$092) (user arg=0)
ym7Pack1.bin 202162 129864 ( 64%) [AR7] Off:$029f76 (15/0/06:$176) (user arg=0)
ym7Pack2.bin 200448 125772 ( 62%) [AR7] Off:$049abe (26/1/07:$0be) (user arg=0)
ym7Pack3.bin 199646 130882 ( 65%) [AR7] Off:$06860a (37/1/11:$00a) (user arg=0)
ym7Pack4.bin 203764 125114 ( 61%) [AR7] Off:$08854c (49/1/02:$14c) (user arg=0)
ym7Pack5.bin 174760 108342 ( 61%) [AR7] Off:$0a6e06 (60/1/05:$006) (user arg=0)
ym7Pack6.bin 128800 66602 ( 51%) [AR7] Off:$0c153c (70/0/07:$13c) (user arg=0)
CosoPackLz4.bin 218912 160446 ( 73%) [AR7] Off:$0d1966 (76/0/05:$166) (user arg=0)
ERROR: Don't fit on the disk.

I'm still looking for another packer that could fit AmigAtari on a floppy.... |
15 September 2021, 22:48 | #26 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,491
|
|
15 September 2021, 23:12 | #27 |
Registered User
Join Date: Apr 2013
Location: paris
Posts: 133
|
That's not that simple... one of the key features of zx0 is that there is no entropy coding, so it can brute-force all combinations of literal runs and length/offset pairs (and it also supports runs of literals instead of the 9 bits per literal of standard LZxx).
If you add entropy coding, it becomes extremely hard to brute-force the search space. zx0 is a really powerful packer for small files and small platforms. All the energy is spent at the compression stage. Such a good packing ratio for such a simple depacker is beautiful. |
15 September 2021, 23:33 | #28 | ||
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,655
|
Quote:
The image is bad because I see an axis starting at 0 bytes per "frame" and ending at 2500 bytes per "frame", and 0 bytes per anything is 0 bytes per second. Please don't put Nibbler on this image; I've already stressed the importance of the type of content before strictly committing to a single cruncher, if ever. Quote:
Again, as per my "loading scheme" paragraphs, much can be done for presentation by staging the decompression and using the right tool for each stage. |
||
15 September 2021, 23:41 | #29 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,491
|
|
15 September 2021, 23:55 | #30 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,039
|
Easy: use LZMA only for the CosoPack file. Add a nice jingle while that file spends a few minutes depacking, and use another packer for the rest of the files. Or use LZMA only for the packed main (big) file (like on disk 1 of BC Kid) and add a jingle with nice music and the text "Please wait, loading and depacking." People then only have to wait at the beginning; the rest of the files will be depacked quickly.
|
16 September 2021, 08:00 | #31 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
|
Personally, I would reconsider using your own encoded soundchip output instead of the original music code and data. For example, COSO/TFMX songs are quite small, and you could reuse the player code where appropriate.
|
16 September 2021, 10:43 | #32 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,062
|
@Thcm optimize can be a bit faster with this:
Code:
int elias_gamma_bits(int value)
{
#if defined __GNUC__
    /* written this way to cancel out a xor inside __builtin_clz() */
    return 1 + ((__builtin_clz(value) ^ (sizeof(int) * 8 - 1)) << 1);
#elif defined _MSC_VER
    unsigned long bits;                /* might need <intrin.h> */
    _BitScanReverse(&bits, value);
    return 1 + ((int)bits << 1);
#else
    int bits = 1;
    while (value > 1) { bits += 2; value >>= 1; }
    return bits;
#endif
} |
16 September 2021, 11:26 | #33 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,039
|
Or you can depack the LZ4-packed files and run zx0 directly on them. Some packers don't like packing already-packed files, and LZ4 seems like an average packer to me. A second option is to split the big files into smaller parts, like 30KB or 60KB, and check whether zx0 packs those better. Anyway, double-packed files are never a good option for me: a good packer must always pack the original (unpacked) file better than one already packed with another packer.
|
16 September 2021, 14:02 | #34 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,491
|
I see 3 possible improvements/optimizations that can be applied to the ZX0 68k depacker (apart from the obvious micro-optimizations), naturally by modifying the bitcoding structure.
One that I used in nrv2x, one that I used in aplibx, and one that concerns the structure of the raw stream itself (it has a really cool feature: if I'm not wrong, it's an 'even' encoding!*).

Also, the 'end token' would be better moved into the literal-run stream decoding, for two reasons: it probably allows a less invasive check (at the cost of a few more end bits, but who cares) and it can solve the >64k literal run, which is plain wrong in the current decoder.

I think we can gain a decent amount of decompression speed compared to the available code (which, to tell the truth, I haven't tried yet ). Tonight I'll work on it a bit.

*probably in the 68k case an 'odd' encoding is better; this means that the 'startup' byte should be different (and a single bit gained) |
16 September 2021, 14:03 | #35 |
Registered User
Join Date: Apr 2013
Location: paris
Posts: 133
|
Oh, of course that would make the data a lot smaller! But that's a totally different issue; I just used the AmigAtari demo as a size benchmark. I could use the "De Profundis" demo too, which contains a lot of Amiga demo data (but packing 2 disks with zx0 could take hours).
|
16 September 2021, 15:37 | #36 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,062
|
What I changed in nrv2b: I inverted all the match offset bits, and the rest was minor patching of bits and bytes at the end to reduce overrun and size a tiny bit. However, I didn't have any interest in in-place depacking, and I was happy with the depacker's size and speed, so I didn't mess with the bitcode any further.
I see a not.b d0 (offset hi-byte) in the depacker, plus an lsl.w #8, so there is definitely potential there :P. |
16 September 2021, 16:04 | #37 | ||
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,491
|
Well, actually there is a big interest in in-place depacking (backward or forward ones).
For one simple reason: many coders don't want to know anything about offsets, safety bytes, pre-set buffers and arcane requirements. They want to allocate the original memory and tell the unpacker to do the job like a blackbox. Quote:
(and I'm not talking about micro-optimizations, which everyone can do). Quote:
|
||
16 September 2021, 20:44 | #38 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,491
|
Ok, new provisional bitcode defined (EDIT: AZX0? name to be decided ).
The unpacker code is 120 bytes with no subroutines (it's unrolled as much as possible) and contains some accelerators; it fully supports >16-bit offsets and protects against literal runs >64k. It does not yet contain the surrounding code for in-place decompression and some details.

So far it works only in my head, because I haven't even encoded a synthetic stream to try it on (so for sure it will have to be fixed here and there ). Fortunately the stream is really simple, so it didn't take me that long; I think I used all the tricks I could (not many, because the code is so small..).

If I haven't made huge conceptual errors (which could well be), I'll try to generate the new bitcode shortly. I expect good speed.

Last edited by ross; 16 September 2021 at 20:53. |
17 September 2021, 17:47 | #39 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,491
|
More code unrolled, more accelerated paths; now I'm near 150 bytes.
Maybe time to stop, as I'm at the short-branch limit. And I haven't tried it with a synthetic stream yet, so nothing might work.

I now use the original bitcode much more, so the conversion should be simplified. |
22 September 2021, 17:27 | #40 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,491
|
Ok, now I have a working packer and unpacker for the 'new' ZX0 stream, optimized and friendly to the 68k.
I've tested it with "cobra.scr" mentioned on GitHub (a little file of 6912 bytes, packed to 2294 bytes).

Impressive results (they went beyond my wildest expectations ):
- original stream and original 68k decoder: 387600 cycles
- 'new' stream and my optimized decoder: 275700 cycles

This is a whopping 40% speed increase! Those who know decompressors know it is a remarkable achievement. I would also have done an absolute speed comparison with other (de)packers, but as long as there is no support for large offsets it makes no sense. |