16 November 2014, 22:33 | #21
Registered User
Join Date: May 2011
Location: Cambridge
Posts: 682
However, running from chip RAM only, I did test ASM-One_V1.48, which is one that you tested too. 'gzip -9' compresses it to 47.6% of its original size (288464 -> 137221). I unpack this at a rate of 27657 bytes/s ingest, or 57666 bytes/s output. My original test was the first 204480 bytes of New Zealand Story in ADF form. It packs better (204480 -> 68741; 33.6% of original size). I unpack this at a rate of 23868 bytes/s ingest, or 71000 bytes/s output. Decompression overhead compared with the simple byte-copy loop is 2.57x to 3.17x (compared with your overhead of 1.38x, or LZX at 7.58x). So I don't look so bad after all: that beats everyone but you by a good margin, and 'gzip -9' compresses very well. I don't think Inflate is going to touch the speed of your unpack routine, however!
Last edited by Keir; 16 November 2014 at 23:59.
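The figures above are tied together by simple arithmetic: the packed size as a percentage of the original, and output rate = ingest rate × (original size / packed size), since each compressed byte consumed expands on average to original/packed bytes of output. A small C sketch of that arithmetic (the function names are made up for illustration; this is not code from the thread):

```c
/* Compressed size as a percentage of the original size. */
static double packed_percent(unsigned long original, unsigned long packed)
{
    return 100.0 * (double)packed / (double)original;
}

/* Output (decompressed) rate implied by an input (ingest) rate:
   each compressed byte consumed expands to original/packed bytes. */
static double output_rate(double ingest_rate, unsigned long original,
                          unsigned long packed)
{
    return ingest_rate * (double)original / (double)packed;
}
```

For example, `output_rate(23868.0, 204480, 68741)` comes out at about 71000 bytes/s, matching the New Zealand Story figure.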
16 November 2014, 23:24 | #22
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,671
Well, it's simply how much longer it takes to decompress, compared to a byte-copy loop. The reference byte-copy loop is my own, but it's a real one. This is the limit that decompressor optimization will approach, at least until someone writes a word or longword compressor.
38% overhead simply means it takes 1.38x the time to decompress, compared to a byte-copy loop of the original file. 658% means it takes 7.58x the time.
Had a look now, and at first glance I'm not sure how to provide the correct parameters; e.g. it doesn't mention source and destination addresses, and I'm not sure what to provide as nodes[] etc. Is there a source that decompresses an example .gz file from RAM to RAM?
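Put as formulas: overhead factor = t_decompress / t_copy, where t_copy is the time for a plain byte-copy of the decompressed data, and the percentage is (factor - 1) × 100. A minimal C sketch, assuming a simple byte-by-byte loop as the reference (Photon's actual reference loop isn't shown in the thread, and the function names are hypothetical):

```c
/* Assumed reference: a plain byte-by-byte copy of the original data. */
static void byte_copy(unsigned char *dst, const unsigned char *src,
                      unsigned long n)
{
    while (n--)
        *dst++ = *src++;
}

/* Decompression time relative to byte-copying the same output. */
static double overhead_factor(double t_decompress, double t_copy)
{
    return t_decompress / t_copy;
}

/* The same ratio expressed as extra percentage: 1.38x -> 38%. */
static double overhead_percent(double t_decompress, double t_copy)
{
    return (overhead_factor(t_decompress, t_copy) - 1.0) * 100.0;
}
```

So a factor of 7.58x is reported as 658% overhead, and 1.38x as 38%.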
16 November 2014, 23:49 | #23
Registered User
Join Date: Apr 2005
Location: digital hell, Germany, after 1984, but worse
Posts: 3,385
@kaffer
Thanks a lot for contributing your inflate routine here. It's very interesting for me, because some time ago I had to fight a lot with the zlib inflate code, which I extracted from zlib.library 3.2 and then tried to optimize as well as possible for my icon.library. I never really understood how the Huffman decoding works, but I could at least reduce the size of the zlib decoder by nearly 80%. Compared to the inflate function I ported and optimized from zlib.library, your code is a lot shorter and consumes much less memory than my current inflate function. So I replaced my routine with your code and tested it by loading several PNG and OS4 iconsets.
The good news: your inflate code successfully decoded all the tested iconsets. And the clear structure and your comments may finally help me understand the Huffman decoding.
The bad news: it is significantly slower than my current zlib decoder, and I really hoped and expected the opposite. (It takes 6.5 times as long, although the real bottleneck in my code is the color reduction.) Using only the stack can also be a problem without MinStack or StackAttack, since StackCheck reported more than 5200 bytes used by the WB calling my icon.library.
In order to use your code for a zlib stream, the zlib header has to be processed before calling your inflate function:
Old inflate code removed. There is newer code at the end of my icon.library assembler source.
Last edited by PeterK; 26 June 2020 at 13:10.
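For readers wondering what "processing the zlib header" involves: per RFC 1950, a zlib stream starts with two bytes, CMF and FLG. The low nibble of CMF (CM) must be 8 for Deflate, its high nibble (CINFO) at most 7, CMF*256 + FLG must be a multiple of 31, and if bit 5 of FLG (FDICT) is set, a 4-byte preset-dictionary ID follows. The raw Deflate data, which is what an inflate routine like this one consumes, starts right after. A sketch in C (hypothetical function name; PeterK's removed assembler code is not reproduced here):

```c
#include <stddef.h>

/* Validate a zlib (RFC 1950) header and return the offset of the raw
   Deflate data that follows it, or -1 if the header is not usable. */
static long zlib_deflate_offset(const unsigned char *buf, size_t len)
{
    unsigned cmf, flg;

    if (len < 2)
        return -1;
    cmf = buf[0];
    flg = buf[1];

    if ((cmf & 0x0F) != 8)             /* CM: 8 means Deflate */
        return -1;
    if ((cmf >> 4) > 7)                /* CINFO: window of at most 32 kB */
        return -1;
    if ((cmf * 256 + flg) % 31 != 0)   /* FCHECK */
        return -1;
    if (flg & 0x20)                    /* FDICT: preset dictionary needed, */
        return -1;                     /* which a plain inflate can't use  */
    return 2;                          /* Deflate data starts after CMF, FLG */
}
```

With the common header bytes 0x78 0x9C this returns 2, so the inflate routine would be given buf + 2 as its input stream.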
16 November 2014, 23:55 | #24
Registered User
Join Date: May 2011
Location: Cambridge
Posts: 682
Code:
    lea     (output_buf),a4     ; a4 = destination buffer for decompressed data
    lea     (input_stream),a5   ; a5 = compressed input stream
    bsr     inflate
17 November 2014, 00:01 | #25
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,671
No, I don't have a C compiler for the Amiga, so a degzip.exe for 68000 would be helpful (well, required!) for me to decompress any files.
17 November 2014, 00:07 | #26
Registered User
Join Date: May 2011
Location: Cambridge
Posts: 682
17 November 2014, 00:35 | #27
Registered User
Join Date: May 2011
Location: Cambridge
Posts: 682
|
17 November 2014, 01:16 | #28
Registered User
Join Date: Apr 2005
Location: digital hell, Germany, after 1984, but worse
Posts: 3,385
I tested your code under WinUAE with a 68020 CPU, since I don't have a working real Amiga anymore.
Disabling the Inline option makes no difference for the JIT compiler. The iconsets are usually a bunch of 50-200 PNG or OS4 icons, where each image has a compressed size of 3-10 kB. Attached is my zlib decoder stuff, although it's not cleaned up or prepared for a public release. So you will find a lot of confusing comments from a coder who did not understand the code, and also still some references to my icon.library (A5).
Update: I have now used the timer device to compare only the inflate functions. The result was 45x (not 6.5x). But this huge difference may only occur for icon decoding: the icons make use of the big tables in my routine, unlike standard gzip streams, which don't use these tables. My code is not optimized for standard gzip decompression. I think one problem is that you rebuild the tables on every function call, which makes it slow.
Old inflate code removed. There is newer code at the end of my icon.library assembler source.
Last edited by PeterK; 26 June 2020 at 13:13.
17 November 2014, 08:02 | #29
Registered User
Join Date: May 2011
Location: Cambridge
Posts: 682
I could optimise that case by allowing the static-case tables to be pre-generated and set aside at start of day. I reckon I could have a go at beating your code, because although it is unrolled and inlined up the wazoo and really quite slick, it quite unnecessarily outputs data via a pointless 32 kB window buffer (if I'm not misreading the code). I expect that's the bulk of your extra memory usage right there, plus you have an extra read and write of every output byte. The window buffer is only necessary if you are streaming output to disk or network, for example; there's no need for it when doing in-memory decompression. If you optimise out that intermediate buffer, there's not much fat left to trim speed-wise, and it's a very slick, albeit ugly, routine!
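To make the window-buffer point concrete: with the whole output in RAM, a length/distance match can be copied directly from bytes already written to the output buffer, so no separate 32 kB window (and no second pass over every byte) is needed. A minimal C sketch (hypothetical function name, not taken from either routine in the thread). The copy has to run forward one byte at a time, because Deflate allows a match to overlap the bytes it is producing (distance smaller than length), which is how runs are encoded; memcpy() would not be correct here.

```c
#include <stddef.h>

/* Copy a Deflate match of 'length' bytes starting 'distance' bytes back
   in the output buffer. 'out' points at the next byte to write; the
   caller guarantees that distance <= number of bytes already written.
   Returns the new write pointer. */
static unsigned char *copy_match(unsigned char *out, size_t distance,
                                 size_t length)
{
    const unsigned char *from = out - distance;

    while (length--)
        *out++ = *from++;   /* forward byte copy handles overlapping matches */
    return out;
}
```

For example, with "ab" already in the buffer, a match of length 5 at distance 2 appends "ababa", giving "abababa" in total.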
17 November 2014, 08:18 | #30
Registered User
Join Date: Apr 2005
Location: digital hell, Germany, after 1984, but worse
Posts: 3,385
Yes, I think you're right; the 32 kB window buffer is definitely not optimal. But since I only ported the code from zlib.library, I just tried to optimize it as it was, without changing the concept. That's because I still don't have a clear picture of what's going on, and no vision yet of how to make it better. But I will try again.
17 November 2014, 08:52 | #31
Registered User
Join Date: May 2011
Location: Cambridge
Posts: 682
Maybe I am mad, but this kind of optimisation and algorithm work is quite fun! My plan is to allow pre-generation of the length/distance base + extra-bits tables, and of the static Huffman tables. I will also look into optimisations for 68020+: is there anything to look for there apart from allowing unaligned memory accesses and using scaled-index addressing modes? I guess shifts are cheap(er), so there's less need to work to avoid them...
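The length/distance tables mentioned here are fully determined by RFC 1951 (section 3.2.5), so they can be generated once at start-up and reused for every stream and block. A C sketch of that pre-generation (the table and function names are hypothetical; inflate.asm's own layout may differ):

```c
/* Base values and extra-bit counts for Deflate length codes 257..285
   (index 0..28) and distance codes 0..29, built once at start-up. */
static unsigned short len_base[29], dist_base[30];
static unsigned char  len_extra[29], dist_extra[30];

static void init_len_dist_tables(void)
{
    int i;

    /* Lengths: 8 codes with 0 extra bits, then groups of 4 with 1..5. */
    len_base[0] = 3;
    for (i = 0; i < 28; i++) {
        len_extra[i] = (i < 8) ? 0 : (unsigned char)((i - 4) / 4);
        if (i < 27)
            len_base[i + 1] = len_base[i] + (1u << len_extra[i]);
    }
    /* Code 285 is a special case: length 258 with no extra bits. */
    len_base[28] = 258;
    len_extra[28] = 0;

    /* Distances: 4 codes with 0 extra bits, then pairs with 1..13. */
    dist_base[0] = 1;
    for (i = 0; i < 30; i++) {
        dist_extra[i] = (i < 4) ? 0 : (unsigned char)(i / 2 - 1);
        if (i < 29)
            dist_base[i + 1] = dist_base[i] + (1u << dist_extra[i]);
    }
}
```

Each length or distance code then decodes as its base value plus that many extra bits read from the stream, with no per-call setup.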
17 November 2014, 17:50 | #32
Registered User
Join Date: May 2011
Location: Cambridge
Posts: 682
Okay I have implemented table pre-generation in the latest version of inflate.asm here https://raw.githubusercontent.com/ke...er/inflate.asm
To use it you need to:
I haven't yet done any 68020+ optimisations, but I think there is only minor cycle shaving to do there; table pre-generation should make a really massive difference.
17 November 2014, 20:17 | #33
Registered User
Join Date: Apr 2005
Location: digital hell, Germany, after 1984, but worse
Posts: 3,385
Thank you, kaffer!
Your new inflate function is already much faster. It takes only 60% longer now, but it is still more than 2 kB shorter than mine and also saves a lot of memory, both of which can be very important for standard low-end Amigas like the A500 or A600. The PNG and OS4 icons can be converted into the OS 3.5 format. The difference for the complete icon reading, uncompressing, color processing and rendering is only 7%.
17 November 2014, 20:33 | #34
Registered User
Join Date: May 2011
Location: Cambridge
Posts: 682
Apart from that, it might just be some optimisations with an eye to 68020+. I add bytes to the input-buffering register lazily, as shifting bytes up into the most-significant position is expensive on 68000, whereas on 68020 prefetching up front for the next few code lookups probably makes sense: it avoids some tests and branches, and the longer shifts up to the far end of the shift register have no extra cost on 68020+. I'll have another browse of your code and see if there are any tricks to steal. If you could send me a Deflate stream for one or two icons, that might be handy; then I can examine their profile a bit and see where time might be being spent.
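A sketch of the lazy refill described above, as a C bit reader for Deflate's LSB-first bit order: bytes are merged into the buffer only when a request needs more bits than it currently holds. This is the common zlib-style arrangement (new bytes enter above the bits already held, bits are consumed from bit 0) and is not necessarily how inflate.asm lays out its register; the names are hypothetical and there is no end-of-input check.

```c
/* Deflate bit reader state (hypothetical names). */
struct bitreader {
    const unsigned char *in;   /* next input byte */
    unsigned long buf;         /* pending bits, next bit in bit 0 */
    unsigned cnt;              /* number of valid bits in buf */
};

/* Return the next n bits (n <= 24), least-significant bit first. */
static unsigned long get_bits(struct bitreader *br, unsigned n)
{
    unsigned long v;

    while (br->cnt < n) {      /* lazy refill, one byte at a time */
        br->buf |= (unsigned long)*br->in++ << br->cnt;
        br->cnt += 8;
    }
    v = br->buf & ((1ul << n) - 1);
    br->buf >>= n;
    br->cnt -= n;
    return v;
}
```

The eager alternative suggested for 68020+ would top the buffer up to at least 24 bits before each code lookup, removing the test-and-branch from the hot loop at the cost of reading ahead.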
17 November 2014, 20:49 | #35
Registered User
Join Date: Apr 2005
Location: digital hell, Germany, after 1984, but worse
Posts: 3,385
Most likely I would use your code for the 68000 version of the icon.library, since users of these systems will be happy about every kB of code saved, and the difference in memory consumption is more than 40 kB, which is really a lot on a 500 kB Amiga. I haven't checked your code for possible optimizations yet; maybe I'll have a look in the next few days.
17 November 2014, 22:07 | #36
Registered User
Join Date: May 2011
Location: Cambridge
Posts: 682
|
17 November 2014, 22:12 | #37
Registered User
Join Date: Apr 2005
Location: digital hell, Germany, after 1984, but worse
Posts: 3,385
OK, two PNG icon streams are attached (only the inflate data, without the zlib header and CRC). I hope I grabbed the correct data bytes.
Thanks for your great work, kaffer!!
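For reference, a zlib stream (RFC 1950) consists of the 2-byte header, the raw Deflate data, and a 4-byte big-endian Adler-32 checksum of the uncompressed data; PNG's per-chunk CRC-32 is a separate check on the chunk itself. If decompressed output needs to be verified against the zlib trailer, Adler-32 takes only a few lines. A plain C sketch (function name hypothetical; the per-byte modulo keeps it short rather than fast):

```c
#include <stddef.h>

/* Adler-32 checksum (RFC 1950) over the uncompressed data. */
static unsigned long adler32(const unsigned char *p, size_t len)
{
    unsigned long a = 1, b = 0;   /* a: running byte sum, b: sum of a's */

    while (len--) {
        a = (a + *p++) % 65521;   /* 65521 is the largest prime below 2^16 */
        b = (b + a) % 65521;
    }
    return (b << 16) | a;
}
```

For example, the checksum of the ASCII string "Wikipedia" is 0x11E60398.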
17 November 2014, 22:43 | #38
Registered User
Join Date: May 2011
Location: Cambridge
Posts: 682
BoingBall.dat is obviously made by a different and rather insane encoder: it contains nearly 100(!) sub-blocks, about 95% of which use the static Huffman tables. This would explain why generating those static tables on the fly per block hurt so badly: the table-generation overhead was incurred roughly once per 50 bytes of input. I will hook this one into my test harness.
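Deflate's fixed ("static") Huffman code is defined in RFC 1951 (section 3.2.6) through its code lengths: literal/length symbols 0-143 use 8 bits, 144-255 use 9, 256-279 use 7 and 280-287 use 8, while the fixed distance codes are simply the 5-bit values 0-31. Because those lengths never change, the canonical codes (section 3.2.2) can be built once at start-up, so a stream like this one with around 100 fixed-code blocks pays nothing per block. A C sketch (names hypothetical):

```c
#define NLIT 288

static unsigned char  fixed_len[NLIT];    /* code length per symbol */
static unsigned short fixed_code[NLIT];   /* canonical code, MSB-first */

static void init_fixed_huffman(void)
{
    unsigned bl_count[16] = {0}, next_code[16];
    unsigned code = 0, bits;
    int i;

    /* Fixed literal/length code lengths from RFC 1951. */
    for (i = 0; i < NLIT; i++)
        fixed_len[i] = (i < 144) ? 8 : (i < 256) ? 9 : (i < 280) ? 7 : 8;

    /* Count the codes of each length. */
    for (i = 0; i < NLIT; i++)
        bl_count[fixed_len[i]]++;

    /* Smallest code value for each length; shorter codes come first. */
    for (bits = 1; bits < 16; bits++) {
        code = (code + bl_count[bits - 1]) << 1;
        next_code[bits] = code;
    }

    /* Assign consecutive codes to symbols of equal length, in order. */
    for (i = 0; i < NLIT; i++)
        fixed_code[i] = (unsigned short)next_code[fixed_len[i]]++;
}
```

The codes come out MSB-first, as in the RFC's table; since Deflate packs Huffman codes starting from their most-significant bit, an LSB-first decoder typically stores them bit-reversed in its lookup table.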
18 November 2014, 16:34 | #39
Registered User
Join Date: May 2011
Location: Cambridge
Posts: 682
|
23 November 2014, 16:47 | #40
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,050
Replace
    lea     (a1,d2.w),a3
with
    move.l  a1,a3
    add.w   d2,a3
This is faster or the same on most 680x0 CPUs (except the 68060), if I remember right.
|
|