English Amiga Board


Go Back   English Amiga Board > Coders > Coders. General > Coders. Releases

 
 
Thread Tools
Old 11 November 2014, 23:37   #1
kaffer
Registered User
 
Join Date: May 2011
Location: Cambridge
Posts: 428
Fast ZIP (Deflate) unpacking

I have spent a few days implementing the standard 'Inflate' algorithm in 68000 asm. Deflate is the standard compression format embedded within ZIP/PKZIP, gzip and PNG formats. The benefits for me are that I can compress in my Linux cross-dev environment using 'gzip -9', which also has compression ratios typically a little better than the best regarded Amiga packers.

I made a lot of effort to make the code size small (e.g. it could be used in a bootblock loader) but also fast for larger files. The code can be as small as 638 bytes, or if configured for max speed is still a reasonable 930 bytes while unpacking at around 23-27kB/s (ingest rate of compressed data; output rate is in the region of 50-70kB/s) on an unexpanded A500 (which compares very favourably with good Amiga native pack formats). It is about fast enough to fully keep up with a track loader.

Anyway, public domain source is here:
https://raw.githubusercontent.com/ke...ster/inflate.S (GAS+CPP syntax)
https://raw.githubusercontent.com/ke...er/inflate.asm (native Amiga asm syntax, thanks to phx for the conversion!).

Last edited by kaffer; 16 November 2014 at 23:42.
kaffer is offline  
AdSense AdSense  
Old 12 November 2014, 13:42   #2
Leffmann
 
Join Date: Jul 2008
Location: Sweden
Posts: 2,023
Thanks, I don't think there's any other open-source M68K implementation of Inflate anywhere, so it's great to have a solid reference available. (is it for vasm with standard syntax module, or GAS?)
Leffmann is offline  
Old 12 November 2014, 17:23   #3
phx
Natteravn

phx's Avatar
 
Join Date: Nov 2009
Location: Herford / Germany
Posts: 1,004
vasm doesn't understand C preprocessor comments and #defines. But the source is short enough to convert it into standard Amiga assembler syntax.

Thanks for the effort! Maybe it allows me to switch from Bytekiller to zip in my game projects.
phx is offline  
Old 12 November 2014, 22:11   #4
kaffer
Registered User
 
Join Date: May 2011
Location: Cambridge
Posts: 428
Yeah it's CPP + GAS. I won't insult you guys by proffering unrequested advice, the conversion to a native syntax should be straightforward. GAS has a special syntax for branches which direct its built-in relaxer (it only shortens branches and the like when told to) -- Jcc and JBSR turn to shortest version of BCC.[bw] and BSR.[bw] that reaches the target.

Once you have it converted and building, you can optionally configure the four OPT_ options at the top of the file and then you 'bsr inflate' with arguments as described in the source.

But, any trouble or questions at all just post here or send me a message.

EDIT: I also plan to help out with inflate-in-place (or as close as possible). In many cases you should be able to stick the compressed image at the end of the buffer you will unpack to, and the output won't catch up til the end. Obviously that's not always the case but I have degzip.c in the same github folder as inflate.S which I use for stripping gzip headers and performing analytics. One analytic I plan to add is 'leeway' which is the number of bytes of overlap between input and output you can have, for this specific input stream, without having the output overrun the remaining input at any time. If you see what I mean Should allow significant sharing of input and output buffer space in most cases.
kaffer is offline  
Old 12 November 2014, 22:29   #5
kaffer
Registered User
 
Join Date: May 2011
Location: Cambridge
Posts: 428
If anyone does the conversion to native syntax please contribute it back and I will make a proper repository for all these loader and tools bits and pieces.
kaffer is offline  
Old 12 November 2014, 22:53   #6
Lonewolf10
AMOS Extensions Developer
Lonewolf10's Avatar
 
Join Date: Jun 2007
Location: near Cambridge, UK
Age: 38
Posts: 1,917
Quote:
Originally Posted by Leffmann View Post
Thanks, I don't think there's any other open-source M68K implementation of Inflate anywhere, so it's great to have a solid reference available. (is it for vasm with standard syntax module, or GAS?)
I did write my own ASM for it last year, but didn't think my code worthy for general release. It's not that hard really, once you understand how the data is stored (Compressed Image File Formats book by Miano is a great reference, combined with the official RFC1950 and RFC1951 docs)
Making the code compact and fast is the hard part - and that is where my code wouldn't rate highly right now.


Kaffer, does your code work with PNG files saved with Microsoft Paint? I find that my code sometimes works and sometimes falls over, but works perfectly for 16 colour PNG's saved with Personal Paint (from Windows XP, but running on Windows 2000!).

Edit: Great job on the coding Kaffer, much better than mine. It seems to be lacking the 4 filters though (applied to data before drawing image to screen for PNG images) - or is that PNG specific?

Last edited by Lonewolf10; 12 November 2014 at 23:15.
Lonewolf10 is offline  
Old 12 November 2014, 23:23   #7
kaffer
Registered User
 
Join Date: May 2011
Location: Cambridge
Posts: 428
My routine implements pure Inflate as specified in RFC 1951. These would commonly be embedded within a container format, e.g., my associated degzip.c does the trivial job of stripping/replacing gzip headers. The PNG format's filter stage would be implemented separately, they are not part of Inflate.
kaffer is offline  
Old 12 November 2014, 23:30   #8
Lonewolf10
AMOS Extensions Developer
Lonewolf10's Avatar
 
Join Date: Jun 2007
Location: near Cambridge, UK
Age: 38
Posts: 1,917
Ahh, ok.

My code was written specifically for PNG inflation, so I got a little confused. It's been over a year since I last looked at the RFC's.
Lonewolf10 is offline  
Old 14 November 2014, 08:45   #9
phx
Natteravn

phx's Avatar
 
Join Date: Nov 2009
Location: Herford / Germany
Posts: 1,004
Quote:
Originally Posted by kaffer View Post
EDIT: I also plan to help out with inflate-in-place (or as close as possible).
That's certainly useful, as RAM in classic Amigas is always small.

Quote:
I have degzip.c in the same github folder as inflate.S which I use for stripping gzip headers and performing analytics.
Ah, yes, there's a header in gzipped files. That explains why I couldn't unzip such a file again with your inflate routine.
Are there example files for testing, without a header, anywhere?

Quote:
Originally Posted by kaffer View Post
If anyone does the conversion to native syntax please contribute it back and I will make a proper repository for all these loader and tools bits and pieces.
I did it. But it was not as easy as I hoped. The biggest problem was to convert all the digit labels, which are referenced with Nf and Nb, into normal local labels without introducing a bug. And preprocessor macros with an argument, like LOOKUP_BYTES(). I just made three equates from it.

The following source assembles fine with Devpac, AsmOne, PhxAss, vasm, Barfly, ... so it should be pretty portable. I'm not sure if it works, because I didn't have a test file at hand.

The four options can be overwritten on the command line by "-DOPT_xyz=n", for those assemblers which support that. With vasm I'm getting 646 bytes for the smallest code and 928 bytes for the most optimised.
Attached Files
File Type: lha inflate_src.lha (5.7 KB, 108 views)
phx is offline  
Old 14 November 2014, 12:26   #10
kaffer
Registered User
 
Join Date: May 2011
Location: Cambridge
Posts: 428
Quote:
Originally Posted by phx View Post
Ah, yes, there's a header in gzipped files. That explains why I couldn't unzip such a file again with your inflate routine.
Are there example files for testing, without a header, anywhere?
Well you could compile and run it through degzip.c with a little modification (since it stamps in its own header at the moment, sizes and CRCs, for my test harness).
EDIT: No need to modify degzip.c any more, it will DTRT by default.

Quote:
I did it. But it was not as easy as I hoped. The biggest problem was to convert all the digit labels, which are referenced with Nf and Nb, into normal local labels without introducing a bug. And preprocessor macros with an argument, like LOOKUP_BYTES(). I just made three equates from it.

The following source assembles fine with Devpac, AsmOne, PhxAss, vasm, Barfly, ... so it should be pretty portable. I'm not sure if it works, because I didn't have a test file at hand.
Thank you very much! I believe I have phxass installed in a UAE partition somewhere, so I will give it a whirl. I should be able to diff the binary against the GAS version in fact.

Quote:
The four options can be overwritten on the command line by "-DOPT_xyz=n", for those assemblers which support that. With vasm I'm getting 646 bytes for the smallest code and 928 bytes for the most optimised.
Perfect, and those are the correct sizes! (I tweaked the code a little bit more after my announcement).

I will be back with a thumbs up very soon...

Last edited by kaffer; 14 November 2014 at 18:44.
kaffer is offline  
Old 14 November 2014, 14:11   #11
kaffer
Registered User
 
Join Date: May 2011
Location: Cambridge
Posts: 428
Thumbs up! The only difference in generated code is benign assembler interference (generating ADD vs ADDI).

I have put the code up in a new repository https://github.com/keirf/Amiga-Stuff

Let me know if you'd like a credit in the converted inflate.asm file!

If you want to actually use it, which I'm imagining you do, you will want to build the included degzip.c and run it like 'degzip in.gz out.raw' and then out.raw can be passed straight to inflate. degzip.c is GNU+Posix, but only lightly -- if you need to do any porting for your environment then please do send me the patches!
kaffer is offline  
Old 14 November 2014, 16:32   #12
kaffer
Registered User
 
Join Date: May 2011
Location: Cambridge
Posts: 428
I have added the abovementioned 'leeway' calculation to degzip.c, so that unpack can happen (almost) in place. How it works is: degzip prints a Leeway value (and writes it into its custom output header if you specify option -H). If you place the input stream at the far end of the output buffer *plus the leeway value* then inflate will work correctly without overwriting any input bytes before inflate has consumed them.

e.g., Deflate size is dflsz, Decompressed size is outsz. Then out = alloc(outsz + leeway), in = out + outsz + leeway - dflsz, read deflate stream to <in>, inflate(in, out).

leeway is typically very small (< 16 bytes). Nice

Last edited by kaffer; 14 November 2014 at 18:01.
kaffer is offline  
Old 14 November 2014, 23:40   #13
Photon
Moderator
Photon's Avatar
 
Join Date: Nov 2004
Location: Hult / Sweden
Posts: 4,451
Quote:
Originally Posted by kaffer View Post
If anyone does the conversion to native syntax please contribute it back and I will make a proper repository for all these loader and tools bits and pieces.
Just saw you mention 'native syntax' twice and wonder what is meant by it?

Reason for asking is I don't remember seeing instructions such as jra (for example) in docs by Motorola or Freescale, but I could be wrong?
Photon is offline  
Old 14 November 2014, 23:55   #14
Thorham
Computer Nerd

Thorham's Avatar
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 41
Posts: 2,964
Quote:
Originally Posted by Photon View Post
Reason for asking is I don't remember seeing instructions such as jra (for example) in docs by Motorola or Freescale, but I could be wrong?
For some reason they changed it. Seems silly. It's the thought process of: 'Everyone knows what these instruction are called, so we're gonna call'm something else.'. Seems quite useless.
Thorham is offline  
Old 15 November 2014, 00:13   #15
kaffer
Registered User
 
Join Date: May 2011
Location: Cambridge
Posts: 428
Quote:
Originally Posted by Photon View Post
Just saw you mention 'native syntax' twice and wonder what is meant by it?

Reason for asking is I don't remember seeing instructions such as jra (for example) in docs by Motorola or Freescale, but I could be wrong?
I mean the syntax broadly accepted by all the well-known Amiga assemblers. Not so much opcode mnemonics (because as you say they are standardised) as things like macro syntax, conditionals syntax, $ for hex numbers, rs.[bwl], etc etc. All these things differ in a GAS+CPP build env.

And yes jra etc are non-standard, they address GAS's built-in relaxer which converts them to e.g., bra.[bw] as appropriate for the distance to the target.
kaffer is offline  
Old 15 November 2014, 00:52   #16
Photon
Moderator
Photon's Avatar
 
Join Date: Nov 2004
Location: Hult / Sweden
Posts: 4,451
Thanks for the quick response Yes, ofc it's a dynamic jcc, good to have as an assembler/compiler directive. So converting from GAS syntax to 68k assembler.

Since the thread is about decompression speed, I compared yours to mine (oh no!! )

I compressed DPaint 3.25. Gzip -9 on Windows makes it 150559 bytes vs. 154788 for lha and 157156 for Nibbler 1.02. Your decompression speed is 5K/s slower than the best legacy decompressors (none of which reach trackloading speeds).

And Nibbler 1.02 is 6.55x faster than yours. Some optimizing still left to do I think! Muhahah...
Photon is offline  
Old 15 November 2014, 01:31   #17
kaffer
Registered User
 
Join Date: May 2011
Location: Cambridge
Posts: 428
Quote:
Originally Posted by Photon View Post
Since the thread is about decompression speed, I compared yours to mine (oh no!! )
Lol yeah okay it is a bit about compression speed

Quote:
I compressed DPaint 3.25. Gzip -9 on Windows makes it 150559 bytes vs. 154788 for lha and 157156 for Nibbler 1.02. Your decompression speed is 5K/s slower than the best legacy decompressors (none of which reach trackloading speeds).
I'll have to think how I could get closer to LZX and Crunchmania. It's hard because Deflate is a little-endian bit stream and the required shifting ops suck hugely on a 68000. I may need to profile a bit more closely.... Hmmm also I only tested on UAE so far. I don't know how accurate its cycle-exact is against wallclock time. Maybe I should test on my real 500.
EDIT: Tested on a real unexpanded A600, and confirms the unpacking rate of ~23kB/s (23kB/s of input stream, that is).

Quote:
And Nibbler 1.02 is 6.55x faster than yours. Some optimizing still left to do I think! Muhahah...
That's a very awesome unpack speed!

EDIT: The results page states 68000 for the speed numbers, but no other test setup details. Is it a bog standard unexpanded 500, no fast ram? And are your stated speeds the rate at which you process the packed data, or the rate that the unpackers produce decompressed data?

Last edited by kaffer; 15 November 2014 at 14:33.
kaffer is offline  
Old 16 November 2014, 17:27   #18
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 49
Posts: 918
A few optmised version.
Attached Files
File Type: lzx deflo.lzx (7.2 KB, 91 views)
Don_Adan is offline  
Old 16 November 2014, 19:00   #19
Photon
Moderator
Photon's Avatar
 
Join Date: Nov 2004
Location: Hult / Sweden
Posts: 4,451
Quote:
Originally Posted by kaffer View Post
EDIT: The results page states 68000 for the speed numbers, but no other test setup details. Is it a bog standard unexpanded 500, no fast ram? And are your stated speeds the rate at which you process the packed data, or the rate that the unpackers produce decompressed data?
Decompression speed means output rate, of course. In general, you have a bunch of files totalling n bytes. For either compression or decompression you're really only interested in how fast it compresses or decompresses the files, i.e. those n bytes. With the optimized version and measuring the output rate you will probably get a higher figure then!

Speed tests were run for the selected crunchers on 8 exe, graphics, and music files larger than 80KB. My timing routine was 1/100s precision, so the size was to avoid the rounding affecting results. For LZX I used an optimized source-code which may account for its nice result. In all cases, the depacker decompressed from fastram to fastram, with code running in fastram. Except in the case of LHA, where I didn't have the source and unpacked from file to RAM: with lots of buffers, which may account for its low score.

Measurements were done on my A500, GVP HD8+ SCSI interface with 2.3MB/s DMA and 8MB Zorro (24-bit) fastram.

With fastmem disabled, in the example of DPaint 3.25 (277616 bytes), the decompression speed of Nibbler from slowmem to slowmem on A500 gave a speed of 132198 bytes/s.

Basically, the speed is throttled by the speed at which new bytes can be written to RAM.

In fact, you can compare a decompressor with a nave (non-unraveled) byte copy loop to get a figure to put on "decompression code overhead". Copying DPaint from slowmem to slowmem gave a rate of 182642 bytes/s, making Nibbler decompress 38% slower than this byte-copy loop, i.e. 38% decompression code overhead. On CPUs with instruction caches this figure would drop a little, but still be throttled by how well burst writes help with the byte output stream write to RAM.

A quick calculation puts LZX (the best competitor) at 658% decompression code overhead, so ...mmm... that chart still looks mighty fine to me!

Last edited by Photon; 16 November 2014 at 19:13.
Photon is offline  
Old 16 November 2014, 19:13   #20
kaffer
Registered User
 
Join Date: May 2011
Location: Cambridge
Posts: 428
Quote:
Originally Posted by Don_Adan View Post
A few optmised version.
Thanks, I'd spotted the LEA->ADD.w optimisations, but shortening the exit branch is nice to have too, shaves a few more cycles!
kaffer is offline  
AdSense AdSense  
 


Currently Active Users Viewing This Thread: 1 (0 members and 1 guests)
 
Thread Tools

Similar Threads
Thread Thread Starter Forum Replies Last Post
System.zip / bad zip-file Ztein project.ClassicWB 24 22 April 2012 03:14
Use of 4MB PCMCIA Fast Flash Memory as Fast RAM in A1200 nkarytia support.Hardware 10 16 September 2011 14:37
Added SIMM to ZIP adapter, now 16MB Fast RAM tonyyeb support.Hardware 18 01 September 2008 11:59
Amiga formatted zip disks via USB zip drive in WinUAE? planetidiot New to Emulation or Amiga scene 5 12 October 2007 16:18
LZX unpacking??? Medvind support.Apps 25 27 November 2002 13:33

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT +2. The time now is 00:29.


Powered by vBulletin® Version 3.8.8 Beta 1
Copyright ©2000 - 2017, vBulletin Solutions, Inc.
Page generated in 0.26139 seconds with 12 queries