Fast ZIP (Deflate) unpacking

Keir · 11 November 2014, 22:37

I have spent a few days implementing the standard 'Inflate' algorithm in 68000 asm. Deflate is the standard compression format embedded within ZIP/PKZIP, gzip and PNG formats. The benefits for me are that I can compress in my Linux cross-dev environment using 'gzip -9', which also has compression ratios typically a little better than the best regarded Amiga packers.

I made a lot of effort to make the code size small (e.g. it could be used in a bootblock loader) but also fast for larger files. The code can be as small as 638 bytes, or if configured for max speed is still a reasonable 930 bytes while unpacking at around 23-27kB/s (ingest rate of compressed data; output rate is in the region of 50-70kB/s) on an unexpanded A500 (which compares very favourably with good Amiga native pack formats). It is about fast enough to fully keep up with a track loader.

Anyway, public domain source is here:
https://raw.githubusercontent.com/ke...late/inflate.S (GAS+CPP syntax)
https://raw.githubusercontent.com/ke...te/inflate.asm (native Amiga asm syntax, thanks to phx for the conversion!).

Leffmann · 12 November 2014, 12:42

Thanks, I don't think there's any other open-source M68K implementation of Inflate anywhere, so it's great to have a solid reference available. (is it for vasm with standard syntax module, or GAS?)

phx · 12 November 2014, 16:23

vasm doesn't understand C preprocessor comments and #defines. But the source is short enough to convert it into standard Amiga assembler syntax.

Thanks for the effort! Maybe it allows me to switch from Bytekiller to zip in my game projects.

Keir · 12 November 2014, 21:11

Yeah it's CPP + GAS. I won't insult you guys by proffering unrequested advice, the conversion to a native syntax should be straightforward. GAS has a special syntax for branches which direct its built-in relaxer (it only shortens branches and the like when told to) -- Jcc and JBSR turn to shortest version of BCC.[bw] and BSR.[bw] that reaches the target.

Once you have it converted and building, you can optionally configure the four OPT_ options at the top of the file and then you 'bsr inflate' with arguments as described in the source.

But, any trouble or questions at all just post here or send me a message.

EDIT: I also plan to help out with inflate-in-place (or as close as possible). In many cases you should be able to stick the compressed image at the end of the buffer you will unpack to, and the output won't catch up til the end. Obviously that's not always the case but I have degzip.c in the same github folder as inflate.S which I use for stripping gzip headers and performing analytics. One analytic I plan to add is 'leeway' which is the number of bytes of overlap between input and output you can have, for this specific input stream, without having the output overrun the remaining input at any time. If you see what I mean

Should allow significant sharing of input and output buffer space in most cases.

Keir · 12 November 2014, 21:29

If anyone does the conversion to native syntax please contribute it back and I will make a proper repository for all these loader and tools bits and pieces.

Lonewolf10 · 12 November 2014, 21:53

Quote:

Originally Posted by Leffmann

Thanks, I don't think there's any other open-source M68K implementation of Inflate anywhere, so it's great to have a solid reference available. (is it for vasm with standard syntax module, or GAS?)

I did write my own ASM for it last year, but didn't think my code worthy for general release. It's not that hard really, once you understand how the data is stored

(Compressed Image File Formats book by Miano is a great reference, combined with the official RFC1950 and RFC1951 docs)
Making the code compact and fast is the hard part - and that is where my code wouldn't rate highly right now.

Kaffer, does your code work with PNG files saved with Microsoft Paint? I find that my code sometimes works and sometimes falls over, but works perfectly for 16 colour PNG's saved with Personal Paint (from Windows XP, but running on Windows 2000!).

Edit: Great job on the coding Kaffer, much better than mine. It seems to be lacking the 4 filters though (applied to data before drawing image to screen for PNG images) - or is that PNG specific?

Keir · 12 November 2014, 22:23

My routine implements pure Inflate as specified in RFC 1951. These would commonly be embedded within a container format, e.g., my associated degzip.c does the trivial job of stripping/replacing gzip headers. The PNG format's filter stage would be implemented separately, they are not part of Inflate.

Lonewolf10 · 12 November 2014, 22:30

Ahh, ok.

My code was written specifically for PNG inflation, so I got a little confused. It's been over a year since I last looked at the RFC's.

phx · 14 November 2014, 07:45

Quote:

Originally Posted by kaffer

EDIT: I also plan to help out with inflate-in-place (or as close as possible).

That's certainly useful, as RAM in classic Amigas is always small.

Quote:

I have degzip.c in the same github folder as inflate.S which I use for stripping gzip headers and performing analytics.

Ah, yes, there's a header in gzipped files. That explains why I couldn't unzip such a file again with your inflate routine.

Are there example files for testing, without a header, anywhere?

Quote:

Originally Posted by kaffer

If anyone does the conversion to native syntax please contribute it back and I will make a proper repository for all these loader and tools bits and pieces.

I did it. But it was not as easy as I hoped. The biggest problem was to convert all the digit labels, which are referenced with Nf and Nb, into normal local labels without introducing a bug. And preprocessor macros with an argument, like LOOKUP_BYTES(). I just made three equates from it.

The following source assembles fine with Devpac, AsmOne, PhxAss, vasm, Barfly, ... so it should be pretty portable. I'm not sure if it works, because I didn't have a test file at hand.

The four options can be overwritten on the command line by "-DOPT_xyz=n", for those assemblers which support that. With vasm I'm getting 646 bytes for the smallest code and 928 bytes for the most optimised.

Keir · 14 November 2014, 11:26

Quote:

Originally Posted by phx

Ah, yes, there's a header in gzipped files. That explains why I couldn't unzip such a file again with your inflate routine.

Are there example files for testing, without a header, anywhere?

Well you could compile and run it through degzip.c with a little modification (since it stamps in its own header at the moment, sizes and CRCs, for my test harness).
EDIT: No need to modify degzip.c any more, it will DTRT by default.

Quote:

I did it. But it was not as easy as I hoped. The biggest problem was to convert all the digit labels, which are referenced with Nf and Nb, into normal local labels without introducing a bug. And preprocessor macros with an argument, like LOOKUP_BYTES(). I just made three equates from it.

The following source assembles fine with Devpac, AsmOne, PhxAss, vasm, Barfly, ... so it should be pretty portable. I'm not sure if it works, because I didn't have a test file at hand.

Thank you very much!

I believe I have phxass installed in a UAE partition somewhere, so I will give it a whirl. I should be able to diff the binary against the GAS version in fact.

Quote:

The four options can be overwritten on the command line by "-DOPT_xyz=n", for those assemblers which support that. With vasm I'm getting 646 bytes for the smallest code and 928 bytes for the most optimised.

Perfect, and those are the correct sizes! (I tweaked the code a little bit more after my announcement).

I will be back with a thumbs up very soon...

Keir · 14 November 2014, 13:11

Thumbs up! The only difference in generated code is benign assembler interference (generating ADD vs ADDI).

I have put the code up in a new repository https://github.com/keirf/Amiga-Stuff

Let me know if you'd like a credit in the converted inflate.asm file!

If you want to actually use it, which I'm imagining you do, you will want to build the included degzip.c and run it like 'degzip in.gz out.raw' and then out.raw can be passed straight to inflate. degzip.c is GNU+Posix, but only lightly -- if you need to do any porting for your environment then please do send me the patches!

Keir · 14 November 2014, 15:32

I have added the abovementioned 'leeway' calculation to degzip.c, so that unpack can happen (almost) in place. How it works is: degzip prints a Leeway value (and writes it into its custom output header if you specify option -H). If you place the input stream at the far end of the output buffer *plus the leeway value* then inflate will work correctly without overwriting any input bytes before inflate has consumed them.

e.g., Deflate size is dflsz, Decompressed size is outsz. Then out = alloc(outsz + leeway), in = out + outsz + leeway - dflsz, read deflate stream to <in>, inflate(in, out).

leeway is typically very small (< 16 bytes). Nice

Photon · 14 November 2014, 22:40

Quote:

Originally Posted by kaffer

If anyone does the conversion to native syntax please contribute it back and I will make a proper repository for all these loader and tools bits and pieces.

Just saw you mention 'native syntax' twice and wonder what is meant by it?

Reason for asking is I don't remember seeing instructions such as jra (for example) in docs by Motorola or Freescale, but I could be wrong?

Thorham · 14 November 2014, 22:55

Quote:

Originally Posted by Photon

Reason for asking is I don't remember seeing instructions such as jra (for example) in docs by Motorola or Freescale, but I could be wrong?

For some reason they changed it. Seems silly. It's the thought process of: 'Everyone knows what these instruction are called, so we're gonna call'm something else.'. Seems quite useless.

Keir · 14 November 2014, 23:13

Quote:

Originally Posted by Photon

Just saw you mention 'native syntax' twice and wonder what is meant by it?

Reason for asking is I don't remember seeing instructions such as jra (for example) in docs by Motorola or Freescale, but I could be wrong?

I mean the syntax broadly accepted by all the well-known Amiga assemblers. Not so much opcode mnemonics (because as you say they are standardised) as things like macro syntax, conditionals syntax, $ for hex numbers, rs.[bwl], etc etc. All these things differ in a GAS+CPP build env.

And yes jra etc are non-standard, they address GAS's built-in relaxer which converts them to e.g., bra.[bw] as appropriate for the distance to the target.

Photon · 14 November 2014, 23:52

Thanks for the quick response

Yes, ofc it's a dynamic jcc, good to have as an assembler/compiler directive. So converting from GAS syntax to 68k assembler.

Since the thread is about decompression speed, I compared yours to mine (oh no!!

)

I compressed DPaint 3.25. Gzip -9 on Windows makes it 150559 bytes vs. 154788 for lha and 157156 for Nibbler 1.02. Your decompression speed is 5K/s slower than the best legacy decompressors (none of which reach trackloading speeds).

And Nibbler 1.02 is 6.55x faster than yours. Some optimizing still left to do I think! Muhahah...

Keir · 15 November 2014, 00:31

Quote:

Originally Posted by Photon

Since the thread is about decompression speed, I compared yours to mine (oh no!!

)

Lol yeah okay it is a bit about compression speed

Quote:

I compressed DPaint 3.25. Gzip -9 on Windows makes it 150559 bytes vs. 154788 for lha and 157156 for Nibbler 1.02. Your decompression speed is 5K/s slower than the best legacy decompressors (none of which reach trackloading speeds).

I'll have to think how I could get closer to LZX and Crunchmania. It's hard because Deflate is a little-endian bit stream and the required shifting ops suck hugely on a 68000. I may need to profile a bit more closely.... Hmmm also I only tested on UAE so far. I don't know how accurate its cycle-exact is against wallclock time. Maybe I should test on my real 500.
EDIT: Tested on a real unexpanded A600, and confirms the unpacking rate of ~23kB/s (23kB/s of input stream, that is).

Quote:

And Nibbler 1.02 is 6.55x faster than yours. Some optimizing still left to do I think! Muhahah...

That's a very awesome unpack speed!

EDIT: The results page states 68000 for the speed numbers, but no other test setup details. Is it a bog standard unexpanded 500, no fast ram? And are your stated speeds the rate at which you process the packed data, or the rate that the unpackers produce decompressed data?

Don_Adan · 16 November 2014, 16:27

A few optmised version.

Photon · 16 November 2014, 18:00

Quote:

Originally Posted by kaffer

EDIT: The results page states 68000 for the speed numbers, but no other test setup details. Is it a bog standard unexpanded 500, no fast ram? And are your stated speeds the rate at which you process the packed data, or the rate that the unpackers produce decompressed data?

Decompression speed means output rate, of course. In general, you have a bunch of files totalling n bytes. For either compression or decompression you're really only interested in how fast it compresses or decompresses the files, i.e. those n bytes. With the optimized version and measuring the output rate you will probably get a higher figure then!

Speed tests were run for the selected crunchers on 8 exe, graphics, and music files larger than 80KB. My timing routine was 1/100s precision, so the size was to avoid the rounding affecting results. For LZX I used an optimized source-code which may account for its nice result. In all cases, the depacker decompressed from fastram to fastram, with code running in fastram. Except in the case of LHA, where I didn't have the source and unpacked from file to RAM: with lots of buffers, which may account for its low score.

Measurements were done on my A500, GVP HD8+ SCSI interface with 2.3MB/s DMA and 8MB Zorro (24-bit) fastram.

With fastmem disabled, in the example of DPaint 3.25 (277616 bytes), the decompression speed of Nibbler from slowmem to slowmem on A500 gave a speed of 132198 bytes/s.

Basically, the speed is throttled by the speed at which new bytes can be written to RAM.

In fact, you can compare a decompressor with a naïve (non-unraveled) byte copy loop to get a figure to put on "decompression code overhead". Copying DPaint from slowmem to slowmem gave a rate of 182642 bytes/s, making Nibbler decompress 38% slower than this byte-copy loop, i.e. 38% decompression code overhead. On CPUs with instruction caches this figure would drop a little, but still be throttled by how well burst writes help with the byte output stream write to RAM.

A quick calculation puts LZX (the best competitor) at 658% decompression code overhead, so ...mmm... that chart still looks mighty fine to me!

Keir · 16 November 2014, 18:13

Quote:

Originally Posted by Don_Adan

A few optmised version.

Thanks, I'd spotted the LEA->ADD.w optimisations, but shortening the exit branch is nice to have too, shaves a few more cycles!

11 November 2014, 22:37	#1
Keir Registered User Join Date: May 2011 Location: Cambridge Posts: 682	Fast ZIP (Deflate) unpacking I have spent a few days implementing the standard 'Inflate' algorithm in 68000 asm. Deflate is the standard compression format embedded within ZIP/PKZIP, gzip and PNG formats. The benefits for me are that I can compress in my Linux cross-dev environment using 'gzip -9', which also has compression ratios typically a little better than the best regarded Amiga packers. I made a lot of effort to make the code size small (e.g. it could be used in a bootblock loader) but also fast for larger files. The code can be as small as 638 bytes, or if configured for max speed is still a reasonable 930 bytes while unpacking at around 23-27kB/s (ingest rate of compressed data; output rate is in the region of 50-70kB/s) on an unexpanded A500 (which compares very favourably with good Amiga native pack formats). It is about fast enough to fully keep up with a track loader. Anyway, public domain source is here: https://raw.githubusercontent.com/ke...late/inflate.S (GAS+CPP syntax) https://raw.githubusercontent.com/ke...te/inflate.asm (native Amiga asm syntax, thanks to phx for the conversion!). Last edited by Keir; 01 February 2019 at 13:19.

14 November 2014, 15:32	#12
Keir Registered User Join Date: May 2011 Location: Cambridge Posts: 682	I have added the abovementioned 'leeway' calculation to degzip.c, so that unpack can happen (almost) in place. How it works is: degzip prints a Leeway value (and writes it into its custom output header if you specify option -H). If you place the input stream at the far end of the output buffer plus the leeway value then inflate will work correctly without overwriting any input bytes before inflate has consumed them. e.g., Deflate size is dflsz, Decompressed size is outsz. Then out = alloc(outsz + leeway), in = out + outsz + leeway - dflsz, read deflate stream to <in>, inflate(in, out). leeway is typically very small (< 16 bytes). Nice Last edited by Keir; 14 November 2014 at 17:01.

Similar Threads
Thread	Thread Starter	Forum	Replies	Last Post
Amiga formatted zip disks via USB zip drive in WinUAE?	planetidiot	New to Emulation or Amiga scene	8	02 February 2018 08:43
System.zip / bad zip-file	Ztein	project.ClassicWB	24	22 April 2012 02:14
Use of 4MB PCMCIA Fast Flash Memory as Fast RAM in A1200	nkarytia	support.Hardware	10	16 September 2011 13:37
Added SIMM to ZIP adapter, now 16MB Fast RAM	tonyyeb	support.Hardware	18	01 September 2008 10:59
LZX unpacking???	Medvind	support.Apps	25	27 November 2002 12:33

12 November 2014, 12:42	#2
Leffmann Join Date: Jul 2008 Location: Sweden Posts: 2,269	Thanks, I don't think there's any other open-source M68K implementation of Inflate anywhere, so it's great to have a solid reference available. (is it for vasm with standard syntax module, or GAS?)

12 November 2014, 16:23	#3
phx Natteravn Join Date: Nov 2009 Location: Herford / Germany Posts: 2,496	vasm doesn't understand C preprocessor comments and #defines. But the source is short enough to convert it into standard Amiga assembler syntax. Thanks for the effort! Maybe it allows me to switch from Bytekiller to zip in my game projects.

12 November 2014, 21:11	#4
Keir Registered User Join Date: May 2011 Location: Cambridge Posts: 682	Yeah it's CPP + GAS. I won't insult you guys by proffering unrequested advice, the conversion to a native syntax should be straightforward. GAS has a special syntax for branches which direct its built-in relaxer (it only shortens branches and the like when told to) -- Jcc and JBSR turn to shortest version of BCC.[bw] and BSR.[bw] that reaches the target. Once you have it converted and building, you can optionally configure the four OPT_ options at the top of the file and then you 'bsr inflate' with arguments as described in the source. But, any trouble or questions at all just post here or send me a message. EDIT: I also plan to help out with inflate-in-place (or as close as possible). In many cases you should be able to stick the compressed image at the end of the buffer you will unpack to, and the output won't catch up til the end. Obviously that's not always the case but I have degzip.c in the same github folder as inflate.S which I use for stripping gzip headers and performing analytics. One analytic I plan to add is 'leeway' which is the number of bytes of overlap between input and output you can have, for this specific input stream, without having the output overrun the remaining input at any time. If you see what I mean Should allow significant sharing of input and output buffer space in most cases.

12 November 2014, 21:29	#5
Keir Registered User Join Date: May 2011 Location: Cambridge Posts: 682	If anyone does the conversion to native syntax please contribute it back and I will make a proper repository for all these loader and tools bits and pieces.

12 November 2014, 22:23	#7
Keir Registered User Join Date: May 2011 Location: Cambridge Posts: 682	My routine implements pure Inflate as specified in RFC 1951. These would commonly be embedded within a container format, e.g., my associated degzip.c does the trivial job of stripping/replacing gzip headers. The PNG format's filter stage would be implemented separately, they are not part of Inflate.

12 November 2014, 22:30	#8
Lonewolf10 AMOS Extensions Developer Join Date: Jun 2007 Location: near Cambridge, UK Age: 44 Posts: 1,924	Ahh, ok. My code was written specifically for PNG inflation, so I got a little confused. It's been over a year since I last looked at the RFC's.

14 November 2014, 13:11	#11
Keir Registered User Join Date: May 2011 Location: Cambridge Posts: 682	Thumbs up! The only difference in generated code is benign assembler interference (generating ADD vs ADDI). I have put the code up in a new repository https://github.com/keirf/Amiga-Stuff Let me know if you'd like a credit in the converted inflate.asm file! If you want to actually use it, which I'm imagining you do, you will want to build the included degzip.c and run it like 'degzip in.gz out.raw' and then out.raw can be passed straight to inflate. degzip.c is GNU+Posix, but only lightly -- if you need to do any porting for your environment then please do send me the patches!

14 November 2014, 23:52	#16
Photon Moderator Join Date: Nov 2004 Location: Eksjö / Sweden Posts: 5,602	Thanks for the quick response Yes, ofc it's a dynamic jcc, good to have as an assembler/compiler directive. So converting from GAS syntax to 68k assembler. Since the thread is about decompression speed, I compared yours to mine (oh no!! ) I compressed DPaint 3.25. Gzip -9 on Windows makes it 150559 bytes vs. 154788 for lha and 157156 for Nibbler 1.02. Your decompression speed is 5K/s slower than the best legacy decompressors (none of which reach trackloading speeds). And Nibbler 1.02 is 6.55x faster than yours. Some optimizing still left to do I think! Muhahah...

Currently Active Users Viewing This Thread: 1 (0 members and 1 guests)