![]() |
![]() |
#1 |
Registered User
Join Date: May 2011
Location: Cambridge
Posts: 676
|
Fast ZIP (Deflate) unpacking
I have spent a few days implementing the standard 'Inflate' algorithm in 68000 asm. Deflate is the standard compression format embedded within ZIP/PKZIP, gzip and PNG formats. The benefits for me are that I can compress in my Linux cross-dev environment using 'gzip -9', which also has compression ratios typically a little better than the best regarded Amiga packers.
I made a lot of effort to make the code size small (e.g. it could be used in a bootblock loader) but also fast for larger files. The code can be as small as 638 bytes, or if configured for max speed is still a reasonable 930 bytes while unpacking at around 23-27kB/s (ingest rate of compressed data; output rate is in the region of 50-70kB/s) on an unexpanded A500 (which compares very favourably with good Amiga native pack formats). It is about fast enough to fully keep up with a track loader. Anyway, public domain source is here: https://raw.githubusercontent.com/ke...late/inflate.S (GAS+CPP syntax) https://raw.githubusercontent.com/ke...te/inflate.asm (native Amiga asm syntax, thanks to phx for the conversion!). Last edited by Keir; 01 February 2019 at 13:19. |
![]() |
![]() |
#2 |
Join Date: Jul 2008
Location: Sweden
Posts: 2,268
|
Thanks, I don't think there's any other open-source M68K implementation of Inflate anywhere, so it's great to have a solid reference available. (is it for vasm with standard syntax module, or GAS?)
|
![]() |
![]() |
#3 |
Natteravn
![]() Join Date: Nov 2009
Location: Herford / Germany
Posts: 2,347
|
vasm doesn't understand C preprocessor comments and #defines. But the source is short enough to convert it into standard Amiga assembler syntax.
Thanks for the effort! Maybe it allows me to switch from Bytekiller to zip in my game projects. ![]() |
![]() |
![]() |
#4 |
Registered User
Join Date: May 2011
Location: Cambridge
Posts: 676
|
Yeah it's CPP + GAS. I won't insult you guys by proffering unrequested advice, the conversion to a native syntax should be straightforward. GAS has a special syntax for branches which direct its built-in relaxer (it only shortens branches and the like when told to) -- Jcc and JBSR turn to shortest version of BCC.[bw] and BSR.[bw] that reaches the target.
Once you have it converted and building, you can optionally configure the four OPT_ options at the top of the file and then you 'bsr inflate' with arguments as described in the source. But, any trouble or questions at all just post here or send me a message. EDIT: I also plan to help out with inflate-in-place (or as close as possible). In many cases you should be able to stick the compressed image at the end of the buffer you will unpack to, and the output won't catch up til the end. Obviously that's not always the case but I have degzip.c in the same github folder as inflate.S which I use for stripping gzip headers and performing analytics. One analytic I plan to add is 'leeway' which is the number of bytes of overlap between input and output you can have, for this specific input stream, without having the output overrun the remaining input at any time. If you see what I mean ![]() |
![]() |
![]() |
#5 |
Registered User
Join Date: May 2011
Location: Cambridge
Posts: 676
|
If anyone does the conversion to native syntax please contribute it back and I will make a proper repository for all these loader and tools bits and pieces.
|
![]() |
![]() |
#6 | |
AMOS Extensions Developer
Join Date: Jun 2007
Location: near Cambridge, UK
Age: 43
Posts: 1,924
|
Quote:
![]() Making the code compact and fast is the hard part - and that is where my code wouldn't rate highly right now. Kaffer, does your code work with PNG files saved with Microsoft Paint? I find that my code sometimes works and sometimes falls over, but works perfectly for 16 colour PNG's saved with Personal Paint (from Windows XP, but running on Windows 2000!). Edit: Great job on the coding Kaffer, much better than mine. It seems to be lacking the 4 filters though (applied to data before drawing image to screen for PNG images) - or is that PNG specific? Last edited by Lonewolf10; 12 November 2014 at 22:15. |
|
![]() |
![]() |
#7 |
Registered User
Join Date: May 2011
Location: Cambridge
Posts: 676
|
My routine implements pure Inflate as specified in RFC 1951. These would commonly be embedded within a container format, e.g., my associated degzip.c does the trivial job of stripping/replacing gzip headers. The PNG format's filter stage would be implemented separately, they are not part of Inflate.
|
![]() |
![]() |
#8 |
AMOS Extensions Developer
Join Date: Jun 2007
Location: near Cambridge, UK
Age: 43
Posts: 1,924
|
Ahh, ok.
My code was written specifically for PNG inflation, so I got a little confused. It's been over a year since I last looked at the RFC's. |
![]() |
![]() |
#9 | |||
Natteravn
![]() Join Date: Nov 2009
Location: Herford / Germany
Posts: 2,347
|
Quote:
Quote:
![]() Are there example files for testing, without a header, anywhere? Quote:
The following source assembles fine with Devpac, AsmOne, PhxAss, vasm, Barfly, ... so it should be pretty portable. I'm not sure if it works, because I didn't have a test file at hand. The four options can be overwritten on the command line by "-DOPT_xyz=n", for those assemblers which support that. With vasm I'm getting 646 bytes for the smallest code and 928 bytes for the most optimised. |
|||
![]() |
![]() |
#10 | |||
Registered User
Join Date: May 2011
Location: Cambridge
Posts: 676
|
Quote:
EDIT: No need to modify degzip.c any more, it will DTRT by default. Quote:
![]() Quote:
I will be back with a thumbs up very soon... Last edited by Keir; 14 November 2014 at 17:44. |
|||
![]() |
![]() |
#11 |
Registered User
Join Date: May 2011
Location: Cambridge
Posts: 676
|
Thumbs up! The only difference in generated code is benign assembler interference (generating ADD vs ADDI).
I have put the code up in a new repository https://github.com/keirf/Amiga-Stuff Let me know if you'd like a credit in the converted inflate.asm file! ![]() If you want to actually use it, which I'm imagining you do, you will want to build the included degzip.c and run it like 'degzip in.gz out.raw' and then out.raw can be passed straight to inflate. degzip.c is GNU+Posix, but only lightly -- if you need to do any porting for your environment then please do send me the patches! |
![]() |
![]() |
#12 |
Registered User
Join Date: May 2011
Location: Cambridge
Posts: 676
|
I have added the abovementioned 'leeway' calculation to degzip.c, so that unpack can happen (almost) in place. How it works is: degzip prints a Leeway value (and writes it into its custom output header if you specify option -H). If you place the input stream at the far end of the output buffer *plus the leeway value* then inflate will work correctly without overwriting any input bytes before inflate has consumed them.
e.g., Deflate size is dflsz, Decompressed size is outsz. Then out = alloc(outsz + leeway), in = out + outsz + leeway - dflsz, read deflate stream to <in>, inflate(in, out). leeway is typically very small (< 16 bytes). Nice ![]() Last edited by Keir; 14 November 2014 at 17:01. |
![]() |
![]() |
#13 | |
Moderator
![]() Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,374
|
Quote:
Reason for asking is I don't remember seeing instructions such as jra (for example) in docs by Motorola or Freescale, but I could be wrong? |
|
![]() |
![]() |
#14 |
Computer Nerd
![]() Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 46
Posts: 3,412
|
For some reason they changed it. Seems silly. It's the thought process of: 'Everyone knows what these instruction are called, so we're gonna call'm something else.'. Seems quite useless.
|
![]() |
![]() |
#15 | |
Registered User
Join Date: May 2011
Location: Cambridge
Posts: 676
|
Quote:
And yes jra etc are non-standard, they address GAS's built-in relaxer which converts them to e.g., bra.[bw] as appropriate for the distance to the target. |
|
![]() |
![]() |
#16 |
Moderator
![]() Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,374
|
Thanks for the quick response
![]() ![]() Since the thread is about decompression speed, I compared yours to mine (oh no!! ![]() I compressed DPaint 3.25. Gzip -9 on Windows makes it 150559 bytes vs. 154788 for lha and 157156 for Nibbler 1.02. Your decompression speed is 5K/s slower than the best legacy decompressors (none of which reach trackloading speeds). And Nibbler 1.02 is 6.55x faster than yours. Some optimizing still left to do I think! Muhahah... ![]() |
![]() |
![]() |
#17 | |||
Registered User
Join Date: May 2011
Location: Cambridge
Posts: 676
|
Quote:
![]() Quote:
EDIT: Tested on a real unexpanded A600, and confirms the unpacking rate of ~23kB/s (23kB/s of input stream, that is). Quote:
![]() EDIT: The results page states 68000 for the speed numbers, but no other test setup details. Is it a bog standard unexpanded 500, no fast ram? And are your stated speeds the rate at which you process the packed data, or the rate that the unpackers produce decompressed data? Last edited by Keir; 15 November 2014 at 13:33. |
|||
![]() |
![]() |
#18 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 54
Posts: 1,580
|
A few optmised version.
|
![]() |
![]() |
#19 | |
Moderator
![]() Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,374
|
Quote:
![]() Speed tests were run for the selected crunchers on 8 exe, graphics, and music files larger than 80KB. My timing routine was 1/100s precision, so the size was to avoid the rounding affecting results. For LZX I used an optimized source-code which may account for its nice result. In all cases, the depacker decompressed from fastram to fastram, with code running in fastram. Except in the case of LHA, where I didn't have the source and unpacked from file to RAM: with lots of buffers, which may account for its low score. Measurements were done on my A500, GVP HD8+ SCSI interface with 2.3MB/s DMA and 8MB Zorro (24-bit) fastram. With fastmem disabled, in the example of DPaint 3.25 (277616 bytes), the decompression speed of Nibbler from slowmem to slowmem on A500 gave a speed of 132198 bytes/s. Basically, the speed is throttled by the speed at which new bytes can be written to RAM. In fact, you can compare a decompressor with a naïve (non-unraveled) byte copy loop to get a figure to put on "decompression code overhead". Copying DPaint from slowmem to slowmem gave a rate of 182642 bytes/s, making Nibbler decompress 38% slower than this byte-copy loop, i.e. 38% decompression code overhead. On CPUs with instruction caches this figure would drop a little, but still be throttled by how well burst writes help with the byte output stream write to RAM. A quick calculation puts LZX (the best competitor) at 658% decompression code overhead, so ...mmm... that chart still looks mighty fine to me! ![]() Last edited by Photon; 16 November 2014 at 18:13. |
|
![]() |
![]() |
#20 |
Registered User
Join Date: May 2011
Location: Cambridge
Posts: 676
|
|
![]() |
Currently Active Users Viewing This Thread: 1 (0 members and 1 guests) | |
Thread Tools | |
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Amiga formatted zip disks via USB zip drive in WinUAE? | planetidiot | New to Emulation or Amiga scene | 8 | 02 February 2018 08:43 |
System.zip / bad zip-file | Ztein | project.ClassicWB | 24 | 22 April 2012 02:14 |
Use of 4MB PCMCIA Fast Flash Memory as Fast RAM in A1200 | nkarytia | support.Hardware | 10 | 16 September 2011 13:37 |
Added SIMM to ZIP adapter, now 16MB Fast RAM | tonyyeb | support.Hardware | 18 | 01 September 2008 10:59 |
LZX unpacking??? | Medvind | support.Apps | 25 | 27 November 2002 12:33 |
|
|