English Amiga Board


Go Back   English Amiga Board > Coders > Coders. Asm / Hardware

 
 
Thread Tools
Old 16 April 2017, 18:44   #521
ross
Omnia fert aetas

ross's Avatar
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 48
Posts: 1,234
Quote:
Originally Posted by paraj View Post
After spending way too much time on this silly project, I think I'm finally done creating odd LZ-variants And this might actually be useful for situations where larger packers won't fit (I'm telling myself...)

Playing around with various parameters, I've settled on the following word oriented format: 16-bits of control/mode bits (MSB->LSB, 1=literal 0=match). Literals are 16-bit and matches are coded with 11-bits for the offset, 5 bits for length-1. A zero match word signals end of data).

42-byte inner loop can't quite match the best of Ross', but the extra instructions gave improvements on my test data. Of course adding more instructions gave better results, but this is what I ended up with.
Hi paraj, I too came to the conclusion that 5 bits per length were the best.
Quite good and fast format (for the autoimposed limits) but sure you haven't missed http://eab.abime.net/showpost.php?p=...&postcount=471?

Code:
            1.024 bootblock.bin
              877 bootblock.bin.lz32b
              922 bootblock.bin.wlz.ref
           68.708 copchunky.exe
           49.147 copchunky.exe.lz32b
           51.482 copchunky.exe.wlz.ref
          449.132 simple.bin
           15.893 simple.bin.lz32b
           16.884 simple.bin.wlz.ref
          164.344 twister.exe
           64.063 twister.exe.lz32b
           66.028 twister.exe.wlz.ref
[copchunky.exe (-c11), simple.bin (-c9), other (-c10, default)]
Mixed 8/16bit packed data and 32byte decoder:
http://eab.abime.net/showpost.php?p=...&postcount=480

Bye!
ross
ross is offline  
Old 16 April 2017, 22:21   #522
paraj
Registered User

 
Join Date: Feb 2017
Location: Denmark
Posts: 78
Quote:
Originally Posted by ross View Post
Hi paraj, I too came to the conclusion that 5 bits per length were the best.
Quite good and fast format (for the autoimposed limits) but sure you haven't missed http://eab.abime.net/showpost.php?p=...&postcount=471?

Code:
            1.024 bootblock.bin
              877 bootblock.bin.lz32b
              922 bootblock.bin.wlz.ref
           68.708 copchunky.exe
           49.147 copchunky.exe.lz32b
           51.482 copchunky.exe.wlz.ref
          449.132 simple.bin
           15.893 simple.bin.lz32b
           16.884 simple.bin.wlz.ref
          164.344 twister.exe
           64.063 twister.exe.lz32b
           66.028 twister.exe.wlz.ref
[copchunky.exe (-c11), simple.bin (-c9), other (-c10, default)]
Mixed 8/16bit packed data and 32byte decoder:
http://eab.abime.net/showpost.php?p=...&postcount=480

Bye!
ross
No, but I wanted my own creation And after messing with different lz4 inspired variations I found that I while it was easy to beat your implementation in compression ratio, I just couldn't do it while keeping the byte count down (without imposing too many encoder restrictions). So I ended up scraping it all and doing a fully word-oriented encoder so it would at least have the benefit of being novel . BTW except for data that compresses poorly (e.g. bootblock.bin) I found that encoding most literal runs just wasn't worth it.

FWIW my final byte-oriented coder performs as follows:

Quote:
Writing 58584 bytes (58 KB) to test_data\twister.exe.lz4b2
Writing 46517 bytes (46 KB) to test_data\copchunky.exe.lz4b2
Writing 887 bytes (1 KB) to test_data\bootblock.bin.lz4b2
Writing 6576 bytes (7 KB) to test_data\simple.bin.lz4b2
But the decoder is a massive (!) 78 bytes (excluding the saving/restoring of registers):

Code:
        ; a0 = dest
        ; a1 = src
unpack:
        movem.l d2-d3/a2, -(a7)
.refill:
        moveq   #8, d0          ; make sure upper bits of d0 are clear
.getcontrolbits:
        move.w  d0, d1          ; d1 = number of control bits (8 normally, less on short run)
        move.b  (a1)+, d0       ; d0 = control bits
        bne.s   .mainloop       ; special case?
        move.b  (a1)+, d0       ; yes load another byte
        beq.s   .mainloop       ; the control bits just happened to be 0
        cmp.b   d0, d1          ; See if this is a short run/the end or a literal run
        beq.s   .quit		; =8 -> quit
        bcc.s	.getcontrolbits	; <8 -> the byte specified the number of control bits, d0 is copied to d1 and refilled (with a value guaranteed not to be 0)
        addq.w  #6, d0          ; >8 -> literal run length - 7, add 6 (for dbf)
.litcopy:
	move.b	(a1)+, (a0)+
	dbf	d0, .litcopy
        bra.s	.refill
.mainloop:
        ; d0.b = control bits
        ; d1.b = num control bits
        subq.b  #1, d1          ; we consume one control bit
        bmi.s   .refill         ; are we out of bits?
        add.b   d0, d0          ; get control bit
        bcc.s   .match          ; clear -> match
        move.b  (a1)+, (a0)+    ; copy literal
        bra.s   .mainloop       ; and continue loop
.match:
        moveq   #-1, d2         ; d2 = !!!!!!!!!!!!!!!! !!!!!!!!!!!!!!!!
        move.b  (a1)+, d2       ; d2 = !!!!!!!!!!!!!!!! ObOaO9O8L3L2L1L0
        move.b  d2, d3          ; d3 = ???????????????? ObOaO9O8L3L2L1L0
        lsl.w   #4, d2          ; d2 = !!!!!!!!ObOaO9O8 L3L2L1L0........
        move.b  (a1)+, d2       ; d2 = !!!!!!!!ObOaO9O8 O7O6O5O4O3O2O1O0
        lea     (a0,d2.w), a2   ; a2 = &out[-1-offset]
        move.b  (a2)+, (a0)+    ; copy first byte
        moveq   #$f, d2         ; d2 = ................ ........!!!!!!!!
        and.w   d2, d3          ; d3 = ................ ........L3L2L1L0
        bne.s   .copyloop
        move.b  (a2)+, (a0)+    ; copy another byte
        move.b  (a1)+, d3       ; get extra length
        add.w   d2, d3          ; + $f (and with one extra iteration from dbf and the above copies we get +$12 (18))
.copyloop:
        move.b  (a2)+, (a0)+    ; copy the rest of the match
        dbf     d3, .copyloop
        bra.s   .mainloop       ; handle next lz value
.quit:
        movem.l (a7)+, d2-d3/a2 ; restore registers
        rts                     ; and return
But it was fun trying to compete Now I guess it's back to doing something "productive" while using other peoples packers
paraj is offline  
Old 17 April 2017, 16:12   #523
ross
Omnia fert aetas

ross's Avatar
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 48
Posts: 1,234
Quote:
Originally Posted by paraj View Post
No, but I wanted my own creation And after messing with different lz4 inspired variations I found that I while it was easy to beat your implementation in compression ratio, I just couldn't do it while keeping the byte count down (without imposing too many encoder restrictions). So I ended up scraping it all and doing a fully word-oriented encoder so it would at least have the benefit of being novel . BTW except for data that compresses poorly (e.g. bootblock.bin) I found that encoding most literal runs just wasn't worth it.
lzb32 was only a proof of concept: the smallest, effective, pretty fast LZ depacker.
It's useful for two thing: bootblock packing (surprisingly good!) and code prepacking/deduplication.
My real (unborn) packer, a sort of titanics replacer, is overbloated, so a prepacker was a necessity.

I assure you that i've (mine) better compressors algorithms

Quote:
FWIW my final byte-oriented coder performs as follows:
But the decoder is a massive (!) 78 bytes (excluding the saving/restoring of registers):


Quote:
But it was fun trying to compete Now I guess it's back to doing something "productive" while using other peoples packers
You are right

Bye!
ross
ross is offline  
Old 02 May 2017, 10:24   #524
litwr
Registered User

 
Join Date: Mar 2016
Location: Ozherele
Posts: 103
This message was misplaced. Please delete it. It was for the other thread.

Last edited by litwr; 02 May 2017 at 10:33. Reason: it was a wrong place for this message.
litwr is offline  
Old 22 September 2018, 21:43   #525
Blueberry
Registered User
 
Join Date: Dec 2007
Location: Aarhus / Denmark
Posts: 33
Wow, a long discussion about compression, and I wasn't there to take part in it!

Anyway, reading through the posts, I found my name mentioned by @ross:

Quote:
Originally Posted by ross View Post
@Blueberry: i'm reading now Your last comment on Pouet!
"Interesting to see Shrinkler used for a bootblock. In my own experiments, it did not beat my simple bootblock cruncher (which does not crunch as well but has a much smaller decruncher). Could be because the code is written to be good for the simple one. I see from the sizes that you have some huge repetitions, in which case Shrinkler is definitely better."
I'm very curious about "the simple one". Your code is surely a must.
OK, here's the decrunch code I have used for the last handful of bootblocks (68020 version, 38 bytes; 68000 version is 2 bytes bigger):

Code:
Decrunch:
	; Source in A0, destination in A1
	move.b	(a0)+,d0
.literal:
	move.b	(a0)+,(a1)+
	bsr.b	GetBit
	bcc.b	.literal
	bsr.b	GetBit
	bcc.b	.ref
	moveq.l	#-1,d1
	move.b	(a0)+,d1
	beq.b	End
.ref:
	move.b	(a1,d1.w*2),(a1)+
	bsr.b	GetBit
	bcs.b	.ref
	bra.b	.literal

GetBit:
	add.b	d0,d0
	bne.b	.same
	move.b	(a0)+,d0
	addx.b	d0,d0
.same:
	; C = bit
End:	rts
The compressed data consists of a bit stream and a byte stream, interleaved with each other according to when the decruncher needs to read a byte from the byte stream or the next byte from the bit stream. This requires a somewhat steady hand in the encoder, but it works out.

The first byte of the bit stream contains only 7 bits of data. The least significant bit is always 1. This is used as a marker bit by the bit stream decoder to know when it has decoded all bits of the current byte. For subsequent bytes, the marker bit is inserted by the decoder, so all 8 bits can be used.

The first symbol is always a literal. After that, bits from the bit stream indicate what follows:

0: Literal. Copy one byte from the byte stream.
1 0: Match with repeated offset.
1 1: Match with new offset. Read a byte from the byte stream. Zero means end of data. Otherwise, multiply by two and one-extend to a word. Thus, all even offsets up to 510 can be represented (680x0 code has mostly even-offset matches).

For both kinds of match, the length is unary coded in the bit stream: a match of length k is coded as k-1 one bits followed by a zero bit.

A match is always followed by a literal, so the literal/match indicator bit is omitted in this case.


To compare the cruncher to the lz32b algorithm discussed here, I ran some of my bootblocks through the lz32b compressor.

The bootblocks of course contain a bit more overhead than just this code (12 byte bootblock header + some code to set up the pointers, flush the cache and call the decompressed code). So the actual payload (which is what is compressed here) compress to around 950-960 bytes using my algorithm.

These are the sizes I got (all were compressed with 10 bit offsets, which was the best setting in all cases):

Space Onion: 1364 -> 1046
BeelzeBub's Beat-Block: 1336 -> 1081
Swirl: 1280 -> 1042
Fluid Fire: 1388 -> 1076
Splatblob: 1186 -> 1055

As can be seen, my method beats lz32b by quite a significant margin (around 100 bytes), so it is well worth the slightly bigger decrunch code for a 1k target size.

The comparison is of course not entirely fair, since these particular payloads were tuned to compress well using my cruncher. But I still think it would have an edge in most practical cases, as long as the important matches are within the 510 bytes maximum distance (which can usually be achieved with some tweaking).

I have attached a zip containing the payload files in case anyone wants to experiment.
Attached Files
File Type: zip bootblocks.zip (5.5 KB, 10 views)

Last edited by Blueberry; 22 September 2018 at 22:32.
Blueberry is offline  
Old 22 September 2018, 22:35   #526
Blueberry
Registered User
 
Join Date: Dec 2007
Location: Aarhus / Denmark
Posts: 33
Another data point:

The bootblock.bin data file that @paraj posted crunches to 790 bytes.
Blueberry is offline  
Old 22 September 2018, 22:38   #527
ross
Omnia fert aetas

ross's Avatar
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 48
Posts: 1,234
Quote:
Originally Posted by Blueberry View Post
As can be seen, my method beats lz32b by quite a significant margin (around 100 bytes), so it is well worth the slightly bigger decrunch code for a 1k target size.
Ok, you code is a must like i've guessed, I had no doubt about it
But this cruncher is on another level: match with repeated offset and the even offset make too much difference.
There is a thread somewhere about a modified aplib packer: it simply beat everything I thrown at it because use repeated offset
(logically not counting for entropy encoder or cm but compared to other bare and fast lz, near 100kb/s on A500).

I simply tried to write the smallest depacker possible with maximum compression for that size and 32 byte seemed to me a great value to reach

Thanks for the code and the description!


PS: I always try to read what you write about the algorithms you build.
The last one in the checkboard challenge is simply awesome, I had to read it 3 times to understand it
ross is offline  
 


Currently Active Users Viewing This Thread: 1 (0 members and 1 guests)
 
Thread Tools

Similar Threads
Thread Thread Starter Forum Replies Last Post
Starting ASM coding on A1200. Which Assembler? Nosferax Coders. Asm / Hardware 68 27 November 2015 17:14
4th tutorial on ASM- and HW-coding Vikke Coders. Asm / Hardware 11 10 April 2013 21:32
3rd tutorial on ASM- and HW-coding Vikke Coders. Asm / Hardware 6 26 March 2013 16:57
First tutorial on ASM- and HW-coding Vikke Coders. Asm / Hardware 46 18 March 2013 13:33
2nd tutorial on ASM- and HW-coding Vikke Coders. Asm / Hardware 10 17 March 2013 12:49

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT +2. The time now is 02:13.


Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2018, vBulletin Solutions Inc.
Page generated in 0.07688 seconds with 14 queries