Anyone up for an ASM coding competition? - Page 27

ross · 16 April 2017, 17:44

Quote:

Originally Posted by paraj

After spending way too much time on this silly project, I think I'm finally done creating odd LZ-variants

And this might actually be useful for situations where larger packers won't fit (I'm telling myself...)

Playing around with various parameters, I've settled on the following word oriented format: 16-bits of control/mode bits (MSB->LSB, 1=literal 0=match). Literals are 16-bit and matches are coded with 11-bits for the offset, 5 bits for length-1. A zero match word signals end of data).

42-byte inner loop can't quite match the best of Ross', but the extra instructions gave improvements on my test data. Of course adding more instructions gave better results, but this is what I ended up with.

Hi paraj, I too came to the conclusion that 5 bits per length were the best.
Quite good and fast format (for the autoimposed limits) but sure you haven't missed http://eab.abime.net/showpost.php?p=...&postcount=471?

Code:

            1.024 bootblock.bin
              877 bootblock.bin.lz32b
              922 bootblock.bin.wlz.ref
           68.708 copchunky.exe
           49.147 copchunky.exe.lz32b
           51.482 copchunky.exe.wlz.ref
          449.132 simple.bin
           15.893 simple.bin.lz32b
           16.884 simple.bin.wlz.ref
          164.344 twister.exe
           64.063 twister.exe.lz32b
           66.028 twister.exe.wlz.ref

[copchunky.exe (-c11), simple.bin (-c9), other (-c10, default)]
Mixed 8/16bit packed data and 32byte decoder:
http://eab.abime.net/showpost.php?p=...&postcount=480

Bye!
ross

paraj · 16 April 2017, 21:21

Quote:

Originally Posted by ross

Hi paraj, I too came to the conclusion that 5 bits per length were the best.
Quite good and fast format (for the autoimposed limits) but sure you haven't missed http://eab.abime.net/showpost.php?p=...&postcount=471?

Code:

            1.024 bootblock.bin
              877 bootblock.bin.lz32b
              922 bootblock.bin.wlz.ref
           68.708 copchunky.exe
           49.147 copchunky.exe.lz32b
           51.482 copchunky.exe.wlz.ref
          449.132 simple.bin
           15.893 simple.bin.lz32b
           16.884 simple.bin.wlz.ref
          164.344 twister.exe
           64.063 twister.exe.lz32b
           66.028 twister.exe.wlz.ref

[copchunky.exe (-c11), simple.bin (-c9), other (-c10, default)]
Mixed 8/16bit packed data and 32byte decoder:
http://eab.abime.net/showpost.php?p=...&postcount=480

Bye!
ross

No, but I wanted my own creation

And after messing with different lz4 inspired variations I found that I while it was easy to beat your implementation in compression ratio, I just couldn't do it while keeping the byte count down (without imposing too many encoder restrictions). So I ended up scraping it all and doing a fully word-oriented encoder so it would at least have the benefit of being novel

. BTW except for data that compresses poorly (e.g. bootblock.bin) I found that encoding most literal runs just wasn't worth it.

FWIW my final byte-oriented coder performs as follows:

Quote:

Writing 58584 bytes (58 KB) to test_data\twister.exe.lz4b2
Writing 46517 bytes (46 KB) to test_data\copchunky.exe.lz4b2
Writing 887 bytes (1 KB) to test_data\bootblock.bin.lz4b2
Writing 6576 bytes (7 KB) to test_data\simple.bin.lz4b2

But the decoder is a massive (!) 78 bytes (excluding the saving/restoring of registers):

Code:

        ; a0 = dest
        ; a1 = src
unpack:
        movem.l d2-d3/a2, -(a7)
.refill:
        moveq   #8, d0          ; make sure upper bits of d0 are clear
.getcontrolbits:
        move.w  d0, d1          ; d1 = number of control bits (8 normally, less on short run)
        move.b  (a1)+, d0       ; d0 = control bits
        bne.s   .mainloop       ; special case?
        move.b  (a1)+, d0       ; yes load another byte
        beq.s   .mainloop       ; the control bits just happened to be 0
        cmp.b   d0, d1          ; See if this is a short run/the end or a literal run
        beq.s   .quit		; =8 -> quit
        bcc.s	.getcontrolbits	; <8 -> the byte specified the number of control bits, d0 is copied to d1 and refilled (with a value guaranteed not to be 0)
        addq.w  #6, d0          ; >8 -> literal run length - 7, add 6 (for dbf)
.litcopy:
	move.b	(a1)+, (a0)+
	dbf	d0, .litcopy
        bra.s	.refill
.mainloop:
        ; d0.b = control bits
        ; d1.b = num control bits
        subq.b  #1, d1          ; we consume one control bit
        bmi.s   .refill         ; are we out of bits?
        add.b   d0, d0          ; get control bit
        bcc.s   .match          ; clear -> match
        move.b  (a1)+, (a0)+    ; copy literal
        bra.s   .mainloop       ; and continue loop
.match:
        moveq   #-1, d2         ; d2 = !!!!!!!!!!!!!!!! !!!!!!!!!!!!!!!!
        move.b  (a1)+, d2       ; d2 = !!!!!!!!!!!!!!!! ObOaO9O8L3L2L1L0
        move.b  d2, d3          ; d3 = ???????????????? ObOaO9O8L3L2L1L0
        lsl.w   #4, d2          ; d2 = !!!!!!!!ObOaO9O8 L3L2L1L0........
        move.b  (a1)+, d2       ; d2 = !!!!!!!!ObOaO9O8 O7O6O5O4O3O2O1O0
        lea     (a0,d2.w), a2   ; a2 = &out[-1-offset]
        move.b  (a2)+, (a0)+    ; copy first byte
        moveq   #$f, d2         ; d2 = ................ ........!!!!!!!!
        and.w   d2, d3          ; d3 = ................ ........L3L2L1L0
        bne.s   .copyloop
        move.b  (a2)+, (a0)+    ; copy another byte
        move.b  (a1)+, d3       ; get extra length
        add.w   d2, d3          ; + $f (and with one extra iteration from dbf and the above copies we get +$12 (18))
.copyloop:
        move.b  (a2)+, (a0)+    ; copy the rest of the match
        dbf     d3, .copyloop
        bra.s   .mainloop       ; handle next lz value
.quit:
        movem.l (a7)+, d2-d3/a2 ; restore registers
        rts                     ; and return

But it was fun trying to compete

Now I guess it's back to doing something "productive" while using other peoples packers

ross · 17 April 2017, 15:12

Quote:

Originally Posted by paraj

No, but I wanted my own creation

And after messing with different lz4 inspired variations I found that I while it was easy to beat your implementation in compression ratio, I just couldn't do it while keeping the byte count down (without imposing too many encoder restrictions). So I ended up scraping it all and doing a fully word-oriented encoder so it would at least have the benefit of being novel

. BTW except for data that compresses poorly (e.g. bootblock.bin) I found that encoding most literal runs just wasn't worth it.

lzb32 was only a proof of concept: the smallest, effective, pretty fast LZ depacker.
It's useful for two thing: bootblock packing (surprisingly good!) and code prepacking/deduplication.
My real (unborn) packer, a sort of titanics replacer, is overbloated, so a prepacker was a necessity.

I assure you that i've (mine) better compressors algorithms

Quote:

FWIW my final byte-oriented coder performs as follows:
But the decoder is a massive (!) 78 bytes (excluding the saving/restoring of registers):

Quote:

But it was fun trying to compete

Now I guess it's back to doing something "productive" while using other peoples packers

You are right

Bye!
ross

litwr · 02 May 2017, 09:24

This message was misplaced. Please delete it. It was for the other thread.

Blueberry · 22 September 2018, 20:43

Wow, a long discussion about compression, and I wasn't there to take part in it!

Anyway, reading through the posts, I found my name mentioned by @ross:

Quote:

Originally Posted by ross

@Blueberry: i'm reading now Your last comment on Pouet!
"Interesting to see Shrinkler used for a bootblock. In my own experiments, it did not beat my simple bootblock cruncher (which does not crunch as well but has a much smaller decruncher). Could be because the code is written to be good for the simple one. I see from the sizes that you have some huge repetitions, in which case Shrinkler is definitely better."
I'm very curious about "the simple one". Your code is surely a must.

OK, here's the decrunch code I have used for the last handful of bootblocks (68020 version, 38 bytes; 68000 version is 2 bytes bigger):

Code:

Decrunch:
	; Source in A0, destination in A1
	move.b	(a0)+,d0
.literal:
	move.b	(a0)+,(a1)+
	bsr.b	GetBit
	bcc.b	.literal
	bsr.b	GetBit
	bcc.b	.ref
	moveq.l	#-1,d1
	move.b	(a0)+,d1
	beq.b	End
.ref:
	move.b	(a1,d1.w*2),(a1)+
	bsr.b	GetBit
	bcs.b	.ref
	bra.b	.literal

GetBit:
	add.b	d0,d0
	bne.b	.same
	move.b	(a0)+,d0
	addx.b	d0,d0
.same:
	; C = bit
End:	rts

The compressed data consists of a bit stream and a byte stream, interleaved with each other according to when the decruncher needs to read a byte from the byte stream or the next byte from the bit stream. This requires a somewhat steady hand in the encoder, but it works out.

The first byte of the bit stream contains only 7 bits of data. The least significant bit is always 1. This is used as a marker bit by the bit stream decoder to know when it has decoded all bits of the current byte. For subsequent bytes, the marker bit is inserted by the decoder, so all 8 bits can be used.

The first symbol is always a literal. After that, bits from the bit stream indicate what follows:

0: Literal. Copy one byte from the byte stream.
1 0: Match with repeated offset.
1 1: Match with new offset. Read a byte from the byte stream. Zero means end of data. Otherwise, multiply by two and one-extend to a word. Thus, all even offsets up to 510 can be represented (680x0 code has mostly even-offset matches).

For both kinds of match, the length is unary coded in the bit stream: a match of length k is coded as k-1 one bits followed by a zero bit.

A match is always followed by a literal, so the literal/match indicator bit is omitted in this case.

To compare the cruncher to the lz32b algorithm discussed here, I ran some of my bootblocks through the lz32b compressor.

The bootblocks of course contain a bit more overhead than just this code (12 byte bootblock header + some code to set up the pointers, flush the cache and call the decompressed code). So the actual payload (which is what is compressed here) compress to around 950-960 bytes using my algorithm.

These are the sizes I got (all were compressed with 10 bit offsets, which was the best setting in all cases):

Space Onion: 1364 -> 1046
BeelzeBub's Beat-Block: 1336 -> 1081
Swirl: 1280 -> 1042
Fluid Fire: 1388 -> 1076
Splatblob: 1186 -> 1055

As can be seen, my method beats lz32b by quite a significant margin (around 100 bytes), so it is well worth the slightly bigger decrunch code for a 1k target size.

The comparison is of course not entirely fair, since these particular payloads were tuned to compress well using my cruncher. But I still think it would have an edge in most practical cases, as long as the important matches are within the 510 bytes maximum distance (which can usually be achieved with some tweaking).

I have attached a zip containing the payload files in case anyone wants to experiment.

Blueberry · 22 September 2018, 21:35

Another data point:

The bootblock.bin data file that @paraj posted crunches to 790 bytes.

ross · 22 September 2018, 21:38

Quote:

Originally Posted by Blueberry

As can be seen, my method beats lz32b by quite a significant margin (around 100 bytes), so it is well worth the slightly bigger decrunch code for a 1k target size.

Ok, you code is a must like i've guessed, I had no doubt about it

But this cruncher is on another level: match with repeated offset and the even offset make too much difference.
There is a thread somewhere about a modified aplib packer: it simply beat everything I thrown at it because use repeated offset
(logically not counting for entropy encoder or cm but compared to other bare and fast lz, near 100kb/s on A500).

I simply tried to write the smallest depacker possible with maximum compression for that size and 32 byte seemed to me a great value to reach

Thanks for the code and the description!

PS: I always try to read what you write about the algorithms you build.
The last one in the checkboard challenge is simply awesome, I had to read it 3 times to understand it

02 May 2017, 09:24	#524
litwr Registered User Join Date: Mar 2016 Location: Ozherele Posts: 229	This message was misplaced. Please delete it. It was for the other thread. Last edited by litwr; 02 May 2017 at 09:33. Reason: it was a wrong place for this message.

Similar Threads
Thread	Thread Starter	Forum	Replies	Last Post
Starting ASM coding on A1200. Which Assembler?	Nosferax	Coders. Asm / Hardware	68	27 November 2015 16:14
4th tutorial on ASM- and HW-coding	Vikke	Coders. Asm / Hardware	11	10 April 2013 20:32
3rd tutorial on ASM- and HW-coding	Vikke	Coders. Asm / Hardware	6	26 March 2013 15:57
First tutorial on ASM- and HW-coding	Vikke	Coders. Asm / Hardware	46	18 March 2013 12:33
2nd tutorial on ASM- and HW-coding	Vikke	Coders. Asm / Hardware	10	17 March 2013 11:49

22 September 2018, 21:35	#526
Blueberry Registered User Join Date: Dec 2007 Location: Aarhus / Denmark Posts: 40	Another data point: The bootblock.bin data file that @paraj posted crunches to 790 bytes.

Currently Active Users Viewing This Thread: 1 (0 members and 1 guests)