English Amiga Board


Old 14 November 2017, 23:38   #1
ross
Per aspera ad astra

ross's Avatar
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 49
Posts: 1,823
Real time 68000 unpacker test

Requesting help comparing Amiga data unpackers (in 68000 assembler).

I'm asking all the great coders/crackers in this section because it is probably very easy for you to insert the right 68k code into the snippet below.
To get a fair result, we must all use the same minimal 68000 setup: a real base A500 or similar, or a Quickstart A500/A500+/A600 configuration (fully cycle-exact) in WinUAE.
Packer speed is not significant for this test; an offline packer can be used.

Only realtime decompressors. How to define "realtime"?
A decompressor that does not slow down the floppy load.
The Amiga has a 250 kbit/s capable floppy interface, so a loader that sustains 22.5 KB/s is good.
Considering that a good compressor can pack data down to about 45% of the original size, >50 KB/s (22.5 / 0.45) of sustained data written to memory can be considered real-time on the A500.
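
The threshold can be checked with a few lines of arithmetic. This is only a sketch in Python (it is not part of the test harness, and the function name is made up for illustration). It reads the 45% figure as "packed size is about 45% of the original", which is the reading that reproduces the 50 KB/s number:

```python
def required_unpack_rate(load_kb_s: float, packed_fraction: float) -> float:
    """KB/s of unpacked output needed to keep up with the disk loader.

    load_kb_s       -- sustained rate of packed data coming from disk (KB/s)
    packed_fraction -- packed size / original size, in (0, 1]
    """
    if not 0.0 < packed_fraction <= 1.0:
        raise ValueError("packed_fraction must be in (0, 1]")
    # Every packed KB expands to 1/packed_fraction KB of output.
    return load_kb_s / packed_fraction


if __name__ == "__main__":
    # 22.5 KB/s loader, data packed to 45% of its original size
    print(f"{required_unpack_rate(22.5, 0.45):.1f} KB/s")
```

With a worse ratio (say 55% of the original size) the same loader only needs about 41 KB/s, so 50 KB/s is a cut-off on the safe side.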

Another big difference is the data to be compressed.
I have randomly selected files of different sizes from various games (probably some of you will recognize them..).
They are highly compressible (this too is by chance), but we can change them or add others!
If I was not clear, feel free to ask.
I could do it myself, but I do not know if I have all the right compressors, variants and versions.
And the comments of other coders are very important.

For my part, I will provide results with the alpha version of the new compressor announced here: http://eab.abime.net/showpost.php?p=...6&postcount=13
The aim is to find the (relatively) best compressor (Pareto frontier) in a semi-scientific way.

Code:
; code speed tester by ross for EAB members *

; exec
_LVOOldOpenLibrary	equ     -408
_LVOCloseLibrary	equ	-414
_LVORawDoFmt		equ	-522
;dos
_LVOWrite		equ     -48
_LVOOutput		equ     -60

	section	code,code

start	movea.l	$4.w,a6
	move.l	a6,d4			; d4=execbase
	lea	$dff09a,a5		; a5=intena
	lea	$bfd800,a4		; a4=ciab tod (base)
	lea	buffer,a3		; a3=PutChData (after DataStream init)
	move.l	a3,d5			; d5=DataStream
	lea	stuffChar(pc),a2	; a2=PutChProc

	lea	dosname(pc),a1
	jsr	_LVOOldOpenLibrary(a6)
	move.l	d0,d7			; d7=dosbase
	beq.b 	exit
	movea.l d0,a6
	jsr 	_LVOOutput(a6)
	move.l 	d0,d6 			; d6=CLI handle
	beq.b 	close			; make sure we have a CLI handle

	move.w	#$4000,(a5)		; disable
.bwait	move.w	2-$9a(a5),d0		; blitter wait
	add.w	d0,d0
	bmi.b	.bwait

	; if you need to init/move/setup packed data
	jsr	init

	bsr.b	ciabtod			; start time
	move.l	d0,d2	

	jsr	unpack			; GO!

	bsr.b	ciabtod			; end time
	move.w	#$c000,(a5)		; enable
	sub.l	d2,d0			; elapsed

					; 15734 for NTSC
	divu.w	#15625*256/1000,d0	; d0=ms (pal_hfreq*scale_down/granularity)
	bvs.b	close			; overflow?, something over specs..
	bne.b	.inrng
	moveq	#1,d0			; 1ms minimum (also avoid division by zero)
.inrng	move.w	d0,(a3)+		; arg1, elapsed time, 65535ms max else not rt

	move.l	d1,(a3)+		; arg2, unpacked raw data
	divu.w	d0,d1
;	bvs.b	close			; overflow?!
	move.w	d1,(a3)+		; arg3, integer KB/sec
	swap	d1
	mulu.w	#100,d1			; scale fraction
	divu.w	d0,d1
	move.w	d1,(a3)+		; arg4, fractional KB/sec (not KiB!)
	

	movea.l	d4,a6			; execbase
	movea.l	d5,a1			; args
	lea	text(pc),a0		; format string
	;a2=stuffChar
	;a3=buffer
	jsr	_LVORawDoFmt(a6)

	movea.l	d7,a6			;dosbase
	move.l	d6,d1 			;CLI handle
	move.l	a3,d2			;text buffer
	moveq	#66,d3			;len
	jsr	_LVOWrite(a6)

close	movea.l	d4,a6	;execbase
	movea.l d7,a1	;dosbase
	jsr	_LVOCloseLibrary(a6)
exit	moveq	#0,d0
	rts

stuffChar
	move.b  d0,(a3)+        	;Put data to output string
	rts

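ciabtod	move.b	$200(a4),d0		; TOD bits 23-16 (reading the high byte latches the counter)
	swap	d0
	move.b	$100(a4),d0		; TOD bits 15-8
	lsl.w	#8,d0
	move.b	(a4),d0			; TOD bits 7-0 (reading the low byte releases the latch)
	lsl.l	#8,d0			; scale up (counter wrap proof), also drops the junk in bits 31-24
	rts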


dosname	dc.b	"dos.library",0
text	dc.b	"Elapsed: %d ms, data: %ld bytes, speed: %d,%d KB/s",$a,0


	section	bss,bss

buffer	ds.w	1	; time elapsed
	ds.l	1	; unpacked data length
	ds.w	1	; speed.i
	ds.w	1	; speed.f
	ds.b	66	; PutChData



	section	unpack,code_c

	; preserve all registers but d0/d1/a0/a1
	; input:  a0=data stream source, a1=destination
	; output: d1= unpacked data length

init	; insert your init code here (not timed)
	rts

unpack	; insert your unpack code here (a500 is the target)
	move.l	#2000000,d1
.loop	move.w	d1,$dff180	
	subq.l	#1,d1
	bne.b	.loop
	move.l	#200000,d1
	rts
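
For anyone who wants to double-check the numbers the harness prints, here is a rough Python model of its arithmetic. It is a sketch under assumptions, not the harness itself: the function name and the way ticks are passed in are made up, and it only mirrors the integer maths (`lsl.l #8`, the two `divu.w`, the `mulu.w #100`).

```python
PAL_HFREQ = 15625    # CIA-B TOD ticks per second on PAL (one tick per scanline)
NTSC_HFREQ = 15734   # value from the harness comment for NTSC


def harness_report(start_tod, end_tod, nbytes, hfreq=PAL_HFREQ):
    """Return (ms, kb_int, kb_frac) like the harness, or None on divu overflow."""
    # ciabtod returns the 24-bit TOD value shifted left by 8, so a plain
    # 32-bit subtraction stays correct even if the counter wrapped.
    t0 = (start_tod & 0xFFFFFF) << 8
    t1 = (end_tod & 0xFFFFFF) << 8
    elapsed = (t1 - t0) & 0xFFFFFFFF

    divisor = hfreq * 256 // 1000        # 4000 on PAL (integer, as the assembler evaluates it)
    ms = elapsed // divisor
    if ms > 0xFFFF:                      # first divu.w overflows: harness jumps to close
        return None
    ms = max(ms, 1)                      # 1 ms minimum

    kb_int, rem = divmod(nbytes, ms)     # bytes per ms == KB/s (1 KB = 1000 bytes)
    kb_frac = rem * 100 // ms            # two fractional digits
    return ms, kb_int, kb_frac


if __name__ == "__main__":
    print(harness_report(0, 15750, 170782))   # 15750 ticks = 1008 ms on PAL
```

Two things worth knowing when reading results: the format string prints the fraction with a plain `%d`, so a fractional part below 10 is shown without a leading zero, and the overflow check on the second division is commented out in the listing, so a very fast result (above 65535 KB/s) would not be caught.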
Thanks.
Attached Files
File Type: zip test_files.zip (335.6 KB, 51 views)
Old 15 November 2017, 19:09   #2
WayneK
Registered User
 
Join Date: May 2004
Location: Somewhere secret
Age: 45
Posts: 290
These are my results using my packer of choice (UPX, NRV2B algorithm) with your test harness (all packed sizes include a 4-byte signature):

Code:
80croc.def (ORIG: 170,782 bytes, PACKED: 18,393 bytes)
Elapsed: 1008 ms, data: 170782 bytes, speed: 169,42 KB/s

BLOX1.DAT (ORIG: 112,384 bytes, PACKED: 43,099 bytes)
Elapsed: 1093 ms, data: 112384 bytes, speed: 102,82 KB/s

jp2_000 (ORIG: 267,264 bytes, PACKED: 126,193 bytes)
Elapsed: 3049 ms, data: 267264 bytes, speed: 87,65 KB/s

jp2_001 (ORIG: 34,468 bytes, PACKED: 17,976 bytes)
Elapsed: 392 ms, data: 34468 bytes, speed: 87,92 KB/s

jp2_002 (ORIG: 51,100 bytes, PACKED: 22,504 bytes)
Elapsed: 525 ms, data: 51100 bytes, speed: 97,33 KB/s

MAIN.BIN (ORIG: 118,784 bytes, PACKED: 64,681 bytes)
Elapsed: 1548 ms, data: 118784 bytes, speed: 76,73 KB/s

Zombies.SHP (ORIG: 245,720 bytes, PACKED: 46,941 bytes)
Elapsed: 1685 ms, data: 245720 bytes, speed: 145,82 KB/s
Hope this helps, looking forward to seeing a better/faster packer from you
Old 15 November 2017, 19:56   #3
ross
Thanks WayneK.

Speed for raw deflate (gzip -9 + degzip):
Code:
see tables that follow in subsequent messages
I was surprised that gzip could be real time on the A500!

[EDIT: from this very first test I prefer NRV2B for speed reasons;
the inflate unpacker is also a bit demanding in memory and code space]

Last edited by ross; 17 November 2017 at 18:29.
Old 15 November 2017, 23:49   #4
ross
Original aplib (appack raw) added:

Code:
see tables that follow in subsequent messages
Interesting, aplib is on par with UPX in compression ratio but much slower.
NRV2B seems to be a serious competitor!

Quote:
Originally Posted by WayneK View Post
Hope this helps, looking forward to seeing a better/faster packer from you
I'll try to finish a beta test version soon

Last edited by ross; 17 November 2017 at 18:28.
Old 17 November 2017, 18:26   #5
ross
Beta version of 'noname' completed.

Code:
file    80croc.def              BLOX1.DAT               jp2_000                 jp2_001     
dim.    170782                  112384                  267264                  34468       
                                                
        pack_to ratio   KB/s    pack_to ratio   KB/s    pack_to ratio   KB/s    pack_to ratio   KB/s
gzip    20291   11,88%  151,26  41904   37,29%  63,92   126252  47,24%  55,48   17104   49,62%  52,70
appack  19351   11,33%  103,62  43060   38,32%  63,92   126722  47,41%  48,62   17634   51,16%  44,13
nrv2b   18389   10,77%  169,42  43095   38,35%  102,82  126189  47,22%  87,65   17972   52,14%  87,92
noname  17604   10,31%  180,34  42288   37,63%  112,40  124762  46,68%  99,76   17401   50,48%  91,42
                                                
                                                
file    jp2_002                 MAIN.BIN                Zombies.SHP                 
dim.    51100                   118784                  245720                  
                                                
        pack_to ratio   KB/s    pack_to ratio   KB/s    pack_to ratio   KB/s            
gzip    21555   42,18%  56,90   63371   53,35%  49,80   52415   21,33%  91,51           
appack  22037   43,13%  49,18   64976   54,70%  42,49   49293   20,06%  84,46           
nrv2b   22500   44,03%  97,33   64677   54,45%  76,73   46937   19,10%  145,82          
noname  21732   42,53%  100,00  64226   54,07%  89,40   46779   19,04%  154,73
Results are better than expected
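
To relate the table to the ">50 KB/s" real-time criterion from the first post, here is a small Python sketch that lists which packer/file pairs fall below it. The KB/s values are copied from the table above; the function name and the data layout are just illustrative choices.

```python
FILES = ["80croc.def", "BLOX1.DAT", "jp2_000", "jp2_001",
         "jp2_002", "MAIN.BIN", "Zombies.SHP"]

# KB/s per packer, in the same file order as FILES (values from the table)
KB_S = {
    "gzip":   [151.26, 63.92, 55.48, 52.70, 56.90, 49.80, 91.51],
    "appack": [103.62, 63.92, 48.62, 44.13, 49.18, 42.49, 84.46],
    "nrv2b":  [169.42, 102.82, 87.65, 87.92, 97.33, 76.73, 145.82],
    "noname": [180.34, 112.40, 99.76, 91.42, 100.00, 89.40, 154.73],
}


def below_realtime(threshold=50.0):
    """Map each packer to the files it unpacks at <= threshold KB/s."""
    return {
        packer: [name for name, speed in zip(FILES, speeds) if speed <= threshold]
        for packer, speeds in KB_S.items()
    }


if __name__ == "__main__":
    for packer, slow in below_realtime().items():
        print(packer, "->", slow if slow else "always above threshold")
```

On these seven files, nrv2b and noname stay above the threshold everywhere, gzip misses it only on MAIN.BIN, and appack misses it on four files.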

Waiting for comments.
Old 17 November 2017, 19:41   #6
WayneK
Awesome, already the best Amiga packer?!

Seems like you improved it a lot in a short space of time, do you ever sleep?
Old 17 November 2017, 20:20   #7
ross
Quote:
Originally Posted by WayneK View Post
Awesome, already the best Amiga packer?!
We need more evidence

Quote:
Seems like you improved it a lot in a short space of time, do you ever sleep?
Not too much actually, but not because of 'noname'

Well, the unpacker actually contains ideas I had a long time ago.
I designed the bitstream with the 68k in mind, not the other way round.
The code I've written is certainly intriguing and amusing

A great guy from another board immediately understood the potential and adapted his awesome optimal encoder.

The result is in the table.

Last edited by ross; 17 November 2017 at 20:49. Reason: :)
Old 15 December 2017, 17:28   #8
Trachu
Registered User
 
Join Date: Dec 2015
Location: Poland
Posts: 189
Hi Ross

Currently the fastest decompressor for the 68000 is LZ4W. While it beats everything in terms of speed (over 600 KB/s), its compression ratio is weak compared to yours.

80croc.def - 58674 Bytes
Blox1.dat - 57980
jp2_000 - 176452
jp2_001 - 34468
jp2_002 - 51100
main.bin - 93898
zombies.shp - 94430

As you can see, it is nowhere near your results, but its speed comes from operating only on words instead of bytes. I think your compressor relies on bytes, which means you could make a version of it with double the speed, which could be handy considering your high compression ratio.
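
To put the LZ4W sizes on the same scale as the earlier tables, the ratios can be computed like this. This is a quick Python sketch: the original sizes are taken from the "dim." rows posted earlier in the thread, and the helper name is arbitrary.

```python
ORIGINAL = {
    "80croc.def": 170782, "BLOX1.DAT": 112384, "jp2_000": 267264,
    "jp2_001": 34468, "jp2_002": 51100, "MAIN.BIN": 118784,
    "Zombies.SHP": 245720,
}

# Packed sizes as listed in this post
LZ4W = {
    "80croc.def": 58674, "BLOX1.DAT": 57980, "jp2_000": 176452,
    "jp2_001": 34468, "jp2_002": 51100, "MAIN.BIN": 93898,
    "Zombies.SHP": 94430,
}


def ratios(packed, original):
    """Packed size as a fraction of the original size, per file."""
    return {name: packed[name] / original[name] for name in original}


if __name__ == "__main__":
    for name, r in ratios(LZ4W, ORIGINAL).items():
        print(f"{name:12s} {r * 100:6.2f}%")
```

Note that the sizes listed for jp2_001 and jp2_002 are identical to the original sizes, so they come out as 100%. The post does not say whether those two files really did not compress or whether the numbers were simply carried over.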

Keep up the good work.
Old 17 December 2017, 21:40   #9
ross
Quote:
Originally Posted by Trachu View Post
Hi Ross

Currently the fastest decompressor for the 68000 is LZ4W. While it beats everything in terms of speed (over 600 KB/s), its compression ratio is weak compared to yours.
Hi Trachu, I know LZ4/W and it is really fast!
But too weak in compression ratio for my taste

Quote:
As you can see, it is nowhere near your results, but its speed comes from operating only on words instead of bytes. I think your compressor relies on bytes, which means you could make a version of it with double the speed, which could be handy considering your high compression ratio.
Yes, I rely on bytes (and bits, nibbles and two different gamma encodings).
Unfortunately it is not so simple to increase the speed while maintaining that compression ratio.
Of course you could try a "word" version, but I really think that it would not double the speed and you would lose at least 10% in compression ratio.


Quote:
Keep up the good work.
Thanks
Old 18 December 2017, 20:05   #10
Trachu
Quote:
Originally Posted by ross View Post
Hi Trachu, I know LZ4/W and it is really fast!
But too weak in compression ratio for my taste

Yes, I rely on bytes (and bits, nibbles and two different gamma encodings).
Unfortunately it is not so simple to increase the speed while maintaining that compression ratio.
Of course you could try a "word" version, but I really think that it would not double the speed and you would lose at least 10% in compression ratio.

Thanks
I am not saying you should change what you already have, as it is blazing fast for this compression ratio. What I am saying is: make an additional, super-fast version. I'd say 10% for 100% speed is worth it
You created a nice piece of original code. Don't think of your code as single-purpose, just for files. Think wider, for example about in-game use.

That's my 2c
Old 19 December 2017, 14:12   #11
britelite
Registered User

 
Join Date: Feb 2010
Location: Espoo / Finland
Posts: 575
Quote:
Originally Posted by Trachu View Post
What I am saying is: make an additional, super-fast version. I'd say 10% for 100% speed is worth it
Except that working with words instead of bytes doesn't automatically make things twice the speed, as ross already pointed out.
Old 14 July 2019, 22:53   #12
ross
I finally have my own version of the nrv2x family of algorithms (I called it nrv2s).
It packs similarly to nrv2b, because it is heavily based on it, but with a different bit coding.

With a *fast* 68k unpacker and support for in-place decompression!
Files are cross-packed on Windows.

Quote:
Originally Posted by WayneK View Post
nrv2b:
Code:
Zombies.SHP (ORIG: 245,720 bytes, PACKED: 46,941 bytes)
Elapsed: 1685 ms, data: 245720 bytes, speed: 145,82 KB/s
My new collected values:
Code:
Zombies.SHP (ORIG: 245,720 bytes, PACKED: 46,902 bytes)
Elapsed: 1408 ms, data: 245720 bytes, speed: 174,51 KB/s
So:
Code:
file: Zombies.SHP                 
dim:  245720                  
                                                
        pack_to  ratio    KB/s            
gzip    52415   21,331%   91,51           
appack  49293   20,061%   84,46           
nrv2b   46941   19,103%  145,82          
aplibx  46779   19,038%  154,73          
nrv2s   46902   19,088%  174,51
note: aplibx is a never-finished appack-derived code (its figures here are the same as 'noname' in the earlier table)..

Of course a single file is not significant at all; this is just to give an idea

A very significant property is the native support for in-place decompression.
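
Since the stated aim of the thread is a Pareto frontier, here is how the Zombies.SHP rows sort out when "better" means smaller packed size and higher KB/s. This is a Python sketch with the numbers copied from the table above; the function is a generic dominance check, not anything taken from the packers.

```python
def pareto_front(entries):
    """entries: {name: (packed_size, kb_per_s)}.

    A packer is dominated if another one is at least as good on both
    axes (size <= and speed >=) and strictly better on at least one.
    """
    front = []
    for name, (size, speed) in entries.items():
        dominated = any(
            o_size <= size and o_speed >= speed
            and (o_size < size or o_speed > speed)
            for other, (o_size, o_speed) in entries.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


ZOMBIES = {
    "gzip":   (52415, 91.51),
    "appack": (49293, 84.46),
    "nrv2b":  (46941, 145.82),
    "aplibx": (46779, 154.73),
    "nrv2s":  (46902, 174.51),
}

if __name__ == "__main__":
    print(pareto_front(ZOMBIES))
```

For this one file only aplibx (smallest output) and nrv2s (fastest) survive; nrv2b is beaten on both axes by either of them. As said, one file proves nothing, but the same check can be run over the full table.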
The output is like this:
Code:
compressing file "zombies.shp"...

upx raw data start at 520, 33496 tokens found
33486 tokens used for the new encoding, 375176 raw bits
refactoring for in-place decompression
33485 in-place tokens, 1 from stack, 375207 raw bits

input size: 245720 bytes
output size: 46902 bytes
load encoded file at buffer offset 0x000308A2

execution time : 1.795 s
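
One detail that can be read straight off this log: the reported load offset equals input size minus output size, i.e. the packed file sits flush against the end of the destination buffer. A tiny Python check follows (a sketch only; the thread does not say whether the packer always uses exactly this rule or sometimes needs extra margin, so treat the helper as an assumption):

```python
def inplace_load_offset(unpacked_size: int, packed_size: int) -> int:
    """Offset that places the packed data at the very end of the output buffer."""
    if packed_size > unpacked_size:
        raise ValueError("packed data larger than the output buffer")
    return unpacked_size - packed_size


if __name__ == "__main__":
    # sizes from the log above: 245720 bytes unpacked, 46902 bytes packed
    print(hex(inplace_load_offset(245720, 46902)))
```

This reproduces the 0x000308A2 from the log (245720 - 46902 = 198818).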
Old 15 July 2019, 01:20   #13
alpine9000
Registered User

 
Join Date: Mar 2016
Location: Australia
Posts: 684
Quote:
Originally Posted by ross View Post
I finally have my own version of the nrv2x family of algorithms (I called it nrv2s).
It packs similarly to nrv2b, because it is heavily based on it, but with a different bit coding.

With a *fast* 68k unpacker and support for in-place decompression!
Files are cross-packed on Windows.



Sounds amazing, are you planning on releasing it?
Old 15 July 2019, 04:59   #14
Bruce Abbott
Registered User

Bruce Abbott's Avatar
 
Join Date: Mar 2018
Location: Hastings, New Zealand
Posts: 235
Quote:
Originally Posted by ross View Post
Files are cross-packed on Windows.
Could they be packed on an Amiga?
Old 15 July 2019, 06:45   #15
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 51
Posts: 1,140
Quote:
Originally Posted by ross View Post
I finally have my own version of the nrv2x family of algorithms (I called it nrv2s).
It packs similarly to nrv2b, because it is heavily based on it, but with a different bit coding.

With a *fast* 68k unpacker and support for in-place decompression!
Files are cross-packed on Windows.



Do you have results from Arj 7? And Pack Fire? Both versions, LZMA and fast.
Old 15 July 2019, 18:47   #16
ross
Quote:
Originally Posted by Don_Adan View Post
Do you have results from Arj 7? And Pack Fire? Both versions, LZMA and fast.
Hi Don, I'll certainly run some tests (not today) vs Arjm7, Pack Fire (both), ProPack and Shrinkler.
Well, they are actually not directly comparable because of the very different algorithms and speed classes.
But surely interesting.
Old 15 July 2019, 18:48   #17
ross
Quote:
Originally Posted by Bruce Abbott View Post
Could they be packed on an Amiga?
Sorry, no.
Old 15 July 2019, 19:14   #18
ross
Quote:
Originally Posted by alpine9000 View Post
Sounds amazing, are you planning on releasing it?
It's certainly not a compressor for the public: it is not generic and is only usable by 68k hardcoders (and by those who know how to use it).
Actually I've made it for A500 low-memory productions, where speed, high compression and in-place decompression are a necessity.
(well, on 020+ it flies )

But yes, after a tough test phase, why not.

For now I'm making the decompressor public; the bit format is now in its final version.
If you find similarities with the nrv2b source, it is because that is the skeleton I started from, but if you look closely at the two you will notice the not-so-subtle differences.

It can be useful, it contains many small (and not so small..) non-trivial optimizations.


Code:
; nrv2s decompression in pure 68k asm
; by ross
;
; On entry:
;	a0	src packed data pointer
;	a1	dest pointer
; (decompress from a0 to a1)
;
; On exit:
;	a0 = dest start
;	a1 = dest end
;
; Register usage:
;	a2	m_pos
;	a3	constant: -$d00
;	a4	2nd src pointer (in stack)
;
;	d0	bit buffer
;	d1	m_off
;	d2	m_len or -1
;
;	d3	last_m_off
;	d4	constant: 2
;	d5	reserved space on stack (max 256)
;
;
; Notes:
;	we have max_offset = 2^23, so we can use some word arithmetics on d1
;	we have max_match = 65535, so we can use word arithmetics on d2
;

nrv2s_ross_unpack
		movem.l	d0-d5/a1-a4,-(sp)

		move.b	(a0)+,d0				; ~stack usage
		moveq	#-2,d5
		and.b	d0,d5
		lea	(sp),a4
		adda.l	d5,sp					; reserve space
_stk	move.b	(a0)+,-(a4)
		addq.b	#1,d0
		bne.b	_stk

; ------------- setup constants -----------

		moveq	#-$80,d0				; d0.b = $80 (byte refill flag)
		moveq	#-1,d2
		moveq	#-1,d3					; last_off = -1
		moveq	#2,d4
		movea.w	#-$d00,a3

; ------------- DECOMPRESSION -------------

decompr_literal
		move.b	(a0)+,(a1)+

decompr_loop
		add.b	d0,d0
		bcc.b	decompr_match
		bne.b	decompr_literal
		move.b	(a0)+,d0
		addx.b	d0,d0
		bcs.b	decompr_literal

decompr_match
		moveq	#-2,d1
decompr_gamma_1
		add.b	d0,d0
		bne.b	_g_1
		move.b	(a0)+,d0
		addx.b	d0,d0
_g_1	addx.w	d1,d1					; max 2^23!

		add.b	d0,d0
		bcc.b	decompr_gamma_1
		bne.b	decompr_select
		move.b	(a0)+,d0
		addx.b	d0,d0
		bcc.b	decompr_gamma_1

decompr_select
		addq.w	#3,d1
		beq.b	decompr_get_mlen		; last m_off
		bpl.b	decompr_exit_token
		lsl.l	#8,d1
		move.b	(a0)+,d1
		move.l	d1,d3					; last_m_off = m_off

decompr_get_mlen						; implicit d2 = -1
		add.b	d0,d0
		bne.b	_e_1
		move.b	(a0)+,d0
		addx.b	d0,d0

_e_1	addx.w	d2,d2
		add.b	d0,d0
		bne.b	_e_2
		move.b	(a0)+,d0
		addx.b	d0,d0

_e_2	addx.w	d2,d2

		lea		(a1,d3.l),a2
		addq.w	#2,d2
		bgt.b 	decompr_gamma_2  

decompr_tiny_mlen
		move.l	d3,d1
		sub.l	a3,d1
		addx.w	d4,d2

L_copy2	move.b	(a2)+,(a1)+
L_copy1	move.b	(a2)+,(a1)+
		dbra	d2,L_copy1
L_rep	bra.b	decompr_loop

decompr_gamma_2							; implicit d2 = 1
		add.b	d0,d0
		bne.b	_g_2
		move.b	(a0)+,d0
		addx.b	d0,d0
_g_2	addx.w  d2,d2
		add.b	d0,d0
		bcc.b	decompr_gamma_2
		bne.b	decompr_large_mlen
		move.b	(a0)+,d0
		addx.b	d0,d0
		bcc.b	decompr_gamma_2

decompr_large_mlen
		move.b	(a2)+,(a1)+
		move.b	(a2)+,(a1)+
		cmp.l   a3,d3
		bcs.b   L_copy2
		move.b	(a2)+,(a1)+
		dbra	d2,L_copy1

decompr_exit_token
		lea	(a4),a0
		bclr	d2,d2					; ;)
		bne.b	L_rep
		
		suba.l  d5,sp
		movem.l	(sp)+,d0-d5/a0/a2-a4
		rts
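
For readers less used to 68k flag tricks, the recurring `add.b d0,d0` / `bne` / `move.b (a0)+,d0` / `addx.b d0,d0` sequence is a one-byte bit buffer with a sentinel bit. Below is a rough Python model of just that bit reader. It is a sketch for illustration: the class and method names are made up, and it ignores that the real stream interleaves literal and offset bytes with the bit bytes on the same pointer.

```python
class SentinelBitReader:
    """Models d0.b as used by the decompressor: data bits followed by a 1 sentinel."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0
        self.buf = 0x80              # moveq #-$80,d0: no data bits, sentinel in bit 7

    def get_bit(self) -> int:
        carry = self.buf >> 7                    # add.b d0,d0 -> C = X = old bit 7
        self.buf = (self.buf << 1) & 0xFF
        if self.buf == 0:                        # only the sentinel was left: refill
            byte = self.data[self.pos]           # move.b (a0)+,d0 (X is untouched, still 1)
            self.pos += 1
            # addx.b d0,d0: shift the new byte left and bring X in as the new sentinel
            self.buf = ((byte << 1) | carry) & 0xFF
            carry = byte >> 7                    # C = first real bit of the new byte
        return carry


if __name__ == "__main__":
    reader = SentinelBitReader(bytes([0xB2]))
    print([reader.get_bit() for _ in range(8)])  # bits of 0xB2, MSB first
```

In `decompr_loop` a 1 bit selects a literal and a 0 bit a match. The nice part of the trick is that the common case costs one `add.b` plus one branch, and the refill test comes for free from the zero flag.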
Old 15 July 2019, 22:34   #19
alpine9000
Quote:
Originally Posted by ross View Post
It's certainly not a compressor for the public: it is not generic and is only usable by 68k hardcoders (and by those who know how to use it).
Actually I've made it for A500 low-memory productions, where speed, high compression and in-place decompression are a necessity.
(well, on 020+ it flies )

But yes, after a tough test phase, why not.

For now I'm making the decompressor public; the bit format is now in its final version.
If you find similarities with the nrv2b source, it is because that is the skeleton I started from, but if you look closely at the two you will notice the not-so-subtle differences.

It can be useful, it contains many small (and not so small..) non-trivial optimizations.


I’d definitely be interested in using it for my games in place of RNC “in place”
Old 15 July 2019, 22:38   #20
mcgeezer
Registered User

 
Join Date: Oct 2017
Location: Sunderland, England
Posts: 1,247
Quote:
Originally Posted by alpine9000 View Post
I’d definitely be interested in using it for my games in place of RNC “in place”
Same here! Would love to try it!