Fastest depacker - Page 3

Galahad/FLT · 09 July 2019, 01:32

Quote:

Originally Posted by ross

You're cheating! Nah, I'm kidding

If there is no other way this is the right one.
I look forward to the release

There is no other way; it is quite an ingeniously set-up protection.

So Disk 1 is part interrupt loading format and part standard MFM, with $1900 size tracks.

Disk 2 is entirely interrupt loading format, and that format is like this.

So the interrupt disk format uses two different sync marks, $4489 and $4522, but the data is structured not as one complete $1900, but as two big "sectors" of $c80 bytes.

So the first part of the track is $4489 and $c80 worth of data, then the second part of that track is $4522 and also $c80 worth of data.

The problem is that the programmer only loads in $c80-sized chunks when the interrupt loader is running, which means he has set aside a REALLY small track buffer to decode MFM data, neatly positioned in between code and onscreen graphics. When he wants to load the next half of the track, he just swaps the SYNC and it will load the next part. Of course, an AmigaDOS $4489-style track cannot be decoded in such a tiny memory space, and there is literally NO room in 512k in which to do it.

The disk format is also quite large, $FA000 per disk, so even if I could find the room, I wouldn't be able to fit all the data on the disks and keep it as two like the original.
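
The numbers above can be cross-checked with a little arithmetic. The sketch below is only an illustration: it assumes a standard 80-cylinder, two-sided disk (160 tracks), which the post does not state, and it uses the usual rule that MFM stores two raw bits per data bit, so decoding N data bytes needs roughly 2N bytes of raw buffer (sync words and headers ignored). All names are made up for this example.

```python
# Cross-check of the disk-format figures quoted in the post.
# Assumptions (not stated in the post): 80 cylinders x 2 heads = 160 tracks,
# and MFM doubling (2 raw bytes per decoded byte, sync/header overhead ignored).

HALF_TRACK = 0xC80          # one "big sector" behind each sync mark
TRACK = 2 * HALF_TRACK      # payload of a full custom track
TRACKS_PER_DISK = 80 * 2    # assumed geometry

ADOS_TRACK = 11 * 512       # standard AmigaDOS DD track: 11 sectors of 512 bytes


def raw_mfm_bytes(data_bytes: int) -> int:
    """Approximate raw MFM buffer needed to decode `data_bytes` of payload."""
    return 2 * data_bytes


custom_disk = TRACK * TRACKS_PER_DISK
ados_disk = ADOS_TRACK * TRACKS_PER_DISK

print(f"track payload      : ${TRACK:X}")
print(f"custom disk        : ${custom_disk:X}")
print(f"AmigaDOS disk      : ${ados_disk:X}")
print(f"half-track MFM buf : ${raw_mfm_bytes(HALF_TRACK):X}")
print(f"AmigaDOS MFM buf   : ${raw_mfm_bytes(ADOS_TRACK):X} or more")
```

Under these assumptions, two $c80 halves give the $1900 track and 160 such tracks give exactly $FA000, against $DC000 for a standard AmigaDOS disk. That gap is why the data cannot simply be re-saved in AmigaDOS format and still fit on two disks.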

Lots of sneaky stuff: if you try to bypass his interrupt loader entirely, it will miss stuff that is set up in memory for other routines, which will then crash. There is lots of self-modifying encryption (the weakest part of the protection) and checksums (again, strangely weaker than the disk protection). If you try to mess with the interrupt loader, it does stuff like seek to track 83, which is all kinds of fun the first time it happens!

So now I've had to change the interrupt loader system to a multi-loader. I have to preserve chip memory to extra memory so I can load the different parts and then restore that used memory. It was quite the headache to figure out a system that wouldn't fall foul of the programmer's stuff trying to trip me up.

Very competent protection, and a few schoolboy errors by me along the way didn't help either!

ross · 09 July 2019, 09:02

Very detailed description, thanks!

Quote:

Originally Posted by Galahad/FLT

[--cut--]
So interrupt disk format is two different sync marks, $4489 and $4522, but the data is structured not as one complete $1900, but as two big "sectors" of $c80 bytes.
[--cut--]
The disk format is also quite large, $FA000 per disk,
[--cut--]

This somehow reminds me of WotD: same $FA000 per disk, same net $1900 per track, similar macro-sector split ($640x4),
but fortunately a big MFM buffer, the same sync and no tight IRQ loader (so one track load per rotation).

Quote:

Lots of sneaky stuff: if you try to bypass his interrupt loader entirely, it will miss stuff that is set up in memory for other routines, which will then crash. There is lots of self-modifying encryption (the weakest part of the protection) and checksums (again, strangely weaker than the disk protection). If you try to mess with the interrupt loader, it does stuff like seek to track 83, which is all kinds of fun the first time it happens!
[--cut--]
Very competent protection

WayneK · 09 July 2019, 10:51

Quote:

Originally Posted by ross

I quote myself.

Beta version for the new nrv2b 68k data compression handler ready.
Only a beta because my idea is to add some facilities for in-place decompression.
But would require a modification of the bit-coding..

I'm sure that WayneK would like to try it

I can confirm this

Been busy on other stuff (c64, playing some CTFs), but you can never leave Amigaaaaa!

jarre · 15 July 2019, 17:55

I love reading these kinds of stories, and it's perfectly clear to me now why we were never able to crack this back in the day. I recognize some things about the disk format and the buffer between the code and the graphics, but I'm sure we couldn't have cracked this kind of impressive code anyway. I love to see that this kind of thing is possible after so many years. A big, big cheers to all involved in this project!

Tigerskunk · 14 August 2019, 12:31

I'd need something that encodes on Mac OS X (I build all my stuff in a shell script there), and decodes on the Amiga (of course).

Seems LZ4 would be the way to go, but I'd also love to have a good compression rate (which LZ4 doesn't seem to have?)

leonard · 24 August 2019, 11:33

If anyone is interested in a fast LZ4 68k depacker, you can use one of my three versions (tiny, normal and fast) here:

https://github.com/arnaud-carre/lz4-68k

If you need an extreme packing ratio and you don't care about decompression time, then use Shrinkler.
If you need a very good packing ratio with average depacking speed (about 15KiB/s, the same speed as floppy loading), use ARJ mode 7.
If you need almost as good a packing ratio as ARJ mode 7 and about 2 times faster depacking, use UPX (nrv2b).
If you need extremely fast depacking speed without a ridiculous packing ratio, use LZ4.

I did an Atari depacking benchmark; you can look at the results here: https://ibb.co/JKtQFVt

The first column is the packed binary file size (smaller is better). Then comes the name of the file (you have lz77, pft=packfire tiny, lz, am7=arj mode 7, shk=shrinkler).

The number between brackets () is the decompressor code size.

The last column is the decompressor speed (number of 50Hz ticks to depack). Smaller is faster.
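
To compare the speed column with the KiB/s figure mentioned above, the tick counts can be converted to throughput. This is a minimal sketch, assuming the ticks are 50 Hz frames and that the unpacked size of the test file is known. The function name and the sample numbers are invented for illustration and are not taken from the benchmark image.

```python
TICK_HZ = 50  # one tick = one 50 Hz frame, as described in the post


def depack_speed_kib_s(unpacked_bytes: int, ticks: int) -> float:
    """Depacking throughput in KiB/s for a run that took `ticks` 50 Hz ticks."""
    seconds = ticks / TICK_HZ
    return unpacked_bytes / 1024 / seconds


# Hypothetical example: 150 KiB unpacked in 500 ticks (10 seconds).
print(depack_speed_kib_s(150 * 1024, 500))  # 15.0
```

With these made-up numbers the result lands on the 15KiB/s that the post quotes for ARJ mode 7, which gives a feel for how many ticks a floppy-speed depacker would need.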

ross · 24 August 2019, 12:13

Quote:

Originally Posted by leonard

If you need almost as good a packing ratio as ARJ mode 7 and about 3 times faster depacking, use UPX (nrv2b)

Hi leonard, you should try nrv2r.

It is an improvement on, and a fork of, nrv2b, designed for in-place decompression.

Depacker code:

Code:

; nrv2r decompression in 68000 assembly
; by ross
;
; On entry:
;	a0	src pointer
;	[a1	dest pointer]
; (decompress also to a1=a0)
;
; On exit:
;	all preserved but
;	a1 = dest start
;
; Register usage:
;	a2	m_pos
;	a3	constant: $cff
;	a4	2nd src pointer (in stack)
;
;	d0	bit buffer
;	d1	m_off
;	d2	m_len or -1
;
;	d3	last_m_off
;	d4	constant: 2
;	d5	reserved space on stack
;
;
; Notes:
;	we have max_offset = 2^23, so we can use some word arithmetics on d1
;	we have max_match = 65535, so we can use word arithmetics on d2
;

nrv2r_ross_unpack
		movem.l	d0-d5/a0/a2-a4,-(sp)

		lea	(a0),a1						; if (a1) lea (a0),a4
		adda.l	(a0),a0					; end of packed data
		move.l	-(a0),(a1)				; if (a1) move.l -(a0),(a4)
		adda.l	-(a0),a1				; end of buffer
		
		move.b	-(a0),d0				; ~stack usage
		moveq	#-2,d5
		and.b	d0,d5
		adda.l	d5,sp					; reserve space
		lea	(sp),a4
_stk	move.b	-(a0),(a4)+
		addq.b	#1,d0
		bne.b	_stk

; ------------- setup constants -----------

		moveq	#-$80,d0				; d0.b = $80 (byte refill flag)
		moveq	#-1,d2
		moveq	#0,d3					; last_off = 0(1)
		moveq	#2,d4
		movea.w	#$cff,a3

; ------------- DECOMPRESSION -------------

decompr_literal
		move.b	-(a0),-(a1)

decompr_loop
		add.b	d0,d0
		bcc.b	decompr_match
		bne.b	decompr_literal
		move.b	-(a0),d0
		addx.b	d0,d0
		bcs.b	decompr_literal

decompr_match
		moveq	#1,d1
decompr_gamma_1
		add.b	d0,d0
		bne.b	_g_1
		move.b	-(a0),d0
		addx.b	d0,d0
_g_1	addx.w	d1,d1					; max 2^23!

		add.b	d0,d0
		bcc.b	decompr_gamma_1
		bne.b	decompr_select
		move.b	-(a0),d0
		addx.b	d0,d0
		bcc.b	decompr_gamma_1

decompr_select
		subq.w	#3,d1
		bcs.b	decompr_get_mlen		; last m_off
		bmi.b	decompr_exit_token
		lsl.l	#8,d1
		move.b	-(a0),d1
		move.l	d1,d3					; last_m_off = m_off

decompr_get_mlen						; implicit d2 = -1
		add.b	d0,d0
		bne.b	_e_1
		move.b	-(a0),d0
		addx.b	d0,d0

_e_1	addx.w	d2,d2
		add.b	d0,d0
		bne.b	_e_2
		move.b	-(a0),d0
		addx.b	d0,d0

_e_2	addx.w	d2,d2

		lea		1(a1,d3.l),a2
		addq.w	#2,d2
		bgt.b 	decompr_gamma_2  

decompr_tiny_mlen
		move.l	a3,d1
		sub.l	d3,d1
		addx.w	d4,d2

L_copy2	move.b	-(a2),-(a1)
L_copy1	move.b	-(a2),-(a1)
		dbra	d2,L_copy1
L_rep	bra.b	decompr_loop

decompr_gamma_2							; implicit d2 = 1
		add.b	d0,d0
		bne.b	_g_2
		move.b	-(a0),d0
		addx.b	d0,d0
_g_2	addx.w  d2,d2
		add.b	d0,d0
		bcc.b	decompr_gamma_2
		bne.b	decompr_large_mlen
		move.b	-(a0),d0
		addx.b	d0,d0
		bcc.b	decompr_gamma_2

decompr_large_mlen
		move.b	-(a2),-(a1)
		move.b	-(a2),-(a1)
		cmpa.l   d3,a3
		bcs.b   L_copy2
		move.b	-(a2),-(a1)
		dbra	d2,L_copy1

decompr_exit_token
		lea	(a4),a0
		bclr	d2,d2					; ;)
		bne.b	L_rep
		
		suba.l  d5,sp
		movem.l	(sp)+,d0-d5/a0/a2-a4
		rts
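
For readers who do not speak 68000: the recurring `add.b d0,d0` / `bne` / `move.b -(a0),d0` / `addx.b d0,d0` pattern in the listing is a one-byte bit buffer with a sentinel bit, and `decompr_gamma_1` builds a number from alternating data bits and continue/stop bits. The Python below is one reading of those two fragments only, not a full nrv2r decoder. The class and function names are invented, and it is meant as a model for understanding rather than as ross's code.

```python
class BitReader:
    """Model of the d0 bit buffer: up to 8 data bits plus a sentinel 1 bit."""

    def __init__(self, packed: bytes):
        self.packed = packed
        self.pos = len(packed)   # bytes are fetched backwards, like -(a0)
        self.buf = 0x80          # only the sentinel: forces a refill first

    def bit(self) -> int:
        self.buf <<= 1                                     # add.b d0,d0
        carry, self.buf = self.buf >> 8, self.buf & 0xFF
        if self.buf == 0:          # the sentinel just fell out: buffer empty
            self.pos -= 1                                  # move.b -(a0),d0
            # addx.b d0,d0: shift in the old sentinel (carry is 1 here)
            self.buf = (self.packed[self.pos] << 1) | carry
            carry, self.buf = self.buf >> 8, self.buf & 0xFF
        return carry


def gamma(reader: BitReader) -> int:
    """Loop shaped like decompr_gamma_1: start at 1, shift in a data bit,
    then read a control bit; 0 means continue, 1 means stop."""
    value = 1
    while True:
        value = (value << 1) | reader.bit()
        if reader.bit():
            return value


print(gamma(BitReader(bytes([0b00110000]))))  # bits 0,0,1,1 -> 5
```

The sentinel means the decoder never needs a separate bit counter: when the shift leaves zero, the bit that just fell out was the sentinel, and the refill shifts a new sentinel in behind the fresh byte.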

leonard · 24 August 2019, 12:47

ross: interesting! Do you have some numbers to share about the perf difference between nrv2b and the nrv2r you mentioned? (both in terms of packing ratio and compression speed?)

ross · 24 August 2019, 13:35

Quote:

Originally Posted by leonard

ross: interesting! Do you have some numbers to share about the perf difference between nrv2b and the nrv2r you mentioned? (both in terms of packing ratio and compression speed?)

http://eab.abime.net/showpost.php?p=...3&postcount=24

or the whole thread:
http://eab.abime.net/showthread.php?t=89467

Not exhaustive at all; I wanted to do something complete but I never found the time.

Antiriad_UK · 31 October 2019, 22:02

I've been adding some depacker support to my framework. I'm doing a onefiler targeting the A500 512+512 config, so I'm shunting data from fast mem to chip as needed. Instead of just copying, I've been playing around with some of the packers in this thread. Got packfire/shrinkler/lz4 working nicely. I want to try upx nrv2b next but I'm getting stuck.

First off, how exactly do you get a data file compressed? Upx doesn't seem to do it. I saw a reference to a modified upx exe on an ST forum but haven't been able to track it down. What are you guys using?

Decompression-wise is this the routine people are using? https://github.com/upx/upx/blob/mast...8k/nrv2b_d.ash

mcgeezer · 31 October 2019, 23:05

Quote:

Originally Posted by Antiriad_UK

I want to try upx nrv2b next but I'm getting stuck.

First off, how exactly do you get a data file compressed? Upx doesn't seem to do it. I saw a reference to a modified upx exe on an ST forum but haven't been able to track it down. What are you guys using?

Decompression-wise is this the routine people are using? https://github.com/upx/upx/blob/mast...8k/nrv2b_d.ash

Drop a pm to Ross... he's the boy.

Antiriad_UK · 31 October 2019, 23:12

Quote:

Originally Posted by mcgeezer

Drop a pm to Ross... he's the boy.

He is, and he'd already PMed me

WayneK · 01 November 2019, 12:23

You could also take a look at DoynaxLZ, which was used in the Oxyron demo Planet Rocklobster.

It's a port of a c64 packer to 68000, he released the tools in the demo source.

ross · 01 November 2019, 12:39

Quote:

Originally Posted by WayneK

You could also take a look at DoynaxLZ, which was used in the Oxyron demo Planet Rocklobster.

It's a port of a c64 packer to 68000, he released the tools in the demo source.

Yep, good packer and very fast unpacker.
I use it when I'm not concerned about space constraints.

phx · 01 November 2019, 14:00

Yes, I used it for Trap Runner and Solid Gold. The portable packer is extremely slow, though. Takes hours on real 68k hardware.

ross · 01 November 2019, 14:24

A problem with Doynax 68k is that the way the encoding is built makes it impossible to use for in-place decompression.

But the double stream is, conversely, a strong point, because it allows it to reach speeds that are impossible for decoders that fetch token bytes.

WayneK · 02 November 2019, 13:14

Quote:

Originally Posted by ross

A problem with Doynax 68k is that the way the encoding is built makes it impossible to use for in-place decompression.

Yes, I mentioned it to Antiriad_UK because he specifically said he was looking to depack from fast -> chip mem :P

ross · 02 November 2019, 13:25

Quote:

Originally Posted by WayneK

Yes, I mentioned it to Antiriad_UK because he specifically said he was looking to depack from fast -> chip mem :P

It would be interesting to make a comparison with LZ4 or LZ4W.
Is there a volunteer?

Antiriad_UK · 02 November 2019, 13:42

Not a very scientific report here. But in one routine I've got a lev3 interrupt running using 99% of CPU and outside the interrupt I'm doing a depack of an image from fast to chip (just in time for it to be displayed after about 10 seconds)

So the CPU is maxed and depack is very much slowed but the relative results are interesting:
Original image: 40KB
LZ4, 9247 bytes, 1-2 seconds
Doynamite, 8125 bytes, 4 seconds
nrv2s, 7840 bytes, 5 seconds
shrinkler, 6528 bytes, 20 seconds

For all the images I've compressed I'm seeing similar sorts of packing ratios. I'm really spoiled for choice now. I'm thinking LZ4 for anything that needs ludicrous speed and doynamite/nrv2s for pretty much anything else (nrv2s/r if in-place is needed, of course). Maybe shrinkler where you have loads of free time and need the best ratio.
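
The figures above can be put side by side as packed size relative to the original. This is a rough sketch, assuming "40KB" means 40 x 1024 = 40960 bytes (the post does not say) and taking 1.5 s as a stand-in for the "1-2 seconds" LZ4 time. The timings were measured under heavy interrupt load, so only the relative order means much.

```python
ORIGINAL = 40 * 1024  # assumption: "40KB" read as 40960 bytes

# (packer, packed size in bytes, approximate depack time in seconds)
results = [
    ("LZ4", 9247, 1.5),        # "1-2 seconds" in the post; midpoint assumed
    ("Doynamite", 8125, 4),
    ("nrv2s", 7840, 5),
    ("shrinkler", 6528, 20),
]

for name, packed, seconds in results:
    ratio = 100 * packed / ORIGINAL
    print(f"{name:<10} {packed:5d} bytes  {ratio:5.1f}% of original  ~{seconds} s")
```

Read this way, every step down the list buys a smaller file at the cost of a longer depack, with the jump from nrv2s to shrinkler being by far the most expensive in time.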

ross · 02 November 2019, 14:04

Thanks for testing.

This is rather interesting.
While the difference in ratio is expected considering the algorithms, I expected a greater speed difference between doynamite and nrv2s.
This makes me think that maybe a dual stream version for nrv2x might make sense.
