English Amiga Board - View Single Post

ross · 03 June 2018, 11:31

Some quick, non-scientific tests.
Pure code seems slightly faster than this lazy bfextu 8-bit LUT implementation:

Code:

_lut8flip:
	lea	_8lut(pc),a0
	move.l	d0,d1
	bfextu  d1{8:8},d2
	move.b	(a0,d2.w),d0
	ror.l	#8,d0
	bfextu  d1{16:8},d2
	move.b	(a0,d2.w),d0
	ror.l	#8,d0
	bfextu  d1{24:8},d2
	move.b	(a0,d2.w),d0
	ror.l	#8,d0
	bfextu  d1{0:8},d2
	move.b	(a0,d2.w),d0
	rts
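
The post does not show how `_8lut` is filled, so here is a minimal C sketch of one way to build it, plus a C model of what the routine above computes (each byte mirrored through the table, byte order swapped, i.e. a full 32-bit bit reversal). The names `lut8`, `build_lut8` and `lut8flip` are made up for illustration and are not from the thread.

```c
#include <stdint.h>

/* Hypothetical stand-in for _8lut: entry i holds i with its 8 bits mirrored. */
uint8_t lut8[256];

void build_lut8(void)
{
    for (unsigned i = 0; i < 256; i++) {
        unsigned v = i, r = 0;
        for (int bit = 0; bit < 8; bit++) {
            r = (r << 1) | (v & 1u);  /* append the lowest bit of v to r */
            v >>= 1;
        }
        lut8[i] = (uint8_t)r;
    }
}

/* Models _lut8flip: mirror each byte via the table and reverse the byte order. */
uint32_t lut8flip(uint32_t x)
{
    return ((uint32_t)lut8[x & 0xFFu] << 24)
         | ((uint32_t)lut8[(x >> 8) & 0xFFu] << 16)
         | ((uint32_t)lut8[(x >> 16) & 0xFFu] << 8)
         |  (uint32_t)lut8[x >> 24];
}
```

Call `build_lut8()` once at startup; after that `lut8flip(0x12345678)` should give `0x1E6A2C48`.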

But the absolute winner is the 16-bit LUT approach (as much as 50% faster).
As simple as:

Code:

_lut16flip:
	lea	_16lut+65536,a0
	move.w	(a0,d0.w*2),d0
	swap	d0
	move.w	(a0,d0.w*2),d0
	rts
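
The `+65536` in the `lea` is there because the 020 sign-extends `d0.w` in `(a0,d0.w*2)`, so the scaled index runs from -65536 to +65534 bytes around a base sitting in the middle of the 128 KB table. A rough C model of that layout, with the table fill included since the post does not show it (all names here are hypothetical):

```c
#include <stdint.h>

/* Hypothetical stand-in for _16lut: 65536 word entries = 128 KB. */
uint16_t lut16[65536];

/* Mirror the 16 bits of v (plain loop, only used to fill the table). */
uint16_t reverse16(uint16_t v)
{
    uint16_t r = 0;
    for (int bit = 0; bit < 16; bit++) {
        r = (uint16_t)((r << 1) | (v & 1u));
        v >>= 1;
    }
    return r;
}

/* Portable sign extension of a 16-bit value, like d0.w used as an index. */
int32_t sign_extend16(uint32_t w)
{
    return (int32_t)((w & 0xFFFFu) ^ 0x8000u) - 0x8000;
}

void build_lut16(void)
{
    uint16_t *mid = lut16 + 32768;   /* 32768 entries * 2 bytes = 65536 bytes */
    for (int32_t i = -32768; i <= 32767; i++)
        mid[i] = reverse16((uint16_t)i);
}

/* Models _lut16flip: two signed-index lookups with a word swap in between. */
uint32_t lut16flip(uint32_t x)
{
    const uint16_t *mid = lut16 + 32768;
    uint16_t lo = mid[sign_extend16(x)];        /* first move.w */
    uint16_t hi = mid[sign_extend16(x >> 16)];  /* second move.w, after swap */
    return ((uint32_t)lo << 16) | hi;
}
```

Laying the table out around its midpoint is what lets the asm get away with no masking or zero-extension before each lookup.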

The memory cost (a 128 KB table) is debatable, BUT:
suppose you have a lot of big AGA sprites (64x64, 4 planes) and also a lot of tiles (32x32, 4/8 planes), for a grand total of 1MB of data, all to be flipped.
In that case it may be useful (the waste becomes proportionally less and less significant, and CPU time is precious on the 020..)
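
To put rough numbers on that: a 64x64 sprite in 4 planes is 2048 bytes, a 32x32 tile is 512 bytes in 4 planes or 1024 in 8, and the 128 KB table is 12.5% of 1 MB of graphics (taking 1 MB as 1,048,576 bytes). A small C sketch of the arithmetic, with hypothetical helper names:

```c
#include <stdint.h>

/* Bytes for one planar bitmap: one bit per pixel per plane. */
uint32_t planar_bytes(uint32_t width, uint32_t height, uint32_t planes)
{
    return width * height * planes / 8u;
}

/* Table size as a percentage of the graphics data it serves. */
double lut_overhead_percent(uint32_t lut_bytes, uint32_t data_bytes)
{
    return 100.0 * (double)lut_bytes / (double)data_bytes;
}
```

The more data gets flipped, the smaller that percentage, which is the point being made above.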

But pure code, as Thorham suggested, is certainly a great deal!
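
Thorham's routine is not quoted in this post. For comparison, the usual table-free way to mirror a longword is the mask-and-shift ladder below. This is a generic C sketch of that well-known technique, not necessarily what Thorham posted, and `flip32_pure` is a made-up name.

```c
#include <stdint.h>

/* Table-free 32-bit bit reversal: swap neighbours, then pairs, nibbles, bytes, words. */
uint32_t flip32_pure(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);  /* single bits */
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);  /* bit pairs   */
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);  /* nibbles     */
    x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);  /* bytes       */
    return (x >> 16) | (x << 16);                             /* words       */
}
```

It needs no memory beyond the constants, which is the trade-off against the two LUT versions.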

[EDIT, PS]
Why non-scientific?
I don't have a CD32, nor any Amiga for that matter.

So it's all based on WinUAE emulation, which for the 020 is not cycle-exact (CE) perfect (or is it, for code this simple? well, it's not that important..).
Also, I didn't feel like writing anything other than the bfextu version, and anyway the speed difference between pure code and the 8-bit LUT doesn't seem significant enough to justify using the LUT exclusively.

Last edited by ross; 03 June 2018 at 12:04. Reason: PS