03 June 2018, 11:31   #20
ross
Some quick, non-scientific tests.
Pure code seems slightly faster than this lazy bfextu-based 8-bit LUT implementation:
Code:
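; In: d0.l = value to flip. Out: d0.l = flipped value. Trashes d1, d2, a0.
_lut8flip:
	lea	_8lut(pc),a0		; a0 = 256-byte table (assumed: bit-mirrored bytes)
	move.l	d0,d1			; keep a copy of the source longword
	bfextu  d1{8:8},d2		; source bits 23..16 (offsets count from the MSB)
	move.b	(a0,d2.w),d0		; looked-up byte into the low byte of d0
	ror.l	#8,d0			; rotate it to the top, freeing the low byte
	bfextu  d1{16:8},d2		; source bits 15..8
	move.b	(a0,d2.w),d0
	ror.l	#8,d0
	bfextu  d1{24:8},d2		; source bits 7..0, ends up as the top result byte
	move.b	(a0,d2.w),d0
	ror.l	#8,d0
	bfextu  d1{0:8},d2		; source bits 31..24, ends up as the low result byte
	move.b	(a0,d2.w),d0
	rts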
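The post does not show what `_8lut` contains. Assuming it is the usual 256-byte table in which entry i is i with its 8 bits mirrored, the routine can be modelled in C roughly like this. This is only a reference sketch for checking results; the names `lut8`, `build_lut8` and `lut8flip` are made up here and do not come from the original code.

```c
#include <stdint.h>

/* Hypothetical stand-in for _8lut: entry i is i with its 8 bits mirrored. */
static uint8_t lut8[256];

/* Fill the table once at start-up. */
void build_lut8(void)
{
    for (unsigned i = 0; i < 256; i++) {
        uint8_t r = 0;
        for (unsigned bit = 0; bit < 8; bit++) {
            if (i & (1u << bit))
                r |= (uint8_t)(0x80u >> bit);  /* bit 0 <-> bit 7, bit 1 <-> bit 6, ... */
        }
        lut8[i] = r;
    }
}

/* Model of _lut8flip: mirror every byte and reverse the byte order,
   which together mirrors all 32 bits. */
uint32_t lut8flip(uint32_t x)
{
    return ((uint32_t)lut8[x & 0xFFu] << 24)
         | ((uint32_t)lut8[(x >> 8) & 0xFFu] << 16)
         | ((uint32_t)lut8[(x >> 16) & 0xFFu] << 8)
         |  (uint32_t)lut8[x >> 24];
}
```

Call `build_lut8()` once, then `lut8flip(x)` should give the same longword the assembly routine leaves in d0, under the table assumption above.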
But the clear winner is the 16-bit LUT approach (as much as 50% faster).
It is as simple as:
Code:
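; In: d0.l = value to flip. Out: d0.l = flipped value. Trashes a0.
_lut16flip:
	lea	_16lut+65536,a0		; a0 = middle of the 128 KB table (d0.w is a signed index)
	move.w	(a0,d0.w*2),d0		; low word -> its flipped value
	swap	d0			; flipped low word becomes the high word
	move.w	(a0,d0.w*2),d0		; original high word -> flipped, now the low word
	rts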
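The `+65536` is there because the 020 sign-extends `d0.w` when it is used as an index, so word values $8000..$FFFF land below the base address and the table has to be laid out around its midpoint. A rough C model of that layout, assuming each entry holds the 16-bit mirrored value (the names `lut16`, `build_lut16`, `as_signed16` and `lut16flip` are invented for this sketch):

```c
#include <stdint.h>

/* Hypothetical stand-in for _16lut: 65536 words = 128 KB. */
static uint16_t lut16[65536];

/* Interpret a 16-bit value the way the CPU does for a .w index register. */
static int32_t as_signed16(uint32_t w)
{
    return (w < 32768u) ? (int32_t)w : (int32_t)w - 65536;
}

/* Mirror the 16 bits of v with a plain loop (only used to fill the table). */
static uint16_t rev16(uint16_t v)
{
    uint16_t r = 0;
    for (int bit = 0; bit < 16; bit++) {
        r = (uint16_t)((r << 1) | (v & 1u));
        v >>= 1;
    }
    return r;
}

/* Store each entry in the slot that the signed index reaches from the midpoint. */
void build_lut16(void)
{
    for (uint32_t v = 0; v < 65536u; v++)
        lut16[32768 + as_signed16(v)] = rev16((uint16_t)v);
}

/* Model of _lut16flip: two lookups with a word swap in between. */
uint32_t lut16flip(uint32_t x)
{
    const uint16_t *mid = lut16 + 32768;              /* a0 = _16lut+65536 bytes */
    uint32_t hi = mid[as_signed16(x & 0xFFFFu)];      /* flipped low word goes high */
    uint32_t lo = mid[as_signed16(x >> 16)];          /* flipped high word goes low */
    return (hi << 16) | lo;
}
```

With this layout, the entries for $8000..$FFFF occupy the first half of the table and those for $0000..$7FFF the second half.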
The memory cost (a 128 KB table) is debatable, BUT:
suppose you have a lot of big AGA sprites (64x64, 4 planes) and also a lot of tiles (32x32, 4/8 planes), for a total of about 1 MB of data, all to be flipped.
In that case it may be worthwhile: the table's overhead becomes proportionally less significant the more data there is, and CPU time is precious on the 020.

But pure code, as Thorham suggested, is certainly a very good option!
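Thorham's routine is not reproduced in this post, so the following is not his code. It is just one well-known table-free way to mirror a longword (swapping ever larger groups of bits with masks), shown in C as a sketch of what "pure code" can look like; the name `flip32` is made up.

```c
#include <stdint.h>

/* Table-free 32-bit mirror: swap neighbouring bits, then pairs,
   then nibbles, then bytes, then the two words. */
uint32_t flip32(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);
    x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);
    return (x >> 16) | (x << 16);
}
```

It needs no memory beyond the constants, which is the trade-off against the LUT versions.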


[EDIT, PS]
Why non-scientific?
I do not have a CD32, nor any Amiga for that matter.
So it's all based on WinUAE emulation, which for the 020 is not perfectly cycle-exact (or maybe it is for code this simple? Well, it's not that important).
Also, I did not feel like writing anything other than the bfextu version, and in any case the speed difference between pure code and the 8-bit LUT does not seem significant enough to justify using the LUT exclusively.

Last edited by ross; 03 June 2018 at 12:04. Reason: PS