English Amiga Board - View Single Post - Optimizing the 68020+ 32-bit math

litwr · 29 April 2021, 14:56

Quote:

Originally Posted by a/b

Why didn't you use "the same" algorithm, something like (code not tested)?

Code:

	move.l	d4,d0
	swap	d0
	cmp.l	d0,d6		; dividend >= (divisor<<16)?
	bhs.b	.32bit

.16bit	divu.w	d4,d6
	swap	d6
	move.w	d6,(a3)

.32bit	divul.l	d4,d7:d6
	move.w	d7,(a3)

edit: To keep it simple: do a faster (32/16) div whenever possible *without* penalty of failing (it's still a slooow div), extra check is compensated for.

Thank you! However, it is not that easy, because d6 and d7 must hold the quotient and remainder as full 32-bit values. So we actually need a sequence of MOVE, CLR, SWAP before .32bit. My code, which is equivalent to the 386 code, is the following:

Code:

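     moveq.l #0,d7         ; clear d7 so its upper word is 0 on the 16-bit path
     swap d6               ; bring the dividend's high word into the low word
     cmp.w d4,d6           ; high word < divisor? then the quotient fits in 16 bits
     bcs .div32no          ; yes: take the faster 32/16 DIVU.W path

     swap d6               ; restore the dividend
     divul.l d4,d7:d6      ; 32/32: d6 = quotient, d7 = remainder
     move.w d7,(a3)        ; store the remainder
     bra .div32f

.div32no
     swap d6               ; restore the dividend
     divu.w d4,d6          ; 32/16: d6 = remainder:quotient (high:low word)
     move.w d6,d7          ; d7 = quotient (upper word is already 0)
     clr.w d6              ; d6 = remainder:0
     swap d6               ; d6 = remainder, zero-extended to 32 bits
     move.w d6,(a3)        ; store the remainder
.div32f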

This makes 2 extra SWAPs.
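
For anyone who wants to check the range test itself without a cycle-exact emulator, here is a rough C model of it. It is only a sketch, not the benchmark code: the names fits_16bit, div_model and divres are made up for illustration, and the divisor is assumed to be a non-zero 16-bit value. It shows why comparing the high word of the dividend with the divisor is enough to know that the 32/16 division cannot overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: both values zero-extended to 32 bits. */
typedef struct {
    uint32_t quot;
    uint32_t rem;
} divres;

/* Mirrors the SWAP / CMP.W / BCS test: the 16-bit quotient cannot
   overflow when the high word of the dividend is below the divisor,
   because then dividend < divisor * 65536. */
static int fits_16bit(uint32_t dividend, uint16_t divisor)
{
    return (uint16_t)(dividend >> 16) < divisor;
}

/* Models both paths; the results must agree wherever the fast path
   is allowed. */
static divres div_model(uint32_t dividend, uint16_t divisor)
{
    divres r;
    assert(divisor != 0); /* assumption: the caller never passes zero */
    if (fits_16bit(dividend, divisor)) {
        /* fast path: what a 32/16 divide would return in 16-bit halves */
        uint16_t q = (uint16_t)(dividend / divisor);
        uint16_t m = (uint16_t)(dividend % divisor);
        r.quot = q; /* zero-extended, as the asm has to do by hand */
        r.rem = m;
    } else {
        /* slow path: full 32/32 divide */
        r.quot = dividend / divisor;
        r.rem = dividend % divisor;
    }
    return r;
}
```

For example, 0x0001FFFF divided by 2 takes the fast path (high word 1 is below 2), while 0x00020000 divided by 2 has to take the 32-bit path because its quotient is 0x10000.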

The same optimization saves only 2 or 3 cycles on the 80386 and 4 or 5 on the 486. So it is really very complex. It is sad that neither the 80386 nor the 68020 is cycle-exact in popular emulators. Moreover, the emulators are very inaccurate, especially for DIVU.L and DIVUL.