29 April 2021, 14:56   #8
litwr
Quote:
Originally Posted by a/b
Why didn't you use "the same" algorithm, something like (code not tested)?
Code:
	move.l	d4,d0
	swap	d0
	cmp.l	d0,d6		; dividend >= (divisor<<16)?
	bhs.b	.32bit

.16bit	divu.w	d4,d6
	swap	d6
	move.w	d6,(a3)

.32bit	divul.l	d4,d7:d6
	move.w	d7,(a3)
edit: To keep it simple: do a faster (32/16) div whenever possible *without* penalty of failing (it's still a slooow div), extra check is compensated for.
Thank you! However, it is not that easy, because d6 and d7 must hold the quotient and the remainder as full 32-bit values. So after the 16-bit division we actually need a MOVE, CLR, SWAP sequence before .32bit. My code, which is equivalent to the 386 code, is the following:

Code:
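     moveq.l #0,d7
     swap d6              ; bring the high word of the dividend into the low word
     cmp.w d4,d6          ; high word < divisor?
     bcs .div32no         ; yes: the quotient fits in 16 bits

     swap d6              ; restore the dividend
     divul.l d4,d7:d6     ; 32/32: quotient in d6, remainder in d7
     move.w d7,(a3)       ; store the remainder
     bra .div32f

.div32no
     swap d6              ; restore the dividend
     divu.w d4,d6         ; 32/16: remainder in the high word, quotient in the low word
     move.w d6,d7         ; d7 = zero-extended quotient (d7 was cleared above)
     clr.w d6
     swap d6              ; d6 = zero-extended remainder
     move.w d6,(a3)       ; store the remainder
.div32f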
This costs 2 extra SWAPs. The same optimization saves only 2 or 3 cycles on the 80386 and 4 or 5 on the 486, so the trade-off is really quite complicated. It is sad that neither the 80386 nor the 68020 is emulated cycle-exactly in popular emulators. Moreover, the emulators are very inaccurate, especially for DIVU.L and DIVUL.
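
For readers who do not follow 68k assembly, here is a minimal C sketch of the check used above. It is only a model, not the benchmark code: the names fits_divu_w and divide_model are hypothetical, the divisor is assumed to be non-zero and to fit in 16 bits, and it says nothing about timing. It shows why the fast 32/16 divide is safe exactly when the high word of the dividend is below the divisor.

```c
#include <stdint.h>

/* Hypothetical helper: non-zero when a 32/16 divide (DIVU.W) cannot
   overflow, i.e. when the high word of the dividend is below the divisor.
   This mirrors the SWAP / CMP.W / BCS test in the assembly. */
static int fits_divu_w(uint32_t dividend, uint16_t divisor)
{
    return (uint16_t)(dividend >> 16) < divisor;
}

/* Hypothetical model of the two paths. Assumes divisor != 0.
   Both paths return a full 32-bit quotient and remainder. */
static void divide_model(uint32_t dividend, uint16_t divisor,
                         uint32_t *quot, uint32_t *rem)
{
    if (fits_divu_w(dividend, divisor)) {
        /* DIVU.W packs its result: remainder in the high word,
           quotient in the low word. The quotient is below 65536 here
           because dividend < divisor * 65536. */
        uint32_t packed = ((dividend % divisor) << 16) | (dividend / divisor);
        *quot = packed & 0xFFFFu;  /* unpack and zero-extend the quotient */
        *rem  = packed >> 16;      /* unpack and zero-extend the remainder */
    } else {
        /* The quotient needs more than 16 bits: full 32/32 divide,
           as DIVUL.L does. */
        *quot = dividend / divisor;
        *rem  = dividend % divisor;
    }
}
```

For example, dividing 0x0001FFFF by 2 takes the fast path (high word 1 is below 2) and gives quotient 0xFFFF with remainder 1, while dividing 0x00020000 by 2 must take the 32-bit path, because the quotient 0x10000 no longer fits in 16 bits.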

Last edited by litwr; 29 April 2021 at 15:02.