Quote:
Originally Posted by litwr
We can just ignore the branch with DIVUL - it is executed very rarely.
Then I'd go for this:
Code:
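move.l d6,d7         ; d7 = copy of the 32-bit dividend
swap.w d6            ; d6.w = high word of the dividend
cmp.w d4,d6          ; high word < divisor?
bcs.b .div32no       ; yes: quotient fits in 16 bits, divu.w is safe
divul.l d4,d6:d7     ; d7 = quotient, d6 = remainder (uses all 32 bits of d4)
move.w d6,(a3)       ; store the 16-bit remainder
exg.l d6,d7          ; d6 = quotient, d7 = remainder
bra.b .div32f
.div32no
divu.w d4,d7         ; d7 = remainder:quotient (high word:low word)
clr.l d6
move.w d7,d6         ; d6 = quotient, zero-extended to 32 bits
clr.w d7
swap.w d7            ; d7 = remainder, upper word 0
move.w d7,(a3)       ; store the 16-bit remainder
.div32f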
This code gives:
* 32-bit quotient in d6;
* 32-bit remainder in d7, with upper word set to 0;
* 16-bit remainder written to (a3).
Also, some of its instructions can execute in parallel, which saves cycles.
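To double-check the register contents listed above, here is a rough C model of the two paths. It is only a sketch and not part of the routine: `div32_model` and `div32_result` are made-up names, and it assumes the divisor is non-zero and that the upper word of d4 is 0 (so the divisor fits in 16 bits).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the routine's outputs; names are illustrative only. */
typedef struct {
    uint32_t d6;   /* 32-bit quotient */
    uint32_t d7;   /* remainder, upper word 0 */
    uint16_t mem;  /* word stored at (a3) */
} div32_result;

static div32_result div32_model(uint32_t dividend, uint16_t divisor)
{
    div32_result r;
    uint16_t high = (uint16_t)(dividend >> 16);   /* swap.w d6 */

    assert(divisor != 0);  /* assumption: the real code would trap here */

    if (high >= divisor) {
        /* divul.l path: the quotient may need more than 16 bits */
        r.d6 = dividend / divisor;
        r.d7 = dividend % divisor;
    } else {
        /* divu.w path: d7 = remainder:quotient, quotient fits in 16 bits */
        uint32_t packed = ((dividend % divisor) << 16) | (dividend / divisor);
        r.d6 = packed & 0xFFFFu;   /* clr.l d6 / move.w d7,d6 */
        r.d7 = packed >> 16;       /* clr.w d7 / swap.w d7 */
    }
    r.mem = (uint16_t)r.d7;        /* move.w to (a3) */
    return r;
}
```

For example, `div32_model(0x00012345, 0x0010)` goes through the divu.w path and `div32_model(0x12345678, 0x0010)` through the divul.l path; both end up with the quotient in `d6` and the remainder in `d7` and `mem`.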
If you don't care about the upper word of d7 being 0:
Code:
move.l d6,d7
swap.w d6
cmp.w d4,d6
bcs.b .div32no
divul.l d4,d6:d7
move.w d6,(a3)
exg.l d6,d7
bra.b .div32f
.div32no
divu.w d4,d7
clr.l d6
move.w d7,d6
swap.w d7
move.w d7,(a3)
.div32f
If you don't care about the registers being exchanged on the DIVUL path, where the quotient then stays in d7 and the remainder in d6 (as you mentioned in a post):
Code:
move.l d6,d7
swap.w d6
cmp.w d4,d6
bcs.b .div32no
divul.l d4,d6:d7
move.w d6,(a3)
bra.b .div32f
.div32no
divu.w d4,d7
clr.l d6
move.w d7,d6
swap.w d7
move.w d7,(a3)
.div32f
If you can afford to trash the word at (2,a3):
Code:
move.l d6,d7
swap.w d6
cmp.w d4,d6
bcs.b .div32no
divul.l d4,d6:d7
move.w d6,(a3)
bra.b .div32f
.div32no
divu.w d4,d7
clr.l d6
move.l d7,(a3)
move.w d7,d6
swap.w d7
.div32f
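Why the long store trashes (2,a3): after divu.w, d7 holds remainder:quotient, and the 68k is big-endian, so move.l d7,(a3) puts the remainder word at (a3) and the quotient word at (2,a3). A small C sketch of that layout follows; `divu_w_model`, `store_long_be` and `load_word_be` are hypothetical helpers for illustration, and the model assumes a non-zero divisor and a quotient that fits in 16 bits.

```c
#include <stdint.h>

/* Models divu.w d4,d7: result is remainder:quotient (high word:low word).
   Assumes divisor != 0 and that the quotient fits in 16 bits. */
static uint32_t divu_w_model(uint32_t dividend, uint16_t divisor)
{
    return ((dividend % divisor) << 16) | (dividend / divisor);
}

/* Models move.l d7,(a3) on a big-endian machine: most significant byte first. */
static void store_long_be(uint8_t *a3, uint32_t d7)
{
    a3[0] = (uint8_t)(d7 >> 24);
    a3[1] = (uint8_t)(d7 >> 16);
    a3[2] = (uint8_t)(d7 >> 8);
    a3[3] = (uint8_t)d7;
}

/* Reads a big-endian 16-bit word, like move.w (p),dn would. */
static uint16_t load_word_be(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}
```

Storing `divu_w_model(0x12345, 16)` this way leaves the remainder readable as the word at offset 0 and the quotient as the word at offset 2, which is exactly the word that gets overwritten.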
And if you don't care about the remainder in d7:
Code:
move.l d6,d7
swap.w d6
cmp.w d4,d6
bcs.b .div32no
divul.l d4,d6:d7
move.w d6,(a3)
bra.b .div32f
.div32no
divu.w d4,d7
clr.l d6
move.l d7,(a3)
move.w d7,d6
.div32f