English Amiga Board - View Single Post - Optimizing the 68020+ 32-bit math

saimo · 29 April 2021, 20:03

Quote:

Originally Posted by litwr

We can just ignore the branch with DIVUL - it is executed very rarely.

Then I'd go for this:

Code:

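     move.l     d6,d7           ; d7 = copy of the 32-bit dividend
     swap.w     d6              ; lower word of d6 = upper word of the dividend
     cmp.w      d4,d6
     bcs.b      .div32no        ; upper word < divisor: quotient fits in 16 bits

     divul.l    d4,d6:d7        ; 32/32: d6 = remainder, d7 = quotient
     move.w     d6,(a3)         ; store the 16-bit remainder
     exg.l      d6,d7           ; d6 = quotient, d7 = remainder
     bra.b      .div32f

.div32no
     divu.w     d4,d7           ; d7 = remainder:quotient (upper:lower word)
     clr.l      d6
     move.w     d7,d6           ; d6 = quotient, zero-extended
     clr.w      d7
     swap.w     d7              ; d7 = remainder, upper word 0
     move.w     d7,(a3)         ; store the 16-bit remainder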

.div32f

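This code gives:
* 32-bit quotient in d6;
* 32-bit remainder in d7, with upper word set to 0;
* 16-bit remainder written to (a3).
It assumes the upper word of d4 is 0, because divul.l uses all 32 bits of the divisor while cmp.w and divu.w only look at its lower word.
Also, the instruction order lets some of the work overlap, which saves cycles.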
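
As a cross-check of the results listed above, here is a minimal C model of the two paths. It is only a sketch: div_result and div32_by_16 are hypothetical names, the divisor is assumed to be non-zero (the 68k would trap on a zero divisor), and nothing here models timing.

```c
#include <stdint.h>

/* Hypothetical model of the routine's results (not of its cycle count). */
typedef struct {
    uint32_t quotient;   /* value left in d6 */
    uint32_t remainder;  /* value left in d7, upper word 0 */
} div_result;

/* Assumes divisor != 0. */
static div_result div32_by_16(uint32_t dividend, uint16_t divisor)
{
    div_result r;
    uint16_t high = (uint16_t)(dividend >> 16);    /* swap.w d6 */

    if (high < divisor) {                          /* cmp.w d4,d6 / bcs.b .div32no */
        /* DIVU.W path: dividend < divisor * 65536, so the quotient
           fits in 16 bits and DIVU.W cannot overflow.
           DIVU.W packs the result as remainder:quotient. */
        uint32_t packed = ((dividend % divisor) << 16) | (dividend / divisor);
        r.quotient  = packed & 0xFFFFu;            /* clr.l d6 / move.w d7,d6 */
        r.remainder = packed >> 16;                /* clr.w d7 / swap.w d7 */
    } else {
        /* DIVUL.L path: full 32-bit quotient, separate remainder. */
        r.quotient  = dividend / divisor;
        r.remainder = dividend % divisor;
    }
    return r;
}
```

Both paths return the same values as a plain 32-bit division, which is why the cmp.w/bcs.b test is enough to pick the cheaper DIVU.W whenever it is safe.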

If you don't care about the upper word of d7 being 0:

Code:

     move.l     d6,d7
     swap.w     d6
     cmp.w      d4,d6
     bcs.b      .div32no

     divul.l    d4,d6:d7
     move.w     d6,(a3)
     exg.l      d6,d7
     bra.b      .div32f

.div32no
     divu.w     d4,d7
     clr.l      d6
     move.w     d7,d6
     swap.w     d7
     move.w     d7,(a3)

.div32f

If you don't care about the registers being exchanged in the DIVUL branch, which then leaves the quotient in d7 and the remainder in d6 (as you mentioned in a post):

Code:

     move.l     d6,d7
     swap.w     d6
     cmp.w      d4,d6
     bcs.b      .div32no

     divul.l    d4,d6:d7
     move.w     d6,(a3)
     bra.b      .div32f

.div32no
     divu.w     d4,d7
     clr.l      d6
     move.w     d7,d6
     swap.w     d7
     move.w     d7,(a3)

.div32f

If you can afford to trash the word at (2,a3):

Code:

     move.l     d6,d7
     swap.w     d6
     cmp.w      d4,d6
     bcs.b      .div32no

     divul.l    d4,d6:d7
     move.w     d6,(a3)
     bra.b      .div32f

.div32no
     divu.w     d4,d7
     clr.l      d6
     move.l     d7,(a3)
     move.w     d7,d6
     swap.w     d7

.div32f
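
To see why the long store in the DIVU branch trashes the word at (2,a3), here is a small C sketch of a big-endian long write. store_long_be and load_word_be are hypothetical helpers, and the byte array stands in for the memory at a3.

```c
#include <stdint.h>

/* Hypothetical model of move.l d7,(a3) on the big-endian 68k. */
static void store_long_be(uint8_t *a3, uint32_t d7)
{
    a3[0] = (uint8_t)(d7 >> 24);   /* upper word of d7 lands at (a3)   */
    a3[1] = (uint8_t)(d7 >> 16);
    a3[2] = (uint8_t)(d7 >> 8);    /* lower word of d7 lands at 2(a3)  */
    a3[3] = (uint8_t)d7;
}

/* Hypothetical model of reading a big-endian word from memory. */
static uint16_t load_word_be(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}
```

After divu.w, d7 holds remainder:quotient, so the remainder ends up at (a3) as wanted and the 16-bit quotient overwrites the following word. The DIVUL branch still uses move.w, so only the DIVU branch touches (2,a3).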

And if you don't care about the remainder in d7:

Code:

     move.l     d6,d7
     swap.w     d6
     cmp.w      d4,d6
     bcs.b      .div32no

     divul.l    d4,d6:d7
     move.w     d6,(a3)
     bra.b      .div32f

.div32no
     divu.w     d4,d7
     clr.l      d6
     move.l     d7,(a3)
     move.w     d7,d6

.div32f