English Amiga Board - View Single Post - Optimizing the 68020+ 32-bit math

saimo · 30 April 2021, 22:42

Quote:

Thank you. But you also chose the slower code with CMP before the first division. The 68k has an advantage over the x86: the DIVU instructions set the V flag. Why not use this advantage?

To be honest, I only skimmed through the thread and thought that the code you landed at in post #8 was, for some reason, the form you were aiming at, so I just applied some optimizations to that.

But, yes, I agree that it's better to perform the division first, given that you said that the worst case (overflow set) is very rare.

Quote:

Finally, your code just replaces MOVEQ and SWAP with MOVE.L and CLR.L - it hardly gives any speed boost.

On CPUs other than the 68060, swap is slower. The code I proposed also aims to save cycles by letting the CPU execute more stuff in parallel thanks to fewer register dependencies (and to the long write to memory, which, if I understand correctly, is not an option).
Anyway, on to the divu-first code...

Leaving aside the bvs optimization (which depends on the structure of your code), there's still one thing you can do to avoid the moveq at the beginning of the code, thus saving a little time in the case of the bvs branch:

Code:

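     divu.w   d4,d6          ; 32/16: d6 = remainder:quotient, V set on overflow
     bvc.b    .div32no       ; common case: quotient fits in 16 bits

     divul.l  d4,d7:d6       ; rare case, 32/32: d7 = remainder, d6 = quotient
     move.w   d7,(a3)        ; store remainder
     bra.b    .div32f

.div32no
     move.l   d6,d7          ; d7 = remainder:quotient
     clr.w    d6             ; d6 = remainder:0
     eor.l    d6,d7          ; d7 = 0:quotient (upper word cleared without moveq)
     swap.w   d6             ; d6 = 0:remainder
     move.w   d6,(a3)        ; store remainder

.div32f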
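
For anyone who wants to check the register shuffling without an assembler at hand, here is a rough C model of the snippet as I read it. It is only a sketch: the `regs_t` struct and the `div_model` function are made-up names for illustration, `mem` stands in for the word at (a3), and it assumes d4 is non-zero with a clear upper word (so that divu.w and divul.l see the same divisor).

```c
#include <stdint.h>

/* Hypothetical model of the registers involved; not part of the original code. */
typedef struct {
    uint32_t d6;   /* in: 32-bit dividend */
    uint32_t d7;
    uint16_t mem;  /* stands in for the word at (a3) */
} regs_t;

/* Returns 1 if the divul.l path was taken, 0 for the fast path.
   Assumption: d4 != 0 and its upper word is clear. */
int div_model(regs_t *r, uint16_t d4)
{
    uint32_t q = r->d6 / d4;

    if (q > 0xFFFFu) {                  /* divu.w overflows: V set, d6 untouched */
        r->d7 = r->d6 % d4;             /* divul.l d4,d7:d6 -> d7 = remainder    */
        r->d6 = q;                      /*                     d6 = quotient     */
        r->mem = (uint16_t)r->d7;       /* move.w d7,(a3)                        */
        return 1;
    }

    r->d6 = ((r->d6 % d4) << 16) | q;   /* divu.w d4,d6 -> remainder:quotient    */
    r->d7 = r->d6;                      /* move.l d6,d7                          */
    r->d6 &= 0xFFFF0000u;               /* clr.w  d6    -> remainder:0           */
    r->d7 ^= r->d6;                     /* eor.l  d6,d7 -> 0:quotient            */
    r->d6 = (r->d6 >> 16) | (r->d6 << 16); /* swap d6   -> 0:remainder           */
    r->mem = (uint16_t)r->d6;           /* move.w d6,(a3)                        */
    return 0;
}
```

Note that in this model the quotient ends up in d7 on the fast path and in d6 on the divul.l path, while the remainder is written to memory in both cases, so whatever follows .div32f has to take that into account.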