30 April 2021, 22:42   #19
saimo
Registered User
 
Join Date: Aug 2010
Location: Italy
Posts: 787
Quote:
Thank you. But you also chose the slower code with CMP before the first division. The 68k has an advantage over the x86: the DIVU instructions set the V flag on overflow. Why not use this advantage?
To be honest, I only skimmed through the thread and I thought that the code you landed at in post #8 was, for some reason, the form you were aiming at, so I just applied some optimizations to that. But, yes, I agree that it's better to perform the division first, given that you said that the worst case (overflow set) is very rare.

Quote:
Finally, your code just replaces MOVEQ and SWAP with MOVE.L and CLR.L - it hardly makes any speed boost.
Other than on the 68060, swap is slower. The code I proposed also aims to save cycles by letting the CPU execute more in parallel thanks to fewer register dependencies (and thanks to the long write to memory, which, if I understand correctly, is not an option).
Anyway, on to the divu-first code...

Leaving aside the bvs optimization (which depends on the structure of your code), there's still one thing you can do to avoid the moveq at the beginning of the code, thus saving a little time in the case of the bvs branch:
Code:
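     divu.w   d4,d6          ; 32/16: d6 = remainder:quotient; on overflow V is set and d6 is unchanged
     bvc.b    .div32no       ; no overflow: the 16-bit result is valid

     divul.l  d4,d7:d6       ; 32/32 (68020+, uses all 32 bits of d4): d6 = quotient, d7 = remainder
     move.w   d7,(a3)        ; store the remainder
     bra.b    .div32f

.div32no
     move.l   d6,d7          ; d7 = remainder:quotient
     clr.w    d6             ; d6 = remainder:0
     eor.l    d6,d7          ; d7 = 0:quotient (upper word cleared without a moveq)
     swap.w   d6             ; d6 = 0:remainder
     move.w   d6,(a3)        ; store the remainder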
     
.div32f
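For anyone who wants to check the register shuffling without an assembler at hand, the two paths above can be modelled roughly like this in Python. This is only a sketch, not code from the thread: the function names (`divu_w`, `split_no_overflow`, `divide`) are made up for illustration, the upper word of d4 is assumed to be zero so that both divisions use the same divisor, and the model returns (quotient, remainder) without tracking which of d6/d7 ends up holding which value.

```python
MASK32 = 0xFFFFFFFF


def divu_w(divisor, d):
    """Model of DIVU.W: 32-bit dividend in d, 16-bit divisor.

    Returns (new_d, overflow). On overflow the register is left unchanged,
    which is what allows retrying with a 32-bit division afterwards.
    """
    divisor &= 0xFFFF
    if divisor == 0:
        raise ZeroDivisionError("the real CPU would take a divide-by-zero trap")
    q, r = divmod(d & MASK32, divisor)
    if q > 0xFFFF:
        return d & MASK32, True        # V set, operand unaffected
    return (r << 16) | q, False        # high word = remainder, low word = quotient


def split_no_overflow(d6):
    """Mirror of the .div32no sequence; returns (d6, d7) as they end up."""
    d7 = d6                                    # move.l d6,d7
    d6 &= 0xFFFF0000                           # clr.w  d6     -> remainder:0
    d7 ^= d6                                   # eor.l  d6,d7  -> 0:quotient
    d6 = ((d6 >> 16) | (d6 << 16)) & MASK32    # swap   d6     -> 0:remainder
    return d6, d7


def divide(dividend, divisor):
    """Divide first, fall back to the 32-bit division only on overflow."""
    d6, overflow = divu_w(divisor, dividend)
    if overflow:
        # divul.l d4,d7:d6 : 32-bit quotient and 32-bit remainder
        return divmod(dividend & MASK32, divisor & MASK32)
    remainder, quotient = split_no_overflow(d6)
    return quotient, remainder
```

For example, `divide(100000, 7)` takes the short path, while `divide(0x12345678, 3)` overflows the 16-bit quotient and falls back to the 32-bit division.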