01 May 2021, 08:50   #20
litwr
Registered User
 
Join Date: Mar 2016
Location: Ozherele
Posts: 229
Quote:
Originally Posted by saimo
The code proposed by litwr doesn't have increments, and from the context it looks like (a3) is a variable rather than an item in an array/buffer, so it seems it should be possible to long-align it.
A3 is a pointer to an array element. The element size is 2 bytes and the pointer decreases, so it cannot be kept long-aligned, and long word access may slow down the algorithm.

Quote:
Originally Posted by saimo
To be honest, I only skimmed through the thread and I thought that the code you landed at in post #8 was for some reason the form you were aiming at, so I just applied some optimizations to that. But, yes, I agree that it's better to perform the division first, given that you said that the worst case (overflow set) is very rare.
I showed the CMP-first version only because a/b asked for it. Sorry I didn't explain that earlier.

Quote:
Originally Posted by saimo
Other than on 68060, swap is slower. The code I proposed also aims to save cycles by allowing the CPU to execute more stuff in parallel thanks to less register dependencies (and the long write to memory, which, if I understand correctly, is not an option).
Thank you. I didn't know this. However, the CLR vs SWAP timing is rather odd: the 68000 executes SWAP faster than CLR, but the 68020 executes CLR faster than SWAP! Reducing instruction dependencies is an interesting idea, since it may speed up execution on the 68020 and higher 68k CPUs. I hadn't actually thought about it. But I have just checked the code and IMHO it is difficult to improve it this way. The main loop is short: only 17 lines (or 25 if MULUopt=1) between .l2 and BCC .l2, as anyone can check.

Quote:
Originally Posted by saimo
Anyway, on to the divu-first code...
Your code again corrupts D7.

Quote:
Originally Posted by saimo
Leaving aside the bvs optimization (that depends on the structure of your code), there's still one thing you can do to avoid the moveq at the beginning of the code, thus saving a little time in the case of the bvs branch
My BVS optimization is in the last attachment. It is not that important because it is independent of the other optimizations, but it is short, so I can show it here too:

Code:
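         divu.w d4,d6        ; d6 = remainder:quotient (high:low word)
         bvs.s .longdiv      ; quotient does not fit in 16 bits (rare case)

         moveq.l #0,d7       ; clear d7 only on the common path, after the branch
         move.w d6,d7        ; d7 = quotient, zero-extended to 32 bits
         clr.w d6            ; drop the quotient from the low word
         swap d6             ; d6 = remainder, zero-extended to 32 bits
         move.w d6,(a3)      ; store the remainder at (a3)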
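For readers who do not have the 68k register conventions in their head, here is a rough C model of what the snippet above leaves in the registers. It is only a sketch: the struct `step_result` and the function `divu_first_step` are hypothetical names made up for this illustration, the `.longdiv` path is not modelled at all, and a nonzero divisor is assumed (a real 68k traps on division by zero).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the state after the snippet; names are illustrative. */
typedef struct {
    bool     overflow; /* true: V flag set, bvs would branch to .longdiv */
    uint32_t d6;       /* contents of d6 afterwards */
    uint32_t d7;       /* contents of d7 afterwards */
    uint16_t stored;   /* word written to (a3); meaningless if overflow */
} step_result;

/* Assumes d4 != 0. d7_in is whatever d7 held before the snippet. */
static step_result divu_first_step(uint32_t d6, uint16_t d4, uint32_t d7_in)
{
    step_result r = { false, d6, d7_in, 0 };

    uint32_t quotient = d6 / d4;
    if (quotient > 0xFFFFu) {
        /* divu.w overflow: operands stay unchanged, and since moveq
           comes after bvs, d7 is not touched on this path either. */
        r.overflow = true;
        return r;
    }
    uint32_t remainder = d6 % d4;

    r.d6 = (remainder << 16) | quotient;             /* divu.w d4,d6 */
    r.d7 = 0;                                        /* moveq.l #0,d7 */
    r.d7 = (r.d7 & 0xFFFF0000u) | (r.d6 & 0xFFFFu);  /* move.w d6,d7 */
    r.d6 &= 0xFFFF0000u;                             /* clr.w d6 */
    r.d6 = (r.d6 >> 16) | (r.d6 << 16);              /* swap d6 */
    r.stored = (uint16_t)r.d6;                       /* move.w d6,(a3) */
    return r;
}
```

For example, with d6 = 100000 and d4 = 7 the model ends with d7 = 14285 (the quotient) and d6 = 5 (the remainder), and 5 is the word stored at (a3). With d6 = $00100000 and d4 = 2 the quotient does not fit in 16 bits, so the overflow path is taken and both d6 and d7 keep their old values.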