English Amiga Board - View Single Post - Optimizing the 68020+ 32-bit math

litwr · 29 April 2021, 08:36

Quote:

Originally Posted by Thomas Richter

That is NOT equivalent code. A DIVU.L dx,da:db is a 64 by 32 bit division. That's not what your code is doing. To emulate that on a machine without a 32-bit quotient, you need something like Algorithm D. And yes, that would be slower than a DIVU.L.

Well, it doesn't really matter too much. The major problem here is the quotient size of 32 bit, and this requires a more complicated algorithm than a cascaded divu, which has a quotient of 16 bit. A 32/16 full division with 32 remainder and 32 bit quotient is easy with divu, but a full 32/32 division requires "algorithm D" (or some other division algorithm) and is more complex than what is shown here.

Sorry, I failed to mention an important detail: this is all about a 32/16 division with a 32-bit quotient and a 16-bit remainder (32q:16r). For this case the 68000 code is actually equivalent to the shorter 68020/30 code. Also, as a/b noted, the instruction used is DIVUL, so in any case we have a 32/32 division instruction, not a 64/32 one.

Quote:

Originally Posted by a/b

Well, he's not using divu.L, it's divuL(.L) which is 32/32 -> 32:32.
I didn't look too closely at this because... Don't really want to start any flame wars but that code nearly gave me cancer. *Every* single instruction that should have a size specifier (because of multiple valid options), and there's almost a dozen of them in that short clip, doesn't have it. And moveq, which is *always* a longword operation and doesn't ever need it, has it.
When I see that I just check out. Having to decipher M68K code because it's butchered so badly is not my idea of fun. I mean, it's OK when it's some fresh guy who's still learning the basics and does all kinds of stuff, we all did that at some point, but this... Nothing personal, I just don't understand why anyone would write such ambiguous code: it's more error prone, less portable, others have to waste time figuring out what you wanted to do, ...

It is impossible to please everyone. It is quite common to omit the .w qualifier; you can find examples of this in many books. However, I can try to write the code to your taste.
68020/30:

Code:

     divul.l d4,d7:d6
     move.w d7,(a3)

68000:

Code:

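     moveq.l #0,d7
     divu.w d4,d6       ;try 32/16 -> 16r:16q
     bvc .div32no       ;no overflow: the quotient fits in 16 bits

     swap d6            ;overflow: d6 is unchanged, d6 = low:high
     move.w d6,d7       ;d7 = high word of the dividend
     divu.w d4,d7       ;d7 = r1:q1
     swap d7            ;d7 = q1:r1
     move.w d7,d6       ;d6 = low:r1
     swap d6            ;d6 = r1:low
     divu.w d4,d6       ;d6 = r0:q0
.div32no
     move.w d6,d7       ;d7 = 32-bit quotient
     clr.w d6
     swap d6            ;d6 = 16-bit remainder
     move.w d6,(a3)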
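
To make the idea behind the 68000 sequence easier to follow, here is a rough Python model of it. It is only a sketch: the function names `divu_w` and `div32by16` are made up for illustration, the divisor is assumed to be a non-zero 16-bit value, and register and flag details are reduced to return values.

```python
def divu_w(dividend32, divisor16):
    """Model of DIVU.W: 32/16 -> (16-bit quotient, 16-bit remainder).

    Returns None when the quotient does not fit in 16 bits, which is the
    case where the real instruction sets V and leaves the register unchanged.
    """
    quotient, remainder = divmod(dividend32, divisor16)
    if quotient > 0xFFFF:
        return None
    return quotient, remainder


def div32by16(dividend32, divisor16):
    """32/16 division with a 32-bit quotient and a 16-bit remainder."""
    fast = divu_w(dividend32, divisor16)
    if fast is not None:
        return fast  # short path: a single divu was enough

    high = dividend32 >> 16
    low = dividend32 & 0xFFFF
    # Step 1: divide the high word. It cannot overflow because high < 0x10000.
    q1, r1 = divu_w(high, divisor16)
    # Step 2: divide r1:low. It cannot overflow because r1 < divisor16.
    q0, r0 = divu_w((r1 << 16) | low, divisor16)
    return (q1 << 16) | q0, r0
```

For example, `div32by16(100000, 7)` takes the short path and returns `(14285, 5)`, while `div32by16(0xFFFFFFFF, 1)` needs both extra divisions.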

68020/30 (improved):

Code:

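     moveq.l #0,d7
     divu.w d4,d6       ;try 32/16 -> 16r:16q
     bvc .div32no       ;no overflow: the quotient fits in 16 bits

     divul.l d4,d7:d6   ;d6 = 32-bit quotient, d7 = remainder
     move.w d7,(a3)     ;r[i] <- d%b
     bra .div32f

.div32no
     move.w d6,d7       ;d7 = 16-bit quotient
     clr.w d6
     swap d6            ;d6 = remainder
     move.w d6,(a3)     ;r[i] <- d%b
.div32f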

I hope this is OK for you now. And I can assure you that the result would have been the same if your post had been less poisonous.

However, I am almost sure that this 68020/30 division cannot be optimized any further than in my improved code.

Interestingly, I was able to optimize the 80386 code for the same case.
The initial 80386 code:

Code:

         div esi
         mov [si+ra+1],dx

The optimized 80386 code (the gain is even larger on the 80486):

Code:

         rol eax,16
         cmp ax,si
         jnc .lx

         mov dx,ax
         shr eax,16
         div si
.lxc:    mov [si+ra+1],dx

Code:

.lx:     rol eax,16
         div esi
         jmp .lxc

Admittedly, this is a rather complex optimization, but I had hoped that the Amiga experts might know some 68k tricks that I missed.
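
The test at the start of the optimized 80386 code (`rol eax,16` / `cmp ax,si` / `jnc .lx`) decides in advance whether the short 16-bit `div si` is safe. As far as I can tell it rests on a simple fact: the quotient fits in 16 bits exactly when the high word of the dividend is smaller than the divisor. Below is a small Python sketch of that check. The names `quotient_fits_16` and `verify` are hypothetical, and the divisor is assumed to be a non-zero 16-bit value.

```python
def quotient_fits_16(dividend32, divisor16):
    """Predict, without dividing, whether dividend32 // divisor16 <= 0xFFFF.

    This mirrors comparing the high word of the dividend with the divisor:
    the short division is taken only when high word < divisor.
    """
    return (dividend32 >> 16) < divisor16


def verify(dividends, divisors):
    """Brute-force check of the prediction against a real division."""
    for n in dividends:
        for d in divisors:
            assert quotient_fits_16(n, d) == (n // d <= 0xFFFF)
    return True
```

Calling `verify(range(0, 1 << 20, 997), range(1, 300))` runs the comparison over a modest sample. The same condition is what the 68000 version gets for free from the overflow flag of the first divu.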