Quote:
Originally Posted by Thomas Richter
That is NOT equivalent code. A DIVU.L dx,da:db is a 64 by 32 bit division. That's not what your code is doing. To emulate that on a machine without a 32-bit quotient, you need something like Algorithm D. And yes, that would be slower than a DIVU.L.
Well, it doesn't really matter too much. The major problem here is the 32-bit quotient size, which requires a more complicated algorithm than a cascaded divu, which has a 16-bit quotient. A full 32/16 division with a 32-bit remainder and a 32-bit quotient is easy with divu, but a full 32/32 division requires "algorithm D" (or some other division algorithm) and is more complex than what is shown here.
|
Sorry, I left out an important detail. This is all about a 32/16 division with a 32-bit quotient and a 16-bit remainder (32q:16r), so for this case the 68000 code really is equivalent to the shorter 68020/30 code. And, as a/b noted, the instruction used is DIVUL, so in any case we have a 32/32 division instruction, not 64/32.
Quote:
Originally Posted by a/b
Well, he's not using divu.L, it's divuL(.L) which is 32/32 -> 32:32.
I didn't look too closely at this because... Don't really want to start any flame wars but that code nearly gave me cancer. *Every* single instruction that should have a size specifier (because of multiple valid options), and there's almost a dozen of them in that short clip, doesn't have it. And moveq, which is *always* a longword operation and doesn't ever need it, has it.
When I see that I just check out. Having to decipher M68K code because it's butchered so badly is not my idea of fun. I mean, it's OK when it's some fresh guy who's still learning the basics and does all kinds of stuff, we all did that at some point, but this... Nothing personal, I just don't understand why anyone would write such ambiguous code: it's more error prone, less portable, others have to waste time figuring out what you wanted to do, ...
|
It is impossible to please everyone. Omitting the .w qualifier is quite common; you can find plenty of examples of this in books. Still, I can try to rewrite the code to suit your taste.
68020/30:
Code:
divul.l d4,d7:d6
move.w d7,(a3)
68000:
Code:
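    moveq.l #0,d7       ; d7 = 0
    divu.w d4,d6        ; try 32/16: d6 = rem:quot if the quotient fits in 16 bits
    bvc .div32no        ; no overflow: a single division was enough
    swap d6             ; overflow (d6 unchanged): d6 = lo:hi
    move.w d6,d7        ; d7 = 0:hi
    divu.w d4,d7        ; d7 = r1:q_hi
    swap d7             ; d7 = q_hi:r1
    move.w d7,d6        ; d6 = lo:r1
    swap d6             ; d6 = r1:lo
    divu.w d4,d6        ; d6 = rem:q_lo
.div32no
    move.w d6,d7        ; d7 = 32-bit quotient
    clr.w d6
    swap d6             ; d6 = 0:rem
    move.w d6,(a3)      ; r[i] <- d%b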
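For readers who do not read 68k assembly, the cascaded DIVU.W sequence above can be modelled in C. This is only a sketch under my own assumptions: the names `divu_w`, `div32_16` and `div32_16_result` are hypothetical and do not come from the original code, and `divu_w` only imitates the overflow behaviour of DIVU.W (nothing is written when the quotient does not fit in 16 bits).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: 32-bit quotient, 16-bit remainder (32q:16r). */
typedef struct {
    uint32_t quot;
    uint16_t rem;
} div32_16_result;

/* Model of DIVU.W: 32-bit dividend, 16-bit divisor, 16-bit quotient.
   Returns 0 and writes nothing if the quotient overflows 16 bits. */
int divu_w(uint32_t dividend, uint16_t divisor, uint16_t *q, uint16_t *r)
{
    uint32_t quot = dividend / divisor;
    if (quot > 0xFFFFu)
        return 0;                       /* corresponds to V being set */
    *q = (uint16_t)quot;
    *r = (uint16_t)(dividend % divisor);
    return 1;
}

/* 32/16 division built only from the 16-bit-quotient primitive. */
div32_16_result div32_16(uint32_t n, uint16_t d)
{
    div32_16_result res;
    uint16_t q_hi = 0, q_lo = 0, r = 0;
    int ok;

    assert(d != 0);
    if (!divu_w(n, d, &q_lo, &r)) {
        /* Overflow: divide the high word first. The dividend is
           below 65536 here, so this step cannot overflow. */
        ok = divu_w(n >> 16, d, &q_hi, &r);
        assert(ok);
        /* Then divide remainder:low word. Since r < d, the quotient
           of this step always fits in 16 bits. */
        ok = divu_w(((uint32_t)r << 16) | (n & 0xFFFFu), d, &q_lo, &r);
        assert(ok);
        (void)ok;
    }
    res.quot = ((uint32_t)q_hi << 16) | q_lo;
    res.rem = r;
    return res;
}
```

For any non-zero 16-bit divisor, `div32_16(n, d)` should agree with the plain C expressions `n / d` and `n % d`, which is the sense in which the 68000 sequence matches the single DIVUL.L for this case.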
68020/30 (improved):
Code:
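    moveq.l #0,d7
    divu.w d4,d6        ; try the fast 32/16 division first
    bvc .div32no        ; quotient fits in 16 bits
    divul.l d4,d7:d6    ; overflow: full division, d7 = remainder, d6 = quotient
    move.w d7,(a3) ;r[i] <- d%b
    bra .div32f
.div32no
    move.w d6,d7        ; d7 = quotient
    clr.w d6
    swap d6             ; d6 = 0:remainder
    move.w d6,(a3) ;r[i] <- d%b
.div32f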
I hope this is OK for you now. And I can assure you that the result would have been the same if your text had been less poisonous.
However, I am almost sure that this 68020/30 division cannot be optimized any further than in my improved code.
Interestingly, I was able to optimize the 80386 code for the same case.
The initial 80386 code
Code:
div esi
mov [si+ra+1],dx
The optimized 80386 code (it gives an even larger gain on the 80486)
Code:
rol eax,16
cmp ax,si
jnc .lx
mov dx,ax
shr eax,16
div si
.lxc: mov [si+ra+1],dx
Code:
.lx: rol eax,16
div esi
jmp .lxc
Admittedly, it is a rather complex optimization, but I had hoped that the Amiga experts might know some 68k tricks that I missed.
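Both the 68k and the 80386 fast paths rest on the same test: the narrow divide is safe exactly when the high word of the dividend is smaller than the divisor. The 80386 code checks this up front with cmp ax,si, while the 68k code lets DIVU.W report it through the V flag. A minimal C sketch of that condition follows; the name `quotient_fits_16` is hypothetical and only illustrates the check.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 when n / d fits in 16 bits, i.e. when a 32/16 divide
   with a 16-bit quotient (DIV SI, DIVU.W) cannot overflow.
   n / d < 65536  <=>  n < 65536 * d  <=>  (n >> 16) < d. */
int quotient_fits_16(uint32_t n, uint16_t d)
{
    assert(d != 0);
    return (uint16_t)(n >> 16) < d;
}
```

For example, `quotient_fits_16(0x0001FFFF, 2)` is true (the quotient is 0xFFFF), while `quotient_fits_16(0x00020000, 2)` is false (the quotient is 0x10000), so the second case has to take the wide division.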