32 bit multiplies and divides on 68000

Mrs Beanbag · 06 September 2015, 19:52

68000 has signed and unsigned multiply of two words into one longword, and division of a longword by a word giving a word result/remainder.

68020 can multiply and divide with longwords.

But sometimes we need to multiply and divide using longwords even on a 68000. So what is the best/easiest/fastest way to achieve this?

Don_Adan · 06 September 2015, 20:08

Quote:

Originally Posted by Mrs Beanbag

68000 has signed and unsigned multiply of two words into one longword, and division of a longword by a word giving a word result/remainder.

68020 can multiply and divide with longwords.

But sometimes we need to multiply and divide using longwords even on a 68000. So what is the best/easiest/fastest way to achieve this?

You can check the utility.library source code from the Wanted Team page. It contains optimised 68000 mul and div routines. They could be a bit faster if more scratch registers were available.

Mrs Beanbag · 07 September 2015, 00:20

i don't seem able to find it

Don_Adan · 07 September 2015, 11:12

Quote:

Originally Posted by Mrs Beanbag

i don't seem able to find it

http://wt.exotica.org.uk/test.html
check short ROM package archive.

Mrs Beanbag · 07 September 2015, 13:27

thanks for that, but trying to extract the LZX archive seems to crash fs-uae for some reason...
weird

Thorham · 07 September 2015, 13:52

Here's a zip archive: ROM.zip

Mrs Beanbag · 07 September 2015, 14:14

Quote:

Originally Posted by Thorham

Here's a zip archive: Attachment 45386

Thanks Thorham!

I'm a bit confused though, signed and unsigned 32-bit multiplies appear to be the same function? Is that right?

Code:

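SMult32S:
UMult32S:
    move.l    D2,-(SP)    ; save D2
    move.l    D0,-(SP)    ; push A: Ah at (SP), Al at 2(SP)
    mulu.w    D1,D0        ; D0=Al*Bl
    move.l    D1,D2        ; D2=B
    mulu.w    (SP)+,D1    ; D1=Ah*Bl
    swap    D2        ; D2=Bh
    mulu.w    (SP)+,D2    ; D2=Al*Bh
    add.w    D2,D1        ; low word of Ah*Bl+Al*Bh
    swap    D1        ; move it to the high word
    move.l    (SP)+,D2    ; restore D2
    clr.w    D1        ; D1=(Ah*Bl+Al*Bh)<<16
    add.l    D1,D0        ; D0=low 32 bits of A*B
    rts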

Leffmann · 07 September 2015, 21:31

Yes it's right, the 32 most significant bits of the 64-bit product can differ, but the 32 least significant bits will always be the same.
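
A quick way to convince yourself is to model it outside the assembler. The sketch below is Python, not 68000 code, and the helper names (`to_signed`, `mul64`) are made up for illustration. It multiplies the same two bit patterns once as unsigned and once as two's complement values and splits the 64-bit product into its two longwords.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit pattern as a two's complement value."""
    return x - (1 << 32) if x & 0x80000000 else x

def mul64(a, b, signed):
    """Return (high, low) longwords of the 64-bit product of two 32-bit patterns."""
    if signed:
        a, b = to_signed(a), to_signed(b)
    p = (a * b) & MASK64          # wrap a negative product to 64 bits
    return p >> 32, p & MASK32

a, b = 0xFFFFFFFE, 0x00000003     # -2 * 3 signed, 4294967294 * 3 unsigned
u_hi, u_lo = mul64(a, b, signed=False)
s_hi, s_lo = mul64(a, b, signed=True)
print(hex(u_hi), hex(u_lo))
print(hex(s_hi), hex(s_lo))
```

For this pair the low longword is 0xFFFFFFFA both times, while the high longword is 0x2 unsigned and 0xFFFFFFFF signed.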

Mrs Beanbag · 07 September 2015, 23:17

interesting... makes sense now you mention it, and yet MULS.L and MULU.L have different opcodes.

Then again ASL and LSL have different opcodes too, and achieve the same result.

The 32 bit divide is much more complex though, i'm going to have to stare at that for a while...

Leffmann · 08 September 2015, 00:37

They give the same numerical result, but they differ in when they signal numerical overflow.

Mrs Beanbag · 08 September 2015, 10:44

of course they do, how silly of me! somehow i never had to check the overflow of a left-shift before...

Overflows also differ between MULU.L and MULS.L so i suppose the above code doesn't emit a correct overflow flag.
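
To make the difference concrete, here is a small Python model (not 68000 code; the function names are hypothetical). It assumes the rule as I understand it from the 68020 documentation for the 32-bit-result forms: MULU.L sets V when the high 32 bits of the 64-bit product are non-zero, MULS.L sets V when the product does not fit a signed longword.

```python
def to_signed(x):
    """Interpret a 32-bit pattern as a two's complement value."""
    return x - (1 << 32) if x & 0x80000000 else x

def mulu_l_overflow(a, b):
    """Unsigned 32x32->32: overflow when the true product needs more than 32 bits."""
    return (a * b) >> 32 != 0

def muls_l_overflow(a, b):
    """Signed 32x32->32: overflow when the true product is outside -2^31 .. 2^31-1."""
    p = to_signed(a) * to_signed(b)
    return not (-(1 << 31) <= p <= (1 << 31) - 1)

# Same bit patterns, different verdicts:
print(mulu_l_overflow(0xFFFFFFFF, 2), muls_l_overflow(0xFFFFFFFF, 2))
print(mulu_l_overflow(0x40000000, 2), muls_l_overflow(0x40000000, 2))
```

0xFFFFFFFF * 2 overflows as unsigned but is just -2 as signed, while 0x40000000 * 2 fits as unsigned but overflows as signed. A routine shared by both cases cannot report both.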

ReadOnlyCat · 10 September 2015, 05:23

Quote:

Originally Posted by Leffmann

Yes it's right, the 32 most significant bits of the 64-bit product can differ, but the 32 least significant bits will always be the same.

I am completely befuddled by this affirmation.

I do not doubt it is correct but to me it sounds like you are saying that when multiplying the same (bit wise) two numbers with SMult32S you will obtain differing results than with UMult32S which is downright impossible.

So where does my misinterpretation lie?

TheDarkCoder · 10 September 2015, 13:43

it seems to me that the SMult32S and UMult32S compute a 32 bit result.
They do 32 x 32 -> 32.
Others have said that if you multiply two 32-bit numbers, the differences between the signed and unsigned mult only affect the 32 most significant bits of the 64-bit result.
So if you only compute the lower 32 bits of the result, signed and unsigned produce the same number.
However the two operations should differ in how they deal with overflows and sign of the result (I think this is where the 020 instructions differ), while the proposed routines are the same.
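
For reference, the 32 x 32 -> 32 routine can be modelled like this. It is a Python sketch of the arithmetic only, with a made-up function name, and the comments point at the instructions I believe each step corresponds to. Note that Ah*Bh is never computed, since it would only contribute to bits 32 and up.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF

def mult32(a, b):
    """Low 32 bits of a*b built from three 16x16->32 products."""
    al, ah = a & MASK16, a >> 16
    bl, bh = b & MASK16, b >> 16
    low = al * bl                            # mulu.w D1,D0
    cross = (ah * bl + al * bh) & MASK16     # add.w D2,D1 keeps only 16 bits
    return (low + (cross << 16)) & MASK32    # swap, clr.w, add.l

print(hex(mult32(0x12345678, 0x9ABCDEF0)))
```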

ReadOnlyCat · 11 September 2015, 05:08

Quote:

Originally Posted by TheDarkCoder

it seems to me that the SMult32S and UMult32S compute a 32 bit result.
They do 32 x 32 -> 32.
Others have said that if you multiply two 32-bit numbers, the differences between the signed and unsigned mult only affect the 32 most significant bits of the 64-bit result.
So if you only compute the lower 32 bits of the result, signed and unsigned produce the same number.
However the two operations should differ in how they deal with overflows and sign of the result (I think this is where the 020 instructions differ), while the proposed routines are the same.

Ah oki, the hypothetical high 32 bits of the 64 bit result.
Now this makes sense. Thanks!

This said I must admit I am surprised that two's complement multiplication works just like addition, I probably learned about it at the time but I must have forgotten since I expected it to fail somehow.

meynaf · 12 October 2015, 15:55

Here are my long mul & div routines. They are the 64 bit versions of long mulu and divu (32x32 -> 64 multiply, 64/32 -> 32 divide). I hope they can be of some help.
To be used when 32 bits are not enough.
If you want signed versions, well, do a few NEGs before and after

Here's the mul :

Code:

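; umult64 - mulu.l d0,d0:d1
; in: d0 = A, d1 = B   out: d0:d1 = A*B (d0 = high longword)
 move.l d2,-(a7)     ; save d2
 move.w d0,d2
 mulu d1,d2          ; d2 = Al*Bl
 move.l d2,-(a7)     ; push Al*Bl
 move.l d1,d2
 swap d2             ; d2.w = Bh
 move.w d2,-(a7)     ; push Bh
 mulu d0,d2          ; d2 = Al*Bh
 swap d0             ; d0.w = Ah
 mulu d0,d1          ; d1 = Ah*Bl
 mulu (a7)+,d0       ; d0 = Ah*Bh
 add.l d2,d1         ; d1 = Al*Bh+Ah*Bl, carry in X
 moveq #0,d2
 addx.w d2,d2        ; d2 = carry of the middle sum
 swap d2             ; carry goes to bit 16
 swap d1
 move.w d1,d2        ; d2 = upper 17 bits of the middle sum
 clr.w d1            ; d1 = low word of the middle sum << 16
 add.l (a7)+,d1      ; d1 = low longword, carry in X
 addx.l d2,d0        ; d0 = high longword
 move.l (a7)+,d2     ; restore d2
 rts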
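
As a cross-check of the carry handling, here is the same multiply as a Python sketch (not 68000 code). `umult64` mirrors the routine's name, while `smult64` is a hypothetical wrapper showing the "NEG before and after" idea for the signed case.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def umult64(a, b):
    """Return (high, low) of the unsigned 64-bit product using four 16x16 products."""
    al, ah = a & MASK16, a >> 16
    bl, bh = b & MASK16, b >> 16
    ll, lh, hl, hh = al * bl, al * bh, ah * bl, ah * bh
    mid = lh + hl                          # may need 33 bits (add.l + addx.w carry)
    low = ll + ((mid & MASK16) << 16)      # low word of mid goes to the upper half
    high = hh + (mid >> 16) + (low >> 32)  # upper 17 bits of mid plus carry out of low
    return high & MASK32, low & MASK32

def smult64(a, b):
    """Signed product of two Python ints in -2^31 .. 2^31-1 via negate, multiply, negate."""
    negative = (a < 0) != (b < 0)
    hi, lo = umult64(abs(a), abs(b))
    p = (hi << 32) | lo
    if negative:
        p = -p & MASK64                    # 64-bit two's complement negate
    return p >> 32, p & MASK32

print([hex(v) for v in umult64(0xFFFFFFFF, 0xFFFFFFFF)])
print([hex(v) for v in smult64(-2, 3)])
```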

And for the div. Result is undefined in case of overflow, but you get V properly set.

Code:

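; udivmod64 - divu.l d2,d0:d1
; in: d0:d1 = dividend (d0 = high), d2 = divisor   out: d1 = quotient, d0 = remainder
 move.l d3,-(a7)     ; save d3
 moveq #31,d3        ; 32 quotient bits
.loop
 add.l d1,d1         ; shift d0:d1 left by one
 addx.l d0,d0
 bcs.s .over         ; bit shifted out: value is 33 bits wide
 cmp.l d2,d0
 bcs.s .sui          ; d0 < d2: quotient bit stays 0
 sub.l d2,d0
.re
 addq.b #1,d1        ; set quotient bit
.sui
 dbf d3,.loop
 move.l (a7)+,d3     ; v=0
 rts
.over
 sub.l d2,d0
 bcs.s .re           ; borrow cancels the shifted-out bit
 move.l (a7)+,d3
 ori #2,ccr          ; v=1 (V is bit 1 of the CCR; #4 would set Z)
 rts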
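
The divide loop is plain shift-and-subtract, one quotient bit per pass. Below is a Python sketch of the same idea (not 68000 code), useful as a reference when testing. It assumes the high longword is smaller than the divisor, which is the condition for the quotient to fit in 32 bits; that precondition is my addition, so the sketch does not model the overflow path at all.

```python
MASK32 = 0xFFFFFFFF

def udivmod64(high, low, divisor):
    """Shift-and-subtract 64/32 -> 32 divide; returns (quotient, remainder).

    Assumes high < divisor, so the quotient fits in 32 bits.
    """
    assert 0 <= high < divisor <= MASK32
    rem, quo = high, low
    for _ in range(32):
        # add.l d1,d1 / addx.l d0,d0: shift the 64-bit pair left by one bit
        rem = (rem << 1) | (quo >> 31)
        quo = (quo << 1) & MASK32
        # rem may be 33 bits wide here; that is the bcs .over case in the assembler
        if rem >= divisor:
            rem -= divisor
            quo |= 1                  # addq.b #1,d1
    return quo, rem

print([hex(v) for v in udivmod64(0x1, 0x23456789, 0x10)])
```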

Thorham · 10 May 2017, 16:08

64bit / 32bit = 64bit with 32bit remainder. Not well tested yet:

Code:

;
; in:
;
; d0 = 32bit divisor
; d1 = low 32bit numerator
; d2 = high 32bit numerator
;
; out:
;
; d1 = low 32bit quotient
; d2 = high 32bit quotient
; d3 = 32bit remainder
;
divu64
    move.l  d7,-(sp)

    clr.l   d3

    move.l  #64-1,d7
.loop
    add.l   d1,d1
    addx.l  d2,d2
    addx.l  d3,d3
    bcs     .l1

    cmp.l   d0,d3
    bcs     .l2
.l1
    sub.l   d0,d3
    addq.l  #1,d1
.l2
    dbra    d7,.loop

    move.l  (sp)+,d7
    rts
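
Since the routine is not well tested yet, a reference model to compare against might help. This is a Python sketch of the same 64-pass loop (not 68000 code), with the arguments in the same order as d0, d1, d2. It assumes a non-zero divisor.

```python
MASK32 = 0xFFFFFFFF

def divu64(divisor, num_low, num_high):
    """64-bit / 32-bit -> (quot_low, quot_high, remainder), one quotient bit per pass."""
    assert 0 < divisor <= MASK32
    rem = 0
    for _ in range(64):
        # add.l d1,d1 / addx.l d2,d2 / addx.l d3,d3: shift the 96-bit chain left by one
        rem = (rem << 1) | (num_high >> 31)
        num_high = ((num_high << 1) | (num_low >> 31)) & MASK32
        num_low = (num_low << 1) & MASK32
        if rem >= divisor:            # covers both the carry-out (.l1) and the cmp.l case
            rem -= divisor
            num_low |= 1              # addq.l #1,d1
    return num_low, num_high, rem

print([hex(v) for v in divu64(3, 0xFFFFFFFF, 0xFFFFFFFF)])
```

Comparing its output with the assembler routine over a batch of random inputs would be one way to get some test coverage.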

Clubcard · 06 October 2018, 21:05

I don't know if this has already been covered but Karatsuba multiplication will perform a 32-bit x 32-bit --> 64-bit result using 3 MULU instructions. The larger you go, the more efficient it gets (25%+12.5%+6.25%...) until you get over about 4096 bits.

If you do want to go for >4096 bit factors, Toom-Cook multiplication is faster.

https://www.spectroom.com/1022825714...multiplication

I hope this is of value to someone

Don_Adan · 07 October 2018, 20:01

Quote:

Originally Posted by Clubcard

I don't know if this has already been covered but Karatsuba multiplication will perform a 32-bit x 32-bit --> 64-bit result using 3 MULU instructions. The larger you go, the more efficient it gets (25%+12.5%+6.25%...) until you get over about 4096 bits.

If you do want to go for >4096 bit factors, Toom-Cook multiplication is faster.

https://www.spectroom.com/1022825714...multiplication

I hope this is of value to someone

Yes, nice idea, but step 3 of Karatsuba multiplication needs mulu.l, not mulu.w, because the sums of the two halves can be 17 bits wide. So this version cannot be the fastest on a 68000: 2 mulu.w and 1 mulu.l are necessary, if I understand the idea correctly. The plain 32-bit x 32-bit --> 64-bit multiply needs 4 mulu.w instructions. Maybe when I am back to life I will check it more closely.
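
The middle product in the usual form of Karatsuba is (Ah+Al)*(Bh+Bl), whose operands can be 17 bits wide, so a single mulu.w is not enough there. The subtractive form uses |Ah-Al|*|Bh-Bl| instead, which does fit 16 x 16, at the price of tracking a sign. The Python sketch below (not 68000 code, function name made up) only shows that the arithmetic works with three 16 x 16 -> 32 products. Whether the extra subtracts, negates and branches beat a fourth mulu.w on a real 68000 is not something it can tell you.

```python
MASK16 = 0xFFFF

def karatsuba32(a, b):
    """32x32 -> 64 unsigned product using three 16x16 -> 32 multiplies."""
    al, ah = a & MASK16, a >> 16
    bl, bh = b & MASK16, b >> 16
    ll = al * bl                       # first multiply
    hh = ah * bh                       # second multiply
    da, db = ah - al, bh - bl
    dd = abs(da) * abs(db)             # third multiply, both operands fit in 16 bits
    if (da < 0) != (db < 0):
        dd = -dd                       # restore the sign of (ah - al) * (bh - bl)
    mid = hh + ll - dd                 # equals ah*bl + al*bh
    return (hh << 32) + (mid << 16) + ll

print(hex(karatsuba32(0xFFFFFFFF, 0xFFFFFFFF)))
```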
