English Amiga Board


Old 06 September 2015, 20:52   #1
Mrs Beanbag
Glastonbridge Software
 
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,202
32 bit multiplies and divides on 68000

68000 has signed and unsigned multiply of two words into one longword, and division of a longword by a word giving a word result/remainder.

68020 can multiply and divide with longwords.

But sometimes we need to multiply and divide using longwords even on a 68000. So what is the best/easiest/fastest way to achieve this?
Old 06 September 2015, 21:08   #2
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 50
Posts: 1,050
Quote:
Originally Posted by Mrs Beanbag View Post
68000 has signed and unsigned multiply of two words into one longword, and division of a longword by a word giving a word result/remainder.

68020 can multiply and divide with longwords.

But sometimes we need to multiply and divide using longwords even on a 68000. So what is the best/easiest/fastest way to achieve this?
You can check the utility.library source code from the Wanted Team page. It contains optimised 68000 mul and div routines. They could be a little faster if more scratch registers were available.
Old 07 September 2015, 01:20   #3
Mrs Beanbag
Glastonbridge Software
 
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,202
i don't seem able to find it
Old 07 September 2015, 12:12   #4
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 50
Posts: 1,050
Quote:
Originally Posted by Mrs Beanbag View Post
i don't seem able to find it
http://wt.exotica.org.uk/test.html
Check the short ROM package archive.
Old 07 September 2015, 14:27   #5
Mrs Beanbag
Glastonbridge Software
 
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,202
thanks for that, but trying to extract the LZX archive seems to crash fs-uae for some reason...
weird
Old 07 September 2015, 14:52   #6
Thorham
Computer Nerd

 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 42
Posts: 3,085
Here's a zip archive: ROM.zip
Old 07 September 2015, 15:14   #7
Mrs Beanbag
Glastonbridge Software
 
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,202
Quote:
Originally Posted by Thorham View Post
Here's a zip archive: Attachment 45386
Thanks Thorham!

I'm a bit confused though, signed and unsigned 32-bit multiplies appear to be the same function? Is that right?

Code:
SMult32S:
UMult32S:
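    move.l    D2,-(SP)    ; save D2
    move.l    D0,-(SP)    ; push A: Ah at (SP), Al at 2(SP)
    mulu.w    D1,D0        ; D0=Al*Bl
    move.l    D1,D2        ; D2=B
    mulu.w    (SP)+,D1    ; D1=Ah*Bl
    swap    D2        ; D2.w=Bh
    mulu.w    (SP)+,D2    ; D2=Al*Bh
    add.w    D2,D1        ; low 16 bits of Ah*Bl+Al*Bh
    swap    D1        ; move that sum to the high word
    move.l    (SP)+,D2    ; restore D2
    clr.w    D1        ; D1=(cross sum)<<16
    add.l    D1,D0        ; D0=low 32 bits of A*B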
    rts
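
For anyone who wants to check the arithmetic of the routine above off the Amiga, here is a rough Python model of it. It is only a sketch: `mult32` is a made-up name, and it mirrors the three 16x16 partial products rather than the exact register shuffling. Ah*Bh is left out because it only contributes to bits 32 and up.

```python
MASK32 = 0xFFFFFFFF

def mult32(a, b):
    """Low 32 bits of a*b from three 16x16 products (hypothetical helper)."""
    al, ah = a & 0xFFFF, (a >> 16) & 0xFFFF
    bl, bh = b & 0xFFFF, (b >> 16) & 0xFFFF
    low = al * bl                           # mulu.w D1,D0
    cross = (ah * bl + al * bh) & 0xFFFF    # add.w D2,D1 keeps only 16 bits
    return (low + (cross << 16)) & MASK32   # swap / clr.w / add.l

# Compare against Python's arbitrary-precision product.
for a, b in [(3, 5), (0x12345678, 0x9ABCDEF0), (MASK32, MASK32)]:
    assert mult32(a, b) == (a * b) & MASK32
```

Only the low word of the cross sum matters, which is why the routine gets away with `add.w` and ignores its carry.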
Old 07 September 2015, 22:31   #8
Leffmann
 
Join Date: Jul 2008
Location: Sweden
Posts: 2,186
Yes, it's right: the 32 most significant bits of the 64-bit product can differ, but the 32 least significant bits will always be the same.
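
A quick way to convince yourself is to compute both 64-bit products for the same bit patterns. The following is a small Python sketch; `to_signed32` and `products` are names invented for this example.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def to_signed32(x):
    """Reinterpret a 32-bit pattern as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def products(a_bits, b_bits):
    """Return the 64-bit patterns of the unsigned and signed products."""
    unsigned = (a_bits * b_bits) & MASK64
    signed = (to_signed32(a_bits) * to_signed32(b_bits)) & MASK64
    return unsigned, signed

# 0xFFFFFFFF is 4294967295 unsigned but -1 signed.
u, s = products(0xFFFFFFFF, 2)
assert (u & MASK32) == (s & MASK32)   # low halves agree
assert (u >> 32) != (s >> 32)         # high halves do not
```

The low halves agree because both interpretations are the same number modulo 2^32, and multiplication respects that.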
Old 08 September 2015, 00:17   #9
Mrs Beanbag
Glastonbridge Software
 
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,202
interesting... makes sense now you mention it, and yet MULS.L and MULU.L have different opcodes.

Then again ASL and LSL have different opcodes too, and achieve the same result.

The 32 bit divide is much more complex though, i'm going to have to stare at that for a while...
Old 08 September 2015, 01:37   #10
Leffmann
 
Join Date: Jul 2008
Location: Sweden
Posts: 2,186
They give the same numerical result, but they differ in when they signal numerical overflow.
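
To make the difference concrete, here is a small Python model of the two shifts on a longword. It is a sketch with invented names (`lsl32`, `asl32`) that only tracks the result and the V flag: LSL always clears V, while ASL sets it if the most significant bit changes at any point during the shift.

```python
MASK32 = 0xFFFFFFFF

def lsl32(x, n):
    """Return (result, V) for a logical shift left; V is always clear."""
    return (x << n) & MASK32, False

def asl32(x, n):
    """Return (result, V) for an arithmetic shift left."""
    v = False
    for _ in range(n):
        shifted = (x << 1) & MASK32
        if (shifted ^ x) & 0x80000000:   # sign bit changed on this step
            v = True
        x = shifted
    return x, v

# Same result bits, different overflow flag.
assert lsl32(0x40000000, 1) == (0x80000000, False)
assert asl32(0x40000000, 1) == (0x80000000, True)
```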
Old 08 September 2015, 11:44   #11
Mrs Beanbag
Glastonbridge Software
 
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,202
of course they do, how silly of me! somehow i never had to check the overflow of a left-shift before...

Overflows also differ between MULU.L and MULS.L so i suppose the above code doesn't emit a correct overflow flag.
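
As a rough illustration of how the two overflow conditions differ for a 32x32 -> 32 multiply, here is a Python sketch. The names `mulu_l` and `muls_l` are just labels for this model: unsigned overflow means the product does not fit in 32 unsigned bits, signed overflow means it does not fit in a signed longword.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Reinterpret a 32-bit pattern as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def mulu_l(a, b):
    """Return (low 32 bits, V) for an unsigned 32x32->32 multiply."""
    p = (a & MASK32) * (b & MASK32)
    return p & MASK32, p > MASK32

def muls_l(a, b):
    """Return (low 32 bits, V) for a signed 32x32->32 multiply."""
    p = to_signed32(a) * to_signed32(b)
    return p & MASK32, not (-(1 << 31) <= p < (1 << 31))

# -1 * -1 = 1 fits when signed; 0xFFFFFFFF squared does not fit unsigned.
assert muls_l(0xFFFFFFFF, 0xFFFFFFFF) == (1, False)
assert mulu_l(0xFFFFFFFF, 0xFFFFFFFF) == (1, True)
```

A software routine that only computes the low 32 bits cannot report either condition without also looking at the high half of the product.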
Old 10 September 2015, 06:23   #12
ReadOnlyCat
Code Kitten

 
Join Date: Aug 2015
Location: Montreal/Canadia
Age: 47
Posts: 1,088
Quote:
Originally Posted by Leffmann View Post
Yes, it's right: the 32 most significant bits of the 64-bit product can differ, but the 32 least significant bits will always be the same.
I am completely befuddled by this statement.
I do not doubt it is correct, but to me it sounds like you are saying that multiplying the same two (bitwise identical) numbers with SMult32S gives a different result than with UMult32S, which is downright impossible.

So where does my misinterpretation lie?
Old 10 September 2015, 14:43   #13
TheDarkCoder
Registered User
 
Join Date: Dec 2007
Location: Dark Kingdom
Posts: 211
It seems to me that SMult32S and UMult32S compute a 32-bit result.
They do 32 x 32 -> 32.
Others have said that if you multiply two 32-bit numbers, the differences between the signed and unsigned multiply only affect the 32 most significant bits of the 64-bit result.
So if you only compute the lower 32 bits of the result, signed and unsigned produce the same number.
However, the two operations should differ in how they deal with overflow and the sign of the result (I think this is where the 020 instructions differ), while the proposed routines are the same.
Old 11 September 2015, 06:08   #14
ReadOnlyCat
Code Kitten

 
Join Date: Aug 2015
Location: Montreal/Canadia
Age: 47
Posts: 1,088
Quote:
Originally Posted by TheDarkCoder View Post
It seems to me that SMult32S and UMult32S compute a 32-bit result.
They do 32 x 32 -> 32.
Others have said that if you multiply two 32-bit numbers, the differences between the signed and unsigned multiply only affect the 32 most significant bits of the 64-bit result.
So if you only compute the lower 32 bits of the result, signed and unsigned produce the same number.
However, the two operations should differ in how they deal with overflow and the sign of the result (I think this is where the 020 instructions differ), while the proposed routines are the same.
Ah oki, the hypothetical high 32 bits of the 64 bit result.
Now this makes sense. Thanks!

That said, I must admit I am surprised that two's complement multiplication works just like addition. I probably learned about it at the time but I must have forgotten, since I expected it to fail somehow.
Old 12 October 2015, 16:55   #15
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 45
Posts: 3,239
Here are my long mul & div routines. They are 64 bit versions of long mulu and divu (32x32 -> 64 and 64/32 -> 32). I hope they can be of some help.
To be used when 32 bits are not enough.
If you want signed versions, well, do a few NEGs before and after.


Here's the mul :
Code:
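; umult64 - mulu.l d0,d0:d1
; in:  d0 = A, d1 = B (32 bit unsigned)
; out: d0 = high 32 bits, d1 = low 32 bits of A*B
 move.l d2,-(a7)	; save d2
 move.w d0,d2
 mulu d1,d2		; d2 = Al*Bl
 move.l d2,-(a7)	; push Al*Bl
 move.l d1,d2
 swap d2		; d2.w = Bh
 move.w d2,-(a7)	; push Bh
 mulu d0,d2		; d2 = Al*Bh
 swap d0		; d0.w = Ah
 mulu d0,d1		; d1 = Ah*Bl
 mulu (a7)+,d0		; d0 = Ah*Bh
 add.l d2,d1		; d1 = Al*Bh+Ah*Bl, carry in X
 moveq #0,d2
 addx.w d2,d2		; d2 = carry of the cross sum
 swap d2		; carry goes to bit 16
 swap d1
 move.w d1,d2		; d2 = cross sum >> 16 (with carry)
 clr.w d1		; d1 = cross sum << 16
 add.l (a7)+,d1		; d1 = low 32 bits, carry in X
 addx.l d2,d0		; d0 = high 32 bits
 move.l (a7)+,d2	; restore d2
 rts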
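
For checking a routine like this against known-good results, a Python model of the same four-partial-product scheme can help. This is only a sketch: the function name is reused from the comment above for convenience, and the variables are illustrative, not a line-by-line transcription.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF

def umult64(a, b):
    """Return (high, low) 32-bit halves of the unsigned product a*b."""
    al, ah = a & MASK16, (a >> 16) & MASK16
    bl, bh = b & MASK16, (b >> 16) & MASK16
    low = al * bl                              # Al*Bl
    high = ah * bh                             # Ah*Bh
    mid = al * bh + ah * bl                    # cross sum, may need 33 bits
    mid_carry = mid >> 32                      # X flag after the first add.l
    mid &= MASK32
    upper = (mid_carry << 16) | (mid >> 16)    # part added to the high half
    low_sum = ((mid & MASK16) << 16) + low     # part added to the low half
    carry = low_sum >> 32
    return (high + upper + carry) & MASK32, low_sum & MASK32

for a, b in [(0, 0), (3, 5), (MASK32, MASK32), (0x12345678, 0x9ABCDEF0)]:
    hi, lo = umult64(a, b)
    assert (hi << 32) | lo == a * b
```

The two carries (one out of the cross sum, one out of the low half) are the only subtle part, and they correspond to the two `addx` instructions.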
And for the div. Result is undefined in case of overflow, but you get V properly set.
Code:
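; udivmod64 - divu.l d2,d0:d1
; in:  d0:d1 = 64 bit dividend (d0 high), d2 = divisor
; out: d1 = quotient, d0 = remainder
; note: V is only raised when a bit carries out of d0 in the loop;
;       for a complete overflow test, check d0 < d2 before calling
 move.l d3,-(a7)
 moveq #31,d3
.loop
 add.l d1,d1		; shift dividend left, msb to X
 addx.l d0,d0		; shift it into the remainder
 bcs.s .over		; remainder is now 33 bits
 cmp.l d2,d0
 bcs.s .sui		; remainder < divisor: quotient bit 0
 sub.l d2,d0
.re
 addq.b #1,d1		; quotient bit 1
.sui
 dbf d3,.loop
 move.l (a7)+,d3	; v=0
 rts
.over
 sub.l d2,d0
 bcs.s .re		; borrow: remainder fits in 32 bits again
 move.l (a7)+,d3
 ori #2,ccr		; v=1 (V is bit 1 of the CCR)
 rts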
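
Here is a Python sketch of the same shift-and-subtract idea, handy for generating test vectors. Two things in it are assumptions of the sketch rather than part of the routine above: overflow is tested up front as "high longword >= divisor" (the quotient fits in 32 bits exactly when the high longword is smaller than the divisor), and a zero divisor raises an exception.

```python
MASK32 = 0xFFFFFFFF

def udivmod64(hi, lo, d):
    """Divide the 64-bit value hi:lo by d.

    Returns (quotient, remainder, overflow); the first two are None
    when the quotient would not fit in 32 bits.
    """
    if d == 0:
        raise ZeroDivisionError("divisor is zero")
    if hi >= d:
        return None, None, True
    rem, quo = hi, lo
    for _ in range(32):
        carry = rem >> 31                       # bit shifted out of rem
        rem = ((rem << 1) | (quo >> 31)) & MASK32
        quo = (quo << 1) & MASK32
        if carry or rem >= d:                   # 33-bit remainder >= d
            rem = (rem - d) & MASK32
            quo |= 1
    return quo, rem, False

q, r, ov = udivmod64(1, 0, 3)                   # 2**32 divided by 3
assert (q, r, ov) == (0x55555555, 1, False)
```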
Old 10 May 2017, 17:08   #16
Thorham
Computer Nerd

 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 42
Posts: 3,085
64bit / 32bit = 64bit with 32bit remainder. Not well tested yet:

Code:
;
; in:
;
; d0 = 32bit divisor
; d1 = low 32bit numerator
; d2 = high 32bit numerator
;
; out:
;
; d1 = low 32bit quotient
; d2 = high 32bit quotient
; d3 = 32bit remainder
;
divu64
    move.l  d7,-(sp)

    clr.l   d3

    move.l  #64-1,d7
.loop
    add.l   d1,d1
    addx.l  d2,d2
    addx.l  d3,d3
    bcs     .l1

    cmp.l   d0,d3
    bcs     .l2
.l1
    sub.l   d0,d3
    addq.l  #1,d1
.l2
    dbra    d7,.loop

    move.l  (sp)+,d7
    rts
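
Since the routine is described as not well tested, a register-level model in Python is one way to cross-check it against `divmod`. This is a sketch under the register contract given in the comments above (d0 in, d1/d2/d3 out); the variable names simply follow the registers, and a non-zero divisor is assumed.

```python
MASK32 = 0xFFFFFFFF

def divu64(divisor, num_lo, num_hi):
    """Model of the loop: returns (quotient low, quotient high, remainder)."""
    d1, d2, d3 = num_lo, num_hi, 0
    for _ in range(64):
        carry = d3 >> 31                          # carry out of addx.l d3,d3
        d3 = ((d3 << 1) | (d2 >> 31)) & MASK32
        d2 = ((d2 << 1) | (d1 >> 31)) & MASK32
        d1 = (d1 << 1) & MASK32
        if carry or d3 >= divisor:                # bcs .l1 / cmp.l d0,d3
            d3 = (d3 - divisor) & MASK32
            d1 |= 1                               # addq.l #1,d1
    return d1, d2, d3

n, d = 0x123456789ABCDEF0, 0x12345
lo, hi, rem = divu64(d, n & MASK32, n >> 32)
assert ((hi << 32) | lo, rem) == divmod(n, d)
```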
Old 06 October 2018, 22:05   #17
Clubcard
Registered User

 
Join Date: Sep 2018
Location: Peterborough
Posts: 4
I don't know if this has already been covered, but Karatsuba multiplication will produce a 32-bit x 32-bit --> 64-bit result using 3 MULU instructions. The larger you go, the more efficient it gets (25%+12.5%+6.25%...) until you get over about 4096 bits.


If you do want to go for >4096 bit factors, Toom-Cook multiplication is faster.

https://www.spectroom.com/1022825714...multiplication

I hope this is of value to someone
Old 07 October 2018, 21:01   #18
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 50
Posts: 1,050
Quote:
Originally Posted by Clubcard View Post
I don't know if this has already been covered, but Karatsuba multiplication will produce a 32-bit x 32-bit --> 64-bit result using 3 MULU instructions. The larger you go, the more efficient it gets (25%+12.5%+6.25%...) until you get over about 4096 bits.


If you do want to go for >4096 bit factors, Toom-Cook multiplication is faster.

https://www.spectroom.com/1022825714...multiplication

I hope this is of value to someone
Yes. Nice idea, but step 3 of Karatsuba multiplication multiplies two sums of 16-bit halves, which can be 17 bits wide, so it needs mulu.l, not mulu.w. So this version cannot be the fastest on a 68000: 2 mulu.w and 1 mulu.l are necessary, if I understand the idea correctly. The ordinary 32-bit x 32-bit --> 64-bit multiply needs 4 mulu.w instructions. Maybe when I get back to life I will check it more closely.
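
The operand-width problem is easy to see in a small Python sketch of the three-multiply scheme with 16-bit halves. The function name is invented for this example, and it uses the additive form of Karatsuba; it returns the product together with the bit width of the widest operand of the middle multiply.

```python
def karatsuba32(a, b):
    """Return (a*b, bits needed by the widest middle-product operand)."""
    al, ah = a & 0xFFFF, a >> 16
    bl, bh = b & 0xFFFF, b >> 16
    z0 = al * bl                     # 16x16, fits mulu.w
    z2 = ah * bh                     # 16x16, fits mulu.w
    sa, sb = al + ah, bl + bh        # sums of halves, up to 17 bits each
    z1 = sa * sb - z0 - z2           # equals al*bh + ah*bl
    product = (z2 << 32) + (z1 << 16) + z0
    return product, max(sa, sb).bit_length()

product, width = karatsuba32(0xFFFFFFFF, 0xFFFFFFFF)
assert product == 0xFFFFFFFF * 0xFFFFFFFF
assert width == 17                   # too wide for a 16x16 multiply
```

So the third multiply saves one MULU only if a 17x17 product is available, which on a plain 68000 means extra work to handle the top bits.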
 

