English Amiga Board


Go Back   English Amiga Board > Coders > Coders. Asm / Hardware

 
 
Thread Tools
Old 15 May 2016, 14:07   #81
Leffmann
 
Join Date: Jul 2008
Location: Sweden
Posts: 2,200
Quote:
Originally Posted by alpine9000 View Post
I'd still be interested in seeing your 1000bit solution if you already have it.
I sent it to you via PM.

Quote:
Originally Posted by alkis View Post
An algorithm with no division
http://www.eng.utah.edu/~nmcdonal/Tu...onversion.html

Technically is binary2bcd, but you can go bcd2string pretty easy.
That's really smart, I always imagined converting to BCD was a lot more complex.

I don't know if it will be less code than doing long-form division, but it has to be much much faster.
Leffmann is offline  
Old 15 May 2016, 14:34   #82
alkis
Registered User

 
Join Date: Dec 2010
Location: Athens/Greece
Age: 47
Posts: 456
Quote:
Originally Posted by meynaf View Post
Unfortunately it doesn't work this well for large numbers (>8 bits), does it ?
I don't think there is a limit

https://en.wikipedia.org/wiki/Double_dabble
alkis is offline  
Old 15 May 2016, 14:43   #83
alpine9000
Registered User

 
Join Date: Mar 2016
Location: Australia
Posts: 620
Quote:
Originally Posted by alkis View Post
I don't think there is a limit

https://en.wikipedia.org/wiki/Double_dabble
And as an added bonus I can now directly convert binary to decimal in my head
alpine9000 is offline  
Old 15 May 2016, 14:50   #84
alkis
Registered User

 
Join Date: Dec 2010
Location: Athens/Greece
Age: 47
Posts: 456
Quote:
Originally Posted by meynaf View Post
Dunno if it can be considered as better (perhaps it's even a lot worse ), but i might have one.

It's quite simple : reverse the bit order of a 16-bit int.
This can easily be done with a table, so i put in a limitation : not more than 80 bytes, including data.
The winner is the fastest routine (020/030 timing). All 020+ instructions are allowed. All regs can be used.

Results can be published here. I won't personnally participate (but i can give my routine at the end if asked).
At second thoughts, the challenge may be interesting even if someone rushes in with a good routine : someone else can then find an optimization in that routine and win the game

And trust me : this cannot be done with just a 5 liner
68 bytes. No tables. (No idea how fast it runs, nothing to compare it to...)
Code:
	XDEF _reverse
EAB	SET	1
_reverse:
	IFND EAB
        move.l 	d2,-(sp)	
        move.l 	8(sp),d0
	ENDC
        move.w 	d0,d2
        lsr.w 	#1,d2
        and.w	#21845,d2
        and.w 	#21845,d0
        add.w	d0,d0
        or.w 	d0,d2
        move.w 	d2,d1
        lsr.w	#2,d1
        and.w	#13107,d1
        move.w	d2,d0
        and.w	#13107,d0
        lsl.w	#2,d0
        move.w	d1,d2
        or.w	d0,d2
        move.w	d2,d1
        lsr.w	#4,d1
        and.w	#3855,d1
        move.w	d2,d0
        and.w	#3855,d0
        lsl.w	#4,d0
        move.w	d1,d2
        or.w	d0,d2
        moveq.l	#0,d0
        move.b	d2,d0
        lsl.w	#8,d0
        lsr.w	#8,d2
        or.b	d2,d0
	IFND EAB
        move.l  (sp)+,d2
	ENDC
        rts
This is hand-tuned gcc's assembly output on (a slightly modified) http://graphics.stanford.edu/~seande...everseParallel
alkis is offline  
Old 15 May 2016, 14:50   #85
alkis
Registered User

 
Join Date: Dec 2010
Location: Athens/Greece
Age: 47
Posts: 456
Quote:
Originally Posted by alpine9000 View Post
And as an added bonus I can now directly convert binary to decimal in my head
Hah hah, true that.
alkis is offline  
Old 15 May 2016, 15:35   #86
meynaf
son of 68k
meynaf's Avatar
 
Join Date: Nov 2007
Location: Lyon / France
Age: 45
Posts: 3,338
Quote:
Originally Posted by alkis View Post
68 bytes. No tables. (No idea how fast it runs, nothing to compare it to...)
I counted 80 clocks (shifts take 4, immediates take 4, the others take 2).
meynaf is offline  
Old 15 May 2016, 16:15   #87
Thorham
Computer Nerd

Thorham's Avatar
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 42
Posts: 3,085
Quote:
Originally Posted by meynaf View Post
It's quite simple : reverse the bit order of a 16-bit int.
Code:
;
; d0 = 16 bit int
; d1 = scratch
;
reverse
    rol.w   #8,d0

    move.w  d0,d1
    and.w   #$f0f0,d0
    eor.w   d0,d1
    lsr.w   #4,d0
    lsl.w   #4,d1
    or.w    d1,d0

    move.w  d0,d1
    and.w   #$cccc,d0
    eor.w   d0,d1
    lsr.w   #2,d0
    lsl.w   #2,d1
    or.w    d1,d0

    move.w  d0,d1
    and.w   #$aaaa,d0
    eor.w   d0,d1
    lsr.w   #1,d0
    add.w   d1,d1
    or.w    d1,d0

    rts

Last edited by Thorham; 15 May 2016 at 16:22.
Thorham is offline  
Old 15 May 2016, 17:15   #88
phx
Natteravn

phx's Avatar
 
Join Date: Nov 2009
Location: Herford / Germany
Posts: 1,225
Quote:
Originally Posted by Leffmann View Post
That's really smart, I always imagined converting to BCD was a lot more complex.
There are no divisions, but it's still time consuming.

Quote:
I don't know if it will be less code than doing long-form division, but it has to be much much faster.
It might still be a challenge to implement it with the double-dabble algorithm. You can do the adding of 3 and the shifting together. And you may choose to add $7b instead of 3, so that shifting a byte generates the X/C-flag for the next byte.

I made a quick example routine, which converts an unsigned 32 bit word into a 10-character ASCII buffer, but even with above optimizations it means that the main loop will be executed 320 times! I don't want to imagine the numbers for 1000 bits.

A problem is also that you lose the X-flag when adding $7b (so this code is 68000 only).
Code:
DIGITS  equ     10

u32_to_asc:
; a0 = ASCII outbuf buffer (10 digits)
; d0 = 32-bit unsigned binary

        moveq   #DIGITS-1,d1
.1:     clr.b   (a0)+
        dbf     d1,.1

        moveq   #32-1,d1
.loop:
        moveq   #DIGITS-1,d3
        move.l  a0,a1
        lsl.l   #1,d0
.2:     move.b  -(a1),d2
        cmp.b   #5,d2
        blo     .3
        move    sr,d4
        add.b   #$7b,d2
        move    d4,ccr
.3:     roxl.b  #1,d2
        move.b  d2,(a1)
        dbf     d3,.2
        dbf     d1,.loop

        moveq   #'0',d2
        moveq   #DIGITS-1,d1
.4:     add.b   d2,-(a0)
        dbf     d1,.4
        rts
phx is offline  
Old 15 May 2016, 21:38   #89
alkis
Registered User

 
Join Date: Dec 2010
Location: Athens/Greece
Age: 47
Posts: 456
Quote:
Originally Posted by phx View Post
I made a quick example routine, which converts an unsigned 32 bit word into a 10-character ASCII buffer, but even with above optimizations it means that the main loop will be executed 320 times! I don't want to imagine the numbers for 1000 bits.
Implementation with 32 times loop.
https://github.com/alberthynek/publi...2_68k.lst#L110

PS: Well actually it's 32x5 unrolled, I guess

Last edited by alkis; 15 May 2016 at 21:44. Reason: added the PS:
alkis is offline  
Old 15 May 2016, 21:40   #90
Thorham
Computer Nerd

Thorham's Avatar
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 42
Posts: 3,085
Quote:
Originally Posted by meynaf View Post
this cannot be done with just a 5 liner
Couldn't resist
Code:
;
; d0 = 16 bit int (scratch)
; d1 = reversed int
;
reverse
    moveq   #16-1,d7
.loop
    lsr.w   #1,d0
    addx.w  d1,d1
    dbra    d7,.loop
    rts
Thorham is offline  
Old 15 May 2016, 21:47   #91
alkis
Registered User

 
Join Date: Dec 2010
Location: Athens/Greece
Age: 47
Posts: 456
Quote:
Originally Posted by Thorham View Post
Couldn't resist
Code:
;
; d0 = 16 bit int (scratch)
; d1 = reversed int
;
reverse
    moveq   #16-1,d7
.loop
    lsr.w   #1,d0
    addx.w  d1,d1
    dbra    d7,.loop
    rts
lol nice one. You've never initialised d1 though :P
alkis is offline  
Old 15 May 2016, 22:11   #92
matthey
Banned
 
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
Quote:
Originally Posted by alkis View Post
lol nice one. You've never initialised d1 though :P
I'm not so sure it works either. I don't believe the following needs a number initialized though.

Code:
; 
; input: d1 = 16 bit int
; scratch: d1,d7
; return: d0 = reversed int
;
bitrev:
    moveq   #16-1,d7
.loop:
    lsr.w   #1,d1
    roxl.w  #1,d0
    dbra    d7,.loop
    rts
Edit: Tested and seems to be working now.

Last edited by matthey; 15 May 2016 at 22:35.
matthey is offline  
Old 15 May 2016, 22:33   #93
meynaf
son of 68k
meynaf's Avatar
 
Join Date: Nov 2007
Location: Lyon / France
Age: 45
Posts: 3,338
Quote:
Originally Posted by matthey View Post
I'm not so sure it works either. I don't believe the following needs d1 initialized though.
Neither need d1 initialised. And this, simply because 16 shifts are made so old bits are all thrown away.
The version with roxl instead of addx works as well but it's slower, btw

That said, though this is the shortest method, it's also the slowest one (this is why i didn't count it when i spoke about a 5-liner ).
meynaf is offline  
Old 15 May 2016, 22:36   #94
Thorham
Computer Nerd

Thorham's Avatar
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 42
Posts: 3,085
Quote:
Originally Posted by alkis View Post
lol nice one. You've never initialised d1 though :P
It doesn't have to be initialized, because the first 16 bits get shifted out. Works for 32 bits, too.

Quote:
Originally Posted by matthey View Post
I'm not so sure it works either.
The code is tested, and works just fine.

Code:
reverse
    moveq   #16-1,d7
.loop
    lsr.w   #1,d1
    lsl.w   #1,d0
    dbra    d7,.loop
    rts
This doesn't work, because lsl and lsr always shift in 0 bits. Tested, by the way.

Don't guess, guys, test

Edit: Code got corrected while writing this post

Last edited by Thorham; 15 May 2016 at 22:42.
Thorham is offline  
Old 15 May 2016, 22:39   #95
Thorham
Computer Nerd

Thorham's Avatar
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 42
Posts: 3,085
Quote:
Originally Posted by meynaf View Post
That said, though this is the shortest method, it's also the slowest one (this is why i didn't count it when i spoke about a 5-liner ).
I knew it! You said it on purpose

How about the other one I posted? Yeah, it's pretty obvious, of course.
Thorham is offline  
Old 15 May 2016, 22:44   #96
matthey
Banned
 
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
Quote:
Originally Posted by meynaf View Post
Neither need d1 initialised. And this, simply because 16 shifts are made so old bits are all thrown away.
The version with roxl instead of addx works as well but it's slower, btw

That said, though this is the shortest method, it's also the slowest one (this is why i didn't count it when i spoke about a 5-liner ).
You are correct. Thorham's method works and doesn't need d1 initialized (tested). One has to be clever with the ADDX, SUBX instructions on the 68000/68020/68030 .
matthey is offline  
Old 15 May 2016, 23:07   #97
Leffmann
 
Join Date: Jul 2008
Location: Sweden
Posts: 2,200
Quote:
Originally Posted by meynaf View Post
It's quite simple : reverse the bit order of a 16-bit int.
This can easily be done with a table, so i put in a limitation : not more than 80 bytes, including data.
The winner is the fastest routine (020/030 timing). All 020+ instructions are allowed. All regs can be used.
I've got it down to 59 cycles, if I counted correctly, but I'm going to try a bit more.

Quote:
Originally Posted by phx View Post
I made a quick example routine, which converts an unsigned 32 bit word into a 10-character ASCII buffer, but even with above optimizations it means that the main loop will be executed 320 times! I don't want to imagine the numbers for 1000 bits.
It would be over 300 000 runs of that main loop in that case but maybe there's a way of driving this via a table to make it faster.
Leffmann is offline  
Old 16 May 2016, 02:08   #98
Thorham
Computer Nerd

Thorham's Avatar
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 42
Posts: 3,085
Code:
;
; d0 = 16 bit int
; d1 = scratch
;
reverse
    rol.b   #4,d0

    move.w  d0,d1
    and.w   #$cccc,d0
    eor.w   d0,d1
    rol.w   #4,d1
    or.w    d1,d0

    move.w  d0,d1
    and.w   #$aaaa,d0
    eor.w   d0,d1
    rol.w   #2,d1
    or.w    d1,d0

    rol.w   #5,d0
    rol.b   #4,d0

    rts
Thorham is offline  
Old 16 May 2016, 09:45   #99
meynaf
son of 68k
meynaf's Avatar
 
Join Date: Nov 2007
Location: Lyon / France
Age: 45
Posts: 3,338
Seems you're the best now, my friend.

My version was :
Code:
 ror.b #4,d0
 ror.w #8,d0
 ror.b #4,d0
 move.w d0,d1
 lsr.w #2,d1
 lsl.w #2,d0
 eor.w d0,d1
 and.w #$3333,d1
 eor.w d1,d0
 move.w d0,d1
 lsr.w #1,d1
 add.w d0,d0
 eor.w d0,d1
 and.w #$5555,d1
 eor.w d1,d0
meynaf is offline  
Old 16 May 2016, 19:19   #100
Thorham
Computer Nerd

Thorham's Avatar
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 42
Posts: 3,085
Quote:
Originally Posted by meynaf View Post
Seems you're the best now, my friend.
I wonder for how long
Thorham is offline  
 


Currently Active Users Viewing This Thread: 1 (0 members and 1 guests)
 
Thread Tools

Similar Threads
Thread Thread Starter Forum Replies Last Post
Starting ASM coding on A1200. Which Assembler? Nosferax Coders. Asm / Hardware 68 27 November 2015 17:14
4th tutorial on ASM- and HW-coding Vikke Coders. Asm / Hardware 11 10 April 2013 21:32
3rd tutorial on ASM- and HW-coding Vikke Coders. Asm / Hardware 6 26 March 2013 16:57
First tutorial on ASM- and HW-coding Vikke Coders. Asm / Hardware 46 18 March 2013 13:33
2nd tutorial on ASM- and HW-coding Vikke Coders. Asm / Hardware 10 17 March 2013 12:49

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT +2. The time now is 07:12.


Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2018, vBulletin Solutions Inc.
Page generated in 0.10123 seconds with 14 queries