English Amiga Board


Old 22 June 2012, 09:49   #21
Asman
@Photon - Your "non-div, non-table, non-BCD" example does not work, or I'm missing something. On exit D4 contains $3c3b3b3b.

Old 22 June 2012, 09:56   #22
phx
Quote:
Originally Posted by Photon
If you need to replace the movep you've saved 1 cycle per digit, if you add to the score maximum once per frame. If the player should kill 2 baddies in one frame, the BCD method takes longer.
Hm. As I'm not drawing the scores every frame, but only when changed, my advantage is probably not as high as I thought. Also my example code was wrong. When I wrote it from memory I swapped the nibbles during conversion. In reality I cannot use (a0)+, but need d16(a0), which costs some more cycles.

Quote:
The table takes 200 bytes
You're right. I had multiplied it by 2, although it is always the same table for both references.

Quote:
Here's a non-div, non-table, non-BCD alternative:
Doesn't seem to be much better.

Quote:
I was really only interested in showing BCD gives no real gain
Admittedly, the gain is small. But I still see a minor advantage in using it, which is reason enough for me.

Old 22 June 2012, 13:18   #23
Don_Adan
Why subq.l #8,SP and not subq.l #4,SP?
Anyway, a version with a 512-byte table is perhaps the fastest for your routine on a 68000 CPU.

Code:
4       moveq #0,D0
12      move.b  Score(a4),d0
4       add.w D0,D0
14      move.w Table(PC,D0.W),D0
4       swap D0
12      move.b  Score+1(a4),d0
4       add.w D0,D0
14      move.w Table(PC,D0.W),D0
12      move.l D0,-(SP)
---
80
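The contents of Table are not shown in the post. A minimal sketch of how it could be generated at startup, assuming each of the 256 word entries holds the ASCII digit for the high nibble in its high byte and the ASCII digit for the low nibble in its low byte (which is what the lookup above needs for packed BCD bytes). BuildTable is a hypothetical name, and Table is assumed to be a 512-byte buffer placed close enough to the lookup code for the 8-bit Table(PC,D0.W) displacement to reach it:

```asm
; Fill Table with 256 words: entry n = ASCII(n>>4), ASCII(n&15).
; Only the 100 entries with two valid BCD nibbles are ever looked up.
BuildTable:
        lea     Table(pc),a0
        moveq   #0,d0               ; d0 = current byte value, 0..255
.loop   move.b  d0,d1
        lsr.b   #4,d1               ; high nibble
        add.b   #'0',d1
        move.b  d1,(a0)+
        moveq   #$0f,d1
        and.b   d0,d1               ; low nibble
        add.b   #'0',d1
        move.b  d1,(a0)+
        addq.b  #1,d0
        bne.s   .loop               ; stop when d0 wraps from 255 to 0
        rts
```

Generating the table only keeps it out of the executable file. It still occupies 512 bytes of RAM at run time.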

Old 22 June 2012, 15:14   #24
phx
Indeed, that's much faster, although 512 bytes is a lot. I must think about it (16 modules also have to fit into chip memory, preferably into 512K).

In the original version I used subq.l #8,sp, because I also need a string terminating 0-byte, which I forgot here.

Old 23 June 2012, 19:52   #25
Photon
Quote:
Originally Posted by Don_Adan
Why subq.l #8,SP and not subq.l #4,SP?
Anyway, a version with a 512-byte table is perhaps the fastest for your routine on a 68000 CPU.

Code:
4       moveq #0,D0
12      move.b  Score(a4),d0
4       add.w D0,D0
14      move.w Table(PC,D0.W),D0
4       swap D0
12      move.b  Score+1(a4),d0
4       add.w D0,D0
14      move.w Table(PC,D0.W),D0
12      move.l D0,-(SP)
---
80
Doesn't work, though. And doesn't include add-instructions, so no wonder it's fast...

Quote:
Originally Posted by Asman
@Photon - Your "non-div, non-table, non-BCD" example does not work, or I'm missing something. On exit D4 contains $3c3b3b3b.
Well, I wrote it in 10 minutes without an assembler. Includes imagination time. Was supposed to show the general technique. I changed branch polarity without changing carry polarity. Changes in bold.
Code:
maxdigit="0"+"0"+10
mindigit="0"+1<<8
;8	lea score(pc),a1 
28	add.l #"1000",score(a4)

4	moveq #maxdigit,d0
4	moveq #mindigit,d1
4	moveq #10-2<<8,d2
;8	move.w #1<<8,d3

16	move.l score(a4),d4
	REPT 4
4	sub.b d0,d4
18/2	bpl.s .ok
4/2	add.w d2,d4
;4/2	add.w d3,d4	;carry
.ok:4	add.w d1,d4
24	ror.l #8,d4
45x4	ENDR
16	move.l d4,score(a4)
=244
...
score:  dc.b "0000"
Here's a simpler one that's 136 cycles, also made in 10 minutes. Did you say just the ASCII conversion took 140 cycles?
Code:
	lea score(PC),a1
	add.l #"2498"-"0000",(a1)+
;...

	moveq #"9"+1,d0
	moveq #9+1,d1
;36
	cmp.b -(a1),d0
	bgt.s .ok
	sub.b d1,(a1)
	addq.b #1,-1(a1)
.ok:
	cmp.b -(a1),d0
	bgt.s .ok2
	sub.b d1,(a1)
	addq.b #1,-1(a1)
.ok2:
	cmp.b -(a1),d0
	bgt.s .ok3
	sub.b d1,(a1)
	addq.b #1,-(a1)
.ok3:
;34x3-2
;	cmp.b -(a1),d0
;	bgt.s .ok4
;	sub.b d1,(a1)
;;	addq.b #1,-(a1)
;.ok4:
Actually, for already ASCII-based scores there's no table method, div or not, that can be faster than this.

Old 24 June 2012, 13:06   #26
Codetapper
8 digit BCD score without movep, div or a table lookup

What about this routine to convert an 8 digit score from BCD ready to print with an 8x8 font already in registers d0-d3?

By shifting the initial score left by 3 places, you effectively eliminate the need to multiply each digit by 8 to look up the graphics in an 8x8 font table. The graphic offsets all end up in registers d0-d3, so the printing code needs an unrolled loop that selects the correct register for each digit, with a swap on each register once the first four score digits have been printed.

Note: For simplification purposes I have left the values of the registers on the right hand side as if they had not been multiplied by 8, as the numbers are hard to picture in your head otherwise!

Code:
 12        move.l  #$000f000f<<3,d4      ;d4=$000f000f*8 (=$00780078)
 16        move.l  ScoreAsBCD(pc),d3     ;d3=$12345678
 14        rol.l   #3,d3                 ;d3=Score*8 (pretend it's $12345678)
  4        move.l  d3,d0
 16        rol.l   #4,d0                 ;d0=$23456781
  4        move.l  d0,d1
 16        rol.l   #4,d1                 ;d1=$34567812
  4        move.l  d1,d2
 16        rol.l   #4,d2                 ;d2=$45678123
  6        and.l   d4,d0                 ;d0=$00050001
  6        and.l   d4,d1                 ;d1=$00060002
  6        and.l   d4,d2                 ;d2=$00070003
  6        and.l   d4,d3                 ;d3=$00040008
  4        swap    d3                    ;d3=$00080004
---
130 cycles
To print out the score you'd need something like the following with a PRINT macro that outputs the 8 bytes to the screen and advances the pointer by 1 each time:

Code:
        lea     FontDataDigits(pc),a0   ;8x8 font for chars 0-9
        lea     (a0,d0.w),a1            ;a1=Graphics data to print
        PRINT
        lea     (a0,d1.w),a1
        PRINT
        lea     (a0,d2.w),a1
        PRINT
        lea     (a0,d3.w),a1
        PRINT
        swap    d0
        lea     (a0,d0.w),a1
        PRINT
        swap    d1
        lea     (a0,d1.w),a1
        PRINT
        swap    d2
        lea     (a0,d2.w),a1
        PRINT
        swap    d3
        lea     (a0,d3.w),a1
        PRINT
If your font was an odd height like 9 pixels tall, you could still use the same technique, shift by 4 to begin with (thus multiplying the table lookup by 16), and pad the font graphics out to 16 pixel boundaries for the digits. Or shift by 5 (thus multiplying by 32) to use 16x16 fonts (2 bytes * 16 pixels tall) etc.

Last edited by Codetapper; 24 June 2012 at 13:15. Reason: Added different font size bit

Old 24 June 2012, 16:23   #27
Photon
That's fine, although this separate task would be faster with ASCII than BCD. With BCD, the digit-pair blitjumptable mentioned would be the fastest.

If I were to update 8 digits, I'd definitely only blit the digits that changed.

and.l Dn,Dn takes 8 cycles.

I still think it's silly to optimize these things. It's not even done every frame. Just saving 2 cycles by profiling branch polarity in the render loops would save much more.

Old 24 June 2012, 20:51   #28
Codetapper
You can still use the PRINT macro to only output the changed digits with this method as well.

The digit-pair blit table would end up rather large, as you'd have $99+1 valid combinations of numbers, each, say, 8 pixels tall by 2 bytes wide. That is really starting to consume some memory just to print the score.

BTW I have rechecked the Motorola reference and it shows and.l dn,dn is 6 cycles with a 16 bit CPU unless it's a misprint? See p120 here. (It doesn't make sense to me that an 'add' long operation would take 6 cycles yet an 'and' would be 8).

Last edited by Codetapper; 24 June 2012 at 23:35.

Old 25 June 2012, 01:38   #29
Photon
You can modify any digit rendering code known to man to only blit the changed digits. My point was that selecting which digits to render will save much more than optimized render code AND an optimized score adding/conversion routine.

Using a table of pointers or pointer-pairs to graphics with BCD is only of any use if you want to blit digit-pairs as opposed to digits. Its real use is in bypassing both BCD and ASCII.

Add/sub/and/or/eor.l Rn,Rn all take 8, check the asterisks.

(In some cases also, the plus sign at the end, signifying adding the ea time from the ea tables, is missing. If the column heading contains 1 or more of <ea> or M, you should add the correct timing addition for each occurrence.)


My bigger point was that it's not really fruitful to shave cycles for score updating. If I were to do it, I'd certainly not abcd 4x36 cycles all over the code and then save 12 cycles in the conversion. If I were on a cycle-budget I'd just have a counter 1..n digits and every nth frame check if that digit has changed and update it that frame if so.
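A rough sketch of that round-robin idea, assuming the score is kept as ASCII text and a second buffer remembers what is currently on screen. All names here (CheckOneDigit, DigitPos, ScoreText, ShownText, DrawDigit, NUMDIGITS) are hypothetical, and DrawDigit stands in for whatever routine actually blits one character:

```asm
NUMDIGITS equ   8                   ; assumption: 8 score digits

; Call once per frame. Checks a single digit position, round-robin,
; so every position is examined once every NUMDIGITS frames.
CheckOneDigit:
        lea     DigitPos(pc),a0
        moveq   #0,d0
        move.b  (a0),d0             ; position checked last frame
        addq.b  #1,d0
        cmp.b   #NUMDIGITS,d0
        bcs.s   .nowrap             ; d0 < NUMDIGITS
        moveq   #0,d0
.nowrap move.b  d0,(a0)
        lea     ScoreText(pc),a0    ; current ASCII score
        lea     ShownText(pc),a1    ; digits as last drawn
        move.b  (a0,d0.w),d1
        cmp.b   (a1,d0.w),d1
        beq.s   .done               ; unchanged, nothing to draw
        move.b  d1,(a1,d0.w)        ; remember what is now on screen
        bsr     DrawDigit           ; hypothetical: d0.w = position, d1.b = character
.done   rts

DigitPos:  dc.b 0
ScoreText: dc.b "00000000"
ShownText: dc.b "        "          ; spaces, so every digit is drawn once
        even
```

The cost per frame is then bounded by one compare plus at most one digit draw, at the price of a displayed score that can lag the real one by up to NUMDIGITS frames.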

Old 25 June 2012, 07:33   #30
Leffmann
I think you are starting to scare away beginner programmers with this thread. They will think that printing a number on the screen is a delicate and complex science of its own!

Old 25 June 2012, 12:05   #31
Don_Adan
Quote:
Originally Posted by Photon
Doesn't work, though. And doesn't include add-instructions, so no wonder it's fast...
If phx's original version works, then this version works without problems too. It seems you don't understand how a good table works: the result of add.l #'0000',Dx is already stored in the table. But of course you can add a separate instruction too, if you like slow code.

Old 25 June 2012, 12:14   #32
Don_Adan
Quote:
Originally Posted by phx
Indeed, that's much faster, although 512 bytes is a lot. I must think about it (16 modules also have to fit into chip memory, preferably into 512K).

In the original version I used subq.l #8,sp, because I also need a string terminating 0-byte, which I forgot here.
Then you must use clr.l/w -(SP), not subq.l.
You can also use similar code, but it is only fast on a 68020+.

Code:
        moveq #0,D0
        move.w Score(a4),d0
        lsl.l #4,D0
        lsr.w #4,D0
        lsl.l #8,D0
        lsr.w #4,D0
        lsr.b #4,D0
        clr.l -(SP)
        add.l #'0000',D0
        move.l D0,-(SP)

Old 25 June 2012, 15:34   #33
Photon
Quote:
Originally Posted by Don_Adan
If phx's original version works, then this version works without problems too. It seems you don't understand how a good table works: the result of add.l #'0000',Dx is already stored in the table. But of course you can add a separate instruction too, if you like slow code.
I meant adding to the score. To compare with the others, your example would be 92+80=172 cycles with the abcd etc. instructions included.

Quote:
Originally Posted by Leffmann
I think you are starting to scare away beginner programmers with this thread. They will think that printing a number on the screen is a delicate and complex science of its own!
Heh, maybe we are. But if someone can't quite get this BCD thing, this thread now shows a few different options. And I got a better NumToDec routine out of it, so I'm happy.

Old 27 June 2012, 13:43   #34
Don_Adan
Two-register variant of Mcoder's version:

Code:
        lea     score(pc),a0
        move.l  (A0),d0
        and.l   #$0F0F0F0F,d0
        add.l   #$30303030,d0
        movep.l d0,5(a0)
        move.l  (A0),d0
        lsr.l   #4,d0
        and.l   #$0F0F0F0F,d0
        add.l   #$30303030,d0
        movep.l d0,4(a0)
        rts

score
       dc.l 0
digits
       ds.b 8

Last edited by Don_Adan; 02 July 2012 at 12:09.

Old 23 August 2014, 09:54   #35
modrobert
Sorry to bump this old thread, but while trying to freshen up my 68k assembler skills by looking at some example code, this seemingly simple challenge appeared "* converting to decimal is left as an exercise to the reader!".

I thought it would be an easy task, but it wasn't, hehe. After trying a few ideas which failed I resorted to google, several searches later I ended up in this thread (should have known to look here first).

In contrast to the brilliant suggestions in this thread, I present to you what has to be the slowest method ever conceived.

Code:
	lea	thestring(pc),a0
	bsr	decimalconvert
	
	;...

; convert d0.l to decimal ascii string in (a0) 
decimalconvert
	movem.l	d1-d2,-(sp)
	move.w	#10-1,d2
	adda.l	#10,a0
.loop	move.l	#10,d1
	jsr	l32div
	move.b	d1,-(a0)
	add.b	#'0',(a0)
	dbra	d2,.loop
	movem.l	(sp)+,d1-d2
	rts

; long 32bit division - sloooooow!
; in: d0 = dividend, d1 = divisor
; out: d0 = quotient, d1 = remainder

l32div
	movem.l	d2-d3,-(sp)
	clr.l	d2
	clr.l	d3
	tst.l	d0
	bge	.x1
	addq.l	#1,d3
	neg.l	d0
.x1	tst.l	d1
	bge	.loop
	addq.l	#1,d3
	neg.l	d1
.loop	cmp.l	d0,d1
	bgt	.done
	sub.l	d1,d0
	addq.l	#1,d2
	bra	.loop
.done	btst	#0,d3	
	beq	.x2
	neg.l	d2
	neg.l	d0
.x2	move.l	d0,d1
	move.l	d2,d0
	movem.l	(sp)+,d2-d3
	rts
This crap works, but takes 7 seconds on my A1200 (68030)!!

I tried to use some of the code posted in this forum thread and failed, this example requires 10 digits decimal ASCII (full 32bit).

The problem could easily be solved by using divul.l (instead of that loop horror pasted above) and setting the compiler to 68020+, but I want this to work on a plain 68000.

Any ideas how to make it faster?
(Doesn't have to be super fast, but better than 7 seconds, hehe.)

Alternatively, could someone help me get the ABCD routines in this forum thread to work with 10-digit decimal numbers?

BTW: Shouldn't this thread be in "Coders. Asm / Hardware"?
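One way to stay on a plain 68000 is to chain two DIVU instructions: the first divides the high word, and its remainder becomes the upper half of the dividend for the low word. A sketch, assuming the value in d0 is unsigned (as a FreeMem result is) and the divisor is always 10. The name div10 is hypothetical; the register interface mirrors l32div above (quotient in d0, remainder in d1):

```asm
; in:  d0.l = unsigned dividend
; out: d0.l = d0 / 10, d1.l = d0 mod 10
div10:
        moveq   #0,d1
        swap    d0                  ; d0 = lo:hi
        move.w  d0,d1               ; d1 = 0:hi
        divu.w  #10,d1              ; d1 = (hi mod 10):(hi / 10)
        move.w  d1,d0               ; d0 = lo:quotient_hi
        swap    d0                  ; d0 = quotient_hi:lo
        move.w  d0,d1               ; d1 = (hi mod 10):lo
        divu.w  #10,d1              ; cannot overflow: upper word is below 10
        move.w  d1,d0               ; d0 = quotient_hi:quotient_lo
        clr.w   d1
        swap    d1                  ; d1 = 0:remainder
        rts
```

In decimalconvert the pair "move.l #10,d1 / jsr l32div" would then become "bsr div10". That is two DIVU per digit, so twenty in total for a 10-digit number, instead of a subtraction loop that runs once for every unit of the quotient.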

Old 23 August 2014, 11:01   #36
alkis
Quote:
Originally Posted by modrobert
This crap works, but takes 7 seconds on my A1200 (68030)!!
It's instant in both (emulated) A1200/020 and A500/68000.
The "7 seconds" was a typo, wasn't it?

Old 23 August 2014, 11:12   #37
modrobert
Quote:
Originally Posted by alkis
It's instant in both (emulated) A1200/020 and A500/68000.
The "7 seconds" was a typo, wasn't it?
No, it's real unfortunately, 7 seconds before the number appears in my window. I compile and run with Devpac on the A1200. If I use divul.l (68020+) instead of the l32div subroutine it is "instant".

Try "cycle exact" A500 if using emulator, any mode with JIT turned off. Also, make sure the number in d0 is big (>1,000,000,000). When getting 7 seconds my d0 was roughly 100,000,000 (FreeMem result).

EDIT: Full source code attached.
Attached Files
File Type: s freemem_mod.s (5.2 KB, 144 views)

Last edited by modrobert; 23 August 2014 at 11:34. Reason: Added source code and the part about large number in d0.

Old 23 August 2014, 11:26   #38
Toni Wilen
I don't think this has anything to do with BCD; neither source nor destination is a BCD number.

Anyway, a very boring and naive method is to first subtract 1 000 000 000 and keep subtracting until the value becomes smaller than 1 000 000 000 (if it was larger originally). The number of subtractions is the first digit (or a blank, if you want to remove leading zeros and the count was zero).

Then do the same with 100 000 000, then 10 000 000 (put these values in an array), and so on. This method needs no multiplication or division, and in the worst case it loops 10 * number of digits times, which is not that bad. It is a very tiny and fast loop, at least when compared to the relatively slow 68000 multiplication and division instructions.
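The method described here could be sketched as follows for a plain 68000, assuming an unsigned 32-bit value in d0 and a 10-byte buffer at (a0). The names decimalconvert_sub and pow10 are hypothetical, and leading zeros are kept rather than blanked:

```asm
; in:  d0.l = unsigned value, a0 = 10-byte output buffer
; out: 10 ASCII digits at (a0); all registers preserved
decimalconvert_sub:
        movem.l d0-d3/a0-a1,-(sp)
        lea     pow10(pc),a1
        moveq   #10-1,d3            ; ten digits
.digit  move.l  (a1)+,d1            ; current power of ten
        moveq   #'0'-1,d2
.sub    addq.b  #1,d2               ; count this subtraction attempt
        sub.l   d1,d0
        bcc.s   .sub                ; no borrow: the value was still large enough
        add.l   d1,d0               ; undo the subtraction that borrowed
        move.b  d2,(a0)+            ; store the digit
        dbra    d3,.digit
        movem.l (sp)+,d0-d3/a0-a1
        rts

pow10   dc.l    1000000000,100000000,10000000,1000000,100000
        dc.l    10000,1000,100,10,1
```

Each digit costs at most ten passes through the three-instruction inner loop, so the whole conversion stays around a hundred iterations regardless of the value.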

Old 23 August 2014, 11:36   #39
modrobert
OK, thanks for the suggestion.

I tried using ABCD from Codetapper's routines in this thread to speed things up, but could only get it to work for 8 digit decimal numbers.

EDIT: Toni, hmm, the l32div routine pasted (in my first post) does what you suggest already.

Last edited by modrobert; 23 August 2014 at 11:54. Reason: Second thought.

Old 23 August 2014, 12:41   #40
alkis
You can always use the OS.
Code:
* converts d0 long number to decimal ascii at (a0)
decimalconvert
	movem.l d0/a0-a3,-(sp)
	move.l d0,savelongvalue
	lea.l savelongvalue(pc),a1
	move.l a0,a3
	lea.l formatString(pc),a0
	lea.l stuffChar(pc),a2
	CALLEXEC RawDoFmt
	movem.l (sp)+,d0/a0-a3
	rts

stuffChar:
    move.b  d0,(a3)+
    rts

savelongvalue dc.l 0
formatString  dc.b '%10ld',0
	EVEN