22 June 2012, 09:49 | #21 |
68k
Join Date: Sep 2005
Location: Somewhere
Posts: 828
|
@Photon - Your "non-div, non-table, non-BCD" example does not work, or I'm missing something. On exit, D4 contains $3c3b3b3b.
|
22 June 2012, 09:56 | #22 | ||||
Natteravn
Join Date: Nov 2009
Location: Herford / Germany
Posts: 2,496
|
|
||||
22 June 2012, 13:18 | #23 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,957
|
Why subq.l #8,SP, not subq.l #4,SP?
Anyway, a 512-byte table version will probably be the fastest for your routine on a 68000 CPU. Code:
 4   moveq  #0,D0
12   move.b Score(a4),d0
 4   add.w  D0,D0
14   move.w Table(PC,D0.W),D0
 4   swap   D0
12   move.b Score+1(a4),d0
 4   add.w  D0,D0
14   move.w Table(PC,D0.W),D0
12   move.l D0,-(SP)
---
80
|
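For anyone who wants to check the table idea outside an assembler, here is a minimal C sketch of it. It assumes the table holds 256 two-character entries (512 bytes) indexed directly by one packed BCD byte; the names `bcd_pair_table`, `init_bcd_pair_table` and `bcd4_to_ascii` are made up for this illustration and do not come from the thread.

```c
#include <stdint.h>

/* Hypothetical model of the 512-byte table: 256 entries of two ASCII
   characters each, indexed directly by one packed BCD byte. */
static uint16_t bcd_pair_table[256];

/* Fill the table once at startup. Entries for non-BCD bytes (a nibble
   above 9) are left as zero in this sketch. */
void init_bcd_pair_table(void)
{
    for (unsigned hi = 0; hi < 10; hi++) {
        for (unsigned lo = 0; lo < 10; lo++) {
            bcd_pair_table[(hi << 4) | lo] =
                (uint16_t)((('0' + hi) << 8) | ('0' + lo));
        }
    }
}

/* Convert a 4-digit packed BCD score (two bytes) into four ASCII
   characters packed big-endian in a 32-bit value, which is how the
   68000 code leaves them in D0. */
uint32_t bcd4_to_ascii(uint8_t score_hi, uint8_t score_lo)
{
    return ((uint32_t)bcd_pair_table[score_hi] << 16)
         | bcd_pair_table[score_lo];
}
```

After `init_bcd_pair_table()`, `bcd4_to_ascii(0x12, 0x34)` gives `0x31323334`, i.e. the characters "1234". The '0' offset is baked into the table, so no separate add is needed at run time.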
22 June 2012, 15:14 | #24 |
Natteravn
Join Date: Nov 2009
Location: Herford / Germany
Posts: 2,496
|
Indeed, that's much faster, although 512 bytes is a lot. I must think about it (16 modules also have to fit into chip memory, preferably into 512K).
In the original version I used subq.l #8,sp, because I also need a string terminating 0-byte, which I forgot here. |
23 June 2012, 19:52 | #25 | ||
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,602
|
Code:
maxdigit="0"+"0"+10
mindigit="0"+1<<8

;8      lea    score(pc),a1
28      add.l  #"1000",score(a4)
4       moveq  #maxdigit,d0
4       moveq  #mindigit,d1
4       moveq  #10-2<<8,d2
;8      move.w #1<<8,d3
16      move.l score(a4),d4
        REPT 4
4       sub.b  d0,d4
18/2    bpl.s  .ok
4/2     add.w  d2,d4
;4/2    add.w  d3,d4        ;carry
.ok:
4       add.w  d1,d4
24      ror.l  #8,d4
45x4    ENDR
16      move.l d4,score(a4)
=244
        ...
score:  dc.b   "0000"
Code:
        lea    score(PC),a1
        add.l  #"2498"-"0000",(a1)+
;...
        moveq  #"9"+1,d0
        moveq  #9+1,d1          ;36
        cmp.b  -(a1),d0
        bgt.s  .ok
        sub.b  d1,(a1)
        addq.b #1,-1(a1)
.ok:
        cmp.b  -(a1),d0
        bgt.s  .ok2
        sub.b  d1,(a1)
        addq.b #1,-1(a1)
.ok2:
        cmp.b  -(a1),d0
        bgt.s  .ok3
        sub.b  d1,(a1)
        addq.b #1,-(a1)
.ok3:                           ;34x3-2
;       cmp.b  -(a1),d0
;       bgt.s  .ok4
;       sub.b  d1,(a1)
;;      addq.b #1,-(a1)
;.ok4:
|
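The idea behind the second listing (add the ASCII digits first, then repair any digit that went past "9") can be modelled in C roughly as follows. This is only a sketch under assumptions: `ascii_score_add` is a hypothetical name, and unlike the listing above it also wraps the most significant digit and drops the final carry.

```c
#include <stdint.h>

/* Add a 4-digit ASCII increment to a 4-digit ASCII score in place.
   score[0] is the most significant digit. Overflow past 9999 is
   dropped in this sketch. */
void ascii_score_add(char score[4], const char add[4])
{
    /* Step 1: add digit by digit, without carries. Each sum is at
       most '0' + 18, so it still fits in one byte. */
    for (int i = 0; i < 4; i++)
        score[i] = (char)(score[i] + (add[i] - '0'));

    /* Step 2: walk from the last digit to the first and push any
       excess over '9' into the next digit to the left. */
    for (int i = 3; i >= 0; i--) {
        if (score[i] > '9') {
            score[i] = (char)(score[i] - 10);
            if (i > 0)
                score[i - 1]++;
        }
    }
}
```

Starting from "1999", adding "0001" leaves "2000" in the buffer. Because no byte can overflow in step 1, the 68000 version can do the whole first step with a single add.l.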
||
24 June 2012, 13:06 | #26 |
2 contact me: email only!
Join Date: May 2001
Location: Auckland / New Zealand
Posts: 3,182
|
8 digit BCD score without movep, div or a table lookup
What about this routine to convert an 8 digit score from BCD ready to print with an 8x8 font already in registers d0-d3?
By rotating the initial score left by 3 places, you eliminate the need to multiply each digit by 8 to look up its graphics in an 8x8 font table. The graphic offsets all end up in registers d0-d3 at the end, so the printing code needs an unrolled loop to select the correct register, with a swap on each register after the first four score digits have been printed. Note: to keep things simple, the register values in the comments on the right-hand side are shown as if they had not been multiplied by 8, as the numbers are hard to picture in your head otherwise! Code:
12   move.l #$000f000f<<3,d4     ;d4=$000f000f*8 (=$00780078)
16   move.l ScoreAsBCD(pc),d3    ;d3=$12345678
14   rol.l  #3,d3                ;d3=Score*8 (pretend it's $12345678)
 4   move.l d3,d0
16   rol.l  #4,d0                ;d0=$23456781
 4   move.l d0,d1
16   rol.l  #4,d1                ;d1=$34567812
 4   move.l d1,d2
16   rol.l  #4,d2                ;d2=$45678123
 6   and.l  d4,d0                ;d0=$00050001
 6   and.l  d4,d1                ;d1=$00060002
 6   and.l  d4,d2                ;d2=$00070003
 6   and.l  d4,d3                ;d3=$00040008
 4   swap   d3                   ;d3=$00080004
---
130 cycles
Code:
        lea    FontDataDigits(pc),a0    ;8x8 font for chars 0-9
        lea    (a0,d0.w),a1             ;a1=Graphics data to print
        PRINT
        lea    (a0,d1.w),a1
        PRINT
        lea    (a0,d2.w),a1
        PRINT
        lea    (a0,d3.w),a1
        PRINT
        swap   d0
        lea    (a0,d0.w),a1
        PRINT
        swap   d1
        lea    (a0,d1.w),a1
        PRINT
        swap   d2
        lea    (a0,d2.w),a1
        PRINT
        swap   d3
        lea    (a0,d3.w),a1
        PRINT
Last edited by Codetapper; 24 June 2012 at 13:15. Reason: Added different font size bit |
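To see the real (multiplied by 8) values rather than the simplified ones in the comments, the rotate-and-mask sequence can be modelled in C. This is a sketch only; `rol32` and `bcd8_to_font_offsets` are names invented here, and the output array simply lists the offsets in the order the unrolled PRINT sequence would use them.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (0 < n < 32), like ROL.L. */
static uint32_t rol32(uint32_t v, unsigned n)
{
    return (v << n) | (v >> (32 - n));
}

/* Turn an 8-digit packed BCD score into eight font offsets (digit * 8),
   most significant digit first. */
void bcd8_to_font_offsets(uint32_t score_bcd, uint16_t offsets[8])
{
    const uint32_t mask = 0x000f000fu << 3;    /* $00780078 */
    uint32_t d3 = rol32(score_bcd, 3);          /* every digit times 8 */
    uint32_t d0 = rol32(d3, 4);
    uint32_t d1 = rol32(d0, 4);
    uint32_t d2 = rol32(d1, 4);

    d0 &= mask;  d1 &= mask;  d2 &= mask;  d3 &= mask;
    d3 = rol32(d3, 16);                         /* SWAP d3 */

    /* Low words hold digits 1-4, high words hold digits 5-8. */
    offsets[0] = (uint16_t)d0;         offsets[1] = (uint16_t)d1;
    offsets[2] = (uint16_t)d2;         offsets[3] = (uint16_t)d3;
    offsets[4] = (uint16_t)(d0 >> 16); offsets[5] = (uint16_t)(d1 >> 16);
    offsets[6] = (uint16_t)(d2 >> 16); offsets[7] = (uint16_t)(d3 >> 16);
}
```

For a score of `0x12345678` the offsets come out as 8, 16, 24, 32, 40, 48, 56, 64. The three bits that wrap around during the first rotate land in bit positions 0-2, which the mask discards.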
24 June 2012, 16:23 | #27 |
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,602
|
That's fine, although this separate task would be faster with ASCII than BCD. With BCD, the digit-pair blitjumptable mentioned would be the fastest.
If I were to update 8 digits, I'd definitely only blit the digits that changed. and.l Dn,Dn takes 8 cycles. I still think it's silly to optimize these things. It's not even done every frame. Just saving 2 cycles by profiling branch polarity in the render loops would save much more. |
24 June 2012, 20:51 | #28 |
2 contact me: email only!
Join Date: May 2001
Location: Auckland / New Zealand
Posts: 3,182
|
You can still use the PRINT macro to only output the changed digits with this method as well.
The digit-pair blit table would end up rather large, as you'd have $99+1 valid combinations of numbers, each, say, 8 pixels tall by 2 bytes wide. That's really starting to consume some memory just to print the score. BTW, I have rechecked the Motorola reference and it shows and.l dn,dn as 6 cycles on a 16-bit CPU, unless it's a misprint? See p120 here. (It doesn't make sense to me that an 'add' long operation would take 6 cycles yet an 'and' would take 8.) Last edited by Codetapper; 24 June 2012 at 23:35. |
25 June 2012, 01:38 | #29 |
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,602
|
You can modify any digit rendering code known to man to only blit the changed digits. My point was that selecting which digits to render will save much more than optimized render code AND an optimized score adding/conversion routine.
Using a table of pointers or pointer-pairs to graphics with BCD is only of any use if you want to blit digit-pairs as opposed to digits. Its real use is in bypassing both BCD and ASCII. Add/sub/and/or/eor.l Rn,Rn all take 8, check the asterisks. (In some cases also, the plus sign at the end, signifying adding the ea time from the ea tables, is missing. If the column heading contains 1 or more of <ea> or M, you should add the correct timing addition for each occurrence.) My bigger point was that it's not really fruitful to shave cycles for score updating. If I were to do it, I'd certainly not abcd 4x36 cycles all over the code and then save 12 cycles in the conversion. If I were on a cycle-budget I'd just have a counter 1..n digits and every nth frame check if that digit has changed and update it that frame if so. |
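The "one digit per frame" idea from the last paragraph could look something like this in C. It is a sketch under assumptions: the names `shown`, `next_digit` and `score_frame_update` are hypothetical, an 8-digit score is assumed, and the actual blit is left to the caller.

```c
#include <stdint.h>

#define NUM_DIGITS 8

/* Hypothetical state: the digits currently on screen and a counter
   that selects which digit is checked this frame. */
static uint8_t shown[NUM_DIGITS];
static unsigned next_digit;

/* Call once per frame with the current score digits. Returns the index
   of the digit that must be redrawn this frame, or -1 if the digit
   checked this frame has not changed. */
int score_frame_update(const uint8_t current[NUM_DIGITS])
{
    unsigned i = next_digit;
    next_digit = (next_digit + 1) % NUM_DIGITS;

    if (shown[i] == current[i])
        return -1;

    shown[i] = current[i];   /* the caller blits digit i here */
    return (int)i;
}
```

The cost per frame is one compare and at most one digit blit, at the price of a display that can lag the real score by up to NUM_DIGITS frames.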
25 June 2012, 07:33 | #30 |
Join Date: Jul 2008
Location: Sweden
Posts: 2,269
|
I think you are starting to scare away beginner programmers with this thread; they will think that printing a number on the screen is a delicate and complex science of its own!
|
25 June 2012, 12:05 | #31 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,957
|
If Phx's original version works, then this version will work without problems too. It seems you don't understand how a good table works: the result of add.l #'0000',Dx is already stored in the table. Of course, you can add a separate instruction too, if you like slow code.
|
25 June 2012, 12:14 | #32 | |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,957
|
You can also use similar code, but this is fast only on 68020+. Code:
        moveq  #0,D0
        move.w Score(a4),d0
        lsl.l  #4,D0
        lsr.w  #4,D0
        lsl.l  #8,D0
        lsr.w  #4,D0
        lsr.b  #4,D0
        clr.l  -(SP)
        add.l  #'0000',D0
        move.l D0,-(SP)
|
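The shift sequence is easier to follow with the intermediate values written out. The C sketch below models it for a score of $1234, assuming the usual 68000 rule that word- and byte-sized shifts leave the upper part of the register untouched; `lsr_w`, `lsr_b` and `bcd4_unpack_ascii` are helper names made up for this illustration.

```c
#include <stdint.h>

/* Word-sized logical shift right: only the low 16 bits are shifted. */
static uint32_t lsr_w(uint32_t d, unsigned n)
{
    return (d & 0xffff0000u) | ((d & 0xffffu) >> n);
}

/* Byte-sized logical shift right: only the low 8 bits are shifted. */
static uint32_t lsr_b(uint32_t d, unsigned n)
{
    return (d & 0xffffff00u) | ((d & 0xffu) >> n);
}

/* Spread four packed BCD digits over four bytes and make them ASCII. */
uint32_t bcd4_unpack_ascii(uint16_t score_bcd)
{
    uint32_t d0 = score_bcd;   /* $00001234 */
    d0 <<= 4;                  /* $00012340 */
    d0 = lsr_w(d0, 4);         /* $00010234 */
    d0 <<= 8;                  /* $01023400 */
    d0 = lsr_w(d0, 4);         /* $01020340 */
    d0 = lsr_b(d0, 4);         /* $01020304 */
    return d0 + 0x30303030u;   /* "1234" */
}
```

`bcd4_unpack_ascii(0x1234)` returns `0x31323334`, the four characters "1234" in big-endian order.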
|
25 June 2012, 15:34 | #33 | |
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,602
|
Heh, maybe we are. But if someone can't quite get this BCD thing, this thread now shows a few different options. And I got a better NumToDec routine out of it, so I'm happy |
|
27 June 2012, 13:43 | #34 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,957
|
Two-register version of Mcoder's routine:
Code:
        lea     score(pc),a0
        move.l  (A0),d0
        and.l   #$0F0F0F0F,d0
        add.l   #$30303030,d0
        movep.l d0,5(a0)
        move.l  (A0),d0
        lsr.l   #4,d0
        and.l   #$0F0F0F0F,d0
        add.l   #$30303030,d0
        movep.l d0,4(a0)
        rts

score   dc.l    0
digits  ds.b    8
Last edited by Don_Adan; 02 July 2012 at 12:09. |
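The trick here is that movep.l writes its four bytes to every second address, so the low-nibble digits and the high-nibble digits interleave in the buffer. A rough C model, with the helper names `movep_l` and `bcd8_to_ascii` invented for the sketch:

```c
#include <stdint.h>

/* Store the four bytes of v at every second address starting at p,
   most significant byte first, like MOVEP.L Dn,d(An). */
static void movep_l(uint32_t v, uint8_t *p)
{
    p[0] = (uint8_t)(v >> 24);
    p[2] = (uint8_t)(v >> 16);
    p[4] = (uint8_t)(v >> 8);
    p[6] = (uint8_t)v;
}

/* Expand an 8-digit packed BCD score into eight ASCII digits. */
void bcd8_to_ascii(uint32_t score_bcd, uint8_t digits[8])
{
    /* Low nibbles are digits 2, 4, 6 and 8: odd buffer positions. */
    movep_l((score_bcd & 0x0f0f0f0fu) + 0x30303030u, digits + 1);
    /* High nibbles are digits 1, 3, 5 and 7: even buffer positions. */
    movep_l(((score_bcd >> 4) & 0x0f0f0f0fu) + 0x30303030u, digits);
}
```

With a score of `0x12345678` the buffer ends up holding "12345678". In the assembly version the offsets 5 and 4 are relative to `score`, which sits four bytes before `digits`.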
23 August 2014, 09:54 | #35 |
old bearded fool
Join Date: Jan 2010
Location: Bangkok
Age: 56
Posts: 775
|
Sorry to bump this old thread, but while trying to freshen up my 68k assembler skills by looking at some example code, this seemingly simple challenge appeared: "* converting to decimal is left as an exercise to the reader!".
I thought it would be an easy task, but it wasn't, hehe. After trying a few ideas which failed I resorted to Google, and several searches later I ended up in this thread (should have known to look here first). In contrast to the brilliant suggestions in this thread, I present to you what has to be the slowest method ever conceived. Code:
        lea     thestring(pc),a0
        bsr     decimalconvert
;...

; convert d0.l to decimal ascii string in (a0)
decimalconvert
        movem.l d1-d2,-(sp)
        move.w  #10-1,d2
        adda.l  #10,a0
.loop
        move.l  #10,d1
        jsr     l32div
        move.b  d1,-(a0)
        add.b   #'0',(a0)
        dbra    d2,.loop
        movem.l (sp)+,d1-d2
        rts

; long 32bit division - sloooooow!
; in: d0 = dividend, d1 = divisor
; out: d0 = quotient, d1 = remainder
l32div
        movem.l d2-d3,-(sp)
        clr.l   d2
        clr.l   d3
        tst.l   d0
        bge     .x1
        addq.l  #1,d3
        neg.l   d0
.x1
        tst.l   d1
        bge     .loop
        addq.l  #1,d3
        neg.l   d1
.loop
        cmp.l   d0,d1
        bgt     .done
        sub.l   d1,d0
        addq.l  #1,d2
        bra     .loop
.done
        btst    #0,d3
        beq     .x2
        neg.l   d2
        neg.l   d0
.x2
        move.l  d0,d1
        move.l  d2,d0
        movem.l (sp)+,d2-d3
        rts
I tried to use some of the code posted in this thread and failed; this example requires 10 decimal ASCII digits (full 32-bit). The problem could easily be solved by using divul.l (instead of that loop horror pasted above) and setting the compiler to 68020+, but I want this to work on a plain 68000. Any ideas how to make it faster? (Doesn't have to be super fast, but better than 7 seconds, hehe.) Alternatively, maybe someone can help me get the ABCD routines in this thread to work with 10-digit decimal numbers. BTW: Shouldn't this thread be in "Coders. Asm / Hardware"? |
23 August 2014, 11:01 | #36 |
Registered User
Join Date: Dec 2010
Location: Athens/Greece
Age: 53
Posts: 719
|
|
23 August 2014, 11:12 | #37 | |
old bearded fool
Join Date: Jan 2010
Location: Bangkok
Age: 56
Posts: 775
|
Quote:
Try a "cycle exact" A500 configuration if you are using an emulator, or any mode with JIT turned off. Also, make sure the number in d0 is big (>1,000,000,000). When I got 7 seconds, my d0 was roughly 100,000,000 (FreeMem result). EDIT: Full source code attached. Last edited by modrobert; 23 August 2014 at 11:34. Reason: Added source code and the part about large number in d0. |
|
23 August 2014, 11:26 | #38 |
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,502
|
I don't think this has anything to do with BCD; neither the source nor the destination is a BCD number.
Anyway, a very boring and naive method is to first subtract 1 000 000 000 and keep subtracting until the value becomes smaller than 1 000 000 000 (if it was larger originally). The number of times subtracted = first digit (or a blank, if you want to remove leading zeros and the count was zero). Then do the same with 100 000 000, then 10 000 000 (put these values in an array), and so on. This method needs no multiplication or division; in the worst case it loops 10 * number of digits times, which is not that bad. It is a very tiny and fast loop, at least when compared to the relatively slow 68000 multiplication and division instructions. |
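As a reference for the subtract-powers-of-ten method described above, here is a small C sketch. It assumes an unsigned 32-bit input and always writes ten digits with leading zeros; the function name `u32_to_decimal` and the `powers` array are choices made for this example only.

```c
#include <stdint.h>

/* Convert a 32-bit unsigned value to ten ASCII decimal digits with
   leading zeros, using only compares and subtractions. */
void u32_to_decimal(uint32_t value, char out[10])
{
    static const uint32_t powers[10] = {
        1000000000u, 100000000u, 10000000u, 1000000u, 100000u,
        10000u, 1000u, 100u, 10u, 1u
    };

    for (int i = 0; i < 10; i++) {
        char digit = '0';
        /* At most 9 subtractions per digit (4 for the first one). */
        while (value >= powers[i]) {
            value -= powers[i];
            digit++;
        }
        out[i] = digit;
    }
}
```

For 4294967295 the buffer receives "4294967295", and for 100000000 it receives "0100000000". The inner loop maps onto a cmp.l/sub.l/addq pair on the 68000, with the powers read from a table of longwords.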
23 August 2014, 11:36 | #39 |
old bearded fool
Join Date: Jan 2010
Location: Bangkok
Age: 56
Posts: 775
|
OK, thanks for the suggestion.
I tried using ABCD from Codetapper's routines in this thread to speed things up, but could only get it to work for 8 digit decimal numbers. EDIT: Toni, hmm, the l32div routine pasted (in my first post) does what you suggest already. Last edited by modrobert; 23 August 2014 at 11:54. Reason: Second thought. |
23 August 2014, 12:41 | #40 |
Registered User
Join Date: Dec 2010
Location: Athens/Greece
Age: 53
Posts: 719
|
You can always use the OS.
Code:
* converts d0 long number to decimal ascii at (a0)
decimalconvert
        movem.l d0/a0-a3,-(sp)
        move.l  d0,savelongvalue
        lea.l   savelongvalue(pc),a1
        move.l  a0,a3
        lea.l   formatString(pc),a0
        lea.l   stuffChar(pc),a2
        CALLEXEC RawDoFmt
        movem.l (sp)+,d0/a0-a3
        rts

stuffChar:
        move.b  d0,(a3)+
        rts

savelongvalue   dc.l    0
formatString    dc.b    '%10ld',0
        EVEN
|