03 July 2021, 12:57 | #461 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,054
|
I actually just did what I was "arguing" about with Alkis ;P. Installing nuke detector as I type this...
Optionally merged the prompt stuff into a single message. The exe is 4 bytes shorter (6 bytes of code), so it's still pain time for me to gain 2 more (without obvious corner cutting! :P) to round it down, and that would also cancel out the inevitable "must handle day transition" 8 bytes of code. Also, "#" instead of "number" would've worked better in your case. |
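For readers wondering what the "must handle day transition" code has to do: judging by the listing in post #473, the timer ignores the day count from DateStamp and only uses minutes and ticks, so a run that crosses midnight yields a negative difference that has to be wrapped. A minimal Python sketch of that arithmetic follows. The function names are made up for illustration, and the (minutes, ticks) pairs are assumed to be what DateStamp returns.

```python
TICKS_PER_SECOND = 50                       # dos timer frequency, as in the listing
TICKS_PER_DAY = TICKS_PER_SECOND * 60 * 60 * 24

def ticks_of_day(minutes, ticks):
    """Combine minutes since midnight and ticks within the minute."""
    return minutes * TICKS_PER_SECOND * 60 + ticks

def elapsed(start, end):
    """Return (seconds, hundredths) between two (minutes, ticks) stamps.

    Hypothetical helper: wraps once if the run crossed midnight, which is
    what the optional day-transition code is for.
    """
    diff = ticks_of_day(*end) - ticks_of_day(*start)
    if diff < 0:                            # crossed midnight
        diff += TICKS_PER_DAY
    return divmod(diff * 2, 100)            # 1/50 s ticks to 1/100 s

print(elapsed((10, 0), (10, 479)))
print(elapsed((1439, 2950), (0, 100)))
```

Without the wrap, the second call would report a huge negative time, which is the case the extra 8 bytes guard against.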
03 July 2021, 14:25 | #462 |
Registered User
Join Date: Dec 2010
Location: Athens/Greece
Age: 53
Posts: 721
|
Umm, why do we call Forbid (even with macro) again?
|
03 July 2021, 14:36 | #463 |
Registered User
Join Date: Dec 2010
Location: Athens/Greece
Age: 53
Posts: 721
|
486 bytes (removing Forbid)
Edit: Yeah, and if we don't Forbid, there is no reason for Exec. 482 bytes. Last edited by alkis; 03 July 2021 at 14:43. Reason: removed move.l $4.w,a5 |
03 July 2021, 16:23 | #464 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,054
|
|
03 July 2021, 19:02 | #465 | |
Registered User
Join Date: Dec 2010
Location: Athens/Greece
Age: 53
Posts: 721
|
Quote:
1.54% difference on 3000 digits. Could go without DMA for more speed, or a 1 bitplane screen/shell if an always-on display is needed. And a question I've been meaning to ask: what's your actual executable size on disk? 'Cause if it isn't 64k, I am assembling this with the wrong parameters. |
|
03 July 2021, 19:17 | #466 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,488
|
|
03 July 2021, 19:56 | #467 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,054
|
If you are using asm-one&co like myself (I noticed you commented in the PRINTVs so I guess that's the case), they can't handle merged code+bss or data+bss, they always write the entire bss part at the end to disk as well. So it's 65536+36=65572 bytes.
I'm a pleb so I handle this manually, RB + edit hunk size + WB with new size ;P. |
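A rough sketch of where the 36 bytes come from and what the manual fix-up amounts to, assuming the usual single-hunk load file layout (HUNK_HEADER, HUNK_CODE, HUNK_END) and a loader that accepts a header size larger than the stored code size. `build_exe` and its parameters are names invented here, not anything from asm-one or vasm.

```python
import struct

HUNK_HEADER = 0x3F3
HUNK_CODE = 0x3E9
HUNK_END = 0x3F2

def build_exe(code: bytes, alloc_bytes: int) -> bytes:
    """Build a single-hunk executable image (hypothetical helper).

    code        -- bytes actually stored in the file
    alloc_bytes -- memory the loader is asked to allocate (code + bss)
    """
    code += b"\0" * (-len(code) % 4)        # hunks are sized in longwords
    code_longs = len(code) // 4
    alloc_longs = (alloc_bytes + 3) // 4
    # id, no resident library names, 1 hunk, first = 0, last = 0, hunk size
    header = struct.pack(">6I", HUNK_HEADER, 0, 1, 0, 0, alloc_longs)
    body = struct.pack(">2I", HUNK_CODE, code_longs) + code
    return header + body + struct.pack(">I", HUNK_END)

full = build_exe(bytes(65536), 65536)       # bss written out to disk
print(len(full))                            # 65572 = 65536 + 36
```

The header (24 bytes), the code hunk id and size (8 bytes) and the end marker (4 bytes) add up to the 36 bytes of overhead. Passing only the real code with `alloc_bytes=65536` gives the trimmed file, assuming the loader clears the rest.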
03 July 2021, 20:04 | #468 |
Registered User
Join Date: Dec 2010
Location: Athens/Greece
Age: 53
Posts: 721
|
|
03 July 2021, 20:04 | #469 | |
Registered User
Join Date: Dec 2010
Location: Athens/Greece
Age: 53
Posts: 721
|
Quote:
|
|
03 July 2021, 20:27 | #470 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,054
|
Ah, ok. Now you made me curious... vasm can handle printt/printv as well, nice.
I guess I should clarify what I meant with 3x slower: "accidentally" (that's why the red dude emoticon) having a few juicy processes running and then proclaiming that on my machine it's 3x slower ;P. |
03 July 2021, 21:40 | #471 | |
Registered User
Join Date: Dec 2010
Location: Athens/Greece
Age: 53
Posts: 721
|
Quote:
|
|
03 July 2021, 22:31 | #472 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,054
|
So the range is now 468 (466 code) to 508 bytes depending on user settings (prompt, day timer, mt, doslib hack). The annoying part is that most of the combinations could use another 2 byte reduction to drop the exe size by 4, and I can't find it (yet) ;\.
I'll post my source by Monday, gonna keep looking some more... And that will probably be it for me. edit: Typical, right? There it is (in getnum): Code:
; move.b #256-'0',d3
    move.b d7,d3 ; 2000&255 = $d0 = -'0'
Last edited by a/b; 03 July 2021 at 23:08. |
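The trick above works because d7 already holds 2000 and its low byte happens to equal -'0' modulo 256, so no separate constant needs to be loaded. A small Python model of the byte arithmetic, with `parse_digit` as an illustrative name only:

```python
D7_LOW_BYTE = 2000 & 0xFF                   # low byte of the 2000 constant

# $d0 is the 8-bit two's complement of '0' ($30)
assert D7_LOW_BYTE == 0xD0 == (-ord("0")) & 0xFF

def parse_digit(ch):
    """Model of: move.b d7,d3 / add.b (a0)+,d3 / cmp.b #10,d3 / bhs error."""
    d3 = (D7_LOW_BYTE + ord(ch)) & 0xFF     # byte-wide add wraps at 256
    return d3 if d3 < 10 else None          # unsigned compare rejects non-digits

print([parse_digit(c) for c in "0789"])
print(parse_digit("/"), parse_digit(":"), parse_digit("\n"))
```

Anything below '0' wraps to a large unsigned byte and anything above '9' lands at 10 or more, so a single unsigned compare covers both ends of the range.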
03 July 2021, 23:17 | #473 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,054
|
And here is the whole thing... Gonna spend the rest of the evening watching Gubbdata.
edit: Added cursor disabling option by alkis. edit2: Further size reduction by 2 bytes (4th print in getnum). edit3: -2 bytes, zero value reg reuse. Code:
;***************************************************************
; user settings
PRINT_DIGITS = 1
OPT_PROMPT = 1 ; use optimized prompt text?
LONG_TIMER = 0 ; check for day transition?
DISABLE_MT = 1 ; disable multitasking?
CURSOR_OFF = 0 ; disable cursor (faster printing)?
HACKS = 0 ; use undocumented OS stuff?

; exec
TDNestCnt = 295
LibList = 378
LN_NAME = 10 ; list node name

; dos
Input = -54
Output = -60
Read = -42
Write = -48
DateStamp = -192
TICKS_PER_SECOND = 50 ; dos timer frequency

; N = 7*D, D = digits, e.g. N = 700 for 100 digits
;***************************************************************
start
    IFEQ HACKS&(~DISABLE_MT)
    move.l 4.w,a5 ; exec library
    ENDIF
    IFEQ HACKS
    lea LibList(a5),a6 ; find dos in library list
.lib_loop
    move.l (a6),a6 ; next library
    move.l LN_NAME(a6),a0
    move.l #'.sod',d0
.lib_name
    cmp.b (a0)+,d0
    bne.b .lib_loop
    lsr.l #8,d0
    bne.b .lib_name
    ELSE
    lea -$148(a2),a6 ; dos library from bcpl vector
    ENDIF ; HACKS
    jsr Output(a6)
    move.l d0,a3 ; a3 = stdout
    lea workspace(pc),a4
    movem.w (a4),d5/d6/d7/a2 ; 10000, MAXD, 2000, 7*4
    bsr.w getnum ; returns N in d6 (k = N = 7*D)
    IFNE DISABLE_MT
    addq.b #1,TDNestCnt(a5) ; FORBID macro, a5 is free now
    ENDIF
    bsr.b .gettime ; reg copy: a0 = d7, d7 = d6
    move.l d1,-(a7) ; start time
;*** TIMED PART START ******************************************
.fill
    move.w a0,(a4)+ ; 2000
    subq.w #2,d7
    bne.b .fill

; outer+inner loop:
; d3 upper word must initially be and remain 0
; d7 must initially be 0 (c = 0)
; d0=*, d1=d, d2=b, d3=tmp, d4=10, d5=10000, d6=k, d7=c
; a0=*, a1=*, a2=7*4, a4=r[] (a3=stdout, a5=--, a6=dos)
.outer_loop
    moveq #0,d1 ; d = 0
    move.w d6,d2
    subq.w #1,d2 ; b = k-1
    bra.b .inner_entry

.gettime ; returns ticks in d1, and copies: d7->a0, d6->d7
    movem.l d0/d1/d2/d6/d7,-(sp)
    move.l sp,d1
    jsr DateStamp(a6)
    movem.l (sp)+,d0/d1/d2/d7/a0 ; d0=days, d1=minutes, d2=ticks
    mulu.w #TICKS_PER_SECOND*60,d1 ; minutes to ticks
    add.l d2,d1
    rts

.longdiv ; d0/d2, 32/16 -> 32q/16r
    swap d0
    move.w d0,d3
    divu.w d2,d3
    swap d3
    move.w d3,d0
    swap d0
    divu.w d2,d0
    move.w d0,d3
    clr.w d0
    swap d0
    move.w d0,(a4) ; r[i] = d%b
    exg d0,d3
    subq.w #2,d2 ; b -= 2
    bcs.b .inner_done
.inner_loop
    sub.l d0,d1 ; d = (d-d/b-d%b)/2
    sub.l d3,d1 ; (same as d *= i)
    lsr.l #1,d1
.inner_entry
    move.w -(a4),d0 ; r[i]
    mulu.w d5,d0
    add.l d0,d1 ; d += r[i]*10000
    move.l d1,d0
    divu.w d2,d0 ; d/b
    bvs.b .longdiv
    move.w d0,d3 ; d/b
    clr.w d0
    swap d0 ; d%b
    move.w d0,(a4) ; r[i] = d%b
    subq.w #2,d2 ; b -= 2
    bcc.b .inner_loop
.inner_done
    divu.w d5,d1 ; d/10000
    add.w d7,d1 ; d = c+d/10000 (to be printed out)
    move.l d1,d7
    swap d7 ; c = d%10000
    IFNE PRINT_DIGITS
    bsr.b PR0000
    ENDIF
    sub.w a2,d6 ; k -= 7*4
    add.l d6,a4 ; &r[k/2]
    bne.b .outer_loop ; k = 0?
;*** TIMED PART END ********************************************
    bsr.b .gettime
    sub.l (a7)+,d1 ; end-start time
    IFNE LONG_TIMER
; I'll shoot if you ask for DST adjustment or anything similar.
    bpl.b .same_day
    add.l #TICKS_PER_SECOND*60*60*24,d1
.same_day
    ENDIF
    add.l d1,d1 ; dos ticks (1/50) to 1/100
    divu.w #100,d1 ; 100ths upper, seconds lower
    move.l a4,d2 ; print buffer
    move.b #' ',(a4)+
    bsr.b SPrintTime ; must have: d5 = 10000, d6 = 0
    move.b #'.',(a4)+
    moveq #'0',d6 ; print leading zeroes
    swap d1
    moveq #10,d5
    bsr.b SPrintTime
    move.b d4,(a4)+ ; newline
    move.l a4,d3
    sub.l d2,d3 ; string length
    bra.b callwrite ; END OF PROGRAM (exec will re-enable multitasking)
;***************************************************************
SPrintTime ; d1=value, a4=buffer
    move.w d1,d0
.Next
    ext.l d0
    divu.w d5,d0 ; digit 0-9
    cmp.b d6,d0
    beq.b .LeadZero
    moveq #'0',d6
    add.b d6,d0
    move.b d0,(a4)+
.LeadZero
    swap d0
    divu.w d4,d5
    bne.b .Next
    rts

PR0000 ; d1=value
    move.l #'0000'-$01010001,d0
    move.w -(a4),d3
.Loop
    addq.b #1,d0 ; top 3 digits in a loop
    add.w d3,d1
    bpl.b .Loop
    sub.w d3,d1
    rol.l #8,d0
    move.w -(a4),d3 ; last value is string length (4)
    bmi.b .Loop
    add.b d1,d0 ; 4th digit
    move.l d0,-(a4) ; to print buffer
    moveq #pbuffer-workspace,d2
    sub.l d2,a4
writetext
    add.l a4,d2 ; offset to buffer address
callwrite
    move.l a3,d1 ; stdout
    jmp Write(a6) ; call Write(stdout,buffer,length)
;***************************************************************
; Data must be in this order all up to msg1.
    CNOP 0,4
pbuffer DCB.B 4,0 ; keep it lword aligned preferably
dec2str DC.W dec2str-pbuffer,-10,-100,-1000
;*** OVERWRITTEN CODE/DATA STARTS HERE *************************
workspace
MAXD = ((65536-(workspace-start))/7)&(~3) ; multiple of 4
    DC.W 10000,MAXD,2000,7*4
msg1 DC.B "number pi calculator v18",10 ; odd length
msg1end
msg2 DC.B "number of digits (up to " ; even length
    IFNE OPT_PROMPT
X SET MAXD
    DC.B '0'+X/1000
X SET X-(X/1000)*1000
    DC.B '0'+X/100
X SET X-(X/100)*100
    DC.B '0'+X/10
X SET X-(X/10)*10
    DC.B '0'+X
    ENDIF
msg2end
msg3 DC.B ")? " ; odd length
    IFNE CURSOR_OFF
    DC.B $1b,"[0 p" ; to even length
    ENDIF
msg3end
    IFEQ OPT_PROMPT
    EVEN
printnum
    bra.b PR0000 ; chained short branch
    ENDIF
msg4 DC.B " digits will be printed",10 ; even length
msg4end
    EVEN
;***************************************************************
getnum
    moveq #10,d4 ; global const (here for alignment)
    moveq #msg1-workspace,d2
    moveq #msg1end-msg1,d3
    bsr.b writetext
.error
    moveq #msg2-workspace,d2
    IFNE OPT_PROMPT
    moveq #msg3end-msg2,d3
    ELSE
    moveq #msg2end-msg2,d3
    bsr.b writetext
    move.w d6,d1 ; MAXD
    bsr.b printnum
    moveq #msg3-workspace,d2
    moveq #msg3end-msg3,d3
    ENDIF
    bsr.b writetext
    jsr Input(a6)
    move.l d0,d1 ; stdin
    move.l a4,d2 ; read buffer
    moveq #4+1,d3 ; up to 4 digits + newline
    jsr Read(a6) ; returns length in d0
    move.l d2,a0
    moveq #0,d1
.nextch
    subq.w #1,d0
    beq.b .parsed
    move.b d7,d3 ; 2000&$ff = $d0 = -'0'
    add.b (a0)+,d3
    cmp.b d4,d3 ; digit 0-9?
    bhs.b .error
    mulu.w d4,d1 ; D = D*10+digit
    add.w d3,d1 ; d3 bits 8-15 must be clear
    bra.b .nextch
.parsed
    cmp.w d6,d1 ; D > MAXD?
    bhi.b .error
    move.w d1,d3 ; D = 0?
    beq.b .error
    addq.w #3,d1
    and.w #~3,d1 ; adjust D to a multiple of 4
    moveq #7,d6
    mulu.w d1,d6 ; k = N = 7*D
    cmp.b (a0),d4 ; last char is newline (1-4 digits)?
    bne.b .adjusted
    sub.w d1,d3
    beq.b .not_adjusted
.adjusted ; either 5 digits or adjusted D
    IFNE OPT_PROMPT
    bsr.w PR0000
    ELSE
    bsr.b printnum
    ENDIF
    moveq #msg4end-msg4,d3
.not_adjusted
    moveq #msg4-workspace,d2
    bra.w writetext
;***************************************************************
bss DS.B 65536-(*-start) ; 64kb allowed for code+data
;***************************************************************
; Enable these if your assembler can handle them:
; PRINTV bss-start+36 ; 36 = hunk overhead
; PRINTV MAXD ; max number of digits
; PRINTV (pbuffer-start)&3 ; pbuffer alignment
;***************************************************************
Last edited by a/b; 05 July 2021 at 03:52. |
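For anyone who wants to follow the outer/inner loop without reading 68k, here is a rough Python model of the same spigot scheme as I read it from the comments in the listing: remainders start at 2000, the base is 10000, there are 3.5 terms per digit, and 14 terms are dropped after each group of 4 digits. It uses 1-based term indices with divisor 2*i-1 for readability instead of the listing's register layout, and `pi_digits` is a name made up for this sketch. It models the arithmetic only, not the timing, printing or 16-bit division fallback.

```python
def pi_digits(ndigits):
    """Return pi's digits as a string, in groups of four (sketch only)."""
    d4 = (ndigits + 3) & ~3                 # adjust D to a multiple of 4
    n = 7 * d4 // 2                         # 3.5 terms per digit
    r = [2000] * (n + 1)                    # remainders; r[0] is unused here
    c = 0                                   # carry into the next group
    groups = []
    for k in range(n, 0, -14):              # 14 fewer terms per 4 digits
        d = 0
        for i in range(k, 0, -1):
            d += r[i] * 10000               # d += r[i]*10000
            b = 2 * i - 1                   # odd divisor
            r[i] = d % b                    # r[i] = d%b
            d //= b                         # d/b
            if i > 1:
                d *= i - 1                  # same as (d-d/b-d%b)/2 in the asm
        groups.append("%04d" % (c + d // 10000))
        c = d % 10000
    return "".join(groups)

print(pi_digits(40))
```

Python integers never overflow, so the `.longdiv` path for quotients that do not fit in 16 bits has no counterpart here.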
04 July 2021, 00:00 | #474 |
Registered User
Join Date: Dec 2010
Location: Athens/Greece
Age: 53
Posts: 721
|
You can add this to user settings
Code:
CURSOROFF = 0 ; minor speed increase on printing
Code:
msg3 DC.B ")? " ; odd length
    IFNE CURSOROFF
    DC.B $1B,"[0 p"
    ENDIF
msg3end
|
04 July 2021, 01:51 | #475 | |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,030
|
Quote:
|
|
04 July 2021, 03:08 | #476 |
Registered User
Join Date: Mar 2018
Location: Hastings, New Zealand
Posts: 2,710
|
I bumped the priority up to +127 and it reduced execution time from 9.58 seconds to 9.52 seconds when running from a shell window on Workbench. However, if I booted with no startup-sequence it only took 9.44 seconds, with or without changing the task priority. So that means the background tasks I normally have running are using ~1.5% of the CPU.
The 'forbid' code doesn't seem to make any difference, perhaps because it is broken by DOS calls? I think the 'hack' option without other speedup tricks, run from the initial CLI with high priority, is a fair comparison to the 386 DOS code, as it is closest to being the same environment.
litwr's pi.ibmpc.com program is 623 bytes. At 480 bytes we are only 77% of that size, as well as faster on a 50MHz 030 than the fastest 386 ever made (40MHz).
Now to answer the question this thread was started for. litwr claimed that the 68020 was hard to optimize for compared to a 386, and he was right! It turns out that (in this case) using 68020 instructions provides no benefit. However, we discovered that 68k code in general can easily be optimized for speed and code density without having to limit it to a specific processor. This is good news for those of us who wish to 'write once, run anywhere' on any Amiga no matter what CPU it has.
Last edited by Bruce Abbott; 04 July 2021 at 03:16. |
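The two percentages in the post above can be checked with a few lines of Python. The numbers are taken straight from the post; `percent` is just a throwaway helper name.

```python
def percent(part, whole):
    """part as a percentage of whole."""
    return 100.0 * part / whole

size_ratio = percent(480, 623)              # 68k exe vs pi.ibmpc.com
background_load = percent(9.58 - 9.44, 9.58)  # Workbench shell vs bare boot

print(round(size_ratio), round(background_load, 1))
```

This assumes the whole 0.14 second difference between the two runs is time taken by background tasks, which is the reading the post itself uses.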
04 July 2021, 03:38 | #477 | ||
Registered User
Join Date: Dec 2010
Location: Athens/Greece
Age: 53
Posts: 721
|
Quote:
Quote:
|
||
04 July 2021, 10:44 | #478 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,054
|
Yes, printing is breaking it. If you run with PRINT_DIGITS=0 it will work as expected: it will "freeze" WB and everything else.
|
04 July 2021, 10:57 | #479 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,488
|
|
04 July 2021, 10:57 | #480 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,054
|
|