04 July 2021, 11:00 | #481 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,062
|
|
04 July 2021, 11:16 | #482 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,491
|
|
04 July 2021, 13:31 | #483 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,062
|
Well, we have another -2 bytes.
The trick is to rearrange the registers in getnum and always make the 4th print call. If the digit count needs no adjustment, that call is made with a zero length (d3 ends up being 0), so nothing is printed. This makes the rts at the end redundant. I'll update the last code I posted.
edit: And another -2 bytes (reusing a zero value register). Also, I have new time calc&print code done, with testing still to go; it should be another -4 bytes. Will post it when ready.
Last edited by a/b; 05 July 2021 at 03:49. |
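A rough Python sketch of the zero-length trick described above, for readers who don't follow the register shuffling. It assumes the message text and the 4-digit number format from the listing posted later in the thread; the function name `adjust_and_report` is made up for illustration and is not part of a/b's code.

```python
NOTE = " digits will be printed\n"   # msg4 in the listing

def adjust_and_report(requested):
    """Round the digit count up to a multiple of 4 and build the console output."""
    adjusted = (requested + 3) & ~3
    out = ""
    if adjusted != requested:
        out += "%04d" % adjusted     # the number is only printed when it changed
        length = len(NOTE)
    else:
        length = 0                   # no adjustment: the length stays 0
    out += NOTE[:length]             # the final "write" always happens
    return adjusted, out
```

Because the last write is unconditional, the routine can end by jumping straight into the write code instead of needing its own return.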
05 July 2021, 05:05 | #484 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,062
|
Latest version (time calc&print reimplemented, -4 bytes), 456-500 bytes depending on settings.
edit: Slightly changed getnum, same byte size but shorter (-1 branch/label).
edit2: -2 bytes in print_time (but the same exe size in most cases).
edit3: -4 bytes in getnum, more cosmetics.
Code:
;***************************************************************
; user settings
PRINT_DIGITS    = 1
OPT_PROMPT      = 1             ; use optimized prompt text?
DAY_TRANS       = 0             ; check for day transition?
DISABLE_MT      = 0             ; disable multitasking?
ALIGN_DATA      = 1             ; 32-bit alignment for data?
CURSOR_OFF      = 0             ; disable cursor (faster printing)?
HACKS           = 0             ; use undocumented OS stuff?

; exec
TDNestCnt       = 295
LibList         = 378
LN_NAME         = 10            ; list node name
; dos
Input           = -54
Output          = -60
Read            = -42
Write           = -48
DateStamp       = -192
TICKS_PER_SECOND = 50           ; dos timer frequency

;***************************************************************
start
        IFEQ    HACKS&(~DISABLE_MT)
        move.l  4.w,a5                  ; exec library
        ENDIF
        IFEQ    HACKS
        lea     LibList(a5),a6          ; find dos in library list
.lib_loop
        move.l  (a6),a6                 ; next library
        move.l  LN_NAME(a6),a0
        move.l  #'.sod',d0
.lib_name
        cmp.b   (a0)+,d0
        bne.b   .lib_loop
        lsr.l   #8,d0
        bne.b   .lib_name
        ELSE
        lea     -$148(a2),a6            ; dos library from bcpl vector
        ENDIF   ; HACKS

        jsr     Output(a6)
        move.l  d0,a3                   ; a3 = stdout
        lea     workspace(pc),a4
        moveq   #10,d4                  ; global const
        movem.w (a4),d5/d6/d7/a2        ; 10000, MAXD, 2000, 7*4
        bsr.w   getnum                  ; returns N in d6 (k = N = 7*D)
        IFNE    DISABLE_MT
        addq.b  #1,TDNestCnt(a5)        ; FORBID macro, a5 is free now
        ENDIF
        bsr.b   .gettime                ; reg copy: a0 = d7, d7 = d6
        move.l  d1,-(a7)                ; start time

;*** TIMED PART START ******************************************
.fill
        move.w  a0,(a4)+                ; a0 = 2000
        subq.w  #2,d7
        bne.b   .fill

; outer+inner loop:
; d3 upper word must initially be and remain 0
; d7 must initially be 0 (c = 0)
; d0=*, d1=d, d2=b, d3=tmp, d4=10, d5=10000, d6=k, d7=c
; a0=*, a1=*, a2=7*4, a4=r[] (a3=stdout, a5=--, a6=dos)
.outer_loop
        moveq   #0,d1                   ; d = 0
        move.w  d6,d2
        subq.w  #1,d2                   ; b = k-1
        bra.b   .inner_entry

.gettime        ; returns ticks in d1, and copies: d7->a0, d6->d7
        movem.l d0/d1/d2/d6/d7,-(sp)
        move.l  sp,d1
        jsr     DateStamp(a6)
        movem.l (sp)+,d0/d1/d2/d7/a0    ; d0=days, d1=minutes, d2=ticks
        mulu.w  #TICKS_PER_SECOND*60,d1 ; minutes to ticks
        add.l   d2,d1
        rts

.longdiv        ; d0/d2, 32/16 -> 32q/16r
        swap    d0
        move.w  d0,d3
        divu.w  d2,d3
        swap    d3
        move.w  d3,d0
        swap    d0
        divu.w  d2,d0
        move.w  d0,d3
        clr.w   d0
        swap    d0
        move.w  d0,(a4)                 ; r[i] = d%b
        exg     d0,d3
        subq.w  #2,d2                   ; b -= 2
        bcs.b   .inner_done
.inner_loop
        sub.l   d0,d1                   ; d = (d-d/b-d%b)/2
        sub.l   d3,d1                   ; (same as d *= i)
        lsr.l   #1,d1
.inner_entry
        move.w  -(a4),d0                ; r[i]
        mulu.w  d5,d0
        add.l   d0,d1                   ; d += r[i]*10000
        move.l  d1,d0
        divu.w  d2,d0                   ; d/b
        bvs.b   .longdiv
        move.w  d0,d3                   ; d/b
        clr.w   d0
        swap    d0                      ; d%b
        move.w  d0,(a4)                 ; r[i] = d%b
        subq.w  #2,d2                   ; b -= 2
        bcc.b   .inner_loop
.inner_done
        divu.w  d5,d1                   ; d/10000
        add.w   d7,d1                   ; d = c+d/10000 (to be printed out)
        move.l  d1,d7
        swap    d7                      ; c = d%10000
        IFNE    PRINT_DIGITS
        bsr.b   PR0000
        ENDIF
        sub.w   a2,d6                   ; k -= 7*4
        add.l   d6,a4                   ; &r[k/2]
        bne.b   .outer_loop             ; k = 0?
;*** TIMED PART END ********************************************

        bsr.b   .gettime
        sub.l   (a7)+,d1                ; end-start time
        IFNE    DAY_TRANS
        bpl.b   .same_day
        add.l   #TICKS_PER_SECOND*60*60*24,d1
.same_day
        ENDIF

print_time
        move.l  a4,d2                   ; print buffer
        moveq   #TICKS_PER_SECOND/10,d0 ; d0 = 5
        mulu.w  d0,d5                   ; d5 = 5*10000
        moveq   #' ',d3
.next_part
        move.b  d3,(a4)+
.next_digit
        divu.w  d5,d1                   ; digit 0-9
        cmp.b   d6,d1                   ; d6 must initially be 0
        beq.b   .skip_lz
        moveq   #'0',d6                 ; print zeroes from now on
        add.b   d6,d1
        move.b  d1,(a4)+
        clr.w   d1
.skip_lz
        swap    d1
        divu.w  d4,d5
        beq.b   .done
        cmp.w   d0,d5
        bne.b   .next_digit
        add.w   d1,d1                   ; fraction, switch from 1/50 to 1/100
        add.w   d5,d5
        moveq   #'0',d6                 ; print all fraction zeroes
        moveq   #'.',d3
        bra.b   .next_part
.done
        move.b  d4,(a4)+                ; newline
        move.l  a4,d3
        sub.l   d2,d3                   ; string length
        bra.b   callwrite
; END OF PROGRAM (exec will re-enable multitasking)

;***************************************************************
PR0000  ; d1=value
        move.l  #'0000'-$01010001,d0
        move.w  -(a4),d3
.Loop
        addq.b  #1,d0                   ; top 3 digits in a loop
        add.w   d3,d1
        bpl.b   .Loop
        sub.w   d3,d1
        rol.l   #8,d0
        move.w  -(a4),d3                ; last value is string length (4)
        bmi.b   .Loop
        add.b   d1,d0                   ; 4th digit
        move.l  d0,-(a4)                ; to print buffer
        moveq   #pbuffer-workspace,d2
        sub.l   d2,a4
writetext
        add.l   a4,d2                   ; string offset to address
callwrite
        move.l  a3,d1                   ; stdout
        jmp     Write(a6)               ; call Write(stdout,buffer,length)

;***************************************************************
; Data must be in this order all up to msg1.
        IFNE    ALIGN_DATA              ; keep it 32-bit aligned
        CNOP    0,4                     ; preferably, for 020+
        ENDIF
pbuffer DCB.B   4,0
dec2str DC.W    dec2str-pbuffer,-10,-100,-1000
;*** OVERWRITTEN CODE/DATA STARTS HERE *************************
workspace
MAXD    = ((65536-(workspace-start))/7)&(~3)    ; multiple of 4
        DC.W    10000,MAXD,2000,7*4
msg1    DC.B    "number pi calculator v20",10   ; odd length
msg1end
msg2    DC.B    "number of digits (up to "      ; even length
        IFNE    OPT_PROMPT
X       SET     MAXD
        DC.B    '0'+X/1000
X       SET     X-(X/1000)*1000
        DC.B    '0'+X/100
X       SET     X-(X/100)*100
        DC.B    '0'+X/10
X       SET     X-(X/10)*10
        DC.B    '0'+X
        ENDIF
msg2end
msg3    DC.B    ")? "                   ; odd length
        IFNE    CURSOR_OFF
        DC.B    $1b,"[0 p"              ; to even length
        ENDIF
msg3end
        IFEQ    OPT_PROMPT
        EVEN
printnum
        bra.b   PR0000                  ; chained short branch
        ENDIF
msg4    DC.B    " digits will be printed",10    ; even length
msg4end
        EVEN

;***************************************************************
getnum
        moveq   #msg1-workspace,d2
        moveq   #msg1end-msg1,d3
        bsr.b   writetext
.error
        moveq   #msg2-workspace,d2
        IFNE    OPT_PROMPT
        moveq   #msg3end-msg2,d3
        ELSE
        moveq   #msg2end-msg2,d3
        bsr.b   writetext
        move.w  d6,d1                   ; MAXD
        bsr.b   printnum
        moveq   #msg3-workspace,d2
        moveq   #msg3end-msg3,d3
        ENDIF   ; OPT_PROMPT
        bsr.b   writetext
        jsr     Input(a6)
        move.l  d0,d1                   ; stdin
        move.l  a4,d2                   ; read buffer (overwrites msg1)
        moveq   #msg1end-1-msg1,d3      ; -1 ensures a trailing newline
        jsr     Read(a6)
        move.l  a4,a0
        moveq   #0,d1                   ; D = 0
        move.b  (a0)+,d3
.next_char
        add.b   d7,d3                   ; 2000&$ff = $d0 = -'0'
        cmp.b   d4,d3                   ; digit 0-9?
        bhs.b   .error
        mulu.w  d4,d1                   ; D = D*10+digit
        add.l   d3,d1
        cmp.l   d6,d1                   ; D > MAXD?
        bhi.b   .error
        move.b  (a0)+,d3
        cmp.b   d4,d3                   ; newline?
        bne.b   .next_char
        move.w  d1,d3                   ; D = 0?
        beq.b   .error
        addq.w  #3,d1
        and.w   #~3,d1                  ; adjust D to a multiple of 4
        moveq   #7,d6
        mulu.w  d1,d6                   ; k = N = 7*D
        sub.w   d1,d3
        beq.b   .not_adjusted
.adjusted
        IFNE    OPT_PROMPT
        bsr.w   PR0000
        ELSE
        bsr.b   printnum
        ENDIF
        moveq   #msg4end-msg4,d3
.not_adjusted   ; msg4 isn't printed (length = 0)
        moveq   #msg4-workspace,d2
        bra.w   writetext

;***************************************************************
bss     DS.B    65536-(*-start)         ; 64kb allowed for code+data

;***************************************************************
; Enable/disable these as needed:
        PRINTT  "Code+data size:"
        PRINTV  (bss-start)
        PRINTT  "Executable size:"
        PRINTV  (bss-start+36+3)&(~3)   ; 36 = hunk overhead
        PRINTT  "Maximum number of digits:"
        PRINTV  MAXD
        PRINTT  "Data 32-bit alignment:"
        PRINTV  (pbuffer-start)&3
;***************************************************************
Last edited by a/b; 10 July 2021 at 17:38. |
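For readers who don't read 68k assembly, here is a minimal Python sketch of what the timed part appears to compute, going by the register comments in the listing (d, b, k, c, r[], the constants 10000 and 2000, and 7*4 bytes dropped per pass). It is one reading of those comments, not a/b's code; the function name `pi_digits` and the use of a Python list with one entry per 16-bit word are assumptions made for illustration.

```python
def pi_digits(requested):
    """Return the leading digits of pi ("3141...") as a string, four per pass."""
    digits = (requested + 3) & ~3      # round up to a multiple of 4, like getnum
    terms = digits * 7 // 2            # 14 array entries per 4-digit group
    r = [2000] * (terms + 1)           # r[0] is unused
    carry = 0                          # "c" in the listing's comments
    groups = []
    while terms > 0:                   # outer loop: one 4-digit group per pass
        d = 0
        b = 2 * terms - 1              # odd divisor, stepping down by 2
        for i in range(terms, 0, -1):  # inner loop, from the far end of r[]
            d += r[i] * 10000
            r[i] = d % b
            d //= b
            b -= 2
            if i > 1:
                # same value the listing derives as (d - d/b - d%b)/2
                # from the undivided d
                d *= i - 1
        groups.append("%04d" % (carry + d // 10000))
        carry = d % 10000
        terms -= 14                    # the listing drops 7*4 bytes = 14 words
    return "".join(groups)
```

Calling `pi_digits(20)` gives the string `31415926535897932384`. Python integers never overflow, so the sketch has no counterpart to the `.longdiv` path, which the 68000 version needs whenever the quotient of `divu.w` does not fit in 16 bits.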
05 July 2021, 10:07 | #485 |
Registered User
Join Date: Dec 2010
Location: Athens/Greece
Age: 53
Posts: 722
|
@a/b: it doesn't guard against a 2-day transition *evil grin*
You know, I could **accidentally** run a +127 priority busy-wait task that releases a quantum to multitasking only once every hour |
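The point being teased here is that the listing's timer only looks at the minutes and ticks fields of the DateStamp result, and `DAY_TRANS` adds at most one day back. A small Python sketch of the difference, assuming timestamps are passed as `(days, minutes, ticks)` tuples; the helper names are hypothetical:

```python
TICKS_PER_SECOND = 50
TICKS_PER_MINUTE = TICKS_PER_SECOND * 60
TICKS_PER_DAY = TICKS_PER_MINUTE * 60 * 24

def ticks_of_day(stamp):
    """Ticks since midnight for a (days, minutes, ticks) stamp."""
    _, minutes, ticks = stamp
    return minutes * TICKS_PER_MINUTE + ticks

def elapsed_like_listing(start, end, day_trans=True):
    """Ignore the days field; optionally correct a single midnight crossing."""
    diff = ticks_of_day(end) - ticks_of_day(start)
    if day_trans and diff < 0:
        diff += TICKS_PER_DAY
    return diff

def elapsed_with_days(start, end):
    """Fold the days field in, so any number of midnight crossings is handled."""
    return (end[0] - start[0]) * TICKS_PER_DAY + ticks_of_day(end) - ticks_of_day(start)
```

The two functions agree for runs that cross midnight once and differ by whole days beyond that, which is the gap the joke is about.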
05 July 2021, 10:22 | #486 | |
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,436
|
Quote:
It might be a solution in search of a problem then |
|
05 July 2021, 10:27 | #487 | |
Registered User
Join Date: Dec 2010
Location: Athens/Greece
Age: 53
Posts: 722
|
Quote:
Nah, I am just teasing. a/b did tremendous work there |
|
05 July 2021, 10:39 | #488 |
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,436
|
Absolutely, I'm super impressed with the level of optimisation I've seen so far
|
05 July 2021, 10:41 | #489 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,062
|
I could always add another "I'll shoot if you ask for..." comment.
|
06 July 2021, 19:28 | #490 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,062
|
I have to slow down, not even close to 1000 posts, so only -2 bytes (in print_time) this time. Code updated.
|
10 July 2021, 17:36 | #491 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,062
|
I took another look from a fresh perspective after a few days. Slightly changed the input logic, because a small buffer size doesn't really work well with buffered I/O. It does the same thing, but in practical terms it works better (at least for me). And the code is simpler, so -4 bytes.
Also added another setting to disable 32-bit data alignment, because it's not needed for 000/010 and could potentially save 2 bytes. Code updated; this will probably be my last one. |
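As a rough guide to what the input check in getnum does, here is a Python sketch of the validation as it reads from the listing's comments: reject non-digits, reject values above MAXD, reject zero, stop at the newline. The function name and the `MAXD` value are placeholders (the listing derives MAXD from the free workspace), and buffer handling is simplified to a bytes object.

```python
MAXD = 9000   # placeholder; the listing computes this at assembly time

def parse_digit_count(line, maxd=MAXD):
    """Return the requested digit count, or None if the prompt should repeat."""
    text = line.split(b"\n", 1)[0]       # everything up to the first newline
    value = 0
    for ch in text:
        # 2000 & 0xFF == 0xD0, which is -ord('0') modulo 256, so one byte-wide
        # add maps '0'..'9' to 0..9 and everything else to 10 or more
        digit = (ch + 2000) & 0xFF
        if digit >= 10:
            return None
        value = value * 10 + digit
        if value > maxd:
            return None
    return value if value > 0 else None  # also rejects an empty line
```

The byte trick is why the constant 2000, which the calculation needs anyway, can double as the ASCII offset without costing an extra instruction.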
19 July 2021, 20:20 | #492 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,039
|
Then after the holidays you can try to optimize ramlib:
http://eab.abime.net/showpost.php?p=919977&postcount=83
The latest source version is on the Wanted Team page: http://wt.exotica.org.uk/test.html
It perhaps has one bug too (a badly applied fix for ramlib usage, or something different?). I don't remember, too much time has passed. |
20 July 2021, 23:52 | #493 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,491
|
By chance I noticed that HACKS = 1 doesn't work on KS1.2 (I was doing something unrelated and I realized that the offset is different) |
21 July 2021, 04:31 | #494 |
Registered User
Join Date: Dec 2010
Location: Athens/Greece
Age: 53
Posts: 722
|
|
25 July 2021, 01:28 | #495 | |
Registered User
Join Date: Mar 2018
Location: Hastings, New Zealand
Posts: 2,719
|
Quote:
However I found out that printing is faster if you cover the CLI window up with another one. With the window made as short as possible it takes 9.6 seconds to print 3000 digits. With the window covered it only takes 9.35 seconds. You can still see the last page of digits by expanding the window, so this technique is valid. I changed your code slightly to print the time on a separate line, which takes no more code. I also changed the optimized prompt number to make it compatible with ProAsm. With sensible options the file size is 488 bytes, exactly 1 OFS disk sector. |
|
16 June 2023, 09:22 | #496 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,039
|
Resurrection. If the TRAPV method can be implemented, then perhaps the speed will be much faster, though the code size will be greater. Of course, only if a/b still wants to make a next version.
https://eab.abime.net/showthread.php?t=114314 |
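For context, the case a TRAPV-based variant would have to service is the same one the listing catches with `bvs.b .longdiv`: `divu.w` sets the overflow flag when the quotient does not fit in 16 bits, and the division is then redone in two 16-bit steps. A Python model of that fallback, with hypothetical helper names, assuming unsigned 32-bit dividends and 16-bit divisors:

```python
def divu_w(dividend, divisor):
    """Model of 68000 DIVU.W: returns (quotient, remainder), or None when the
    quotient does not fit in 16 bits (the overflow case)."""
    quotient, remainder = divmod(dividend, divisor)
    return None if quotient > 0xFFFF else (quotient, remainder)

def long_div(dividend, divisor):
    """32/16 -> 32-bit quotient, 16-bit remainder, using two DIVU.W steps."""
    hi, lo = dividend >> 16, dividend & 0xFFFF
    q_hi, r_hi = divu_w(hi, divisor)                 # high word first
    # r_hi < divisor, so this second step can never overflow
    q_lo, remainder = divu_w((r_hi << 16) | lo, divisor)
    return (q_hi << 16) | q_lo, remainder

def divide(dividend, divisor):
    """Fast path first, two-step fallback on overflow, as in the inner loop."""
    fast = divu_w(dividend, divisor)
    return fast if fast is not None else long_div(dividend, divisor)
```

Whether the overflow is detected by a branch or by a trap handler, the result has to match this two-step division; the trade-off being discussed is only about where the check lives and what it costs.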