12 June 2021, 12:18 | #301 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,039
|
This is not equivalent:
Code:
        move.l #(65536-(ra-start))/7,D7         ; D7=maxn
;       move.l #$10000-(ra-start),d7
;       divu.w #7*4,D7
;       lsl.l #2,D7                             ; d7.w=maxn
For example, 8/7 = 1 but (8/28)<<2 = 0, and 7777/7 = 1111 but (7777/28)<<2 = 1108.
It should be written as either of these:
Code:
        move.l #((65536-(ra-start))/7)&(~3),D7     ; d7=maxn
        move.l #((65536-(ra-start))/(7<<2))<<2,D7  ; d7=maxn
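A quick Python check of the arithmetic above. It is a sketch: `x` stands in for the assembler expression `65536-(ra-start)`, whose real value depends on the code size, so every possible value is tested. The `divu.w #7*4` / `lsl.l #2` pair computes floor(x/28)*4. Both suggested replacements give the same result, but the plain `/7` does not.

```python
def maxn_divu_lsl(x: int) -> int:
    """divu.w #7*4,D7 ; lsl.l #2,D7: the low word holds quotient*4."""
    return (x // 28) << 2


def maxn_mask(x: int) -> int:
    """#(x/7)&(~3): divide by 7, then clear the two low bits."""
    return (x // 7) & ~3


def maxn_shift(x: int) -> int:
    """#(x/(7<<2))<<2: divide by 28, then multiply by 4."""
    return (x // (7 << 2)) << 2


def maxn_plain(x: int) -> int:
    """#x/7: the non-equivalent version (result not a multiple of 4)."""
    return x // 7


# Both replacements match the divu/lsl sequence for every possible size,
# because floor(floor(x/7)/4) == floor(x/28).
assert all(maxn_mask(x) == maxn_shift(x) == maxn_divu_lsl(x) for x in range(65537))

for x in (8, 7777):
    print(x, maxn_plain(x), maxn_shift(x))   # 8 1 0, then 7777 1111 1108
```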
12 June 2021, 12:21 | #302 | |
Registered User
Join Date: Mar 2018
Location: Hastings, New Zealand
Posts: 2,544
|
Quote:
Here are a few more interesting timings. First, a straight copy from FastRAM to ChipRAM, which took 46 clock cycles per loop.
Code:
        lea fastram,a0          ; a0 = pointer to fastram
        lea chipram,a1          ; a1 = pointer to chipram
        move.w #1000-1,d5       ; repeat inner loop code 1000 times
; -- inner loop --
loop:   move.l (a0)+,(a1)+      ; copy longword from fastram to chipram
        dbf d5,loop
Code:
loop:   move.l (a0)+,d0         ; read longword from next fastram address
        move.l d0,(a1)+         ; write longword to next chipram address
        dbf d5,loop
Code:
loop:   move.l (a0)+,d0         ; read longword from next fastram address
        move.l d2,d2
        move.l d2,d2
        move.l d2,d2
        move.l d0,(a1)+         ; write longword to next chipram address
        move.l d2,d2
        move.l d2,d2
        move.l d2,d2
        move.l d2,d2
        move.l d2,d2
        move.l d2,d2
        move.l d2,d2
        move.l d2,d2
        move.l d2,d2
        move.l d2,d2
        dbf d5,loop
I don't know where this effect is coming from, but it certainly could be useful. 4.3MB/s may not be so much of a bottleneck if you can combine it with some other processing. Maybe this analysis is a bit off topic, but it shows that when dealing with slow memory it pays to interleave data memory accesses with internal operations. The pi-spigot code has mostly register-to-register instructions and no consecutive data memory accesses in its inner loop, so it (fortunately?) has nothing to gain from this principle.
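Converting cycles per loop into bytes per second is simple arithmetic. Each pass of these loops moves one longword (4 bytes), so throughput = 4 × clock / cycles_per_loop. Below is a minimal Python sketch. The 50 MHz clock in the example is a hypothetical value, because the post does not state the CPU speed.

```python
def copy_throughput(bytes_per_loop: int, cycles_per_loop: float, clock_hz: float) -> float:
    """Bytes per second for a loop that moves bytes_per_loop bytes every cycles_per_loop cycles."""
    return bytes_per_loop * clock_hz / cycles_per_loop


# Hypothetical example: one longword per 46-cycle loop on an assumed 50 MHz CPU.
rate = copy_throughput(4, 46, 50_000_000)
print(f"{rate / 1e6:.2f} MB/s")   # 4.35 MB/s
```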
12 June 2021, 12:41 | #303 | |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,959
|
Quote:
Ext.l is not necessary for this version, because D7 (D5 later) is handled as a word only. Ext.l is only needed for litwr's version of the PR0000 routine, which uses divu.w; for the sub.w version it can be ignored.
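Here is why the state of the high word matters only for the `divu.w` variant. `divu.w` divides the whole 32-bit register, so stray bits in the upper word change the quotient or cause an overflow. A loop of `sub.w` only ever sees the low word. The Python sketch below models both behaviours. The function names are hypothetical and not taken from the thread. The overflow rule is the documented 68000 one: V is set and the register is left unchanged.

```python
def divu_w(dn: int, divisor: int):
    """68000 DIVU.W: unsigned 32-bit register / 16-bit divisor.

    Returns the new register value (remainder in the high word, quotient in
    the low word), or None on overflow, where the CPU sets V and leaves the
    register unchanged.
    """
    q, r = divmod(dn & 0xFFFFFFFF, divisor)
    if q > 0xFFFF:
        return None
    return (r << 16) | q


def count_by_sub_w(dn: int, step: int) -> int:
    """How often `step` fits, using only word-sized subtraction (as PR0000 does)."""
    word = dn & 0xFFFF      # sub.w never looks at the high word
    count = 0
    while word >= step:
        word -= step
        count += 1
    return count


clean = 0x00001234      # 4660, high word cleared
dirty = 0x00011234      # same low word, stray bit in the high word
print(hex(divu_w(clean, 1000)), hex(divu_w(dirty, 1000)))
print(count_by_sub_w(clean, 1000), count_by_sub_w(dirty, 1000))   # 4 4
```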
12 June 2021, 12:45 | #304 | ||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
|
Quote:
Quote:
For the writes, we know why already. Now perhaps it's possible to do better. What about:
Code:
loop    move.l (a0)+,d0
        move.l (a0)+,d1
        move.l (a0)+,d2
        move.l d0,(a1)+
        move.l d1,(a1)+
        move.l d2,(a1)+
        dbf d5,loop
12 June 2021, 14:31 | #305 |
Moderator
Join Date: Dec 2010
Location: Wisconsin USA
Age: 60
Posts: 839
|
@Thread
I've edited the thread title so the 040 and 060 can be included as on-topic here. As you can see, the topic diversity is something to think about when the thread is created. |
18 June 2021, 20:21 | #306 |
Registered User
Join Date: Mar 2016
Location: Ozherele
Posts: 229
|
First, thanks to the people who helped optimize my code. I have just made a commit with some of Don_Adan's suggestions. However, I must point out that I was invited to start this thread by meynaf.
Off-topic removed - BippyM
Last edited by BippyM; 20 June 2021 at 14:57.
19 June 2021, 20:05 | #307 | |
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,408
|
Quote:
How much of that you can do seems to depend at least on the clock speed of the CPU vs the speed of the RAM, and on whether or not the code is in the cache (e.g. on a Chip-RAM-only A1200 the effect can be quite extreme). One thing that may also be worth considering is that some of the instructions involved in the code here write words (or even bytes) to memory. Combining these into a single, bigger write can be a lot faster on at least the 68020/68030 (I think this also goes for the 68040+, but I'm not too sure), especially if the writes are longword aligned. I did some tests on this and found that, for my code and word-based results at least, the change still usually gave a notable speed increase, despite the overhead of needing a register to hold the half-results and some extra code to keep track of them. Note, however, that I also found that the slower the RAM, the bigger the speed increase from doing this. If you have zero-wait-state RAM, the difference will be a lot smaller than when writing to, say, Chip RAM.
Last edited by roondar; 19 June 2021 at 20:14.
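The Python model below sketches the idea of combining two word results into one longword write. The first word is held in a spare "register", and a single 32-bit write is issued when the second word arrives. The model only counts memory writes and checks the big-endian layout a 68k would produce. It says nothing about cycle counts, which depend on the CPU and RAM as described above. The class and its names are hypothetical.

```python
import struct


class WordPacker:
    """Collects 16-bit results and stores them as big-endian 32-bit longwords."""

    def __init__(self) -> None:
        self.memory = bytearray()
        self.writes = 0          # number of memory write operations issued
        self.pending = None      # first half of a longword, kept "in a register"

    def put_word(self, value: int) -> None:
        value &= 0xFFFF
        if self.pending is None:
            self.pending = value                 # no memory access yet
        else:
            self.memory += struct.pack(">I", (self.pending << 16) | value)
            self.writes += 1                     # one longword write for two results
            self.pending = None

    def flush(self) -> None:
        """Store a leftover odd word, if any, with a single word write."""
        if self.pending is not None:
            self.memory += struct.pack(">H", self.pending)
            self.writes += 1
            self.pending = None


packer = WordPacker()
for w in (0x3141, 0x5926, 0x5358):
    packer.put_word(w)
packer.flush()
print(packer.memory.hex(), packer.writes)   # 314159265358 2
```

Three word results cost two writes here instead of three. For long runs of results, the number of writes approaches half.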
19 June 2021, 23:17 | #308 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,959
|
End code reworked, but untested.
Edit: it's buggy for now; the time string must be reversed. Edit 2: perhaps fixed now. It can still be optimised by a few bytes, perhaps using litwr's idea.
Code:
OldOpenLibrary  = -408
CloseLibrary    = -414
Output          = -60
Input           = -54
Write           = -48
Read            = -42
Forbid          = -132
Permit          = -138
AddIntServer    = -168
RemIntServer    = -174
VBlankFrequency = 530
INTB_VERTB      = 5             ;for vblank interrupt
NT_INTERRUPT    = 2             ;node type

;N = 7*D/2 ;D digits, e.g., N = 350 for 100 digits

start
        lea libname(pc),a1      ;open the dos library
        move.l 4.W,a5
        move.l a5,a6
        jsr OldOpenLibrary(a6)
        move.l d0,a6
        jsr Output(a6)          ;get stdout
        lea cout(PC),A4
        move.l d0,(A4)          ;cout
        move.l d0,d1            ;call Write(stdout,buff,size)
        moveq #msg1-cout,D2     ; must be checked if in moveq range, the longest text can be moved at end
        add.l A4,D2
        moveq #msg4-msg1,d3
        jsr Write(a6)
        move.l #((65536-(ra-start))/(7<<2))<<2,D7 ; d7=maxn
.l20
        move.l (A4),D1          ; cout
        moveq #msg4-cout,D2
        add.l A4,D2
        moveq #msg5-msg4,d3
        jsr Write(a6)
        move.l d7,d5
        bsr.w PR0000
        move.l (A4),D1          ; cout
        moveq #msg5-cout,D2
        add.l A4,D2
        moveq #msg3-msg5,d3
        jsr Write(a6)
        bsr.w getnum
        cmp.w d7,d5
        bhi.b .l20
        move.w d5,d1
        beq.b .l20
        addq.w #3,d5
        and.w #$fffc,d5
        cmp.b #10,(a0)
        bne.b .l21
        move.w d5,d6
        cmp.w d1,d5
        beq.b .l7
.l21
        bsr.w PR0000
        move.l (A4),D1          ; cout
        moveq #msg3-cout,D2
        add.l A4,D2
        moveq #msg2-msg3+1,d3
        jsr Write(a6)
.l7
        mulu.w #7,d6            ;kv = d6
        lsr.l #2,D6             ; /4
        move.l d6,d7
        lea ra(pc),a3
        exg a5,a6
        jsr Forbid(a6)
        moveq #INTB_VERTB,d0
        lea VBlankServer(pc),a1
        jsr AddIntServer(a6)
        exg a5,a6
        ;move.w #$4000,$dff096  ;DMA off
        move.l #2000*65537,d0
        move.l a3,a0
.fill
        move.l d0,(a0)+
        subq.l #1,D7
        bne.b .fill
        move.l D7,-(SP)         ; cv
        lea 10000.W,A2
        moveq #4,D3
        moveq #buf-cout,D2
        add.l A4,D2             ; buf
.l0
        moveq #0,D5             ;d <- 0
        move.l d6,d4            ;i <- kv, i <- i*2
        lsl.l #2,D4             ; *4
        adda.l d4,a3
        subq.l #1,d4            ;b <- 2*i-1
        move.l A2,D1
        bra.b .l4
.longdiv
        swap d0
        move.w d0,d7
        divu.w d4,d7
        swap d7
        move.w d7,d0
        swap d0
        divu.w d4,d0
        move.w d0,d7
        exg d0,d7
        clr.w d7
        swap d7
        move.w d7,(a3)          ;r[i] <- d%b
        bra.b .enddiv
.l2
        sub.l d0,d5
        sub.l d7,d5
        lsr.l #1,d5
.l4
        move -(a3),d0           ; r[i]
        mulu.w d1,d0            ;r[i]*10000
        add.l d0,d5             ;d += r[i]*10000
        move.l d5,d0
        divu.w d4,d0
        bvs.s .longdiv
        move.w d0,d7
        clr.w d0
        swap d0
        move.w d0,(a3)          ;r[i] <- d%b
.enddiv
        subq.l #2,d4            ;i <- i - 1
        bcc.b .l2               ;the main loop
        divu.w d1,d5            ;removed with MULU optimization
        add.w (SP),D5           ; cv
        move.l D5,(SP)          ; cv
        bsr.w PR000N
        subq.l #7,d6            ;kv
        bne.b .l0
        addq.l #4,SP            ; restore stack
        move.l time(pc),d5
        ;move.w #$c000,$dff096  ;DMA on
        exg a5,a6
        moveq #INTB_VERTB,d0
        lea VBlankServer(pc),a1
        jsr RemIntServer(a6)
        jsr Permit(a6)
        exg a5,a6
        moveq #1+3+1,D4
        lea string(PC),A3
        moveq #10,D1
        move.b D1,(A3)+         ; newline
        move.l d5,d0
        add.l D5,D5
        cmp.b #50,VBlankFrequency(a5)
        beq.b .l8
        add.l D5,D5             ;60 Hz
        add.l d0,d5
        divu.w #3,d5
        swap d5
        lsr.w #2,d5
        swap d5
        negx.l d5
        neg.l d5
.l8
        moveq #$30,D0
;       move.l d5,d6
;       moveq #0,d7             ; not necessary D7 highword is already cleared
        divu.w d1,d5
        bvc.b .div32no
        swap d5
        move.w d5,d7
        divu.w d1,d7
        swap d7
        move d7,d5
        swap d5
        divu.w d1,d5
.div32no
        move.w d5,d7
        swap d5
        add.b D0,D5
        move.b d5,(a3)+
        divu.w d1,d7
        swap d7
        add.b D0,D7
        move.b d7,(a3)+
        clr.w d7
        swap d7
        move.b #'.',(a3)+       ; dot
.l12
        tst.w d7
        beq .l11
        addq.l #1,D4
        divu.w d1,d7
        swap d7
        add.b D0,D7
        move.b d7,(a3)+
        clr.w d7
        swap d7
        bra .l12
.l11
        move.b #32,(A3)+        ; newline
        move.l A3,D2
        moveq #1,D3
.next
        move.l (A4),D1          ; cout
        subq.l #1,D2
        jsr Write(a6)
        subq.l #1,D4
        bne.b .next
        move.l a6,a1
        move.l a5,a6
        jmp CloseLibrary(a6)

PR0000                          ;prints d5, uses a0,a1(scratch),d0,d1,d2,d3
        moveq #4,D3
        moveq #buf-cout,D2
        add.l A4,D2             ; buf
PR000N
        move.w #$0100,a0
        move.l #$2f3a2f2f,d0
        move.w #1000,d1
.b1000
        add.w a0,d0
        sub.w d1,d5
        bcc.b .b1000
        add.w d1,d5
        moveq #100,d1
.b100
        addq.b #1,d0
        sub.w d1,d5
        bcc.b .b100
        add.w d1,d5
        swap d0
        moveq #10,d1
.b10
        add.w a0,d0
        sub.w d1,d5
        bcc.b .b10
        add.b d5,d0
        move.l D0,4(A4)         ; buf
        move.l (A4),D1          ; cout
        jmp Write(A6)           ;call Write(stdout,buff,size)

rasteri
        addq.l #1,(a1)          ;If you set your interrupt to priority 10 or higher then a0 must point at $dff000 on exit
        moveq #0,d0             ; must set Z flag on exit!
        rts

VBlankServer:
        dc.l 0,0                ;ln_Succ,ln_Pred
        dc.b NT_INTERRUPT,0     ;ln_Type,ln_Pri
        dc.l 0                  ;ln_Name
        dc.l time,rasteri       ;is_Data,is_Code
;msgx   dc.b 32,10
        cnop 0,4
time    dc.l 0
cout    dc.l 0
buf     ds.b 4
; Overwritten code/data start here.
ra
string = msg1
libname dc.b "dos.library",0
msg1    dc.b 'number pi calculator v13',10
msg4    dc.b 'number of digits (up to '
msg5    dc.b ')? '
msg3    dc.b ' digits will be printed'
msg2    dc.b 10,0
        even
getnum
        jsr Input(a6)           ;get stdin
        moveq #msg1-cout,D2
        add.l A4,D2
        move.l d0,d1
        moveq #5,d3             ;+ newline
        jsr Read(a6)
        move.l d2,a0
        moveq #0,d5
.loop
        subq.w #1,d0
        beq.b .done
        move.w #256-'0',d6
        add.b (a0)+,d6
        cmp.w #9,d6
        bhi.b .error
        mulu.w #10,d5
        add.w d6,d5
        bra.b .loop
.error
        moveq #0,d5
.done
        rts

Buffy   ds.b 65536-(Buffy-start)
Last edited by Don_Adan; 20 June 2021 at 14:40.
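The Python model below follows the arithmetic of the main loop (`.l0`/`.l4`/`.l2`). It is a reading of the listing, not something stated in the thread. The loop is a base-10000 spigot: `r[]` starts at 2000 everywhere, the divisor is `b = 2*i-1`, and `.l2` updates `d` to `(d - rem - q) >> 1`. Since `d = q*b + rem`, that value equals `q*(i-1)`, which is how the loop avoids a multiply. Register allocation is not modelled. The `cv` longword on the stack becomes a carried remainder.

```python
def spigot_pi(digits: int) -> str:
    """Model of the base-10000 pi spigot in the listing (4 digits per outer pass).

    `digits` must be a positive multiple of 4, matching the program, which
    rounds the requested count up to a multiple of 4.
    """
    if digits <= 0 or digits % 4:
        raise ValueError("digits must be a positive multiple of 4")
    kv = digits * 7 // 4            # d6: kv = 7*D/4
    r = [2000] * (2 * kv)           # .fill: N = 7*D/2 words of 2000
    carry = 0                       # the 'cv' word kept on the stack
    groups = []
    while kv:                       # .l0 ... subq.l #7,d6 / bne.b .l0
        d = 0
        for i in range(2 * kv, 0, -1):      # move -(a3),d0 walks r[] downwards
            b = 2 * i - 1
            d += r[i - 1] * 10000
            q, rem = divmod(d, b)
            r[i - 1] = rem                  # r[i] <- d%b
            if i > 1:
                # .l2: (d - rem - q) >> 1 == q*(b - 1)/2 == q*(i - 1)
                d = (d - rem - q) >> 1
            else:
                d = q                       # b == 1, so the quotient equals d
        groups.append(f"{carry + d // 10000:04d}")   # divu.w d1,d5 / add.w (SP),D5
        carry = d % 10000
        kv -= 7
    return "".join(groups)


print(spigot_pi(40))
```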
20 June 2021, 14:57 | #309 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,959
|
BTW, perhaps the rasteri counter can be changed too:
Code:
rasteri addq.l #1,(a1)
        moveq #0,d0
        rts
With addq.l #2,(a1), the 50 Hz path needs one instruction less, but the 60 Hz path is the longer one. Maybe there is an increment value that shortens the 60 Hz path and keeps the 50 Hz path short too, but at present I have no idea what it would be.
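The Python model below shows how the end code turns the tick count into hundredths of a second, as far as can be read from the listing after `move.l time(pc),d5`. At 50 Hz the count is doubled. At 60 Hz it is multiplied by 5 and divided by 3. The `lsr.w #2` / `negx.l` / `neg.l` sequence then adds 1 exactly when the remainder is 2, which rounds to nearest. This is a sketch under that reading. With the `addq.l #2,(a1)` variant discussed above, the 50 Hz branch could use the counter directly, because it would already count hundredths.

```python
def divu3_rounded(x: int) -> int:
    """divu.w #3,d5 / swap / lsr.w #2,d5 / swap / negx.l / neg.l.

    lsr.w #2 shifts bit 1 of the remainder into X, so X = 1 only when the
    remainder is 2; negx.l then neg.l turns q into q + X, i.e. x/3 rounded
    to nearest. Assumes the quotient fits in 16 bits, as divu.w requires.
    """
    q, r = divmod(x, 3)
    assert q <= 0xFFFF, "divu.w would overflow here"
    x_flag = (r >> 1) & 1
    return q + x_flag


def ticks_to_centiseconds(ticks: int, vblank_hz: int) -> int:
    """Elapsed time in 1/100 s from the vblank tick counter."""
    if vblank_hz == 50:
        return ticks * 2                       # add.l d5,d5
    return divu3_rounded(ticks * 5)            # (4*ticks + ticks) / 3, rounded


def format_end_text(cs: int) -> str:
    """Seconds with two decimals; like the listing, no leading 0 when < 1 s."""
    whole = cs // 100
    return (str(whole) if whole else "") + f".{cs % 100:02d}"


print(format_end_text(ticks_to_centiseconds(1500, 50)))   # 30.00
print(format_end_text(ticks_to_centiseconds(1500, 60)))   # 25.00
```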
20 June 2021, 17:12 | #310 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,039
|
Unless it's one of those 'it works in this version but it's needed for that version' cases, d5/d7 are only used as words (where this is relevant, up to label .l7):
Code:
;       move.l #((65536-(ra-start))/(7<<2))<<2,D7 ; d7=maxn
        move.w #((65536-(ra-start))/(7<<2))<<2,D7 ; d7=maxn
        ...
;       move.l d7,d5
        move.w d7,d5
Another option to make it shorter is to place these three instructions just before label .longdiv:
Code:
.write  move.l (a4),d1
        add.l a4,d2
        jmp Write(a6)
.longdiv
        ...
4*8-(3*2+1*4)-8 = 14 bytes shorter (four 8-byte Write call sequences are replaced by three 2-byte bsr.b and one 4-byte bsr.w, plus the 8-byte shared routine).
This is all in the setup code, so it doesn't affect speed.
edit: further size reduction...
Last edited by a/b; 20 June 2021 at 18:03.
21 June 2021, 03:45 | #311 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,959
|
More size optimisations from a/b, and I used litwr's idea for the time output too.
Code:
OldOpenLibrary  = -408
CloseLibrary    = -414
Output          = -60
Input           = -54
Write           = -48
Read            = -42
Forbid          = -132
Permit          = -138
AddIntServer    = -168
RemIntServer    = -174
VBlankFrequency = 530
INTB_VERTB      = 5             ;for vblank interrupt
NT_INTERRUPT    = 2             ;node type

;N = 7*D/2 ;D digits, e.g., N = 350 for 100 digits

start
        lea libname(pc),a1      ;open the dos library
        move.l 4.W,a5
        move.l a5,a6
        jsr OldOpenLibrary(a6)
        move.l d0,a6
        jsr Output(a6)          ;get stdout
        lea cout(PC),A4
        move.l d0,(A4)          ;cout
        move.w #((65536-(ra-start))/(7<<2))<<2,D7 ; d7.w=maxn (moved here)
        ;call Write(stdout,buff,size)
        moveq #msg1-cout,D2     ; must be checked if in moveq range, the longest text can be moved at end
        moveq #msg4-msg1,d3
        bsr .write
.l20
        moveq #msg4-cout,D2
        moveq #msg5-msg4,d3
        bsr.b .write
        move.w d7,d5
        bsr.w PR0000
        moveq #msg5-cout,D2
        moveq #msg3-msg5,d3
        bsr.b .write
        bsr.w getnum
        cmp.w d7,d5
        bhi.b .l20
        move.w d5,d1
        beq.b .l20
        addq.w #3,d5
        and.w #$fffc,d5
        cmp.b #10,(a0)
        bne.b .l21
        move.w d5,d6
        cmp.w d1,d5
        beq.b .l7
.l21
        bsr.w PR0000
        moveq #msg3-cout,D2
        moveq #msg2-msg3+1,d3
        bsr.b .write
.l7
        mulu.w #7,d6            ;kv = d6
        lsr.l #2,D6             ; /4
        move.l d6,d7
        lea ra(pc),a3
        exg a5,a6
        jsr Forbid(a6)
        moveq #INTB_VERTB,d0
        lea VBlankServer(pc),a1
        jsr AddIntServer(a6)
        exg a5,a6
        ;move.w #$4000,$dff096  ;DMA off
        move.l #2000*65537,d0
        move.l a3,a0
.fill
        move.l d0,(a0)+
        subq.l #1,D7
        bne.b .fill
        move.l D7,-(SP)         ; cv
        lea 10000.W,A2
        moveq #4,D3
        moveq #buf-cout,D2
        add.l A4,D2             ; buf
.l0
        moveq #0,D5             ;d <- 0
        move.l d6,d4            ;i <- kv, i <- i*2
        lsl.l #2,D4             ; *4
        adda.l d4,a3
        subq.l #1,d4            ;b <- 2*i-1
        move.l A2,D1
        bra.b .l4
.write
        move.l (A4),D1          ; cout
        add.l A4,D2
        jmp Write(a6)
.longdiv
        swap d0
        move.w d0,d7
        divu.w d4,d7
        swap d7
        move.w d7,d0
        swap d0
        divu.w d4,d0
        move.w d0,d7
        exg d0,d7
        clr.w d7
        swap d7
        move.w d7,(a3)          ;r[i] <- d%b
        bra.b .enddiv
.l2
        sub.l d0,d5
        sub.l d7,d5
        lsr.l #1,d5
.l4
        move -(a3),d0           ; r[i]
        mulu.w d1,d0            ;r[i]*10000
        add.l d0,d5             ;d += r[i]*10000
        move.l d5,d0
        divu.w d4,d0
        bvs.s .longdiv
        move.w d0,d7
        clr.w d0
        swap d0
        move.w d0,(a3)          ;r[i] <- d%b
.enddiv
        subq.l #2,d4            ;i <- i - 1
        bcc.b .l2               ;the main loop
        divu.w d1,d5            ;removed with MULU optimization
        add.w (SP),D5           ; cv
        move.l D5,(SP)          ; cv
        bsr.w PR000N
        subq.l #7,d6            ;kv
        bne.b .l0
        addq.l #4,SP            ; restore stack
        move.l time(pc),d5
        ;move.w #$c000,$dff096  ;DMA on
        exg a5,a6
        moveq #INTB_VERTB,d0
        lea VBlankServer(pc),a1
        jsr RemIntServer(a6)
        jsr Permit(a6)
        exg a5,a6
        moveq #1+3+1,D4
        lea string(PC),A3
        move.b #10-$30,(A3)+    ; newline
        move.l d5,d0
        add.l D5,D5
        cmp.b #50,VBlankFrequency(a5)
        beq.b .l8
        add.l D5,D5             ;60 Hz
        add.l d0,d5
        divu.w #3,d5
        swap d5
        lsr.w #2,d5
        swap d5
        negx.l d5
        neg.l d5
.l8
        moveq #10,D1
;       moveq #0,d7             ; not necessary D7 highword is already cleared
        divu.w d1,d5
        bvc.b .div32no
        swap d5
        move.w d5,d7
        divu.w d1,d7
        swap d7
        move d7,d5
        swap d5
        divu.w d1,d5
.div32no
        move.w d5,d7
        swap d5
        move.b d5,(a3)+
        divu.w d1,d7
        swap d7
        move.b d7,(a3)+
        clr.w d7
        swap d7
        move.b #'.'-$30,(a3)+   ; dot
.l12
        tst.w d7
        beq .l11
        addq.l #1,D4
        divu.w d1,d7
        swap d7
        move.b d7,(a3)+
        clr.w d7
        swap d7
        bra .l12
.l11
        move.b #32-$30,(A3)+    ; newline
        moveq #1,D3
.next
        move.l (A4),D1          ; cout
        add.b #$30,-(A3)
        move.l A3,D2
        jsr Write(a6)
        subq.l #1,D4
        bne.b .next
        move.l a6,a1
        move.l a5,a6
        jmp CloseLibrary(a6)

PR0000                          ;prints d5, uses a0,a1(scratch),d0,d1,d2,d3
        moveq #4,D3
        moveq #buf-cout,D2
        add.l A4,D2             ; buf
PR000N
        move.w #$0100,a0
        move.l #$2f3a2f2f,d0
        move.w #1000,d1
.b1000
        add.w a0,d0
        sub.w d1,d5
        bcc.b .b1000
        add.w d1,d5
        moveq #100,d1
.b100
        addq.b #1,d0
        sub.w d1,d5
        bcc.b .b100
        add.w d1,d5
        swap d0
        moveq #10,d1
.b10
        add.w a0,d0
        sub.w d1,d5
        bcc.b .b10
        add.b d5,d0
        move.l D0,4(A4)         ; buf
        move.l (A4),D1          ; cout
        jmp Write(A6)           ;call Write(stdout,buff,size)

rasteri
        addq.l #1,(a1)          ;If you set your interrupt to priority 10 or higher then a0 must point at $dff000 on exit
        moveq #0,d0             ; must set Z flag on exit!
        rts

VBlankServer:
        dc.l 0,0                ;ln_Succ,ln_Pred
        dc.b NT_INTERRUPT,0     ;ln_Type,ln_Pri
        dc.l 0                  ;ln_Name
        dc.l time,rasteri       ;is_Data,is_Code
;msgx   dc.b 32,10
        cnop 0,4
time    dc.l 0
cout    dc.l 0
buf     ds.b 4
; Overwritten code/data start here.
ra
string = msg1
libname dc.b "dos.library",0
msg1    dc.b 'number pi calculator v13',10
msg4    dc.b 'number of digits (up to '
msg5    dc.b ')? '
msg3    dc.b ' digits will be printed'
msg2    dc.b 10,0
        even
getnum
        jsr Input(a6)           ;get stdin
        moveq #msg1-cout,D2
        add.l A4,D2
        move.l d0,d1
        moveq #5,d3             ;+ newline
        jsr Read(a6)
        move.l d2,a0
        moveq #0,d5
.loop
        subq.w #1,d0
        beq.b .done
        move.w #256-'0',d6
        add.b (a0)+,d6
        cmp.w #9,d6
        bhi.b .error
        mulu.w #10,d5
        add.w d6,d5
        bra.b .loop
.error
        moveq #0,d5
.done
        rts

Buffy   ds.b 65536-(Buffy-start)
Last edited by Don_Adan; 21 June 2021 at 20:17.
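The `.longdiv` path in the listing runs when `divu.w` overflows, i.e. when the quotient needs more than 16 bits. It divides the high word first. The remainder from that step, combined with the low word, then goes through a second `divu.w`. Because that remainder is smaller than the divisor, the second quotient always fits in 16 bits. The Python model below checks this two-step 32/16 division against ordinary integer division. The function names are illustrative.

```python
def divu_w(dividend: int, divisor: int):
    """68000 DIVU.W model: (quotient, remainder), or None if the quotient
    does not fit in 16 bits (the CPU sets V and leaves the register alone)."""
    q, r = divmod(dividend & 0xFFFFFFFF, divisor)
    return None if q > 0xFFFF else (q, r)


def divu_32_by_16(dividend: int, divisor: int):
    """32-bit / 16-bit -> 32-bit quotient with two DIVU.W steps, as in .longdiv."""
    res = divu_w(dividend, divisor)
    if res is not None:                              # fast path: bvs not taken
        return res
    hi, lo = dividend >> 16, dividend & 0xFFFF
    q_hi, r_hi = divu_w(hi, divisor)                 # hi < 65536: cannot overflow
    q_lo, r = divu_w((r_hi << 16) | lo, divisor)     # r_hi < divisor keeps q_lo < 65536
    return (q_hi << 16) | q_lo, r


print(divu_32_by_16(0x12345678, 3))
```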
21 June 2021, 09:24 | #312 |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,215
|
Do you really send single bytes through Write()? This is not advisable: Write() makes a context switch for every single call. Please see FPutC/Printf/FPrintf or the related *buffered* calls from dos.library, which are much more efficient for single-character output.
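The cost difference described above can be illustrated with a toy model. This is not the dos.library API. `Sink` is a hypothetical stand-in that only counts how often the expensive underlying call is made. Writing N characters one at a time costs N calls, while a buffered writer calls through only when its buffer fills or is flushed.

```python
class Sink:
    """Stand-in for an expensive output call (e.g. one that may switch context)."""

    def __init__(self) -> None:
        self.calls = 0
        self.data = bytearray()

    def write(self, chunk: bytes) -> None:
        self.calls += 1
        self.data += chunk


class CharBuffer:
    """Collects single characters and passes them on in blocks."""

    def __init__(self, sink: Sink, size: int = 64) -> None:
        self.sink, self.size, self.buf = sink, size, bytearray()

    def putc(self, ch: int) -> None:
        self.buf.append(ch)
        if len(self.buf) == self.size:
            self.flush()

    def flush(self) -> None:
        if self.buf:
            self.sink.write(bytes(self.buf))
            self.buf.clear()


text = b" 123.45\n"
unbuffered = Sink()
for ch in text:
    unbuffered.write(bytes([ch]))       # one call per character

buffered_sink = Sink()
writer = CharBuffer(buffered_sink)
for ch in text:
    writer.putc(ch)
writer.flush()
print(unbuffered.calls, buffered_sink.calls)   # 8 1
```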
21 June 2021, 13:36 | #313 | |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,959
|
Quote:
The code before "; DMA off" and after "; DMA on" is optimised mostly (or only) for size. Single-character writes are not efficient, I know. In my previous version I wanted to use only one Write for the full end text, but this text (the time value) must first be reversed, and I don't see a short enough routine to reverse it. If some Amiga dos.library routine could display text in reverse order, the code could be changed. But I don't think the end-text code would be shorter if another dos.library routine were used.
21 June 2021, 14:10 | #314 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,959
|
Inspired by Thomas Richter; maybe even a few bytes shorter, if it works.
Code:
OldOpenLibrary  = -408
CloseLibrary    = -414
Output          = -60
Input           = -54
Write           = -48
Read            = -42
Forbid          = -132
Permit          = -138
AddIntServer    = -168
RemIntServer    = -174
VBlankFrequency = 530
INTB_VERTB      = 5             ;for vblank interrupt
NT_INTERRUPT    = 2             ;node type

;N = 7*D/2 ;D digits, e.g., N = 350 for 100 digits

start
        lea libname(pc),a1      ;open the dos library
        move.l 4.W,a5
        move.l a5,a6
        jsr OldOpenLibrary(a6)
        move.l d0,a6
        jsr Output(a6)          ;get stdout
        lea cout(PC),A4
        move.l d0,(A4)          ;cout
        move.w #((65536-(ra-start))/(7<<2))<<2,D7 ; d7.w=maxn (moved here)
        ;call Write(stdout,buff,size)
        moveq #msg1-cout,D2     ; must be checked if in moveq range, the longest text can be moved at end
        moveq #msg4-msg1,d3
        bsr .write
.l20
        moveq #msg4-cout,D2
        moveq #msg5-msg4,d3
        bsr.b .write
        move.w d7,d5
        bsr.w PR0000
        moveq #msg5-cout,D2
        moveq #msg3-msg5,d3
        bsr.b .write
        bsr.w getnum
        cmp.w d7,d5
        bhi.b .l20
        move.w d5,d1
        beq.b .l20
        addq.w #3,d5
        and.w #$fffc,d5
        cmp.b #10,(a0)
        bne.b .l21
        move.w d5,d6
        cmp.w d1,d5
        beq.b .l7
.l21
        bsr.w PR0000
        moveq #msg3-cout,D2
        moveq #msg2-msg3+1,d3
        bsr.b .write
.l7
        mulu.w #7,d6            ;kv = d6
        lsr.l #2,D6             ; /4
        move.l d6,d7
        lea ra(pc),a3
        exg a5,a6
        jsr Forbid(a6)
        moveq #INTB_VERTB,d0
        lea VBlankServer(pc),a1
        jsr AddIntServer(a6)
        exg a5,a6
        ;move.w #$4000,$dff096  ;DMA off
        move.l #2000*65537,d0
        move.l a3,a0
.fill
        move.l d0,(a0)+
        subq.l #1,D7
        bne.b .fill
        move.l D7,-(SP)         ; cv
        lea 10000.W,A2
        moveq #4,D3
        moveq #buf-cout,D2
        add.l A4,D2             ; buf
.l0
        moveq #0,D5             ;d <- 0
        move.l d6,d4            ;i <- kv, i <- i*2
        lsl.l #2,D4             ; *4
        adda.l d4,a3
        subq.l #1,d4            ;b <- 2*i-1
        move.l A2,D1
        bra.b .l4
.write
        move.l (A4),D1          ; cout
        add.l A4,D2
        jmp Write(a6)
.longdiv
        swap d0
        move.w d0,d7
        divu.w d4,d7
        swap d7
        move.w d7,d0
        swap d0
        divu.w d4,d0
        move.w d0,d7
        exg d0,d7
        clr.w d7
        swap d7
        move.w d7,(a3)          ;r[i] <- d%b
        bra.b .enddiv
.l2
        sub.l d0,d5
        sub.l d7,d5
        lsr.l #1,d5
.l4
        move -(a3),d0           ; r[i]
        mulu.w d1,d0            ;r[i]*10000
        add.l d0,d5             ;d += r[i]*10000
        move.l d5,d0
        divu.w d4,d0
        bvs.s .longdiv
        move.w d0,d7
        clr.w d0
        swap d0
        move.w d0,(a3)          ;r[i] <- d%b
.enddiv
        subq.l #2,d4            ;i <- i - 1
        bcc.b .l2               ;the main loop
        divu.w d1,d5            ;removed with MULU optimization
        add.w (SP),D5           ; cv
        move.l D5,(SP)          ; cv
        bsr.w PR000N
        subq.l #7,d6            ;kv
        bne.b .l0
        addq.l #4,SP            ; restore stack
        move.l time(pc),d5
        ;move.w #$c000,$dff096  ;DMA on
        exg a5,a6
        moveq #INTB_VERTB,d0
        lea VBlankServer(pc),a1
        jsr RemIntServer(a6)
        jsr Permit(a6)
        exg a5,a6
        moveq #1+3+1,D3
        lea string+8(PC),A3
        moveq #10,D1
        move.b D1,-(A3)         ; newline
        move.l d5,d0
        add.l D5,D5
        cmp.b #50,VBlankFrequency(a5)
        beq.b .l8
        add.l D5,D5             ;60 Hz
        add.l d0,d5
        divu.w #3,d5
        swap d5
        lsr.w #2,d5
        swap d5
        negx.l d5
        neg.l d5
.l8
        moveq #$30,D0
;       moveq #0,d7             ; not necessary D7 highword is already cleared
        divu.w d1,d5
        bvc.b .div32no
        swap d5
        move.w d5,d7
        divu.w d1,d7
        swap d7
        move d7,d5
        swap d5
        divu.w d1,d5
.div32no
        move.w d5,d7
        swap d5
        add.b D0,D5
        move.b d5,-(a3)
        divu.w d1,d7
        swap d7
        add.b D0,D7
        move.b d7,-(a3)
        clr.w d7
        swap d7
        move.b #'.',-(a3)       ; dot
.l12
        tst.w d7
        beq .l11
        addq.l #1,D3
        divu.w d1,d7
        swap d7
        add.b D0,D7
        move.b d7,-(a3)
        clr.w d7
        swap d7
        bra .l12
.l11
        move.b #32,-(A3)        ; space
        move.l (A4),D1          ; cout
        move.l A3,D2
        jsr Write(a6)
        move.l a6,a1
        move.l a5,a6
        jmp CloseLibrary(a6)

PR0000                          ;prints d5, uses a0,a1(scratch),d0,d1,d2,d3
        moveq #4,D3
        moveq #buf-cout,D2
        add.l A4,D2             ; buf
PR000N
        move.w #$0100,a0
        move.l #$2f3a2f2f,d0
        move.w #1000,d1
.b1000
        add.w a0,d0
        sub.w d1,d5
        bcc.b .b1000
        add.w d1,d5
        moveq #100,d1
.b100
        addq.b #1,d0
        sub.w d1,d5
        bcc.b .b100
        add.w d1,d5
        swap d0
        moveq #10,d1
.b10
        add.w a0,d0
        sub.w d1,d5
        bcc.b .b10
        add.b d5,d0
        move.l D0,4(A4)         ; buf
        move.l (A4),D1          ; cout
        jmp Write(A6)           ;call Write(stdout,buff,size)

rasteri
        addq.l #1,(a1)          ;If you set your interrupt to priority 10 or higher then a0 must point at $dff000 on exit
        moveq #0,d0             ; must set Z flag on exit!
        rts

VBlankServer:
        dc.l 0,0                ;ln_Succ,ln_Pred
        dc.b NT_INTERRUPT,0     ;ln_Type,ln_Pri
        dc.l 0                  ;ln_Name
        dc.l time,rasteri       ;is_Data,is_Code
;msgx   dc.b 32,10
        cnop 0,4
time    dc.l 0
cout    dc.l 0
buf     ds.b 4
; Overwritten code/data start here.
ra
string = msg1
libname dc.b "dos.library",0
msg1    dc.b 'number pi calculator v13',10
msg4    dc.b 'number of digits (up to '
msg5    dc.b ')? '
msg3    dc.b ' digits will be printed'
msg2    dc.b 10,0
        even
getnum
        jsr Input(a6)           ;get stdin
        moveq #msg1-cout,D2
        add.l A4,D2
        move.l d0,d1
        moveq #5,d3             ;+ newline
        jsr Read(a6)
        move.l d2,a0
        moveq #0,d5
.loop
        subq.w #1,d0
        beq.b .done
        move.w #256-'0',d6
        add.b (a0)+,d6
        cmp.w #9,d6
        bhi.b .error
        mulu.w #10,d5
        add.w d6,d5
        bra.b .loop
.error
        moveq #0,d5
.done
        rts

Buffy   ds.b 65536-(Buffy-start)
Last edited by Don_Adan; 21 June 2021 at 20:17.
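The version above avoids reversing the time string. It generates the digits from the least significant end and stores them with `-(A3)`, starting at `string+8`, so the finished text already sits in reading order and a single `Write()` is enough. Below is a Python sketch of the same idea. The buffer size and names are illustrative, and the digit order follows the listing: newline, hundredths, tenths, dot, whole seconds, leading space.

```python
def end_text(cs: int, size: int = 16) -> bytes:
    """Build ' <seconds>.<hundredths>\\n' right-to-left, like the -(A3) stores."""
    buf = bytearray(size)
    pos = size                              # A3 starts just past the end

    def push(ch: int) -> None:
        nonlocal pos
        pos -= 1                            # pre-decrement, as in -(A3)
        buf[pos] = ch

    push(10)                                # newline
    cs, digit = divmod(cs, 10)
    push(ord("0") + digit)                  # hundredths
    cs, digit = divmod(cs, 10)
    push(ord("0") + digit)                  # tenths
    push(ord("."))
    while cs:                               # whole seconds, if any
        cs, digit = divmod(cs, 10)
        push(ord("0") + digit)
    push(32)                                # leading space
    return bytes(buf[pos:])                 # one contiguous slice, so one Write()


print(end_text(12345))   # b' 123.45\n'
```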
21 June 2021, 16:19 | #315 |
Registered User
Join Date: Jul 2014
Location: Warsaw/Poland
Posts: 171
|
@Don_Adan
I see the following error message:
Code:
error 2029 in line 33: branch destination out of range
>       bsr.b .write
21 June 2021, 20:17 | #316 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,959
|
OK, thanks. By my manual calculation of the code length it is 2 bytes too long; maybe some other small optimisation will make it fit. For now I have changed this branch to bsr.
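For reference, a short branch (`bsr.b`/`bra.b`) stores an 8-bit signed displacement measured from the branch's address plus 2. The target must therefore lie between 126 bytes before and 129 bytes after the branch instruction. A displacement of 0 is not available, because that encoding selects the 16-bit form, and on 68020+ $FF is reserved for the 32-bit form. Below is a Python helper under those rules. The addresses in the example are made up.

```python
def fits_short_branch(branch_addr: int, target_addr: int) -> bool:
    """True if a Bcc.B/BSR.B at branch_addr can reach target_addr."""
    disp = target_addr - (branch_addr + 2)   # PC points past the opcode word
    if disp in (0, -1):
        return False   # 0 selects the .W form; $FF (-1) selects .L on 68020+
    return -128 <= disp <= 127


print(fits_short_branch(0x100, 0x100 + 128))   # True  (displacement 126)
print(fits_short_branch(0x100, 0x100 + 130))   # False (displacement 128)
```

For word-aligned code and a forward target, this reduces to `target - branch <= 128`, apart from the degenerate zero-displacement case.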
21 June 2021, 20:25 | #317 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,959
|
Perhaps the first .write is now within bsr.b range.
Code:
OldOpenLibrary  = -408
CloseLibrary    = -414
Output          = -60
Input           = -54
Write           = -48
Read            = -42
Forbid          = -132
Permit          = -138
AddIntServer    = -168
RemIntServer    = -174
VBlankFrequency = 530
INTB_VERTB      = 5             ;for vblank interrupt
NT_INTERRUPT    = 2             ;node type

;N = 7*D/2 ;D digits, e.g., N = 350 for 100 digits

start
        lea libname(pc),a1      ;open the dos library
        move.l 4.W,a5
        move.l a5,a6
        jsr OldOpenLibrary(a6)
        move.l d0,a6
        jsr Output(a6)          ;get stdout
        lea cout(PC),A4
        move.l d0,(A4)          ;cout
        move.w #((65536-(ra-start))/(7<<2))<<2,D7 ; d7.w=maxn (moved here)
        ;call Write(stdout,buff,size)
        moveq #-4,D4
        moveq #msg1-cout,D2     ; must be checked if in moveq range, the longest text can be moved at end
        moveq #msg4-msg1,d3
        bsr.b .write
.l20
        moveq #msg4-cout,D2
        moveq #msg5-msg4,d3
        bsr.b .write
        move.w d7,d5
        bsr.w PR0000
        moveq #msg5-cout,D2
        moveq #msg3-msg5,d3
        bsr.b .write
        bsr.w getnum
        cmp.w d7,d5
        bhi.b .l20
        move.w d5,d1
        beq.b .l20
        addq.w #3,d5
        and.w D4,d5
        cmp.b #10,(a0)
        bne.b .l21
        move.w d5,d6
        cmp.w d1,d5
        beq.b .l7
.l21
        bsr.w PR0000
        moveq #msg3-cout,D2
        moveq #msg2-msg3,d3
        bsr.b .write
.l7
        mulu.w #7,d6            ;kv = d6
        lsr.l #2,D6             ; /4
        move.l d6,d7
        lea ra(pc),a3
        exg a5,a6
        jsr Forbid(a6)
        moveq #INTB_VERTB,d0
        lea VBlankServer(pc),a1
        jsr AddIntServer(a6)
        exg a5,a6
        ;move.w #$4000,$dff096  ;DMA off
        move.l #2000*65537,d0
        move.l a3,a0
.fill
        move.l d0,(a0)+
        subq.l #1,D7
        bne.b .fill
        move.l D7,-(SP)         ; cv
        lea 10000.W,A2
        moveq #4,D3
        moveq #buf-cout,D2
        add.l A4,D2             ; buf
.l0
        moveq #0,D5             ;d <- 0
        move.l d6,d4            ;i <- kv, i <- i*2
        lsl.l #2,D4             ; *4
        adda.l d4,a3
        subq.l #1,d4            ;b <- 2*i-1
        move.l A2,D1
        bra.b .l4
.write
        move.l (A4),D1          ; cout
        add.l A4,D2
        jmp Write(a6)
.longdiv
        swap d0
        move.w d0,d7
        divu.w d4,d7
        swap d7
        move.w d7,d0
        swap d0
        divu.w d4,d0
        move.w d0,d7
        exg d0,d7
        clr.w d7
        swap d7
        move.w d7,(a3)          ;r[i] <- d%b
        bra.b .enddiv
.l2
        sub.l d0,d5
        sub.l d7,d5
        lsr.l #1,d5
.l4
        move -(a3),d0           ; r[i]
        mulu.w d1,d0            ;r[i]*10000
        add.l d0,d5             ;d += r[i]*10000
        move.l d5,d0
        divu.w d4,d0
        bvs.s .longdiv
        move.w d0,d7
        clr.w d0
        swap d0
        move.w d0,(a3)          ;r[i] <- d%b
.enddiv
        subq.l #2,d4            ;i <- i - 1
        bcc.b .l2               ;the main loop
        divu.w d1,d5            ;removed with MULU optimization
        add.w (SP),D5           ; cv
        move.l D5,(SP)          ; cv
        bsr.w PR000N
        subq.l #7,d6            ;kv
        bne.b .l0
        addq.l #4,SP            ; restore stack
        move.l time(pc),d5
        ;move.w #$c000,$dff096  ;DMA on
        exg a5,a6
        moveq #INTB_VERTB,d0
        lea VBlankServer(pc),a1
        jsr RemIntServer(a6)
        jsr Permit(a6)
        exg a5,a6
        moveq #1+3+1,D3
        lea string+8(PC),A3
        moveq #10,D1
        move.b D1,-(A3)         ; newline
        move.l d5,d0
        add.l D5,D5
        cmp.b #50,VBlankFrequency(a5)
        beq.b .l8
        add.l D5,D5             ;60 Hz
        add.l d0,d5
        divu.w #3,d5
        swap d5
        lsr.w #2,d5
        swap d5
        negx.l d5
        neg.l d5
.l8
        moveq #$30,D0
;       moveq #0,d7             ; not necessary D7 highword is already cleared
        divu.w d1,d5
        bvc.b .div32no
        swap d5
        move.w d5,d7
        divu.w d1,d7
        swap d7
        move d7,d5
        swap d5
        divu.w d1,d5
.div32no
        move.w d5,d7
        swap d5
        add.b D0,D5
        move.b d5,-(a3)
        divu.w d1,d7
        swap d7
        add.b D0,D7
        move.b d7,-(a3)
        clr.w d7
        swap d7
        move.b #'.',-(a3)       ; dot
.l12
        tst.w d7
        beq .l11
        addq.l #1,D3
        divu.w d1,d7
        swap d7
        add.b D0,D7
        move.b d7,-(a3)
        clr.w d7
        swap d7
        bra .l12
.l11
        move.b #32,-(A3)        ; space
        move.l (A4),D1          ; cout
        move.l A3,D2
        jsr Write(a6)
        move.l a6,a1
        move.l a5,a6
        jmp CloseLibrary(a6)

PR0000                          ;prints d5, uses a0,a1(scratch),d0,d1,d2,d3
        moveq #4,D3
        moveq #buf-cout,D2
        add.l A4,D2             ; buf
PR000N
        move.w #$0100,a0
        move.l #$2f3a2f2f,d0
        move.w #1000,d1
.b1000
        add.w a0,d0
        sub.w d1,d5
        bcc.b .b1000
        add.w d1,d5
        moveq #100,d1
.b100
        addq.b #1,d0
        sub.w d1,d5
        bcc.b .b100
        add.w d1,d5
        swap d0
        moveq #10,d1
.b10
        add.w a0,d0
        sub.w d1,d5
        bcc.b .b10
        add.b d5,d0
        move.l D0,buf-cout(A4)  ; buf
        move.l (A4),D1          ; cout
        jmp Write(A6)           ;call Write(stdout,buff,size)

rasteri
        addq.l #1,(a1)          ;If you set your interrupt to priority 10 or higher then a0 must point at $dff000 on exit
        moveq #0,d0             ; must set Z flag on exit!
        rts

VBlankServer:
        dc.l 0,0                ;ln_Succ,ln_Pred
        dc.b NT_INTERRUPT,0     ;ln_Type,ln_Pri
        dc.l 0                  ;ln_Name
        dc.l time,rasteri       ;is_Data,is_Code
        cnop 0,4
cout    dc.l 0
buf     ds.b 4
time    dc.l 0
; Overwritten code/data start here.
ra
string = msg1
libname dc.b "dos.library",0
msg1    dc.b 'number pi calculator v13',10
msg4    dc.b 'number of digits (up to '
msg5    dc.b ')? '
msg3    dc.b ' digits will be printed',10
msg2
        even
getnum
        jsr Input(a6)           ;get stdin
        moveq #msg1-cout,D2
        add.l A4,D2
        move.l d0,d1
        moveq #5,d3             ;+ newline
        jsr Read(a6)
        move.l d2,a0
        moveq #0,d5
.loop
        subq.w #1,d0
        beq.b .done
        move.w #256-'0',d6
        add.b (a0)+,d6
        cmp.w #9,d6
        bhi.b .error
        mulu.w #10,d5
        add.w d6,d5
        bra.b .loop
.error
        moveq #0,d5
.done
        rts

Buffy   ds.b 65536-(Buffy-start)
Last edited by Don_Adan; 22 June 2021 at 00:26.
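`PR000N` builds four ASCII digits in one register without any division. `d0` starts as `$2f3a2f2f`, i.e. the bytes '/', ':', '/', '/', where '/' is '0'-1 and ':' is '0'+10. Each subtraction loop bumps one byte, and the bump comes before the test, so each byte starts one below '0'. For the units digit, the leftover negative `d5` is added to ':'. The Python model below follows that register juggling. It is a reading of the listing, checked against ordinary formatting for every value from 0 to 9999.

```python
def pr000n(value: int) -> bytes:
    """Model of PR000N: d5 in 0..9999 -> four ASCII digits, no division."""
    d0 = 0x2F3A2F2F
    d5 = value & 0xFFFF

    def bump_byte1(reg: int) -> int:      # add.w a0,d0 with a0 = $0100
        return (reg & 0xFFFF0000) | ((reg + 0x0100) & 0xFFFF)

    def bump_byte0(reg: int) -> int:      # addq.b #1,d0
        return (reg & 0xFFFFFF00) | ((reg + 1) & 0xFF)

    def count(reg: int, d5: int, step: int, bump):
        """bump; sub.w step,d5; bcc: loop until the word subtraction borrows."""
        while True:
            reg = bump(reg)
            borrow = d5 < step
            d5 = (d5 - step) & 0xFFFF
            if borrow:
                return reg, d5

    d0, d5 = count(d0, d5, 1000, bump_byte1)      # .b1000: thousands in byte 1
    d5 = (d5 + 1000) & 0xFFFF                     # add.w d1,d5 undoes the last sub
    d0, d5 = count(d0, d5, 100, bump_byte0)       # .b100: hundreds in byte 0
    d5 = (d5 + 100) & 0xFFFF
    d0 = ((d0 << 16) | (d0 >> 16)) & 0xFFFFFFFF   # swap d0
    d0, d5 = count(d0, d5, 10, bump_byte1)        # .b10: tens in the new byte 1
    # d5 is now units - 10 (mod 2^16), and ':' + units - 10 == '0' + units
    d0 = (d0 & 0xFFFFFF00) | ((d0 + d5) & 0xFF)   # add.b d5,d0
    return d0.to_bytes(4, "big")                  # move.l d0,buf: big-endian store


print(pr000n(1234))   # b'1234'
```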
21 June 2021, 21:41 | #318 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,039
|
-2 bytes due to alignment (the extra zero at the end is never used):
Code:
;       moveq #msg2-msg3+1,d3
        moveq #msg2-msg3,d3
        ...
;msg3   dc.b ' digits will be printed'
;msg2   dc.b 10,0
msg3    dc.b ' digits will be printed',10
msg2
        even
Code:
        IFGT .write-*-128
        bsr.w .write
        ELSE
        bsr.b .write
        ENDC ; IFGT
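The 2-byte saving comes from `even`. Below is a quick Python check of the byte counts. It assumes the string block starts at an even address, which holds because it follows `buf`, placed after `cnop 0,4`. In the old layout the strings from `libname` onward add up to an odd length, so `even` pads one byte. Moving the newline into `msg3` and dropping the unused terminating 0 makes the total even, so no padding is needed.

```python
def padded(length: int) -> int:
    """Size after an `even` directive, for a block starting at an even address."""
    return length + (length & 1)


common = [
    b"dos.library\x00",                  # libname
    b"number pi calculator v13\n",       # msg1
    b"number of digits (up to ",         # msg4
    b")? ",                              # msg5
]
old = common + [b" digits will be printed", b"\n\x00"]   # msg3, msg2 dc.b 10,0
new = common + [b" digits will be printed\n"]            # msg3 with the 10; msg2 is just a label

old_len = sum(map(len, old))
new_len = sum(map(len, new))
print(old_len, padded(old_len), new_len, padded(new_len))   # 89 90 88 88
```

The number of bytes written for that message is unchanged: `msg2-msg3` in the new layout equals `msg2-msg3+1` in the old one (24 bytes).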
21 June 2021, 23:39 | #319 | ||
Registered User
Join Date: Mar 2018
Location: Hastings, New Zealand
Posts: 2,544
|
Quote:
But I see my work isn't done. I will test your latest version tonight. Quote:
|
||
22 June 2021, 00:28 | #320 | |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,959
|
Quote:
|
|