19 May 2021, 01:08 | #121 |
Registered User
Join Date: Mar 2018
Location: Hastings, New Zealand
Posts: 2,543
19 May 2021, 01:50 | #122 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,957
And of course you can replace your strange lsl.l D5 with add.l D5,D5.
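Why the two are interchangeable: assume lsl.l D5 is read as the one-bit form lsl.l #1,D5. Shifting a 32-bit register left by one gives the same result as adding it to itself, and both put the old bit 31 into C (and X). The minimal Python model below covers only the result and the carry; the function names are ours. V is not modelled because it can differ: LSL always clears V, while ADD sets it on signed overflow. The usual 68000 timing tables also list add.l Dn,Dn at 8 cycles against 10 for a one-bit lsl.l, which is presumably why the add form is preferred.

```python
MASK32 = 0xFFFFFFFF


def lsl_l_1(d):
    """Model of lsl.l #1,Dn: returns (result, carry); carry is the bit shifted out of bit 31."""
    d &= MASK32
    return (d << 1) & MASK32, (d >> 31) & 1


def add_l_self(d):
    """Model of add.l Dn,Dn: returns (result, carry); carry is the unsigned carry out of bit 31."""
    d &= MASK32
    total = d + d
    return total & MASK32, total >> 32


# Edge cases (sign bit, all ones) plus an ordinary value.
for value in (0, 1, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF, 0x12345678):
    assert lsl_l_1(value) == add_l_self(value)
print("lsl.l #1 and add.l Dn,Dn agree on result and carry")
```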
19 May 2021, 03:09 | #123 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,957
There was at least one bug in the previous version. Here is a fixed version, I hope.
Code:
        clr.l   -(SP)           ; cv
.l0
        ....
        add.w   (SP),D5         ; cv
        move.l  D5,(SP)         ; cv
        bsr     PR0000
        endif
        sub.w   #14,d6          ; kv
        bne     .l0
        addq.l  #4,SP           ; restore stack
        .....

PR0000                          ; prints d5, uses a0,a1(scratch),d0,d1,d2,d3
        lea     $100.W,A0
        move.l  #$303A3030,D2
        move.w  #1000,D3
b1000   sub.w   D3,D5
        bcs.b   n100
        add.w   A0,D2
        bra.b   b1000
n100    add.w   D3,D5
        moveq   #100,D3
b100    sub.w   D3,D5
        bcs.b   n10
        addq.b  #1,D2
        bra.b   b100
n10     add.w   D3,D5
        swap    D2
        moveq   #10,D3
b10     sub.w   D3,D5
        bcs.b   n1
        add.w   A0,D2
        bra.b   b10
n1      add.b   D5,D2
        lea     cout(PC),A0
        move.l  (A0)+,D1
        move.l  D2,(A0)
        move.l  A0,D2           ; buf
        moveq   #4,D3
        jmp     Write(A6)       ; call Write(stdout,buff,size)

time    dc.l    0
cout    dc.l    0
buf     dc.l    0
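Here is a Python model of the digit arithmetic in PR0000. It is a sketch for checking the register tricks, not a cycle-accurate emulation. It assumes D5 holds 0..9999 on entry (the routine always writes exactly 4 bytes) and ignores the Write call. D2 starts as the ASCII bytes '0', ':', '0', '0'. The thousands and hundreds digits are counted in the low word, the halves are swapped, and the tens digit is counted in the new low word. The ones loop never adds the final 10 back, so D5 ends up as r - 10. Adding that to ':' (which is '0' + 10) leaves '0' + r. The function name is ours.

```python
def pr0000_bytes(value):
    """Return the 4 ASCII bytes PR0000 would write for 0 <= value <= 9999."""
    if not 0 <= value <= 9999:
        raise ValueError("PR0000 prints exactly four digits")
    d2 = 0x303A3030      # move.l #$303A3030,D2 -> '0' ':' '0' '0'
    d5 = value

    # b1000 loop: add.w A0,D2 with A0 = $100 bumps bits 8-15 of the low word.
    while d5 >= 1000:
        d5 -= 1000
        d2 += 0x100
    # b100 loop: addq.b #1,D2 bumps the low byte.
    while d5 >= 100:
        d5 -= 100
        d2 += 0x1
    # swap D2: exchange the two 16-bit halves.
    d2 = ((d2 & 0xFFFF) << 16) | (d2 >> 16)
    # b10 loop: add.w A0,D2 bumps bits 8-15 of the new low word.
    while d5 >= 10:
        d5 -= 10
        d2 += 0x100
    # The real loop exits with D5 = r - 10 (the last subtraction is not undone),
    # so add.b D5,D2 turns the ':' byte into ':' + (r - 10) = '0' + r.
    low_byte = ((d2 & 0xFF) + (d5 - 10)) & 0xFF
    d2 = (d2 & 0xFFFFFF00) | low_byte
    # move.l D2,(A0): the 68000 stores longs big-endian.
    return d2.to_bytes(4, "big")


print(pr0000_bytes(1234))  # b'1234'
```

Skipping the restoring add in the last loop is what saves the n1 branch target an extra instruction; the ':' seed byte compensates for it.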
19 May 2021, 16:57 | #124 |
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 787
Quote:
On one specific Amiga CD32, my custom NVRAM routines do not seem to work properly. The owner can't run tests for me at the moment. The details are in this post and in the thread it belongs to (you already know it).
19 May 2021, 20:29 | #125 |
Registered User
Join Date: Mar 2016
Location: Ozherele
Posts: 229
Quote:
I may use an optimization similar to this one, but the gain is very small, so I am hesitant. Quote:
Moreover, the more digits of pi we compute, the less we gain from your optimization. Let us examine the 1000-digit case. The main loop runs about 400,000 times, while the output loop runs exactly 250 times (four digits per pass). Each main-loop iteration takes more than 100 cycles. So your optimization gives us less than (50*250)/(400000*100) = 0.03125%, i.e. approx. 0.03%. This value is also undetectable. We would need to save at least 1000 cycles per pass of the output loop to get a detectable gain. Quote:
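The estimate is easy to reproduce. Below is a small Python sketch using the post's figures. The 250 output passes assume four digits per pass (1000 / 4). relative_gain is just our name for the ratio of saved cycles to total main-loop cycles.

```python
def relative_gain(saved_per_pass, passes, main_iterations, cycles_per_iteration):
    """Share of the main-loop time saved by an output-loop optimization."""
    return (saved_per_pass * passes) / (main_iterations * cycles_per_iteration)


DIGITS = 1000
PASSES = DIGITS // 4            # four digits printed per output pass -> 250
MAIN_ITERATIONS = 400_000       # "about 400,000" main-loop iterations
CYCLES_PER_ITERATION = 100      # "more than 100" cycles, so the result is an upper bound

gain_50 = relative_gain(50, PASSES, MAIN_ITERATIONS, CYCLES_PER_ITERATION)
gain_1000 = relative_gain(1000, PASSES, MAIN_ITERATIONS, CYCLES_PER_ITERATION)
print(f"50 cycles/pass:   {gain_50:.4%}")
print(f"1000 cycles/pass: {gain_1000:.3%}")
```

Because the real per-iteration cost is above 100 cycles, both figures overstate the gain slightly.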
The attached screenshot from the manual (available at http://wpage.unina.it/rcanonic/didat...docs/68000.pdf) confirms my point. The same information is available in another manual (https://web.njit.edu/~rosensta/class...tware/code.pdf). The official manual (https://www.nxp.com/docs/en/referenc.../M68000PRM.pdf) also confirms my point. So I think britelite is not correct. Quote:
Skillgrid looks perfect to me. I would like to help, but right now it is practically impossible. Quote:
Thank you. Xlife-8 is a simplified port of Xlife (http://litwr2.atspace.eu/xlife.php). However, it has more colorful graphics and is less scientific. I also plan to port Xlife to the Amiga, but first I must convert the X Window calls to Amiga function calls. Of course, Xlife-8 is GoL (Conway's Game of Life), but it also lets you define new rules. Quote:
It is the briefest. Last edited by BippyM; 01 June 2021 at 18:24.
19 May 2021, 21:01 | #126 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,957
Really? You are funny. Only 50 cycles faster? How did you calculate that? There are fewer divu.w, bsr.b and rts instructions, and less of some other code too. And a dozen bytes larger? How did you calculate that? Again, this is a full optimisation, not a partial one.
Now the cv handling takes only 8 bytes in total. Your old version, still available on GitHub:
        clr     cv              ; 6 bytes
        add     cv(pc),d5       ; 4 bytes
        swap    d5              ; 2 bytes
        move    d5,cv           ; 6 bytes
        clr     d5              ; 2 bytes
        swap    d5              ; 2 bytes
cv      dc.w    0               ; 2 bytes
That is 24 bytes vs 8 bytes. Your PR0000 has 54 bytes; my version of PR0000 has 66 bytes. In total, that is 78 bytes (yours) vs 74 bytes (mine). Where is your code that is a DOZEN bytes larger? Were you too lazy to check? Or do you still need to learn more about 68k coding?
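The byte totals can be rechecked with a simple tally. The old-version sizes are the ones quoted in the post. The 8 bytes of the new version are assumed to be the four one-word stack instructions from the fixed loop: clr.l -(SP), add.w (SP),D5, move.l D5,(SP) and addq.l #4,SP.

```python
# Old cv handling, sizes as quoted in the post.
old_cv_bytes = [
    ("clr cv", 6),
    ("add cv(pc),d5", 4),
    ("swap d5", 2),
    ("move d5,cv", 6),
    ("clr d5", 2),
    ("swap d5", 2),
    ("cv dc.w 0", 2),
]
# Assumed: new cv handling = the four single-word stack instructions.
new_cv_bytes = [
    ("clr.l -(SP)", 2),
    ("add.w (SP),D5", 2),
    ("move.l D5,(SP)", 2),
    ("addq.l #4,SP", 2),
]
old_pr0000, new_pr0000 = 54, 66   # PR0000 sizes from the post

old_cv = sum(size for _, size in old_cv_bytes)
new_cv = sum(size for _, size in new_cv_bytes)
old_total = old_cv + old_pr0000
new_total = new_cv + new_pr0000
print(f"cv: {old_cv} vs {new_cv} bytes, total: {old_total} vs {new_total} bytes")
```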
19 May 2021, 21:25 | #127 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,957
By the way, you can save 2 more bytes.
Code:
        move.l  #start+$10000-ra,d0
        divu    #7,d0
        ext.l   d0
        and.b   #$fc,d0
        move.l  d0,d7           ; d7=maxn
Code:
        move.l  #start+$10000-ra,d7
        divu.w  #7,d7
        ext.l   d7
        and.b   #$fc,d7         ; d7=maxn
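Both listings compute the same value. The second simply works in d7 directly, which drops the 2-byte move.l d0,d7. Below is a Python sketch of what the three instructions do, assuming x stands for the assembled constant start+$10000-ra. DIVU.W leaves the remainder in the upper word and the quotient in the lower word. ext.l then discards the remainder by sign-extending the quotient, and and.b #$fc rounds the quotient down to a multiple of 4. The helper names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def divu_w(dn, divisor):
    """DIVU.W: 32-bit Dn / 16-bit divisor -> remainder in bits 31-16, quotient in bits 15-0."""
    quotient, remainder = divmod(dn & MASK32, divisor)
    if quotient > 0xFFFF:
        raise OverflowError("on a real 68000 DIVU sets V and leaves Dn unchanged")
    return (remainder << 16) | quotient


def ext_l(dn):
    """EXT.L: sign-extend the low word to 32 bits, discarding the upper word."""
    low = dn & 0xFFFF
    return (low | 0xFFFF0000) if low & 0x8000 else low


def and_b(dn, imm):
    """AND.B #imm,Dn: only bits 7-0 are affected."""
    return (dn & 0xFFFFFF00) | (dn & imm & 0xFF)


def maxn(x):
    """divu.w #7,d7 / ext.l d7 / and.b #$fc,d7, with x = start+$10000-ra (assumed)."""
    return and_b(ext_l(divu_w(x, 7)), 0xFC)


print(hex(divu_w(100, 7)), maxn(100))  # 0x2000e 12
```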
19 May 2021, 21:49 | #128 |
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 787
Quote:
Pages 4-113 through 4-115 of the very same manual you linked to describe the LSL and LSR instructions. The assembler syntax they provide is:
LSd Dx,Dy
LSd #<data>,Dy
LSd <ea>
Then, regarding the instructions working on registers (the first two syntax forms), it says: "The shift count for the shifting of a register is specified in two different ways: ... The size of the operation for register destinations may be specified as byte, word, or long." That's followed by an explanation of the last syntax form: "The contents of memory, <ea>, can be shifted one bit only, and the operand size is restricted to a word." Note that: (1) this last explanation can't refer to the first two forms, because they have a variable count, not a fixed one; (2) <ea> is explicitly said to refer to memory. Further on, where the instruction formats are given, the manual has two sections: REGISTER SHIFTS and MEMORY SHIFTS. The first section deals with the first two syntax forms and does not mention <ea>; the second deals with the third form and, unsurprisingly, mentions <ea> and shows its encodings, which exclude registers (there's even an explicit note that says "Only memory alterable addressing modes can be used"). In short: shift instructions (as well as rotate instructions; check out the description of ROd) applied to registers require a count, and thus lsr.l d5 is not syntactically correct.
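To make the two instruction-format sections concrete, here is a small Python encoder. It is sketched from the bit layouts in the M68000 Programmer's Reference Manual as we read them, and the function names are ours. The register forms carry a size field and a count (immediate 1 to 8, or taken from Dx). The memory form has no count field, a fixed word size, and an <ea> that must be memory alterable. A bare lsl.l d5 therefore has no encoding of its own, so an assembler that accepts it presumably emits lsl.l #1,d5, which is $E38D.

```python
SIZE_BITS = {"b": 0b00, "w": 0b01, "l": 0b10}
LSL_TYPE = 0b01   # shift type field: 00 AS, 01 LS, 10 ROX, 11 RO
LEFT = 1          # dr bit: 1 = shift left


def lsl_imm(count, size, dy):
    """LSL.<size> #count,Dy: register shift with an immediate count of 1..8."""
    if not 1 <= count <= 8:
        raise ValueError("immediate count must be 1..8")
    count_field = count & 0b111   # a count of 8 is encoded as 0
    return ((0b1110 << 12) | (count_field << 9) | (LEFT << 8)
            | (SIZE_BITS[size] << 6) | (0 << 5) | (LSL_TYPE << 3) | dy)


def lsl_reg(dx, size, dy):
    """LSL.<size> Dx,Dy: register shift, count taken from Dx (i/r bit = 1)."""
    return ((0b1110 << 12) | (dx << 9) | (LEFT << 8)
            | (SIZE_BITS[size] << 6) | (1 << 5) | (LSL_TYPE << 3) | dy)


def lsl_mem(mode, reg):
    """LSL <ea>: memory shift by exactly one bit; word size only, no count field."""
    # Dn (0), An (1), and mode 7 with reg >= 2 (PC-relative, immediate) are not
    # memory alterable, so there is no way to name a data register here.
    if mode in (0, 1) or (mode == 7 and reg >= 2):
        raise ValueError("only memory alterable addressing modes can be used")
    return ((0b1110 << 12) | (LSL_TYPE << 9) | (LEFT << 8) | (0b11 << 6)
            | (mode << 3) | reg)


print(hex(lsl_imm(1, "l", 5)), hex(lsl_mem(2, 0)))  # 0xe38d 0xe3d0
```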
19 May 2021, 21:57 | #129 |
Registered User
Join Date: Mar 2016
Location: Ozherele
Posts: 229
Quote:
We have the same sequence of instructions in both programs. The only difference is the position of a branch target label. Code:
        bra     .enddiv
        align   2               ; this is used only in pi-align
.l2                             ; aligned in pi-align, and not aligned in pi-na
        ...
        bcc     .l2             ; this branch target is aligned in pi-align and not aligned in pi-na
Quote:
I removed CLR CV long ago but only committed the change a few minutes ago. Thank you again. Thank you very much! Your changes are committed. EDIT: Both versions (68000 and 68020/30) can now show up to 9280 digits! Last edited by litwr; 19 May 2021 at 22:51.
19 May 2021, 22:36 | #130 |
Registered User
Join Date: Mar 2016
Location: Ozherele
Posts: 229
Quote:
Of course, the contents of memory, <ea>, can be shifted one bit only. But this doesn't mean that <ea> refers only to the contents of memory in the assembler syntax headings. I posted a screenshot that proves that <ea> may be used to refer to a register. In the Instruction Format Summary chapter they don't show the exact instruction syntax, only the encoding, so it is irrelevant to assembly language syntax. The claim "LSL.L D5 is not syntactically correct" is also controversial, because this syntax is accepted by VASM. Bruce Abbott confirmed that it is not syntactically correct in several popular assemblers, but we can't generalize from this. There is always some diversity among assemblers. For example, there are several popular syntaxes for the x86.
19 May 2021, 22:57 | #131 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,957
Quote:
68000 timings:
        divu    #1000,d5        ; 144 cycles
        bsr     .l0             ; 18 cycles
        clr     d5              ; 4 cycles
        swap    d5              ; 4 cycles
        ....
.l0     eori.b  #'0',d5         ; 8 cycles
        move.b  d5,(a0)+        ; 8 cycles
        rts                     ; 16 cycles
In total, 202 cycles for one divu.w.
b1000   sub.w   D3,D5           ; 4 cycles
        bcs.b   n100            ; 8 cycles, 10 if taken
        add.w   A0,D2           ; 4 cycles
        bra.b   b1000           ; 10 cycles
Best case 14 cycles (digit 0), worst case 9*26+14 = 248 cycles (digit 9), average (14+248)/2 = 131 cycles. 202-131 = 71 cycles. divu.w is called 3 times, so that is 3*71, about 210 cycles less per call of the PR0000 routine, plus much faster code for cv handling. In total, more than 210 cycles faster per call.
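The arithmetic can be redone from the per-instruction figures listed in the post. The Python tally below assumes every digit 0 to 9 is equally likely. As in the post, it counts only the four loop instructions shown and leaves out the add.w D3,D5 restore. The 144-cycle DIVU figure is the post's own, and the constant names are ours. With these inputs, one loop iteration costs 26 cycles and the exit costs 14, so the worst case (digit 9) comes to 248 cycles.

```python
# Cycle counts as listed in the post for the divu.w-based digit path.
divu_path = {
    "divu #1000,d5": 144, "bsr .l0": 18, "clr d5": 4, "swap d5": 4,
    "eori.b #'0',d5": 8, "move.b d5,(a0)+": 8, "rts": 16,
}
SUB_W, BCS_NOT_TAKEN, BCS_TAKEN, ADD_W, BRA_B = 4, 8, 10, 4, 10


def subtract_loop_cycles(digit):
    """Cycles spent in a b1000-style loop that produces `digit` (0..9)."""
    per_iteration = SUB_W + BCS_NOT_TAKEN + ADD_W + BRA_B   # subtraction succeeded
    exit_cost = SUB_W + BCS_TAKEN                            # final borrow, branch out
    return digit * per_iteration + exit_cost


divu_cycles = sum(divu_path.values())
best, worst = subtract_loop_cycles(0), subtract_loop_cycles(9)
average = sum(subtract_loop_cycles(d) for d in range(10)) / 10
saving_per_digit = divu_cycles - average
print(divu_cycles, best, worst, average, saving_per_digit)
```

For evenly distributed digits, the average equals the midpoint of best and worst case because the cost is linear in the digit.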
19 May 2021, 23:27 | #132 |
old bearded fool
Join Date: Jan 2010
Location: Bangkok
Age: 56
Posts: 775
Quote:
Try using 'cnop 0,4' instead to align to the next longword address.
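With an offset of 0, cnop 0,4 pads up to the next address that is a multiple of 4. For word-aligned 68000 code, that means it inserts either 0 or 2 bytes before the label. Here is a tiny Python sketch of that padding rule. The function name is ours, and the sketch does not model what the assembler fills the gap with.

```python
def cnop0_padding(address, alignment=4):
    """Bytes 'cnop 0,alignment' inserts at `address`: distance to the next multiple."""
    return -address % alignment


for label_address in (0x1000, 0x1002):
    pad = cnop0_padding(label_address)
    print(hex(label_address), "->", hex(label_address + pad), f"({pad} bytes of padding)")
```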
19 May 2021, 23:42 | #133 |
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 787
Quote:
Now, there is nothing controversial here: the pages of Motorola's official manual clearly define the syntax of the instructions, how they work and how they are encoded. You simply fail to understand those pages. I tried to help you with almost word-by-word guidance, but given that you choose not to see, I won't add anything else.
20 May 2021, 00:48 | #134 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,957
Another 2 bytes saved.
Code:
        move.l  #start+$10000-ra,D7
        divu.w  #7*4,D7
        ext.l   D7
        lsl.w   #2,D7           ; d7=maxn
Code:
        move.l  #start+$10000-ra,D7
        divu.w  #7*4,D7
        lsl.w   #2,D7           ; d7=maxn
Last edited by Don_Adan; 20 May 2021 at 00:56.
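This short Python model shows why the #7*4 trick works and what dropping ext.l changes. Dividing by 28 and shifting left by 2 gives floor(x/28)*4. That equals floor(x/7) rounded down to a multiple of 4, so it matches the earlier divu #7 / and.b #$fc version. Without ext.l, however, the upper word of D7 keeps the DIVU remainder, because lsl.w only touches the low word. The shorter version is therefore presumably fine only if D7 is used as a word afterwards; the thread doesn't show the later code. Here x stands for start+$10000-ra, and the helper names are ours.

```python
MASK32 = 0xFFFFFFFF


def divu_w(dn, divisor):
    """DIVU.W: remainder in bits 31-16, quotient in bits 15-0 (overflow not modelled)."""
    quotient, remainder = divmod(dn & MASK32, divisor)
    if quotient > 0xFFFF:
        raise OverflowError("on a real 68000 DIVU sets V and leaves Dn unchanged")
    return (remainder << 16) | quotient


def ext_l(dn):
    """EXT.L: sign-extend the low word, discarding the upper word."""
    low = dn & 0xFFFF
    return (low | 0xFFFF0000) if low & 0x8000 else low


def lsl_w(dn, count):
    """LSL.W #count,Dn: shifts only the low word; the upper word is untouched."""
    return (dn & 0xFFFF0000) | (((dn & 0xFFFF) << count) & 0xFFFF)


def maxn_div7(x):
    # divu.w #7 / ext.l / and.b #$fc (and.b only touches bits 7-0)
    d = ext_l(divu_w(x, 7))
    return (d & 0xFFFFFF00) | (d & 0xFC)


def maxn_div28_ext(x):
    # divu.w #7*4 / ext.l / lsl.w #2
    return lsl_w(ext_l(divu_w(x, 28)), 2)


def maxn_div28(x):
    # divu.w #7*4 / lsl.w #2 (no ext.l: the remainder stays in the upper word)
    return lsl_w(divu_w(x, 28), 2)


print(hex(maxn_div28(1000)), hex(maxn_div28_ext(1000)))  # 0x14008c 0x8c
```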
20 May 2021, 02:43 | #135 |
Registered User
Join Date: Mar 2018
Location: Hastings, New Zealand
Posts: 2,543
Quote:
Quote:
There are equivalent opcodes that use Dn explicitly and are valid, which an assembler could substitute for 'convenience'. This also applies to some other instructions that do have an equivalent <ea> opcode, which is a pain when the assembler silently changes one into the other without asking or providing any way to avoid it (sometimes we need exactly the opcode we asked for!). Quote:
Quote:
"The great thing about standards is that there are so many of them!" Yes, some code that is "syntactically correct" in one assembler may not be in another. To avoid confusion and maintain compatibility, it is best to stick to a common subset with unambiguous syntax where possible, and to specify the syntax used where it isn't. Otherwise people may have trouble understanding and using your code.
20 May 2021, 02:51 | #136 |
Registered User
Join Date: Mar 2018
Location: Hastings, New Zealand
Posts: 2,543
20 May 2021, 03:30 | #137 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,957
Quote:
Here is an example: change move #10,d4 to moveq #10,D4. The time calculation routine can mostly be optimised for space.
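The saving comes from moveq being a single 2-byte word, while move.w #10,d4 needs 4 bytes (opcode plus an immediate word). The two are not identical, though. moveq sign-extends its 8-bit value into all 32 bits, whereas move.w only replaces the low word. So the swap is safe only where the upper word of d4 is irrelevant or already zero. The Python sketch below shows the register effect; the helper names are ours.

```python
def move_w_imm(dn, imm):
    """move.w #imm,Dn (4 bytes): replaces only the low word of Dn."""
    return (dn & 0xFFFF0000) | (imm & 0xFFFF)


def moveq(dn, imm):
    """moveq #imm,Dn (2 bytes): imm is -128..127, sign-extended to all 32 bits."""
    if not -128 <= imm <= 127:
        raise ValueError("moveq needs an immediate in -128..127")
    return imm & 0xFFFFFFFF   # the old contents of Dn do not matter


d4 = 0x12345678
print(hex(move_w_imm(d4, 10)), hex(moveq(d4, 10)))  # 0x1234000a 0xa
```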
20 May 2021, 03:46 | #138 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,957
Perhaps this code can be shortened/optimised too: a few changes make it shorter, a few make it faster.
Code:
.l7
;       lsr     d6
        mulu    #7,d6           ; kv = d6
        move.l  d6,d3
        lea.l   ra(pc),a3
        exg.l   a5,a6
        jsr     Forbid(a6)
        moveq.l #INTB_VERTB,d0
        lea.l   VBlankServer(pc),a1
        jsr     AddIntServer(a6)
        exg.l   a5,a6
        ;move.w #$4000,$dff096  ; DMA off
;       lsr     d3
        lsr.w   #2,D3
        subq    #1,d3
        move.l  #2000*65537,d0
        move.l  a3,a0
.fill   move.l  d0,(a0)+
        dbra    d3,.fill
.l0     clr.l   d5              ; d <- 0
;       clr.l   d4
        clr.l   d7
;       move    d6,d4           ; i <- kv
;       add.l   d4,d4           ; i <- i*2
        move.l  D6,D4
        adda.l  d4,a3
        .....
        endif
;       sub.w   #14,d6          ; kv
        sub.w   #28,D6
        bne     .l0
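The main idea is to keep d6 pre-doubled. That removes the lsr d6 and the move d6,d4 / add.l d4,d4 pair, and the loop step becomes 28 instead of 14. The Python sketch below checks that both versions produce the same fill count and the same sequence of offsets added to a3. It assumes the incoming d6 (called n below) is a positive multiple of 4, which lets both sub / bne loops reach exactly 0. It also assumes the elided code does not use d6 in any other way; the thread does not show that part. The function names are ours.

```python
def original_schedule(n):
    """lsr d6 / mulu #7,d6, fill count via lsr d3, i = kv*2, step 14."""
    if n <= 0 or n % 4:
        raise ValueError("sketch assumes n is a positive multiple of 4")
    kv = 7 * (n >> 1)            # lsr d6 / mulu #7,d6
    fill_longs = kv >> 1         # move.l d6,d3 / lsr d3; subq #1 + dbra then write this many longs
    offsets = []
    while True:
        offsets.append(kv * 2)   # move d6,d4 / add.l d4,d4 (i <- kv*2)
        kv -= 14                 # sub.w #14,d6
        if kv == 0:              # bne .l0
            return fill_longs, offsets


def revised_schedule(n):
    """mulu #7,d6 without the lsr, fill count via lsr.w #2,D3, i = d6, step 28."""
    if n <= 0 or n % 4:
        raise ValueError("sketch assumes n is a positive multiple of 4")
    d6 = 7 * n                   # mulu #7,d6: d6 is now twice the old kv
    fill_longs = d6 >> 2         # move.l d6,d3 / lsr.w #2,D3
    offsets = []
    while True:
        offsets.append(d6)       # move.l D6,D4
        d6 -= 28                 # sub.w #28,D6
        if d6 == 0:              # bne .l0
            return fill_longs, offsets


print(original_schedule(8) == revised_schedule(8))  # True
```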
20 May 2021, 08:31 | #139 |
Registered User
Join Date: Mar 2018
Location: Hastings, New Zealand
Posts: 2,543
Quote:
On litwr's benchmark site, the main loop code size is 54 bytes on a 50MHz 68030 vs 57 bytes (~6% larger) on a 25MHz 386, while the 386 would theoretically be ~5% faster if running at the same clock speed. Some speed optimization might make the Amiga code 5% quicker but 5% larger, and therefore virtually identical to the 386 (except that the 030 is 25% faster in real terms, because 386s top out at 40MHz). It's also good to see the Amiga 1200 with a Blizzard 1230-IV beating a 36MHz ARM3 and a 33MHz 80486 (though of course these figures don't mean much in the real world).
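The percentages follow directly from the quoted figures. Here is a quick Python check; the variable names are ours, and 40 MHz is the top 386 clock mentioned in the post.

```python
code_68030, code_386 = 54, 57          # main-loop bytes quoted in the post
clock_68030, clock_386_max = 50, 40    # MHz

size_penalty_386 = code_386 / code_68030 - 1        # how much larger the 386 loop is
real_terms_speedup = clock_68030 / clock_386_max    # assuming equal work per clock

print(f"386 main loop is {size_penalty_386:.1%} larger")
print(f"50 MHz vs 40 MHz: {real_terms_speedup - 1:.0%} faster at equal work per clock")
```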
20 May 2021, 09:40 | #140 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,322
Quote:
As an example, most assemblers will accept moveq.l even though it is technically incorrect (moveq has no size). Phxass, and vasm in Phxass compatibility mode, will accept move.b to/from ccr, but this is also incorrect (the operand size is .w). So whether LSL.L D5 is syntactically correct or not is a matter of how you see it, but it is always cleaner not to depend on anything incompatible when it is easy to do otherwise.
|
|