19 November 2018, 20:41 | #841 |
Registered User
Join Date: May 2013
Location: Grimstad / Norway
Posts: 839
Code:
	exeobj
	bopt	x+,f+,c+,O+,wo-,OG+,OT+,ODf+,ODg+
	MC68020
	output	"ram:spigot2"

; 68020 size-optimised spigot

; header
nbch	equ	1000		; number of digits
truc	equ	(nbch/2)*7	; constant used all over the place

; init
	move.l	4.w,a6
	moveq	#4,d0			; DOS
	jsr	-$32a(a6)		; OpenTaggedLibrary
	move.l	d0,a6
	lea	zero(pc),a3		; points to the 00
; banner message
;	exg	a5,a6			; so that a6=dos
	move.l	#truc,d7
	lea	buffer+truc*2(pc),a4	; a4 = end of buffer
; initial fill
	move.l	d7,d2
.fill
	move.w	#2000,-(a4)
	subq.l	#1,d2
	bne.s	.fill
; main loop, requires a4=buf and d7=truc - note: a3 free
	moveq	#5,D3
	mulu.w	(A4),D3			; 2000*5
	moveq	#msg0-zero,d1
	bsr.s	aff
.loop1
	moveq	#0,d5
	move.l	a4,a1
	move.l	d7,d0			; i
	move.l	d7,d4
	add.l	d4,d4
	subq.l	#1,d4			; i*2-1
.loop2
	mulu.l	d0,d5
	move.w	(a1),d6			; r[i]
	mulu.w	d3,d6			; r[i]*10000
	add.l	d6,d5			; d = d*i + r[i]*10000
	divul.l	d4,d6:d5
	move.w	d6,(a1)+		; d%b -> r[i]
	subq.l	#2,d4
	subq.l	#1,d0
	bgt.s	.loop2
; display digits
	divul.l	d3,d4:d5		; d/10000
	add.l	d2,d5			; +c
; display the number in d5
.affd5
	moveq	#4,d1			; digit count
	moveq	#10,d0			; div.l shortcut
.loop
	divul.l	d0,d2:d5
	addi.b	#"0",d2
	move.b	d2,-1(a3,d1.l)
	subq.l	#1,d1
	bne.s	.loop
; a3 now points to "nnnn",0: print it directly via aff below
; normal CLI print
	bsr.s	aff
; continue
	move.l	d4,d2			; c = d % 10000;
	lea	28(a4),a4		; 14 fewer iterations next time
	sub.w	#14,d7			; .w is enough
	bgt.s	.loop1
; end
	moveq	#lf-zero,d1
aff
	add.l	a3,d1
	jmp	-$3b4(a6)		; PutStr

msg0	dc.b	"pi calculator v6"	; 16 bytes
lf	dc.b	10
	even
	dx.b	6
zero	dx.b	6
buffer	dx.b	truc*2
	dx.b	4
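For readers who want to check what the listing computes, here is a Python model of the same arithmetic. It is a sketch that follows the register comments (r[i], d, c, the 2000 fill and the 14-cell step); it was not produced by running the 68020 code, and the Python names are illustrative. The algorithm appears to be the classic base-10000 pi spigot, which also starts every cell at 10000/5 = 2000 and drops 14 cells per group of 4 digits.

```python
def spigot_pi(ndigits: int = 1000) -> str:
    """Python model of the 68020 spigot (base 10000, 4 digits per outer pass).

    Comments name the matching registers/instructions in the listing.
    """
    truc = (ndigits // 2) * 7         # number of remainder cells (3500 for 1000 digits)
    r = [2000] * truc                  # .fill: every cell starts at 2000
    carry = 0                          # d2 = c, zero after the fill loop
    start = 0                          # a4 as a cell index, advances 14 cells per pass
    n = truc                           # d7: cells still in use
    digits = []
    while n > 0:                       # bgt.s .loop1
        d = 0                          # moveq #0,d5
        i = n                          # d0 = i, counts down to 1
        for k in range(start, start + n):
            d = d * i + r[k] * 10000   # mulu.l d0,d5 / mulu.w d3,d6 / add.l d6,d5
            d, r[k] = divmod(d, 2 * i - 1)  # divul.l d4,d6:d5, remainder back to r[i]
            i -= 1
        q, rem = divmod(d, 10000)      # divul.l d3,d4:d5
        # .affd5 only emits the low four decimal digits of q + c
        digits.append(f"{(q + carry) % 10000:04d}")
        carry = rem                    # move.l d4,d2
        start += 14                    # lea 28(a4),a4 (14 word cells)
        n -= 14                        # sub.w #14,d7
    return "".join(digits)


print(spigot_pi(1000)[:20])
```

Each pass prints the previous pass's remainder c plus the new (usually zero) quotient, which is why the listing carries d4 into d2 between passes.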
19 November 2018, 21:24 | #842 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,957
Quote:
lf dc.b 10,0
Because some assemblers can place a random value here, and the displayed text can then be trashed. The dx.b 4 at the end can be removed too, I think.
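Why the missing 0 matters can be shown with a small model of a NUL-terminated print routine such as PutStr: it keeps emitting bytes until it meets a zero byte. The two garbage bytes after the linefeed below are made up for illustration; they stand for whatever an assembler might leave there.

```python
def put_str(memory: bytes, addr: int) -> bytes:
    """Model of a NUL-terminated print (such as dos.library PutStr):
    bytes are emitted from addr up to, but not including, the first 0 byte."""
    end = memory.index(0, addr)   # raises ValueError if there is no terminator at all
    return memory[addr:end]


banner = b"pi calculator v6"
# "lf dc.b 10" followed by hypothetical leftover bytes
unterminated = banner + b"\n" + b"\xa7\x3c" + b"\x00"
# "lf dc.b 10,0" as suggested above
terminated = banner + b"\n\x00" + b"\xa7\x3c\x00"

print(put_str(unterminated, 0))   # b'pi calculator v6\n\xa7<'
print(put_str(terminated, 0))     # b'pi calculator v6\n'
```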
19 November 2018, 21:46 | #843 |
Registered User
Join Date: May 2013
Location: Grimstad / Norway
Posts: 839
I tried to find ways to make A3 and A4 identical, but failed. I moved registers around so the code could jump into the tail check, but I couldn't find any use for it.
I feel there isn't much more to gain here.
20 November 2018, 09:37 | #844 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,957
Quote:
Code:
	exeobj
	bopt	x+,f+,c+,O+,wo-,OG+,OT+,ODf+,ODg+
	MC68020
	output	"ram:spigot2"

; 68020 size-optimised spigot

; header
nbch	equ	1000		; number of digits
truc	equ	(nbch/2)*7	; constant used all over the place

; init
	move.l	4.w,a6
	moveq	#4,d0			; DOS
	jsr	-$32a(a6)		; OpenTaggedLibrary
	move.l	d0,a6
	lea	zero(pc),a3		; points to the 00
; banner message
;	exg	a5,a6			; so that a6=dos
	move.l	#truc,d7
	lea	buffer+truc*2(pc),a4	; a4 = end of buffer
; initial fill
	move.l	d7,d2
.fill
	move.w	#2000,-(a4)
	subq.l	#1,d2			; d2.l must be zero
	bne.s	.fill
; main loop, requires a4=buf and d7=truc - note: a3 free
	moveq	#5,D3
	mulu.w	(A4),D3			; 2000*5
	moveq	#msg0-zero,d1
	bsr.s	aff
.loop1
	moveq	#0,d5
	move.l	a4,a1
	move.l	d7,d0			; i
	move.l	d7,d4
	add.l	d4,d4
	subq.l	#1,d4			; i*2-1
.loop2
	mulu.l	d0,d5
	move.w	(a1),d6			; r[i]
	mulu.w	d3,d6			; r[i]*10000
	add.l	d6,d5			; d = d*i + r[i]*10000
	divul.l	d4,d6:d5
	move.w	d6,(a1)+		; d%b -> r[i]
	subq.l	#2,d4
	subq.l	#1,d0
	bgt.s	.loop2
; display digits
	divul.l	d3,d4:d5		; d/10000
	add.l	d2,d5			; +c
; display the number in d5
.affd5
	moveq	#4,d1			; digit count
	moveq	#10,d0			; div.l shortcut
.loop
	divul.l	d0,d2:d5
	addi.b	#"0",d2
	move.b	d2,-(a3)
	subq.l	#1,d1			; d1.l must be 0
	bne.s	.loop
; a3 now points to "nnnn",0: print it directly via aff below
; normal CLI print
	bsr.s	aff
	addq.l	#4,A3			; restore A3
; continue
	move.l	d4,d2			; c = d % 10000;
	lea	28(a4),a4		; 14 fewer iterations next time
	sub.w	#14,d7			; .w is enough
	bgt.s	.loop1
; end
	moveq	#lf-zero,d1
aff
	add.l	a3,d1
	jmp	-$3b4(a6)		; PutStr

msg0	dc.b	"pi calculator v6"	; 16 bytes
lf	dc.b	10,0
	dx.b	4
zero	dx.b	2
buffer	dx.b	truc*2
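The main change in this version is the digit loop: move.b d2,-(a3) writes the four digits backwards into the dx.b 4 area just before the 0 at zero, aff prints from the lowered a3, and addq.l #4,A3 puts a3 back. A Python sketch of that loop, treating memory as a bytearray and a3 as an index (both modelling assumptions):

```python
from typing import Tuple


def emit_group(value: int, mem: bytearray, zero: int) -> Tuple[bytes, int]:
    """Model of the .affd5 loop: mem[zero] is the 0 terminator and the four
    bytes before it are scratch space. Returns the printed text and the
    restored a3."""
    a3 = zero                          # lea zero(pc),a3
    d5 = value
    for _ in range(4):                 # moveq #4,d1 ... subq.l #1,d1 / bne.s .loop
        d5, d2 = divmod(d5, 10)        # divul.l d0,d2:d5 with d0 = 10
        a3 -= 1                        # move.b d2,-(a3): predecrement, then store
        mem[a3] = ord("0") + d2        # addi.b #"0",d2
    text = bytes(mem[a3:mem.index(0, a3)])  # bsr.s aff prints from a3 up to the 0
    a3 += 4                            # addq.l #4,A3 ; restore A3
    return text, a3


buf = bytearray(6)                     # 4 digit bytes, then zero dx.b 2
print(emit_group(3141, buf, 4))        # (b'3141', 4)
```

Leading zeros come out naturally (26 prints as 0026), and anything above 9999 is silently cut to its low four digits, exactly because the loop runs a fixed four times.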
20 November 2018, 12:39 | #845 |
Registered User
Join Date: May 2013
Location: Grimstad / Norway
Posts: 839
I know how to save two more bytes:
Hoist the "moveq #lf-zero,d1" instruction up before the "move.l d4,d2". Adjust the space before/after 'zero' so that the 'lf' offset becomes 14, and adjust the other offsets to whatever you need. Then you can replace "sub.w #14,d7" with "sub.w d1,d7", or, if the offset is negative, flip it around and use add.w.
Last edited by NorthWay; 20 November 2018 at 13:22.
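The byte count works out as follows: sub.w #14,d7 (4 bytes, since 14 is too large for subq) plus the final moveq (2 bytes) become a moveq inside the loop (2 bytes) plus add.w d1,d7 (2 bytes), and when the loop exits d1 still holds lf-zero, so falling into aff prints the linefeed. A Python sketch of the loop exit, assuming the layout puts lf 14 bytes before zero (so the step is -14 and add.w is used):

```python
def outer_passes(truc: int, step: int) -> int:
    """Count .loop1 passes when the counter update is add.w d1,d7 with
    d1 = lf-zero (= step) already loaded, instead of sub.w #14,d7."""
    assert -128 <= step <= 127     # lf-zero must fit in a moveq
    d7 = truc
    passes = 0
    while True:
        passes += 1                # one pass of .loop1 (4 digits printed)
        d7 += step                 # add.w d1,d7
        if d7 <= 0:                # bgt.s .loop1 falls through on zero or negative
            return passes


print(outer_passes(3500, -14))     # 250 passes, i.e. 1000 digits
```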
20 November 2018, 13:23 | #846 |
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,406
That'd be 178 bytes executable/142 bytes of code if I'm calculating it correctly, right?
That's a pretty darned good result.
20 November 2018, 13:57 | #847 |
Registered User
Join Date: May 2013
Location: Grimstad / Norway
Posts: 839
Exe files are a multiple of 4 in size... but the code size is what it is anyway.
And 18 of those bytes (17 if the terminating 0 can be implied, i.e. 141 bytes of code, but that becomes a multiple of 2 again anyway) are the banner string "pi calculator v6" plus the linefeed and 0, which are required and platform-agnostic (unless you get lucky and can use parts of the string as code/data for something).
Last edited by NorthWay; 20 November 2018 at 14:52. Reason: Too optimistic.
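The 178/142 estimate above assumes 36 bytes of file overhead. That matches a minimal AmigaDOS hunk executable with one code hunk, no relocations and no symbols, and with the code padded to whole longwords, as NorthWay points out. A Python sketch of that arithmetic; it assumes the dx.b space costs no file bytes (the hunk size in the header can be larger than the stored code):

```python
HUNK_HEADER = 6 * 4   # $3F3, empty resident-name list (0), table size, first, last, one hunk size
HUNK_CODE = 2 * 4     # $3E9 + code length in longwords
HUNK_END = 4          # $3F2


def exe_size(code_bytes: int) -> int:
    """File size of a single-code-hunk executable; code padded to longwords."""
    padded = (code_bytes + 3) // 4 * 4
    return HUNK_HEADER + HUNK_CODE + padded + HUNK_END


for code in (142, 141, 136):
    print(code, exe_size(code))   # 142 180 / 141 180 / 136 172
```

Under these assumptions 142 or 141 bytes of code both give a 180-byte file, and 136 bytes gives 172; the fixed 36 bytes are about 26% of 136.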
20 November 2018, 15:03 | #848 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,957
OK, 2 bytes gained. In this version the buffer address no longer has to be PC-relative (it is computed from a3 and truc), which I think makes it more universal.
Code:
	exeobj
	bopt	x+,f+,c+,O+,wo-,OG+,OT+,ODf+,ODg+
	MC68020
	output	"ram:spigot2"

; 68020 size-optimised spigot

; header
nbch	equ	1000		; number of digits
truc	equ	(nbch/2)*7	; constant used all over the place

; init
	move.l	4.w,a6
	moveq	#4,d0			; DOS
	jsr	-$32a(a6)		; OpenTaggedLibrary
	move.l	d0,a6
	lea	zero(pc),a3		; points to the 00
; banner message
	move.l	#truc,d7
	lea	2(A3,D7.L*2),a4		; a4 = end of buffer
; initial fill
	move.l	d7,d2
.fill
	move.w	#2000,-(a4)
	subq.l	#1,d2			; d2.l must be zero
	bne.s	.fill
; main loop, requires a4=buf and d7=truc - note: a3 free
	moveq	#5,D3
	mulu.w	(A4),D3			; 2000*5
	moveq	#msg0-zero,d1
	bsr.s	aff
.loop1
	moveq	#0,d5
	move.l	a4,a1
	move.l	d7,d0			; i
	move.l	d7,d4
	add.l	d4,d4
	subq.l	#1,d4			; i*2-1
.loop2
	mulu.l	d0,d5
	move.w	(a1),d6			; r[i]
	mulu.w	d3,d6			; r[i]*10000
	add.l	d6,d5			; d = d*i + r[i]*10000
	divul.l	d4,d6:d5
	move.w	d6,(a1)+		; d%b -> r[i]
	subq.l	#2,d4
	subq.l	#1,d0
	bgt.s	.loop2
; display digits
	divul.l	d3,d4:d5		; d/10000
	add.l	d2,d5			; +c
; display the number in d5
.affd5
	moveq	#4,d1			; digit count
	moveq	#10,d0			; div.l shortcut
.loop
	divul.l	d0,d2:d5
	addi.b	#"0",d2
	move.b	d2,-(a3)
	subq.l	#1,d1			; d1.l must be 0
	bne.s	.loop
; a3 now points to "nnnn",0: print it directly via aff below
; normal CLI print
	bsr.s	aff
	addq.l	#4,A3			; restore A3
; continue
	move.l	d4,d2			; c = d % 10000;
	lea	28(a4),a4		; 14 fewer iterations next time
	moveq	#lf-zero,d1		; must be -14
	add.l	D1,d7			; .w would be enough
	bgt.s	.loop1
aff
	add.l	a3,d1
	jmp	-$3b4(a6)		; PutStr

msg0	dc.b	"pi calculator v6"	; 16 bytes
lf	dc.b	10,0
	ds.b	8			; padding so that lf-zero = -14
	ds.b	4
zero	ds.b	2
buffer	ds.b	truc*2
Last edited by Don_Adan; 20 November 2018 at 22:16.
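The new lea 2(A3,D7.L*2),a4 uses the 68020 scaled-index mode, whose effective address is base + displacement + index*scale (scale 1, 2, 4 or 8). This Python sketch redoes that arithmetic for the data layout at the end of the listing, starting from a hypothetical address for msg0, to confirm that a4 lands on the end of the buffer and that lf-zero is -14:

```python
def layout(msg0: int, truc: int) -> dict:
    """Byte addresses of the data section, from a hypothetical msg0 address."""
    lf = msg0 + 16            # msg0 dc.b "pi calculator v6" ; 16 bytes
    zero = lf + 2 + 8 + 4     # lf dc.b 10,0 / ds.b 8 / ds.b 4
    buffer = zero + 2         # zero ds.b 2
    return {"msg0": msg0, "lf": lf, "zero": zero,
            "buffer": buffer, "end": buffer + truc * 2}


truc = 3500
addr = layout(0x1000, truc)
a3 = addr["zero"]             # lea zero(pc),a3
d7 = truc                     # move.l #truc,d7
a4 = a3 + 2 + d7 * 2          # lea 2(A3,D7.L*2),a4: base + disp + index*scale
print(a4 == addr["end"], addr["lf"] - addr["zero"])   # True -14
```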
20 November 2018, 17:09 | #849 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
You can gain 4 more bytes by not using a4 at all and using sp instead.
But surely someone would say that this is cheating, because it needs a larger-than-default AmigaOS stack.
20 November 2018, 18:03 | #850 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,957
Yes, an 8 KB stack will be enough for this version. By PC standards that is a very small stack; by Amiga standards it is a bit larger than normal, but perhaps AmigaOS 3.5/3.9 uses 8 KB as the default?
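The 8 KB figure follows from the buffer size: with nbch = 1000, truc is 3500 word cells, so the remainder array alone takes 7000 bytes of stack. A quick check in Python (the split of the leftover space is not something the thread measured):

```python
NBCH = 1000                     # nbch equ 1000
TRUC = (NBCH // 2) * 7          # truc equ (nbch/2)*7
buffer_bytes = TRUC * 2         # one 16-bit cell per remainder
stack_bytes = 8 * 1024          # the 8 KB stack suggested above

print(buffer_bytes, stack_bytes - buffer_bytes)   # 7000 1192
```

The remaining 1192 bytes would have to cover the bsr return addresses and whatever the DOS calls use on the caller's stack.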
|
20 November 2018, 23:55 | #851 |
Registered User
Join Date: May 2013
Location: Grimstad / Norway
Posts: 839
21 November 2018, 00:36 | #852 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,957
Quote:
BTW, Amiga strings must be terminated with a null (zero) byte.
21 November 2018, 01:43 | #853 |
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,406
Quote:
Just my two cents, I’m still impressed by how small the actual code has gotten by now. |
21 November 2018, 09:15 | #854 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
No... Well, it's clear that 68k code is very often denser than that of other architectures (a scientific trial with various algorithms of different sizes would probably show it is the densest of all). But in this thread there are 'some' false or biased claims and apples-to-oranges comparisons... A much more honest choice in this case would be to use the .EXE format (still for DOS). What is the point of comparing a headerless format made for an environment with 1970s limitations against a format that can support hundreds of megabytes of (different types of) RAM? From the start, the comparison should have used pure code, excluding even the output routine (so only building the string in memory). But that would probably have been a lot less fun.
21 November 2018, 09:44 | #855 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,957
|
Quote:
https://palma.strom.sk/doc/fpc/prog/progse94.html
https://palma.strom.sk/doc/fpc/prog/progse93.html
Last edited by Don_Adan; 21 November 2018 at 09:51.
21 November 2018, 12:03 | #856 |
Registered User
Join Date: Mar 2012
Location: Norfolk, UK
Posts: 1,153
Better yet would be to define a minimal embedded environment to avoid the overhead of executable formats at all. Imagine both a 68K and X86 machine where the code can live in ROM, and the program can poke a hardware register to output a byte. That, I think, would come as close to eliminating unfair advantages for either platform as possible, and would allow meaningful comparisons with other architectures, too.
|
21 November 2018, 12:27 | #857 |
Registered User
Join Date: May 2013
Location: Grimstad / Norway
Posts: 839
I think you can save another 2 bytes:
Add after "lf dc.b 10,0": "const dc.w 2+truc*2,truc,10000,truc,msg0-zero,2000", then instead of "lea zero(pc),a3" do "lea const(pc),a3", "movem.w (a3)+,a4/d7/d3/d2/d1/d0", "add.l a3,a4". The fill loop uses d0 instead of an immediate. Remove the opcodes that set a4/d7/d3/d2/d1 before the loops. movem.w sign-extends, and we got lucky here that we could find 6 immediates, so the lf offset stays at 14.
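One detail to watch with this trick: the assembler encodes a MOVEM register list as a bit mask, and for the (An)+ form the CPU always fills D0..D7 and then A0..A7 from ascending addresses, whatever order the list is written in. For the one-to-one assignment the post describes (a4 = 2+truc*2, ..., d0 = 2000), the words would therefore have to be stored in the reverse of the order given above. A Python model of the load; msg0-zero = -30 is derived by assuming msg0 (16 bytes), lf (2) and const (12) sit directly before zero:

```python
MOVEM_ORDER = [f"d{n}" for n in range(8)] + [f"a{n}" for n in range(8)]


def sign_extend16(word: int) -> int:
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def movem_w_postinc(words, reglist):
    """movem.w (a3)+,<reglist>: registers are filled in D0..D7, A0..A7 order
    from ascending addresses, each word sign-extended to 32 bits."""
    regs = sorted(reglist, key=MOVEM_ORDER.index)
    return {reg: sign_extend16(w) for reg, w in zip(regs, words)}


truc = 3500
msg0_minus_zero = -30
reglist = ["a4", "d7", "d3", "d2", "d1", "d0"]
as_posted = [2 + truc * 2, truc, 10000, truc, msg0_minus_zero, 2000]
reversed_table = as_posted[::-1]

print(movem_w_postinc(as_posted, reglist))
print(movem_w_postinc(reversed_table, reglist))
```

With the reversed table the registers come out as d0 = 2000 (fill value), d1 = -30 (banner offset), d2 = d7 = 3500, d3 = 10000 and a4 = 7002, which is what the rest of the code expects.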
21 November 2018, 16:29 | #858 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Actually, this can be combined with the stack trick (remove the adda.l a3,a4, add 28 to const to load a4, and replace the lea 28 with adda.l a4,sp), for a grand total of 136 bytes of code.
21 November 2018, 16:58 | #859 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Uh, just noticed that the Amiga executable this way becomes 172 bytes, the same as the headerless .COM.
(The header inflation costs about 25% for this small routine, hence the unfairness is insane...)
21 November 2018, 17:03 | #860 |
Registered User
Join Date: Mar 2016
Location: Ozherele
Posts: 229
Quote:
However, this time I have a task which IMHO shows the very poor code density of the 68000 compared to the 8086. I have a 40 KB table containing linked records: every record has links to several other records. On the 8086 I can allocate only 2 bytes per link. It looks like the 68k requires 4 (!) bytes per link, because MOVEA and ADDA sign-extend word operands. So instead of just MOV SI,[offset+SI], where SI points to the current record, I need some slow and complex arithmetic if I want 2-byte links.
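The sign-extension problem can be modelled in a few lines of Python: a 16-bit link that is an unsigned offset from the table start breaks for every target at or above 32 KB once a word operation sign-extends it. The 40 KB size comes from the post; the 32 KB bias is an assumption, a commonly used workaround (point the base register into the middle of the table and store signed offsets) that nobody in the thread proposed, and the sketch says nothing about instruction counts or speed.

```python
def sign_extend16(word: int) -> int:
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


TABLE = 40 * 1024     # the 40 KB record table from the post
BIAS = 0x8000         # hypothetical: base register points 32 KB into the table

# Naive scheme: link = unsigned offset from the table start, added with a
# sign-extending word operation (as adda.w or a .w index register would do).
naive_ok = all(sign_extend16(off) == off for off in range(TABLE))

# Biased scheme: store off - BIAS as a word and add it to (table + BIAS).
biased_ok = all(BIAS + sign_extend16(off - BIAS) == off for off in range(TABLE))

print(naive_ok, biased_ok)   # False True
```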
1) It actually happened very rarely (with odds of about 1 in 1 billion): https://en.wikipedia.org/wiki/Pentium_FDIV_bug
2) My article is about CPUs before 1991; the last CPUs considered are the 80486 and the 68040.
BTW, I bought a Sound Blaster 16 at the beginning of 1993. IIRC it was slightly above $100, but it was possible to buy a quite good and much cheaper 8-bit Sound Blaster card. So we get about the $900 you mentioned, but if I had bought my PC at the end of 1992 I would have paid rather less than $800.
Segments were the best idea for the late 70s. The 68k was always misled by the VAX architecture and empty, impractical theories. The 386 has 6 segment registers; 6*64 KB gives 384 KB.
What cross-assembler do you use? It looks like vasm can't work with this source.
Last edited by litwr; 21 November 2018 at 17:38.
|
|