68k details - Page 43

NorthWay · 19 November 2018, 20:41

Code:

		exeobj
		bopt x+,f+,c+,O+,wo-,OG+,OT+,ODf+,ODg+
		MC68020
	output "ram:spigot2"

; 68020 size-optimised spigot

; header
nbch equ 1000			; number of digits
truc equ (nbch/2)*7		; constant used in several places

; init
		move.l	4.w,a6
		moveq	#4,d0		; DOS
		jsr	-$32a(a6)	; TaggedOpenLibrary
		move.l	d0,a6
		lea	zero(pc),a3		; points to the 00

; header message
; exg a5,a6			; for a6=dos
		move.l	#truc,d7
		lea	buffer+truc*2(pc),a4	; a4 = end of buffer
; initial fill
		move.l	d7,d2
.fill
		move.w	#2000,-(a4)
		subq.l	#1,d2
		bne.s	.fill

; main loop, requires a4=buf and d7=truc - note: a3 is free
		moveq	#5,D3
		mulu.w	(A4),D3		; 2000*5

		moveq	#msg0-zero,d1
		bsr.s	aff
.loop1
		moveq	#0,d5
		move.l	a4,a1
		move.l	d7,d0			; i
		move.l	d7,d4
		add.l	d4,d4
		subq.l	#1,d4			; i*2-1
.loop2
		mulu.l	d0,d5
		move.w	(a1),d6			; r[i]
		mulu.w	d3,d6			; r[i]*10000
		add.l	d6,d5			; d = d*i + r[i]*10000
		divul.l	d4,d6:d5
		move.w	d6,(a1)+		; d%b -> r[i]
		subq.l	#2,d4
		subq.l	#1,d0
		bgt.s	.loop2

; print digits
		divul.l	d3,d4:d5		; d/10000
		add.l	d2,d5			; +c
; print the number in d5
.affd5
		moveq	#4,d1			; number of digits
		moveq	#10,d0			; div.l shortcut
.loop
		divul.l	d0,d2:d5
		addi.b	#"0",d2
		move.b	d2,-1(a3,d1.l)
		subq.l	#1,d1
		bne.s	.loop
; a3 now points to "nnnn",0: print it directly via aff below
; normal cli print
		bsr.s	aff

; next
		move.l	d4,d2			; c = d % 10000;
		lea	28(a4),a4		; 14 fewer iterations next time
		sub.w	#14,d7			; .w is enough
		bgt.s	.loop1

; end
		moveq	#lf-zero,d1
aff
		add.l	a3,d1
		jmp	-$3b4(a6)		; PutStr


msg0
		dc.b	"pi calculator v6"	; 16 bytes
lf
		dc.b	10
	even
		dx.b	6
zero
		dx.b	6
buffer
		dx.b	truc*2
		dx.b	4
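A minimal Python model of what this loop computes may help when reading the register juggling. It is a sketch, not a port: the function and variable names are hypothetical, `r[i]` stands for the word at the matching buffer position, `n`, `b` and `c` play the roles of d7, d4 and d2, and Python's unbounded integers hide the 16-bit and 32-bit limits the asm has to respect.

```python
def spigot_pi(nbch: int = 1000) -> str:
    """Model of the 68020 spigot loop: returns nbch digits of pi as a string."""
    truc = (nbch // 2) * 7           # number of series terms, same constant as the asm
    r = [0] + [2000] * truc          # r[1..truc] = 2000, like the .fill loop (r[0] unused)
    digits = []
    c = 0                            # carry between 4-digit groups (d2)
    n = truc                         # terms still in use (d7)
    while n > 0:
        d = 0                        # running value (d5)
        b = 2 * n - 1                # odd divisor 2i-1 (d4)
        for i in range(n, 0, -1):    # i counts down to 1 (d0)
            d = d * i + r[i] * 10000 # mulu.l d0,d5 / mulu.w d3,d6 / add.l d6,d5
            d, r[i] = divmod(d, b)   # divul.l d4,d6:d5: quotient in d, remainder to r[i]
            b -= 2                   # subq.l #2,d4
        q, rem = divmod(d, 10000)    # divul.l d3,d4:d5
        # The asm always emits exactly four digits, so only the low four survive.
        digits.append(f"{(c + q) % 10000:04d}")
        c = rem                      # move.l d4,d2
        n -= 14                      # sub.w #14,d7: 14 fewer terms next time
    return "".join(digits)


print(spigot_pi(1000)[:20])
```

With `nbch = 1000` it makes 250 outer passes of 4 digits each, and the output should begin 31415926535897932384, the same kind of sanity check meynaf describes later in the thread.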

Don_Adan · 19 November 2018, 21:24

Quote:

Originally Posted by NorthWay

Code:

		exeobj
		bopt x+,f+,c+,O+,wo-,OG+,OT+,ODf+,ODg+
		MC68020
	output "ram:spigot2"

; 68020 size-optimised spigot

; header
nbch equ 1000			; number of digits
truc equ (nbch/2)*7		; constant used in several places

; init
		move.l	4.w,a6
		moveq	#4,d0		; DOS
		jsr	-$32a(a6)	; TaggedOpenLibrary
		move.l	d0,a6
		lea	zero(pc),a3		; points to the 00

; header message
; exg a5,a6			; for a6=dos
		move.l	#truc,d7
		lea	buffer+truc*2(pc),a4	; a4 = end of buffer
; initial fill
		move.l	d7,d2
.fill
		move.w	#2000,-(a4)
		subq.l	#1,d2
		bne.s	.fill

; main loop, requires a4=buf and d7=truc - note: a3 is free
		moveq	#5,D3
		mulu.w	(A4),D3		; 2000*5

		moveq	#msg0-zero,d1
		bsr.s	aff
.loop1
		moveq	#0,d5
		move.l	a4,a1
		move.l	d7,d0			; i
		move.l	d7,d4
		add.l	d4,d4
		subq.l	#1,d4			; i*2-1
.loop2
		mulu.l	d0,d5
		move.w	(a1),d6			; r[i]
		mulu.w	d3,d6			; r[i]*10000
		add.l	d6,d5			; d = d*i + r[i]*10000
		divul.l	d4,d6:d5
		move.w	d6,(a1)+		; d%b -> r[i]
		subq.l	#2,d4
		subq.l	#1,d0
		bgt.s	.loop2

; print digits
		divul.l	d3,d4:d5		; d/10000
		add.l	d2,d5			; +c
; print the number in d5
.affd5
		moveq	#4,d1			; number of digits
		moveq	#10,d0			; div.l shortcut
.loop
		divul.l	d0,d2:d5
		addi.b	#"0",d2
		move.b	d2,-1(a3,d1.l)
		subq.l	#1,d1
		bne.s	.loop
; a3 now points to "nnnn",0: print it directly via aff below
; normal cli print
		bsr.s	aff

; next
		move.l	d4,d2			; c = d % 10000;
		lea	28(a4),a4		; 14 fewer iterations next time
		sub.w	#14,d7			; .w is enough
		bgt.s	.loop1

; end
		moveq	#lf-zero,d1
aff
		add.l	a3,d1
		jmp	-$3b4(a6)		; PutStr


msg0
		dc.b	"pi calculator v6"	; 16 bytes
lf
		dc.b	10
	even
		dx.b	6
zero
		dx.b	6
buffer
		dx.b	truc*2
		dx.b	4

Looks good to me. I would only remove "even" and replace it with
lf
dc.b 10,0
because some assemblers can put a random value there, and the displayed text can be trashed.

The dx.b 4 at the end can be removed too, I think.

NorthWay · 19 November 2018, 21:46

I tried to find ways to make A3 and A4 identical, but failed. I moved around registers so the code could jump into the tail check, but I couldn't find any use for it.
I feel there isn't much more to gain here.

Don_Adan · 20 November 2018, 09:37

Quote:

Originally Posted by NorthWay

I tried to find ways to make A3 and A4 identical, but failed. I moved around registers so the code could jump into the tail check, but I couldn't find any use for it.
I feel there isn't much more to gain here.

I only made a slightly faster version, with the same size, if I remember right.

Code:

		exeobj
		bopt x+,f+,c+,O+,wo-,OG+,OT+,ODf+,ODg+
		MC68020
	output "ram:spigot2"

; 68020 size-optimised spigot

; header
nbch equ 1000			; number of digits
truc equ (nbch/2)*7		; constant used in several places

; init
		move.l	4.w,a6
		moveq	#4,d0		; DOS
		jsr	-$32a(a6)	; TaggedOpenLibrary
		move.l	d0,a6
		lea	zero(pc),a3		; points to the 00

; header message
; exg a5,a6			; for a6=dos
		move.l	#truc,d7
		lea	buffer+truc*2(pc),a4	; a4 = end of buffer
; initial fill
		move.l	d7,d2
.fill
		move.w	#2000,-(a4)
		subq.l	#1,d2                   ; d2.l must be zero
		bne.s	.fill

; main loop, requires a4=buf and d7=truc - note: a3 is free
		moveq	#5,D3
		mulu.w	(A4),D3		; 2000*5

		moveq	#msg0-zero,d1
		bsr.s	aff
.loop1
		moveq	#0,d5
		move.l	a4,a1
		move.l	d7,d0			; i
		move.l	d7,d4
		add.l	d4,d4
		subq.l	#1,d4			; i*2-1
.loop2
		mulu.l	d0,d5
		move.w	(a1),d6			; r[i]
		mulu.w	d3,d6			; r[i]*10000
		add.l	d6,d5			; d = d*i + r[i]*10000
		divul.l	d4,d6:d5
		move.w	d6,(a1)+		; d%b -> r[i]
		subq.l	#2,d4
		subq.l	#1,d0
		bgt.s	.loop2

; print digits
		divul.l	d3,d4:d5		; d/10000
		add.l	d2,d5			; +c
; print the number in d5
.affd5
		moveq	#4,d1			; number of digits
		moveq	#10,d0			; div.l shortcut
.loop
		divul.l	d0,d2:d5
		addi.b	#"0",d2
		move.b	d2,-(a3)
		subq.l	#1,d1                   ; d1.l must be 0
		bne.s	.loop
; a3 now points to "nnnn",0: print it directly via aff below
; normal cli print
		bsr.s	aff
		addq.l	#4,A3			; restore A3
; next
		move.l	d4,d2			; c = d % 10000;
		lea	28(a4),a4		; 14 fewer iterations next time
		sub.w	#14,d7			; .w is enough
		bgt.s	.loop1

; end
		moveq	#lf-zero,d1
aff
		add.l	a3,d1
		jmp	-$3b4(a6)		; PutStr


msg0
		dc.b	"pi calculator v6"	; 16 bytes
lf
		dc.b	10,0
		dx.b	4
zero
		dx.b	2
buffer
		dx.b	truc*2
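Compared with the previous version, only the digit store changed: `move.b d2,-(a3)` (2 bytes) replaces `move.b d2,-1(a3,d1.l)` (4 bytes, including its index extension word), and the new `addq.l #4,A3` (2 bytes) pays the difference back, which fits the "same size, slightly faster" description. A hypothetical Python model of that loop shows why a3 ends up on the first digit, so `aff` can print straight from a3, and why it must be moved back by 4 afterwards. `buf` and `zero` are stand-ins for the bytes just before and at the `zero` label.

```python
def emit_digits(value: int, buf: bytearray, a3: int) -> int:
    """Write the low four decimal digits of value as ASCII just below buf[a3].

    Mirrors the .loop above: divide by 10, turn the remainder into a
    character, store it with predecrement, four times. Returns the new a3,
    which points at the first digit (hence the addq.l #4,A3 afterwards).
    """
    d5 = value
    for _ in range(4):                # moveq #4,d1 ... subq.l #1,d1 / bne.s .loop
        d5, d2 = divmod(d5, 10)       # divul.l d0,d2:d5 with d0 = 10
        a3 -= 1                       # -(a3) decrements before the store
        buf[a3] = ord("0") + d2       # addi.b #"0",d2 / move.b d2,-(a3)
    return a3


buf = bytearray(8)                    # 4 digit bytes, then the zeroed bytes at 'zero'
zero = 4
a3 = emit_digits(592, buf, zero)
print(buf[a3:zero].decode(), buf[zero])   # the string PutStr sees, then its NUL
```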

NorthWay · 20 November 2018, 12:39

I know how to save two more bytes:
Hoist the "moveq #lf-zero,d1" instruction up before the "move.l d4,d2".
Adjust space before/after 'zero' so that the 'lf' offset becomes 14. Adjust the other offset according to what you need. Then you can replace "sub.w #14,d7" with "sub.w d1,d7". Or if the values are negative then flip it around to use add.w.

roondar · 20 November 2018, 13:23

That'd be 178 bytes executable/142 bytes of code if I'm calculating it correctly, right?

That's a pretty darned good result

NorthWay · 20 November 2018, 13:57

Exe files are a multiple of 4 in size... but the code size is what it is anyway.
And 18 of those bytes (17, since the 0 can be implied, i.e. 141, but it becomes a multiple of 2 anyway) are required and platform-agnostic (unless you got lucky and could use parts of the string as code/data for something).
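The gap between code size and file size can be checked with a little arithmetic. A sketch in Python, assuming the smallest possible load file: one HUNK_CODE with no HUNK_RELOC32, no symbol or debug hunks, and any zeroed tail (the `dx` space) expressed only through a larger size in HUNK_HEADER, which adds no bytes to the file. `minimal_exe_size` is a hypothetical helper.

```python
HUNK_HEADER = 0x3F3
HUNK_CODE = 0x3E9
HUNK_END = 0x3F2


def minimal_exe_size(code_bytes: int) -> int:
    """Size of an AmigaOS load file holding one code hunk and nothing else."""
    code_longs = (code_bytes + 3) // 4                 # hunk sizes are counted in longwords
    # $3F3, no resident library names (0), table size 1, first 0, last 0, one hunk size
    header = [HUNK_HEADER, 0, 1, 0, 0, code_longs]
    code = [HUNK_CODE, code_longs] + [0] * code_longs  # type, size, padded code
    end = [HUNK_END]
    return 4 * (len(header) + len(code) + len(end))


for size in (142, 138, 136):
    print(size, "bytes of code ->", minimal_exe_size(size), "byte file")
```

Under these assumptions the fixed overhead is 36 bytes, 138 and 136 bytes of code give the 176 and 172 byte files reported later in the thread, and 142 bytes would give 180 rather than 178, because the code is padded to a whole longword.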

Don_Adan · 20 November 2018, 15:03

OK, 2 bytes gained, and in this version truc no longer has to fit in a PC-relative offset, which is more universal, I think.

Code:

		exeobj
		bopt x+,f+,c+,O+,wo-,OG+,OT+,ODf+,ODg+
		MC68020
	output "ram:spigot2"

; 68020 size-optimised spigot

; header
nbch equ 1000			; number of digits
truc equ (nbch/2)*7		; constant used in several places

; init
		move.l	4.w,a6
		moveq	#4,d0		; DOS
		jsr	-$32a(a6)	; TaggedOpenLibrary
		move.l	d0,a6
		lea	zero(pc),a3		; points to the 00

; header message

		move.l	#truc,d7
		lea	2(A3,D7.L*2),a4	; a4 = end of buffer
; initial fill
		move.l	d7,d2
.fill
		move.w	#2000,-(a4)
		subq.l	#1,d2                   ; d2.l must be zero
		bne.s	.fill

; main loop, requires a4=buf and d7=truc - note: a3 is free
		moveq	#5,D3
		mulu.w	(A4),D3		; 2000*5

		moveq	#msg0-zero,d1
		bsr.s	aff
.loop1
		moveq	#0,d5
		move.l	a4,a1
		move.l	d7,d0			; i
		move.l	d7,d4
		add.l	d4,d4
		subq.l	#1,d4			; i*2-1
.loop2
		mulu.l	d0,d5
		move.w	(a1),d6			; r[i]
		mulu.w	d3,d6			; r[i]*10000
		add.l	d6,d5			; d = d*i + r[i]*10000
		divul.l	d4,d6:d5
		move.w	d6,(a1)+		; d%b -> r[i]
		subq.l	#2,d4
		subq.l	#1,d0
		bgt.s	.loop2

; print digits
		divul.l	d3,d4:d5		; d/10000
		add.l	d2,d5			; +c
; print the number in d5
.affd5
		moveq	#4,d1			; number of digits
		moveq	#10,d0			; div.l shortcut
.loop
		divul.l	d0,d2:d5
		addi.b	#"0",d2
		move.b	d2,-(a3)
		subq.l	#1,d1                   ; d1.l must be 0
		bne.s	.loop
; a3 now points to "nnnn",0: print it directly via aff below
; normal cli print
		bsr.s	aff
		addq.l	#4,A3			; restore A3
; next
		move.l	d4,d2			; c = d % 10000;
		lea	28(a4),a4		; 14 fewer iterations next time

		moveq	#lf-zero,d1		; must be -14

		add.l	D1,d7			; d7 -= 14
		bgt.s	.loop1

aff
		add.l	a3,d1
		jmp	-$3b4(a6)		; PutStr

msg0
		dc.b	"pi calculator v6"	; 16 bytes
lf
		dc.b	10,0
		ds.b	8			; padding so that lf-zero = -14
		ds.b	4
zero
		ds.b	2
buffer
		ds.b	truc*2
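This version leans on two bits of layout arithmetic: `lf-zero` must come out as -14, so that the same moveq value both steps d7 and, on the final fall-through into `aff`, points d1 at the linefeed; and both negative offsets must fit moveq's signed 8-bit immediate. A small Python check, with the sizes read off the data section above (`LAYOUT` and `label_offsets` are illustrative names):

```python
# Sizes of the data items from msg0 onwards, in source order.
LAYOUT = [
    ("msg0", 16),    # dc.b "pi calculator v6"
    ("lf", 2),       # dc.b 10,0
    ("pad", 8),      # ds.b 8
    ("digits", 4),   # ds.b 4, filled backwards by move.b d2,-(a3)
    ("zero", 2),     # ds.b 2, NUL terminator after the digits
]


def label_offsets(items):
    """Return each label's byte offset from msg0."""
    offsets, pos = {}, 0
    for name, size in items:
        offsets[name] = pos
        pos += size
    return offsets


off = label_offsets(LAYOUT)
lf_zero = off["lf"] - off["zero"]        # value loaded by moveq #lf-zero,d1
msg0_zero = off["msg0"] - off["zero"]    # value loaded by moveq #msg0-zero,d1
assert -128 <= lf_zero <= 127 and -128 <= msg0_zero <= 127   # moveq range

truc = (1000 // 2) * 7
d7, groups = truc, 0
while d7 > 0:              # add.l D1,d7 / bgt.s .loop1, with d1 = lf-zero
    d7 += lf_zero
    groups += 1

print(lf_zero, msg0_zero, groups, groups * 4)
```

With nbch = 1000 the loop runs 250 times and d7 reaches exactly 0, so 1000 digits are printed before the final linefeed.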

ross · 20 November 2018, 17:09

You can gain 4 more bytes not using a4 at all but sp instead.

But surely someone would say that you are cheating because you have to use a non-minimum AmigaOS stack

Don_Adan · 20 November 2018, 18:03

Quote:

Originally Posted by ross

You can gain 4 more bytes not using a4 at all but sp instead.

But surely someone would say that you are cheating because you have to use a non-minimum AmigaOS stack

Yes, an 8k stack will be enough for this version; by PC standards that is a very small stack. By Amiga standards it is a bit larger than normal, but perhaps AmigaOS 3.5/3.9 has an 8k stack by default?

NorthWay · 20 November 2018, 23:55

Quote:

Originally Posted by ross

You can gain 4 more bytes not using a4 at all but sp instead.

But of course!

176 bytes filesize, 138(137) bytes program size. Works like a charm.
Is there some kind of hunk type that has less overhead?

Don_Adan · 21 November 2018, 00:36

Quote:

Originally Posted by NorthWay

But of course!

176 bytes filesize, 138(137) bytes program size. Works like a charm.
Is there some kind of hunk type that has less overhead?

You can check the WHDLoad main executable, but I don't remember the difference: either no $000003F2 (HUNK_END) at the end, or the exe merged with the data file.
BTW, Amiga strings must be terminated with a null (zero) byte.

roondar · 21 November 2018, 01:43

Quote:

Originally Posted by ross

You can gain 4 more bytes not using a4 at all but sp instead.

But surely someone would say that you are cheating because you have to use a non-minimum AmigaOS stack

I hate to be that guy, but I would indeed find it a form of cheating (if OS3.5/3.9 has a bigger stack then it’s ok for running there, but still...). IMHO it’s not really in the spirit of the challenge to do this and then pray the extra unallocated but used stack space doesn’t overwrite something important.

Just my two cents, I’m still impressed by how small the actual code has gotten by now.

ross · 21 November 2018, 09:15

Quote:

Originally Posted by Don_Adan

Yes, an 8k stack will be enough for this version; by PC standards that is a very small stack. By Amiga standards it is a bit larger than normal, but perhaps AmigaOS 3.5/3.9 has an 8k stack by default?

I think it's always 4k (but surely enlarged at the very start in every serious environment).

Quote:

Originally Posted by NorthWay

Is there some kind of hunk type that has less overhead?

No..

Quote:

Originally Posted by roondar

I hate to be that guy,

Well, it's clear that 68k code is very often denser than that of other architectures (a scientific trial with various algorithms of different sizes would probably show it is the densest of all).
But in this thread there are 'some' false or biased claims and comparisons of oranges with apples ..

It would be much more honest in such a case to use (still for DOS) the .EXE format. What is the point of comparing a headerless format made for an environment with 70s limitations with a format that can support hundreds of megabytes of (different types of) RAM?

From the start, pure code should have been used, excluding even the output routine (so only creating the string in memory).

But probably it would have been a lot less fun

Don_Adan · 21 November 2018, 09:44

Quote:

Originally Posted by ross

I think it's always 4k (but surely enlarged at the very start in every serious environment).

No..

Well, it's clear that 68k code is very often denser than that of other architectures (a scientific trial with various algorithms of different sizes would probably show it is the densest of all).
But in this thread there are 'some' false or biased claims and comparisons of oranges with apples ..

It would be much more honest in such a case to use (still for DOS) the .EXE format. What is the point of comparing a headerless format made for an environment with 70s limitations with a format that can support hundreds of megabytes of (different types of) RAM?

From the start, pure code should have been used, excluding even the output routine (so only creating the string in memory).

But probably it would have been a lot less fun

No, the stack size was changed. 4k was for AmigaOS 3.0/3.1, maybe for 2.0/2.1 too. For AmigaOS 3.5/3.9 it was 8k, I think, but I can't check this now. Many MUI programs need an 8k stack minimum; some need 32k. Amiga ports from Unix/PC need bigger stacks (often 100k or bigger). I found info that the Atari ST uses an 8k stack.

https://palma.strom.sk/doc/fpc/prog/progse94.html
https://palma.strom.sk/doc/fpc/prog/progse93.html

robinsonb5 · 21 November 2018, 12:03

Quote:

Originally Posted by ross

Much more honest for such a case would be to use (always for DOS) the .EXE format. What is the point compare a headerless format made for an environment with 70s limitations with a format that can support hundreds of megabytes of (different type of) ram?

Better yet would be to define a minimal embedded environment to avoid the overhead of executable formats at all. Imagine both a 68K and X86 machine where the code can live in ROM, and the program can poke a hardware register to output a byte. That, I think, would come as close to eliminating unfair advantages for either platform as possible, and would allow meaningful comparisons with other architectures, too.

NorthWay · 21 November 2018, 12:27

I think you can save another 2 bytes:
Add after "lf dc.b 10,0": "const dc.w 2+truc*2,truc,10000,truc,msg0-zero,2000" then instead of "lea zero(pc),a3" do "lea const(pc),a3" "movem.w (a3)+,a4/d7/d3/d2/d1/d0" "add.l a3,a4". The fill loop uses d0 instead of an immediate. Remove the opcodes that set a4/d7/d3/d2/d1 before the loops.

movem.w sign-extends, and we got lucky here that we could find 6 immediates, so the lf offset stays at 14.
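Two properties of MOVEM carry this trick. Memory is read into registers in mask order (d0 to d7, then a0 to a7) no matter how the list is written, which is why the constant table has to be stored in reverse, the point ross makes in his reply; and MOVEM.W sign-extends each word to 32 bits, so a negative offset such as msg0-zero arrives ready for `add.l`. A hypothetical Python model (msg0-zero is taken as -30 purely for illustration):

```python
REG_ORDER = [f"d{i}" for i in range(8)] + [f"a{i}" for i in range(8)]


def movem_w_load(words, reglist):
    """Model MOVEM.W (An)+,<list> loading from memory.

    Registers are filled in mask order d0..d7 then a0..a7, regardless of
    how the list is written, and each word is sign-extended to 32 bits
    (returned here as a signed Python int).
    """
    it = iter(words)
    regs = {}
    for reg in REG_ORDER:
        if reg in reglist:
            w = next(it) & 0xFFFF
            regs[reg] = w - 0x10000 if w & 0x8000 else w
    return regs


truc = (1000 // 2) * 7
msg0_zero = -30                      # assumed value of msg0-zero, for illustration
regs = {"a4", "d7", "d3", "d2", "d1", "d0"}

as_posted = movem_w_load([2 + truc * 2, truc, 10000, truc, msg0_zero, 2000], regs)
reversed_ = movem_w_load([2000, msg0_zero, truc, 10000, truc, 2 + truc * 2], regs)
print(as_posted)
print(reversed_)
```

With the table as first written, d0 would receive 2+truc*2 and a4 would receive 2000; reversed, every register gets the value the list intended.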

ross · 21 November 2018, 16:29

Quote:

Originally Posted by NorthWay

I think you can save another 2 bytes:
Add after "lf dc.b 10,0": "const dc.w 2+truc*2,truc,10000,truc,msg0-zero,2000" then instead of "lea zero(pc),a3" do "lea const(pc),a3" "movem.w (a3)+,a4/d7/d3/d2/d1/d0" "add.l a3,a4". The fill loop uses d0 instead of an immediate. Remove the opcodes that set a4/d7/d3/d2/d1 before the loops.

movem.w sign-extends, and we got lucky here that we could find 6 immediates, so the lf offset stays at 14.

Great catch! (you only need to reverse the const fill, from d0 to a4).

Actually this can be combined with the stack trick (removing the adda.l a3,a4, adding 28 to the const table to fill a4, and replacing lea 28 with adda.l a4,sp).

For a grand total of 136 bytes of code

ross · 21 November 2018, 16:58

uh, just noticed that the amiga executable this way becomes 172 bytes, the same as the headerless .COM

(header inflation costs 25% for this small routine, so the unfairness is huge..)

litwr · 21 November 2018, 17:03

Quote:

Originally Posted by meynaf

It is not a VAX-like ISA. The 68k didn't blindly provide everything like VAX; they did a statistical survey of programs to provide only useful instructions.
Unlike, of course, x86 which was made without any care about what's useful and what's not (especially the late revisions, done for marketing purposes more than anything else).

Indeed, 68k is not a 100% VAX clone, but it is much closer to VAX than x86 or ARM. IMHO the address registers are an original idea but rather a poor one. It is often very useful to use the same register for data and addresses.

Quote:

Originally Posted by meynaf

And anyway x86 isn't 100% position-independent, even with segmentation. You can still load constants into segment registers and end up with absolute code.
Code being position independent depends on what the programmer is doing. It's not linked to ISA quality.

The COM format is an example of PIC showing the x86 advantage over 68k. Of course, it is always possible to write bad code, as you mentioned, by loading something strange into a segment register, but try loading something similar into a 68k address register and then using it, and you get the same consequences.

Quote:

Originally Posted by meynaf

Of course i won't write code for x86 !
But you say that GCC can't be beaten so GCC can generate x86 code if you don't want to do that yourself.
Then we compare with 68k code that i can write.
And 68k code will always be smaller (and you know this).

That is interesting. Do you have an algorithm to implement? But it will take some time for me. Don't point to large algorithms. We can see that the tiny spigot algorithm has been optimized for 68k for months.

However, this time I have a task which IMHO shows the very poor code density of the 68000 compared to the 8086. I have a 40 KB table which contains linked records, so every record has links to several other records. Using the 8086 I can allocate only 2 bytes per such link. It looks like 68k requires 4 (!) bytes per link because MOVEA and ADDA perform sign extension. So instead of just MOV SI,[offset+SI], where SI points to the current record, I need some kind of slow and complex math if I want to have 2-byte links.
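The sign-extension point can be illustrated with a tiny Python model of ADDA.W. The table start address is made up; only the 40 KB size comes from the post. Offsets from $8000 upward become negative after sign extension, so with An at the start of the table, the top 8 KB of a 40 KB table is out of reach of a plain 2-byte link added this way.

```python
def sign_extend_16(w: int) -> int:
    """What MOVEA.W / ADDA.W do to a 16-bit source before using it."""
    w &= 0xFFFF
    return w - 0x10000 if w & 0x8000 else w


def link_target(base: int, link: int) -> int:
    """Address reached by ADDA.W link,An with An = base (32-bit wraparound)."""
    return (base + sign_extend_16(link)) & 0xFFFFFFFF


TABLE_SIZE = 40 * 1024          # the 40 KB table from the post
base = 0x20000                  # hypothetical table start address

print(hex(link_target(base, 0x7000)))   # offset below 32 KB: lands inside the table
print(hex(link_target(base, 0x9000)))   # offset above 32 KB: lands before the table
print(TABLE_SIZE - 0x8000)              # bytes of the table a signed 2-byte link can't reach here
```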

Quote:

Originally Posted by meynaf

I told you that it worked only for small amounts of memory.
You say i'm completely wrong but you just don't contradict that point.
And now i'm saying code that makes such assumptions is dirty.

You forget that originally your program asked for the number of digits to output and allocated memory had to be dynamic. I just didn't do many efforts - unlike you - to reduce code size.
You really badly want to win here and i'm wondering why.

It gives you more than 9200 digits; that is the limit for 2-byte values in the main array. If you need more digits you also need 4-byte values, and that changes the code too much. Do you want to use an array larger than 64 KB? That makes the code slower, and my benchmark is limited to 64 KB bounds because I have versions for the 6502, Z80, ... However, this has nothing to do with your complaint about the cleanliness of the code. It is correct and neat code. The original one shows the limit and asks for new input if the entered number is outside the shown bounds.
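A quick way to see where 2-byte storage runs out, assuming the sizing used by the 68k routine in this thread (truc = (nbch/2)*7 terms, r[i] holding a remainder of division by 2i-1 in an unsigned word). litwr's own program may be sized differently, and the 32-bit intermediate products are not checked here. `fits_in_words` is a hypothetical helper.

```python
def fits_in_words(nbch: int) -> bool:
    """True if every remainder r[i] stays within an unsigned 16-bit word.

    Assumes truc = (nbch/2)*7 terms and that r[i] is a remainder of
    division by 2i-1, as in the 68k spigot earlier in the thread.
    """
    truc = (nbch // 2) * 7
    largest_remainder = 2 * truc - 2   # divisor 2*truc-1 at i = truc
    return largest_remainder <= 0xFFFF


max_digits = max(n for n in range(2, 20000, 2) if fits_in_words(n))
print(max_digits)
```

Under that assumption it reports 9362 digits, which is consistent with "more than 9200 digits".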

Quote:

Originally Posted by roondar

All of the 50MHz 486DX's were affected. I have no sales figures for them, but I'd be rather surprised if only a few were sold initially. It also prompted Intel to do a redesign of the CPU (into a clock for clock lower performance part even!) and release whole new models a few years later to effectively replace the defective design - which is exactly the same as what happened with the 68LC040 (though there Motorola ended up fixing it by enabling the FPU instead).

Sorry, but you are wrong again - https://ancientelectronics.wordpress...r-speed-demon/

Quote:

Originally Posted by roondar

But I expected you to say something like this - problems for Intel are always made as small as possible by you (irrespective of how bad they actually were) and then promptly ignored for your articles, where problems for Motorola are always made as big as possible (again, irrespective of how bad they actually were) and you promptly note they have to be added to your articles. This comes back to what I said earlier about you being rather biased.

Please support your accusations with facts.

Quote:

Originally Posted by roondar

And seriously, the FDIV bug is pretty much the best known processor bug of all time and ended up costing Intel nearly half a billion dollars, yet you claim there was no real issue because it didn't really affect users that much. You're just not looking at history in a realistic way if you make claims like this.

I haven't written about it, for two reasons:
1) it actually happened very rarely (with a chance of about 1 in 1 billion) - https://en.wikipedia.org/wiki/Pentium_FDIV_bug
2) my article is about CPUs before 1991; the last CPUs considered are the 80486 and 68040.

Quote:

Originally Posted by roondar

Moreover, it fits a pattern - every time you find a CPU bug in the Motorola range you (almost gleefully) note you simply have to add that to the Motorola article - irrespective of the severity of the bug. Yet every time you are told about/find a CPU bug in the Intel range you make up excuses as for why that bug doesn't really matter and definitely won't add it to your Intel article (again, irrespective of the severity of the bug).

IMHO the 68LC040 initially had an absolutely intolerable defect. I have written about notable x86 bugs and quirks; read my article carefully and you will find them.

Quote:

Originally Posted by roondar

Your article leaving out the initial 486 problems, or the FDIV problems (or, if you want non-FPU problems the much more common Pentium F0 0F bug -which crashes any OS that doesn't include a workaround- and the likewise OS-killing 286 popf bug) while harping on about the 68000 clr 'bug' and the 68LC040's bug shows very clearly that you are not all that interested in an unbiased look at these processors, you just want your side to 'win' and will simply leave out any inconvenient facts that happen to show that reality isn't as black and white. CPU bugs are part of any CPU architecture, Intel's x86 is not an exception and has had a few big/impactful ones.

The POPF bug is actually very difficult to trigger, so I can't call it very interesting. Try to find an example where such a bug can occur. The F0 0F bug has nothing in common with the FDIV bug, and it is almost impossible to hit in ordinary programming. It also relates to later processors.

Quote:

Originally Posted by roondar

I'm glad you agree with me on that point - Intel was not a big deal in high end systems until much later and 68k becoming less popular in that arena didn't end up getting Intel a higher market share in that segment.

It is quite possible but we need the real data for this to be sure.

Quote:

Originally Posted by roondar

In other words, you actually agree that what I wrote is accurate - the 68040 is faster than the 486 clock for clock, both in integer and FP (perhaps not by much in some rare cases, but it still is). Which still means that your original statement about Motorola lacking talent is, well, nonsense.

The 68040 was very rare. I am almost absolutely sure that people at EAB never met it in the 90s. So we can only believe published benchmarks, which show that the 68040 is about 25% faster than the 486 with integer and some FP calculations. But don't miss the fact that the 68040 is terribly slow (3-4 times at least) with some common FP functions that its FPU lacked. That was initially the first point that made it rather unattractive, because a 68030+68882 could notably outperform the 68040 on quite common FP tasks.

Quote:

Originally Posted by roondar

68000 is faster than 8086. Also, 68000 can in fact be coupled to 68881 (for instance, see: http://amiga.resource.cx/exp/alphatronfpu) and this combination, while admittedly extremely rare, is highly, highly likely to be faster than a 8088+8087 given that the 68000 is much faster than 8088 and 68881 is much faster than 8087.

The 68881 appeared only in 1984, and it was a bit odd to add it to the 68000 instead of the 68020. My article clearly states that the 68000 is faster than the 8086. Why do you write about it several times? Almost every IBM PC compatible computer had a socket for the 8087, so it was easy for everybody to get FP performance much higher than with the 68000.

Quote:

Originally Posted by roondar

68020 is faster than 80286 clock for clock and was available at higher clock speeds. 68030 is faster than 80386 clock for clock and was available at higher clock speeds.

My article claims exactly the same. And again, what is the reason to repeat it?

Quote:

Originally Posted by roondar

For both 68020 and 68030 you are wrong in the claim that the 80286 is faster at 'some tasks'. No benchmark or real life results show this. In fact, most benchmarks show the 68020 to be roughly comparable to the 80386 at the same clock speeds and the 68030 to be clearly faster than the 80386.

The 80286 has much faster division, EA calculation and maybe some other things. Arithmetic and move instructions have almost the same speed. So IMHO the 68020 can easily beat the 80286 only with 32-bit calculations, but with 8- and 16-bit data it depends on the benchmark. So it is you who is rather wrong in this case.

Quote:

Originally Posted by roondar

68060 is faster than Pentium clock for clock, but Pentium was available at higher clock speeds.

The 68060 is an exotic, almost non-existent beast. Why mention such a rarity?! Nothing is clearly known about it. Even the Intel 860 was much more popular.

Quote:

Originally Posted by roondar

While I do think those are cool pictures, that page contains no pricing info at all. What I really wanted to see was outside evidence (such as adverts etc) as personal memory is often wrong (for everyone, not just you).

Also, the price you give for case/psu/fdd/mouse together is $80 combined, which sounds extremely low for 1992. Lastly, this system does not actually have a sound card - which would add about $100 again nor does it include MS DOS or Windows, which IIRC also cost something around $100 at the time (though finding prices for MS DOS has proved challenging so I'm not fully sure on that one).

So your actual example PC, excluding the HDD, but including soundcard -for parity with the A1200- is about $800 (or $700 without sound card & up to around $900 with OS/soundcard). And that is supposing all your numbers are in fact accurate, which I can't verify (other than I'm pretty certain that a case+PSU by itself used to cost about $100 so $80 for case/psu/fdd/mouse sounds far too low) - I'm still hopeful some advert or something can be dug up as this is a fairly interesting point.

But even your price estimate is clearly a good deal more expensive than the A1200 was (by 33-50% depending on OS/soundcard cost) and as I've pointed out, the A1200 is fairly comparable to such a machine and absolutely outperforms it on several tasks.

Should I scan my bill for you? It was a special agreement, all text in Russian. BTW I paid April prices, so by the end of 1992, when the A1200 was released, prices should have been lower. Thus it was quite possible to get a 386DX-25 for maybe $320. Prices elsewhere could also have been generally lower, because there were only a few computer-selling companies in Moscow at that time. Therefore it is quite possible that the best world-market price for a 386DX-25 was even below $300 by the end of 1992.

BTW I bought a Sound Blaster 16 at the beginning of 1993. IIRC it was slightly above $100, but it was possible to buy a quite good and much cheaper 8-bit Sound Blaster card. So we get about the $900 you mentioned, but if I had bought my PC at the end of 1992 I would have paid rather less than $800.

Quote:

Originally Posted by roondar

I might be missing something here, but 4x64 does not make 384

Segments were the best idea for the late 70s. The 68k was always misled by the VAX architecture and empty, impractical theories. The 386 has 6 segments; 6*64 gives 384 KB.

Quote:

Originally Posted by roondar

That is not the point, the point is that the x86 ISA was 'supplanted' by different, albeit compatible (though only because they kept 'real mode' even when it no longer was useful in any way), architectures several times and Intel's primary motivation for doing so was to progress away from the inadequate design of the base 8086. This is not at all like the 68010 onwards, where most changes were fairly minor.

The 68000 was not good for a mass-market PC until at least 1984. Even Sir Clive Sinclair chose the 68008. Alan Sugar was making a good business with the Z80-based CPC and PCW from 1984 to 1991, almost the same period as the Amiga timeline. The 8088 won quite fairly.

Quote:

Originally Posted by roondar

Case in point: the incompatibilities between 68000 and 68020 are not as big as you make them out to be, other than caches impacting self modifying code (which you can disable and impact self modifying code on x86 just as badly) the only real one is the move from sr instruction and that is simply not that commonly used. There is a much rarer one involving idiots who used the upper byte of an address register to store information despite all documentation clearly stating you should never, ever, ever do this. Honestly, I'm of the opinion such code deserves to crash and burn.

I don't see anything wrong with reading SR. x86 allows this; 68k tried to follow theoretical opinions which mean nothing in practice and created an artificial, annoying incompatibility.

Quote:

Originally Posted by roondar

Likewise, while x86 is known for being compatible, quite a few programs fail to run as desired on a 286 onwards even if the CPU is running in real mode. This problem was so big there is even a special hardware based solution on most PC's up until the Pentium. More so, protected mode by itself is wholly incompatible with real mode and this caused all sorts of woes for DOS users back in the day where the resulting mess of memory drivers and DOS extenders meant that software regularly just wouldn't work unless you had special boot up floppies to make it compatible. It also affected non-DOS OS's - upgrading from an 8086 Unix PC to a 386 Unix PC would mean all 8086 applications you had would simply not work any more (due to the Unix on 386 machines running in protected mode) without recompilation. And remember, prior to Linux taking off in the late 1990's, you did not usually get the source code.

You exaggerate things very much. Almost all DOS programs work quite well with the 286 and higher CPUs. The only big problem is the ultra-fast speed of newer hardware; old software sometimes can't work at such speed. There are no problems with protected mode since the 386. It's easy to enter and leave it, and you can even work in real mode while having the advantages of protected mode. Indeed x86-64 is rather completely incompatible with the good old real mode, but though IMHO that is not good, it is rather unimportant today.

Quote:

Originally Posted by mc6809e

Lots of data from Intel concerning the 8086 vs 68000 vs 80286:
http://www.bitsavers.org/components/...port_Oct82.pdf

Thank you for interesting data. I have heard about such benchmark results but never met them.

Quote:

Originally Posted by meynaf

I don't usually control all digits, but as long as it starts with 31415926 and there is a bunch of 9 a little before 800th position, it's probably right

It is the same, but in the form of a poem - http://www.cadaeic.net/naraven.htm

Quote:

Originally Posted by Don_Adan

Code:

		exeobj
		bopt x+,f+,c+,O+,wo-,OG+,OT+,ODf+,ODg+
		MC68020
	output "ram:spigot2"

What cross-assembler do you use? It looks like vasm can't work with this source.
