I am seeking a way to Amiga 1200... - Page 2

litwr · 03 February 2017, 18:19

Quote:

Originally Posted by meynaf

But they are not the same. Arithmetic shift sets overflow bit, where logical shift does not.

This is a big oddity for me to keep a special instruction (LSL) which sets OF always to 0.

I don't like Intel's DIV because it doesn't set any flags. However Intel's SHL/SAL is perfect.

Quote:

Originally Posted by meynaf

REP+MOV are 2 instructions. MOVE+DBF are 2 instructions. No advantage on x86, apart that you can't tell the direction by reading the code and that pushing data does pre-decrement while DF=1 does post-decrement and is ill-suited for creating stacked data.

And Moto's ISA can do ADD.W (A0)+,D0 while Intel's ISA just can't.
We can even do MOVE.B (A0)+,-(A1) and x86 can't.
Of course 68k can move memory to memory with any addressing modes.
Sure you never wrote a disassembler for x86 (who could do that anyway ?).

Branches (DBF) are slower than a repeat prefix. Indeed Moto's ISA has their advantages but DF just works. You look for a good readability of ML but we passed the 70s long ago.

Modern architecture may execute several dozen instructions at once - it is beyond any human to write such codes with good efficiency. There are a lot of good disassemblers. Study more and you will write yet another one.

Quote:

Doesn't say what happens when an interrupt occurs.
On 68k this is simple. We swap stacks if previously in user mode, then address, sr and eventual extra data gets pushed on stack, then we go to supervisor mode, then we read the vector and jump there. Very easy. No change whether we're using mmu or not.
Now on x86 with protected mode ???

The second stack is an ambiguity. It is my humble opinion based on some experience. One stack is enough for everything. Without MMU your example shows very unsafe work, every program may corrupt any stacks. So DOS has hundreds of thousands of applications, used by hundreds of millions of users, without any of the problems you are trying to induce. DOS crash rate is not above Amiga's or Atari ST's.
x86 in protected mode uses gates. Trap and interrupt gates use common stack, task gates are a bit complicated but they are absolutely safe.

Quote:

Meh. Intel registers are not general purpose and this kills any benefit you could have.

Is it a joke? All x86/x86_64 registers may be used for data and addresses. There are exceptions for IP and SP. Some registers also have special functions.

Quote:

But not :
MOV EAX,[ESP+120]
MOV EAX,[BP]
... or i missed something with this damned sib byte.

Yeah, something is missed.
MOV EAX,[ESP+120] ;8B 44 24 78
MOV EAX,[EBP] ;8B 45 00
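
For anyone who wants to check these byte sequences by hand, here is a minimal Python sketch (not from the thread) that decodes the ModRM and SIB bytes of opcode 8B for memory operands. The function name `decode_mov_r32_rm32` is made up for illustration and the decoder deliberately handles only the forms needed here. It also shows why [EBP] costs a displacement byte: mod=00 with rm=101 means "disp32 only", so plain [EBP] has to be encoded as [EBP+0].

```python
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]

def decode_mov_r32_rm32(code: bytes) -> str:
    """Decode opcode 8B (MOV r32, r/m32) with a memory operand, 32-bit mode."""
    if code[0] != 0x8B:
        raise ValueError("this sketch only handles opcode 8B")
    modrm = code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod == 3:
        raise ValueError("register-to-register form is not handled")
    pos = 2
    if rm == 4:
        # rm=100 means a SIB byte follows; this is how ESP becomes a base.
        sib = code[pos]
        pos += 1
        scale, index, base = sib >> 6, (sib >> 3) & 7, sib & 7
        if mod == 0 and base == 5:
            raise ValueError("SIB with disp32 and no base is not handled")
        parts = [REG32[base]]
        if index != 4:  # index=100 means "no index register"
            parts.append(f"{REG32[index]}*{1 << scale}")
    elif mod == 0 and rm == 5:
        # mod=00 rm=101 is "disp32 only", so [EBP] needs mod=01 with disp8=0.
        raise ValueError("disp32-only form is not handled")
    else:
        parts = [REG32[rm]]
    if mod == 1:
        disp = int.from_bytes(code[pos:pos + 1], "little", signed=True)
    elif mod == 2:
        disp = int.from_bytes(code[pos:pos + 4], "little", signed=True)
    else:
        disp = 0
    address = "+".join(parts)
    if disp:
        address += f"{disp:+d}"
    return f"MOV {REG32[reg]},[{address}]"

print(decode_mov_r32_rm32(bytes.fromhex("8B442478")))
print(decode_mov_r32_rm32(bytes.fromhex("8B4500")))
```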

Quote:

Yeah, and facing many bugs in x86 world as well, including operating in systems and drivers.

Excuse me but this is close to a kind of a crazy state.

Quote:

Some languages such as cobol used bcd quite a lot. So it had some sense.

I had no opportunity to use Cobol but I read about it and find some of its features very attractive. BCD type is not among them. BCD is the dawn of computer technology like a flint to get fire. Operations with BCD are much slower than with binaries. They have only one advantage - faster decimal input/output. If decimal data are strictly constant then BCD is a good choice.

Quote:

Your above code does NOT get 5th bit. You need TEST AL,020H for this - and the other flags are enough.
You can get 5th bit of AL with BTS instruction.
68k has BTST instruction.
Therefore, parity bit is useless.

TEST AL,0A0H
JS label_1 ;bit 7 is set
JPO label_2 ;bit 5 is set

label_1:
JPE label_3 ;bit 5 is set
...

BTST (like BT at x86) can work with one bit only.
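
The parity trick above can be checked exhaustively. The following Python sketch (not from the thread) models only the SF and PF results of TEST AL,0A0H and then applies the same branch logic as the JS/JPO/JPE sequence; the helper names `flags_after_test` and `classify` are hypothetical.

```python
def flags_after_test(al: int, mask: int = 0xA0):
    """Return (SF, PF) as TEST AL,mask would set them on x86."""
    result = al & mask & 0xFF
    sf = bool(result & 0x80)                   # sign flag = bit 7 of the result
    pf = bin(result).count("1") % 2 == 0       # PF is set on an even number of 1 bits
    return sf, pf

def classify(al: int):
    """Recover bit 7 and bit 5 of AL from SF and PF only."""
    sf, pf = flags_after_test(al)
    bit7 = sf
    # SF clear: the result is 0x00 (even parity) or 0x20 (odd) -> JPO means bit 5 set.
    # SF set:   the result is 0x80 (odd) or 0xA0 (even)        -> JPE means bit 5 set.
    bit5 = pf if sf else not pf
    return bit7, bit5

# Check every possible AL value against a direct bit test.
for al in range(256):
    assert classify(al) == (bool(al & 0x80), bool(al & 0x20))
```

The trick works because the mask leaves at most two bits, so once bit 7 is known the parity of the result fixes bit 5.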

Quote:

Wrong. 68020 can actually do this operation with LEA (A2,D3.L*8),A1 - and 80386 can probably do something similar.
Now do a division on the ARM. Oh, well. It doesn't even have it...

The fact individual instructions can do a lot does not mean this whole bunch is really useful. Try to write a big enough program with ARM and you will see the opposite situation.
Btw do you have ARM assembly for your pi calculation program ?

Look at BBC Micro folder. Your LEA example requires address registers.

ARM may do the calculation with data registers too.

ARM may even do R1 <- R2*64 - R3 affecting flags or not.
Pi-spigot is based mostly on division. Division occupies more than 60% of execution time. However, ARM1 without hardware division is faster than the 68020. So it is evident that ARM with the code without BCD and division is about 50% faster than 68020.

litwr · 03 February 2017, 18:32

Quote:

Originally Posted by meynaf

In matter of BCD, 6502 got it wrong. Code becomes unreadable because you never know if your ADC/SBC is executed with "D" bit set or not.

But 6502 proposes the fastest way to work with BCD. IMHO it would be better to ignore BCD at all and to add one or two useful instructions. BCD is a flaw of 6502. Its another flaw is the overflow flag, it should be ignored too. All other components of 6502 are almost perfect.

Quote:

I could write code for an example in 68k and someone else does x86 or arm, to see if and how 68k is superior to x86 and arm (or, for that matter, to anything else). Any sample of significative size (20-40 instructions) doing some useful work is ok.
For example I can do that pi-spigot main loop in just 9 instructions. I don't think x86 can do that. Nor do i think arm can. And this, while it's too short to show much anyway.

Please start a new thread, show 9 instructions pi-spigot, ..

meynaf · 03 February 2017, 20:09

Quote:

Originally Posted by litwr

This is a big oddity for me to keep a special instruction (LSL) which sets OF always to 0.

It's less stupid than keeping a pair of exactly identical instructions (SHL/SAL)

On some 68k (like 68020) it's faster than ASL - because it does not have to compute V bit.

Quote:

Originally Posted by litwr

I don't like Intel's DIV because it doesn't set any flags. However Intel's SHL/SAL is perfect.

Intel's SHL/SAL is 2 opcodes for same operation, so it's not perfect - it's just plain stupid.
Note that it's not the only case when two operations are exactly the same thing.

Quote:

Originally Posted by litwr

Branches (DBF) are slower than a repeat prefix.

Not with dual issue pipelines. Prefixes are also more difficult to decode.
And DBcc is also more versatile, can do things such as DBCS (with carry), which x86 prefixes can't.

Quote:

Originally Posted by litwr

Indeed Moto's ISA has their advantages but DF just works.

Except that sometimes you will forget to set DF correctly and get random bugs depending on where the code is called from...

Quote:

Originally Posted by litwr

You look for a good readability of ML but we passed the 70s long ago.

68k has good readability of ML.

Quote:

Originally Posted by litwr

Modern architecture may execute several dozen instructions at once - it is beyond any human to write such codes with good efficiency.

Not true. Modern architecture with strong OoO does not need much special care.
And compilers are very poor at this task anyway (always have, always will).

Quote:

Originally Posted by litwr

There are a lot of good disassemblers. Study more and you will write yet another one.

On x86 no two disassemblers will give a result that match the docs. Too many ambiguous encoding cases.

Quote:

Originally Posted by litwr

The second stack is an ambiguity.

No.

Quote:

Originally Posted by litwr

It is my humble opinion based on some experience.

What experience ?

Quote:

Originally Posted by litwr

One stack is enough for everything.

No.

If a user program has full stack (so pushing something more will trigger protection fault) and then an interrupt comes, what happens with single stack ? Crash in supervisor mode !

Even better. With an MMU in paged mode, it simply can't work anymore.
There you *NEED* a new stack pointer as the old one goes in memory that's remapped user land. Even x86 appears to be changing stack for this. At least 68k has consistent behavior in protected and not protected environments.

Quote:

Originally Posted by litwr

Without MMU your example shows very unsafe work, every program may corrupt any stacks.

My example isn't unsafe. You just didn't understand it.

Quote:

Originally Posted by litwr

So DOS has hundreds of thousands of applications, used by hundreds of millions of users, without any of the problems you are trying to induce.

I'm not trying to induce problems.

Quote:

Originally Posted by litwr

DOS crash rate is not above Amiga's or Atari ST's.

DOS doesn't multitask.
It took Windows decades to be stable enough for normal use.

Quote:

Originally Posted by litwr

x86 in protected mode uses gates. Trap and interrupt gates use common stack, task gates are a bit complicated but they are absolutely safe.

So x86 in protected mode works very differently than normal mode. Same code can run on 68k with and without MMU, even supervisor code.

Quote:

Originally Posted by litwr

Is it a joke? All x86/x86_64 registers may be used for data and addresses. There are exceptions for IP and SP. Some registers also have special functions.

No, it's not a joke. Many operations can only use (E)AX. You can't use anything but CL for counting shifts, etc.
You can't do byte operations on ESI, EDI, you can't use increment/decrement address mode on EAX, etc.
The list is very long.

Quote:

Originally Posted by litwr

Excuse me but this is close to a kind of a crazy state.

Don't you remember your last BSOD ?

Quote:

Originally Posted by litwr

I had no opportunity to use Cobol but I read about it and find some of its features very attractive. BCD type is not among them. BCD is the dawn of computer technology like a flint to get fire. Operations with BCD are much slower than with binaries. They have only one advantage - faster decimal input/output. If decimal data are strictly constant then BCD is a good choice.

If i recall correctly, Cobol ONLY operates on such datatypes...

Quote:

Originally Posted by litwr

TEST AL,0A0H
JS label_1 ;bit 7 is set
JPO label_2 ;bit 5 is set

label_1:
JPE label_3 ;bit 5 is set
...

BTST (like BT at x86) can work with one bit only.

That's just one instruction removed. Is a flag worth keeping for so little use ?
We can test 4 bits with MOVE D0,CCR + BNE/BCS/BVS/BMI.

Quote:

Originally Posted by litwr

Your LEA example requires address registers.

That's what i told you, address registers are also useful for computations.

Quote:

Originally Posted by litwr

ARM may do the calculation with data registers too.

Of course as it does not have address registers.

Quote:

Originally Posted by litwr

ARM may even do R1 <- R2*64 - R3 affecting flags or not.

... which is a totally useless operation.

But ARM can't do :
add.w d0,(a0)+
... which is a lot more useful.

Or even :
eori.l $80808080,(a1)+

In fact ARM can't really work with 16-bit words. It has to use extra instructions for load/store (can't operate in memory). Can't access misaligned data. Etc.

Quote:

Originally Posted by litwr

Pi-spigot is based mostly on division. Division occupies more than 60% of execution time. However, ARM1 without hardware division is faster than the 68020. So it is evident that ARM with the code without BCD and division is about 50% faster than 68020.

But that ARM1 didn't run at 14Mhz and no, it is not faster - than something that's just an estimate and DIV timing is very variable.

Quote:

Originally Posted by litwr

But 6502 proposes the fastest way to work with BCD.

No it does not. You have to use an extra instruction (SED) before. 68k does BCD directly.
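
As a side note for readers unfamiliar with packed BCD, here is a minimal Python sketch (not from the thread) of what a one-byte decimal add with carry has to compute, whichever way a CPU exposes it (a mode bit, a dedicated instruction, or an adjust instruction). The function name `bcd_add_byte` is hypothetical and the sketch assumes both inputs are valid packed BCD (each nibble 0 to 9).

```python
def bcd_add_byte(a: int, b: int, carry_in: int = 0):
    """Add two packed-BCD bytes; return (packed BCD result, carry out)."""
    lo = (a & 0x0F) + (b & 0x0F) + carry_in    # low decimal digit
    hi = (a >> 4) + (b >> 4)                   # high decimal digit
    if lo > 9:                                 # decimal carry from low to high digit
        lo -= 10
        hi += 1
    carry_out = 0
    if hi > 9:                                 # decimal carry out of the byte
        hi -= 10
        carry_out = 1
    return (hi << 4) | lo, carry_out

print(hex(bcd_add_byte(0x19, 0x28)[0]))        # 19 + 28 = 47 in decimal
```

Chaining the carry from byte to byte gives multi-digit decimal addition, which is the use case both posters are arguing about.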

Quote:

Originally Posted by litwr

IMHO it would be better to ignore BCD at all and to add one or two useful instructions. BCD is a flaw of 6502. Its another flaw is the overflow flag, it should be ignored too. All other components of 6502 are almost perfect.

6502 has irregular addressing modes. Sometimes they are available, sometimes not.
It has way too few registers. It doesn't even have proper ADD and SUB instructions (must CLC/SEC before).
Etc.
So no, other components aren't "almost perfect".

6502 has very few transistors and remarkable instruction timings. And that's all for its qualities.

Quote:

Originally Posted by litwr

Please start a new thread, show 9 instructions pi-spigot, ..

Would you post code if I start a new thread ? If not, why should I, as you're alone here defending the undefendable ?

For spigot 9-instruction main loop it's like this :

Code:

.loop
 mulu.l d0,d5
 move.w (a3),d6
 mulu.w d3,d6
 add.l d6,d5
 divul.l d4,d6:d5
 move.w d6,(a3)+
 subq.l #2,d4
 subq.w #1,d0
 bne .loop
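
For readers who do not read 68020 assembly, the loop above can be transcribed line by line into Python. This is only a sketch, not the poster's code: the parameter names (`carry` for d5, `scale` for d3, `count` for d0, `divisor` for d4, the list `a` for the words at a3) are guesses at the register roles, and the upper word of d0 is assumed to be zero.

```python
MASK32 = 0xFFFFFFFF

def inner_loop(a, carry, scale, count, divisor):
    """One pass of the 9-instruction loop; updates a in place, returns final d5."""
    idx = 0
    while True:
        carry = (carry * count) & MASK32                        # mulu.l d0,d5
        term = (a[idx] & 0xFFFF) * (scale & 0xFFFF)             # move.w (a3),d6 / mulu.w d3,d6
        carry = (carry + term) & MASK32                         # add.l d6,d5
        carry, remainder = divmod(carry, divisor & MASK32)      # divul.l d4,d6:d5
        a[idx] = remainder & 0xFFFF                             # move.w d6,(a3)+
        idx += 1
        divisor -= 2                                            # subq.l #2,d4
        count = (count - 1) & 0xFFFF                            # subq.w #1,d0
        if count == 0:                                          # bne .loop
            break
    return carry

digits = [2, 2, 2]
print(inner_loop(digits, carry=0, scale=10, count=3, divisor=5), digits)
```

The sample values are arbitrary and only exercise the arithmetic; the real program's initial register contents are not shown in the thread.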

litwr · 04 February 2017, 19:18

I want to clarify my position. I like Amigas. The A500's sound made me a happy man in 1989-90. I was sure that the 68000 with 32-bit registers was much better than even the 80286. However, my first attempts to write programs in 68000 assembler showed that 68000 registers are a bit slow and its whole ISA is a bit clunky. With an 80286 @12MHz I could get smaller and much faster code than the code for a 68000 @7.1MHz. So I had to sell my A500 and buy a PC AT. The A1200 in 1992 was good, but a PC 386DX @25MHz with HDD, SuperVGA and Sound Blaster 16 was a much better choice for me in 1993. So I missed 68020 programming. I can note today that the 68020 is quite capable of matching the 80386, but only at the same frequency. Motorola added as usual some bulky instructions to 68020...
68030, 68040, 68060 are just names for me. IMHO Motorola might evade unnecessary complexity and gain more frequency and speed. But they were wrong since they rejected the speed champion 6502.
I forgot to note yet another Motorola oddity: DBRA, which stops at the value -1. I can write MOV CX,5 L ... LOOP L and get a loop that runs 5 times, but I have to write MOV 4,R0 L ... DBRA R0,L for a 5-iteration loop.

Quote:

Originally Posted by meynaf

It's less stupid than keeping a pair of exactly identical instructions (SHL/SAL)

Are you ok? I repeat the third time. SAL and SHL are not different, there is one opcode for the both.

Quote:

Originally Posted by meynaf

On some 68k (like 68020) it's faster than ASL - because it does not have to compute V bit.
Intel's SHL/SAL is 2 opcodes for same operation, so it's not perfect - it's just plain stupid.
Note that it's not the only case when two operations are exactly the same thing.

Quote:

Not with dual issue pipelines. Prefixes are also more difficult to decode.
And DBcc is also more versatile, can do things such as DBCS (with carry), which x86 prefixes can't.

Prefixes are faster.

There are also LOOP, LOOPNE, LOOPE. However with modern CPU DEC CX JNZ is much faster than LOOP. Use DEC CX JC JNE instead of DBCS. Indeed, DBCS looks better.

Quote:

Except that sometimes you will forget to set DF correctly and get random bugs depending on where the code is called from...

It is quite easy with Linux. If a program calls some system service with DF=1 (or DF=0) then the program crashes.

If somebody forgets to set DF correctly then it will be better for all that he would be far away from programming. DF is the first thing to check when a string instruction is used. You could even say about a case when somebody forgets that 2x2=4.

I am really sad that you use so poor argumentation. 680x0 have their advantages but 80x86 have them too.

Quote:

68k has good readability of ML.

Could you decode 0A 05 00 30 10 C5 4E 75? If you are capable for this task then I express my deep admiration to this archaic and circus like capability.

Quote:

Not true. Modern architecture with strong OoO does not need much special care.
And compilers are very poor at this task anyway (always have, always will).

Did you try to fight with optimizing compiler? It is possible for the small sized code only and takes a lot of health. I made a small test - http://litwr2.atspace.eu/art-pls/com...sources-e.html - it shows that plain assembly programming gets codes up to 2 times slower than C/C++ optimizing compiler.

Quote:

On x86 no two disassemblers will give a result that match the docs. Too many ambiguous encoding cases.

This shows that you have very little experience with x86. What is wrong with gdb? I also use edb. Even the old DOS debugger may be useful.
I have to add some information about stack. A lot of modern x86 programs may use unlimited stack technology. This allows, for example, unlimited recursion.

All this usefulness is based on one hardware stack.

Quote:

My example isn't unsafe. You just didn't understand it.

You disagree that without memory protection any program may corrupt the system stack?

Quote:

DOS doesn't multitask. It took Windows decades to be stable enough for normal use.

DOS is partially multitasking. It may have a lot of TSRs, drivers, ... My experience with Microsoft Windows 3.11 and 95 shows that they were as safe as DOS. Linux, Microsoft Windows XP/10 are safer.

Quote:

So x86 in protected mode works very differently than normal mode. Same code can run on 68k with and without MMU, even supervisor code.

No, it is not the point. Try to see pi-spigot sources in assembly for the heavily protected Linux. They are almost the same as for DOS. There are some problems for OS writers. I made a hobby bootstrapper to the long mode (x86_64), it was not difficult.

Quote:

No, it's not a joke. Many operations can only use (E)AX. You can't use anything but CL for counting shifts, etc.
You can't do byte operations on ESI, EDI, you can't use increment/decrement address mode on EAX, etc.
The list is very long.

It is possible to use SIL or DIL 8-bit registers. There are some limitations, of course. However, it is common knowledge that Intel x86 registers are general purpose. Several exceptions can't change the definition. As I noted earlier Motorola's fundamental flaw is attempts to use concepts instead of a practical approach. The concepts of two carries, address registers, or DBRA (-1) look poor for me.

Quote:

Don't you remember your last BSOD ?

Guru meditation?

I'm using Linux since 1998 so I missed so tragic messages.

Quote:

That's just one instruction removed. Is a flag worth keeping for so little use ?
We can test 4 bits with MOVE D0,CCR + BNE/BCS/BVS/BMI.

That's just one instruction removed.

I can count even two. MOVE D0,CCR can't check any bits so as SAHF (for 4 bits too). My example allows to check any bit + the 7th bit.

Quote:

That's what i told you, address registers are also useful for computations.

Yes, but they will be more useful if they were GP.

Quote:

Of course as it does not have address registers.

Any register of ARM may be used as an address register.

Quote:

... which is a totally useless operation.

Why? Please clarify your point. RSB is useful for me, look at pi-spigot sources.

Quote:

But ARM can't do :
add.w d0,(a0)+
... which is a lot more useful.

Or even :
eori.l $80808080,(a1)+

In fact ARM can't really work with 16-bit words. It has to use extra instructions for load/store (can't operate in memory). Can't access misaligned data. Etc.

Yes, but it is fast and cold. These are the main features of CPU. Motorola gave up because their CPU were slow and hot. It is sad. Moto's CPU have several attractive features.

Quote:

But that ARM1 didn't run at 14Mhz and no, it is not faster - than something that's just an estimate and DIV timing is very variable.

pi-spigot shows that ARM @8MHz can outperform 68020 @20MHz.

Quote:

No it does not. You have to use an extra instruction (SED) before. 68k does BCD directly.

If we have BCD arithmetic then it implies a lot of computations. SED is just one instruction in the beginning of such computation. CLD should be in the ending. If we add more instructions (MUL, DIV, ...) then we do not need the new opcodes for them in BCD mode unlike Moto's ISA. However, Motorola is better than Intel in this position but Motorola misses BCD div and mul - Intel doesn't miss them.

Quote:

6502 has irregular addressing modes. Sometimes they are available, sometimes not.
It has way too few registers. It doesn't even have proper ADD and SUB instructions (must CLC/SEC before).
Etc.
So no, other components aren't "almost perfect".

It was very easy to learn. It is much easier than 68020.

And it was the speed champion, the terminator. He was too good for us... IMHO if MOS technology could survive then the modern computers would be at least 2-3 times faster.

Quote:

Would you post code if I start a new thread ? If not, why should I, as you're alone here defending the undefendable ?

You may try. I can't guarantee often participation. What do you lose anyway? What should I defend? I am just curious to meet with nice coding.

Quote:

For spigot 9-instruction main loop it's like this :

It is a total disappointment.

You changed

Code:

         sub.l d6,d5
         sub.l d7,d5
         lsr.l d5

by mulu.l d0,d5 - it looks slower even for 68040. I am not sure about 68060.

idrougge · 04 February 2017, 20:20

The fact that x86 code is compact should not be disputed, but if I were writing a compiler, I would stay very far away from that architecture, especially in its incarnations that were concurrent with the 68k line — guess why virtually no-one outside of IBM ever used Intel processors for new architectures. Yes, there is a distinction between address and data registers, but otherwise the 68000 was the most orthogonal design of its time. Just like if you compare the 6502 to the Intel/Z80 CPUs.

Thorham · 05 February 2017, 01:40

Quote:

Originally Posted by litwr

So I had to sell my A500 and buy a PC AT.

You sold your Amiga500 for a 286 peecee? Really? Horrible

Quote:

Originally Posted by litwr

The A1200 in 1992 was good, but a PC 386DX @25MHz with HDD, SuperVGA and Sound Blaster 16 was a much better choice for me in 1993.

Peecees back then sucked big time, better performance or not. Nowadays they're pretty nice, but back then? No, just no.

Quote:

Originally Posted by litwr

I forgot to note yet another Motorola oddity: DBRA, which stops at the value -1. I can write MOV CX,5 L ... LOOP L and get a loop that runs 5 times, but I have to write MOV 4,R0 L ... DBRA R0,L for a 5-iteration loop.

That's a human oddity. Humans start counting at 1 instead of 0. It's what we're used to. This is similar to arrays starting at 0 instead of 1.

meynaf · 05 February 2017, 09:58

Quote:

Originally Posted by litwr

Motorola added as usual some bulky instructions to 68020...

Look at the horrors that got added to x86. More than 1,000 instructions in total. And you dare to say 68020 is bulky !

Quote:

Originally Posted by litwr

68030, 68040, 68060 are just names for me. IMHO Motorola might evade unnecessary complexity and gain more frequency and speed. But they were wrong since they rejected the speed champion 6502.

6502 can't scale up to 32 bits. 65c816 was expanded to 16 and didn't end up as a speed champion at all.

Quote:

Originally Posted by litwr

I forgot to note yet another Motorola oddity: DBRA, which stops at the value -1. I can write MOV CX,5 L ... LOOP L and get a loop that runs 5 times, but I have to write MOV 4,R0 L ... DBRA R0,L for a 5-iteration loop.

It's because you can then do (in case loop counter is zero, i can't imagine what your "LOOP" instructions will trigger) :

Code:

 bra .next
.for
; do something here
.next
 dbra d0,.for

It's sometimes useful for other reasons, too.

That said, it's true i'd really like to have both options.
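
To make the counting difference concrete, here is a small Python model (not from the thread) of how many times the loop body runs under each instruction. It assumes a 16-bit counter in both cases; the function names and the `enter_at_dbra` flag (which models the `bra .next` entry shown above) are hypothetical.

```python
def x86_loop_iterations(cx: int) -> int:
    """Body count for: MOV CX,n / L: body / LOOP L (LOOP decrements, exits on zero)."""
    cx &= 0xFFFF
    runs = 0
    while True:
        runs += 1                      # body
        cx = (cx - 1) & 0xFFFF         # LOOP decrements CX first
        if cx == 0:                    # and falls through when CX reaches 0
            break
    return runs

def dbra_iterations(d0: int, enter_at_dbra: bool = False) -> int:
    """Body count for: .for: body / .next: DBRA D0,.for (exits when D0.w hits -1)."""
    d0 &= 0xFFFF
    runs = 0
    run_body = not enter_at_dbra       # "bra .next" skips the body on the first pass
    while True:
        if run_body:
            runs += 1                  # body
        run_body = True
        d0 = (d0 - 1) & 0xFFFF         # DBRA decrements the low word
        if d0 == 0xFFFF:               # and falls through when it wraps to -1
            break
    return runs

print(x86_loop_iterations(5), dbra_iterations(4), dbra_iterations(5, enter_at_dbra=True))
print(x86_loop_iterations(0), dbra_iterations(0, enter_at_dbra=True))
```

In this model a zero count gives 65536 passes with LOOP (unless the code guards it, for example with JCXZ), while entering at the DBRA with a zero count runs the body zero times.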

Quote:

Originally Posted by litwr

Are you ok? I repeat the third time. SAL and SHL are not different, there is one opcode for the both.

I am ok. There are TWO encodings for this instruction. It's not just two names.

Quote:

Originally Posted by litwr

It is quite easy with Linux. If a program calls some system service with DF=1 (or DF=0) then the program crashes.

If somebody forgets to set DF correctly then it will be better for all that he would be far away from programming. DF is the first thing to check when a string instruction is used. You could even say about a case when somebody forgets that 2x2=4.

I am really sad that you use so poor argumentation. 680x0 have their advantages but 80x86 have them too.

Ok so now it's an extra useless instruction you have to add every time you want to use increment/decrement modes.

Quote:

Originally Posted by litwr

Could you decode 0A 05 00 30 10 C5 4E 75? If you are capable for this task then I express my deep admiration to this archaic and circus like capability.

Misunderstanding ? I was speaking about readability at source level, of course

Anyway i can fire a disassembler and get this in seconds :

Code:

eori.b #$30,d5
move.b d5,(a0)+
rts

For x86 i can't do that.
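
The fixed 16-bit opcode words are what make this byte string easy to take apart. As an illustration only, here is a toy Python decoder (not from the thread) that recognizes just the three instruction forms appearing in it; the function name `disassemble` is hypothetical and anything else raises an error.

```python
def disassemble(code: bytes):
    """Decode a tiny subset of 68000 opcodes: EORI.B #imm,Dn / MOVE.B Dn,(An)+ / RTS."""
    # 68k code is a stream of big-endian 16-bit words.
    words = [int.from_bytes(code[i:i + 2], "big") for i in range(0, len(code), 2)]
    out = []
    i = 0
    while i < len(words):
        w = words[i]
        i += 1
        if w == 0x4E75:
            out.append("rts")
        elif w & 0xFFF8 == 0x0A00:
            # 0000 1010 00 000 rrr: EORI.B #imm,Dn; the immediate is in the next word.
            imm = words[i] & 0xFF
            i += 1
            out.append(f"eori.b #${imm:02x},d{w & 7}")
        elif w & 0xF1F8 == 0x10C0:
            # 0001 ddd 011 000 sss: MOVE.B Ds,(Ad)+
            out.append(f"move.b d{w & 7},(a{(w >> 9) & 7})+")
        else:
            raise ValueError(f"opcode {w:04x} is not handled by this sketch")
    return out

print(disassemble(bytes.fromhex("0A05003010C54E75")))
```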

Quote:

Originally Posted by litwr

Did you try to fight with optimizing compiler? It is possible for the small sized code only and takes a lot of health. I made a small test - http://litwr2.atspace.eu/art-pls/com...sources-e.html - it shows that plain assembly programming gets codes up to 2 times slower than C/C++ optimizing compiler.

You failed to write suitable asm because x86 sucks.
On 68k I beat GCC by a factor of 4.

Quote:

Originally Posted by litwr

This shows that you have very little experience with x86. What is wrong with gdb? I also use edb. Even the old DOS debugger may be useful.
I have to add some information about stack. A lot of modern x86 programs may use unlimited stack technology. This allows, for example, unlimited recursion.

All this usefulness is based on one hardware stack.

You clearly didn't try to make complete opcode map from disassembler output. I tried. It failed.

Quote:

Originally Posted by litwr

You disagree that without memory protection any program may corrupt the system stack?

Yes. When running user programs the system (supervisor) stack should be empty so you could write whatever you want in it, without any corruption.

Quote:

Originally Posted by litwr

DOS is partially multitasking. It may have a lot of TSRs, drivers, ... My experience with Microsoft Windows 3.11 and 95 shows that they were as safe as DOS. Linux, Microsoft Windows XP/10 are safer.

And now DOS multitasks and Windows 95 is safe. It's becoming worse every day.

Quote:

Originally Posted by litwr

No, it is not the point. Try to see pi-spigot sources in assembly for the heavily protected Linux. They are almost the same as for DOS. There are some problems for OS writers. I made a hobby bootstrapper to the long mode (x86_64), it was not difficult.

While they may look similar, the same binary code does NOT work on both.

Quote:

Originally Posted by litwr

It is possible to use SIL or DIL 8-bit registers. There are some limitations, of course. However, it is common knowledge that Intel x86 registers are general purpose. Several exceptions can't change the definition. As I noted earlier Motorola's fundamental flaw is attempts to use concepts instead of a practical approach. The concepts of two carries, address registers, or DBRA (-1) look poor for me.

It's common knowledge that x86 is a total mess and completely lacks orthogonality.

Quote:

Originally Posted by litwr

That's just one instruction removed.

I can count even two. MOVE D0,CCR can't check any bits so as SAHF (for 4 bits too). My example allows to check any bit + the 7th bit.

In your example too, that's just one instruction removed.

Quote:

Originally Posted by litwr

Yes, but they will be more useful if they were GP.

Sure but they're quite a lot better than just having 8 regs.

Quote:

Originally Posted by litwr

Any register of ARM may be used as an address register.

On 68020 too.

Quote:

Originally Posted by litwr

Why? Please clarify your point. RSB is useful for me, look at pi-spigot sources.

ARM can do a lot of things that are not really useful. Sure, sometimes you will use some of its crazy features but most of the time you will have normal simple addressing mode, shift of zero and condition = always.

Quote:

Originally Posted by litwr

Yes, but it is fast and cold. These are the main features of CPU. Motorola gave up because their CPU were slow and hot. It is sad. Moto's CPU have several attractive features.

A1200's 68020 really is cold, you know. So that old ARM gives no benefit.
68060 doesn't heat too much either.

Quote:

Originally Posted by litwr

pi-spigot shows that ARM @8MHz can outperform 68020 @20MHz.

ARM is fully pipelined. Not 68020. You're comparing apples and oranges.
68060 @8MHz would outperform ARM @20MHz.

Quote:

Originally Posted by litwr

If we have BCD arithmetic then it implies a lot of computations. SED is just one instruction in the beginning of such computation. CLD should be in the ending. If we add more instructions (MUL, DIV, ...) then we do not need the new opcodes for them in BCD mode unlike Moto's ISA. However, Motorola is better than Intel in this position but Motorola misses BCD div and mul - Intel doesn't miss them.

I don't know what AAM and AAD do exactly, but perhaps it's easy to do with other instructions.

Quote:

Originally Posted by litwr

It was very easy to learn. It is much easier than 68020.

And it was the speed champion, the terminator. He was too good for us... IMHO if MOS technology could survive then the modern computers would be at least 2-3 times faster.

Sorry, but 68000 is faster than 6502.
And again 6502 is limited to 8 bits so it can't be a speed champion anymore. As you can NOT expand it to 32-bit without turning it into a horror (and as said before 65c816 is already not nice).

Quote:

Originally Posted by litwr

You may try. I can't guarantee often participation. What do you lose anyway? What should I defend? I am just curious to meet with nice coding.

Ok, ok. Thread opened.
Now prepare your code samples

Quote:

Originally Posted by litwr

It is a total disappointment.

You changed

Code:

         sub.l d6,d5
         sub.l d7,d5
         lsr.l d5

by mulu.l d0,d5 - it looks slower even for 68040. I am not sure about 68060.

I didn't say it was fast. I just said 9 instructions.

Thorham · 05 February 2017, 12:53

Quote:

Originally Posted by meynaf

I am ok. There are TWO encodings for this instruction. It's not just two names.

Hate to say it, but according to Intel's own docs they're the same opcode. PDF here: https://software.intel.com/sites/def...abcd-3abcd.pdf See Appendix B.2.

meynaf · 05 February 2017, 13:03

Quote:

Originally Posted by Thorham

Hate to say it, but according to Intel's own docs they're the same opcode. PDF here: https://software.intel.com/sites/def...abcd-3abcd.pdf See Appendix B.2.

Of course as they try to hide the mistake with an "official" opcode. But two really exist. Check D0 /4 and D0 /6 (D0 20 vs D0 30) - D0 or D1,D2,D3,C0,C1 - same story everywhere.
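
The D0 20 versus D0 30 example comes down to the 3-bit "reg" field of the ModRM byte, which selects the operation for the shift/rotate group. A minimal Python sketch (not from the thread) that extracts it follows; the table and function names are hypothetical, and the label for /6 follows the claim made in this post rather than Intel's manual, which documents only /4 for SHL/SAL.

```python
# Opcodes that use the ModRM reg field as an opcode extension for shifts/rotates.
GROUP2_OPCODES = (0xC0, 0xC1, 0xD0, 0xD1, 0xD2, 0xD3)

# Index = /digit. Entry 6 is the second encoding discussed above (assumed to
# behave like /4, as the post states).
GROUP2_OPS = ["ROL", "ROR", "RCL", "RCR", "SHL", "SHR", "SAL", "SAR"]

def group2_operation(opcode: int, modrm: int):
    """Return (/digit, mnemonic) for a shift/rotate group instruction."""
    if opcode not in GROUP2_OPCODES:
        raise ValueError("not a shift/rotate group opcode")
    digit = (modrm >> 3) & 7           # bits 5..3 of ModRM
    return digit, GROUP2_OPS[digit]

print(group2_operation(0xD0, 0x20))    # ModRM 0x20: mod=00 reg=100 rm=000
print(group2_operation(0xD0, 0x30))    # ModRM 0x30: mod=00 reg=110 rm=000
```

Both byte pairs address the same operand (a byte at [EAX], shifted by 1); only the /digit differs, which is the sense in which there are two encodings for one operation.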

Thorham · 05 February 2017, 13:11

Quote:

Originally Posted by meynaf

Of course as they try to hide the mistake with an "official" opcode. But two really exist. Check D0 /4 and D0 /6 (D0 20 vs D0 30) - D0 or D1,D2,D3,C0,C1 - same story everywhere.

Yeah, it's true: http://ref.x86asm.net/coder32.html

meynaf · 05 February 2017, 13:16

Quote:

Originally Posted by Thorham

Yeah, it's true: http://ref.x86asm.net/coder32.html

Check notes 12 and 18 from this document as well. Same instruction, several opcodes.

Thorham · 05 February 2017, 13:20

Quote:

Originally Posted by meynaf

Check notes 12 and 18 from this document as well. Same instruction, several opcodes.

Could they just be implementation artifacts, like undocumented opcodes on the 6510?

Also, from that doc, at first glance, it doesn't seem terribly hard to write a disassembler.

meynaf · 05 February 2017, 13:38

Quote:

Originally Posted by Thorham

Could they just be implementation artifacts, like undocumented opcodes on the 6510?

Sort of. They are shortcuts that shouldn't have been taken in the first place.
Note that x86 also has a few undoc opcodes, and some opcode conflicts as well (like Cyrix SIMD extensions that are now defunct, or some instructions implemented in 386/486 but not documented that you later find under another opcode, etc).

Quote:

Originally Posted by Thorham

Also, from that doc, at first glance, it doesn't seem terribly hard to write a disassembler.

Writing a disassembler that will get it right for most opcodes isn't so difficult. But many docs contain errors (or contradict one another, and you can't even know which one is wrong), and none of them seems to be complete.

Now from that doc, tell me the opcodes for :
- instructions in Bit manipulations (BMI1 and BMI2) extensions
- instructions in Transactional extensions (TSX)

Yeah, there is always something missing.
And who knows what else simply isn't there.

Thorham · 05 February 2017, 15:49

Quote:

Originally Posted by meynaf

Sort of. They are shortcuts that shouldn't have been taken in the first place.
Note that x86 also has a few undoc opcodes, and some opcode conflicts as well (like Cyrix SIMD extensions that are now defunct, or some instructions implemented in 386/486 but not documented that you later find under another opcode, etc).

Is that really a big deal? People shouldn't use undocumented opcodes anyway. Sticking to the official documentation avoids heaps of trouble. This isn't like the C64 where everything is always the same (unless I'm mistaken, wouldn't be surprised).

Quote:

Originally Posted by meynaf

Yeah, there is always something missing.
And who knows what else simply isn't there.

Won't the official docs do?

meynaf · 05 February 2017, 16:06

Quote:

Originally Posted by Thorham

Is that really a big deal? People shouldn't use undocumented opcodes anyway. Sticking to the official documentation avoids heaps of trouble. This isn't like the C64 where everything is always the same (unless I'm mistaken, wouldn't be surprised).

Official documentation isn't exactly a small document you can read in five minutes...

Quote:

Originally Posted by Thorham

Won't the official docs do?

Perhaps, provided it's possible to find one which is up to date.

litwr · 06 February 2017, 18:05

I have to express my gratitude for such an interesting discussion. The A500 was very good but a bit slow and poorly expandable. I still want to get hold of a genuine A1200.
@Thorham
I used my first PC to play Wolfenstein 3D, Civilization and the best one, Ultima VI, in 1991. Dune 2 was very good in 1992.

Ultima VII was excellent for that year too. Commodore always missed the idea of upgrading. They could have made a 4 MHz C64 in 1985, an Amiga with a 68020 in 1989, ... IMHO Commodore killed everything it touched: 6502, VIC-2, C64, SFD-1001, C+4, Amiga, ...

@meynaf
I agree that the 65816 is a somewhat misguided chip. IMHO it came out of a great struggle against the "terminator" 6502, which might have finished off both Intel and Motorola. After MOS Technology lost that battle, 6502 development was almost completely abandoned. The winners even made a joke out of the 6502 JMP () bug and produced an "improved" 65C02.

IMHO the 6502 could have been extended to 32 bits. It has a lot of free opcodes. They could have added more accumulators, etc. The 65816 lost the main feature of the 6502: speed. The 4510 was much better, but it could have been made in the 70s. Try comparing the upgraded 6809 (the 6309) with the 65816. The 6309 has the power of 2 or 3 6809s, but the 65816 is only 50% faster than the 6502 for 16-bit code and the same speed or even slower for 8-bit code.
The 6502 was the champion until the end of the 70s. It is faster than the 68000 at the same frequency.
Intel's LOOP instruction can start a loop with CX=0, which means 65536 iterations. No advantage of DBRA has been shown.
Sorry, it was me who was wrong.

I forgot that the x86 ISA has several opcodes for the same instruction: ADD, SUB, ... SHL/SAL is among them. My excuse is that almost all (all?) assemblers compile SAL and SHL into the same opcode. My other excuse is that Intel's 8086 manual gives only one opcode too. I agree that Intel's documentation is always slightly incomplete. They don't like to write about bugs and missing instructions (like IBTS). They have a monopoly, but they did a better job than Motorola.
I see no reason to call the STD or CLD instructions useless. They are a one-byte part of the setup for a string instruction. That setup also includes loading one or two index registers and a counter register, the same as on the 680x0.
I have found a contradiction in your logic. You wrote about readability at source level. I gave you an example of x86_64 assembly and you wrote about the debugger level. They are not the same level.
It looks like GCC for the 680x0 has a much poorer optimizer than GCC for x86. It is almost impossible to beat GCC on x86 with hand-written assembly.
Thank you for the example for the second 680x0 stack. I missed that simple idea.

However, it means little because it protects only a few dozen bytes of the system stack; all other memory may still be corrupted by a faulty program. So making the system 0.1% safer with the expensive second-stack concept is not a very good idea. It can't solve a problem that can be solved only by memory protection. It only makes the architecture more complex and bulky.
You wrote that the same x86 binary code can't work the same in the three x86 modes. That is more than obvious. Indeed, it creates a problem for you in your debugger project...

Code orthogonality is an obsolete concept from the 60s and 70s, when programmers had to use assembler for almost every task. Motorola missed that speed, timeliness and price mean much more.
I would like to have an opportunity to do something for the 68060, but it is almost a legendary rarity. There are no systems with it.

However, I doubt that the 68060 can outperform ARM by much at the same frequency.
AAM and AAD are very complex instructions, but they are also archaic and useless.

meynaf · 06 February 2017, 18:36

Quote:

Originally Posted by litwr

IMHO the 6502 could have been extended to 32 bits. It has a lot of free opcodes. They could have added more accumulators, etc.

Add just ONE more accumulator and you need TWICE the amount of opcodes for all instructions that need to use it.
Get 8 accumulators and it's *8 encoding space.
Add three sizes and it's *3 again.
Have 16 registers instead of 8, again *2.
Addressing modes can only handle byte and word addresses, so you'll probably need one more bit.

No, sorry, the 6502 can't be extended to 32 bits. If you still believe otherwise, I'd be happy if you showed me the instruction encoding that this would give (because it's really a matter of encoding, which is reasonably simple on the 6502, so they could use a modest logic array for the decoder).
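
The multiplication above can be put into numbers with a rough Python sketch. It assumes a single opcode byte (256 slots) and takes the 151 documented NMOS 6502 opcodes as the starting point; replicating every opcode per accumulator and per operand size is a deliberately crude upper bound, since not every instruction touches the accumulator. The function name `needed_slots` is invented for this illustration.

```python
OPCODE_SLOTS = 256       # what a single opcode byte can encode
DOCUMENTED_6502 = 151    # documented opcodes of the NMOS 6502

def needed_slots(base, accumulators=1, sizes=1):
    """Crude upper bound: replicate every opcode per accumulator and per size."""
    return base * accumulators * sizes

for accumulators, sizes in [(1, 1), (2, 1), (8, 1), (8, 3)]:
    n = needed_slots(DOCUMENTED_6502, accumulators, sizes)
    verdict = "fits" if n <= OPCODE_SLOTS else "does not fit"
    print(f"{accumulators} accumulator(s), {sizes} size(s): {n} slots, {verdict}")
```

Even with this crude counting, a second accumulator alone already overflows one opcode byte, so any such extension needs prefix bytes or a wider opcode field.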

Quote:

Originally Posted by litwr

The 6502 was the champion until the end of the 70s. It is faster than the 68000 at the same frequency.

An 8 MHz 68000 (Atari ST) was more than 8x faster than a 1 MHz 6502 (Oric) back in the old days, for everything I tested.

Besides, the 6502 uses double-indirect modes for pointers, and this is very bad for modern implementations.
In addition, it is too memory-driven and is totally unable to run at high frequencies.

Quote:

Originally Posted by litwr

Intel's LOOP instruction can start a loop with CX=0, which means 65536 iterations. No advantage of DBRA has been shown.

Why dbcc counts this way has already been explained.
I'm not for choosing one over the other; I would want an instruction for each case.
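
The two counting conventions can be compared side by side with a small Python emulation. It is only a sketch: it assumes the loop body sits above a LOOP or DBRA placed at the bottom of the loop, models only the 16-bit counter, and the function names are invented here.

```python
def loop_iterations(cx):
    """x86 LOOP: after the body, decrement CX and branch back while CX != 0."""
    count = 0
    while True:
        count += 1                  # the loop body runs once
        cx = (cx - 1) & 0xFFFF      # 16-bit wraparound
        if cx == 0:
            break                   # fall through when CX reaches 0
    return count

def dbra_iterations(d0):
    """68k DBRA/DBF: after the body, decrement Dn.w and branch back until it is -1."""
    count = 0
    while True:
        count += 1                  # the loop body runs once
        d0 = (d0 - 1) & 0xFFFF      # 16-bit wraparound
        if d0 == 0xFFFF:
            break                   # fall through when the counter reaches -1
    return count

print(loop_iterations(0))       # CX=0 wraps around: 65536 passes
print(dbra_iterations(0))       # a counter of 0 gives exactly one pass
print(dbra_iterations(0xFFFF))  # the 65536-pass case for DBRA
```

Both reach 65536 passes, just from different start values: LOOP runs the body CX times with 0 standing for 65536, while DBRA runs it counter+1 times.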

Quote:

Originally Posted by litwr

I see no reason to call the STD or CLD instructions useless. They are a one-byte part of the setup for a string instruction. That setup also includes loading one or two index registers and a counter register, the same as on the 680x0.

On 680x0 you don't have to "setup" a string instruction. You just use it, like the (impossible to do on x86) move.b (a0)+,-(a1).
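
To make the effect of that addressing-mode pair concrete, here is a toy Python emulation of a single MOVE.B (A0)+,-(A1) over a bytearray. The helper name and the flat memory model are assumptions for illustration only; the point is that the source pointer walks up while the destination pointer walks down, so repeating the instruction stores the bytes in reverse order.

```python
def move_b_postinc_predec(mem, a0, a1):
    """Emulate MOVE.B (A0)+,-(A1); returns the updated (a0, a1)."""
    value = mem[a0]   # read the source byte
    a0 += 1           # post-increment the source pointer
    a1 -= 1           # pre-decrement the destination pointer
    mem[a1] = value   # store at the decremented address
    return a0, a1

mem = bytearray(b"AMIGA" + bytes(5))   # 5 source bytes, 5 free bytes
a0, a1 = 0, 10                         # A1 starts just past the buffer end
for _ in range(5):
    a0, a1 = move_b_postinc_predec(mem, a0, a1)
print(bytes(mem[5:10]))                # the source bytes, reversed
```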

Quote:

Originally Posted by litwr

I have found a contradiction in your logic. You wrote about readability at source level. I gave you an example of x86_64 assembly and you wrote about the debugger level. They are not the same levels.

Debugger level and source level are more or less the same thing, unless you have a debugger so stupid that it doesn't even contain a disassembler.

Quote:

Originally Posted by litwr

It looks like GCC for the 680x0 has a much poorer optimizer than GCC for x86. It is almost impossible to beat GCC on x86 with hand-written assembly.

Who writes manual assembly for x86 anyway? If nobody tries, nobody can win...

Quote:

Originally Posted by litwr

Thank you for the example for the second 680x0 stack. I missed that simple idea.

However, it means little because it protects only a few dozen bytes of the system stack; all other memory may still be corrupted by a faulty program. So making the system 0.1% safer with the expensive second-stack concept is not a very good idea. It can't solve a problem that can be solved only by memory protection. It only makes the architecture more complex and bulky.

Sorry, but the 2nd stack concept is nothing even remotely like "expensive", "complex" or "bulky".

OK, it does not solve what memory protection does, but at least the programming model remains unchanged regardless of whether there is memory protection or not.
x86 protection is big and cumbersome. Take it as a whole or leave it. 68k protection can be none, light (e.g. Enforcer) or full (paged, like in Linux), or anywhere in between. You have the choice.

When you code in user land you don't have "two stacks". You only have one. When you do some demo or game that kills the OS, you run in supervisor mode, and USP is just a register that has no use in these circumstances, so you can ignore it.
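
A toy Python model of that "one stack at a time" view, for illustration only: A7 is treated as a name for either USP or SSP depending on the supervisor bit. The class name is invented, and the 6-byte frame (return address plus SR) is an assumption matching the basic 68000 case; later family members push extra data, so the size is a parameter.

```python
class M68kStackModel:
    """Toy model: A7 aliases SSP in supervisor mode and USP in user mode."""

    def __init__(self, usp, ssp):
        self.usp = usp
        self.ssp = ssp
        self.supervisor = False   # start in user mode

    @property
    def a7(self):
        return self.ssp if self.supervisor else self.usp

    @a7.setter
    def a7(self, value):
        if self.supervisor:
            self.ssp = value
        else:
            self.usp = value

    def take_exception(self, frame_bytes=6):
        """Enter supervisor mode, then push the exception frame via A7."""
        self.supervisor = True    # from here on, A7 means SSP
        self.a7 -= frame_bytes    # assumed frame: 4-byte PC + 2-byte SR

cpu = M68kStackModel(usp=0x8000, ssp=0x1000)
print(hex(cpu.a7))                # user code sees only its own stack
cpu.take_exception()
print(hex(cpu.a7), hex(cpu.usp))  # frame went to the supervisor stack, USP untouched
```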

litwr · 06 February 2017, 20:32

Quote:

Originally Posted by meynaf

An 8 MHz 68000 (Atari ST) was more than 8x faster than a 1 MHz 6502 (Oric) back in the old days, for everything I tested.

Try comparing C64 GEOS (0.95 MHz) and Amiga 500 Workbench (7.1 MHz). I can't say that Workbench is 8 times faster. I don't know exactly how the 6502 could have been upgraded, but I believe the MOS Technology guys would have found a way.

Quote:

Originally Posted by meynaf

On 680x0 you don't have to "setup" a string instruction. You just use it, like the (impossible to do on x86) move.b (a0)+,-(a1).

You have to set A0, A1 and a counter. A one-byte instruction is the smallest part of this.

Quote:

Debugger level and source level are more or less the same thing, unless you have a debugger so stupid that it doesn't even contain a disassembler.

I am shocked by this point! Are the development and hacker levels the same?!

Quote:

Who writes manual assembly for x86 anyway? If nobody tries, nobody can win...

http://www.roguelazer.com/2015/02/beating-the-compiler/

Quote:

Sorry, but the 2nd stack concept is nothing even remotely like "expensive", "complex" or "bulky".

Ok it does not solve what memory protection does, but at least, the programming model remains unchanged regardless if there is memory protection or not.
x86 protection is big and cumbersome. Take it as a whole or leave it. 68k protection can be none, light (e.g. enforcer) or full (paged like in linux), or anywhere in between. You have the choice.

When you code in user land you don't have "two stacks". You only have one. When you do some demo or game killing the OS you run in supervisor mode and USP is just a register that has no use in these circumstances so you can ignore it.

My point is that partial protection means no protection at all, and with full protection two stacks are useless.

meynaf · 06 February 2017, 21:21

Quote:

Originally Posted by litwr

Try to compare C64 GEOS (0.95 MHz) and Amiga 500 Workbench (7.1 MHz). I can't say that Workbench is 8 times faster.

Again, you are comparing things that should not be compared. WB is slow because it is not asm code (and not especially well written anyway). Atari ST's desktop is already quite a bit faster.

Quote:

Originally Posted by litwr

I don't know exactly how the 6502 could have been upgraded, but I believe the MOS Technology guys would have found a way.

The MOS Technology guys can't do what's impossible.
The 6502 upgrade was the 65816, and nothing better can be done.
The 6502 is simple, and if you "upgrade" it, it will cease to be simple. As easy as that.

Quote:

Originally Posted by litwr

You have to set A0, A1 and a counter. A one-byte instruction is the smallest part of this.

I don't need to "set" anything. I just use them when suitable. All string modes are available at any time.

Quote:

Originally Posted by litwr

I am shocked by this point! Are the development and hacker levels the same?!

Misunderstanding here ? Define what you call "development level" and "hacker level".
Code as seen in a source file is very near to that of a good disassembler output.

Quote:

Originally Posted by litwr

http://www.roguelazer.com/2015/02/beating-the-compiler/

The guy is comparing horrible code with horrible code.

Quote:

Originally Posted by litwr

My point is that partial protection means no protection at all, and with full protection two stacks are useless.

With full protection you *have* two stacks (and perhaps even more on the complex x86).
My point is that protection shouldn't change anything, apart from some memory cells becoming unavailable.

But why the heck do you insist on "two stacks"? There is only one stack at any given time!!!

litwr · 07 February 2017, 09:01

Quote:

Originally Posted by meynaf

MOS technology guys can't do what's impossible.
6502 upgrade was 65816 and nothing better can be done.
6502 is simple and if you "upgrade" it, it will cease to be simple. As easy as that.

The 65C02 is a shame. The 65816 is only slightly better and came too late. MOS Technology was crushed, and its key figure (Chuck Peddle) dropped out of CPU development.

There was the unfinished Synertek 6516 project - https://plus.google.com/108984290462...ts/6JeiVQrwKHi...
I have some ideas on how to extend the 6502. Just move the zero page into CPU memory. This gives 256 Z80-style registers. They may be used as 128 16-bit registers or 64 32-bit ones. Add 3 8-bit accumulators, add 16- and 32-bit operations for the registers, provide a way to extend the index registers, ... I know there is a project for a 32-bit 6502; try searching the net for it.
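
The register arithmetic in that idea can be sketched in Python as one 256-byte on-chip file viewed at different widths. This is purely hypothetical (no such chip is being described): the helper names are invented, and little-endian byte order is assumed only because that is what the 6502 uses for addresses.

```python
ZERO_PAGE_SIZE = 256
zero_page = bytearray(ZERO_PAGE_SIZE)   # hypothetical on-chip register file

def register_count(width):
    """How many registers of `width` bytes fit in the file."""
    return ZERO_PAGE_SIZE // width

def write_reg(index, width, value):
    """Store `value` in register `index` of the given byte width."""
    offset = index * width
    zero_page[offset:offset + width] = value.to_bytes(width, "little")

def read_reg(index, width):
    """Load register `index` of the given byte width."""
    offset = index * width
    return int.from_bytes(zero_page[offset:offset + width], "little")

print(register_count(1), register_count(2), register_count(4))
write_reg(63, 4, 0xDEADBEEF)                           # last 32-bit register
print(hex(read_reg(126, 2)), hex(read_reg(127, 2)))    # same bytes as two 16-bit registers
```

The views overlap, so the 32-bit register 63 and the 16-bit registers 126 and 127 are the same four bytes.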

Quote:

I don't need to "set" anything. I just use, when suitable. All string modes available every time.

If you want, for example, to copy a string, you have to set the source and destination addresses and the string length. On Intel's ISA you also have to set the direction, which takes a very fast one-byte instruction.
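
The setup described here can be shown with a small Python emulation of REP MOVSB. It is a sketch under simplifying assumptions: 16-bit SI, DI and CX, a flat bytearray for memory, segments ignored, and an invented function name. DF=0 corresponds to CLD (pointers move up) and DF=1 to STD (pointers move down).

```python
def rep_movsb(mem, si, di, cx, df=0):
    """Emulate REP MOVSB; returns the final (si, di, cx)."""
    step = -1 if df else 1          # STD sets DF=1, CLD clears it
    while cx != 0:                  # REP does nothing when CX is already 0
        mem[di] = mem[si]           # copy one byte
        si = (si + step) & 0xFFFF   # both pointers move in the DF direction
        di = (di + step) & 0xFFFF
        cx -= 1
    return si, di, cx

# Forward copy: pointers start at the first byte of each block (CLD).
up = bytearray(b"hello" + bytes(5))
print(rep_movsb(up, si=0, di=5, cx=5, df=0), bytes(up[5:10]))

# Backward copy: pointers start at the last byte of each block (STD).
down = bytearray(b"hello" + bytes(5))
print(rep_movsb(down, si=4, di=9, cx=5, df=1), bytes(down[5:10]))
```

Both directions produce the same copy; the direction only matters for where the pointers start and for overlapping blocks.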

Quote:

Misunderstanding here ? Define what you call "development level" and "hacker level".
Code as seen in a source file is very near to that of a good disassembler output.

This borders on craziness to me. Are fine sources with macros, conditional assembly, etc. the same as a raw disassembly?!

Quote:

With full protection you *have* two stacks (and perhaps even more on the complex x86).
My point is that protection shouldn't change anything, apart from some memory cells becoming unavailable.

But why the heck do you insist on "two stacks"? There is only one stack at any given time!!!

Moto's ISA always has to support an ambiguous second stack, a second stack pointer, ... x86 always uses one hardware stack and one stack pointer.

idrougge · 04 February 2017, 20:20

The fact that x86 code is compact should not be disputed, but if I were writing a compiler, I would stay very far away from that architecture, especially in its incarnations that were concurrent with the 68k line - guess why virtually no-one outside of IBM ever used Intel processors for new architectures. Yes, there is a distinction between address and data registers, but otherwise the 68000 was the most orthogonal design of its time. Just like if you compare the 6502 to the Intel/Z80 CPUs.
