some fancy ideas for a extended (68k?) CISC-CPU - Page 3

Gorf · 16 December 2020, 10:39

Quote:

Originally Posted by Bruce Abbott

The 68k in an Amiga is what it is. Coders should enjoy trying to squeeze the most out of what they have rather than pining for something 'better'. When playing chess do you change the rules because a piece can't make a move you want? Of course not, and playing within the rules makes it more fun!

That is a good point you are making here.
My ideas must be seen as "changing the rules" and alienate quite a few people ...

probably not the right place for this kind of discussion - sorry for that.

robinsonb5 · 16 December 2020, 10:57

Quote:

Originally Posted by Gorf

That is a good point you are making here.
My ideas must be seen as "changing the rules" and alienate quite a few people ...

I don't see anything wrong with exploring ideas, and changing rules, provided of course that you're not demanding that anyone else should play by your new rules!

If nothing else, by exploring what happens when you change the rules, you sometimes gain new insights into why things were done the way they were.

Is this discussion pointless? Yes, probably - but in the grand scheme of things so is every other Amiga-related endeavour over the last two decades. Anyone who doesn't find the discussion interesting is perfectly welcome to ignore it!

meynaf · 16 December 2020, 11:39

Quote:

Originally Posted by Gorf

So I asked for something you have in mind ... you pointed me to this repository and I took literally the first example there.
I used only 4 registers instead of 6 in fewer lines.
Your reply to that:

"Partial code so no pressure on registers.."

ok .....

Removing a few data registers out of a routine is something, but if you don't handle the pressure on address registers as well it's not very useful.
Note that 4-bit c2p has of course a lot less pressure than 8-bit one.

Quote:

Originally Posted by Gorf

True.
But the 3 ideas I mentioned here specifically are quite ISA-independent - and they would not break any old code, no matter where they would be applied.

I'm not sure it wouldn't break old code.
Sure the repeat can only break implementations.
But let's say we have routine A calling routine B.
Routine A uses the register stack.
Without any special care, everything routine B does will break the stack. Even if it's explicitly documented as not touching some regs, it can't just save/restore the 32-bit regs like before. Saving is ok but restoring would push.
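A minimal Python sketch of that scenario, assuming a 2-entry register where every write is a push (as in the proposal). `StackedRegister`, `routine_a` and `routine_b` are hypothetical names for illustration only, not an existing design:

```python
class StackedRegister:
    """Hypothetical 2-entry register: every write pushes the old value."""

    def __init__(self):
        self.visible = 0
        self.hidden = 0

    def write(self, value):
        # A write is a push: the old visible value moves to the hidden entry.
        self.hidden = self.visible
        self.visible = value


def routine_b(reg, memory_stack):
    """Legacy routine that saves the register, uses it, then restores it."""
    memory_stack.append(reg.visible)  # like move.l d0,-(sp): only a read
    reg.write(999)                    # scratch use of the register
    reg.write(memory_stack.pop())     # like move.l (sp)+,d0: a write, so a push


def routine_a():
    d0 = StackedRegister()
    d0.write(1)        # value routine A wants to get back to later
    d0.write(2)        # now visible = 2, hidden = 1
    routine_b(d0, [])  # "does not touch d0" according to its documentation
    return d0.visible, d0.hidden
```

In this model `routine_a()` gets its visible value back (2), but the hidden entry now holds routine B's scratch value (999) instead of the 1 that routine A left there.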

Quote:

Originally Posted by Gorf

since this forum is 68k specific I did not want to go further off topic....

again - my approach is more generic, but since this forum is for 68k and most coders here are used to this, I tried to keep it 68k-related for this discussion.

That's ok, but a bit of non-68k code does no harm - especially if it could be added to 68k without any incompatibility.

Quote:

Originally Posted by Gorf

If you repeat this single instruction a couple of times, as you probably would in this case, you could probably make use of the "repeat"-feature as well and save even more ...

I wanted to show you that a few well chosen new instructions, or addressing modes for existing instructions, can lead to a better result than any fancy idea.
But yes these can be used at the same time... at least in theory ; in practice i would have to remove existing encodings to make room for that feature, potentially harming code density in other places...

In the end, is it worth it?
It only brings code density, nothing in ease of coding or speed, and doesn't make implementation easier.

grond · 16 December 2020, 12:49

Quote:

Originally Posted by Gorf

Stacks of registers

This feature is inspired by stack-machines and/or so called register-windowing like in SPARC-CPUs, but differs in implementation.
Imagine every register is in fact a stack of entries - to make it simple we start by only two values. Every write operation onto one of our 16 registers is in fact a push operation, copying the existing value to the second (otherwise hidden) entry before the new value is written.
Special instructions allow swapping these two values and restoring the former value to a register, to a range of registers and/or to all 16 registers.

I also have a soft spot in my heart for register frames and CPUs that operate on a stack but I think time has shown that these concepts have more limitations than advantages.

PUSH/POP type instructions and the entire head of the traditional stack can be made transparent (i.e. zero execution time) using shadow registers. Basically the difference between 2-operand and 3-operand ISAs is that you need a lot of extra MOVE instructions for the 2-operand ISA. However, you get shorter instructions in exchange. In a modern CPU all those extra MOVEs of 2-operand ISAs just disappear in the execution time because the CPU internally just works with 3 operands after early decode. This makes your 2-operand code basically compressed 3-operand code with the MOVE being equivalent to an extension word for 3-operands which you only use if you need the 2nd input operand of the 2-operand operation after executing that.
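A rough sketch of that "compressed 3-operand code" view, assuming register-only operands and instructions written as `(op, src, dst)` tuples in 68k operand order. `fuse_moves` is a hypothetical helper, not how any real decoder is documented to work:

```python
def fuse_moves(program):
    """Rewrite 2-operand code as 3-operand (op, dst, a, b), meaning dst = a op b.

    A 'move src,dst' directly followed by 'op src2,dst' is folded into one
    3-operand instruction, so the MOVE costs nothing after decode.
    """
    fused = []
    i = 0
    while i < len(program):
        op, src, dst = program[i]
        if op == "move" and i + 1 < len(program):
            next_op, next_src, next_dst = program[i + 1]
            if next_op != "move" and next_dst == dst and next_src != dst:
                # dst = src; dst = dst OP next_src  ==>  dst = src OP next_src
                fused.append((next_op, dst, src, next_src))
                i += 2
                continue
        if op == "move":
            fused.append(("move", dst, src))   # a MOVE that could not be folded
        else:
            fused.append((op, dst, dst, src))  # plain 2-operand form
        i += 1
    return fused
```

For example, `move d1,d2` followed by `add d3,d2` comes out as the single entry `("add", "d2", "d1", "d3")`.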

The downside of your approach is that, while you can easily extend the number of shadow registers any way you want (wattage constraints etc.) without changing compatibility and code, you can't (at least not easily) if you explicitly define register stacks and frames.

Quote:

"I know what you did last summer!"

The CPU keeps track of the last 16 instructions - imagine a kind of internal log or special instruction cache.
A special instruction tells the CPU to execute any of these instructions again, in any order (so it is not a jump or loop!!)
The special instruction can repeat either two out of the last 16 instructions or three of the last 8.
This should allow for very compact code.
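As a toy model of the idea, here is a minimal Python sketch. It assumes that replayed instructions and the repeat instruction itself are not written to the log (the proposal does not say); `LoggingCPU`, its two-operation instruction set and the "age" indexing are hypothetical:

```python
from collections import deque


class LoggingCPU:
    """Toy CPU that remembers its last 16 executed instructions."""

    def __init__(self):
        self.regs = {"d0": 0, "d1": 0}
        self.log = deque(maxlen=16)  # the most recent instruction is log[-1]

    def execute(self, instr, record=True):
        op, reg, value = instr
        if op == "add":
            self.regs[reg] += value
        elif op == "mul":
            self.regs[reg] *= value
        else:
            raise ValueError(f"unknown op {op!r}")
        if record:
            self.log.append(instr)

    def repeat(self, *ages):
        """Re-run logged instructions; age 0 is the most recent one."""
        two_of_16 = len(ages) == 2 and all(0 <= a < 16 for a in ages)
        three_of_8 = len(ages) == 3 and all(0 <= a < 8 for a in ages)
        if not (two_of_16 or three_of_8):
            raise ValueError("need two ages below 16 or three ages below 8")
        # Look up all entries first, then run them in the requested order.
        selected = [self.log[-1 - a] for a in ages]
        for instr in selected:
            self.execute(instr, record=False)
```

Two 4-bit indices or three 3-bit indices fit in 8 or 9 bits, which is presumably where the "two of 16 or three of 8" split comes from.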

This feature sounds like something no compiler will be able to use efficiently (certainly not a concern to Meynaf...). However, it also sounds like a great way to mess up the pipeline starting with instruction fetch and decode, speculative execution, out-of-order, you name it...

I guess using a directly addressable 0-cycle access local memory, like many DSPs do for operands, and extending this concept to short subroutines might be a more interesting idea.

EDIT: the 16 previous instructions are also going to be a nightmare when doing context switches.

Quote:

The 8-Bit-Turbo

Too application specific for my taste. This is the sort of feature that might have been interesting for a few years when the emulating CPU wasn't much faster than the emulated CPU and then quickly would have turned into a legacy burden (I hear there are CPUs with a string-compare processor instruction; I imagine implementing it is always the job of the youngest team member in the dev team: "son, we all had to go through this!").

Gorf · 16 December 2020, 19:18

Quote:

Originally Posted by meynaf

Removing a few data registers out of a routine is something, but if you don't handle the pressure on address registers as well it's not very useful.
Note that 4-bit c2p has of course a lot less pressure than 8-bit one.

well, I can have a look for some code with that problem again or you post an example.
In theory static address registers should be a very good use case for my idea.

Quote:

I'm not sure it wouldn't break old code.
Sure the repeat can only break implementations.
But let's say we have routine A calling routine B.
Routine A uses the register stack.
Without any special care, everything routine B does will break the stack. Even if it's explicitly documented as not touching some regs, it can't just save/restore the 32-bit regs like before. Saving is ok but restoring would push.

That is true.
When using this new feature in combination with old code, one should turn off the stack-feature before jumping to the critical subroutine - it is one write to the features register that I mentioned earlier, and it locks the topmost register in place....

Quote:

I wanted to show you that a few well chosen new instructions, or addressing modes for existing instructions, can lead to a better result than any fancy idea.
But yes these can be used at the same time... at least in theory ; in practice i would have to remove existing encodings to make room for that feature, potentially harming code density in other places...

Like almost everything in life, it is a trade-off.

Quote:

In the end, is it worth it?
It only brings code density, nothing in ease of coding or speed, and doesn't make implementation easier.

Well, code density should also provide some speed benefits, due to better cache usage, fewer instruction fetches ...

meynaf · 16 December 2020, 19:52

Quote:

Originally Posted by Gorf

well, I can have a look for some code with that problem again or you post an example.
In theory static address registers should be a very good use case for my idea.

You could attempt to write a complete 8-bit c2p, without paying too much attention to the current code. I found it to be a nice stress test for my vm.

Quote:

Originally Posted by Gorf

That is true.
When using this new feature in combination with old code, one should turn off the stack-feature - it is one write to the features register, that I mentioned earlier - before jumping to the critical subroutine....

Ok, but then your "they would not break any old code no matter of where they would be applied" isn't exactly true.
Well, of course one can not think about every possible case.

Quote:

Originally Posted by Gorf

As almost everything in life, it is a trade off.

Obviously.

Quote:

Originally Posted by Gorf

Well, code density should also provide some speed benefits, due to better cache usage, fewer instruction fetches ...

Maybe, but that depends a lot on the implementation.
For a software vm, forget it.
And for hardware, well, read grond's comments.

Gorf · 16 December 2020, 19:55

Quote:

Originally Posted by grond

I also have a soft spot in my heart for register frames and CPUs that operate on a stack but I think time has shown that these concepts have more limitations than advantages.

PUSH/POP type instructions and the entire head of the traditional stack can be made transparent (i.e. zero execution time) using shadow registers. Basically the difference between 2-operand and 3-operand ISAs is that you need a lot of extra MOVE instructions for the 2-operand ISA. However, you get shorter instructions in exchange. In a modern CPU all those extra MOVEs of 2-operand ISAs just disappear in the execution time because the CPU internally just works with 3 operands after early decode. This makes your 2-operand code basically compressed 3-operand code with the MOVE being equivalent to an extension word for 3-operands which you only use if you need the 2nd input operand of the 2-operand operation after executing that.

The downside of your approach is that, while you can easily extend the number of shadow registers any way you want (wattage constraints etc.) without changing compatibility and code, you can't (at least not easily) if you explicitly define register stacks and frames.

Shadow registers and my "stacked" registers are not mutually exclusive. It only needs more space of course...

But from what I learned here so far, I was trying to solve a non-existent problem, since more registers are not as needed as I thought ...

Quote:

This feature sounds like something no compiler will be able to use efficiently (certainly not a concern to Meynaf...). However, it also sounds like a great way to mess up the pipeline starting with instruction fetch and decode, speculative execution, out-of-order, you name it...

For fetching and decoding I think I have some ideas how to do it - at least in theory.
(as in: the referred instructions are already decoded ...)
Speculative execution, out-of-order ... well - that all is far beyond the scope of my little project.

Quote:

I guess using a directly addressable 0-cycle access local memory, like many DSPs do for operands, and extending this concept to short subroutines might be a more interesting idea.

Scratchpad ... actually that is where my thoughts started, but this is also a nightmare for multitasking ...
Actually the TAOS project did something like that: its first VM was a many-register machine (32?) and there was an implementation for the Transputer CPU, which is a stack machine but had very fast internal memory (claimed register speed).
So this VM made the Transputer a de facto register machine.

Quote:

EDIT: the 16 previous instructions are also going to be a nightmare when doing context switches.

not really - they are saved back and loaded in reverse order, so the previous state is guaranteed.
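A small Python sketch of why the reverse order matters, assuming a 2-entry register where every write pushes and a swap instruction exposes the hidden entry. `StackedRegister`, `save_context` and `load_context` are hypothetical names, not part of any existing design:

```python
class StackedRegister:
    """Hypothetical 2-entry register: writes push, swap exchanges the entries."""

    def __init__(self):
        self.visible = 0
        self.hidden = 0

    def write(self, value):
        self.hidden = self.visible
        self.visible = value

    def swap(self):
        self.visible, self.hidden = self.hidden, self.visible


def save_context(reg):
    """Return [visible, hidden] using only reads and the swap instruction."""
    top = reg.visible
    reg.swap()
    below = reg.visible
    reg.swap()  # leave the register as it was found
    return [top, below]


def load_context(reg, saved):
    """Write the entries back deepest-first so the pushes rebuild the stack."""
    for value in reversed(saved):
        reg.write(value)
```

Loading in the same order as saving would leave the two entries exchanged. Writing the hidden value first and the visible value last lets the second push move the first one into place.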

Quote:

Too application specific for my taste. This is the sort of feature that might have been interesting for a few years when the emulating CPU wasn't much faster than the emulated CPU and then quickly would have turned into a legacy burden (I hear there are CPUs with a string-compare processor instruction; I imagine implementing it is always the job of the youngest team member in the dev team: "son, we all had to go through this!").

True .. this is probably the least general purpose idea of the three.
While I came up with it, trying to speed up emulation, it is not limited to that: It should improve any kind of byte-code like e.g. for scripting languages, and might also be useful for compression algorithms.

grond · 16 December 2020, 20:16

Quote:

Originally Posted by Gorf

But from what I learned here so far, I was trying to solve a non-existent problem, since more registers are not as needed as I thought ...

Well, that was one assembly language coder's opinion. I think 16 GPR are often too few, especially when you execute code that is created by a compiler and not hand optimised by a coder taking the necessary time to squeeze out the last few clock cycles and bytes.

Quote:

Scratchpad ... actually that is where my thoughts started, but this is also a nightmare for multitasking ...

Well, such memory would have to be a processor resource and you would either have the OS swap in and out code on context switches or the processor issue privilege violations when code tries to execute code in the local memory it did not allocate with the OS before. This means the processor would at least need some additional status bit to track the privilege.

Quote:

not really - they are saved back and loaded in reverse order, so the previous state is guaranteed.

Yes, but you then need to refetch 16 instructions from icache (hoping they are still there) or from some other place. I'd rather store away more registers on a context switch. You also lose a lot of pipeline content when you get an interrupt in the middle of the "re-execute instructions x, y, z"-instruction.

Gorf · 16 December 2020, 22:52

Quote:

Originally Posted by grond

Yes, but you then need to refetch 16 instructions from icache (hoping they are still there) or from some other place. I'd rather store away more registers on a context switch. You also lose a lot of pipeline content when you get an interrupt in the middle of the "re-execute instructions x, y, z"-instruction.

Now we are talking about two different features:
My reply was about how to save the "invisible" registers of my stacked registers.

If you are talking about the "log" - this is not part of the normal instruction cache, but in fact a bunch of registers ... and yes: they have to be saved on context switches, as any other registers do, if your next task is going to use them.
(kernel and interrupt request handlers can choose not to use this feature - stop the log and avoid the memory transaction)

Bruce Abbott · 17 December 2020, 05:04

Quote:

Originally Posted by meynaf

Yeah, so we get frustrated and search for ways to make things better. There is nothing bad in this.

Actually there is. The time you waste trying to create a 'better' instruction set (that nobody will use) could be better spent producing code for existing machines. Some of the best Amiga software we have today was produced using appalling languages such as AMOS and Blitz BASIC, and most 'contemporary' compilers produced code that has glaring inefficiencies. But the developers didn't worry about that, they just got stuck in and wrote programs that worked.

Quote:

In my case it's not only about making the code smaller, but also easier to read and write.

I understand your motivation. I just think it is a waste of time as far as Amiga coding goes. If you want to develop your own CPU that's fine, but it is of little interest to those of us who want to get more out of what we have rather than 'cheating'.

I personally think that 68k is close enough to ideal from a coding perspective that it doesn't need improving. Your 'improvements' would just make it harder for me to read and write, because then I would have to learn new instructions that do unfamiliar things - for marginal benefit.

Quote:

There is no progress if we just accept all limitations inflicted on us.

There is no progress if we try to avoid reality by inventing a fantasy and then pretending we can live in it. Sure it would be nice if we could make 68k code smaller and easier to read and write, but then it wouldn't be 68k code and wouldn't run on the machines we love!

Quote:

So because chess exists it becomes forbidden to invent new games with different rules ?

Invent a new game? Fine, just don't expect to find anyone to play it with.

Look, I understand where you are coming from. After doing a couple of large Z80 projects I had similar ideas about improving it. But I rejected the idea because I would be destroying the character of the machines I was working on - and that's with a CPU that is begging for improvement. I got more enjoyment out of producing code that was fast and efficient despite the CPU's limitations, than if I had taken the lazy path of tinkering with the ISA.

But don't let my opinion stop you guys from continuing what you are doing. Even if it is a dead end, it still makes us think more about 68k coding and appreciate its beauty (and warts), and what we can create with it.

meynaf · 17 December 2020, 09:11

Quote:

Originally Posted by Bruce Abbott

Actually there is. The time you waste trying to create a 'better' instruction set (that nobody will use) could be better spent producing code for existing machines. Some of the best Amiga software we have today was produced using appalling languages such as AMOS and Blitz BASIC, and most 'contemporary' compilers produced code that has glaring inefficiencies. But the developers didn't worry about that, they just got stuck in and wrote programs that worked.

Let's be honest : producing software for Amiga isn't very useful either, considering the low number of people using it.
Sure, Amiga software can work on other machines as well thru emulation, but this needs a copyrighted ROM to work. My VM does not need that.

Quote:

Originally Posted by Bruce Abbott

I understand your motivation. I just think it is a waste of time as far as Amiga coding goes. If you want to develop your own CPU that's fine, but it is of little interest to those of us who want to get more out of what we have rather than 'cheating'.

The "as far as Amiga coding goes" says it all. For me it's not only Amiga. My vm could work on basically any decent machine.

Quote:

Originally Posted by Bruce Abbott

I personally think that 68k is close enough to ideal from a coding perspective that it doesn't need improving. Your 'improvements' would just make it harder for me to read and write, because then I would have to learn new instructions that do unfamiliar things - for marginal benefit.

I'm pretty sure you can understand something such as

and.l a0,d0

or

eor.w (a0)+,d0

without any effort.
Then it's like going from 68000 to 68020. You're free to use improvements or not.
But the benefit isn't marginal. What previously took a lot of effort, or was not even doable in asm because it was too complex, is now straightforward or at least possible.
All the memory you used to keep the 68k's limits in mind is now free for other, better tasks because most of these limits are gone.

Quote:

Originally Posted by Bruce Abbott

There is no progress if we try to avoid reality by inventing a fantasy and then pretending we can live in it. Sure it would be nice if we could make 68k code smaller and easier to read and write, but then it wouldn't be 68k code and wouldn't run on the machines we love!

It's not 68k code but it can run on it nevertheless. That's the principle of a virtual machine.
And yes it's smaller and easier to read and write (at least for myself).

Quote:

Originally Posted by Bruce Abbott

Invent a new game? Fine, just don't expect to find anyone to play it with.

Look, I understand where you are coming from. After doing a couple of large Z80 projects I had similar ideas about improving it. But I rejected the idea because I would be destroying the character of the machines I was working on - and that's with a CPU that is begging for improvement. I got more enjoyment out of producing code that was fast and efficient despite the CPU's limitations, than if I had taken the lazy path of tinkering with the ISA.

It's a very different motivation. I'm not coding on the 68k for the pleasure of handling its limitations, but only because it is the only cpu family that's really usable for this task...
I run into the 68k's limitations just too often. I originally tried to solve that with macros but they can't handle everything.

Quote:

Originally Posted by Bruce Abbott

But don't let my opinion stop you guys from continuing what you are doing. Even if it is a dead end, it still makes us think more about 68k coding and appreciate its beauty (and warts), and what we can create with it.

Oh don't worry, it's too late to stop me at least. I have fully designed the instruction set, fully implemented it in a 68k vm (and partially for a pc vm). I have an assembler and a debugger, both written in the vm code itself.
Of course it is currently very slow which limits its use (even though the high level part can run at native speed). But if i could directly translate the code to 68k so that it does at least the same job as a compiler, i could release some software using it without the users even knowing about it...

In fact, my vm is the continuation of my system framework, but for the low level. Do you know my system framework ? Probably not, it's not visible to the end user - but nearly all my code uses it, from my picture viewer, audio flac player, to all the games i ported. And it allows writing such programs without any direct use of OS and hardware. In a shorter, more efficient way.
Perhaps i will forever be alone using it. But i don't care, as it makes my life easier. And it is the same for my cpu.

Bruce Abbott · 18 December 2020, 07:00

Quote:

Originally Posted by meynaf

I'm pretty sure you can understand something such as

and.l a0,d0

or

eor.w (a0)+,d0

without any effort.

Shows you how little I use eor - until now I didn't know you can't do that on 68k!

But can you understand this?

Code:

   move.b ([Label1,za1,d1.w],Label2),d0
 ...
Label1:
 ...
Label2:

The assembler I use (ProAsm) thinks it does, but generates incorrect code because it refuses to relocate the labels!

Perhaps I should use Barfly assembler instead, since it is a bit faster and doesn't have this bug? But it can't handle the suppressed address register!

So I am more concerned about being able to use all existing 68k instructions than creating new ones that aren't needed.

meynaf · 18 December 2020, 08:26

Quote:

Originally Posted by Bruce Abbott

Shows you how little I use eor - until now I didn't know you can't do that on 68k!

That might show you are not doing enough asm to run into the limitations.

Quote:

Originally Posted by Bruce Abbott

But can you understand this?

Code:

   move.b ([Label1,za1,d1.w],Label2),d0
 ...
Label1:
 ...
Label2:

Yes i can. Not that i will use anything like this, though. The 68020+ addressing modes are big and slow. They make implementation more complex and are not very useful.

Quote:

Originally Posted by Bruce Abbott

The assembler I use (ProAsm) thinks it does, but generates incorrect code because it refuses to relocate the labels!

Perhaps I should use Barfly assembler instead, since it is a bit faster and doesn't have this bug? But it can't handle the suppressed address register!

I use PhxAss and it handles all that just fine.

Your example here is very interesting. It shows the 68k can be more complicated than it should be, and for little benefit. Another shortcoming i wanted to fix.

Quote:

Originally Posted by Bruce Abbott

So I am more concerned about being able to use all existing 68k instructions than creating new ones that aren't needed.

It's only good knowledge of the existing parts that can make you feel the need for new ones. If you don't have this expertise then you can't say if new ones are needed or not.
As for using all existing 68k instructions, good luck at finding a use for rtr, chk, cmp2, chk2, cas2, trapv, trapcc, nbcd.

Bruce Abbott · 18 December 2020, 22:59

Quote:

Originally Posted by meynaf

That might show you are not doing enough asm to run into the limitations.

Or perhaps just not the type of code that needs it. But it's not a big deal. Having to add one extra instruction to complete a rarely used function is nothing to be concerned about, and certainly not enough reason to create a new CPU.

Quote:

Not that i will use anything like this, though. The 68020+ addressing modes are big and slow. They make implementation more complex and are not very useful.

I wouldn't normally use such instructions, but compilers do. I found this when attempting to reassemble A09, the 6809 assembler found on Aminet. It was compiled with GCC 6.5.0b, which generates more sophisticated code than older Amiga C compilers. A09 is open-source so you might think reassembling it is a waste of time, but it is a good test for dealing with executables that don't have source (as well as interesting to see what code GCC generates).

Quote:

As for using all existing 68k instructions, good luck at finding a use for rtr, chk, cmp2, chk2, cas2, trapv, trapcc, nbcd.

chk2, cas2 and cmp2 might be illegal on Amiga hardware, but trapv, trapcc and rtr are sometimes found in executables so they are obviously needed.

You are creating a 68k-like instruction set for your own code so dropping some opcodes is fine, but it wouldn't do for something that needs to run Amiga programs. Nevertheless it is interesting to consider how the 68k could have been improved in a compatible way. It is a bit puzzling why some instructions do not allow certain addressing modes, while others are 'unnecessarily' duplicated.

meynaf · 18 December 2020, 23:29

Quote:

Originally Posted by Bruce Abbott

Or perhaps just not the type of code that needs it. But it's not a big deal. Having to add one extra instruction to complete a rarely used function is nothing to be concerned about, and certainly not enough reason to create a new CPU.

Except that it's not about adding one instruction that's gonna be rarely used. You can hardly have any routine of significant size that can't use any of my expansions. Sometimes the gain is big. I think rewriting a 15-line block into just 4 lines is worth it.

Quote:

Originally Posted by Bruce Abbott

chk2, cas2 and cmp2 might be illegal on Amiga hardware, but trapv, trapcc and rtr are sometimes found in executables so they are obviously needed.

Being present here and there in extremely rare cases doesn't mean they're really "needed". Especially because they are quite easy to replace with something else.

Quote:

Originally Posted by Bruce Abbott

You are creating a 68k-like instruction set for your own code so dropping some opcodes is fine, but it wouldn't do for something that needs to run Amiga programs.

It is obvious that the goal isn't object level compatibility.

Quote:

Originally Posted by Bruce Abbott

Nevertheless it is interesting to consider how the 68k could have been improved in a compatible way. It is a bit puzzling why some instructions do not allow certain addressing modes, while others are 'unnecessarily' duplicated.

Right, 68k lacks a little bit of logic. In my vm, if an addressing mode is meaningful (and not an easy to remove duplicate) then it is allowed.

Thomas Richter · 19 December 2020, 09:05

Quote:

Originally Posted by meynaf

As for using all existing 68k instructions, good luck at finding a use for rtr, chk, cmp2, chk2, cas2, trapv, trapcc, nbcd.

rtr certainly has its uses for restoring the full context transparently, after a movem of all registers. If I recall, it may be that COP uses this instruction. chk is compiler support for languages that expect or require bounds checking on arrays. Such languages have fallen "out of favour", as we now do such things either not at all (C-style) or in software (suitable classes in C++ with bounds checking), but it is certainly useful for Pascal.

cmp2 is useful for saturation logic, and chk2 for debug paths of similar applications, or for 1-based arrays if the source language requires that.

cas2 is essential for multi-core systems. Implementing a lock-free queue or a robust lock-free stack without cas2 is hard if not impossible. Of course, on the Amiga, you do not need any of that.
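To illustrate the kind of use meant here, a minimal Python model of a stack whose head pointer and update counter are changed together by a double compare-and-swap. This is only a sketch: `Memory`, `push` and `pop` are hypothetical names, and a `threading.Lock` stands in for the indivisible bus cycle of the real instruction:

```python
import threading


class Memory:
    """Toy memory whose cas2() models an indivisible double compare-and-swap."""

    def __init__(self):
        self.cells = {"head": None, "count": 0}
        self._bus_lock = threading.Lock()  # stands in for the locked bus cycle

    def cas2(self, expect_head, expect_count, new_head, new_count):
        # Update both cells only if BOTH still hold the expected values.
        with self._bus_lock:
            current = (self.cells["head"], self.cells["count"])
            if current != (expect_head, expect_count):
                return False
            self.cells["head"] = new_head
            self.cells["count"] = new_count
            return True


def push(mem, next_of, node):
    while True:
        head, count = mem.cells["head"], mem.cells["count"]
        next_of[node] = head  # link the new node in front of the old head
        if mem.cas2(head, count, node, count + 1):
            return


def pop(mem, next_of):
    while True:
        head, count = mem.cells["head"], mem.cells["count"]
        if head is None:
            return None
        if mem.cas2(head, count, next_of[head], count + 1):
            return head
```

The counter changes on every successful update, so an attempt based on a stale snapshot fails even if the head pointer happens to hold the same node again.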

trapv is in use by certain compilers - I recall an Oberon compiler for the Amiga - which used it for arithmetic overflow checking. That is, from a language perspective, a much better and saner logic than the C-style "undefined behaviour" on arithmetic overflows. trapcc is likewise an extension, if you need this for signed logic.
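A minimal Python sketch of that overflow-checking pattern, modelling a 32-bit add that sets the V flag followed by a trap on V. `OverflowTrap`, `add_l`, `trapv` and `checked_add` are hypothetical names for this model only:

```python
class OverflowTrap(Exception):
    """Stands in for the exception taken by trapv (hypothetical name)."""


def add_l(a, b):
    """32-bit add; returns (result, v) where v models the overflow flag."""
    a &= 0xFFFFFFFF
    b &= 0xFFFFFFFF
    result = (a + b) & 0xFFFFFFFF
    # Signed overflow: both operands share a sign and the result does not.
    v = bool(~(a ^ b) & (a ^ result) & 0x80000000)
    return result, v


def trapv(v):
    """Trap if the overflow flag is set, otherwise do nothing."""
    if v:
        raise OverflowTrap("signed overflow")


def checked_add(a, b):
    """What a compiler could emit for a checked addition: add.l, then trapv."""
    result, v = add_l(a, b)
    trapv(v)
    return result
```

In this model `checked_add(0x7FFFFFFF, 1)` raises instead of silently wrapping to a negative value.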

All in all, these instructions are for languages that have fallen out of favour, though are still useful, or for system designs the Amiga does not follow, but that have been envisioned by Motorola.

There are other examples in this family: Consider "callm" which supports (through the 68851 MMU) a layered security system. That was considered "a good idea" somewhere in the 90's, but nobody writes operating systems with more than 2 layers nowadays, and if so, then uses virtual machines. So it was useful to implement an architecture paradigm that has since "fallen out of favour", though it was still useful (or required) for it.

Thomas Richter · 19 December 2020, 09:13

Quote:

Originally Posted by Bruce Abbott

You are creating a 68k-like instruction set for your own code so dropping some opcodes is fine, but it wouldn't do for something that needs to run Amiga programs. Nevertheless it is interesting to consider how the 68k could have been improved in a compatible way. It is a bit puzzling why some instructions do not allow certain addressing modes, while others are 'unnecessarily' duplicated.

There is a certain logic why particular addressing modes are not allowed. The design principle of the 68K is to separate instruction and data space. In fact, with proper hardware, one could have data space and instruction space in separate RAMs as the 68K provides control signals that indicate what is addressed: data or instructions.

Anything that is in the instruction space, cannot be modified by the processor. Thus, if you address something relative to the PC, it is "instruction space". You thus can read relative to the PC, but not write relative to the PC.
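The rule can be written down as a small lookup, shown here as a Python sketch covering the basic 68000 addressing modes only. The helper name `can_be_destination` is made up for this example:

```python
# Basic 68000 addressing modes, grouped by whether they may be written to.
ALTERABLE_MODES = {
    "Dn", "An", "(An)", "(An)+", "-(An)",
    "(d16,An)", "(d8,An,Xn)", "(xxx).W", "(xxx).L",
}
# PC-relative and immediate operands live in the instruction stream.
NON_ALTERABLE_MODES = {"(d16,PC)", "(d8,PC,Xn)", "#imm"}


def can_be_destination(mode):
    """Return True if an operand in this addressing mode may be written."""
    if mode in ALTERABLE_MODES:
        return True
    if mode in NON_ALTERABLE_MODES:
        return False
    raise ValueError(f"unknown addressing mode {mode!r}")
```

So reading through `(d16,PC)` is allowed, while the same mode as a destination is rejected, and `(d16,An)` works both ways.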

While I do not have evidence at this moment, I would believe that the external logic would also decode accesses relative to the PC as "instruction space" addresses, whereas indirections through an address register are "data accesses".

The 68K there follows a generally advisable design, unfortunately one the Amiga does not fully implement, but that has been envisioned for other applications.

While we are at it: nbcd is useful for BCD arithmetic, it is in the same family as abcd and sbcd, and as such useful. BCD arithmetic has fallen out of favour (for better or worse), but there was "some market" that liked it (financial) and there are certainly languages that would require it. Probably COBOL.
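For readers who have never used these, a minimal Python sketch of what a packed-BCD add and negate compute on one byte. It assumes valid BCD input and models only the result and the carry/borrow, not the other flags. The function names simply mirror the mnemonics:

```python
def abcd(src, dst, x=0):
    """Packed-BCD add of two bytes plus the extend bit; returns (result, carry)."""
    low = (src & 0x0F) + (dst & 0x0F) + x
    carry_low = low > 9
    if carry_low:
        low -= 10
    high = (src >> 4) + (dst >> 4) + int(carry_low)
    carry = high > 9
    if carry:
        high -= 10
    return (high << 4) | low, carry


def nbcd(dst, x=0):
    """Packed-BCD negate: 0 - dst - x, i.e. the ten's (or nine's) complement."""
    value = (dst >> 4) * 10 + (dst & 0x0F)
    result = (100 - value - x) % 100
    borrow = (value + x) != 0
    return ((result // 10) << 4) | (result % 10), borrow
```

The byte `0x25` stands for decimal 25, and its negation `0x75` is 100 - 25 with a borrow out, which is how multi-byte decimal subtraction chains through the extend bit.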

meynaf · 19 December 2020, 10:35

Quote:

Originally Posted by Thomas Richter

rtr has certainly its uses for restoring the full context transparently, after a movem of all registers. If I recall, it may be that COP uses this instruction. chk is a compiler support for languages that expect or require bounds checking on arrays. Such languages run "out of favour" as we do such things not at all (C-style) or in software (suitable classes in C++ with bounds checking), but it is certainly useful for Pascal.

cmp2 is useful for saturation logic, and chk2 for debug paths of similar applications, or for 1-based arrays if the source language requires that.

cas2 is essential for multi-core systems. Implementing a lock-free queue or a robust lock-free stack without cas2 is hard if not impossible. Of course, on the Amiga, you do not need anything of that.

trapv is in use by certain compilers - I recall an Oberon compiler for the Amiga - which used it for arithmetic overflow checking. That is, from a language perspective, a much better and saner logic than the C style "undefined behaviour" on arithmetic overflows. trapcc is likewise an extension, if you need this for signed logic.

I perfectly know the intent of all these instructions, it's just that they're quite a failure. They were for special cases and have been made obsolete thru time. They are now legacy.

rtr is just move (sp)+,ccr + rts. Not worth creating an instruction.

chk is faster than cmp + bcc but less flexible (it crashes if out of bounds, period) and it takes significant opcode space which probably could have been used for something less specific.
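To make the comparison concrete, here is a minimal C sketch of the test that chk performs and of the branch-style alternative. It is only a model under stated assumptions: the function names are hypothetical, the index is treated as a signed 16-bit value as in chk.w, and the "exception" is simply represented by a false return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Condition tested by chk: the index is in bounds when
   0 <= index <= upper (signed compare, upper bound inclusive). */
bool chk_in_bounds(int16_t index, int16_t upper)
{
    return index >= 0 && index <= upper;
}

/* The usual cmp + bcc idiom: one unsigned compare covers both ends,
   because a negative index becomes a large unsigned value.
   Assumption: upper itself is not negative. */
bool chk_in_bounds_unsigned(int16_t index, int16_t upper)
{
    assert(upper >= 0);
    return (uint16_t)index <= (uint16_t)upper;
}

/* Branch-style access: the caller decides what to do on failure,
   where chk would unconditionally raise its exception. */
bool checked_load(const int32_t *table, int16_t upper,
                  int16_t index, int32_t *out)
{
    assert(table != NULL && out != NULL);
    if (!chk_in_bounds_unsigned(index, upper))
        return false;
    *out = table[index];
    return true;
}
```

The flexibility argument shows up in checked_load: the failure path is ordinary code, so it can clamp, retry or report instead of trapping.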

cmp2 / chk2 are too slow to be of any real use and they're never gonna be fast as they depend on memory accesses. All this makes them unsuitable for most of the cases they were supposed to handle...
Besides, their implementation is problematic (see 68060).

cas2 can't be essential for multi-core systems, as multi-core RISC cpus certainly don't have such a complex instruction and they still work.
It is not for nothing it has been deleted too in 68060.

The same feature as trapv / trapcc is easy to get with branches, but branches can at least do other things than just crashing by the means of an exception.
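As an illustration of the branch-based approach, a small C sketch of signed overflow detection follows. The names are hypothetical; add_overflows models the case in which the V flag would be set after a 32-bit add (where trapv would fire), and checked_add hands the decision back to the caller instead of raising an exception.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when a + b does not fit in a signed 32-bit value.
   The test is done before adding, so no overflow ever happens in C. */
bool add_overflows(int32_t a, int32_t b)
{
    return (b > 0 && a > INT32_MAX - b) ||
           (b < 0 && a < INT32_MIN - b);
}

/* Branch-style check: on overflow report failure and leave *sum alone;
   the caller may saturate, widen, or abort as it sees fit. */
bool checked_add(int32_t a, int32_t b, int32_t *sum)
{
    assert(sum != NULL);
    if (add_overflows(a, b))
        return false;
    *sum = a + b;
    return true;
}
```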

While we're at it, nbcd is easy to replace with sbcd (it's just 0-n). The case is too rare to justify an instruction.
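The "0-n" equivalence can be sketched in C as follows. This is a simplified model, assuming valid packed BCD digits and representing the X flag as a plain borrow value; the function names are hypothetical and the model ignores how real hardware treats invalid digits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packed BCD subtract of two 2-digit values, modelled on sbcd:
   result = a - b - borrow_in (mod 100), *borrow_out set on underflow. */
uint8_t bcd_sub(uint8_t a, uint8_t b, unsigned borrow_in,
                unsigned *borrow_out)
{
    assert(borrow_out != NULL && borrow_in <= 1u);
    assert((a & 0x0F) <= 9 && (a >> 4) <= 9);
    assert((b & 0x0F) <= 9 && (b >> 4) <= 9);

    int av = (a >> 4) * 10 + (a & 0x0F);
    int bv = (b >> 4) * 10 + (b & 0x0F);
    int r  = av - bv - (int)borrow_in;

    *borrow_out = (r < 0) ? 1u : 0u;
    if (r < 0)
        r += 100;               /* wrap into 00..99 */
    return (uint8_t)(((r / 10) << 4) | (r % 10));
}

/* nbcd modelled as a subtraction from zero. */
uint8_t bcd_neg(uint8_t n, unsigned borrow_in, unsigned *borrow_out)
{
    return bcd_sub(0x00, n, borrow_in, borrow_out);
}
```

For instance, negating 25 gives 75 with a borrow, which is the ten's complement used when chaining multi-byte BCD subtraction.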

So yes, i wouldn't advocate removing all these for a cpu intended to execute legacy code, but for new software you have to admit they're not the most useful stuff we have...

OTOH, move with zero/sign extend would be incredibly common -- but we just don't have that.

Quote:

Originally Posted by Thomas Richter

All in all, these instructions are for languages that run out of favour, though are still useful, or for system designs the Amiga does not follow, but that have been envisioned by Motorola.

They were specific to some use case, and such instructions become obsolete one day or another.

Quote:

Originally Posted by Thomas Richter

There are other examples in this family: Consider "callm" which supports (through the 68851 MMU) a layered security system. That was considered "a good idea" somewhere in the 80's, but nobody writes operating systems with more than 2 layers nowadays, and if so, then uses virtual machines. So it was useful to implement an architecture paradigm that "ran out of favour", though was still useful (or required) for it.

Well, callm was so useful that it didn't even reach next generation of same cpu. It is just too complex.

Not targeted at me but i wanted to react :

Quote:

Originally Posted by Thomas Richter

There is a certain logic why particular addressing modes are not allowed. The design principle of the 68K is to separate instruction and data space. In fact, with proper hardware, one could have data space and instruction space in separate RAMs as the 68K provides control signals that indicate what is addressed: data or instructions.

Actually this logic is quite a failure. Nothing prevents doing lea to some address and the space can't be known afterwards. Any stored pointer will lose that info.
By disallowing PC-relative writes Mot' just wanted to discourage SMC. Now whenever we have data right after the code and need to make alterations to it, we run into that limitation.
But ok. We know why it's there and can live with it.

It does not explain, however, why we can't eor from mem, or why we can't movem.x rlist,(an)+.
(In reality i know the reasons behind these too, but they're not very valid either.)

Thomas Richter · 19 December 2020, 12:40

Quote:

Originally Posted by meynaf

I perfectly know the intent of all these instructions, it's just that they're quite a failure.

Apparently, not.

Quote:

Originally Posted by meynaf

They were for special cases and have been made obsolete thru time. They are now legacy.

Same as the whole architecture.

Quote:

Originally Posted by meynaf

chk is less flexible (it crashes if out of bounds, period) and it takes significant opcode space which probably could have been used for something less specific.

It doesn't crash. It creates an exception, which is exactly what is intended. In such a case, the operating system (stress on "operating", as in "not as AmigaOs") should terminate the executing process as something dangerous was about to happen, such as an out-of-bounds access of an array. This is important for secure programs.

Quote:

Originally Posted by meynaf

cmp2 / chk2 are too slow to be of any real use and they're never gonna be fast as they depend on memory accesses. All this makes them unsuitable for most of the cases they were supposed to handle...

I doubt these are slower than the alternative code paths of making two bounds checks.

Quote:

Originally Posted by meynaf

cas2 can't be essential for multi-core systems, as multi-core RISC cpus certainly don't have such a complex instruction and they still work.

Apparently, you have never attempted lock-free programming. There are two methods to do so. One is "compare and swap", which is the primitive of 68K and the x86. RISC cores use another primitive, which is a bit more complicated to handle, but not available on the 68K. CAS2 is very elegant and fast in such situations. Google for "lock free stack" to find a perfect application for it, and the "ABA problem" why it is necessary.
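For readers unfamiliar with the ABA problem, here is a hedged C11 sketch of one common workaround: the stack head carries a version counter next to the top-of-stack reference, and both are updated by one compare-exchange. This is not 68K code and not cas2 itself; it assumes nodes are identified by 32-bit indices into a small pool so that index and version fit into a single 64-bit atomic word, and all names (lf_stack, lf_push, lf_pop, POOL_SIZE, NIL) are made up for the illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define POOL_SIZE 8u
#define NIL UINT32_MAX              /* marks "no node" */

typedef struct {
    _Atomic uint32_t next[POOL_SIZE];  /* next[i]: node below node i */
    _Atomic uint64_t head;             /* low 32 bits: top index,
                                          high 32 bits: version counter */
} lf_stack;

static uint64_t lf_pack(uint32_t index, uint32_t version)
{
    return ((uint64_t)version << 32) | index;
}

void lf_init(lf_stack *s)
{
    atomic_init(&s->head, lf_pack(NIL, 0));
}

void lf_push(lf_stack *s, uint32_t node)
{
    assert(node < POOL_SIZE);
    uint64_t old = atomic_load(&s->head);
    uint64_t desired;
    do {
        atomic_store(&s->next[node], (uint32_t)old);
        /* Bump the version so a stale head value can never match again
           (until the 32-bit counter wraps around). */
        desired = lf_pack(node, (uint32_t)(old >> 32) + 1u);
    } while (!atomic_compare_exchange_weak(&s->head, &old, desired));
}

uint32_t lf_pop(lf_stack *s)
{
    uint64_t old = atomic_load(&s->head);
    uint64_t desired;
    do {
        uint32_t top = (uint32_t)old;
        if (top == NIL)
            return NIL;             /* stack is empty */
        desired = lf_pack(atomic_load(&s->next[top]),
                          (uint32_t)(old >> 32) + 1u);
    } while (!atomic_compare_exchange_weak(&s->head, &old, desired));
    return (uint32_t)old;
}
```

Without the version half, a pop could succeed after another thread popped the same node, pushed others, and pushed it back: the index compares equal although next[top] is stale. With full-width pointers instead of indices, the same pair no longer fits one word, which is where a double-word compare-and-swap comes in.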

Quote:

Originally Posted by meynaf

It is not for nothing it has been deleted too in 68060.

At that time, SMP was not much an issue, and multi-core 68K systems were rare. However, nowadays the story is a completely different one, and CAS2 or rather its x86 equivalent, compare-exchange with "lock prefix" are essential. Mot envisioned such systems with the 68K, but there wasn't a market big enough back then. On the Amiga, you cannot use locked memory transfers safely, so this instruction family is of no use in the Amiga anyhow, but it is quite important for some other systems (or was supposed to be).

Quote:

Originally Posted by meynaf

The same feature as trapv / trapcc is easy to get with branches, but branches can at least do other things than just crashing by the means of an exception.

An exception is not a crash. It is a crash on something as lousy as AmigaOs, but not on a real operating system. It is in use by some languages. As said, Oberon and Pascal come to my mind.

Quote:

Originally Posted by meynaf

So yes, i wouldn't advocate removing all these for a cpu intended to execute legacy code, but for new software you have to admit they're not the most useful stuff we have...

Look, these are two different things... 68K is not Amiga. 68K is a platform Motorola envisioned, and not all these visions became true. That does not make particular instructions pointless - they are just useless on the Amiga as we have it. Besides, I'm holding my breath for new software on this (or any 68K system) for that matter.

Quote:

Originally Posted by meynaf

OTOH, move with zero/sign extend would be incredibly common -- but we just don't have that.

We have ext.w and ext.l and extb.l for that. I don't know why Mot did not include that, even though it is indeed useful for the C programming language which requires extending operands to int before performing arithmetic on them. Other languages have other rules. Vampire has it, but I still wouldn't use that. In reality, opcode fusion would be the way how this is addressed in modern systems (i.e. a sequence such as move.w ..,dx:ext dx becomes a single opcode in the pipeline before reaching the ALU).

Quote:

Originally Posted by meynaf

Well, callm was so useful that it didn't even reach next generation of same cpu. It is just too complex.

Certainly, an active market is a moving target. The 386 had similar mechanisms, so Mot probably copied it from there. Operating systems that could take advantage of it never materialized, so it became useless. Now, we don't have an active market.

Quote:

Originally Posted by meynaf

Actually this logic is quite a failure. Nothing prevents doing lea to some address and the space can't be known afterwards. Any stored pointer will lose that info.

Then don't use LEA. The processor was built with a particular design in mind, and one was "do not modify the text segment". Of course, on assembler level, you cannot enforce that, but you can give hints as what is "acceptable" and what isn't. It may be that a "move d(PC),d0" may give different data than a "lea d(PC),a0:move (a0),d0". The former addresses the text segment, the latter data with the same logical address.

Quote:

Originally Posted by meynaf

By disallowing PC-relative writes Mot' just wanted to discourage SMC.

Not only that. It is a "Harvard architecture" with separate code and data paths, which has "security wise" some advantages as data cannot overwrite code or constant data. So there is some sense in this design, except that the Amiga did not use the function codes available at the outside. In fact, the Atari ST did, so you could not access hardware registers from user code (which makes perfect sense to me).

Quote:

Originally Posted by meynaf

It does not explain, however, why we can't eor from mem, or why we can't movem.x rlist,(an)+.
(In reality i know the reasons behind these too, but they're not very valid either.)

EOR from memory is probably because there wasn't enough code space available for it, and there wasn't sufficient evidence that it would be widely useful. The latter is that movem is supposed to organize stack frames on the call path. Actually, from a design perspective, "movem" is quite a horror and should have been omitted in first place as the instruction is hard to interrupt and re-run, which causes all strange cases in exception handlers and exception processing. But when the 68K was designed, it was an all-microcoded CPU in the legacy of the 6800 without the need to support virtual memory, so the instruction stuck.

Nowadays, nobody would use the LINK and UNLK instruction either as compilers can keep track of the stack frame themselves, without wasting an address register as frame pointer. But that was all before compilers became smarter.

So certainly 68K carries some legacy around, but that legacy is not Amiga specific.

meynaf · 19 December 2020, 13:47

Quote:

Originally Posted by Thomas Richter

Apparently, not.

Starting personal attacks again, i see.

Quote:

Originally Posted by Thomas Richter

Same as the whole architecture.

The fact it has been left behind, doesn't make it technically obsolete in any manner.

Quote:

Originally Posted by Thomas Richter

It doesn't crash. It creates an exception, which is exactly what is intended. In such a case, the operating system (stress on "operating", as in "not as AmigaOs") should terminate the executing process as something dangerous was about to happen, such as an out-of-bounds access of an array. This is important for secure programs.

Oh, please don't argue on semantics. An exception is a crash. A controlled one, but nevertheless a crash. It's not as if user code could redirect that exception by itself.

Quote:

Originally Posted by Thomas Richter

I doubt these are slower than the alternative code paths of making two bounds checks.

But they are.
Note that x64 also declared the bound instruction obsolete.

Quote:

Originally Posted by Thomas Richter

Apparently, you have never attempted lock-free programming. There are two methods to do so. One is "compare and swap", which is the primitive of 68K and the x86. RISC cores use another primitive, which is a bit more complicated to handle, but not available on the 68K. CAS2 is very elegant and fast in such situations. Google for "lock free stack" to find a perfect application for it, and the "ABA problem" why it is necessary.

What i have attempted or not isn't the point. You have to stop always bringing the subject to my person.
Now, CAS2 is for double linked list. All others are satisfied with the simpler CAS instruction, which i wouldn't remove.

About the other primitive used by risc cores, which the 68k doesn't have, why not tell more about it ? Hey, wait. Isn't it just some exg instruction targeting memory ?

Quote:

Originally Posted by Thomas Richter

At that time, SMP was not much an issue, and multi-core 68K systems were rare. However, nowadays the story is a completely different one, and CAS2 or rather its x86 equivalent, compare-exchange with "lock prefix" are essential. Mot envisioned such systems with the 68K, but there wasn't a market big enough back then. On the Amiga, you cannot use locked memory transfers safely, so this instruction family is of no use in the Amiga anyhow, but it is quite important for some other systems (or was supposed to be).

This instruction isn't worth its silicon, it's a pita for a hw implementation, has an ugly encoding, and isn't friendly when it comes to supporting it in an assembler or a disassembler. All that for double linked lists which can be done by other means.

Quote:

Originally Posted by Thomas Richter

An exception is not a crash. It is a crash on something as lousy as AmigaOs, but not on a real operating system. It is in use by some languages. As said, Oberon and Pascal come to my mind.

An exception is a crash (wait ? didn't i write this same sentence earlier in that post ?). In the same way as a null pointer dereference (which a "real" operating system will catch too).

Quote:

Originally Posted by Thomas Richter

Look, these are two different things... 68K is not Amiga. 68K is a platform Motorola envisioned, and not all these visions became true. That does not make particular instructions pointless - they are just useless on the Amiga as we have it. Besides, I'm holding my breath for new software on this (or any 68K system) for that matter.

I didn't pretend 68k was Amiga. But if an instruction is useless on most platforms except a few, and its job can be done by other means, then it's not worth adding.

Quote:

Originally Posted by Thomas Richter

We have ext.w and ext.l and extb.l for that. I don't know why Mot did not include that, even though it is indeed useful for the C programming language which requires extending operands to int before performing arithmetic on them. Other languages have other rules. Vampire has it, but I still wouldn't use that. In reality, opcode fusion would be the way how this is addressed in modern systems (i.e. a sequence such as move.w ..,dx:ext dx becomes a single opcode in the pipeline before reaching the ALU).

But ext is for signed values only. For the unsigned it becomes more complicated if the register can not be cleared before reading the value (when the target is itself or an addressing mode uses it).
Coldfire has mvs/mvz. x86 has movsx/movzx. (And no, Vampire doesn't have it. Worse, it reused its natural encoding space for something else.)
Opcode fusion is the lazy reply. It doesn't change the fact there is a big impact on code density.
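To pin down what is being asked for, here is a minimal C model of the two extensions, written so that the upper bits of the "register" may hold garbage, as they do after a plain move.w or move.b. The function names are hypothetical; sign_extend corresponds to what ext / extb.l (or an mvs-style move) would leave in the register, zero_extend to an mvz-style move or an explicit mask.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 8 or 16 bits of reg to 32 bits.
   Computed without relying on implementation-defined casts. */
int32_t sign_extend(uint32_t reg, unsigned bits)
{
    assert(bits == 8 || bits == 16);
    uint32_t sign = 1u << (bits - 1);
    uint32_t low  = reg & ((sign << 1) - 1u);
    return (int32_t)(low & (sign - 1u)) - (int32_t)(low & sign);
}

/* Zero-extend the low 8 or 16 bits of reg to 32 bits.
   This is the case with no single-instruction form on plain 68k
   when the register cannot be cleared beforehand. */
uint32_t zero_extend(uint32_t reg, unsigned bits)
{
    assert(bits == 8 || bits == 16);
    return reg & ((1u << bits) - 1u);
}
```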

Quote:

Originally Posted by Thomas Richter

Then don't use LEA. The processor was built with a particular design in mind, and one was "do not modify the text segment". Of course, on assembler level, you cannot enforce that, but you can give hints as what is "acceptable" and what isn't. It may be that a "move d(PC),d0" may give different data than a "lea d(PC),a0:move (a0),d0". The former addresses the text segment, the latter data with the same logical address.

But text segment is very often followed by data segment, of which data could have been accessed with pc-relative modes.
It is nice theory vs programming flexibility. Forgive me if i prefer the latter.

Quote:

Originally Posted by Thomas Richter

Not only that. It is a "Harvard architecture" with separate code and data paths, which has "security wise" some advantages as data cannot overwrite code or constant data. So there is some sense in this design, except that the Amiga did not use the function codes available at the outside. In fact, the Atari ST did, so you could not access hardware registers from user code (which makes perfect sense to me).

The example of the Atari ST is something else. What is used isn't the fact it's code or data, it is the FCx signal associated with the supervisor/user state. And indeed it makes sense ; not only hardware is protected, but also vector table and most system variables. Still nothing to do with PC-relative accesses, though.
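The distinction can be sketched in C by decoding the three function code lines. The FC values are the ones the 68000 drives on FC2..FC0; everything else is an assumption for illustration, in particular protected_region_ok, which is a hypothetical address decoder policy and not a description of the actual Atari ST logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Function code values driven on FC2..FC0 during a 68000 bus cycle. */
enum {
    FC_USER_DATA     = 1,
    FC_USER_PROGRAM  = 2,
    FC_SUPER_DATA    = 5,
    FC_SUPER_PROGRAM = 6,
    FC_CPU_SPACE     = 7    /* e.g. interrupt acknowledge */
};

/* FC2 carries the supervisor/user state. */
bool fc_is_supervisor(unsigned fc)
{
    assert(fc <= 7u);
    return (fc & 4u) != 0;
}

/* FC1:FC0 = 10 marks a program space access (codes 2 and 6). */
bool fc_is_program(unsigned fc)
{
    assert(fc <= 7u);
    return (fc & 3u) == 2u;
}

/* Hypothetical decoder policy: a protected region accepts any
   supervisor access, whether program or data space, and rejects
   user accesses. Only FC2 matters for the decision. */
bool protected_region_ok(unsigned fc)
{
    return fc_is_supervisor(fc) && fc != (unsigned)FC_CPU_SPACE;
}
```

In this model the protection decision never looks at the program/data bit, which is the point being made: supervisor protection and the code/data split are separate uses of the same three lines.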

Quote:

Originally Posted by Thomas Richter

EOR from memory is probably because there wasn't enough code space available for it, and there wasn't sufficient evidence that it would be widely useful.

Not enough code space, what a joke. The 68000 had full line-A and line-F available.
But yes, the excuse behind this error is that eor is 'rare enough'.

Quote:

Originally Posted by Thomas Richter

The latter is that movem is supposed to organize stack frames on the call path.

No. It is a pure implementation problem. As movem is a complex one to do, even with microcode. It made the cpu designers very conservative and so they handled only the most common case.
Yet the instruction is extremely handy in many situations, and a gem for code density. I'd rather extend it.

Quote:

Originally Posted by Thomas Richter

Actually, from a design perspective, "movem" is quite a horror and should have been omitted in first place as the instruction is hard to interrupt and re-run, which causes all strange cases in exception handlers and exception processing. But when the 68K was designed, it was an all-microcoded CPU in the legacy of the 6800 without the need to support virtual memory, so the instruction stuck.

Perhaps movem is a horror, yes, but it's less horrible than what we'd get if it weren't there in the first place. Look at what RISC-V is forced to do for its register saves (you're gonna love it, no doubt).

Quote:

Originally Posted by Thomas Richter

Nowadays, nobody would use the LINK and UNLK instruction either as compilers can keep track of the stack frame themselves, without wasting an address register as frame pointer. But that was all before compilers became smarter.

At least they made code more readable, by clearly showing where routines started and ended. Made disassembly of compiled code less painful.

Quote:

Originally Posted by Thomas Richter

So certainly 68K carries some legacy around, but that legacy is not Amiga specific.

It doesn't have to be Amiga specific for us to talk about it.
