some fancy ideas for a extended (68k?) CISC-CPU - Page 2

Gorf · 14 December 2020, 18:26

Quote:

Originally Posted by meynaf

You consider pushes aren't writes maybe ?

Nope.
It is just a 32-bit write to the "other" half of a 64-bit register, as mentioned before.
The former hidden entry is overwritten and becomes visible. The old entry of the register is now in the hidden part.

Quote:

The CPU does not need to store the content of one register anywhere. The data is read, then transmitted to the ALU, where it's just wiring to swap the values.

Well - how this "wiring" is actually implemented is another question.
In my case EXG d0,d1 could decode to:
move d0,d1
moveh d1,d0 (moveh being a move of the hidden part of the register, as micro-code that is not exposed as an opcode)
That is of course up to the actual hardware implementation, if there ever is one...
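A minimal Python sketch of the mechanism described above, assuming each register is a pair of 32-bit halves with a selector bit, that every write lands in the hidden half and flips the selector (so the old value becomes hidden), and that `moveh` is the hypothetical micro-op that copies a register's hidden half. It models behaviour only, not wiring or timing; the class and function names are illustrative.

```python
MASK = 0xFFFFFFFF


class ShadowReg:
    """Hypothetical 64-bit register: two 32-bit halves plus a selector bit."""

    def __init__(self, value=0):
        self.halves = [value & MASK, 0]
        self.sel = 0  # index of the currently visible half

    @property
    def visible(self):
        return self.halves[self.sel]

    @property
    def hidden(self):
        return self.halves[self.sel ^ 1]

    def write(self, value):
        # The write goes into the hidden half, which then becomes visible;
        # the previous value stays behind in the (now) hidden half.
        self.halves[self.sel ^ 1] = value & MASK
        self.sel ^= 1

    def pop(self):
        # Flip back: the previous value becomes visible again.
        self.sel ^= 1


def move(src, dst):
    """move.l src,dst: an ordinary write, so dst's old value is pushed."""
    dst.write(src.visible)


def moveh(src, dst):
    """Hypothetical micro-op: copy src's hidden half into dst."""
    dst.write(src.hidden)


def exg(a, b):
    # EXG a,b decoded as: move a,b ; moveh b,a
    move(a, b)   # b now shows a's value; b's old value is hidden
    moveh(b, a)  # a receives b's old value from the hidden half


d0, d1 = ShadowReg(0x11111111), ShadowReg(0x22222222)
exg(d0, d1)
print(hex(d0.visible), hex(d1.visible))  # 0x22222222 0x11111111
```

In this model `exg` also leaves each register's pre-exchange value in its hidden half, so a `pop` straight after an `exg` would undo the exchange for that register.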

Quote:

A flag that must be saved/restored upon context switches as well. It can't be just an internal, hidden flag.

As all registers do - yes.

Quote:

Who cares ? Intel did it, it's reasonably fast on their cpus, so I have proof it's perfectly doable in HW. It's not a trick, it's just an instruction - it's not like a feature that would change the whole architecture of the cpu (this is what I regard as a trick).

Well, you could just define a nibble-operation mode in your VM, relying on that feature, and call it addnib and do:
addnib color1, color2
And you've solved the problem from the other thread with just one operation...
I call that a trick.

Quote:

This implies the rep itself never enters the log.

Yes, as I already said in my first post...

Quote:

Also implies it's difficult to spot which instructions will be executed.

maybe unusual, but not really difficult.

Quote:

How can you possibly do that ? After completing some operation, you never know whether there will be another rep after it or not.

So? The CPU never knows what comes next (ok, modern CPUs do - but that is irrelevant here...)

Quote:

When the interrupt has finished it has polluted the instruction log.

The log is part of the context and must of course be preserved on interrupts and task switches, like all other registers.
You can speed things up by stopping the log and not using that command in your interrupt handler and/or kernel, if that is what you prefer.
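A sketch of the instruction log as described in this thread, assuming: the log keeps the last N executed instructions (N = 8 is an arbitrary choice), offset -1 is the most recently executed instruction, replayed instructions are logged but the `rep` itself never is, and the log is saved and restored like any other register around an interrupt. Instructions are plain strings and nothing is actually executed; `InstrLog` and its methods are hypothetical.

```python
from collections import deque


class InstrLog:
    """Hypothetical log of recently executed instructions."""

    def __init__(self, depth=8):
        self.entries = deque(maxlen=depth)

    def execute(self, instr):
        # A normal instruction runs and is appended to the log.
        self.entries.append(instr)
        return [instr]

    def rep(self, *offsets):
        # Resolve every offset against the log *before* replaying,
        # then replay; the rep itself never enters the log.
        picked = [self.entries[off] for off in offsets]
        for instr in picked:
            self.entries.append(instr)
        return picked

    def save(self):
        # The log is part of the task context.
        return list(self.entries)

    def restore(self, saved):
        self.entries = deque(saved, maxlen=self.entries.maxlen)


log = InstrLog()
for ins in ["move.l (a0)+,d3", "lsl.l #4,d3", "or.l (a0)+,d3", "move.l d3,d0"]:
    log.execute(ins)
ctx = log.save()            # an interrupt arrives here
log.execute("addq.l #1,d7")  # the handler's own instruction pollutes the log
log.restore(ctx)             # ...until the context is restored on return
print(log.rep(-4, -3, -2))   # replays the three load instructions
```

Without the `restore` call, the same `rep -4,-3,-2` would pick up `lsl.l`, `or.l` and `move.l d3,d0` instead, which is the pollution problem raised above.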

Quote:

That's not the problem.
The trace exception would mess up the log by adding its own instructions in it. It's log pollution - same problem as the interrupt above.

Like on all conditional operations, you have to be aware of that fact and not refer to something that might not be there in the future.

meynaf · 14 December 2020, 19:50

Quote:

Originally Posted by Gorf

Nope.
It is just a 32-bit write to the "other" half of a 64-bit register, as mentioned before.
The former hidden entry is overwritten and becomes visible. The old entry of the register is now in the hidden part.

So I suppose this adds yet another per-register bit, to avoid swapping by value.

Quote:

Originally Posted by Gorf

Well - how this "wiring" is actually implemented is another question.
In my case EXG d0,d1 could decode to:
move d0,d1
moveh d1,d0 (moveh being a move of the hidden part of the register, as micro-code that is not exposed as an opcode)
That is of course up to the actual hardware implementation, if there ever is one...

Wiring is just what it means - plugging one wire into another.
In HW you don't need logical gates to change the order of bits.

Quote:

Originally Posted by Gorf

Well, you could just define a nibble-operation mode in your VM, relying on that feature, and call it addnib and do:
addnib color1, color2
And you've solved the problem from the other thread with just one operation...
I call that a trick.

I didn't create anything new. So it's not a trick.
The operations were there before. They are general purpose. They have a defined encoding. I simply used them. In less than a minute I can fire up my VM debugger, assemble these instructions somewhere in memory, and trace them to see their effect.
If you call that a trick, we'd better stop this discussion right now.

Quote:

Originally Posted by Gorf

maybe unusual, but not really difficult.

Still more work than reading "normal" code.

Quote:

Originally Posted by Gorf

So? The CPU never knows what comes next (ok, modern CPUs do - but that is irrelevant here...)

That was the point - the cpu can't know. But ok, if saving the log is no problem for you.

Quote:

Originally Posted by Gorf

The log is part of the context and must of course be preserved on interrupts and task switches, like all other registers.
You can speed things up by stopping the log and not using that command in your interrupt handler and/or kernel, if that is what you prefer.

That's starting to become complicated, and I still do not see a benefit.

In your starting post you said you wanted to know how useful your ideas would be for real assembler coders. You have your answer, even though it's not at all the one you wanted.

Gorf · 14 December 2020, 22:11

Quote:

Originally Posted by meynaf

So I suppose this adds yet another per-register bit, to avoid swapping by value.

no.

Quote:

Wiring is just what it means - plugging one wire into another.
In HW you don't need logical gates to change the order of bits.

It's magic!

Quote:

I didn't create anything new. So it's not a trick.
The operations were there before. They are general purpose. They have a defined encoding. I simply used them. In less than a minute I can fire up my VM debugger, assemble these instructions somewhere in memory, and trace them to see their effect.
If you call that a trick, we'd better stop this discussion right now.

You seem to view this word as something bad... while in return you are calling my idea a trick.

Quote:

Still more work than reading "normal" code.

until it is the new normal ...

Quote:

That was the point - the cpu can't know. But ok, if saving the log is no problem for you.

That's starting to become complicated, and I still do not see a benefit.

higher code density.

Quote:

In your starting post you said you wanted to know how useful your ideas would be for real assembler coders. You have your answer, even though it's not at all the one you wanted.

And that is fine.
You already said earlier:
"And as a coder i don't see the point in any of the 3, sorry."
And I did not object to that - I was just answering your questions and clarifying things.

So thank you for your feedback and for thinking about my ideas.

And since you are the only one who did, I guess there is no benefit for others as well.

robinsonb5 · 14 December 2020, 22:36

Quote:

Originally Posted by Gorf

And since you are the only one who did, I guess there is no benefit for others as well.

For what it's worth I did read the whole discussion, and have found it interesting even if I haven't had any useful input to add. I can't really be sure whether I'd find your additions useful without actually writing some real code - but it's always interesting to explore new ideas and revisit old assumptions.

On the FPGAs I'm targeting, the block RAM size is typically 9 kbit, so a 16 x 32-bit register file is quite wasteful; making use of those otherwise wasted bits would be interesting, so your stack idea did pique my interest. A simpler idea might be to give the CPU a complete shadow register file for exception processing.
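Rough numbers for that observation, assuming a "9 kbit" block holds 9216 bits (as in some FPGA families that count parity bits; the exact figure and the allowed depth/width configurations depend on the device). The dictionary labels are illustrative.

```python
BLOCK_BITS = 9 * 1024  # assumed size of one block RAM, in bits


def utilization(regs, width):
    """Fraction of one block RAM used by a regs x width register file."""
    return regs * width / BLOCK_BITS


layouts = {
    "16 x 32-bit (plain 68k-style file)": (16, 32),
    "16 x 64-bit (visible + hidden halves)": (16, 64),
    "2 x 16 x 32-bit (full shadow bank)": (32, 32),
}
for name, (regs, width) in layouts.items():
    print(f"{name}: {utilization(regs, width):.1%}")
# 5.6%, 11.1% and 11.1% respectively

# How many complete 16 x 32-bit banks would fit in the same block:
print(BLOCK_BITS // (16 * 32))  # 18
```

Under these assumptions the per-register hidden half and a full shadow bank cost exactly the same storage; the difference lies in how the extra bits are addressed.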

(In my own simplistic CPU project I used 8-bit opcodes in an effort to maximise code density. The result - for compiled C code at least - was in the same ballpark as i386 and m68k and significantly better than the various 32-bit RISC architectures - but the compressed RISC architectures did significantly better.)

Gorf · 14 December 2020, 23:06

Quote:

Originally Posted by robinsonb5

For what it's worth I did read the whole discussion, and have found it interesting even if I haven't had any useful input to add. I can't really be sure whether I'd find your additions useful without actually writing some real code - but it's always interesting to explore new ideas and revisit old assumptions.

Thank you!

Quote:

On the FPGAs I'm targeting, the block RAM size is typically 9 kbit, so a 16 x 32-bit register file is quite wasteful; making use of those otherwise wasted bits would be interesting, so your stack idea did pique my interest. A simpler idea might be to give the CPU a complete shadow register file for exception processing.

I was playing with this idea as well...
There is a little-endian 68k FPGA implementation that holds up to 512 shadow registers for interrupts and tasks.

Depending on the RAM interface it might be enough to have one or two of these for interrupt handling and kernel.

But this feature does not really matter much to an application coder; it matters only for OS development or real-time stuff.
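A minimal model of the shadow-bank approach, assuming one extra 16-register bank reserved for exception processing, selected on exception entry and deselected on return, so the handler gets a full register set without saving anything to memory. The class and method names are hypothetical, and nested exceptions (which would need more banks or a memory fallback) are ignored.

```python
class BankedRegs:
    """Hypothetical register file: a normal bank plus one shadow bank."""

    def __init__(self, nregs=16, banks=2):
        self.banks = [[0] * nregs for _ in range(banks)]
        self.current = 0

    def __getitem__(self, r):
        return self.banks[self.current][r]

    def __setitem__(self, r, value):
        self.banks[self.current][r] = value & 0xFFFFFFFF

    def enter_exception(self):
        # Nothing is saved: the handler simply sees the other bank.
        self.current = 1

    def return_from_exception(self):
        self.current = 0


regs = BankedRegs()
regs[0] = 42              # task code uses d0
regs.enter_exception()
regs[0] = 7               # handler clobbers "its" d0 freely
regs.return_from_exception()
print(regs[0])            # 42
```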

Quote:

(In my own simplistic CPU project I used 8-bit opcodes in an effort to maximise code density. The result - for compiled C code at least - was in the same ballpark as i386 and m68k and significantly better than the various 32-bit RISC architectures - but the compressed RISC architectures did significantly better.)

SH-2 comes to mind

meynaf · 15 December 2020, 10:24

Quote:

Originally Posted by Gorf

no.

If that 'no' means "there is no extra bit to handle", then it implies "they are exchanged by value", and then exg has to move 128 bits around, not 64.
Actually, every write will move 32 bits more than before (instead of just changing a single bit which will say where to fetch the actual value).

Quote:

Originally Posted by Gorf

It's magic!

Only signal routing.

Quote:

Originally Posted by Gorf

You seem to view this word as something bad... while in return you are calling my idea a trick.

Then let's try to use other words.
Your ideas change the overall architecture of the cpu -- it's not as if it was just adding a new instruction. This has a big impact, and one that isn't easy to estimate.
In the meantime, a few extra instructions could give a bigger gain.

Quote:

Originally Posted by Gorf

until it is the new normal ...

Normal code is already hard enough to handle. No need to add another layer.
On the contrary, changes that enhance code readability are welcome.

Quote:

Originally Posted by Gorf

higher code density.

Higher than what ? Than bare 68k ? Not that hard. Try to beat me, now

Quote:

Originally Posted by Gorf

And since you are the only one who did, I guess there is no benefit for others as well.

I wouldn't bet on this. Others have different points of view.

Nevertheless I think you should experiment more with it. Document the full ISA. Write a fully featured VM, an assembler, a debugger. There is a lot to be learnt in the process.

alexh · 15 December 2020, 13:14

Slightly off topic, but have you been watching the PiStorm project? It's a software-defined 680x0 replacement that uses a Raspberry Pi and a carrier board to plug into a 68000 socket. It is currently reaching 040@25MHz speeds with an RPi 3.

Because the CPU is software-defined and runs on an ARM processor, you can add or manipulate 680x0 instructions very easily.

The project is very ambitious, with the intent to expose all SoC peripherals to the virtual 680x0 & AmigaOS. Mass storage and 32-bit FastRAM are already working, Wi-Fi etc. is in the works.

And it's cheap and open source

Gorf · 15 December 2020, 14:08

Quote:

Originally Posted by meynaf

If that 'no' means "there is no extra bit to handle", then it implies "they are exchanged by value", and then exg has to move 128 bits around, not 64.
Actually, every write will move 32 bits more than before (instead of just changing a single bit which will say where to fetch the actual value).

Still no!
Only 32 bits are written. The rest is "only signal routing".

No, really: it is just a state that gets flipped at every write access. In the simplest possible form, this would be a flip-flop that switches which lines are used.
And no: we do not need an extra register, and we do not need to preserve this state on a context switch: the order in which the registers are written back determines it automatically.
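The claim that write-back order alone restores the state can be checked with a small model. It assumes a context save reads the visible half, pops, then reads the formerly hidden half, and that a restore writes the hidden value first and the visible value last. Because every write flips the flip-flop, the flip-flop itself is never saved. All names are hypothetical.

```python
class FlipReg:
    """Hypothetical register: two 32-bit cells plus one flip-flop."""

    def __init__(self):
        self.cells = [0, 0]
        self.ff = 0  # selects the visible cell

    def read(self):
        return self.cells[self.ff]

    def write(self, value):
        # Write into the other cell and flip, so the old value becomes hidden.
        self.cells[self.ff ^ 1] = value & 0xFFFFFFFF
        self.ff ^= 1

    def pop(self):
        self.ff ^= 1


def save(reg):
    """Destructive save (it pops), returning values in write-back order."""
    visible = reg.read()
    reg.pop()
    hidden = reg.read()
    return hidden, visible


def restore(reg, saved):
    for value in saved:  # hidden first, visible last
        reg.write(value)


r = FlipReg()
r.write(1)
r.write(2)               # visible 2, hidden 1
ctx = save(r)
fresh = FlipReg()
fresh.ff = 1             # flip-flop starts in an arbitrary position
restore(fresh, ctx)
print(fresh.read())      # 2
```

The flip-flop may end up in a different position than in the original register; in this model that is harmless, because only which value is visible and which is hidden can be observed.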

Quote:

Only signal routing.

here you go!

It (EXG) is actually a very similar situation, but here you have no problem in seeing it as "simple".
Besides the actual wiring, you of course need some logic as well, telling the CPU when and what to do if this command is issued ... but ok.

Quote:

Then let's try to use other words.
Your ideas change the overall architecture of the cpu -- it's not as if it was just adding a new instruction. This has a big impact, and one that isn't easy to estimate.
In the meantime, a few extra instructions could give a bigger gain.

sure - but this would be the "easy" way ... maybe the wrong word again... the obvious way to go ... and so everybody will go this way.

I think I came up with some not so obvious ideas, that could have a big impact - positive and negative - on code and performance.

Quote:

Normal code is already hard enough to handle. No need to add another layer.
On the contrary, changes that enhance code readability are welcome.

Ok.
That's the kind of feedback I was looking for.

Maybe I can provide a tool - some editor plugin - to compensate for this and make these features more transparent.
To the coder, the "repeat" feature could look just like the original instructions, shown in a different color or with a bracket around them...
All "pop" instructions would have a comment, pointing you to the last writing command on that register ...
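A rough sketch of such a plugin pass, assuming straight-line code (no branches), the `repN` offset convention from the c2p example in this thread (offset -1 is the last executed instruction and the rep itself is not logged), and that an instruction's destination is its last operand. It appends the replayed instructions as a comment on each `rep` and points every `pop` at the last write to the register it restores. This is hypothetical tooling, not an existing plugin.

```python
import re

REG = re.compile(r"d[0-7]")


def dest(instr):
    """Data register written by an instruction: its last operand, if any."""
    parts = instr.split(None, 1)
    if len(parts) < 2:
        return None
    last = parts[1].split(",")[-1].strip()
    return last if REG.fullmatch(last) else None


def annotate(lines):
    """Annotate straight-line code with rep expansions and pop targets."""
    log, out = [], []
    for line in lines:
        instr = line.split(";")[0].strip()  # drop existing comments
        if not instr:
            continue
        parts = instr.split(None, 1)
        mnemonic = parts[0]
        operands = parts[1].split(",") if len(parts) > 1 else []
        if re.fullmatch(r"rep\d+", mnemonic):
            # Resolve offsets against the log first; the rep is not logged.
            replayed = [log[int(off)] for off in operands]
            out.append(f"{instr:<24}; = " + " / ".join(replayed))
            log.extend(replayed)
            continue
        note = ""
        if mnemonic.startswith("pop") and operands:
            restored = operands[0].strip()
            writers = [i for i in log if dest(i) == restored]
            if writers:
                note = f"; undoes: {writers[-1]}"
        out.append(f"{instr:<24}{note}".rstrip())
        log.append(instr)
    return out


src = ["move.l (a0)+,d3", "lsl.l #4,d3", "or.l (a0)+,d3", "move.l d3,d0",
       "rep3 -4,-3,-2", "move.w d2,d3", "pop.w d3,d2"]
print("\n".join(annotate(src)))
```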

Quote:

Higher than what ? Than bare 68k ? Not that hard. Try to beat me, now

Since my approach is more general: higher than all.

My ideas are not colliding with better instructions - both can be done simultaneously.

Gorf · 15 December 2020, 14:20

Quote:

Originally Posted by alexh

Slightly off topic, but have you been watching the PiStorm project? It's a software-defined 680x0 replacement that uses a Raspberry Pi and a carrier board to plug into a 68000 socket. It is currently reaching 040@25MHz speeds with an RPi 3.

Because the CPU is software-defined and runs on an ARM processor, you can add or manipulate 680x0 instructions very easily.

The project is very ambitious, with the intent to expose all SoC peripherals to the virtual 680x0 & AmigaOS. Mass storage and 32-bit FastRAM are already working, Wi-Fi etc. is in the works.

And it's cheap and open source

This is a very interesting project - sadly I do not have an Amiga with a 68000 socket, only my A3000 (which is much faster *g*).

But for CPU design (or rather VM design, at this point) the PiStorm would have no benefit ... trying this on real hardware would much more likely be a pain in the ass.

And my ideas are definitely slowing things down in software, so some of the FPGA boards make much more sense here, once I try to implement this in hardware.

meynaf · 15 December 2020, 15:44

Quote:

Originally Posted by Gorf

Still no!
Only 32 bits are written. The rest is "only signal routing".

No, really: it is just a state that gets flipped at every write access. In the simplest possible form, this would be a flip-flop that switches which lines are used.
And no: we do not need an extra register, and we do not need to preserve this state on a context switch: the order in which the registers are written back determines it automatically.

That makes the context saving process even more interesting, especially from implementation point of view...

Quote:

Originally Posted by Gorf

here you go!

It (EXG) is actually a very similar situation, but here you have no problem in seeing it as "simple".
Besides the actual wiring you need of course some logic as well, telling the CPU when and what to do if this command is issued ... but ok.

The exg is just another operation in the ALU. It does not add any logic to register access - as long as you can write two registers with a single instruction. So yes, I have no problem in seeing it as "simple".
But in your case, you have 16 registers with the cost - and access time - of 32.

Quote:

Originally Posted by Gorf

sure - but this would be the "easy" way ... maybe the wrong word again... the obvious way to go ... and so everybody will go this way.

I think I came up with some not so obvious ideas, that could have a big impact - positive and negative - on code and performance.

What to add and what not has never been obvious, be it instructions or anything else. But, granted, if you don't want to go the same path everybody goes, you're on the right track.

Quote:

Originally Posted by Gorf

Ok.
That's the kind of feedback I was looking for.

Maybe I can provide a tool - some editor plugin - to compensate for this and make these features more transparent.
To the coder, the "repeat" feature could look just like the original instructions, shown in a different color or with a bracket around them...
All "pop" instructions would have a comment, pointing you to the last writing command on that register ...

I prefer to start by having nothing to compensate. Matter of personal taste probably

Quote:

Originally Posted by Gorf

Since my approach is more general: higher than all.

For now your approach has lost the comparison - and by a fair amount.
But that's only a single case.
Time for another test maybe ?

What about a situation in which we're out of registers ? After all, this is the original goal of your shadow register idea, isn't it ?
As an example of this, an 8-bit c2p (or p2c, that's the same). 8 input pointers, 1 output pointer (or vice versa). 8 data, 1 loop counter, plus any used temporary.
Code is also relatively repetitive (at least in theory), giving you opportunities for instruction reuse.
How would you fare in this example ?
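A quick count of that register budget, assuming a plain 68000 with a7 reserved as the stack pointer (so a0-a6 and d0-d7 are free) and no temporaries at all; grouping the loop counter with the data registers is an assumption of this sketch.

```python
# Assumed budget for a plain 68000: a7 is the stack pointer, no temporaries.
needed = {"address": 8 + 1, "data": 8 + 1}  # 8 in + 1 out pointer; 8 data + loop counter
available = {"address": 7, "data": 8}       # a0-a6 and d0-d7

for kind in needed:
    short = needed[kind] - available[kind]
    print(f"{kind}: need {needed[kind]}, have {available[kind]}, short by {short}")

total_short = sum(needed.values()) - sum(available.values())
print("total short:", total_short)
```

Moving the counter into an address register does not change the total: 18 values compete for 15 registers before any temporary is counted.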

Quote:

Originally Posted by Gorf

My ideas are not colliding with better instructions - both can be done simultaneously.

That depends. As long as they use some opcode space, it's space no longer available for anything else.

Gorf · 15 December 2020, 16:27

Quote:

Originally Posted by meynaf

Time for another test maybe ?

What about a situation in which we're out of registers ? After all, this is the original goal of your shadow register idea, isn't it ?
As an example of this, an 8-bit c2p (or p2c, that's the same). 8 input pointers, 1 output pointer (or vice versa). 8 data, 1 loop counter, plus any used temporary.
Code is also relatively repetitive (at least in theory), giving you opportunities for instruction reuse.
How would you fare in this example ?

OK.
Post the routine here and I will apply my ideas to it.
My first estimate would be that only 4 registers will be needed for data.

meynaf · 15 December 2020, 16:45

Quote:

Originally Posted by Gorf

OK.
Post the routine here and I will apply my ideas to it.
My first estimate would be that only 4 registers will be needed for data.

Best source of c2p code here, pick your fave :
https://github.com/Kalmalyzer/kalms-c2p

Gorf · 15 December 2020, 19:50

Quote:

Originally Posted by meynaf

Best source of c2p code here, pick your fave :
https://github.com/Kalmalyzer/kalms-c2p

OK, I'll take the very first example in "normal".

Code:

	move.l	(a0)+,d0
	lsl.l	#4,d0
	or.l	(a0)+,d0
	move.l	(a0)+,d1
	lsl.l	#4,d1
	or.l	(a0)+,d1

	move.l	(a0)+,d2
	lsl.l	#4,d2
	or.l	(a0)+,d2
	move.l	(a0)+,d3
	lsl.l	#4,d3
	or.l	(a0)+,d3

	move.w	d2,d6
	move.w	d3,d7
	move.w	d0,d2
	move.w	d1,d3
	swap	d2
	swap	d3
	move.w	d2,d0
	move.w	d3,d1
	move.w	d6,d2
	move.w	d7,d3

	lsl.l	#2,d0
	lsl.l	#2,d1
	or.l	d2,d0
	or.l	d3,d1

becomes:

	move.l	(a0)+,d3
	lsl.l	#4,d3
	or.l	(a0)+,d3

	move.l	d3,d0
	rep3	-4,-3,-2 ; we repeat move.l (a0)+,d3 , lsl.l #4,d3 and or.l (a0)+,d3
	move.l	d3,d1
	rep3	-4,-3,-2
	move.l	d3,d2
	rep3	-4,-3,-2

	move.w	d2,d3	; we temporarily overwrite d3
	move.w	d0,d2
	swap	d2
	move.w	d2,d0
	pop.w	d3,d2	; we move.w d3,d2 and restore former d3
	move.w	d3,d2	; we temporarily overwrite d2
	move.w	d1,d3
	swap	d3
	move.w	d3,d1
	pop.w	d2,d3	; we move.w d2,d3 and restore former d2


	lsl.l	#2,d0
	lsl.l	#2,d1
	or.l	d2,d0
	or.l	d3,d1

So it is shorter and only uses 4 registers for data.
We could of course just keep the original routine in the upper part for speed, but I wanted to demonstrate the "repeat" feature here.

It introduces 3 move.l instructions and would be slower without other mechanisms in the CPU like OoO execution ... except that we might gain a few cycles by using already decoded instructions and needing fewer instruction fetches.
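The rewrite can be checked mechanically. Below is a Python model assuming the semantics stated in this thread: every write pushes the register's previous value into its hidden half, `pop.w src,dst` is `move.w src,dst` followed by restoring src's previous value, and each `rep3 -4,-3,-2` simply re-runs the three-instruction load block. Registers are 32-bit and `move.w` replaces only the low word. `Reg`, `original` and `with_shadow` are illustrative names.

```python
MASK = 0xFFFFFFFF


def move_w(src, dst):
    """move.w: replace only the low word of dst."""
    return (dst & 0xFFFF0000) | (src & 0xFFFF)


def swap(x):
    return ((x << 16) | (x >> 16)) & MASK


class Reg:
    """Hypothetical register with a visible and a hidden 32-bit half."""

    def __init__(self):
        self.visible, self.hidden = 0, 0

    def write(self, value):
        # Every write pushes the previous value into the hidden half.
        self.hidden, self.visible = self.visible, value & MASK

    def pop(self):
        # The previous value becomes visible again.
        self.visible, self.hidden = self.hidden, self.visible


def original(mem):
    """The routine as posted, with d6/d7 as temporaries."""
    a0 = iter(mem)
    d = [0] * 8
    for r in range(4):  # move.l (a0)+,dr / lsl.l #4,dr / or.l (a0)+,dr
        d[r] = ((next(a0) << 4) & MASK) | next(a0)
    d[6] = move_w(d[2], d[6]); d[7] = move_w(d[3], d[7])
    d[2] = move_w(d[0], d[2]); d[3] = move_w(d[1], d[3])
    d[2] = swap(d[2]); d[3] = swap(d[3])
    d[0] = move_w(d[2], d[0]); d[1] = move_w(d[3], d[1])
    d[2] = move_w(d[6], d[2]); d[3] = move_w(d[7], d[3])
    return ((d[0] << 2) & MASK) | d[2], ((d[1] << 2) & MASK) | d[3]


def with_shadow(mem):
    """The rewrite: only d0-d3, using push-on-write, pop and rep3."""
    a0 = iter(mem)
    d = [Reg() for _ in range(4)]

    def load_block():  # move.l (a0)+,d3 / lsl.l #4,d3 / or.l (a0)+,d3
        d[3].write(next(a0))
        d[3].write(d[3].visible << 4)
        d[3].write(d[3].visible | next(a0))

    load_block()
    for r in range(3):  # move.l d3,dr ; rep3 -4,-3,-2 (modelled as a re-run)
        d[r].write(d[3].visible)
        load_block()

    d[3].write(move_w(d[2].visible, d[3].visible))  # move.w d2,d3 (temporary)
    d[2].write(move_w(d[0].visible, d[2].visible))  # move.w d0,d2
    d[2].write(swap(d[2].visible))                  # swap d2
    d[0].write(move_w(d[2].visible, d[0].visible))  # move.w d2,d0
    d[2].write(move_w(d[3].visible, d[2].visible))  # pop.w d3,d2: move...
    d[3].pop()                                      # ...and restore old d3
    d[2].write(move_w(d[3].visible, d[2].visible))  # move.w d3,d2 (temporary)
    d[3].write(move_w(d[1].visible, d[3].visible))  # move.w d1,d3
    d[3].write(swap(d[3].visible))                  # swap d3
    d[1].write(move_w(d[3].visible, d[1].visible))  # move.w d3,d1
    d[3].write(move_w(d[2].visible, d[3].visible))  # pop.w d2,d3: move...
    d[2].pop()                                      # ...and restore old d2

    d0 = ((d[0].visible << 2) & MASK) | d[2].visible  # lsl.l #2,d0 / or.l d2,d0
    d1 = ((d[1].visible << 2) & MASK) | d[3].visible  # lsl.l #2,d1 / or.l d3,d1
    return d0, d1


mem = [(0x9E3779B9 * (i + 1)) & MASK for i in range(8)]
print(original(mem) == with_shadow(mem))  # True
```

Under these assumed semantics both versions leave the same values in d0 and d1; the model says nothing about cycle counts.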

meynaf · 15 December 2020, 20:35

So that's 26 instructions becoming 24. Partial code so no pressure on registers. Sorry, but I'm not impressed.

Now look at what the 16-bit part of the merges can look like :

Code:

 swap d4
 swap d5
 swap d6
 swap d7
 exg.w d0,d4
 exg.w d1,d5
 exg.w d2,d6
 exg.w d3,d7

That's 8 registers merged in 8 instructions instead of 4 merged in 10.
(And no, I have not invented it right now. It's just an excerpt of my current, tested and working, VM c2p code.)
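For comparison, a model of that excerpt, assuming `exg.w` (an instruction of meynaf's VM; the 68000's own `exg` is long-only) exchanges just the low 16-bit words of two registers. The function names are illustrative.

```python
MASK = 0xFFFFFFFF


def swap(x):
    return ((x << 16) | (x >> 16)) & MASK


def exg_w(a, b):
    """Assumed exg.w semantics: exchange the low words of a and b."""
    return (a & 0xFFFF0000) | (b & 0xFFFF), (b & 0xFFFF0000) | (a & 0xFFFF)


def merge16(d):
    """d = [d0..d7]: swap d4-d7, then exg.w d0,d4 ... exg.w d3,d7."""
    d = list(d)
    for i in range(4, 8):
        d[i] = swap(d[i])
    for i in range(4):
        d[i], d[i + 4] = exg_w(d[i], d[i + 4])
    return d


regs = [0xAAAA1111, 0, 0, 0, 0xBBBB2222, 0, 0, 0]
out = merge16(regs)
print(hex(out[0]), hex(out[4]))  # 0xaaaabbbb 0x22221111
```

In each pair, one register ends up with both high words and the other with both low words, which is the 16-bit merge step; in this model the low-word register holds them in swapped order, which surrounding code would have to account for. This is an observation about the model, not about meynaf's full routine.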

If you want some advice : keep it simple. As you said yourself in the title, your ideas are 'fancy'. Perhaps too much.
So instead of looking for original ideas, it might be better to search for a good mix of everything.

Gorf · 15 December 2020, 21:38

Quote:

Originally Posted by meynaf

So that's 26 instructions becoming 24. Partial code so no pressure on registers..

Now you are moving the goalpost...

I tried to save registers here, and showed how my idea can be used to do exactly that.

Quote:

That's 8 registers merged in 8 instructions instead of 4 merged in 10.

I very much expected something like this ... that's why I asked you for a piece of code in the first place.

You point me to code written for one ISA and I apply my ideas to it ... and then you say: look, on a completely different ISA it is even better.

Quote:

If you want some advice : keep it simple. As you said yourself in the title, your ideas are 'fancy'. Perhaps too much.
So instead of looking for original ideas, it might be better to search for a good mix of everything.

But that is beside the point of this thread, which is exactly about the mentioned ideas.
It is even totally beside my project, which is about trying unorthodox things.
You had already convinced me 10 posts ago that you do not find them useful or intriguing...

Gorf · 15 December 2020, 23:10

I just had a look at the second ham8 routine:

Code:

	swap	d0
	swap	d2
	move.l	d0,d1
	move.l	d2,d3
	lsl.l	#6,d0
	lsl.l	#6,d2
	lsr.w	#3,d0
	lsr.w	#3,d2
	lsr.b	#3,d0
	lsr.b	#3,d2
	lsl.l	#8,d0
	lsl.l	#8,d2
	move.b	d1,d0
	move.b	d3,d2
	swap	d1
	swap	d3
	move.l	d1,d5
	move.l	d3,d7
	lsl.l	#6,d1
	lsl.l	#6,d3
	lsr.w	#3,d1
	lsr.w	#3,d3
	lsr.b	#3,d1
	lsr.b	#3,d3
	lsl.l	#8,d1
	lsl.l	#8,d3
	move.b	d5,d1
	move.b	d7,d3

becomes:

	swap	d0
	move.l	d0,d7
	lsl.l	#6,d7
	lsr.w	#3,d7
	lsr.b	#3,d7
	lsl.l	#8,d7
	pop.b	d0,d7
	exg.l	d0,d7
	rep2	-6,-5
	rep3	-6,-5,-4
	move.l 	d7,d1
	swap	d2
	move.l	d2,d7
	rep2	-8,-7
	rep2	-8,-7
	pop.b	d2,d7
	exg.l	d2,d7
	rep2	-6,-5
	rep3	-6,-5,-4
	move.l 	d7,d3

From 28 lines down to 20, and the code goes on like this for a while...
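What each block of the original routine computes can be modelled with the 68000's partial-width shifts, which modify only the low word (`.w`) or low byte (`.b`) of the destination and leave the rest untouched. This sketch follows the first block (from `swap d0` to `move.b d1,d0`) instruction by instruction; the function names are illustrative.

```python
MASK = 0xFFFFFFFF


def swap(x):
    return ((x << 16) | (x >> 16)) & MASK


def lsl_l(x, n):
    return (x << n) & MASK


def lsr_w(x, n):
    # lsr.w only touches bits 15..0
    return (x & 0xFFFF0000) | ((x & 0xFFFF) >> n)


def lsr_b(x, n):
    # lsr.b only touches bits 7..0
    return (x & 0xFFFFFF00) | ((x & 0xFF) >> n)


def move_b(src, dst):
    return (dst & 0xFFFFFF00) | (src & 0xFF)


def ham8_block(d0):
    d0 = swap(d0)          # swap d0
    d1 = d0                # move.l d0,d1
    d0 = lsl_l(d0, 6)      # lsl.l #6,d0
    d0 = lsr_w(d0, 3)      # lsr.w #3,d0
    d0 = lsr_b(d0, 3)      # lsr.b #3,d0
    d0 = lsl_l(d0, 8)      # lsl.l #8,d0
    return move_b(d1, d0)  # move.b d1,d0


print(hex(ham8_block(0xFFFFFFFF)))  # 0xff1f1fff
```

The three shifts after the copy form one fixed bit-field rearrangement of a single value, which is why a single dedicated instruction could stand in for them.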

meynaf · 16 December 2020, 09:23

Quote:

Originally Posted by Gorf

Now you are moving the goalpost...

I tried to save registers here, and showed how my idea can be used to do exactly that.

I'm not the one moving the goalpost. The goal wasn't to "try to save registers". It was to rewrite the routine to see how register pressure is handled with your ISA.

Quote:

Originally Posted by Gorf

I very much expected something like this ... that's why I asked you for a piece of code in the first place.

You point me to code written for one ISA and I apply my ideas to it ... and then you say: look, on a completely different ISA it is even better.

With post #1 you were supposed to be defining a completely different ISA yourself too.
It does not matter which ISA the code you're rewriting was originally written for. I could just have shown some C code and asked you to "play compiler".
Actually, a proper code density comparison should have included other cpu families as well, not just 68k.

Quote:

Originally Posted by Gorf

But that is beside the point of this thread, which is exactly about the mentioned ideas.
It is even totally beside my project, which is about trying unorthodox things.
You had already convinced me 10 posts ago that you do not find them useful or intriguing...

Again, post #1 suggested you were redesigning the whole ISA. If all you want is to add things to 68k, you have to be more explicit about it.

Quote:

Originally Posted by Gorf

I just had a look at the second ham8 routine:
(...)
From 28 lines down to 20, and the code goes on like this for a while...

Yes, but this is beating, in code size, some code designed for speed rather than size. Not exactly meaningful. I spotted 2 lines that can be removed without any change to the instruction set...

Anyway...
You know that with my instructions I can replace 4 blocks of move.l + lsl.l #6 + lsr.w #3 + lsr.b #3 each by a single instruction, going from 28 lines down to 16, and I didn't attempt to fully optimise it.
So why come back for more ?

Bruce Abbott · 16 December 2020, 10:03

Quote:

Originally Posted by Thomas Richter

Pointless. Rather pointless. Don't we have the Vampire already for those that want to experiment with alternative 68K interpretations? Who will write software for that? I certainly won't.

You are right, it is rather pointless. 68k already has better code density than x86 and PPC, and it's easier to add RAM than make an 'extended' CPU.

Assembly language programmers tend to get frustrated when certain instructions take up more space or work less efficiently than expected, but 99% of the time it isn't that important. Most programs can be made smaller simply by thinking about different ways of doing things or even not doing them at all.

For example, trying to improve c2p code with 'extended' instructions. Better to embed an Akiko-like c2p converter into the machine, then you dramatically reduce code size, potentially do it much faster and have better compatibility.

The 68k in an Amiga is what it is. Coders should enjoy trying to squeeze the most out of what they have rather than pining for something 'better'. When playing chess, do you change the rules because a piece can't make a move you want? Of course not, and playing within the rules makes it more fun!

meynaf · 16 December 2020, 10:24

Quote:

Originally Posted by Bruce Abbott

You are right, it is rather pointless. 68k already has better code density than x86 and PPC, and it's easier to add RAM than make an 'extended' CPU.

68k is not always better than x86 in code density, PPC is among the worst, and code size matters for more than just the space it takes in RAM.

Quote:

Originally Posted by Bruce Abbott

Assembly language programmers tend to get frustrated when certain instructions take up more space or work less efficiently than expected, but 99% of the time it isn't that important. Most programs can be made smaller simply by thinking about different ways of doing things or even not doing them at all.

Yeah, so we get frustrated and search for ways to make things better. There is nothing bad in this.
In my case it's not only about making the code smaller, but also easier to read and write. There is no progress if we just accept all the limitations imposed on us.

Quote:

Originally Posted by Bruce Abbott

For example, trying to improve c2p code with 'extended' instructions. Better to embed an Akiko-like c2p converter into the machine, then you dramatically reduce code size, potentially do it much faster and have better compatibility.

That would be right if the added instructions were targeted at doing the c2p instead of being general purpose and merely using the c2p as an example.

Quote:

Originally Posted by Bruce Abbott

The 68k in an Amiga is what it is. Coders should enjoy trying to squeeze the most out of what they have rather than pining for something 'better'. When playing chess, do you change the rules because a piece can't make a move you want? Of course not, and playing within the rules makes it more fun!

So because chess exists it becomes forbidden to invent new games with different rules ?

Gorf · 16 December 2020, 10:36

Quote:

Originally Posted by meynaf

I'm not the one moving the goalpost. The goal wasn't to "try to save registers". It was to rewrite the routine to see how register pressure is handled with your ISA.

So I asked for something you had in mind ... you pointed me to this repository and I literally took their first example.
I used only 4 registers instead of 6 in fewer lines.
Your reply to that:

"Partial code so no pressure on registers.."

ok .....

Quote:

With post #1 you were supposed to be defining a completely different ISA yourself too.

True.
But the 3 ideas I specifically mentioned here are quite ISA-independent - and they would not break any old code, no matter where they are applied.

Quote:

It does not matter which ISA the code you're rewriting was originally written for. I could just have shown some C code and asked you to "play compiler".
Actually, a proper code density comparison should have included other cpu families as well, not just 68k.

since this forum is 68k-specific, I did not want to go further off topic....

Quote:

Again, post #1 suggested you were redesigning the whole ISA. If all you want is to add things to 68k, you have to be more explicit about it.

again - my approach is more generic, but since this forum is for 68k and most coders here are used to this, I tried to keep it 68k-related for this discussion.

Quote:

Yes, but this is beating, in code size, some code designed for speed rather than size. Not exactly meaningful. I spotted 2 lines that can be removed without any change to the instruction set...

Anyway...
You know that with my instructions I can replace 4 blocks of move.l + lsl.l #6 + lsr.w #3 + lsr.b #3 each by a single instruction, going from 28 lines down to 16, and I didn't attempt to fully optimise it.
So why come back for more ?

If you repeat this single instruction a couple of times, as you probably would in this case, you could probably make use of the "repeat"-feature as well and save even more ...
