Enhanced 68k ISA - Page 4

Mrs Beanbag · 12 August 2016, 19:59

Quote:

Originally Posted by meynaf

Is this puzzle/paradox solved now ?

I enjoy the journey, not the destination!

Quote:

PC doesn't have its low bit wired to zero. If you do a jump or a branch to an odd address, then you'll get an address error. Same if an RTS pops an odd value. These are the only causes of address errors on 020+, btw.

Yes this is correct, but is it the attempt to set bit zero that causes the error, or the attempt to actually execute from it? Because it certainly doesn't stay set for very long, as soon as the exception handler takes over it's back to an even address again. Unless the trap vector is also pointing to an odd address, in which case i don't know what happens...

Quote:

Bugs causing jumps to bogus addresses, which end up in data such as text, with all these 6x codes, are a common cause of 80000003 errors.
I've already warned about the use of this bit, which would break any program using an odd address branch to deliberately trigger an exception.

Yes, quite, but i did also ask, should we really be pandering to this sort of code?

Quote:

For a byte address, absolutely none. For a byte offset, a bit more. For byte data, a lot.
Remember that i am for more data uses for An registers - as i'm quite often out of data regs, but more rarely of address regs. I've even used address regs to represent R,G,B values

If you don't like the data/address register split, you should understand the use for this quite easily.

It is a noble goal for sure, and were i to design an ISA from scratch to be source-code compatible with 68k i would certainly try to avoid the register split, but working around existing 68k encodings to achieve this can get a bit messy.

Just consider for instance, LEA d8(An,Rn),An...
Now the Rn can be a An or a Dn, and the bit that selects it can be considered the high bit in a four-bit register field. So far so good. So what about LEA to Dn? There, the A/D bit has to be on the other side of the register... in other words it still looks like a 4 bit register field but with the bits in a different order.
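To make the "four-bit register field" reading concrete, here is a minimal sketch that decodes the brief extension word used by d8(An,Rn). The function name and the returned dictionary layout are made up for illustration; the bit positions (D/A in bit 15, register number in bits 14-12, W/L in bit 11, d8 in bits 7-0) are the standard 68k brief format. The 68020+ scale field is ignored here.

```python
def decode_brief_extension(word: int) -> dict:
    """Decode a 68k brief extension word (hypothetical helper, scale ignored)."""
    is_address_reg = (word >> 15) & 1      # D/A bit: 0 = Dn, 1 = An
    reg_number = (word >> 12) & 0x7        # 3-bit register number
    long_index = (word >> 11) & 1          # W/L bit: 0 = Rn.w, 1 = Rn.l
    d8 = word & 0xFF
    if d8 & 0x80:                          # d8 is a signed byte
        d8 -= 0x100
    return {
        "index": ("a" if is_address_reg else "d") + str(reg_number),
        "reg4": (word >> 12) & 0xF,        # D/A + number read as one 4-bit field
        "size": "l" if long_index else "w",
        "d8": d8,
    }
```

Read this way, `reg4` values 0-7 select D0-D7 and 8-15 select A0-A7, which is the tidy case described above. The awkward part is that other fields (such as the LEA destination) have no spare bit in the same relative position.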

And that's before you get onto the possibility of "Data register indirect" addressing modes, for which there is just not enough encoding space.

Anyway it certainly involves "special cases" to handle unsigned byte offsets/data. Short branches use signed byte offsets, as do the d8(An,Rn) addressing modes mentioned earlier, and even the venerable moveq.l #n,Dn sign extends its byte data.

EDIT: also of course we already have the "special case" of the stack pointer (A7), which increments and decrements by 2 instead of the usual 1 using (A7)+/-(A7) on byte sized operations.
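As a small illustration of that A7 special case, the sketch below computes how far (An)+ or -(An) moves the register for a given operand size. The function name is hypothetical; the rule itself (byte accesses through A7 step by 2 so the stack pointer stays word-aligned) is the one described above.

```python
def autoincrement_step(reg_number: int, size_bytes: int) -> int:
    """Bytes added by (An)+ or subtracted by -(An); hypothetical helper."""
    if reg_number == 7 and size_bytes == 1:
        return 2          # A7 special case: keep the stack pointer even
    return size_bytes     # every other register/size combination

# A byte push through A7 moves the stack pointer by 2, through A0 by 1.
print(autoincrement_step(7, 1), autoincrement_step(0, 1))
```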

meynaf · 12 August 2016, 21:07

Quote:

Originally Posted by Mrs Beanbag

Yes this is correct, but is it the attempt to set bit zero that causes the error, or the attempt to actually execute from it?

If we trace a JMP to an odd address the exception occurs immediately.
But does it matter ?

Quote:

Originally Posted by Mrs Beanbag

Because it certainly doesn't stay set for very long, as soon as the exception handler takes over it's back to an even address again.

Yes but regardless of how long the PC can stay odd (if at all), a branch going to an odd address is kinda valid in some way.

Quote:

Originally Posted by Mrs Beanbag

Unless the trap vector is also pointing to an odd address, in which case i don't know what happens...

Double fault. The CPU halts until it receives a reset signal.

Quote:

Originally Posted by Mrs Beanbag

Yes, quite, but i did also ask, should we really be pandering to this sort of code?

Should we remain compatible with existing programs doing valid things ?

Quote:

Originally Posted by Mrs Beanbag

It is a noble goal for sure, and were i to design an ISA from scratch to be source-code compatible with 68k i would certainly try to avoid the register split, but working around existing 68k encodings to achieve this can get a bit messy.

Just consider for instance, LEA d8(An,Rn),An...
Now the Rn can be a An or a Dn, and the bit that selects it can be considered the high bit in a four-bit register field. So far so good. So what about LEA to Dn? There, the A/D bit has to be on the other side of the register... in other words it still looks like a 4 bit register field but with the bits in a different order.

And that's before you get onto the possibility of "Data register indirect" addressing modes, for which there is just not enough encoding space.

Anyway it certainly involves "special cases" to handle unsigned byte offsets/data. Short branches use signed byte offsets, as do the d8(An,Rn) addressing modes mentioned earlier, and even the venerable moveq.l #n,Dn sign extends its byte data.

Actually i kinda like the register split. It's just that it's too strict like it is. I wouldn't do dirty things just to remove it (which can't be done - at least not fully - without important code density losses, btw).

Quote:

Originally Posted by Mrs Beanbag

EDIT: also of course we already have the "special case" of the stack pointer (A7), which increments and decrements by 2 instead of the usual 1 using (A7)+/-(A7) on byte sized operations.

That's not a clever thing and no consideration other than 68000 compatibility can justify it.

Mrs Beanbag · 12 August 2016, 21:45

Quote:

Originally Posted by meynaf

If we trace a JMP to an odd address the exception occurs immediately.
But does it matter ?

it matters for the point i'm making, which is you could just wire bit 0 of the PC to zero, and then any branch to an odd address would just round down. No special treatment of odd PC-relative offsets would be required in the hardware to deal with this case.
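A rough sketch of the difference, purely as an illustration: the first function models the existing behaviour (an odd target raises an address error), the second models a PC whose bit 0 is wired to zero. Both function names and the `AddressError` class are invented for this example.

```python
class AddressError(Exception):
    """Stand-in for the 68k address error exception (illustrative only)."""

def branch_target_checked(pc: int, displacement: int) -> int:
    """Existing behaviour: an odd branch target traps."""
    target = (pc + displacement) & 0xFFFFFFFF
    if target & 1:
        raise AddressError(f"odd branch target ${target:08X}")
    return target

def branch_target_wired(pc: int, displacement: int) -> int:
    """Hypothetical PC with bit 0 wired to zero: odd targets round down."""
    return (pc + displacement) & 0xFFFFFFFE
```

With the wired version, the low bit of the displacement simply never reaches the PC, which is why no extra handling of odd PC-relative offsets would be needed.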

i'm not saying it would be a good idea. Just that it wouldn't be very difficult or complicated.

Quote:

Yes but regardless of how long the PC can stay odd (if at all), a branch going to an odd address is kinda valid in some way.

Should we remain compatible with existing programs doing valid things ?

Is it "valid" or is it "kinda valid in some way"? This seems like something of a subjective question. "It works" is not the same thing as "it's valid". I've worked with engineers before who didn't bother to read the documentation and just made things work, and then we fix a bug and it doesn't work any more, and then they phone me up and complain and i have to explain to them why they should have done what it says in the instructions.

Yeah there are some cases where it's kind of our fault if that happens, because we should have been more strict in what we would accept and throw an exception or something otherwise, and then they'd know they'd done it wrong. But in this case we're in a pickle because the programmer threw an exception on purpose in order to achieve what they wanted.

Quote:

Actually i kinda like the register split. It's just that it's too strict like it is. I wouldn't do dirty things just to remove it (which can't be done - at least not fully - without important code density losses, btw).

Yeah sometimes when i consider having 32 registers instead of 16, i end up wondering about bringing another split back in, but i dunno..

it has architectural advantages as well, because two single-port register files might be simpler to implement than a dual-port register file, so instructions like move (A0)+,D0 can write both results simultaneously. So i kind of waver in my support of it.

matthey · 12 August 2016, 22:08

Quote:

Originally Posted by meynaf

It could just have been :

Code:

 bfextu d0{16:16},d0
 ffo d0

This code doesn't look performance critical. D0 could be extended before (at no cost if it's not in a loop).
You're using bfffo because it's there, but you wouldn't have asked for it if it weren't.
Anyway we have it now, so unless we go the incompatible way this is useless talk.

You could use MVZ.W Dn,Dn on the CF but I believe you are going to need to subtract those first 16 zeros.

Code:

  mvz.w d0,d0
  ff1 d0
  sub.l #16,d0 ; 6 bytes without OP.L #data.w,Dn

It would be possible to do a SWAP+FF1 but then we need to take care of the case where D0=0. Yes, I'm using BFFFO because it's there and exactly what I need, as demonstrated by the single instruction. Time critical? It is a support library so it depends on the use.

Quote:

Originally Posted by meynaf

Branches consist of 10% of overall instructions. You can't avoid them. So it's better to concentrate the resources on their implementation, rather than waste silicon on ways to avoid them.

It is not how frequent the instruction is but how big a problem it is. Let's look at the case of one simple mispredicted branch.

68060
IPC=1.3
pipe length=8
avg instruction size=3 bytes
ICache line=16 bytes

We throw away 1.3 * 8 = 10.4 instructions on average
This is 10.4 * 3 = 31.2 bytes of code fetched and cached
This is 31.2 / 16 = 1.95, so about 2 ICache lines replaced

The Apollo-core may have an IPC of 3+ and a deeper pipeline.
IPC=3
pipe length=10
avg instruction size=3 bytes
ICache line=16 bytes

We throw away 3 * 10 = 30 instructions on average
This is 30 * 3 = 90 bytes of code fetched and cached
This is 90 / 16 = 5.625, so about 6 ICache lines replaced
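The same back-of-envelope arithmetic can be written as a small function. This is only a sketch of the estimate above: the function name and parameter names are invented, and it assumes (as the figures above do) that a full pipeline's worth of instructions is discarded and that partial cache lines round up.

```python
import math

def misprediction_waste(ipc: float, pipe_length: int,
                        avg_insn_bytes: float = 3, line_bytes: int = 16):
    """Return (instructions discarded, code bytes fetched, ICache lines touched)."""
    instructions = ipc * pipe_length
    code_bytes = instructions * avg_insn_bytes
    cache_lines = math.ceil(code_bytes / line_bytes)   # partial lines count as whole
    return instructions, code_bytes, cache_lines

for name, ipc, depth in (("68060", 1.3, 8), ("Apollo-core (assumed)", 3, 10)):
    insns, nbytes, lines = misprediction_waste(ipc, depth)
    print(f"{name}: {insns:.1f} instructions, {nbytes:.1f} bytes, {lines} lines")
```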

There is also the delay in cycles of the pipe length and any data accessed would go into the DCache. We will do this misprediction twice before we turn around a 2bit saturating prediction. I hope we can quickly see that the cost of those little branches is much higher when they are mispredicted. Advanced processors are not like the 68020/68030 anymore.
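For readers unfamiliar with the "mispredict twice" remark, here is a minimal model of a 2-bit saturating counter. The class and function names are made up for the example, and the state numbering (0 = strongly not taken, 3 = strongly taken) is one common convention rather than anything specific to a 68k implementation.

```python
class TwoBitCounter:
    """2-bit saturating branch predictor state: 0..1 predict not taken, 2..3 taken."""

    def __init__(self, state: int = 0):
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

def count_mispredictions(counter: TwoBitCounter, outcomes) -> int:
    """Run the predictor over a sequence of actual outcomes and count the misses."""
    misses = 0
    for taken in outcomes:
        if counter.predict() != taken:
            misses += 1
        counter.update(taken)
    return misses
```

Starting from the "strongly" wrong state, a branch that always goes the other way is mispredicted exactly twice before the counter crosses over, which is the case described above.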

Quote:

Originally Posted by meynaf

That's not something the average coder will do and if that "general support" doesn't have the hint bit already, you can't count on GCC people to add it.

That "general support" does have support for a branch hint bit. My evaluation ISA hint bit works the same way as the PPC hint bit which can still be set for those processors which support it. Of course you need a standard and real processors supporting it before the GCC folks would add support. The average coder doesn't use profiling but it is a high level optimizing tool which most compilers have and it is relatively easy to use. More programmers should use it as it can optimize inlining, loops, branches (as much as possible for the particular CPU), code relativity, cache efficiency, etc. The results are relative to how much can be done for that particular CPU and how easy the support is. I want to give compilers the tools they need rather than just complaining about the bloat they produce.

Quote:

Originally Posted by meynaf

So you don't see the relative branch with bit #0 special vs d16(pc) inconsistency as dirty ? Personally i do.
In addition i don't differentiate ugly and dirty. For me if it's ugly, it's dirty.

No. I don't see the hint bit as dirty. I don't think it slows anything down and it is normally not used. Some people may consider it ugly but they don't have to set it and it is the same as not having it. There are better ways to trap since the 68020 than setting an odd PC.

Quote:

Originally Posted by meynaf

Anything that adds special cases is to be avoided if possible.

Everything adds special cases. It is important to avoid the big tables/muxes and decoding info not in the first word. This is what I consider dirty.

Quote:

Originally Posted by meynaf

By having SELcc move the condition field in an unusual position you create a special case.

I should rework the SELcc encoding after I changed BScc. It is low priority as nobody is likely to use the ISA or do anything with it anyway.

Quote:

Originally Posted by meynaf

By adding the hint bit you change the way PC-relative displacements are interpreted but not always.
By adding an addressing mode for short displacements you add a mode that's only valid in a few cases (i see the regular immediate addr mode as a bad choice as well).

So do you consider the EA #data immediate as ugly so dirty since there is no difference? Does that make the whole 68k dirty then? If the original 68k was dirty then I can't ruin it by being dirty?

meynaf · 12 August 2016, 22:37

Quote:

Originally Posted by Mrs Beanbag

it matters for the point i'm making, which is you could just wire bit 0 of the PC to zero, and then any branch to an odd address would just round down. No special treatment of odd PC-relative offsets would be required in the hardware to deal with this case.

i'm not saying it would be a good idea. Just that it wouldn't be very difficult or complicated.

IIRC several RISC cpus do it this way.

Quote:

Originally Posted by Mrs Beanbag

Is it "valid" or is it "kinda valid in some way"? This seems like something of a subjective question. "It works" is not the same thing as "it's valid". I've worked with engineers before who didn't bother to read the documentation and just made things work, and then we fix a bug and it doesn't work any more, and then they phone me up and complain and i have to explain to them why they should have done what it says in the instructions.

Yeah there are some cases where it's kind of our fault if that happens, because we should have been more strict in what we would accept and throw an exception or something otherwise, and then they'd know they'd done it wrong. But in this case we're in a pickle because the programmer threw an exception on purpose in order to achieve what they wanted.

Well, the 68k manual doesn't state we shouldn't throw exceptions on purpose.
So for me bit #0 of a branch isn't "free". And that's all.

Quote:

Originally Posted by matthey

You could use MVZ.W Dn,Dn on the CF but I believe you are going to need to subtract those first 16 zeros.

Code:

  mvz.w d0,d0
  ff1 d0
  sub.l #16,d0 ; 6 bytes without OP.L #data.w,Dn

It would be possible to do a SWAP+FF1 but then we need to take care of the case where D0=0. Yes, I'm using BFFFO because it's there and exactly what I need, as demonstrated by the single instruction. Time critical? It is a support library so it depends on the use.

Bit position in a register does not depend on the bit-field position, e.g. if you do BFFFO D0{16:16},D0 with D0=1 you will get D0=31, not 15.
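A quick emulation makes the point checkable. This is a sketch under two assumptions: `bfffo_reg` models BFFFO on a data register with offset 0 at bit 31 and a result of offset+width when the field is empty, and `ff1` models the ColdFire FF1 as a plain leading-zero count (32 for zero). Both helper names are invented here.

```python
def bfffo_reg(value: int, offset: int, width: int) -> int:
    """BFFFO Dn{offset:width} on a register: offset of the first set bit in the field."""
    value &= 0xFFFFFFFF
    for i in range(width):
        field_offset = (offset + i) % 32            # register bit fields wrap around
        if value & (1 << (31 - field_offset)):      # offset 0 is the MSB (bit 31)
            return offset + i
    return offset + width                           # no bit set in the field

def ff1(value: int) -> int:
    """Assumed FF1 behaviour: number of leading zeros in 32 bits, 32 if value is 0."""
    value &= 0xFFFFFFFF
    return 32 - value.bit_length()

print(bfffo_reg(1, 16, 16))      # offset counted from bit 31 of the register
print(ff1(1 & 0xFFFF) - 16)      # the mvz.w / ff1 / sub.l #16 sequence
```

Under these assumptions, mvz.w followed by ff1 already matches BFFFO D0{16:16},D0 for every input, and the extra sub.l #16 is what produces the 15.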

Quote:

Originally Posted by matthey

It is not how frequent the instruction is but how big a problem it is. Let's look at the case of one simple mispredicted branch.

68060
IPC=1.3
pipe length=8
avg instruction size=3 bytes
ICache line=16 bytes

We throw away 1.3 * 8 = 10.4 instructions on average
This is 10.4 * 3 = 31.2 bytes of code fetched and cached
This is 31.2 / 16 = 1.95, so about 2 ICache lines replaced

The Apollo-core may have an IPC of 3+ and a deeper pipeline.
IPC=3
pipe length=10
avg instruction size=3 bytes
ICache line=16 bytes

We throw away 3 * 10 = 30 instructions on average
This is 30 * 3 = 90 bytes of code fetched and cached
This is 90 / 16 = 5.625, so about 6 ICache lines replaced

There is also the delay in cycles of the pipe length and any data accessed would go into the DCache. We will do this misprediction twice before we turn around a 2bit saturating prediction. I hope we can quickly see that the cost of those little branches is much higher when they are mispredicted. Advanced processors are not like the 68020/68030 anymore.

Here you make the assumption that the mispredicted branch goes to code that's not in the cache. This is true only if it's executed for the first time, and in that case, its speed doesn't really matter.

And the misprediction will occur... twice. So even if it adds 100 clocks, that'll be 100 clocks added to the overall program execution. Big deal.

Quote:

Originally Posted by matthey

That "general support" does have support for a branch hint bit. My evaluation ISA hint bit works the same way as the PPC hint bit which can still be set for those processors which support it. Of course you need a standard and real processors supporting it before the GCC folks would add support. The average coder doesn't use profiling but it is a high level optimizing tool which most compilers have and it is relatively easy to use. More programmers should use it as it can optimize inlining, loops, branches (as much as possible for the particular CPU), code relativity, cache efficiency, etc. The results are relative to how much can be done for that particular CPU and how easy the support is. I want to give compilers the tools they need rather than just complaining about the bloat they produce.

The PPC hint bit got dropped, remember. And for quite valid reasons. Do you really want to make the same mistakes others made before you ?

Quote:

Originally Posted by matthey

No. I don't see the hint bit as dirty. I don't think it slows anything down and it is normally not used. Some people may consider it ugly but they don't have to set it and it is the same as not having it. There are better ways to trap since the 68020 than setting an odd PC.

Since the 68020 ok, but there is a lot of 68000 code around there.
Bit #0 isn't free.

Quote:

Originally Posted by matthey

Everything adds special cases. It is important to avoid the big tables/muxes and decoding info not in the first word. This is what I consider dirty.

Dirty things are also things that got added without enough thinking and cause problems in next generations of the cpu.
Or things that "reuse" a bit that wasn't previously free.

Quote:

Originally Posted by matthey

I should rework the SELcc encoding after I changed BScc. It is low priority as nobody is likely to use the ISA or do anything with it anyway.

Ok. Let me know of your new encoding when you have it.

Quote:

Originally Posted by matthey

So do you consider the EA #data immediate as ugly so dirty since there is no difference? Does that make the whole 68k dirty then? If the original 68k was dirty then I can't ruin it by being dirty?

I consider the EA #immediate as a bad choice. Does a single bad choice ruin the whole architecture ? No.
If your car has a small scratch, do you take a hammer and add more, just because it's not exactly in mint condition and so you can't ruin it ?

Mrs Beanbag · 13 August 2016, 20:07

Quote:

Originally Posted by meynaf

Well, the 68k manual doesn't state we shouldn't throw exceptions on purpose.
So for me bit #0 of a branch isn't "free". And that's all.

Does it not also throw an exception for unused opcodes? In which case we're rather stuck for adding anything, if throwing "illegal instruction" exceptions on purpose is also valid.

Although on balance, i think i agree with you on leaving this bit be.

Quote:

Here you make the assumption that the mispredicted branch goes to code that's not in the cache. This is true only if it's executed for the first time, and in that case, its speed doesn't really matter.

if the instruction stream is read into a buffer a few cycles before execution, the cache could be strobed in advance of actually taking the branch. Especially if it has to wait on the condition code from the previous instruction anyway.

Btw Matthey how did the team manage to get 3 IPC? Is it many-way superscalar or is this thanks to opcode fusion &c?

meynaf · 13 August 2016, 20:18

Quote:

Originally Posted by Mrs Beanbag

Does it not also throw an exception for unused opcodes? In which case we're rather stuck for adding anything, if throwing "illegal instruction" exceptions on purpose is also valid.

Deliberately throwing the illegal opcode exception is perfectly valid but there is a specific opcode for doing that (namely $4AFC).

Mrs Beanbag · 13 August 2016, 20:27

Quote:

Originally Posted by meynaf

Deliberately throwing the illegal opcode exception is perfectly valid but there is a specific opcode for doing that (namely $4AFC).

yes, ok, but why would someone deliberately run an odd address branch instead of ILLEGAL? if they just want to get into supervisor mode without going through the OS call for some reason, it seems a much more obvious way. One does not need a 68020.

Plus 68020 already broke some old code by not throwing exceptions on odd data accesses! So Motorola seems to think that wasn't such a legitimate technique.

meynaf · 13 August 2016, 20:47

Quote:

Originally Posted by Mrs Beanbag

yes, ok, but why would someone deliberately run an odd address branch instead of ILLEGAL? if they just want to get into supervisor mode without going through the OS call for some reason, it seems a much more obvious way. One does not need a 68020.

Illegal exception isn't the same as odd address exception. And in one instruction you can test the condition and call the exception ; ILLEGAL isn't a conditional instruction.

Quote:

Originally Posted by Mrs Beanbag

Plus 68020 already broke some old code by not throwing exceptions on odd data accesses! So Motorola seems to think that wasn't such a legitimate technique.

They didn't have much choice. The 68020 is 32-bit so "odd address" becomes meaningless, and trapping on all misaligned accesses would have broken 90% of existing programs.

Mrs Beanbag · 13 August 2016, 20:54

Quote:

Originally Posted by meynaf

Illegal exception isn't the same as odd address exception. And in one instruction you can test the condition and call the exception ; ILLEGAL isn't a conditional instruction.

True. There is TRAPcc though...

Quote:

They didn't have much choice. The 68020 is 32-bit so "odd address" becomes meaningless, and trapping on all misaligned accesses would have broken 90% of existing programs.

But they COULD have maintained exception behaviour on odd addresses the same as before, while allowing non-longword-aligned addresses. They don't *need* to anymore, and as such it might not be very meaningful. But they could have, for the sake of backwards compatibility, if they'd thought it was a legitimate thing to do on purpose.

meynaf · 13 August 2016, 22:00

Quote:

Originally Posted by Mrs Beanbag

True. There is TRAPcc though...

I remind you that TRAPcc doesn't work on 68000.

Quote:

Originally Posted by Mrs Beanbag

But they COULD have maintained exception behaviour on odd addresses the same as before, while allowing non-longword-aligned addresses. They don't *need* to anymore, and as such it might not be very meaningful. But they could have, for the sake of backwards compatibility, if they'd thought it was a legitimate thing to do on purpose.

Sure they could have built a CPU that supports some misaligned accesses. However it would have looked ridiculous, and very confusing (i even dare to say totally crazy).
They made the right choice and it's good for coding flexibility, code density, and even sometimes performance. It was really worth the limited compatibility issue.

It's always a matter of trade-offs. Inability of the 68000 to do misaligned accesses was a real pain. Not many programs trigger address errors on purpose ; in fact i haven't found any. So the change is ok. On the other hand, even a very small compatibility threat for just a near useless branch hint bit, isn't worth it.

Mrs Beanbag · 13 August 2016, 22:17

Quote:

Originally Posted by meynaf

I remind you that TRAPcc doesn't work on 68000.

oh yeah

Quote:

Sure they could have built a CPU that supports some misaligned accesses. However it would have looked ridiculous, and very confusing (i even dare to say totally crazy).
They made the right choice and it's good for coding flexibility, code density, and even sometimes performance. It was really worth the limited compatibility issue.

It's always a matter of trade-offs. Inability of the 68000 to do misaligned accesses was a real pain. Not many programs trigger address errors on purpose ; in fact i haven't found any. So the change is ok. On the other hand, even a very small compatibility threat for just a near useless branch hint bit, isn't worth it.

but they DID build a CPU that supports some misaligned accesses. It doesn't support misaligned code accesses. Granted nobody cares!

But indeed... it is the uselessness of the branch hint bit that swings it for me. I don't really care so much that a few old demos and games won't work. Tbh i would be willing to compromise everything in supervisor mode and let the operating system be recompiled if it would help, i only care about compatibility with user mode software, anything that trashes the OS in order to run is something i'd prefer not to run on a computer with too much power!

Megol · 15 August 2016, 20:21

My post got eaten. Short version:

Hint bits are not worth it in a modern processor. They can be worth it in simple processors or to shave a few clocks from run-once code (exceptions etc.).

The bit can be used for better things, my version:
0000 0000 -> 16 bit displacement
1111 1111 -> 32 bit displacement
0000 0001 -> 64 bit displacement
xxxx xxx0 -> normal 8 bit displacement
xxxx xxx1 -> available for extension

Any incompatible treatment of the LSb should be disabled by default and enabled by the OS if needed.
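The table above can be read as a decoder for the low byte of a Bcc opcode. The sketch below is one interpretation of it, not a finished proposal: the function name and the returned strings are invented, and only the $00 and $FF cases correspond to existing 68020+ behaviour.

```python
def classify_displacement_byte(disp_byte: int) -> str:
    """Classify the 8-bit displacement field following the table above."""
    disp_byte &= 0xFF
    if disp_byte == 0x00:
        return "16-bit displacement follows"
    if disp_byte == 0xFF:
        return "32-bit displacement follows"
    if disp_byte == 0x01:
        return "64-bit displacement follows (proposed)"
    if disp_byte & 1:
        return "available for extension (proposed)"
    return "8-bit displacement"

for value in (0x00, 0xFF, 0x01, 0x10, 0x11, 0xFE):
    print(f"${value:02X}: {classify_displacement_byte(value)}")
```

The order of the tests matters: the three exact patterns have to be checked before the generic even/odd split.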

meynaf · 16 August 2016, 10:03

What the heck could be the use for 64 bit displacement ?

Programs are never that large !

Megol · 16 August 2016, 12:44

Quote:

Originally Posted by meynaf

What the heck could be the use for 64 bit displacement ?

Programs are never that large !

Mostly because I like symmetry/orthogonality.

Mrs Beanbag · 17 August 2016, 23:10

Quote:

Originally Posted by Megol

The bit can be used for better things, my version:
0000 0000 -> 16 bit displacement
1111 1111 -> 32 bit displacement
0000 0001 -> 64 bit displacement
xxxx xxx0 -> normal 8 bit displacement
xxxx xxx1 -> available for extension

64 bit displacements!

But seriously... when i look at this little table it makes me realise something else about why using Bit 0 as a hint bit (or indeed anything else) is really dirty...

Because $FE is a legitimate 8-bit branch. So then how do you put a hint bit on it? Then it becomes $FF which means a 32-bit branch...

Then again $FE would be a branch to itself, causing a total lock-up, so maybe let's not use that anyway.
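Both observations are easy to check with a few lines. This sketch assumes the usual 68k rule that a short branch displacement is relative to the instruction address plus 2; the helper names are made up.

```python
def sign_extend_byte(value: int) -> int:
    """Interpret an 8-bit value as a signed displacement."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value

def short_branch_target(insn_address: int, disp_byte: int) -> int:
    """Target of Bcc.S: address of the instruction + 2 + signed 8-bit displacement."""
    return (insn_address + 2 + sign_extend_byte(disp_byte)) & 0xFFFFFFFF

# $FE is -2, so the branch lands on itself.
print(hex(short_branch_target(0x1000, 0xFE)))
# Setting bit 0 on $FE yields $FF, the marker for a 32-bit displacement.
print(hex(0xFE | 1))
```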

meynaf · 18 August 2016, 07:34

A branch with $FE is likely to be quick'n'dirty error handling, so it had better be predicted as not taken. Hey, wait -- it's a backward branch so default would be taken and it needs the hint bit to reverse that...

So now we have :
- $FE unable to get the hint bit (and always mispredicted)
- $01 useless
- BRA and BSR not using the hint bit
- the hint bit moves away in case of larger branches
Who said anything about 'special cases' ?

matthey · 18 August 2016, 19:22

Quote:

Originally Posted by Megol

Hint bits are not worth it in a modern processor. They can be worth it in simple processors or to shave a few clocks from run-once code (exceptions etc.).

Is a <150MHz FPGA processor a modern processor? I agree that there is less advantage to a branch hint bit as the prediction technique gets better but high end processors have gshare/adaptive dynamic prediction (there is a several cycle latency with these techniques so it still could be useful in a few cases). More likely for 68k processors is 2 bit saturating or no dynamic prediction where a hint bit would be "useful" if faster executing code is considered "useful". It would be helpful to see some timed results of a variety of code executing with no dynamic prediction and with a hybrid 2 bit saturating with hint bit but everyone has already decided what is "useful". The cost of a hint bit on the 68k is so high that it is practically free but let's reject it as not "useful" even though most research shows a branch prediction improvement for practically free.

Quote:

Originally Posted by Megol

The bit can be used for better things, my version:
0000 0000 -> 16 bit displacement
1111 1111 -> 32 bit displacement
0000 0001 -> 64 bit displacement
xxxx xxx0 -> normal 8 bit displacement
xxxx xxx1 -> available for extension

Any incompatible treatment of the LSb should be disabled by default and enabled by the OS if needed.

We are talking of branches in code here, where a 32 bit displacement already gives +-2 GB of reach. If all the code of the AmigaOS was combined into one executable it would probably not exceed this. If all the code which was released for cost on the Amiga in the first year was combined into one executable it would probably not exceed this. I don't think it would be a problem to divide code up into sections this small even if 64 bit addressing existed. The problem with PC relative addressing is not the branch displacement range but the code density deterioration and use of an expensive addressing mode because of a lack of range in PC relative addressing modes (beyond +-32kB). This is where the 2 new PC relative addressing modes proposed could practically eliminate both problems. Even then, 64 bit PC relative displacements are not necessary and would likely never be used. Code is much smaller than data even on modern processors.

Quote:

Originally Posted by meynaf

A branch with $FE is likely to be quick'n'dirty error handling, so it had better be predicted as not taken. Hey, wait -- it's a backward branch so default would be taken and it needs the hint bit to reverse that...

So now we have :
- $FE unable to get the hint bit (and always mispredicted)
- $01 useless

This is not a problem for real code. Why not blame the 68k ISA creators for making useless encodings too?

Quote:

Originally Posted by meynaf

- BRA and BSR not using the hint bit

I documented the least significant bit of the displacement as reserved for these instructions. The CPU handles these different than Bcc even if the encoding is similar.

Quote:

Originally Posted by meynaf

- the hint bit moves away in case of larger branches
Who said anything about 'special cases' ?

The CPU decodes into a 32 bit displacement as quickly as possible. The 68060 does not have a cycle penalty for a Bcc.W or Bcc.L so this does not appear to be a problem. The hint bit would be available at the same time as the displacement.

I wish to reduce my time spent posting here. I do not view this thread as productive and the ISA is a dead end anyway. It only has historical significance as these were the ideas which the "Apollo non-Team" came up with and were discarded with minimal evaluation before a 68k+MMX bolt-on was decided by Gunnar. RIP 68k.

meynaf · 18 August 2016, 20:46

Quote:

Originally Posted by matthey

Is a <150MHz FPGA processor a modern processor?

In some way, yes.

Quote:

Originally Posted by matthey

It would be helpful to see some timed results of a variety of code executing with no dynamic prediction and with a hybrid 2 bit saturating with hint bit but everyone has already decided what is "useful".

You know full well what these results would say. Comparing 2 bit saturating with and without the hint bit, is quite easy : at best a few clocks for the two first iterations (if the hint bit isn't simply completely off, a likely situation for compiled code), and then nothing at all. Optimizing is about the needs of the many - not the needs of the few.

Quote:

Originally Posted by matthey

The cost of a hint bit on the 68k is so high that it is practically free but let's reject it as not "useful" even though most research shows a branch prediction improvement for practically free.

Special cases in encoding, assemblers and disassemblers, aren't what i call "practically free". Don't play your gunnar, there are costs aside of the implementation.

Quote:

Originally Posted by matthey

This is not a problem for real code. Why not blame the 68k ISA creators for making useless encodings too?

Useless encodings come from the fact special cases are costly to handle in comparison to the small benefit, no more no less. But this is not the point. I will always be against frankenstein-like stuff that's added for the mere sake of 'speed'.

Quote:

Originally Posted by matthey

I documented the least significant bit of the displacement as reserved for these instructions. The CPU handles these different than Bcc even if the encoding is similar.

Here the whole Bcc area is split between two cases. How they are handled internal to the cpu, is irrelevant.

Quote:

Originally Posted by matthey

The CPU decodes into a 32 bit displacement as quickly as possible. The 68060 does not have a cycle penalty for a Bcc.W or Bcc.L so this does not appear to be a problem. The hint bit would be available at the same time as the displacement.

The displacement is useless if the branch is "not taken". I'm unsure whether this fact is usable or not, nevertheless i wouldn't risk this for a hint bit.

Quote:

Originally Posted by matthey

I wish to reduce my time spent posting here. I do not view this thread as productive and the ISA is a dead end anyway. It only has historical significance as these were the ideas which the "Apollo non-Team" came up with and were discarded with minimal evaluation before a 68k+MMX bolt-on was decided by Gunnar. RIP 68k.

So long then, it has been fun

The 68k remains the best existing ISA, even if you consider it dead.
Gunnar has changed his mind when facing the gruesome facts for the emulation library and it seems current version does not have this silly c2p/pixmerge stuff he once wanted to add (and not even additional data registers). He is likely to change again when seeing the uselessness of what he has added.

Mrs Beanbag · 18 August 2016, 21:33

Quote:

Originally Posted by matthey

Is a <150MHz FPGA processor a modern processor?

any given design could be run at fast or slow clock speeds in principle so this is not a very good criterion. It's modern if it uses modern ideas. FPGAs will get faster and we could even see homebrew silicon wafers in the future. Instructions per clock is a better measure. Although even that is open to debate because not all modern applications demand speed.

So. Imho there are better ways to mitigate branch penalties than explicit hint bits, which are alien to 68k architecture.

i'd rather we started thinking outside the box a bit more.
