English Amiga Board


Old 12 August 2016, 20:59   #61
Mrs Beanbag
Glastonbridge Software
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,202
Quote:
Originally Posted by meynaf View Post
Is this puzzle/paradox solved now ?
I enjoy the journey, not the destination!

Quote:
PC doesn't have its low bit wired to zero. If you do a jump or a branch to an odd address, then you'll get an address error. Same if an RTS pops an odd value. These are the only causes of address errors on 020+, btw.
Yes this is correct, but is it the attempt to set bit zero that causes the error, or the attempt to actually execute from it? Because it certainly doesn't stay set for very long, as soon as the exception handler takes over it's back to an even address again. Unless the trap vector is also pointing to an odd address, in which case i don't know what happens...

Quote:
Bugs causing jumps to bogus addresses, which end up into data such as text, with all these 6x codes, is a common cause for 80000003 errors.
I've already warned about the use of this bit, which would break any program using an odd address branch to deliberately trigger an exception.
Yes, quite, but i did also ask, should we really be pandering to this sort of code?

Quote:
For a byte address, absolutely none. For a byte offset, a bit more. For byte data, a lot.
Remember that i am for more data uses for An registers - as i'm quite often out of data regs, but more rarely of address regs. I've even used address regs to represent R,G,B values
If you don't like the data/address register split, you should understand the use for this quite easily.
It is a noble goal for sure, and were i to design an ISA from scratch to be source-code compatible with 68k i would certainly try to avoid the register split, but working around existing 68k encodings to achieve this can get a bit messy.

Just consider for instance, LEA d8(An,Rn),An...
Now the Rn can be a An or a Dn, and the bit that selects it can be considered the high bit in a four-bit register field. So far so good. So what about LEA to Dn? There, the A/D bit has to be on the other side of the register... in other words it still looks like a 4 bit register field but with the bits in a different order.

And that's before you get onto the possibility of "Data register indirect" addressing modes, for which there is just not enough encoding space.

Anyway it certainly involves "special cases" to handle unsigned byte offsets/data. Short branches use signed byte offsets, as do the d8(An,Rn) addressing modes mentioned earlier, and even the venerable moveq.l #n,Dn sign extends its byte data.

EDIT: also of course we already have the "special case" of the stack pointer (A7), which increments and decrements by 2 instead of the usual 1 using (A7)+/-(A7) on byte sized operations.
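The two "special cases" mentioned here (sign-extended byte values and the A7 step size) can be sketched in a few lines of Python. This is only an illustrative model, not anything taken from a real emulator; the function names `sign_extend8` and `autoincrement_step` are made up for the example.

```python
def sign_extend8(byte):
    """Interpret the low 8 bits as a two's-complement value,
    as short branches, d8(An,Rn) and MOVEQ do with their byte field."""
    byte &= 0xFF
    return byte - 0x100 if byte & 0x80 else byte


def autoincrement_step(reg_index, size_bytes):
    """Step applied by (An)+ and -(An).
    Byte-sized accesses through A7 step by 2 so the stack pointer stays even."""
    if reg_index == 7 and size_bytes == 1:
        return 2
    return size_bytes
```

With this model, `sign_extend8(0xFE)` is -2 and `autoincrement_step(7, 1)` is 2, while the same byte access through A0 steps by 1.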

Last edited by Mrs Beanbag; 12 August 2016 at 21:15.
Old 12 August 2016, 22:07   #62
meynaf
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 46
Posts: 3,620
Quote:
Originally Posted by Mrs Beanbag View Post
Yes this is correct, but is it the attempt to set bit zero that causes the error, or the attempt to actually execute from it?
If we trace a JMP to an odd address the exception occurs immediately.
But does it matter ?


Quote:
Originally Posted by Mrs Beanbag View Post
Because it certainly doesn't stay set for very long, as soon as the exception handler takes over it's back to an even address again.
Yes but regardless of how long the PC can stay odd (if at all), a branch going to an odd address is kinda valid in some way.


Quote:
Originally Posted by Mrs Beanbag View Post
Unless the trap vector is also pointing to an odd address, in which case i don't know what happens...
Double fault. The cpu stops until it gets reset signal.


Quote:
Originally Posted by Mrs Beanbag View Post
Yes, quite, but i did also ask, should we really be pandering to this sort of code?
Should we remain compatible with existing programs doing valid things ?


Quote:
Originally Posted by Mrs Beanbag View Post
It is a noble goal for sure, and were i to design an ISA from scratch to be source-code compatible with 68k i would certainly try to avoid the register split, but working around existing 68k encodings to achieve this can get a bit messy.

Just consider for instance, LEA d8(An,Rn),An...
Now the Rn can be a An or a Dn, and the bit that selects it can be considered the high bit in a four-bit register field. So far so good. So what about LEA to Dn? There, the A/D bit has to be on the other side of the register... in other words it still looks like a 4 bit register field but with the bits in a different order.

And that's before you get onto the possibility of "Data register indirect" addressing modes, for which there is just not enough encoding space.

Anyway it certainly involves "special cases" to handle unsigned byte offsets/data. Short branches use signed byte offsets, as do the d8(An,Rn) addressing modes mentioned earlier, and even the venerable moveq.l #n,Dn sign extends its byte data.
Actually i kinda like the register split. It's just that it's too strict like it is. I wouldn't do dirty things just to remove it (which can't be done - at least not fully - without important code density losses, btw).


Quote:
Originally Posted by Mrs Beanbag View Post
EDIT: also of course we already have the "special case" of the stack pointer (A7), which increments and decrements by 2 instead of the usual 1 using (A7)+/-(A7) on byte sized operations.
That's not a clever thing and no consideration other than 68000 compatibility can justify it.
Old 12 August 2016, 22:45   #63
Mrs Beanbag
Glastonbridge Software
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,202
Quote:
Originally Posted by meynaf View Post
If we trace a JMP to an odd address the exception occurs immediately.
But does it matter ?
it matters for the point i'm making, which is you could just wire bit 0 of the PC to zero, and then any branch to an odd address would just round down. No special treatment of odd PC-relative offsets would be required in the hardware to deal with this case.

i'm not saying it would be a good idea. Just that it wouldn't be very difficult or complicated.
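To make the difference concrete, here is a small Python sketch of the two behaviours under discussion: the existing one, where an odd jump target raises an address error, and the hypothetical one, where bit 0 of the PC is wired to zero. Both functions and the `AddressError` class are invented for illustration.

```python
class AddressError(Exception):
    """Stands in for the 68k address error exception."""


def jump_existing_68k(target):
    """Existing behaviour: an odd target traps instead of being executed."""
    if target & 1:
        raise AddressError(hex(target))
    return target & 0xFFFFFFFF


def jump_bit0_wired_to_zero(target):
    """Hypothetical behaviour: bit 0 is simply dropped, so odd targets round down."""
    return target & 0xFFFFFFFE
```

Code that relies on the trap (for example a conditional branch to an odd address used as a cheap "trap if" construct) would silently fall through to the rounded-down address under the second model.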

Quote:
Yes but regardless of how long the PC can stay odd (if at all), a branch going to an odd address is kinda valid in some way.

Should we remain compatible with existing programs doing valid things ?
Is it "valid" or is it "kinda valid in some way"? This seems like something of a subjective question. "It works" is not the same thing as "it's valid". I've worked with engineers before who didn't bother to read the documentation and just made things work, and then we fix a bug and it doesn't work any more, and then they phone me up and complain and i have to explain to them why they should have done what it says in the instructions.

Yeah there are some cases where it's kind of our fault if that happens, because we should have been more strict in what we would accept and throw an exception or something otherwise, and then they'd know they'd done it wrong. But in this case we're in a pickle because the programmer threw an exception on purpose in order to achieve what they wanted.

Quote:
Actually i kinda like the register split. It's just that it's too strict like it is. I wouldn't do dirty things just to remove it (which can't be done - at least not fully - without important code density losses, btw).
Yeah sometimes when i consider having 32 registers instead of 16, i end up wondering about bringing another split back in, but i dunno..

it has architectural advantages as well, because two single-port register files might be simpler to implement than a dual-port register file, so instructions like move (A0)+,D0 can write both results simultaneously. So i kind of waver in my support of it.
Old 12 August 2016, 23:08   #64
matthey
Banned
 
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
Quote:
Originally Posted by meynaf View Post
It could just have been :
Code:
 bfextu d0{16:16},d0
 ffo d0
This code doesn't look performance critical. D0 could be extended before (at no cost if it's not in a loop).
You're using bfffo because it's there, but you wouldn't have asked for it if it weren't.
Anyway we have it now, so unless we go the incompatible way this is useless talk.
You could use MVZ.W Dn,Dn on the CF but I believe you are going to need to subtract those first 16 zeros.

Code:
  mvz.w d0,d0
  ff1 d0
  sub.l #16,d0 ; 6 bytes without OP.L #data.w,Dn
It would be possible to do a SWAP+FF1 but then we need to take care of the case where D0=0. Yeah, I'm using BFFFO because it's there and does exactly what I need, as demonstrated by the single instruction. Time critical? It is a support library so it depends on the use.

Quote:
Originally Posted by meynaf View Post
Branches consist of 10% of overall instructions. You can't avoid them. So it's better to concentrate the resources on their implementation, rather than waste silicon on ways to avoid them.
It is not how frequent the instruction is but how big of a problem it is. Let's look at the case of one simple mispredicted branch.

68060
IPC=1.3
pipe length=8
avg instruction size=3 bytes
ICache line=16 bytes

We throw away 1.3 * 8 = 10.4 instructions on average
This is 10.4 * 3 = 31.2 bytes of code fetched and cached
This is 31.2 / 16 = 2 ICache lines replaced

The Apollo-core may have an IPC of 3+ and a deeper pipeline.
IPC=3
pipe length=10
avg instruction size=3 bytes
ICache line=16 bytes

We throw away 3 * 10 = 30 instructions on average
This is 30 * 3 = 90 bytes of code fetched and cached
This is 90 / 16 = 6 ICache lines replaced

There is also the delay in cycles of the pipe length and any data accessed would go into the DCache. We will do this misprediction twice before we turn around a 2bit saturating prediction. I hope we can quickly see that the cost of those little branches is much higher when they are mispredicted. Advanced processors are not like the 68020/68030 anymore.
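The back-of-envelope figures above can be reproduced with a short Python sketch. It only restates the arithmetic from this post (IPC times pipeline length, times average instruction size, divided by the cache line size and rounded up); the function name and parameter names are made up, and the inputs are the estimates quoted above, not measurements.

```python
import math


def misprediction_cost(ipc, pipe_length, avg_insn_bytes, icache_line_bytes):
    """Return (instructions discarded, code bytes fetched, ICache lines touched)
    for one mispredicted branch, using the simple model from the post."""
    insns = ipc * pipe_length
    code_bytes = insns * avg_insn_bytes
    lines = math.ceil(code_bytes / icache_line_bytes)
    return insns, code_bytes, lines


# Estimates quoted in the post, not measured values.
m68060 = misprediction_cost(ipc=1.3, pipe_length=8, avg_insn_bytes=3, icache_line_bytes=16)
apollo = misprediction_cost(ipc=3, pipe_length=10, avg_insn_bytes=3, icache_line_bytes=16)
```

Rounding up gives the 2 and 6 cache lines stated above. Whether those lines were already cached is a separate question that this model ignores.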

Quote:
Originally Posted by meynaf View Post
That's not something the average coder will do and if that "general support" doesn't have the hint bit already, you can't count on GCC people to add it.
That "general support" does have support for a branch hint bit. My evaluation ISA hint bit works the same way as the PPC hint bit which can still be set for those processors which support it. Of course you need a standard and real processors supporting it before the GCC folks would add support. The average coder doesn't use profiling but it is a high level optimizing tool which most compilers have and it is relatively easy to use. More programmers should use it as it can optimize inlining, loops, branches (as much as possible for the particular CPU), code relativity, cache efficiency, etc. The results are relative to how much can be done for that particular CPU and how easy the support is. I want to give compilers the tools they need rather than just complaining about the bloat they produce.

Quote:
Originally Posted by meynaf View Post
So you don't see the relative branch with bit #0 special vs d16(pc) inconsistency as dirty ? Personnally i do.
In addition i don't differentiate ugly and dirty. For me if it's ugly, it's dirty.
No. I don't see the hint bit as dirty. I don't think it slows anything down and it is normally not used. Some people may consider it ugly but they don't have to set it and it is the same as not having it. There are better ways to trap since the 68020 than setting an odd PC.

Quote:
Originally Posted by meynaf View Post
Anything that adds special cases is to be avoided if possible.
Everything adds special cases. It is important to avoid the big tables/muxes and decoding info not in the first word. This is what I consider dirty.

Quote:
Originally Posted by meynaf View Post
By having SELcc move the condition field in an unusual position you create a special case.
I should rework the SELcc encoding after I changed BScc. It is low priority as nobody is likely to use the ISA or do anything with it anyway.

Quote:
Originally Posted by meynaf View Post
By adding the hint bit you change the way PC-relative displacements are interpreted but not always.
By adding an addressing mode for short displacements you add a mode that's only valid in a few cases (i see the regular immediate addr mode as a bad choice as well).
So do you consider the EA #data immediate as ugly so dirty since there is no difference? Does that make the whole 68k dirty then? If the original 68k was dirty then I can't ruin it by being dirty?
Old 12 August 2016, 23:37   #65
meynaf
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 46
Posts: 3,620
Quote:
Originally Posted by Mrs Beanbag View Post
it matters for the point i'm making, which is you could just wire bit 0 of the PC to zero, and then any branch to an odd address would just round down. No special treatment of odd PC-relative offsets would be required in the hardware to deal with this case.

i'm not saying it would be a good idea. Just that it wouldn't be very difficult or complicated.
IIRC several RISC cpus do it this way.


Quote:
Originally Posted by Mrs Beanbag View Post
Is it "valid" or is it "kinda valid in some way"? This seems like something of a subjective question. "It works" is not the same thing as "it's valid". I've worked with engineers before who didn't bother to read the documentation and just made things work, and then we fix a bug and it doesn't work any more, and then they phone me up and complain and i have to explain to them why they should have done what it says in the instructions.

Yeah there are some cases where it's kind of our fault if that happens, because we should have been more strict in what we would accept and throw an exception or something otherwise, and then they'd know they'd done it wrong. But in this case we're in a pickle because the programmer threw an exception on purpose in order to achieve what they wanted.
Well, the 68k manual doesn't state we shouldn't throw exceptions on purpose.
So for me bit #0 of a branch isn't "free". And that's all.


Quote:
Originally Posted by matthey View Post
You could use MVZ.W Dn,Dn on the CF but I believe you are going to need to subtract those first 16 zeros.

Code:
  mvz.w d0,d0
  ff1 d0
  sub.l #16,d0 ; 6 bytes without OP.L #data.w,Dn
It would be possible to do a SWAP+FF1 but then we need to take care of the case where D0=0. Yeah, I'm using BFFFO because it's there and does exactly what I need, as demonstrated by the single instruction. Time critical? It is a support library so it depends on the use.
Bit position in a register does not depend on the bit-field position, e.g. if you do BFFFO D0{16:16},D0 with D0=1 you will get D0=31, not 15.
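A rough Python model of the register form of BFFFO and of the ColdFire MVZ.W + FF1 pair may help check this. It assumes that bit-field offsets count from bit 31 downward, that BFFFO returns the instruction's offset plus the position of the first set bit (or offset plus width for an all-zero field), and that FF1 returns the count from bit 31 to the first set bit (32 for zero). The function names are invented for illustration.

```python
def bfffo_reg(value, offset, width):
    """Simplified model of BFFFO Dn{offset:width},Dn on a 32-bit register.
    Offset 0 is bit 31; register bit fields wrap around modulo 32."""
    value &= 0xFFFFFFFF
    for i in range(width):
        bit_offset = (offset + i) % 32
        if value & (1 << (31 - bit_offset)):
            return offset + i
    return offset + width  # no bit set in the field


def mvz_w(value):
    """Model of MVZ.W Dn,Dn: zero-extend the low word to 32 bits."""
    return value & 0xFFFF


def ff1(value):
    """Model of FF1 Dn: offset from bit 31 of the first set bit, 32 if none."""
    return bfffo_reg(value, 0, 32)
```

Under these assumptions `bfffo_reg(1, 16, 16)` and `ff1(mvz_w(1))` both give 31, so the two sequences agree without subtracting 16.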


Quote:
Originally Posted by matthey View Post
It is not how frequent the instruction is but how big of a problem it is. Let's look at the case of one simple mispredicted branch.

68060
IPC=1.3
pipe length=8
avg instruction size=3 bytes
ICache line=16 bytes

We throw away 1.3 * 8 = 10.4 instructions on average
This is 10.4 * 3 = 31.2 bytes of code fetched and cached
This is 31.2 / 16 = 2 ICache lines replaced

The Apollo-core may have an IPC of 3+ and a deeper pipeline.
IPC=3
pipe length=10
avg instruction size=3 bytes
ICache line=16 bytes

We throw away 3 * 10 = 30 instructions on average
This is 30 * 3 = 90 bytes of code fetched and cached
This is 90 / 16 = 6 ICache lines replaced

There is also the delay in cycles of the pipe length and any data accessed would go into the DCache. We will do this misprediction twice before we turn around a 2bit saturating prediction. I hope we can quickly see that the cost of those little branches is much higher when they are mispredicted. Advanced processors are not like the 68020/68030 anymore.
Here you make the assumption that the mispredicted branch goes to code that's not in the cache. This is true only if it's executed for the first time, and in that case, its speed doesn't really matter.

And the misprediction will occur... twice. So even if it adds 100 clocks, that'll be 100 clocks added to the overall program execution. Big deal.


Quote:
Originally Posted by matthey View Post
That "general support" does have support for a branch hint bit. My evaluation ISA hint bit works the same way as the PPC hint bit which can still be set for those processors which support it. Of course you need a standard and real processors supporting it before the GCC folks would add support. The average coder doesn't use profiling but it is a high level optimizing tool which most compilers have and it is relatively easy to use. More programmers should use it as it can optimize inlining, loops, branches (as much as possible for the particular CPU), code relativity, cache efficiency, etc. The results are relative to how much can be done for that particular CPU and how easy the support is. I want to give compilers the tools they need rather than just complaining about the bloat they produce.
The PPC hint bit got dropped, remember. And for quite valid reasons. Do you really want to make the same mistakes others made before you ?


Quote:
Originally Posted by matthey View Post
No. I don't see the hint bit as dirty. I don't think it slows anything down and it is normally not used. Some people may consider it ugly but they don't have to set it and it is the same as not having it. There are better ways to trap since the 68020 than setting an odd PC.
Since the 68020 ok, but there is a lot of 68000 code around there.
Bit #0 isn't free.


Quote:
Originally Posted by matthey View Post
Everything adds special cases. It is important to avoid the big tables/muxes and decoding info not in the first word. This is what I consider dirty.
Dirty things are also things that got added without enough thinking and cause problems in next generations of the cpu.
Or things that "reuse" a bit that wasn't previously free.


Quote:
Originally Posted by matthey View Post
I should rework the SELcc encoding after I changed BScc. It is low priority as nobody is likely to use the ISA or do anything with it anyway.
Ok. Let me know of your new encoding when you have it.


Quote:
Originally Posted by matthey View Post
So do you consider the EA #data immediate as ugly so dirty since there is no difference? Does that make the whole 68k dirty then? If the original 68k was dirty then I can't ruin it by being dirty?
I consider the EA #immediate as a bad choice. Does a single bad choice ruin the whole architecture ? No.
If your car has a small scratch, do you take a hammer and add more, just because it's not exactly in mint condition and so you can't ruin it ?
Old 13 August 2016, 21:07   #66
Mrs Beanbag
Glastonbridge Software
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,202
Quote:
Originally Posted by meynaf View Post
Well, the 68k manual doesn't state we shouldn't throw exceptions on purpose.
So for me bit #0 of a branch isn't "free". And that's all.
Does it not also throw an exception for unused opcodes? In which case we're rather stuck for adding anything, if throwing "illegal instruction" exceptions on purpose is also valid.

Although on balance, i think i agree with you on leaving this bit be.

Quote:
Here you make the assumption that the mispredicted branch goes to code that's not in the cache. This is true only if it's executed for the first time, and in that case, its speed doesn't really matter.
if the instruction stream is read into a buffer a few cycles before execution, the cache could be strobed in advance of actually taking the branch. Especially if it has to wait on the condition code from the previous instruction anyway.

Btw Matthey how did the team manage to get 3 IPC? Is it many-way superscalar or is this thanks to opcode fusion &c?
Old 13 August 2016, 21:18   #67
meynaf
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 46
Posts: 3,620
Quote:
Originally Posted by Mrs Beanbag View Post
Does it not also throw an exception for unused opcodes? In which case we're rather stuck for adding anything, if throwing "illegal instruction" exceptions on purpose is also valid.
Deliberately throwing the illegal opcode exception is perfectly valid but there is a specific opcode for doing that (aka $4AFC).
Old 13 August 2016, 21:27   #68
Mrs Beanbag
Glastonbridge Software
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,202
Quote:
Originally Posted by meynaf View Post
Deliberately throwing the illegal opcode exception is perfectly valid but there is a specific opcode for doing that (aka $4AFC).
yes, ok, but why would someone deliberately run an odd address branch instead of ILLEGAL? if they just want to get into supervisor mode without going through the OS call for some reason, it seems a much more obvious way. One does not need a 68020.

Plus 68020 already broke some old code by not throwing exceptions on odd data accesses! So Motorola seems to think that wasn't such a legitimate technique.

Last edited by Mrs Beanbag; 13 August 2016 at 21:35.
Old 13 August 2016, 21:47   #69
meynaf
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 46
Posts: 3,620
Quote:
Originally Posted by Mrs Beanbag View Post
yes, ok, but why would someone deliberately run an odd address branch instead of ILLEGAL? if they just want to get into supervisor mode without going through the OS call for some reason, it seems a much more obvious way. One does not need a 68020.
Illegal exception isn't the same as odd address exception. And in one instruction you can test the condition and call the exception ; ILLEGAL isn't a conditional instruction.


Quote:
Originally Posted by Mrs Beanbag View Post
Plus 68020 already broke some old code by not throwing exceptions on odd data accesses! So Motorola seems to think that wasn't such a legitimate technique.
They didn't have much choice. The 68020 is 32-bit so "odd address" becomes meaningless, and trapping on all misaligned accesses would have broken 90% of existing programs.
Old 13 August 2016, 21:54   #70
Mrs Beanbag
Glastonbridge Software
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,202
Quote:
Originally Posted by meynaf View Post
Illegal exception isn't the same as odd address exception. And in one instruction you can test the condition and call the exception ; ILLEGAL isn't a conditional instruction.
True. There is TRAPcc though...

Quote:
They didn't have much choice. The 68020 is 32-bit so "odd address" becomes meaningless, and trapping on all misaligned accesses would have broken 90% of existing programs.
But they COULD have maintained exception behaviour on odd addresses the same as before, while allowing non-longword-aligned addresses. They don't *need* to anymore, and as such it might not be very meaningful. But they could have, for the sake of backwards compatibility, if they'd thought it was a legitimate thing to do on purpose.
Old 13 August 2016, 23:00   #71
meynaf
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 46
Posts: 3,620
Quote:
Originally Posted by Mrs Beanbag View Post
True. There is TRAPcc though...
I remind you that TRAPcc doesn't work on the 68000.


Quote:
Originally Posted by Mrs Beanbag View Post
But they COULD have maintained exception behaviour on odd addresses the same as before, while allowing non-longword-aligned addresses. They don't *need* to anymore, and as such it might not be very meaningful. But they could have, for the sake of backwards compatibility, if they'd thought it was a legitimate thing to do on purpose.
Sure they could have built a cpu that supports some misaligned accesses. However it would have looked ridiculous, and very confusing (i even dare to say totally crazy).
They made the right choice and it's good for coding flexibility, code density, and even sometimes performance. It was really worth the limited compatibility issue.

It's always a matter of trade-offs. Inability of the 68000 to do misaligned accesses was a real pain. Not many programs trigger address errors on purpose ; in fact i haven't found any. So the change is ok. On the other hand, even a very small compatibility threat for just a near useless branch hint bit, isn't worth it.
Old 13 August 2016, 23:17   #72
Mrs Beanbag
Glastonbridge Software
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,202
Quote:
Originally Posted by meynaf View Post
I remind you that TRAPcc doesn't work on the 68000.
oh yeah

Quote:
Sure they could have built a cpu that supports some misaligned accesses. However it would have looked ridiculous, and very confusing (i even dare to say totally crazy).
They made the right choice and it's good for coding flexibility, code density, and even sometimes performance. It was really worth the limited compatibility issue.

It's always a matter of trade-offs. Inability of the 68000 to do misaligned accesses was a real pain. Not many programs trigger address errors on purpose ; in fact i haven't found any. So the change is ok. On the other hand, even a very small compatibility threat for just a near useless branch hint bit, isn't worth it.
but they DID build a CPU that supports some misaligned accesses. It doesn't support misaligned code accesses. Granted nobody cares!

But indeed... it is the uselessness of the branch hint bit that swings it for me. I don't really care so much that a few old demos and games won't work. Tbh i would be willing to compromise everything in supervisor mode and let the operating system be recompiled if it would help, i only care about compatibility with user mode software, anything that trashes the OS in order to run is something i'd prefer not to run on a computer with too much power!
Old 15 August 2016, 21:21   #73
Megol
Registered User
Join Date: May 2014
Location: inside the emulator
Posts: 370
My post got eaten. Short version:

Hint bits are not worth it in a modern processor. They can be worth it in simple processors or to shave a few clocks from run-once code (exceptions etc.).

The bit can be used for better things, my version:
0000 0000 -> 16 bit displacement
1111 1111 -> 32 bit displacement
0000 0001 -> 64 bit displacement
xxxx xxx0 -> normal 8 bit displacement
xxxx xxx1 -> available for extension

Any incompatible treatment of the LSb should be disabled by default and enabled by the OS if needed.
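The proposed table can be read as a small decoder for the 8-bit displacement field of a branch. The Python sketch below just transcribes the five rows, checking the exact escape values before the generic even/odd patterns; the function name and the returned labels are made up for the example and do not describe any existing CPU.

```python
def classify_displacement_byte(byte):
    """Classify the 8-bit displacement field according to the table above."""
    byte &= 0xFF
    if byte == 0x00:
        return "disp16"     # 16-bit displacement follows
    if byte == 0xFF:
        return "disp32"     # 32-bit displacement follows
    if byte == 0x01:
        return "disp64"     # proposed 64-bit displacement follows
    if byte & 1 == 0:
        return "disp8"      # normal (even) 8-bit displacement
    return "extension"      # remaining odd values, free for extension
```

Note that $FE still decodes as a normal 8-bit displacement here, while every other odd value except $01 and $FF is left unassigned.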
Old 16 August 2016, 11:03   #74
meynaf
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 46
Posts: 3,620
What the heck could be the use for 64 bit displacement ?
Programs are never that large !
Old 16 August 2016, 13:44   #75
Megol
Registered User
Join Date: May 2014
Location: inside the emulator
Posts: 370
Quote:
Originally Posted by meynaf View Post
What the heck could be the use for 64 bit displacement ?
Programs are never that large !
Mostly because I like symmetry/orthogonality.
Old 18 August 2016, 00:10   #76
Mrs Beanbag
Glastonbridge Software
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,202
Quote:
Originally Posted by Megol View Post
The bit can be used for better things, my version:
0000 0000 -> 16 bit displacement
1111 1111 -> 32 bit displacement
0000 0001 -> 64 bit displacement
xxxx xxx0 -> normal 8 bit displacement
xxxx xxx1 -> available for extension
64 bit displacements!

But seriously... when i look at this little table it makes me realise something else about why using Bit 0 as a hint bit (or indeed anything else) is really dirty...

Because $FE is a legitimate 8-bit branch. So then how do you put a hint bit on it? Then it becomes $FF which means a 32-bit branch...

Then again $FE would be a branch to itself, causing a total lock-up, so maybe let's not use that anyway.
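Both observations are easy to check with a little Python. The sketch assumes the usual 68k rule that a short branch target is the address of the branch plus 2 plus the sign-extended byte displacement, and models the hint bit as simply OR-ing 1 into the displacement byte; the function names are made up for illustration.

```python
def short_branch_target(insn_addr, disp_byte):
    """Target of a short branch: instruction address + 2 + signed 8-bit displacement.
    Only meaningful for true 8-bit displacements (not the $00/$FF escape values)."""
    disp = disp_byte & 0xFF
    if disp & 0x80:
        disp -= 0x100
    return (insn_addr + 2 + disp) & 0xFFFFFFFF


def with_hint_bit(disp_byte):
    """Set bit 0 of the displacement byte, as a hint-bit scheme would."""
    return (disp_byte | 1) & 0xFF
```

Here `short_branch_target(0x1000, 0xFE)` is 0x1000, the branch itself, and `with_hint_bit(0xFE)` is 0xFF, which is already taken by the 32-bit displacement escape.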
Old 18 August 2016, 08:34   #77
meynaf
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 46
Posts: 3,620
A branch with $FE is likely to be a quick'n'dirty error handling, so it had better be predicted as not taken. Hey, wait -- it's a backward branch so default would be taken and it needs the hint bit to reverse that...

So now we have :
- $FE unable to get the hint bit (and always mispredicted)
- $01 useless
- BRA and BSR not using the hint bit
- the hint bit moves away in case of larger branches
Who said anything about 'special cases' ?
Old 18 August 2016, 20:22   #78
matthey
Banned
 
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
Quote:
Originally Posted by Megol View Post
Hint bits are not worth it in a modern processor. They can be worth it in simple processors or to shave a few clocks from run-once code (exceptions etc.).
Is a <150MHz FPGA processor a modern processor? I agree that there is less advantage to a branch hint bit as the prediction technique gets better but high end processors have gshare/adaptive dynamic prediction (there is a several cycle latency with these techniques so it still could be useful in a few cases). More likely for 68k processors is 2 bit saturating or no dynamic prediction where a hint bit would be "useful" if faster executing code is considered "useful". It would be helpful to see some timed results of a variety of code executing with no dynamic prediction and with a hybrid 2 bit saturating with hint bit but everyone has already decided what is "useful". The cost of a hint bit on the 68k is so low that it is practically free, but let's reject it as not "useful" even though most research shows a branch prediction improvement for practically free.

Quote:
Originally Posted by Megol View Post
The bit can be used for better things, my version:
0000 0000 -> 16 bit displacement
1111 1111 -> 32 bit displacement
0000 0001 -> 64 bit displacement
xxxx xxx0 -> normal 8 bit displacement
xxxx xxx1 -> available for extension

Any incompatible treatment of the LSb should be disabled by default and enabled by the OS if needed.
We are talking of branches in code here where a 32 bit displacement already gives a +-2 GB range. If all the code of the AmigaOS was combined into one executable it would probably not exceed this. If all the code which was released for cost on the Amiga in the first year was combined into one executable it would probably not exceed this. I don't think it would be a problem to divide code up into sections this small even if 64 bit addressing existed. The problem with PC relative addressing is not the branch displacement range but the code density deterioration and use of an expensive addressing mode because of a lack of range in PC relative addressing modes (beyond +-32kB). This is where the 2 new PC relative addressing modes proposed could practically eliminate both problems. Even then, 64 bit PC relative displacements are not necessary and would likely never be used. Code is much smaller than data even on modern processors.

Quote:
Originally Posted by meynaf View Post
A branch with $FE is likely to be a quick'n'dirty error handling, so it had better be predicted as not taken. Hey, wait -- it's a backward branch so default would be taken and it needs the hint bit to reverse that...

So now we have :
- $FE unable to get the hint bit (and always mispredicted)
- $01 useless
This is not a problem for real code. Why not blame the 68k ISA creators for making useless encodings too?

Quote:
Originally Posted by meynaf View Post
- BRA and BSR not using the hint bit
I documented the least significant bit of the displacement as reserved for these instructions. The CPU handles these different than Bcc even if the encoding is similar.

Quote:
Originally Posted by meynaf View Post
- the hint bit moves away in case of larger branches
Who said anything about 'special cases'?
The CPU decodes into a 32 bit displacement as quickly as possible. The 68060 does not have a cycle penalty for a Bcc.W or Bcc.L so this does not appear to be a problem. The hint bit would be available at the same time as the displacement.

I wish to reduce my time spent posting here. I do not view this thread as productive and the ISA is a dead end anyway. It only has historical significance as these were the ideas which the "Apollo non-Team" came up with and were discarded with minimal evaluation before a 68k+MMX bolt-on was decided by Gunnar. RIP 68k.
Old 18 August 2016, 21:46   #79
meynaf
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 46
Posts: 3,620
Quote:
Originally Posted by matthey View Post
Is a <150MHz FPGA processor a modern processor?
In some way, yes.


Quote:
Originally Posted by matthey View Post
It would be helpful to see some timed results of a variety of code executing with no dynamic prediction and with a hybrid 2 bit saturating with hint bit but everyone has already decided what is "useful".
You know full well what these results would say. Comparing a 2 bit saturating counter with and without the hint bit is quite easy: at best a few clocks saved on the first two iterations (unless the hint bit is simply wrong, a likely situation for compiled code), and then nothing at all. Optimizing is about the needs of the many - not the needs of the few.
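
A rough way to see the size of the effect is to count mispredictions of a 2 bit saturating counter on a simple loop branch, once starting cold and once seeded by a hint. This Python sketch assumes the hint does nothing more than choose the counter's initial state, which the thread does not actually specify. The state encoding (0..3, predict taken when >= 2) and the function name are illustrative, and it ignores timing, aliasing and everything else a real predictor has to deal with.

```python
def mispredictions(outcomes, initial_state):
    """Count mispredictions of one 2 bit saturating counter.

    States 0..3; the branch is predicted taken when state >= 2.
    """
    state = initial_state
    misses = 0
    for taken in outcomes:
        if (state >= 2) != taken:
            misses += 1
        # Saturating update: move towards 3 on taken, towards 0 on not taken.
        state = min(3, state + 1) if taken else max(0, state - 1)
    return misses

# A loop-closing branch: taken 99 times, then falls through once.
loop = [True] * 99 + [False]

cold = mispredictions(loop, 0)    # no hint: counter starts strongly not taken
hinted = mispredictions(loop, 2)  # assumed: a correct hint seeds weakly taken

print(cold, hinted)  # 3 1
```

In this toy model a correct hint saves two mispredictions per cold start, however long the loop runs. A wrong hint (seeding state 0 for this loop) would cost the same two, which is the compiled-code worry raised above. How often branches start cold in practice is the part that would need measuring.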


Quote:
Originally Posted by matthey View Post
The cost of a hint bit on the 68k is so high that it is practically free but let's reject it as not "useful" even though most research shows a branch prediction improvement for practically free.
Special cases in encoding, assemblers and disassemblers aren't what i call "practically free". Don't play your gunnar, there are costs aside from the implementation.


Quote:
Originally Posted by matthey View Post
This is not a problem for real code. Why not blame the 68k ISA creators for making useless encodings too?
Useless encodings come from the fact special cases are costly to handle in comparison to the small benefit, no more no less. But this is not the point. I will always be against frankenstein-like stuff that's added for the mere sake of 'speed'.


Quote:
Originally Posted by matthey View Post
I documented the least significant bit of the displacement as reserved for these instructions. The CPU handles these differently from Bcc even if the encoding is similar.
Here the whole Bcc area is split between two cases. How they are handled inside the CPU is irrelevant.


Quote:
Originally Posted by matthey View Post
The CPU decodes into a 32 bit displacement as quickly as possible. The 68060 does not have a cycle penalty for a Bcc.W or Bcc.L so this does not appear to be a problem. The hint bit would be available at the same time as the displacement.
The displacement is useless if the branch is "not taken". Not sure whether this fact is usable or not; nevertheless i wouldn't risk this for a hint bit.


Quote:
Originally Posted by matthey View Post
I wish to reduce my time spent posting here. I do not view this thread as productive and the ISA is a dead end anyway. It only has historical significance as these were the ideas which the "Apollo non-Team" came up with and were discarded with minimal evaluation before a 68k+MMX bolt-on was decided by Gunnar. RIP 68k.
So long then, it has been fun
The 68k remains the best existing ISA, even if you consider it dead.
Gunnar changed his mind when facing the gruesome facts for the emulation library, and it seems the current version does not have this silly c2p/pixmerge stuff he once wanted to add (and not even additional data registers). He is likely to change again when seeing the uselessness of what he has added.
Old 18 August 2016, 22:33   #80
Mrs Beanbag
Glastonbridge Software
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,202
Quote:
Originally Posted by matthey View Post
Is a <150MHz FPGA processor a modern processor?
any given design could be run at fast or slow clock speeds in principle so this is not a very good criterion. It's modern if it uses modern ideas. FPGAs will get faster and we could even see homebrew silicon wafers in the future. Instructions per clock is a better measure. Although even that is open to debate because not all modern applications demand speed.

So. Imho there are better ways to mitigate branch penalties than explicit hint bits, which are alien to 68k architecture.

i'd rather we started thinking outside the box a bit more.