12 August 2016, 22:37 #65
meynaf
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,351
Quote:
Originally Posted by Mrs Beanbag
It matters for the point I'm making, which is that you could simply wire bit 0 of the PC to zero, so any branch to an odd address would round down. The hardware would need no special handling of odd PC-relative offsets to deal with this case.

I'm not saying it would be a good idea, just that it wouldn't be very difficult or complicated.
IIRC several RISC CPUs do it this way.
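To make the two behaviours concrete, here is a minimal Python model, assuming the usual 68k rule that a Bcc/BRA target is the address of the branch opcode plus 2 plus the signed displacement. The names `AddressError`, `branch_target`, `fetch_68000` and `fetch_pc_bit0_zero` are hypothetical, for illustration only: the first fetch function traps on an odd target as the 68000 does (address error, vector 3), the second models a PC whose bit 0 is wired to zero.

```python
MASK32 = 0xFFFFFFFF


class AddressError(Exception):
    """Stands in for the 68000 address error exception (vector 3)."""


def branch_target(pc: int, disp: int) -> int:
    """Bcc/BRA target: address of the branch opcode + 2 + signed displacement."""
    return (pc + 2 + disp) & MASK32


def fetch_68000(target: int) -> int:
    """68000 behaviour: an instruction fetch from an odd address traps."""
    if target & 1:
        raise AddressError(f"odd instruction address {target:#x}")
    return target


def fetch_pc_bit0_zero(target: int) -> int:
    """Hypothetical variant: PC bit 0 hard-wired to zero, so odd targets round down."""
    return target & ~1 & MASK32


if __name__ == "__main__":
    odd = branch_target(0x1000, 5)        # 0x1000 + 2 + 5 = 0x1007
    print(hex(fetch_pc_bit0_zero(odd)))   # rounds down to 0x1006
    try:
        fetch_68000(odd)
    except AddressError as exc:
        print("68000:", exc)
```

The rounding variant really is just a mask. The objection raised later in this post is that a program relying on the trap would silently run on instead of taking the exception.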


Quote:
Originally Posted by Mrs Beanbag
Is it "valid" or is it "kinda valid in some way"? This seems like something of a subjective question. "It works" is not the same thing as "it's valid". I've worked with engineers who didn't bother to read the documentation and just made things work; then we fix a bug and it doesn't work any more, they phone me up to complain, and I have to explain to them why they should have done what the instructions said.

Yeah, there are some cases where it's partly our fault if that happens, because we should have been stricter about what we accepted and thrown an exception or something otherwise, so they'd know they had done it wrong. But in this case we're in a pickle, because the programmer triggered an exception on purpose to achieve what they wanted.
Well, the 68k manual doesn't say we shouldn't trigger exceptions on purpose.
So for me bit #0 of a branch displacement isn't "free". That's all.


Quote:
Originally Posted by matthey
You could use MVZ.W Dn,Dn on the CF (ColdFire), but I believe you would then need to subtract those first 16 zeros.

Code:
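  mvz.w d0,d0    ; zero-extend the low word of d0 to 32 bits
  ff1 d0         ; offset of the first set bit counted from bit 31 (32 if d0=0)
  sub.l #16,d0   ; make it relative to bit 15; 6 bytes without OP.L #data.w,Dn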
It would be possible to use SWAP+FF1, but then we would need to handle the case where D0=0. Yeah, I'm using BFFFO because it's there and does exactly what I need in a single instruction. Time critical? It's a support library, so it depends on the use.
The offset BFFFO returns for a register operand is counted from bit 31 of the register, not from the start of the bit field: BFFFO D0{16:16},D0 with D0=1 gives D0=31, not 15.
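The disagreement is about which number comes back. The sketch below models the instructions in Python, assuming the documented semantics as I understand them: FF1 returns the offset of the first set bit counted from bit 31 (32 if the operand is zero), and BFFFO on a data register returns the field offset plus the index of the first set bit inside the field (offset plus width if the field is empty). The function names are hypothetical. Under these assumptions MVZ.W followed by FF1, without the SUB.L #16, already matches BFFFO D0{16:16},D0; the subtraction switches to field-relative numbering, and SWAP+FF1 agrees with that field-relative version except when the low word is zero.

```python
MASK32 = 0xFFFFFFFF


def ff1(x: int) -> int:
    """ColdFire FF1: bit 31 -> 0, ..., bit 0 -> 31; returns 32 when x == 0."""
    return 32 - (x & MASK32).bit_length()


def mvz_w(x: int) -> int:
    """ColdFire MVZ.W Dn,Dn: zero-extend the low word."""
    return x & 0xFFFF


def swap(x: int) -> int:
    """SWAP Dn: exchange the two 16-bit halves."""
    x &= MASK32
    return ((x << 16) | (x >> 16)) & MASK32


def bfffo_reg(d: int, offset: int, width: int) -> int:
    """BFFFO Dn{offset:width}: offsets count from bit 31; data-register fields wrap."""
    d &= MASK32
    for i in range(width):
        bit = 31 - ((offset + i) % 32)   # register bit number of field position i
        if (d >> bit) & 1:
            return offset + i
    return offset + width                # no bit set in the field


if __name__ == "__main__":
    # columns: D0, BFFFO D0{16:16}, MVZ.W+FF1, MVZ.W+FF1-16, SWAP+FF1
    for d0 in (0x1, 0x8000, 0x10000, 0x0):
        print(hex(d0), bfffo_reg(d0, 16, 16), ff1(mvz_w(d0)),
              ff1(mvz_w(d0)) - 16, ff1(swap(d0)))
    # 0x1 31 31 15 15
    # 0x8000 16 16 0 0
    # 0x10000 32 32 16 31
    # 0x0 32 32 16 32
```

The first row reproduces the 31-versus-15 example. The last two rows show why SWAP+FF1 needs the zero case handled: with an empty low word it returns 31 or 32 depending on the high word, where the MVZ.W version returns a fixed value.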


Quote:
Originally Posted by matthey
It is not how frequent the instruction is but how big a problem it causes. Let's look at the case of one simple mispredicted branch.

68060
IPC=1.3
pipe length=8
avg instruction size=3 bytes
ICache line=16 bytes

We throw away 1.3 * 8 = 10.4 instructions on average
This is 10.4 * 3 = 31.2 bytes of code fetched and cached
This is 31.2 / 16 = 1.95, so about 2 ICache lines replaced

The Apollo-core may have an IPC of 3+ and a deeper pipeline.
IPC=3
pipe length=10
avg instruction size=3 bytes
ICache line=16 bytes

We throw away 3 * 10 = 30 instructions on average
This is 30 * 3 = 90 bytes of code fetched and cached
This is 90 / 16 = 5.6, so about 6 ICache lines replaced

There is also a delay of roughly the pipeline length in cycles, and any data accessed would go into the DCache. We will take this misprediction twice before a 2-bit saturating predictor turns around. I hope we can quickly see that those little branches cost much more when they are mispredicted. Advanced processors are not like the 68020/68030 anymore.
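The arithmetic above can be reproduced with a small helper. This is a rough model under the quoted assumptions only: discarded work is taken as IPC times pipeline depth, then converted to bytes and cache lines using the average instruction size and line size. The parameter values are matthey's estimates, not measurements, and the model counts the bytes whether or not they were already cached, which is exactly the assumption questioned in the reply. `Core` and `flushed_work` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Core:
    name: str
    ipc: float          # instructions per cycle (assumed)
    pipe_len: int       # pipeline stages flushed on a mispredict (assumed)
    insn_bytes: float   # average instruction size in bytes (assumed)
    line_bytes: int     # ICache line size in bytes


def flushed_work(core: Core) -> tuple[float, float, float]:
    """Return (instructions, code bytes, ICache lines) discarded by one mispredict."""
    insns = core.ipc * core.pipe_len
    code_bytes = insns * core.insn_bytes
    lines = code_bytes / core.line_bytes
    return insns, code_bytes, lines


if __name__ == "__main__":
    for core in (Core("68060", 1.3, 8, 3, 16), Core("Apollo (estimate)", 3, 10, 3, 16)):
        insns, code_bytes, lines = flushed_work(core)
        print(f"{core.name}: {insns:.1f} insns, {code_bytes:.1f} bytes, {lines:.3f} lines")
```

With these inputs the line counts come out at 1.95 and 5.625, which the quote rounds to 2 and 6.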
Here you make the assumption that the mispredicted branch goes to code that's not in the cache. This is true only if it's executed for the first time, and in that case, its speed doesn't really matter.

And the misprediction will occur... twice. So even if it adds 100 clocks, that'll be 100 clocks added to the overall program execution. Big deal.
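The "twice" mentioned in both the quote and the reply follows from how a 2-bit saturating counter works. A minimal sketch, assuming the textbook scheme: states 0 and 1 predict not-taken, states 2 and 3 predict taken, and each outcome moves the state one step towards itself, saturating at the ends. The class and function names are illustrative only.

```python
class TwoBitCounter:
    """Textbook 2-bit saturating predictor: states 0-1 predict not-taken, 2-3 taken."""

    def __init__(self, state: int = 3) -> None:
        self.state = state  # 3 = strongly taken

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step towards the actual outcome, saturating at 0 and 3.
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)


def count_mispredictions(outcomes, state: int = 3) -> int:
    """Run a sequence of branch outcomes (True = taken) through one counter."""
    counter = TwoBitCounter(state)
    misses = 0
    for taken in outcomes:
        if counter.predict() != taken:
            misses += 1
        counter.update(taken)
    return misses


if __name__ == "__main__":
    # A branch that was always taken and is now never taken: two misses, then correct.
    print(count_mispredictions([False] * 10))                # 2
    # A loop branch (7 taken, 1 not-taken exit) repeated 3 times: one miss per exit.
    print(count_mispredictions(([True] * 7 + [False]) * 3))  # 3
```

Starting from a weak state (`state=2`) the same switch costs only one miss, and a single odd outcome never flips a strongly biased counter.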


Quote:
Originally Posted by matthey
That "general support" does have support for a branch hint bit. My evaluation ISA hint bit works the same way as the PPC hint bit which can still be set for those processors which support it. Of course you need a standard and real processors supporting it before the GCC folks would add support. The average coder doesn't use profiling but it is a high level optimizing tool which most compilers have and it is relatively easy to use. More programmers should use it as it can optimize inlining, loops, branches (as much as possible for the particular CPU), code relativity, cache efficiency, etc. The results are relative to how much can be done for that particular CPU and how easy the support is. I want to give compilers the tools they need rather than just complaining about the bloat they produce.
The PPC hint bit got dropped, remember, and for quite valid reasons. Do you really want to make the same mistakes others made before you?
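For reference, here is one way such a hint could be decoded, as a hedged sketch. Two assumptions, neither confirmed by the post: the hint sits in bit 0 of the 8-bit Bcc displacement (the bit this thread keeps arguing about), and it behaves like the original PowerPC "y" bit, reversing a default static prediction of backward-taken/forward-not-taken. `decode_hinted_bcc` is a hypothetical name.

```python
MASK32 = 0xFFFFFFFF


def decode_hinted_bcc(pc: int, disp8: int) -> tuple[int, bool]:
    """Hypothetical Bcc.B decode with a prediction hint in displacement bit 0.

    pc    -- address of the branch opcode
    disp8 -- signed 8-bit displacement byte (-128..127)
    Returns (target, predict_taken).
    """
    if not -128 <= disp8 <= 127:
        raise ValueError("8-bit displacement out of range")
    if disp8 in (0, -1):
        # $00 selects a 16-bit displacement, $FF a 32-bit one (68020+); not modelled.
        raise ValueError("escape value, not a byte displacement")
    hint = disp8 & 1
    disp = disp8 & ~1                     # bit 0 ignored when forming the target
    target = (pc + 2 + disp) & MASK32
    default_taken = disp < 0              # static rule: backward taken, forward not taken
    return target, default_taken != bool(hint)  # hint bit reverses the default


if __name__ == "__main__":
    for d in (-4, -3, 6, 7):
        target, taken = decode_hinted_bcc(0x1000, d)
        print(d, hex(target), "taken" if taken else "not taken")
```

Under this scheme a 68000 program that sets bit 0 deliberately to provoke an address error would instead branch to the even address just below, which is the compatibility objection made in this post.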


Quote:
Originally Posted by matthey
No. I don't see the hint bit as dirty. I don't think it slows anything down and it is normally not used. Some people may consider it ugly but they don't have to set it and it is the same as not having it. There are better ways to trap since the 68020 than setting an odd PC.
Since the 68020, OK, but there is a lot of 68000 code out there.
Bit #0 isn't free.


Quote:
Originally Posted by matthey
Everything adds special cases. It is important to avoid the big tables/muxes and decoding info not in the first word. This is what I consider dirty.
Dirty things are also things that got added without enough thought and cause problems in later generations of the CPU.
Or things that "reuse" a bit that wasn't previously free.


Quote:
Originally Posted by matthey
I should rework the SELcc encoding after I changed BScc. It is low priority as nobody is likely to use the ISA or do anything with it anyway.
OK. Let me know your new encoding when you have it.


Quote:
Originally Posted by matthey
So do you consider the EA #data immediate ugly, and therefore dirty, since there is no difference? Does that make the whole 68k dirty, then? If the original 68k was dirty, then I can't ruin it by being dirty?
I consider the EA #immediate a bad choice. Does a single bad choice ruin the whole architecture? No.
If your car has a small scratch, do you take a hammer and add more, just because it's not exactly in mint condition and so you can't ruin it?