English Amiga Board - View Single Post

matthey · 12 August 2016, 22:08

Quote:

Originally Posted by meynaf

It could just have been :

Code:

 bfextu d0{16:16},d0
 ffo d0

This code doesn't look performance critical. D0 could be extended before (at no cost if it's not in a loop).
You're using bfffo because it's there, but you wouldn't have asked for it if it weren't.
Anyway we have it now, so unless we go the incompatible way this is useless talk.

You could use MVZ.W Dn,Dn on the CF but I believe you are going to need to subtract those first 16 zeros.

Code:

  mvz.w d0,d0
  ff1 d0
  sub.l #16,d0 ; 6 bytes without OP.L #data.w,Dn

It would be possible to do a SWAP+FF1, but then we need to take care of the case where D0=0. Yeah, I'm using BFFFO because it's there and it is exactly what I need, as the single instruction demonstrates. Time critical? It is a support library, so it depends on the use.
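To make the two alternatives concrete, here is a small Python model of them. It is only a sketch: it assumes FF1 returns the number of bits from bit 31 down to the first set bit (32 when the register is zero) and that MVZ.W zero-extends the low word. The helper names are made up for illustration.

```python
MASK32 = 0xFFFFFFFF

def ff1(d):
    # Assumed FF1 behaviour: offset from bit 31 of the first set bit, 32 if none.
    d &= MASK32
    return 32 - d.bit_length()

def mvz_w(d):
    # Assumed MVZ.W behaviour: zero-extend the low 16 bits.
    return d & 0xFFFF

def swap(d):
    # SWAP: exchange the upper and lower 16-bit halves.
    d &= MASK32
    return ((d << 16) | (d >> 16)) & MASK32

def first_one_mvz(d0):
    # mvz.w d0,d0 / ff1 d0 / sub.l #16,d0
    return ff1(mvz_w(d0)) - 16

def first_one_swap(d0):
    # swap d0 / ff1 d0 (no fix-up for a zero low word)
    return ff1(swap(d0))

print(first_one_mvz(0x00008000), first_one_swap(0x00008000))
print(first_one_mvz(0x00010000), first_one_swap(0x00010000))
```

In this model both sequences agree whenever the low word is non-zero. When the low word is zero, the MVZ version returns 16, while the SWAP version keeps scanning into the old upper word and returns something between 16 and 32, which is the case that needs extra handling.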

Quote:

Originally Posted by meynaf

Branches consist of 10% of overall instructions. You can't avoid them. So it's better to concentrate the resources on their implementation, rather than waste silicon on ways to avoid them.

It is not how frequent the instruction is but how big a problem it is. Let's look at the cost of one simple mispredicted branch.

68060
IPC=1.3
pipe length=8
avg instruction size=3 bytes
ICache line=16 bytes

We throw away 1.3 * 8 = 10.4 instructions on average
This is 10.4 * 3 = 31.2 bytes of code fetched and cached
This is 31.2 / 16 = 1.95, so about 2 ICache lines replaced

The Apollo-core may have an IPC of 3+ and a deeper pipeline.
IPC=3
pipe length=10
avg instruction size=3 bytes
ICache line=16 bytes

We throw away 3 * 10 = 30 instructions on average
This is 30 * 3 = 90 bytes of code fetched and cached
This is 90 / 16 = 5.6, so about 6 ICache lines replaced
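The same back-of-the-envelope arithmetic as a small Python sketch. The function name is made up, and rounding up to whole cache lines is my assumption (it ignores alignment, which could touch one more line).

```python
import math

def mispredict_waste(ipc, pipe_length, avg_insn_bytes=3, icache_line_bytes=16):
    # Instructions in flight that get thrown away on a mispredicted branch.
    insns = ipc * pipe_length
    # Bytes of code that were fetched and cached for nothing.
    code_bytes = insns * avg_insn_bytes
    # Whole ICache lines that code occupies (rounded up; an assumption).
    lines = math.ceil(code_bytes / icache_line_bytes)
    return insns, code_bytes, lines

for name, ipc, depth in (("68060", 1.3, 8), ("Apollo-core (guessed figures)", 3, 10)):
    insns, code_bytes, lines = mispredict_waste(ipc, depth)
    print(f"{name}: {insns:.1f} instructions, {code_bytes:.1f} bytes, {lines} ICache lines")
```

Plugging in other IPC or pipeline depth guesses shows how quickly the waste grows with wider and deeper designs.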

There is also a delay of about the pipe length in cycles, and any data accessed would go into the DCache. We will mispredict twice before a 2-bit saturating predictor turns around. I hope it is clear that the cost of those little branches is much higher when they are mispredicted. Advanced processors are not like the 68020/68030 anymore.
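The "twice before it turns around" behaviour can be seen in a toy model of a 2-bit saturating counter. This is the textbook scheme (states 0 and 1 predict not taken, 2 and 3 predict taken), not a description of any particular core's predictor; the function name is made up.

```python
def count_mispredictions(outcomes, state=3):
    # state 3 = strongly taken, 2 = weakly taken,
    # state 1 = weakly not taken, 0 = strongly not taken.
    misses = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            misses += 1
        # Saturating update: step towards the actual outcome, clamp to 0..3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses

# A branch that was always taken suddenly stops being taken.
print(count_mispredictions([False] * 5))
```

Starting from "strongly taken", the first two not-taken outcomes are both mispredicted; only from the third on does the prediction follow the new behaviour.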

Quote:

Originally Posted by meynaf

That's not something the average coder will do and if that "general support" doesn't have the hint bit already, you can't count on GCC people to add it.

That "general support" does have support for a branch hint bit. My evaluation ISA hint bit works the same way as the PPC hint bit, which can still be set for those processors which support it. Of course you need a standard and real processors supporting it before the GCC folks would add support. The average coder doesn't use profiling, but it is a high-level optimizing tool which most compilers have and it is relatively easy to use. More programmers should use it, as it can optimize inlining, loops, branches (as much as possible for the particular CPU), code relativity, cache efficiency, etc. The results depend on how much can be done for that particular CPU and how easy the support is. I want to give compilers the tools they need rather than just complaining about the bloat they produce.

Quote:

Originally Posted by meynaf

So you don't see the relative branch with bit #0 special vs d16(pc) inconsistency as dirty? Personally I do.
In addition I don't differentiate ugly and dirty. For me if it's ugly, it's dirty.

No. I don't see the hint bit as dirty. I don't think it slows anything down and it is normally not used. Some people may consider it ugly, but they don't have to set it, and then it is the same as not having it. There are better ways to trap since the 68020 than setting an odd PC.
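For anyone following along, this is roughly what "bit #0 special" means for a 16-bit branch displacement. 68k instructions are word aligned, so bit 0 of a branch displacement is otherwise always zero. The sketch below only shows the idea: the function name is made up, and the actual polarity and meaning of the bit in the evaluation ISA are not spelled out here, so treat those as assumptions.

```python
def decode_disp16(raw):
    # Sign-extend the 16-bit displacement field.
    raw &= 0xFFFF
    signed = raw - 0x10000 if raw & 0x8000 else raw
    # Bit 0 is free because targets are word aligned; use it as the hint.
    # (What a set bit means, e.g. "flip the static prediction", is assumed.)
    hint = bool(signed & 1)
    # Clear bit 0 to get the real, even displacement.
    return signed & ~1, hint

print(decode_disp16(0x0010))
print(decode_disp16(0x0011))
```

With the bit clear the displacement decodes exactly as it does today, which is the "same as not having it" case.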

Quote:

Originally Posted by meynaf

Anything that adds special cases is to be avoided if possible.

Everything adds special cases. It is important to avoid the big tables/muxes and decoding info not in the first word. This is what I consider dirty.

Quote:

Originally Posted by meynaf

By having SELcc move the condition field in an unusual position you create a special case.

I should rework the SELcc encoding after I changed BScc. It is low priority as nobody is likely to use the ISA or do anything with it anyway.

Quote:

Originally Posted by meynaf

By adding the hint bit you change the way PC-relative displacements are interpreted but not always.
By adding an addressing mode for short displacements you add a mode that's only valid in a few cases (I see the regular immediate addr mode as a bad choice as well).

So do you consider the EA #data immediate ugly, and therefore dirty, since you see no difference between the two? Does that make the whole 68k dirty then? If the original 68k was already dirty, then I can't ruin it by being dirty?