22 June 2017, 10:43   #159
Registered User

Join Date: Jun 2015
Location: Germany
Posts: 450
Originally Posted by matthey
IBM did introduce integer SIMD data into the FPU register file but integer is faster than floating point processing so nothing is slowed.
Now that you acknowledge that POWER8 mixes scalar and vector integer processing with floating point processing just fine at a 5 GHz clock frequency, let's get back to what you criticized in Gunnar's ISA decisions:
Originally Posted by matthey
Apollo ISA: 64 bit integer unit registers are shared with 64 bit SIMD unit registers in the ISA
Result: 128, 256, or 512 bit SIMD unit registers would give 128, 256 or 512 bit wide integer unit registers slowing down a critical path
Result: floating point in the SIMD unit would give floating point in the integer unit registers slowing down a critical path
Obviously you were wrong:

ad "result" 1: POWER8 has 128 bit wide SIMD registers combined with scalar integer processing running at 5 GHz. A 5 GHz clock does not look "slowed down" by "a critical path". If you were right, we could expect 64 bit registers (i.e. narrower than 128 bit) to run even faster than 5 GHz. Fine by me.

ad "result" 2: POWER8 has floating point mixed with integer operations working on the same register file running at 5 GHz. 5 GHz is not "slowed down".
Originally Posted by matthey
Introducing slow floating point into the fast integer units can slow the integer unit processing of the most important and time sensitive part of the CPU.
Now this is really becoming ridiculous. So IBM introducing integer into floating point is no problem (see above) but introducing floating point into integer is? How is this even possible if in both cases we end up with a mixed scalar/vector integer/floating point unit? How can your claim be right if this mixed unit runs at 5 GHz in a real-world processor? Are you really saying that it matters whether FP or integer was in the processing unit first, before the other got added in the next processor generation?
Originally Posted by matthey
More ports may be required for the register file potentially slowing it.
Is it possible that each time you write "potentially" (which you do a lot) you just don't know for sure what you are writing about? More ports are only required if you want to increase superscalarity, not if you want to add processing units (which would mean a higher fan-out, if you know what that is). BTW, in the case of the unified register file we find in both Apollo and POWER8 there is no penalty with regard to superscalarity because the legacy units work on respective halves of the unified register file. Hence, your statement about requiring additional ports is nonsense.
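As a rough illustration of the port argument, here is a first-order sketch in Python. It assumes the textbook rule of thumb that every instruction issued per cycle needs its own source reads and its own result write; the function name and the default operand counts are made up for illustration and are not figures for Apollo or POWER8.

```python
def register_file_ports(issue_width, sources_per_op=2, results_per_op=1):
    """First-order port estimate (assumption: one port per operand per issue slot).

    The number of execution units is deliberately not a parameter:
    in this simple model, units that share the same issue slots
    also share the same register file ports.
    """
    read_ports = issue_width * sources_per_op
    write_ports = issue_width * results_per_op
    return read_ports, write_ports


# Going from 2-wide to 4-wide issue doubles the ports in this model;
# adding another unit behind the same issue slots changes nothing.
print(register_file_ports(2))
print(register_file_ports(4))
```

In this simplified model, extra units behind the same issue slots add consumers on existing ports (fan-out), not new ports. Real designs have more variables than this.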
Originally Posted by matthey
Enlarging the integer unit register file for more SIMD registers is also potentially slower.
Which would be why? Please explain the details. Ever noticed how registers increase both in size and number with the ever growing number of transistors per chip? We are talking about 21st century chip technology, not about things that were a factor for processors designed in the 1980s.
Originally Posted by matthey
These are poor choices for energy efficiency if ever moving to an ASIC as well.
You are arbitrarily picking whatever suits your views and preferences out of four decades of processor development. Now you come up with how larger registers are bad for power efficiency. That was true when you could still count the gates in the processor. Now the registers and the ALU are a minor fraction of the transistor count. This line of thinking is just like when you criticized RiVA for a lack of optimisation in the initialisation code where one or two cycles were wasted. Of course, with today's technology you could build a 6510 running on µW of power. But except for cardiac pacemakers hardly anybody needs those.

Let's be precise here: you are not against wide SIMD registers in addition to 32 bit integer registers; you are saying that having 64 bit registers act as both SIMD and integer registers would be less power efficient. Please explain.
Originally Posted by matthey
You talk about education but what good is it when you deny what you have learned and the obvious truth? At that point, the only degree you deserve is a degree in propaganda, deception and brainwashing which is good for nothing productive. I hope there are enough critical thinkers here to sort out what degree you deserve.
I guess it is obvious that you were no help to the project because of your unfounded objections and your constant refusal to accept that you are wrong. Since Gunnar won't listen to you any more, you are now telling the entire world.

Your lack of technical expertise shows in that you isolate facts and then put them into the wrong context. Let's make up an example (nothing you actually stated): link registers vs. implicitly pushing return addresses onto the stack. In the 1980s implicit pushing was a practical feature but resulted in slow subroutine call/return times. Then RISC came up with the link register to solve this problem. This was generally considered a good idea for some years. Now link registers are a burden because we have the resources to implement link stacks that completely hide the penalties of old and are ISA-transparent. The link stack can even hold predecoding info for the instructions to which the processor will return on subroutine exit, and more, without the programmer having to know about it. However, one could now claim that implicit pushing of return addresses onto the stack is a burden and even cite textbooks from the 80s for that statement. This is how most of your arguments are constructed (e.g. your reference to the Motorola 88k line of processors).
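To make the link stack idea concrete, here is a minimal sketch in Python of a return-address predictor. The class and function names, the default depth of 8 and the trace format are all made up for illustration; this is a toy model and does not describe any particular processor.

```python
class LinkStack:
    """Toy return-address predictor (hypothetical model, not a real CPU)."""

    def __init__(self, depth=8):
        self.depth = depth
        self.entries = []

    def on_call(self, return_address):
        # The hardware records the return address when it sees a call.
        if len(self.entries) == self.depth:
            self.entries.pop(0)  # when full, the oldest entry is lost
        self.entries.append(return_address)

    def predict_return(self):
        # On a return, the front end can start fetching from this address
        # before the architectural return address has been read from memory.
        return self.entries.pop() if self.entries else None


def run(trace, depth=8):
    """Count correctly predicted returns in a trace of ("call"|"ret", address)."""
    stack = LinkStack(depth)
    hits = total = 0
    for kind, addr in trace:
        if kind == "call":
            stack.on_call(addr)
        else:
            total += 1
            hits += stack.predict_return() == addr
    return hits, total


# Two nested calls followed by their returns (addresses are arbitrary).
trace = [("call", 0x100), ("call", 0x200), ("ret", 0x200), ("ret", 0x100)]
hits, total = run(trace)
print(f"{hits} of {total} returns predicted")
```

Nothing in the trace depends on how the ISA stores the return address, which is the ISA-transparent part: the predictor only watches calls and returns go by.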

It's so tiring.

Last edited by grond; 22 June 2017 at 11:06.