22 March 2017, 18:08   #113
matthey
Banned
 
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
Quote:
Originally Posted by Hedeon
Doesn't the number of pipelines also matter? I thought the G3 and G4 had more than the 603/604. Then it would help to schedule abcabcabc instead of ababab (3 vs 2 pipelines in this example). Information on the wiki differs from other sources (4 vs 5 stages for the 603, 1 vs 2 integer execution units, etc.), so I'm not sure what the actual numbers are.
The 604(e) could issue 4 and complete 6 instructions every cycle using 6 execution units (2x simple integer, 1x complex integer, 1x load/store, branch, FPU). This was the high-end PPC for workstations and servers, and it spared no expense on large L1 caches either. The G3 could issue 2 and complete 6 instructions using 6 execution units (2x simple integer, 1x complex integer, 1x load/store, branch, FPU). Its shallower 4-stage pipeline is borrowed from the 603(e), which saves transistors for an L2 cache (PPC needs huge caches for good performance) but also limits clock speeds.

The G3 is a hybrid of the 603 and 604 designs. While a 604e of similar clock speed and die size might outperform the G3 in a few benchmarks, the G3 is the more efficient design, with resources where they are needed (604e: 64kB L1, 1MB external L2, 5.1 million transistors; G3: 64kB L1, 256kB-1MB L2 (first external, later integrated), 6.35 million transistors). The G4 design is practically a G3 plus a SIMD unit.

When the G3 came out, it looked like PPC might be able to outperform x86, until they tried to clock up this shallow-pipeline design. More G3 cores could have been added for SMP, but single-core performance (what games need) would suffer, and the already large caches would grow even more (due to poor code density). The '90s G3 design was relegated to energy-efficient embedded uses and to what is left of "modern" PPC processors from Freescale/NXP/Qualcomm. The G5 was an aggressive new high-performance PPC design based on the 64-bit IBM POWER4, but it turned out to be horribly inefficient.

I expect a G3/G4 would perform much better with 604 instruction scheduling than with 603 instruction scheduling. If only one executable is released for 603/604/G3/G4, it may even be desirable to schedule for the 604. Adding G3/G4 support, with latencies for the CIU (complex integer unit), FPU and SIMD unit (G4 only), probably wouldn't be difficult, but it would likely only make a difference for code that uses these units frequently, and the code would have to be compiled for that specific processor (multiple executables). You could always e-mail Frank Wille and ask about G3/G4 support in vsc, but he is on this forum and may see posts here.
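Hedeon's abcabc-versus-ababab question and the 603-versus-604 scheduling point both come down to keeping pipelined units busy while dependent results are still in flight. Here is a toy in-order issue model in Python to illustrate that. It is a sketch under stated assumptions, not how vbcc's vsc or any real PPC core works: the 2-wide issue, the single fully pipelined FPU and its 3-cycle latency are hypothetical numbers, and `Instr`, `simulate` and `chain` are illustrative names. It runs three independent chains of three dependent FPU ops, once grouped (aaabbbccc) and once interleaved (abcabcabc).

```python
from collections import namedtuple

# Toy instruction: the unit type that executes it, the register it writes,
# and the registers it reads. Names here are hypothetical, for illustration.
Instr = namedtuple("Instr", ["name", "unit", "dest", "srcs"])


def simulate(program, issue_width, units, latency):
    """Return the cycle at which the last result becomes available.

    Toy in-order model (assumptions, not a real PPC core):
    - up to `issue_width` instructions issue per cycle, strictly in order;
    - an instruction issues only if its sources are ready and a unit of its
      type is still free this cycle (units[type] = how many exist);
    - every unit is fully pipelined: one new instruction per unit per cycle;
    - a result is ready latency[type] cycles after issue.
    """
    ready = {}   # register -> cycle its value becomes available
    pc = 0       # index of the next instruction waiting to issue
    cycle = 0
    finish = 0
    while pc < len(program):
        used = {u: 0 for u in units}  # units claimed in this cycle
        issued = 0
        while pc < len(program) and issued < issue_width:
            ins = program[pc]
            operands_ready = all(ready.get(r, 0) <= cycle for r in ins.srcs)
            unit_free = used[ins.unit] < units[ins.unit]
            if not (operands_ready and unit_free):
                break  # in-order: a stalled instruction blocks those behind it
            used[ins.unit] += 1
            ready[ins.dest] = cycle + latency[ins.unit]
            finish = max(finish, ready[ins.dest])
            issued += 1
            pc += 1
        cycle += 1
    return finish


def chain(tag, length):
    """`length` dependent FPU ops; each reads the previous op's result."""
    ops = []
    for i in range(length):
        srcs = [f"f{tag}{i - 1}"] if i > 0 else []
        ops.append(Instr(f"{tag}{i + 1}", "fpu", f"f{tag}{i}", srcs))
    return ops


# Hypothetical machine: 2-wide in-order issue, one pipelined FPU, 3-cycle latency.
UNITS = {"fpu": 1}
LATENCY = {"fpu": 3}

a, b, c = chain("a", 3), chain("b", 3), chain("c", 3)
grouped = a + b + c                                          # aaabbbccc
interleaved = [op for trio in zip(a, b, c) for op in trio]   # abcabcabc

print(simulate(grouped, 2, UNITS, LATENCY))      # 23
print(simulate(interleaved, 2, UNITS, LATENCY))  # 11
```

In this model the interleaved order finishes in 11 cycles versus 23 for the grouped one, because each chain's 3-cycle latency is hidden behind the other two chains. With only two chains (ababab) there would still be idle FPU cycles at this latency, which is the intuition behind wanting more independent work in flight; a real scheduler would use the target core's actual unit counts and latencies.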

Quote:
Originally Posted by Hedeon
They used gcc. I'm not sure how the old gcc compares to the new vbcc; you know this better ;-) I think I compiled your BlitzQuake sources for a 750, but since 400MHz and 500MHz don't show much difference in fps in this game, the bottleneck is somewhere else.
There is a new version of vbcc on the way. It is mostly bug fixes, but some of those fixes may improve performance (one of them does improve performance for the 68k). The vbcc PPC backend is much better than the 68k backend.

Last edited by matthey; 22 March 2017 at 21:55.