English Amiga Board


Old Yesterday, 12:09   #161
AestheticDebris
Registered User
 
Join Date: May 2023
Location: Norwich
Posts: 502
Quote:
Originally Posted by minator View Post
CISC instruction sets are more complex to decode and that limits performance. Intel and AMD have got around this by pushing the clock speed right up. RISC being predictable and simpler to decode can run more instructions per cycle at a lower clock rate.
That's what people thought in the 80s and early 90s, but it was largely predicated on the assumption that memory speed would increase in line with CPU speed. Of course that turned out not to happen, and the supposed benefits of RISC never really materialised.

Instead it became quicker and easier to just read CISC-like instructions from RAM, convert them to RISC-like microcode internally and run them at much higher speeds there. Cutting down the data you have to transfer across the slow memory bus (plus the higher cache density of storing CISC instructions) outweighs the decode complexity.

Intel really lucked out in that their attempts to replace x86 with just about anything else failed because no customer was willing to do the work of migrating existing code. If Motorola had met similar resistance and been forced to stick with 68k, they might have actually done better in the long run.
AestheticDebris is offline  
Old Yesterday, 14:40   #162
Promilus
Registered User
 
Join Date: Sep 2013
Location: Poland
Posts: 885
@AestheticDebris - sure, but...
1. Current-gen x86 still has an absurd amount of (L3) cache, larger than the total memory of machines three decades ago. So you could basically run a late-90s application entirely out of cache.
2. All high-performance x86 designs introduce an L0 cache or uop trace cache. It's relatively small, but it holds already-translated instructions, so if your entire loop fits there it executes much faster. That's essentially how a RISC built on current x86 execution units would run, and it's also why Apple was able to match general x86 performance with their ARM implementation.
3. Of course a 64-bit 68k with a fully superscalar, pipelined architecture and vector (SIMD) instructions would do fairly fine, better still if it had actually kept evolving under competition. Intel didn't bring out the Core microarchitecture out of the goodness of their hearts; they had to, because Itanium sucked and AMD made their own x86 extension which the market welcomed warmly. Motorola did nothing to allow third parties onto their turf... Hence a Motorola-exclusive 68k would most likely have failed in the long run anyway.
Promilus is offline  
Old Yesterday, 14:41   #163
Samurai_Crow
Total Chaos forever!
 
Samurai_Crow's Avatar
 
Join Date: Aug 2007
Location: Waterville, MN, USA
Age: 49
Posts: 2,218
Caches don't automatically get bigger to make up for poor code density. Address pins don't automatically exceed 32 bits just because memory sizes grow. Even embedded controllers use more than 32 bits nowadays.
Samurai_Crow is offline  
Old Yesterday, 17:01   #164
minator
Registered User
 
Join Date: Jul 2024
Location: France
Posts: 23
Quote:
Originally Posted by AestheticDebris View Post
That's what people thought in the 80s and early 90s, but it was largely predicated on the assumption that memory speed would increase in line with CPU speed. Of course that turned out not to happen, and the supposed benefits of RISC never really materialised.
But they did: the high-end RISCs (Alpha, PA-RISC) were both soundly beating x86, until their vendors were foolish enough to go with Itanium.

The simpler decode advantage still holds to this day - it's how Apple was able to make the M1 so fast and put it in a laptop. They can decode a whole row of instructions at once because the instruction stream is simple and predictable. x86 has a hard time doing that.

If CISC is such an advantage, why does nobody other than x86 use it anymore?
Arm used a memory-saving instruction set in Thumb mode, but it was removed in Arm64. If it was such an advantage, why remove it?

Quote:
Originally Posted by AestheticDebris View Post
Intel really lucked out in that their attempts to replace x86 with just about anything else failed because no customer was willing to do the work of migrating existing code. If Motorola had met similar resistance and been forced to stick with 68k, they might have actually done better in the long run.

They'd have evolved it for sure, but I suspect they'd have done things like producing a simpler version, much as x86-64 did for x86, or what they actually did do with ColdFire.
minator is offline  
Old Yesterday, 17:26   #165
AestheticDebris
Registered User
 
Join Date: May 2023
Location: Norwich
Posts: 502
@Promilus

1) Well, yeah, obviously. Modern machines have enormous amounts of RAM compared to 30-year-old machines, so they cache enormous amounts of data in comparison. But RISC architectures would need even more caching, because they need more instructions and instruction throughput is critical.

2) Modern ARM is full of decidedly un-RISC things, because like everyone else, they've seen that RISC doesn't really deliver the benefit it was envisioned to. It's why they have SIMD instructions and other "combined" instructions like FMA that do more than one thing at once, which is literally the opposite of the RISC design philosophy.
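For illustration, a quick C sketch of such a "combined" operation: C99's fma() computes a*b + c in one step, and on hardware with FMA support (and the right compiler settings) it maps to a single instruction.

Code:
#include <math.h>
#include <stdio.h>

int main(void)
{
    double a = 1.5, b = 2.0, c = 0.25;

    /* One "combined" operation: a*b + c with a single rounding.
     * On CPUs with FMA support this can compile to one instruction
     * (e.g. fmadd on AArch64), i.e. several steps folded into one. */
    printf("%f\n", fma(a, b, c));
    return 0;
}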

The whole RISC/CISC argument died a long time ago as both sides borrowed the best bits of the other. The thing everyone cares about these days is really power efficiency and ARM mostly is at an advantage because x86 carries a lot more legacy baggage (which Intel is very keen to ditch).

3) I'm not saying they wouldn't have failed for other reasons, just that abandoning their CISC architecture and going all in on a totally different RISC design might not have been necessary in the long run. The fact Apple was one of their biggest customers and pushing hard for a RISC design (because like everyone else they thought it was the future) probably made ditching 68k seem like a good option at the time and maybe it was on those grounds alone.
AestheticDebris is offline  
Old Yesterday, 19:32   #166
Photon
Moderator
 
Photon's Avatar
 
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,747
My take on this is that the Amiga thrives in the magical space of the emerging workstation. <3
  • We love Amiga because we love the platform = hardware + software.
  • Replace the hardware or software with something very different, and it's not the platform we love anymore.
  • What makes stuff for it awesome is that the stuff works within the platform limitations. Make it too fast and hires, and it's not impressive anymore - just another PC.

Conclusion:
  1. Per above we can point to this demo, game, modeler/renderer package, titling/FX software, paint program or word processor/DTP/database and say it punches up - just like the Amiga did.
  2. So you can run it faster with a faster CPU, yes, so what? That's... how it works.
  3. Anyone can stay on Motorola, and if fantasy speeds and $0 gfx cards are desired, WinUAE is much more compatible than HW emu, and much faster.

I think FPGAs and RPis are extremely useful, e.g. for providing modern and fast I/O interfaces and peripherals like RGB2HDMI, and they're much better than a microcontroller for superfast big flash cards, Ethernet, etc etc. The sky really is the limit and imagination can take us anywhere!

This is what I'd like to see them used for.

I include Zorro-compatible gfx cards, GPUs and sound cards in this, although I think with too many expansions (even back-in-the-day cards), it again becomes just another PC (a pile of cards, not a platform).

(There is something about having a platform to turn to that is different from the 'devices' we use every day. I include a web browser, notifications, updates, needless software change (degradation away from power usage) etc in this. The usage is different. The computer is there. You have power. What do you want to do today?)

E.g. I'm not an Amiga OS4 aficionado, despite all their stellar work(!), because I want something to love that's different from everything else, and oldskool Amigas fill that space. I can turn to it, there are no distractions, and it's a very welcome change from 'devices'. The same goes for many other platforms that have been left alone so you can love them, e.g. 8-bit machines, the Archimedes and much more.

And it seems these PiStorms, Vampires, and sandboxed Amiberry boxes by any other name are doing the same as OS4/PPC - but starting over on their own with different platforms, with a worse result, and without getting all the work put into OS4.

There's also AROS, which is sort of 'doing a NeXTSTEP' (but seemingly better) and would then be able to use the full power of the PC CPU (is >4GB RAM possible?). But again, it wouldn't be the same, so the same applies as for OS4.

~

My dream Amiga would be one with every custom chip remade in modern components (and finished, not WIP), remade with modesty and taste, and the same for the 68060 - but since producing a CPU takes a hundred man-years, that will never be finished, so I don't want anything beyond the 68060.

I will never need higher resolution than the highest progressive mode in AGA, but the sound and Blitter were never improved, so I would love it if we all agreed on a simple (performant, DMA-driven) sample sound card and an exact, finished, 110+ MHz Blitter with 8MB of chip RAM; that should be much more feasible than a new Motorola CPU. So would a batched Akiko. It's extremely simple in design: set the source and destination registers, trigger by writing the size, just like the Blitter, done.
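As a sketch of how simple such a batched copy engine's programming model could be (all addresses and register names here are hypothetical, purely to illustrate the set-src/dst-then-write-size idea):

Code:
#include <stdint.h>

/* Hypothetical register block for a batched Akiko-style copy engine.
 * None of these addresses or offsets are real; they only illustrate
 * the "set src & dst, trigger by writing the size" idea. */
#define COPY_BASE   0xB80100u            /* hypothetical base address  */
#define COPY_SRC    (*(volatile uint32_t *)(COPY_BASE + 0x00))
#define COPY_DST    (*(volatile uint32_t *)(COPY_BASE + 0x04))
#define COPY_SIZE   (*(volatile uint32_t *)(COPY_BASE + 0x08))  /* write = go */

void batch_copy(uint32_t src, uint32_t dst, uint32_t bytes)
{
    COPY_SRC  = src;
    COPY_DST  = dst;
    COPY_SIZE = bytes;   /* writing the size starts the DMA, like the Blitter */
}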

^ This kind of Amiga would follow Jay Miner's Vision much more closely. <3
Photon is offline  
Old Yesterday, 19:59   #167
Gorf
Registered User
 
Gorf's Avatar
 
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,488
Quote:
Originally Posted by Promilus View Post
Why are you mixing up code density with performance?
Because memory throughput still relates to performance.
Quote:
Was code density of MIPS better than 68000? No. Was MIPS faster than 68000? Yes.
Sure: it had 32-bit (later 64-bit) wide memory access and was pipelined.
Later 68k iterations also got pipelined and gained wider buses and caches,
reducing the MIPS advantage to higher clock speeds only.
In a cycle-by-cycle comparison, MIPS had no speed advantage over a contemporary 040 or 060 - just the opposite.

Emu68 can translate 68k code to aarch64 at a ratio of 2:1, meaning two ARM instructions per 68k instruction, in the very best scenarios.

So these are not only twice as many instructions; each of them is also longer than a typical 68k instruction.

Quote:
AC68080 did try to get 3 pipelines working but couldn't get proper performance benefits. No wonder, that's tough job which took many big companies decades, and then next decade to add 4th, next decade to add 5th execution pipeline. So where in that is G4e and where is AC68080? Well, it isn't so good for vampire.
The 68080 is nowhere near real silicon, and I explicitly excluded its current FPGA implementation from being useful here.

I only gave a speed reference - a new machine would need something that is 10x faster than any current 68080 implementation as a minimum.

I am not against RISC CPUs at all. Whatever works.
Gorf is offline  
Old Yesterday, 21:34   #168
Promilus
Registered User
 
Join Date: Sep 2013
Location: Poland
Posts: 885
Quote:
Originally Posted by Gorf View Post
Because memory throughput still relates to performance.

Emu68 can translate 68k code to aarch64 at a ratio of 2:1, meaning two ARM instructions per 68k instruction, in the very best scenarios.
And do you know what the ratio is for x86, or 68k for that matter, emulating e.g. PPC?
Yes, dynamic recompilation is often nearly as fast as running native code. But... if you want to emulate the correct behaviour of the CPU (i.e. all the CCR flags etc.), then there's a penalty.
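To show roughly where that penalty comes from, here is a hypothetical C sketch (not how any real emulator necessarily does it): flag-exact emulation of a single 68k ADD.B turns one guest instruction into an add plus several extra host operations just to rebuild the CCR.

Code:
#include <stdint.h>

/* CCR bit positions in the 68k layout: X N Z V C. */
enum { CCR_C = 1 << 0, CCR_V = 1 << 1, CCR_Z = 1 << 2, CCR_N = 1 << 3, CCR_X = 1 << 4 };

/* Flag-exact emulation of ADD.B src,dst. */
uint8_t add_b(uint8_t dst, uint8_t src, uint8_t *ccr)
{
    uint8_t res = dst + src;
    uint8_t flags = 0;

    if ((uint16_t)dst + (uint16_t)src > 0xFF) flags |= CCR_C | CCR_X; /* carry + extend */
    if (((dst ^ res) & (src ^ res)) & 0x80)   flags |= CCR_V;         /* signed overflow */
    if (res == 0)                             flags |= CCR_Z;         /* zero */
    if (res & 0x80)                           flags |= CCR_N;         /* negative */

    *ccr = flags;
    return res;
}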

Quote:
So these are not only twice as many instructions; each of them is also longer than a typical 68k instruction.
Doesn't matter; you cannot fetch and decode more than the frontend allows, and the very best RISC can still fetch and decode more instructions than the best CISC. My point is... with dynamic recompilation and all the tools we have, what is the actual benefit of a "super duper 68k"? With all that out-of-order, cache and whatever magic, there's really no reason to do things in assembly any more, and with a high-level language it doesn't really matter what you are writing for.


Quote:
I am not against RISC CPUs at all. Whatever works.
While there are pretty powerful CISC designs today, there's one issue with them: power consumption. All that energy goes into the magic circuitry inside the front end that supports the legacy CISC ISA, and I do not think it is actually necessary at this point. I do not feel RISC-V is the way to go either, and obviously paying ARM is out of the question as well, so there's no actual push to drop x86 support. Maybe when there are affordable and popular mainboards for RISC-V or ARM processors matching a modern PC in terms of features and performance... but it seems we could wait a long time for that to happen. And as for a super duper 68k in the 1GHz range: believe me, it would not change a thing. With PiStorm you are already in that range, but it just won't do much more good than it already has. And if we're talking about a hypothetical NG machine, I do not care at all about some expensive 1GHz piece of hardware barely adequate for the most basic web stuff.
Promilus is offline  
Old Today, 20:08   #169
minator
Registered User
 
Join Date: Jul 2024
Location: France
Posts: 23
Quote:
Originally Posted by Gorf View Post
Because memory throughput still relates to performance.


Sure: it had 32-bit (later 64-bit) wide memory access and was pipelined.
Later 68k iterations also got pipelined and gained wider buses and caches,
reducing the MIPS advantage to higher clock speeds only.
In the 80s and 90s, performance was all about clock speed; performance per clock didn't matter. When the 68040 came out at 40MHz, MIPS was at 100MHz and the Alpha was at 200MHz.

Quote:
In a cycle-by-cycle comparison, MIPS had no speed advantage over a contemporary 040 or 060 - just the opposite.
I very much doubt that. The Alpha was dual-issue, so potentially twice the performance per clock of an 040.

When the dual-issue 060 came out, MIPS had a quad-issue processor at 90MHz. The Alpha was still dual-issue, but it was at 300MHz, and 500MHz the following year.


Quote:
Emu68 can translate 68k code to aarch64 at a ratio of 2:1, meaning two ARM instructions per 68k instruction, in the very best scenarios.

So these are not only twice as many instructions; each of them is also longer than a typical 68k instruction.
There is an advantage to having smaller instructions when you are reading them from RAM, because it requires less bandwidth. However, once they are in cache, that advantage evaporates, because all you need to do is increase cache bandwidth to allow for bigger instructions.

Modern ARM decoders can issue many instructions per cycle. They have caches running into the hundreds of gigabytes per second to feed this.

Quote:
Originally Posted by Gorf View Post
I only gave a speed reference - a new machine would need something that is 10x faster than any current 68080 implementation as a minimum.

You set the bar far too low. At 10x faster the 080 can issue 3.2 billion instructions per second. An Apple M3 can issue 32 billion.
minator is offline  
Old Today, 20:21   #170
mschulz
Registered User
 
Join Date: Nov 2018
Location: Germany
Posts: 119
Quote:
Originally Posted by Gorf View Post
Emu68 can translate 68k code to aarch64 at a ratio of 2:1, meaning two ARM instructions per 68k instruction, in the very best scenarios.
Very best real world scenario can go much higher than your predicted 2:1… RC5-72 calculations go as high as 1:1 (or very slightly less than this)…
mschulz is offline  
Old Today, 20:39   #171
Gorf
Registered User
 
Gorf's Avatar
 
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,488
Quote:
Originally Posted by mschulz View Post
Very best real world scenario can go much higher than your predicted 2:1… RC5-72 calculations go as high as 1:1 (or very slightly less than this)…
Wow - that is of course a great accomplishment.
But also rather an edge-case?

What are your real world scenarios on average and worst case vs. best case?

How much bigger is a typical translated aarch64-code section in comparison to the original 68k-section?
Gorf is offline  
Old Today, 21:00   #172
Gorf
Registered User
 
Gorf's Avatar
 
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,488
Quote:
Originally Posted by minator View Post
You set the bar far too low. At 10x faster the 080 can issue 3.2 billion instructions per second. An Apple M3 can issue 32 billion.
Again, "instructions per second" might be misleading.

What performance level would be your minimum?
Do we need Apple M3 power?
Gorf is offline  
Old Today, 21:14   #173
mschulz
Registered User
 
Join Date: Nov 2018
Location: Germany
Posts: 119
Quote:
Originally Posted by Gorf View Post
Wow - that is of course a great accomplishment.
But also rather an edge-case?

What are your real world scenarios on average and worst case vs. best case?

How much bigger is a typical translated aarch64-code section in comparison to the original 68k-section?
I do not have that amount of stats - the size of the debug output, if I wanted to analyse it, is beyond me - a Workbench boot alone can generate hundreds of megabytes of log, if enabled.

The best case is simple: a 32-bit endianness conversion (ror + swap + ror). Emu68 can detect it and replace the three m68k opcodes with a single aarch64 opcode, so the best (though not common) ratio is 1:3 (1 ARM vs 3 m68k). The second-best case is moves to/from memory, which can eventually be merged, resulting in a 1:2 ratio (1 ARM vs 2 m68k). The worst case is all supervisor instructions, since they need to check the supervisor bit and possibly throw an exception; there you can easily end up with a 50:1 ratio (50 ARM vs 1 m68k).
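For anyone curious what that best case looks like, a small C sketch of the 32-bit byte swap in question; the instruction sequences in the comments are the classic 68k idiom and its single-instruction aarch64 equivalent.

Code:
#include <stdint.h>

/* 32-bit endianness conversion, as described above.
 * The classic 68k idiom is three opcodes:
 *     ror.w  #8,d0
 *     swap   d0
 *     ror.w  #8,d0
 * while aarch64 does the same in a single instruction:
 *     rev    w0, w0
 * which is why the three m68k opcodes can collapse into one. */
uint32_t swap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u) | (x << 24);
}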

Regarding code size, you cannot really compare that. Emu68 translates code starting from each used entry point up to the place where the translation needs to break, which means that even for a small portion of code there can be dozens of translated units. The good thing, however, is that I have almost never managed to fill the JIT cache (64MB) to the point where Emu68 needs to recover memory by flushing JIT translations.
mschulz is offline  
 

