22 May 2018, 13:47 | #501 |
Registered User
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,414
|
|
22 May 2018, 14:56 | #502 | ||||||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,350
|
Quote:
So you can hide it and pretend there is no emulation - no one will be able to check. Quote:
Quote:
Hmm. I'm afraid getting a cheap peecee would be better already. Ok. See below. Quote:
Quote:
Quote:
But from a programmer's point of view, I have quite a clear idea. Going from the 68000 to the 68020 brought extra programming flexibility. But from the 68020 to the 68040, nothing came. This may have contributed to the downfall of the family. If a coder wants speed, he just takes WinUAE. Coders like achieving something big out of something little. So to give them the toy they want to play with, my idea is to give them a CPU that is even friendlier to code on than the actual 68k, and not to fall into the performance-driven design trap. |
||||||
22 May 2018, 15:05 | #503 | |||||||
Registered User
Join Date: May 2014
Location: inside the emulator
Posts: 377
|
Quote:
Not even the old Intel Pentium Pro emulated anything; it implemented the x86 ISA using very RISC-like internal operations. Quote:
Then the FPGA has to communicate with the host via the normal interfaces: memory and interrupt signals. If the host uses the FPGA through an Akiko-type interface, that is, the host writes the bytes to be translated to a memory-mapped area and reads back the result, synchronization is easy. For instance, one could probably stall the read of the translated code until the FPGA has finished its work. But that would be very inefficient. So a more reasonable interface is to let the host direct the FPGA to a block of code to be translated, with the target buffer being either implicit or explicit. The translation hardware then works until some limit is reached, producing a block of code. Synchronization can be done either by polling the hardware until it signals completion or by the host getting an interrupt from the FPGA when it is done. Then comes the problem of branch address translation. Unlike naive code translation, this isn't a mechanical process: 68k branch addresses have to be looked up, and if translated code for that address is found, its address is inserted. If it isn't translated yet, one can imagine the FPGA going down that path to translate the new block of code, but that isn't realistic for several reasons, path explosion being the obvious one. A software JIT can quickly switch between executing native code and interpreting 68k code. If we remove the interpreter, the host has to point the FPGA to the code block to be executed, wait until translation is done and then start executing again. I think the overheads would be huge. Quote:
The host processor has a highly optimized cache subsystem; what exactly would the FPGA be able to do faster? Quote:
Quote:
Quote:
Quote:
32, 64, 96 bit instructions.
Load-operate instructions, perhaps operate-store instructions.
Auto-increment and decrement address modes.
Immediate values of at least 8, 32, 64 bits.
Condition codes but stored per register. 32 registers -> 32 carries, overflows etc.
Only 64 bit operations internally, loads can zero/sign extend from byte, w, l.
Hardware division.
Perhaps MOVEM type instructions.
Hardware supported translation of 68k instructions.
Perhaps hardware CAM (type of lookup table) for accelerating address translation.
This would make translation of 68k instructions trivial and be easy to code for. |
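The branch address lookup discussed in this post can be pictured as a table from 68k addresses to translated blocks, with a fallback when the target has not been translated yet. A minimal sketch in Python, assuming a plain dict stands in for the hardware CAM; `TranslationCache`, `resolve_branch` and the example addresses are hypothetical names and values, not anything from an existing emulator:

```python
from typing import Dict, Optional


class TranslationCache:
    """Maps 68k branch targets to addresses of translated host code.

    A dict stands in for the CAM; all names here are illustrative.
    """

    def __init__(self) -> None:
        self._table: Dict[int, int] = {}
        self.hits = 0
        self.misses = 0

    def insert(self, m68k_addr: int, host_addr: int) -> None:
        # Record that the block starting at m68k_addr has been translated.
        self._table[m68k_addr] = host_addr

    def lookup(self, m68k_addr: int) -> Optional[int]:
        # Return the host address, or None if the target is untranslated.
        host_addr = self._table.get(m68k_addr)
        if host_addr is None:
            self.misses += 1
        else:
            self.hits += 1
        return host_addr


def resolve_branch(cache: TranslationCache, target: int) -> str:
    """Decide what to do at a branch: jump to native code or fall back."""
    host_addr = cache.lookup(target)
    if host_addr is None:
        # No translated code yet: a software JIT would interpret here or
        # request translation of the new block.
        return "fallback"
    return f"jump 0x{host_addr:08x}"
```

The miss path is where the overhead argued about above shows up: without an interpreter, every miss means waiting for a translation round trip before execution can continue.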
|||||||
22 May 2018, 15:28 | #504 | ||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,350
|
Quote:
Quote:
Beware, too, of 64-bit immediates. A simple move of a full-sized data value to a linear address would be a monster instruction of at least 18 bytes! |
||
22 May 2018, 15:40 | #505 | |
Registered User
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,414
|
Quote:
So UAE does not emulate the 68K but fulfills the ISA contract. There is no spoon! |
|
22 May 2018, 15:48 | #506 | ||
Registered User
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,414
|
Quote:
I am also wondering how much speed is lost in, e.g., WinUAE to Windows (or Linux) preempting the JIT task, callbacks to other parts of UAE, cache flushes and so on ... Having a core dedicated to the task of executing translated code could improve things quite a bit ... Quote:
Of course we need to talk about the ISA. |
||
22 May 2018, 17:39 | #507 | |||||||
Registered User
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,414
|
Quote:
Quote:
Quote:
Quote:
"Bochs" x86 emulator on a PPC-FPGA combo - only the instruction decoding was done in the FPGA. Despite the overhead, the speed was improved. Today the connection between CPU and FPGA is much faster... Quote:
Instruction decoding and translating can be realized much better in an FPGA, thanks to parallelism and the ability to build effective pipelines. That is the strength of the FPGA, while very fast ALUs are part of the CPU. Quote:
(and without risking cache flushes or other resource conflicts) Quote:
Last edited by Gorf; 22 May 2018 at 18:46. |
|||||||
22 May 2018, 18:46 | #508 | ||
Registered User
Join Date: May 2014
Location: inside the emulator
Posts: 377
|
Quote:
Quote:
64 bit values are very rarely needed but not supporting them would make the processor less orthogonal and harder to use. The same hardware that extracts them from the instruction stream also makes 64 bit branch and address displacements trivial. Compared to CISC instructions decoding is easy. What exactly makes this look monstrous to you? Minimum instruction size? |
||
22 May 2018, 19:58 | #509 | ||||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,350
|
Quote:
- other parts of UAE: simple, check host CPU% when the emulated CPU does nothing
- flushing caches: not needed if code cache and data cache aren't separate Quote:
Not surprising. Quote:
Quote:
No 64-bit CPU in the world is orthogonal, and for a good reason. But you can have an orthogonal, easy-to-use 32-bit CPU. For 64-bit there are better ways. For data, merge two 32-bit instructions together to do a single 64-bit one. As an advantage, your code will run regardless of whether it is on the 32-bit or the 64-bit version of your core. For addresses, use the trick I mentioned before. But programming is a pain in the a$$. Sorry, but no. Having to use 4 or 5 instructions to do the job of one is a no-go today. But there are still people believing in RISC lies... It would just be horrible to code on. Besides, it would have very poor code density. |
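One reading of "merge two 32-bit instructions" is the classic add/add-with-carry pair: a 32-bit core runs the two instructions one after the other, while a 64-bit core could fuse them into a single operation with the same result. The sketch below only demonstrates that the two views agree; the function names are hypothetical and the fusion itself is not modelled:

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def add32(a: int, b: int, carry_in: int = 0):
    """32-bit add with carry in; returns (result, carry_out)."""
    total = (a & MASK32) + (b & MASK32) + carry_in
    return total & MASK32, total >> 32


def add64_via_32(a: int, b: int) -> int:
    """Add two 64-bit values using two 32-bit adds chained by the carry."""
    lo, carry = add32(a, b)                  # like ADD.L on the low halves
    hi, _ = add32(a >> 32, b >> 32, carry)   # like ADDX.L on the high halves
    return (hi << 32) | lo


def add64_native(a: int, b: int) -> int:
    """What a 64-bit core would compute in one operation."""
    return (a + b) & MASK64
```

Both functions return the same value for any pair of 64-bit inputs, which is the property that lets the same code run on either width of core.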
||||
23 May 2018, 14:27 | #510 | ||
Registered User
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,414
|
Quote:
It tells us nothing about the efficiency! It does not tell us how it behaves under heavy load. Quote:
(talking about the host - not the emulated CPU) If you look at benchmarks of OSv, MirageOS or other unikernel or bare-metal approaches, the overhead of systems like Windows or Linux eats up at least 5% of your performance.... But that's all not really important for now. |
||
23 May 2018, 14:40 | #511 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,350
|
Then what do you call efficiency here ?
Is it the number of host instructions per emulated instruction, or something like that? But what do you call heavy load here? Host side? Other apps eating CPU? A chipset config needing more CPU power than usual? Or Amiga side? Is it the emulated CPU doing heavy things? |
23 May 2018, 15:02 | #512 | |||
Registered User
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,414
|
Quote:
But ... would be another interesting number! Efficiency in this case (for lack of a better word): the percentage of CPU time the host CPU spends executing translated (former 68K) instructions. The remaining 1-x would then include: host OS, host gfx, host sound, host I/O, UAE chipset emulation, UAE control (housekeeping and synchronizing), JIT overhead, 68K decoding, ... Quote:
Quote:
Last edited by Gorf; 23 May 2018 at 15:21. |
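The efficiency figure defined in this post is a plain ratio: the share x of host CPU time spent in translated 68K code, with everything else lumped into 1-x. A small sketch, where the function names are hypothetical and the timings would have to come from a profiler:

```python
def translated_share(translated_seconds: float, total_seconds: float) -> float:
    """Fraction x of host CPU time spent executing translated 68K code."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    if not 0 <= translated_seconds <= total_seconds:
        raise ValueError("translated_seconds must lie within the total")
    return translated_seconds / total_seconds


def overhead_share(translated_seconds: float, total_seconds: float) -> float:
    """The remaining 1-x: host OS, chipset emulation, JIT overhead and so on."""
    return 1.0 - translated_share(translated_seconds, total_seconds)
```

With made-up numbers, 6 seconds in translated code out of 10 seconds of host CPU time gives x = 0.6, leaving 0.4 as the room for improvement being asked about.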
|||
23 May 2018, 16:05 | #513 | ||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,350
|
Quote:
Quote:
A loaded emulated machine won't make much of a difference, will it? There will just be less time spent waiting. For the chipset, only rare corner cases really need to push the CPU, and it seems to count less today than it used to (because machines are faster). And this is anyway a typical case where the FPGA can do the work. Overall, perhaps just reading the CPU% shown by either Task Manager or WinUAE itself will give you some numbers. But let's be honest: if you expect nice numbers in nice cells of a nice table, then this simply cannot be done. |
||
23 May 2018, 16:42 | #514 | |||
Registered User
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,414
|
Quote:
I was not expecting a definitive number ... I know it depends on very many variables. The emulation of some raytracer with more or less static output to a P96 screen and no sound is probably more "effective" than AGA Doom at max resolution... And some things have an upper limit in usage, as they are simply done after some time, while other stuff may use up a constant percentage, no matter how fast your CPU is... I am just asking because it would give us a rough estimate of how much room for improvement there is. Quote:
Quote:
Just gathering information piece by piece |
|||
23 May 2018, 16:55 | #515 | ||
Registered User
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,414
|
Quote:
To make this clear: this is an optional optimization. It would be the third step and is just an idea. But this idea could be useful.
Step one: interpreted execution of code. The FPGA can assist in decoding and translating. Good speed-up, but slower than JIT.
Step two: JIT on the host CPU. Identifying hotspots and optimizing execution. Buffering translated code.
Step three: identifying persistent hotspots and generating specialized cores in the FPGA. This needs to be done by spare cores that are not otherwise utilized.
We would NOT create "specialized 68k cores" or "specialized host-CPU cores", but rather special CL cores or special DSPs, each capable of executing just one former loop of code when sent a single instruction and a range of data. |
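The three steps amount to a tiered scheme driven by how often a block of code runs. A minimal sketch, assuming simple execution counters and two made-up thresholds; `TieredExecutor` and the tier names are hypothetical and only illustrate the promotion logic, not any real translation:

```python
from collections import Counter

JIT_THRESHOLD = 3        # assumed: runs before a block is JIT-compiled
OFFLOAD_THRESHOLD = 10   # assumed: runs before a block counts as a persistent hotspot


class TieredExecutor:
    """Picks an execution tier per code block from its run count."""

    def __init__(self) -> None:
        self.run_counts: Counter = Counter()

    def tier_for(self, block_addr: int) -> str:
        """Count one execution of the block and return the tier used for it."""
        self.run_counts[block_addr] += 1
        runs = self.run_counts[block_addr]
        if runs > OFFLOAD_THRESHOLD:
            return "fpga-core"      # step three: specialized core in the FPGA
        if runs > JIT_THRESHOLD:
            return "jit"            # step two: JIT-compiled host code
        return "interpreter"        # step one: FPGA-assisted interpretation
```

Calling `tier_for` on the same block address twelve times yields three interpreter runs, seven JIT runs and then the FPGA tier, while a block seen for the first time always starts in the interpreter.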
||
23 May 2018, 17:04 | #516 | |
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,810
|
The calculator already runs directly.
Quote:
How hard can it be? |
|
23 May 2018, 17:06 | #517 | |||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,350
|
Exactly !
Quote:
Quote:
But in any case emulator settings are what have the most impact. Quote:
You could just have some sort of ultra-wide (SIMD) ALU. Then when a loop is identified which has all its instructions supported there (and with no bad dependencies), it can be "rewritten" to use that special hardware. I can tell you I'd find this kind of hardware autovectorization a lot more sexy than adding dumb SIMD extensions to the instruction set... |
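The condition for rewriting a loop (every instruction supported by the wide ALU, no bad dependencies) can be sketched as a simple check over the loop body. Everything here is an assumption for illustration: the instruction tuple format, the `SUPPORTED_OPS` set, and the simplification that only register dependencies are checked, with memory aliasing ignored:

```python
from typing import List, Set, Tuple

# (operation, destination register, source registers); format is illustrative.
Instr = Tuple[str, str, Tuple[str, ...]]

# Assumed set of operations the wide ALU supports.
SUPPORTED_OPS: Set[str] = {"add", "sub", "and", "or", "move"}


def has_loop_carried_dependency(body: List[Instr]) -> bool:
    """True if some register is read before being written and is also written.

    Such a register carries a value from one iteration into the next,
    so the iterations cannot simply run side by side.
    """
    written_so_far: Set[str] = set()
    read_before_write: Set[str] = set()
    for _, dst, srcs in body:
        for src in srcs:
            if src not in written_so_far:
                read_before_write.add(src)
        written_so_far.add(dst)
    return bool(read_before_write & written_so_far)


def can_rewrite_for_wide_alu(body: List[Instr]) -> bool:
    """Loop qualifies if all ops are supported and no value is carried over."""
    all_supported = all(op in SUPPORTED_OPS for op, _, _ in body)
    return all_supported and not has_loop_carried_dependency(body)
```

A loop that loads a value, adds a constant register and stores the result passes the check; an accumulation such as `add d2,d2,d0` fails it because `d2` is carried between iterations.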
|||
23 May 2018, 17:12 | #518 | |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,350
|
That's not what I understood from what you wrote a few posts earlier:
Quote:
As hard as writing a new OS is, no more no less. |
|
23 May 2018, 17:18 | #519 | |
Registered User
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,414
|
Quote:
But this was a reply to my FPGA-CPU-hybrid emulator idea. And in this case it would need to stick to the legacy 68K ISA. This special SIMD unit (reconfigurable or not) would be part of the enhanced JIT. Even Intel is playing with these ideas: using Intel's own SPMD compiler to create special CL cores in FPGAs that are more efficient than, e.g., the generic CL cores in your gfx card. Edit: ah - "lot more sexy" instead of "not more sexy" - I misread the first time ;-) YES: it is fascinating but a lot of work.... Last edited by Gorf; 23 May 2018 at 17:31. Reason: reading it again |
|
23 May 2018, 17:30 | #520 |
Registered User
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,414
|
|