Thread: Next gen Amiga
Old 22 May 2018, 17:39   #507
Gorf
Quote:
Originally Posted by Megol
If the host uses the FPGA with an Akiko-type interface - that is, the host writes bytes to be translated to a memory-mapped area and reads back the result - synchronization is easy. For instance, one could probably stall reads of the translated code until the FPGA is finished with its work. But that would be very inefficient.
Yes, that would be madness.

Quote:
So a more reasonable interface is letting the host direct the FPGA to a block of code to be translated, with the target buffer being either implicit or explicit.
Then the translation hardware works until some limit is reached, producing a block of code. Synchronization can be either polling the hardware until it signals completion, or the host receiving an interrupt from the FPGA when done.
Preferably via some CPU interconnect mechanism.
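For illustration, here is a minimal host-side sketch of the polling variant of that handshake. The register names and layout (`src_addr`, `dst_addr`, `control`, `status`) are hypothetical, not any real FPGA interface; a real driver would map the registers from the device and preferably sleep on an interrupt instead of spinning.

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical memory-mapped FPGA register block -- names and layout
 * are illustrative only, not a real interface. */
typedef struct {
    volatile uint32_t src_addr;   /* 68k code block to translate */
    volatile uint32_t dst_addr;   /* target buffer for native code */
    volatile uint32_t control;    /* write 1 to start translation */
    volatile uint32_t status;     /* bit 0 set when translation is done */
} fpga_regs_t;

/* Kick off translation of one block, then poll until the FPGA signals
 * completion. Busy-waiting is the simple option; an interrupt from the
 * FPGA would let the host do useful work in the meantime. */
static uint32_t translate_block(fpga_regs_t *regs, uint32_t src, uint32_t dst)
{
    regs->src_addr = src;
    regs->dst_addr = dst;
    regs->control  = 1;              /* start the translator */
    while ((regs->status & 1) == 0)
        ;                            /* spin until "done" bit is set */
    return regs->dst_addr;           /* entry point of the translated code */
}
```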

Quote:
Then comes the problem of branch address translation. Unlike naive code translation this isn't a mechanical process: 68k branch addresses have to be looked up and, if translated code for that address is found, inserted. If it isn't translated yet, one can imagine the FPGA going down that path to translate the new block of code, but that isn't realistic for several reasons, path explosion being the obvious one.
That is one reason why I suggested implementing the memory controller on the FPGA side of things - that could provide a mechanism to keep track of branch addresses ...
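Whichever side keeps track of it, the bookkeeping itself is just a map from 68k branch targets to translated blocks. A rough sketch of such a branch-target cache, with linear probing and a miss result meaning "not translated yet, fall back" (all names and sizes are made up for illustration):

```c
#include <stdint.h>
#include <assert.h>

/* Illustrative branch-target cache: maps a 68k branch address to the
 * address of its already-translated native block. A lookup miss means
 * the block has not been translated yet, so the runtime must fall back
 * (interpreter, or a new translation request to the FPGA). */
#define BTC_SIZE 256                 /* power of two for cheap masking */

typedef struct {
    uint32_t m68k_addr;              /* key: 68k branch target (0 = empty) */
    uint32_t native_addr;            /* value: translated code address */
} btc_entry_t;

static btc_entry_t btc[BTC_SIZE];    /* zero-initialized = all slots empty */

static void btc_insert(uint32_t m68k, uint32_t native)
{
    uint32_t i = (m68k >> 1) & (BTC_SIZE - 1);  /* 68k code is word-aligned */
    while (btc[i].m68k_addr != 0 && btc[i].m68k_addr != m68k)
        i = (i + 1) & (BTC_SIZE - 1);           /* linear probing */
    btc[i].m68k_addr = m68k;
    btc[i].native_addr = native;
}

/* Returns the native address, or 0 if the block is untranslated. */
static uint32_t btc_lookup(uint32_t m68k)
{
    uint32_t i = (m68k >> 1) & (BTC_SIZE - 1);
    while (btc[i].m68k_addr != 0) {
        if (btc[i].m68k_addr == m68k)
            return btc[i].native_addr;
        i = (i + 1) & (BTC_SIZE - 1);
    }
    return 0;                                   /* miss */
}
```

Done in hardware, the same table is just a small RAM indexed by hashed branch address, which is exactly the kind of thing an FPGA-side memory controller could snoop and update.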

Quote:
A software JIT can quickly switch between executing native code and interpreting 68k code. If we remove the interpreter, the host has to point the FPGA to the code block to be executed, wait until translation is done and then start executing again. I think the overheads would be huge.
I already linked a thesis here in this thread that describes exactly that:
the "Bochs" x86 emulator on a PPC/FPGA combo, where only the instruction decoding was done in the FPGA.
Despite the overhead, the speed was improved.
Today the connection between CPU and FPGA is much faster...

Quote:
The host processor has a highly optimized cache subsystem; what exactly would the FPGA be able to do faster?
The host CPU should use its cache and memory bus for the translated blocks of code. It should stay in "native" mode as long as possible.
Instruction decoding and translation can be realized much better in an FPGA, due to its parallelism and the possibility to build effective pipelines.
That is the strength of the FPGA, while very fast ALUs are the strength of the CPU.

Quote:
That is a good idea. Perhaps it would be better to do the translation on another core too?
See above: the FPGA can do it faster
(and without risking cache flushes or other resource conflicts).

Quote:
That isn't really a good idea - you'd have to generate specialized 68k cores in realtime!
I also posted a paper regarding this issue. This has been done before - latency is just a few milliseconds.

Last edited by Gorf; 22 May 2018 at 18:46.