24 November 2022, 22:18 | #101 |
Registered User
Join Date: Jan 2009
Location: Letchworth/UK
Posts: 86
|
This is both hilarious and amazing at the same time.
I can now run Quake on my FPU-less LC060 A3660-equipped A4000 - and it's slower than when I first tried the leaked unofficial Amiga Quake port on my A1200 with a 68882-equipped Blizzard 1230 II lol. But it does work! |
24 November 2022, 22:30 | #102 | |
Registered User
Join Date: Apr 2012
Location: Canada
Age: 44
Posts: 910
|
Quote:
Just tested this last version and did not get any hits at all. I tried both Quake and Quake2. The earlier version that did give me hits, gave me so many with Q2 that it filled the entire RAD drive (basically the log was over 800k, although I did not post that one). |
|
24 November 2022, 22:31 | #103 | |
Registered User
Join Date: Apr 2012
Location: Canada
Age: 44
Posts: 910
|
Quote:
You can grab NovaCoder's softfloat version (posted in the Quake thread) and it should be slightly faster, although the 3660 is an awful card, so probably not by much. |
|
24 November 2022, 22:35 | #104 |
Registered User
Join Date: Oct 2021
Location: England
Posts: 1,237
|
What witchcraft is next? BTW, I'm guessing this will merge with your MMU libs? Or is it its own thing?
|
25 November 2022, 05:43 | #105 | |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,302
|
Quote:
This is why it is so important to make tests on real hardware... Again, this was very helpful, thanks a lot! |
|
25 November 2022, 05:44 | #106 | |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,302
|
Quote:
No, this program will remain stand-alone. It is only loosely related to the mmu library and does not depend on it. It supports it, though. Before you ask: No, you cannot emulate an MMU by software with a similar trick. |
|
25 November 2022, 07:05 | #107 |
A1260T/PPC/BV/SCSI/NET
Join Date: Jan 2013
Location: Moscow / Russia
Posts: 840
|
So, demos like https://www.pouet.net/prod.php?which=2308 are not possible with this.
|
25 November 2022, 07:38 | #108 |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,302
|
It depends on what you mean by "possible". Real-time? No. Running - yes, provided the thing does not kill the operating system.
But that's really not the point of the software (but then, I never got the point of demos in the first place). |
25 November 2022, 08:51 | #109 |
A1260T/PPC/BV/SCSI/NET
Join Date: Jan 2013
Location: Moscow / Russia
Posts: 840
|
Well, Impossible fails with a guru here.
Another creation from the same coders, also FPU-based: https://www.pouet.net/prod.php?which=2306 This one mostly works, but it is extremely slow, in places below 1 fps, where a proper 060 keeps a good frame rate. So I suspect that both are system-friendly, since later demos also work on RTG, and you can't kill it there. |
25 November 2022, 08:59 | #110 |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,302
|
|
25 November 2022, 18:21 | #111 |
Registered User
Join Date: Apr 2012
Location: Canada
Age: 44
Posts: 910
|
I tried both of these demos and have not seen any crashes or MuForce hits.
Annoyingly, however, "Impossible" is one of those demos that only presents a black screen on NTSC machines. Even if I switch the WB to a PAL screenmode, the demo will still start in the machine's native video mode (NTSC) and simply play the music without displaying anything. No MuForce hits, however. @Thomas, please check your PMs. |
26 November 2022, 03:45 | #112 | |
Total Chaos forever!
Join Date: Aug 2007
Location: Waterville, MN, USA
Age: 49
Posts: 2,193
|
Quote:
|
|
26 November 2022, 11:00 | #113 | |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,401
|
Quote:
|
|
26 November 2022, 11:45 | #114 |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,302
|
Correct, but how do you feed the chip? The trouble is that you need many more instructions to feed the chip with data, namely going through the exception processing, than it would actually take to process each element of the vector manually. This would only make sense if the vectors are hundreds of bytes long and the chip is able to read them via DMA, as the large overhead needs to be smaller than the cost of processing the data with scalar instructions. To give you a practical example: Even on a 68881, multiplying four numbers takes approximately 200 cycles. Going through the exception processing takes probably 1000 cycles. Even if the actual vector processor takes only a single cycle, going through the exception processing is still slower than just scalar processing.
|
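The cycle figures in post #114 can be turned into a rough break-even estimate. The sketch below is only a back-of-the-envelope model: the 200, 1000 and 1 cycle figures are the approximate numbers quoted in that post, while the function names and the optional `feed_per_element` parameter (the cost of moving each element to the chip) are assumptions introduced for illustration.

```python
# Rough cost model for "scalar FPU code" versus "trap to an external vector unit".
# All defaults are the approximate figures quoted in the post, not measurements.

def scalar_path_cycles(n_elements, cycles_per_element=50):
    """Scalar processing: about 200 cycles for four multiplies, so about 50 each."""
    return cycles_per_element * n_elements


def trap_path_cycles(n_elements, trap_overhead=1000, vector_op=1,
                     feed_per_element=0):
    """One exception round trip plus one vector operation.

    feed_per_element is a hypothetical extra cost for handing each element
    to the chip; it defaults to 0, which is the most optimistic case.
    """
    return trap_overhead + vector_op + feed_per_element * n_elements


def break_even_length(trap_overhead=1000, vector_op=1, feed_per_element=0,
                      cycles_per_element=50):
    """Smallest vector length for which the trap path is strictly cheaper."""
    gain = cycles_per_element - feed_per_element
    if gain <= 0:
        return None  # feeding the chip costs as much as doing the work
    return (trap_overhead + vector_op) // gain + 1


if __name__ == "__main__":
    print(scalar_path_cycles(4), trap_path_cycles(4), break_even_length())
```

With the quoted figures, four elements cost about 200 cycles in scalar code against roughly 1001 through the trap, and the trap path only starts to win beyond 20 elements, before counting any cost of getting the data to the chip.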
26 November 2022, 15:05 | #115 | |
Total Chaos forever!
Join Date: Aug 2007
Location: Waterville, MN, USA
Age: 49
Posts: 2,193
|
Memory-mapped I/O to absolute addresses whose destination is specified in MOVE16 operations.
Quote:
More incorrect assumptions. Exception handling would only be needed for in-order non-load-store operations. Thus only extracting values from the vector unit justifies a wait state. |
|
26 November 2022, 15:23 | #116 | |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,302
|
Quote:
|
|
27 November 2022, 03:40 | #117 |
Total Chaos forever!
Join Date: Aug 2007
Location: Waterville, MN, USA
Age: 49
Posts: 2,193
|
@Thomas Richter
Mapping opcodes to the coprocessor interface would definitely be preferred. However, some of the general-purpose coprocessor circuits present in the 68020 and 68030 were offloaded to an external chip on the 68040 and 68060. Unless that chip design gets rereleased as an FPGA softcore, I'd be unable to replicate it without going all Gunnar von Boehn and hardwiring the floating point vectors into the CPU core and ditching the 68LC040 chip altogether. Furthermore, adding the coprocessor softcore to the vector unit would probably increase the size of the vector unit such that it would require a full-fledged FPGA instead of a CPLD. (I kind of doubt it would fit in a CPLD anyway, but that raises the cost nonetheless.) |
27 November 2022, 08:42 | #118 |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,302
|
Sorry, I'm completely confused about what you are proposing here. First, you say "coprocessor circuits present in the 68020 and 68030 were offloaded to an external chip on the 68040 and 68060". Nothing was "offloaded to an external chip" there. Actually, nothing was offloaded to external hardware. The missing opcodes are offloaded to external software (the fpsp.resource). So do you want to say "I plan an external chip that replaces the fpsp.resource"? If so, SoftIEEE has no role here. This process works (a bit) differently from SoftIEEE, and it would need to go through a CPU library, or lacking this, an fpsp.resource.
What has Gunnar to do with all this? Even if you disable the FPU on his design, the FPU remains active for elementary math and would continue to process data for such elementary operations in only 56 bits rather than the full precision offered by SoftIEEE. Thus, at best you can offload some transcendental functions to an external chip, but whether it makes sense to go through the emulator trap rather than his "millicode" I cannot judge. Third, what has all this to do with a vector unit, and how does SoftIEEE play in here? As said, going through an emulator trap does not make sense; it would only be slower than scalar math operations carried out multiple times, so as a software interface to an external chip it makes little sense. If you propose to use SoftIEEE as some kind of "prototype system" where you catch (lacking hardware) the instructions by software - well, you can do that as of today. It would make sense there as a temporary solution just to test the chip until the full interface becomes available in silicon. Just implement a softieee.library. Will I do that? No - that's not the purpose of the project, but the interface is open and documented, so it is doable, and I can help you to understand how the interface works. If you plan to do that as an external chip for the 68LC040 to provide an FPU - that is possible, though again not exactly fast, so I'm not sure how competitive such a design could be. If you plan that as an external chip for Gunnar's 68EC080, I guess you better talk to Gunnar to get it linked to the system as some sort of coprocessor interface. Good luck with that. The chip currently lacks the ability to re-route all FPU instructions, and even if you can re-route the transcendental functions as a subset to an external chip, it would likely not perform very well, but that's not my problem at all. Last but not least, I doubt any sort of FPU can be implemented on a CPLD; these chips are much too tiny for such complex operations.
You can probably implement CORDIC logic in an FPGA to get the missing functions, but you would still need to find a way to interface this chip to either a Mot chip or Gunnar's EC080. For the 68040 and related chips, there is no coprocessor interface, thus some software layer is necessary. Yes, SoftIEEE can do that (minus vector instructions), and the answer is that you then need the right softieee.library. Doable, read the documentation, then ask me in case you have additional questions. Thus, to conclude: Please write a concise project proposal of what exactly you are attempting to do. I cannot really make much sense of what you have written so far - sorry. |
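For readers unfamiliar with the CORDIC approach mentioned in post #118, the sketch below shows the basic rotation-mode iteration for sine and cosine. It is only an illustration of the technique, written with ordinary Python floats; a hardware version would use fixed-point shifts and adds and a precomputed angle table, and nothing here is taken from SoftIEEE or any FPGA core. The function name and the iteration count are assumptions.

```python
import math


def cordic_sin_cos(theta, iterations=32):
    """Return (sin(theta), cos(theta)) using rotation-mode CORDIC.

    Converges for |theta| up to roughly 1.74 radians; larger arguments would
    need range reduction first, which this sketch does not do.
    """
    # Table of elementary rotation angles atan(2^-i); hardware would store these.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]

    # Each micro-rotation stretches the vector by sqrt(1 + 2^-2i); start with
    # the inverse of the accumulated gain so the result has unit length.
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))

    x, y, z = gain, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0.0 else -1.0      # rotate towards z == 0
        shift = 2.0 ** -i                  # a right shift by i bits in hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * angles[i]
    return y, x


if __name__ == "__main__":
    s, c = cordic_sin_cos(0.5)
    print(s, math.sin(0.5))
    print(c, math.cos(0.5))
```

Each iteration needs only shifts, adds and one table lookup, which is why the method suits programmable logic.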
27 November 2022, 09:15 | #119 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,401
|
I think, to summarise, SoftIEEE depends on the exception raised for unimplemented instructions in order to intercept any unimplemented 6888x instruction the CPU encounters. The mechanism is used to hand over to a software emulation of the missing operation. Much of the overhead is in the exception processing itself, so even if you had an external hardware device that could perform the operation itself, the benefit would be minimal. For a traditional SIMD unit, where the emphasis is on throughput, the proposition would only turn a computational profit for very large vectors, which the external unit would also need to load and store by itself, to be faster than a software-only solution.
Vector stuff aside, this notion that the exception overhead is dominant is why I'm curious about the applicability of using the exception trap to patch the caller with a direct call to a handler function in a manner similar to OxyPatcher/CyberPatcher. |
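The patching idea raised in post #119 can be illustrated with a toy model: the first time a call site hits an unimplemented opcode it pays for the trap, the trap handler then rewrites the site to call the emulation routine directly, and later executions skip the trap. This is not how OxyPatcher, CyberPatcher or MuRedox are implemented; the cost constants, the `CallSite` class and all function names are hypothetical, and only the 1000-cycle trap figure echoes the estimate given earlier in the thread.

```python
import math

TRAP_OVERHEAD = 1000   # assumed cycles for the exception round trip
EMULATION_COST = 100   # assumed cycles for the emulation routine itself


def emulate_fsin(x):
    """Stand-in for a software implementation of an unimplemented opcode."""
    return math.sin(x)


HANDLERS = {"fsin": emulate_fsin}


class CallSite:
    """One instruction slot in a toy program."""

    def __init__(self, opcode):
        self.opcode = opcode
        self.patched_handler = None  # filled in by the first trap


def execute(site, operand, stats):
    """Run one call site, counting traps and (assumed) cycles in stats."""
    if site.patched_handler is None:
        # Unpatched: take the trap, look up the handler, patch the site.
        stats["traps"] += 1
        stats["cycles"] += TRAP_OVERHEAD
        site.patched_handler = HANDLERS[site.opcode]
    # Patched (or just patched): direct call, no exception processing.
    stats["cycles"] += EMULATION_COST
    return site.patched_handler(operand)


if __name__ == "__main__":
    stats = {"traps": 0, "cycles": 0}
    site = CallSite("fsin")
    for value in (0.1, 0.2, 0.3):
        execute(site, value, stats)
    print(stats)
```

In this model only the first execution of a site pays the trap overhead. The cost of the emulation routine itself, including moving operands in and out, is still paid on every call.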
27 November 2022, 09:25 | #120 |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,302
|
Please make that MuRedox, as the above projects are as dead as a dodo. This is next on my list, namely updating MuRedox, but for that, SoftIEEE first needs to become stable, and I need input on that - which is exactly the purpose of this thread.
That said, even with MuRedox there is an overhead, namely copying the sources in and the targets out, and interfacing with the emulating library. It may cut down the number of instructions in the emulation path to probably one tenth of the current instruction count, but it's still very noticeable. Even 20 instructions (in reality, it is more - even with MuRedox in place) plus one vector instruction is a noticeable overhead compared to 4 scalar instructions. Thus, you would really need larger vectors, even with MuRedox in place. |