English Amiga Board


Old 24 November 2022, 22:18   #101
OldB0y
Registered User
 
Join Date: Jan 2009
Location: Letchworth/UK
Posts: 86
This is both hilarious and amazing at the same time.

I can now run Quake on my A4000 with its FPU-less LC060 A3660 - and it's slower than when I first tried the leaked unofficial Amiga Quake port on my A1200 with a 68882-equipped Blizzard 1230 II, lol.

But it does work!
OldB0y is offline  
Old 24 November 2022, 22:30   #102
alenppc
Registered User
 
Join Date: Apr 2012
Location: Canada
Age: 44
Posts: 910
Quote:
Originally Posted by Thomas Richter View Post
No matter what, I thank you a lot for helping me, and it seems we even found something interesting and new about the 68060 masks that does not seem to be documented anywhere else.

Just tested this last version and did not get any hits at all. I tried both Quake and Quake2.


The earlier version that did give me hits gave me so many with Q2 that it filled the entire RAD: drive (the log was over 800k, although I did not post that one).
alenppc is offline  
Old 24 November 2022, 22:31   #103
alenppc
Registered User
 
Join Date: Apr 2012
Location: Canada
Age: 44
Posts: 910
Quote:
Originally Posted by OldB0y View Post
This is both hilarious and amazing at the same time.

I can now run Quake on my A4000 with its FPU-less LC060 A3660 - and it's slower than when I first tried the leaked unofficial Amiga Quake port on my A1200 with a 68882-equipped Blizzard 1230 II, lol.

But it does work!

You can grab NovaCoder's softfloat version (posted in the Quake thread) and it should be slightly faster, although the 3660 is an awful card, so probably not by much.
alenppc is offline  
Old 24 November 2022, 22:35   #104
DisasterIncarna
Registered User
 
DisasterIncarna's Avatar
 
Join Date: Oct 2021
Location: England
Posts: 1,237
what witchcraft is next? btw i'm guessing this will merge with your mmu libs? or is it its own thing?
DisasterIncarna is offline  
Old 25 November 2022, 05:43   #105
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,302
Quote:
Originally Posted by alenppc View Post
Just tested this last version and did not get any hits at all. I tried both Quake and Quake2.
Thanks, so I guess we're done then. Interesting CPU bug - it only seems to affect fmovem <mem>,register-list. Just as a precaution, I also added the same workaround to fmovem <mem>,control-registers and frestore. The remaining instructions seem to be unaffected.


This is why it is so important to run tests on real hardware...


Again, this was very helpful, thanks a lot!
Thomas Richter is offline  
Old 25 November 2022, 05:44   #106
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,302
Quote:
Originally Posted by DisasterIncarna View Post
what witchcraft is next? btw i'm guessing this will merge with your mmu libs? or is it its own thing?

No, this program will remain stand-alone. It is only loosely related to the mmu library and does not depend on it. It supports it, though.


Before you ask: No, you cannot emulate a MMU by software with a similar trick.
Thomas Richter is offline  
Old 25 November 2022, 07:05   #107
Michael
A1260T/PPC/BV/SCSI/NET
 
Michael's Avatar
 
Join Date: Jan 2013
Location: Moscow / Russia
Posts: 840
So, demos like https://www.pouet.net/prod.php?which=2308 are not possible with this.
Michael is offline  
Old 25 November 2022, 07:38   #108
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,302
It depends on what you mean by "possible". Real-time? No. Running - yes, provided the thing does not kill the operating system.

But that's really not the point of the software (but then, I never got the point of demos in the first place).
Thomas Richter is offline  
Old 25 November 2022, 08:51   #109
Michael
A1260T/PPC/BV/SCSI/NET
 
Michael's Avatar
 
Join Date: Jan 2013
Location: Moscow / Russia
Posts: 840
Well, Impossible fails with a guru here.

Another creation from the same coders, also FPU-based:
https://www.pouet.net/prod.php?which=2306

This one mostly works, but is extremely slow in places -
below 1 fps, where a proper 060 keeps a good frame rate.

So I suspect that both are system-friendly, since the latter
demo also works on RTG, and you can't kill it there.
Michael is offline  
Old 25 November 2022, 08:59   #110
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,302
Quote:
Originally Posted by Michael View Post
Well, Impossible fails with a guru here.

Same as above. Please run SegTracker, Sashimi and MuForce with the DISPC option, redirect the output to RAD:, and post it here. I cannot do anything about it without knowing further details.
Thomas Richter is offline  
Old 25 November 2022, 18:21   #111
alenppc
Registered User
 
Join Date: Apr 2012
Location: Canada
Age: 44
Posts: 910
I tried both of these demos and have not seen any crashes or MuForce hits.


Annoyingly, however, "Impossible" is one of those demos that only presents a black screen on NTSC machines. Even if I switch the WB to a PAL screenmode, the demo will still start in the machine's native video mode (NTSC) and simply play music but not display anything. No MuForce hits, however.

@Thomas, please check your PMs.
alenppc is offline  
Old 26 November 2022, 03:45   #112
Samurai_Crow
Total Chaos forever!
 
Samurai_Crow's Avatar
 
Join Date: Aug 2007
Location: Waterville, MN, USA
Age: 49
Posts: 2,193
Quote:
Originally Posted by Thomas Richter View Post
As a toy project to play with "why not", but as a realistic system design, the answer is quite simple: With software emulation, you would go over many cycles of execution and instruction interpretation just for a single vector instruction, thus there is nothing to be gained by this approach. It will be just slower than multiple scalar 680x0 instructions.
You make an incorrect assumption about doing multiple scalar ops to mimic a vector. I was going to do actual vector ops in hardware on a CPLD chip. If I used a fixed ABI for the vector chip, I could just use Assembly macros to interface with the CPLD. I guess only a custom scalar emulation would be necessary after all.
Samurai_Crow is offline  
Old 26 November 2022, 11:00   #113
Karlos
Alien Bleed
 
Karlos's Avatar
 
Join Date: Aug 2022
Location: UK
Posts: 4,401
Quote:
Originally Posted by Samurai_Crow View Post
You make an incorrect assumption about doing multiple scalar ops to mimic a vector. I was going to do actual vector ops in hardware on a CPLD chip. If I used a fixed ABI for the vector chip, I could just use Assembly macros to interface with the CPLD. I guess only a custom scalar emulation would be necessary after all.
Ok, but the trap overhead of reaching your proposed implementation still exists.
Karlos is online now  
Old 26 November 2022, 11:45   #114
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,302
Quote:
Originally Posted by Samurai_Crow View Post
You make an incorrect assumption about doing multiple scalar ops to mimic a vector. I was going to do actual vector ops in hardware on a CPLD chip.
Correct, but how do you feed the chip? The trouble is that you need many more instructions to feed the chip with data, namely going through the exception processing, than it would actually take to process each element of the vector manually. This would only make sense if the vectors are hundreds of bytes long and the chip could read them via DMA, since the fixed trap overhead has to be smaller than the cost of processing the data scalar-wise. Thus, to give you a practical example: Even on a 68881, multiplying four numbers takes approximately 200 cycles. Going through the exception processing takes probably 1000 cycles. Even if the actual vector processor takes only a single cycle, going through the exception processing is still slower than just scalar processing.
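To put rough numbers on the break-even point implied by those figures, here is a minimal C calculation. The 200- and 1000-cycle estimates are the ones quoted above; the one-cycle-per-element cost of the vector unit is an assumed best case, not a measured figure.
Code:
#include <stdio.h>

int main(void)
{
    const double scalar_cycles_per_elem = 200.0 / 4.0; /* ~50 cycles per 68881 multiply (estimate above) */
    const double trap_overhead          = 1000.0;      /* cycles spent in exception processing (estimate above) */
    const double vector_cycles_per_elem = 1.0;         /* assumed best case inside the external chip */

    /* trap + n*vector < n*scalar   =>   n > trap / (scalar - vector) */
    double breakeven = trap_overhead / (scalar_cycles_per_elem - vector_cycles_per_elem);
    printf("vector must hold more than %.0f elements to beat scalar code\n", breakeven);
    return 0;
}
With these numbers the vector has to be longer than about 20 elements (a couple of hundred bytes of extended-precision operands) before the trap-plus-chip route wins, which is the "hundreds of bytes" ballpark mentioned above.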
Thomas Richter is offline  
Old 26 November 2022, 15:05   #115
Samurai_Crow
Total Chaos forever!
 
Samurai_Crow's Avatar
 
Join Date: Aug 2007
Location: Waterville, MN, USA
Age: 49
Posts: 2,193
Quote:
Originally Posted by Thomas Richter View Post
Correct, but how do you feed the chip?
Memory-mapped I/O to absolute addresses whose destination is specified in MOVE16 operations.
Quote:
Originally Posted by Thomas Richter View Post
The trouble is that you need many more instructions to feed the chip with data, namely going through the exception processing, than it would actually take to process each element of the vector manually. This would only make sense if the vectors are hundreds of bytes long and the chip could read them via DMA, since the fixed trap overhead has to be smaller than the cost of processing the data scalar-wise. Thus, to give you a practical example: Even on a 68881, multiplying four numbers takes approximately 200 cycles. Going through the exception processing takes probably 1000 cycles.
The 68881 was neither pipelined nor parallel.
Quote:
Originally Posted by Thomas Richter View Post
Even if the actual vector processor takes only a single cycle, going through the exception processing is still slower than just scalar processing.
More incorrect assumptions. Exception handling would only be needed for in-order non-load-store operations. Thus only extracting values from the vector unit justifies a wait state.
Samurai_Crow is offline  
Old 26 November 2022, 15:23   #116
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,302
Quote:
Originally Posted by Samurai_Crow View Post
Memory-mapped I/O to absolute addresses whose destination is specified in MOVE16 operations.
Then assume that whoever wants to use the chip writes to memory-mapped registers. That is much faster than going through the emulator trap.
Quote:
Originally Posted by Samurai_Crow View Post
68881 was not pipelined nor parallel.
You don't understand what I'm trying to say. It does not matter how parallel a processor is. Using the 68881 would still outperform an I/O mapped vector processor if the interface to use this processor goes through an emulation trap, unless the vectors are really large.
Quote:
Originally Posted by Samurai_Crow View Post
More incorrect assumptions. Exception handling would only be needed for in-order non-load-store operations. Thus only extracting values from the vector unit justifies a wait state.
If you want to interface the chip with assembler instructions, that's an emulator trap. And that's simply not a good idea, that's all I'm trying to tell you. It is just going to be slower than running through a library vector, which is the suggested interface, and a much quicker one. Tools like MuRedox are there just to prevent the emulator trap.
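Purely to illustrate the contrast being drawn here - a program writing the memory-mapped registers directly, rather than letting an unimplemented-instruction trap do the feeding - a compile-only C sketch might look like this. The base address, register layout and command code are invented for illustration; a real board would define its own.
Code:
#include <stdint.h>

/* Hypothetical memory-mapped vector unit; addresses and layout are made up. */
#define VEC_BASE   ((volatile float *)0x40000000)
#define VEC_SRC_A  (VEC_BASE + 0)                         /* 4 input operands A   */
#define VEC_SRC_B  (VEC_BASE + 4)                         /* 4 input operands B   */
#define VEC_CMD    ((volatile uint32_t *)(VEC_BASE + 8))  /* command register     */
#define VEC_RESULT (VEC_BASE + 12)                        /* 4 results            */

#define CMD_MUL4   1u   /* hypothetical "multiply 4 elements" command */

static void vec_mul4(const float *a, const float *b, float *out)
{
    for (int i = 0; i < 4; i++) {   /* feed the operands...            */
        VEC_SRC_A[i] = a[i];
        VEC_SRC_B[i] = b[i];
    }
    *VEC_CMD = CMD_MUL4;            /* ...kick off the operation...    */
    for (int i = 0; i < 4; i++)     /* ...and read the results back    */
        out[i] = VEC_RESULT[i];
}
This direct access costs a handful of moves per call; routing the same work through an emulator trap adds the whole exception-processing path on top of it, which is the point being made above.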
Thomas Richter is offline  
Old 27 November 2022, 03:40   #117
Samurai_Crow
Total Chaos forever!
 
Samurai_Crow's Avatar
 
Join Date: Aug 2007
Location: Waterville, MN, USA
Age: 49
Posts: 2,193
@Thomas Richter
Mapping opcodes to the coprocessor interface would definitely be preferred. However, some of the general-purpose coprocessor circuits present in the 68020 and 68030 were offloaded to an external chip on the 68040 and 68060. Unless that chip design gets re-released as an FPGA softcore, I'd be unable to replicate it without going all Gunnar von Boehn and hardwiring the floating-point vectors into the CPU core and ditching the 68LC040 chip altogether. Furthermore, adding the coprocessor softcore to the vector unit would probably increase the size of the vector unit such that it would require a full-fledged FPGA instead of a CPLD. (I kind of doubt it would fit in a CPLD anyway, but that raises the cost nonetheless.)
Samurai_Crow is offline  
Old 27 November 2022, 08:42   #118
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,302
Sorry, I'm completely confused about what you are proposing here. First, you say "coprocessor circuits present in the 68020 and 68030 were offloaded to an external chip on the 68040 and 68060". Nothing was "offloaded to an external chip" there. Actually, nothing was offloaded to external hardware. The missing opcodes are offloaded to external software (the fpsp.resource). So do you want to say "I plan an external chip that replaces the fpsp.resource"? If so, SoftIEEE has no role here. This process works (a bit) differently from SoftIEEE, and it would need to go through a CPU library, or lacking this, an fpsp.resource.

What has Gunnar to do with all this? Even if you disable the FPU on his design, the FPU remains active for elementary math and would continue to process data for such elementary operations in only 56 bits rather than the full precision offered by SoftIEEE. Thus, at best you can offload some transcendental functions to an external chip, but whether it makes sense to go through the emulator trap rather than his "millicode" I cannot judge.

Third, what has all this to do with a vector unit, and how does SoftIEEE play into this? As said, going through an emulator trap does not make sense; it would only be slower than scalar math operations carried out multiple times, so as a software interface to an external chip it makes little sense.

If you propose to use SoftIEEE as some kind of "prototype system" where you catch (lacking hardware) the instructions by software - well, you can do that as of today. It would make sense there as a temporary solution just to test the chip until the full interface becomes available in silicon. Just implement a softieee.library. Will I do that? No - that's not the purpose of the project, but the interface is open and documented, so it is doable, and I can help you understand how the interface works.

If you plan to do that as an external chip for the 68LC040 to provide an FPU - that is possible, though again not exactly fast, so I'm not sure how competitive such a design could be.

If you plan that as an external chip for Gunnar's 68EC080, I guess you better talk to Gunnar to get it linked to the system as some sort of coprocessor interface. Good luck with that. The chip currently lacks the ability to re-route all FPU instructions, and even if you can re-route the transcendental functions as a subset to an external chip, it would likely not perform very well, but that's not my problem at all.

Last but not least, I doubt any sort of FPU can be implemented on a CPLD; these chips are much too tiny for such complex operations. You can probably implement CORDIC logic in an FPGA to get the missing functions, but you would still need to find a way to interface this chip to either a Mot chip or Gunnar's EC080. For the 68040 and related chips, there is no coprocessor interface, thus some software layer is necessary. Yes, SoftIEEE can do that (minus vector instructions), and the answer is that you then need the right softieee.library. Doable: read the documentation, then ask me in case you have additional questions.

Thus, to conclude: Please write a concise project proposal of what exactly you are attempting to do. I cannot really make much sense of what you have written so far - sorry.
Thomas Richter is offline  
Old 27 November 2022, 09:15   #119
Karlos
Alien Bleed
 
Karlos's Avatar
 
Join Date: Aug 2022
Location: UK
Posts: 4,401
I think, to summarise, SoftIEEE depends on the unimplemented-instruction exception handling in order to intercept any unimplemented 6888x instruction the CPU encounters. The mechanism is used to hand over to a software emulation of the missing operation. Much of the overhead is in the exception processing itself, so even if you had an external hardware device that could perform the operation itself, the benefit would be minimal. For a traditional SIMD unit, where the onus is on throughput, the proposition would only turn a computational profit for vectors that are very large, and which the external unit would also need to load and store by itself, in order to be faster than a software-only solution.

Vector stuff aside, this notion that the exception overhead is dominant is why I'm curious about the applicability of using the exception trap to patch the caller with a direct call to a handler function in a manner similar to OxyPatcher/CyberPatcher.
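As a toy model of the interception described above (before any caller patching), the following C sketch shows roughly where the per-instruction work sits once the exception has already been taken. This is not SoftIEEE's actual code - the dispatch table, handler signatures and the assumption that the opmode sits in the low 7 bits of the extension word are only there to illustrate the decode-dispatch-emulate-write-back cycle.
Code:
#include <stdio.h>
#include <stdint.h>
#include <math.h>

typedef double (*fpu_emul_fn)(double src, double dst);

static double emul_fsin(double src, double dst) { (void)dst; return sin(src); }
static double emul_fmul(double src, double dst) { return dst * src; }

/* Dispatch on the 7-bit opmode field of the F-line extension word
   (0x0E = FSIN, 0x23 = FMUL on the 6888x). */
static fpu_emul_fn dispatch(uint16_t ext_word)
{
    switch (ext_word & 0x7F) {
    case 0x0E: return emul_fsin;
    case 0x23: return emul_fmul;
    default:   return NULL;   /* genuinely unimplemented: punt */
    }
}

int main(void)
{
    /* Pretend the CPU just trapped on an FSIN with source operand 1.0.
       In the real mechanism, taking the exception, decoding the faulting
       instruction and fetching/writing back the operands is where most
       of the cycles go, not the arithmetic itself. */
    uint16_t ext_word = 0x000E;
    fpu_emul_fn fn = dispatch(ext_word);
    if (fn)
        printf("emulated result: %f\n", fn(1.0, 0.0));
    return 0;
}
The arithmetic in the handler is cheap; the fixed cost of getting into and out of it is what a patch-the-caller scheme like the one mentioned above would try to remove.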
Karlos is online now  
Old 27 November 2022, 09:25   #120
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,302
Please make that MuRedox, as the above projects are dead as a dodo. Updating MuRedox is next on my list, but for that, SoftIEEE first needs to become stable, and I need input on that - which is exactly the purpose of this thread.

This said, even with MuRedox there is an overhead, namely copying the sources in and the targets out, and interfacing the emulating library. It may cut the number of instructions in the emulation path down to perhaps one tenth of the current count, but it's still very noticeable. Even 20 instructions (in reality, it is more, even with MuRedox in place) plus one vector instruction is a noticeable overhead compared to 4 scalar instructions. Thus, you would really need larger vectors, even with MuRedox in place.
Thomas Richter is offline  
 

