English Amiga Board


Old 04 January 2023, 21:15   #141
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,517
@Thomas Richter

Could a variant version that has reduced precision be faster? I appreciate this isn't the goal but it seems to me that a lot of users with faster 060s tend to use their FPU for gaming rather than anything requiring full extended precision.
Old 04 January 2023, 22:26   #142
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,326
I'm afraid you are expecting too much. Even if it ran at the speed of mathieeesingbas, it would still be too slow for gaming - please do the math yourself. You would still be below 6fps.

Anyhow, the framework is there, the architecture is open, the interface is documented. All that needs to be done is a re-implementation of the softieee.library. The complicated parts such as the FPU emulation, instruction decoding or online jitting are already taken care of by the SoftIEEE binary (not the library) and MuRedox. These binaries do not care how the math core works - and that is the softieee.library.
Old 04 January 2023, 22:41   #143
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,517
Fair enough. Fixed point builds on EC parts for the win then.
Old 04 January 2023, 23:46   #144
rabidgerry
Registered User
 
Join Date: Nov 2018
Location: Belfast
Posts: 1,542
Quote:
Originally Posted by Thomas Richter View Post
I'm afraid it wouldn't get any faster. The current speed is at 1/3 of the speed of mathieeedoubbas; the latter is already quite optimized and offers only 56 bit precision. Even if it matched the speed of doubbas, or even singbas, it would still remain at 3fps or maybe 6fps, below "playable".

SoftIEEE is not supposed to replace a full-fledged FPU. If you need the speed of an FPU, get a hardware FPU. It is just supposed to provide an FPU emulation for those programs whose authors were too lazy to go through the system math libraries.
I have an 060 with full FPU. I simply tried SoftIEEE as an experiment in conjunction with an LC I bought, as someone had suggested I should try it. This was after I noted the LC060 could be overclocked quite comfortably, but the games you might want the overclocking for all seemed to need the FPU in some capacity. So it was a nice little experiment, but as you rightly point out, it won't solve the issues for LC users who might want to play games like Duke Nukem etc. or even Doom Attack AIO, as I discovered.
Old 05 January 2023, 22:45   #145
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,050
Quote:
Originally Posted by Thomas Richter View Post
frestore and fsave are of course emulated by SoftIEEE, that's not the issue. However, they cannot be replaced by jitter functions that *do not* go through an emulator trap. The trouble is that calling a "jitted" function takes at least 4 bytes (JSR.W), but there aren't 4 bytes available to patch.

Replacing them by "traps" does not provide any advantage - a trap is nothing but an exception, but then, there is nothing gained as that replaces just one exception (the original one as captured by SoftIEEE) with another exception (that of the trap).

The whole trick of MuRedox is that there are no exceptions involved anymore.
Yes, you are right, I forgot that these are traps too. But I think that an AllocTrap version could be a bit faster than the F-line emulation version, because there is no code to recognise the instruction and maybe less register usage.
Old 09 January 2023, 11:36   #146
mfilos
Paranoid Amigoid
 
Join Date: Mar 2008
Location: Athens/Greece
Age: 45
Posts: 1,978
Thomas I see version 40.6 is on Aminet (from yesterday) but the archive has version 40.5 (binary + library).
Old 09 January 2023, 12:16   #147
pandy71
Registered User
 
Join Date: Jun 2010
Location: PL?
Posts: 2,888
@THOR - apologies upfront for my question. I'm curious how feasible, from your perspective, it would be to implement such emulation on an ISA other than MC68K: emulate the 881/882 in additional SW/HW while still keeping your MC68K frontend. In other words, implement the physical float calculation in a separate solution and use such a virtual 881/882 from the native CPU.

I still have the impression that I'm unable to express my question clearly, so here is an example:
Your SoftIEEE library, but the float numerics are implemented in software on different HW connected to the Amiga (for example one of the cheap SoCs using the RISC-V or ARM ISA, if they are equipped with a float coprocessor and, for example, a DSP, with such a SoC running at around 300...400MHz).

How feasible is such a hybrid implementation from your perspective? Let's skip the numeric (i.e. non-MC68K) part of the question.

Thx!
Old 09 January 2023, 12:29   #148
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,517
I think that question has been raised already. Someone asked about offloading the instruction to PPC (I think that's what they said), but certainly in that case there's a lot to contend with. Anything along the WarpOS route would be many orders of magnitude slower than the current software implementation.

Unless you have an extremely low latency way to do it, I don't think you'll get away with offloading externally.
Old 09 January 2023, 13:00   #149
pandy71
Registered User
 
Join Date: Jun 2010
Location: PL?
Posts: 2,888
Quote:
Originally Posted by Karlos View Post
I think that question has been raised already. Someone asked about offloading the instruction to PPC (I think that's what they said), but certainly in that case there's a lot to contend with. Anything along the WarpOS route would be many orders of magnitude slower than the current software implementation.

Unless you have an extremely low latency way to do it, I don't think you'll get away with offloading externally.
I'm aware that the related overhead is high, but 40..100MHz MC68K HW may still be slower at software float calculations than external modern HW. The problem is a standard interface between the application and the external HW float implementation.
Nowadays small SoCs are equipped with HW float (albeit 32 bit) and usually a DSP, and some of them also have a dedicated NPU for fast low-precision integer work. Such a SoC costs 2..3$ and, besides glue logic, has everything needed for this functionality. So this was my question: a small SoC incapable of full MC68K emulation, but capable of offloading, for example, float calculation at a fraction of the cost of an original 881/882 (not to mention 40/60 boards, where the coprocessor interface may not even be implemented).
Creating some API standard and separating the frontend from the physical implementation of the float calculation could be interesting.

Ages ago there was, for example, the company WEITEK, which produced many solutions that appeared as simple I/O in the CPU address space... so this is a question about something similar, performing 4..6 times faster than the MC68K float implementation.

Or something like this https://micromegacorp.com/umfpu64.html which is easy to hook up to even an MC68000.

Some report comparing softfloat on an 8 bit uC with float offloaded to such an external chip: https://micromegacorp.com/downloads/...g%20WinAVR.pdf - the limitation is of course the SPI interface, but even in that case the difference is obvious. Assuming a different way of connecting such external HW that significantly reduces the communication overhead, it may be a sane option for replacing the 881/882 with SoftIEEE and getting better results.

I was curious about THOR's opinion on whether, from his perspective, it is feasible to separate the physical calculation implementation from the frontend, so that he is not responsible for any foreign bugs but can still control SoftIEEE as owner and, for example, focus on his pure MC68K float implementation. So a semi-open standard.

Last edited by pandy71; 09 January 2023 at 13:37.
Old 09 January 2023, 13:57   #150
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,326
Quote:
Originally Posted by mfilos View Post
Thomas I see version 40.6 is on Aminet (from yesterday) but the archive has version 40.5 (binary + library).
No worries, this is the right version. I apparently forgot to bump the revision, but the binaries are correct.
Old 09 January 2023, 15:47   #151
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,326
Quote:
Originally Posted by pandy71 View Post
@THOR - apologies upfront for my question. I'm curious how feasible, from your perspective, it would be to implement such emulation on an ISA other than MC68K: emulate the 881/882 in additional SW/HW while still keeping your MC68K frontend. In other words, implement the physical float calculation in a separate solution and use such a virtual 881/882 from the native CPU.
As said before, it is technically possible. All you need to do is to implement the softieee.library interface. This interface would take the parameters and forward them to the hardware.

However, the resulting solution would still not be on par with a hardware FPU. Let's make a couple of computations: A hardware multiplication on the 68060 is ~2 cycles if I recall. The MuRedox call-in overhead is roughly one order of magnitude larger (~20 cycles), that of SoftIEEE through exception processing a lot larger (~200 cycles). On top of this, the softieee.library still has to forward the parameters to the hardware and perform the operation there. For example, for the 68882, you need to emulate the coprocessor interface in software (probably another 20 cycles) and then the 68882 has to execute the multiplication (which is another >20 cycles), so in the end, you are at about 60 to 100 cycles minimum. That's almost two orders of magnitude slower than the 68060.

The softieee.library multiplication engine is probably 200 cycles (just ballpark numbers), so it is slower, but not that much slower. This is also the reason why the 68882-based "hardware accelerator" solutions were not really working well. The communication overhead to the FPU eats up the performance improvements of the FPU. The 68882 only works well with the 68020/030, where the coprocessor interface is implemented in hardware.
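
The estimates above can be added up in a few lines. This is only a sketch: the cycle counts are the ballpark figures quoted in the post, the 68882 multiply is taken at its quoted lower bound of 20 cycles, and the assumption that the stages simply add up (as well as all constant and path names) is introduced here for illustration.

```python
# Ballpark cycle counts per multiplication, taken from the estimates in the post.
# These are rough figures, not measurements.
NATIVE_060_FMUL = 2        # hardware multiply on the 68060 ("if I recall")
MUREDOX_CALL_IN = 20       # MuRedox call-in overhead
SOFTIEEE_TRAP = 200        # SoftIEEE entry through exception processing
COPRO_IF_EMULATION = 20    # coprocessor interface emulated in software
MC68882_FMUL = 20          # quoted as ">20", so this is a lower bound
SOFTIEEE_LIB_FMUL = 200    # softieee.library software multiply

# Assumption: the cost of a path is the plain sum of its stages.
paths = {
    "68060 FPU": NATIVE_060_FMUL,
    "MuRedox + 68882 via software interface":
        MUREDOX_CALL_IN + COPRO_IF_EMULATION + MC68882_FMUL,
    "MuRedox + softieee.library": MUREDOX_CALL_IN + SOFTIEEE_LIB_FMUL,
    "SoftIEEE trap + softieee.library": SOFTIEEE_TRAP + SOFTIEEE_LIB_FMUL,
}


def slowdown(cycles):
    """Return how many times slower a path is than the native 68060 multiply."""
    return cycles / NATIVE_060_FMUL


for name, cycles in paths.items():
    print(f"{name:42s} {cycles:4d} cycles  {slowdown(cycles):6.1f}x native")
```

With these inputs the 68882 route lands at the lower end of the "60 to 100 cycles" range, and the gap between it and the all-software route under MuRedox is a small single-digit factor, not an order of magnitude.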


Quote:
Originally Posted by pandy71 View Post
I still have the impression that I'm unable to express my question clearly, so here is an example:
Your SoftIEEE library, but the float numerics are implemented in software on different HW connected to the Amiga (for example one of the cheap SoCs using the RISC-V or ARM ISA, if they are equipped with a float coprocessor and, for example, a DSP, with such a SoC running at around 300...400MHz).
The softieee.library is a "numerics core". So it takes one or two extended precision floating point numbers in memory, and its "contract" defines that it places the result back in memory. SoftIEEE and MuRedox follow this contract. They do not care *how* the library does its job. This is "currently" an all-software implementation, but nobody stops you from implementing your own softieee.library. Such an alternative implementation would read the operands from memory, forward them to the hardware, and read the results back. Thus, while this construction would be faster, I doubt it would be *much* faster. I would expect a factor of 2 or 3 (see above for calculations), but that still places you one order of magnitude slower than native code on the 68060.


The speed would be, according to this estimate, approximately on par with the mathieeedoubbas.library.
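
The "contract" described above (operands in memory, result written back to memory, front end indifferent to how the work is done) can be modelled in a few lines. This is a minimal sketch and not the real softieee.library API: the function names `software_fmul` and `front_end_fmul` are hypothetical, the use of the 96-bit 68881 memory layout is an assumption, and infinities and NaNs are ignored. The real interface is described in the autodocs shipped in the SoftIEEE archive.

```python
import math
import struct

EXT_SIZE = 12  # 68881 extended precision as stored in memory: 96 bits


def encode_ext(x):
    """Pack a finite Python float into the 12-byte 68881 extended layout:
    16 bits sign+exponent, 16 bits padding, 64-bit mantissa with explicit
    integer bit. Infinities and NaNs are not handled in this sketch."""
    sign = 0x8000 if math.copysign(1.0, x) < 0 else 0
    if x == 0.0:
        return struct.pack(">HHQ", sign, 0, 0)
    m, e = math.frexp(abs(x))        # abs(x) == m * 2**e with 0.5 <= m < 1
    mantissa = int(m * (1 << 64))    # integer bit ends up in bit 63
    exponent = e - 1 + 16383         # extended precision exponent bias
    return struct.pack(">HHQ", sign | exponent, 0, mantissa)


def decode_ext(buf):
    """Unpack 12 bytes in 68881 extended layout into a Python float."""
    sign_exp, _pad, mantissa = struct.unpack(">HHQ", buf)
    sign = -1.0 if sign_exp & 0x8000 else 1.0
    if mantissa == 0:
        return sign * 0.0
    return sign * math.ldexp(mantissa, (sign_exp & 0x7FFF) - 16383 - 63)


def software_fmul(mem, src, dst):
    """Hypothetical all-software backend: dst = dst * src, all in memory."""
    a = decode_ext(mem[dst:dst + EXT_SIZE])
    b = decode_ext(mem[src:src + EXT_SIZE])
    mem[dst:dst + EXT_SIZE] = encode_ext(a * b)


def front_end_fmul(backend, mem, src, dst):
    """Stand-in for the SoftIEEE/MuRedox side: it only hands two addresses
    to whatever backend is installed and expects the result in memory."""
    backend(mem, src, dst)


mem = bytearray(2 * EXT_SIZE)
mem[0:12] = encode_ext(1.5)
mem[12:24] = encode_ext(-4.0)
front_end_fmul(software_fmul, mem, 0, 12)
print(decode_ext(mem[12:24]))
```

A hardware-backed variant would be another function with the same `(mem, src, dst)` shape that ships the two operands to the external device and writes the returned bytes to `dst`; the front end would not need to change.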
Old 09 January 2023, 15:52   #152
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,326
Quote:
Originally Posted by Karlos View Post
I think that question has been raised already. Someone asked about offloading the instruction to PPC (I think that's what they said), but certainly in that case there's a lot to contend with. Anything along the WarpOS route would be many orders of magnitude slower than the current software implementation.

Pretty much. For PPC-offloading, you would again be slower than the 68882 solution because you need to communicate with the external CPU - some form of message passing is required. This does not pay off; it already killed the performance of PowerUp and WarpUp and made these hybrid PPC/68K solutions impractical.
Old 09 January 2023, 15:57   #153
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,326
Quote:
Originally Posted by pandy71 View Post
I was curious about THOR's opinion on whether, from his perspective, it is feasible to separate the physical calculation implementation from the frontend, so that he is not responsible for any foreign bugs but can still control SoftIEEE as owner and, for example, focus on his pure MC68K float implementation. So a semi-open standard.

See above. I'm as open as possible on the interface to make such a thing possible, and the interface of the library is as simple as it can be (two pointers to floating point numbers), but even if the actual computation were immediate, there is still code between "your code" and the actual computation, and that is the MuRedox "trampoline code". It stores essential registers trashed by the softieee.library on the stack (d0-d2/a0-a1/a6, the ccr and the PC), loads the source operands (in the easiest case directly in the softieee.library) and calls the library.

Like it or not, this type of overhead will not go away, no matter how smart your hardware is, and it is already one order of magnitude larger than the 68060 hardware multiplication. Even with instant operation, you would be down to the speed of a 68882, and that is really a *very* optimistic estimate.
Old 09 January 2023, 16:51   #154
Samurai_Crow
Total Chaos forever!
 
Join Date: Aug 2007
Location: Waterville, MN, USA
Age: 49
Posts: 2,200
@pandy71 That FPU you linked has a serial interface. That would probably limit performance on an 040+. On a 68000 though.... :-)
Old 09 January 2023, 22:53   #155
pandy71
Registered User
 
Join Date: Jun 2010
Location: PL?
Posts: 2,888
Quote:
Originally Posted by Thomas Richter View Post
As said before, it is technically possible. All you need to do is to implement the softieee.library interface. This interface would take the parameters and forward them to the hardware.
To be honest, I can't find documentation for softieee.library, which is why I put the question to you, as you are the author and owner of the licence.

Quote:
Originally Posted by Thomas Richter View Post
However, the resulting solution would still not be on par with a hardware FPU. Let's make a couple of computations: A hardware multiplication on the 68060 is ~2 cycles if I recall. The MuRedox call-in overhead is roughly one order of magnitude larger (~20 cycles), that of SoftIEEE through exception processing a lot larger (~200 cycles). On top of this, the softieee.library still has to forward the parameters to the hardware and perform the operation there. For example, for the 68882, you need to emulate the coprocessor interface in software (probably another 20 cycles) and then the 68882 has to execute the multiplication (which is another >20 cycles), so in the end, you are at about 60 to 100 cycles minimum. That's almost two orders of magnitude slower than the 68060.
I'm fully aware of this, but firstly an MC68060 from a reputable source costs more than 500$ today, and secondly, if I understand correctly, the goal of this project is to make it possible to run poorly written software that cannot run without a physical HW floating point coprocessor.
The original 881/882 are rather slow HW FPUs, and floating point emulation on a typical MC68k will be even slower (due to, for example, the low clock).
Some hybrid solution could fill the gap between high-priced, reputable but close to unobtainable HW and salvaged or fake chips...


Quote:
Originally Posted by Thomas Richter View Post
The softieee.library multiplication engine is probably 200 cycles (just ballpark numbers), so it is slower, but not that much slower. This is also the reason why the 68882-based "hardware accelerator" solutions were not really working well. The communication overhead to the FPU eats up the performance improvements of the FPU. The 68882 only works well with the 68020/030, where the coprocessor interface is implemented in hardware.
So a 20 cycle multiplication where the clock is around 200MHz seems not too bad - and if I understand correctly, the software overhead will be exactly the same for a pure software or a hybrid solution?
The 68882 is OK but quite slow - slower even than an 80287 running at half the clock - and the 040 and 060 still implement only a subset of the 881/882 instructions, so a hybrid FPU approach may still be beneficial even if it is not on par with a real HW FPU wired to the CPU through the coprocessor interface?

Quote:
Originally Posted by Thomas Richter View Post
The softieee.library is a "numerics core". So it takes one or two extended precision floating point numbers in memory, and its "contract" defines that it places the result back in memory. SoftIEEE and MuRedox follow this contract. They do not care *how* the library does its job. This is "currently" an all-software implementation, but nobody stops you from implementing your own softieee.library. Such an alternative implementation would read the operands from memory, forward them to the hardware, and read the results back. Thus, while this construction would be faster, I doubt it would be *much* faster. I would expect a factor of 2 or 3 (see above for calculations), but that still places you one order of magnitude slower than native code on the 68060.

The speed would be, according to this estimate, approximately on par with the mathieeedoubbas.library.
I agree, a hybrid can be somewhere between pure SW and real HW.

Quote:
Originally Posted by Thomas Richter View Post
See above. I'm as open as possible on the interface to make such a thing possible, and the interface of the library is as simple as it can be (two pointers to floating point numbers), but even if the actual computation were immediate, there is still code between "your code" and the actual computation, and that is the MuRedox "trampoline code". It stores essential registers trashed by the softieee.library on the stack (d0-d2/a0-a1/a6, the ccr and the PC), loads the source operands (in the easiest case directly in the softieee.library) and calls the library.

Like it or not, this type of overhead will not go away, no matter how smart your hardware is, and it is already one order of magnitude larger than the 68060 hardware multiplication. Even with instant operation, you would be down to the speed of a 68882, and that is really a *very* optimistic estimate.
Currently the MC68882 seems to be available from reputable sources at a price somewhere between 40 and 140$, depending on package and clock, and it is still subpar in terms of delivered speed compared with other solutions...

Anyway thanks for your time and hard work.

Quote:
Originally Posted by Samurai_Crow View Post
@pandy71 That FPU you linked has a serial interface. That would probably limit performance on an 040+. On a 68000 though.... :-)
As I pointed out, this was an example solution - a simple illustration that even an 8 bit embedded uC can get some help in a relatively easy way.
A 4MHz SPI can be replaced with an 80MHz SPI or by a parallel interface. The problem with a real FPU for the Amiga is the high price if it comes from reputable sources, or the high risk of a fake or faulty chip salvaged from some junk in China, India or Africa if bought on the internet...
The MC68000 can use the 881/882, as Motorola pointed out in their application note AN947, and a similar scheme could be used for hybrid emulation - nowadays there are many 4...6$ SoCs with a HW FPU (usually single precision) but clocked at 100...400MHz.

This thread triggered my curiosity about the missing Amiga/Commodore documentation on this interesting topic - something like the Apple SANE documentation "Apple_Numerics_Manual_Second_Edition_1988.pdf"
Old 10 January 2023, 00:03   #156
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,326
Quote:
Originally Posted by pandy71 View Post
To be honest, I can't find documentation for softieee.library, which is why I put the question to you, as you are the author and owner of the licence.
Look, maybe that is too obvious, but the documentation is in the SoftIEEE.lha archive you get from Aminet. Where else would it be? Autodocs, pragmas, prototypes, all you need.
Quote:
Originally Posted by pandy71 View Post
So a 20 cycle multiplication where the clock is around 200MHz seems not too bad - and if I understand correctly, the software overhead will be exactly the same for a pure software or a hybrid solution?
That is only the call-overhead of MuRedox. Remember, you still need to connect to the actual hardware, fill its registers with the source operands, and get the result back. That is an order of magnitude slower than the 060.
Quote:
Originally Posted by pandy71 View Post
Currently the MC68882 seems to be available from reputable sources at a price somewhere between 40 and 140$, depending on package and clock, and it is still subpar in terms of delivered speed compared with other solutions...
Yes, it is an old chip. Yet, it includes the transcendental functions. Whether you need them is another question. The softieee.library implements them through CORDIC.
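
For readers unfamiliar with CORDIC, the idea is to rotate a vector through a fixed table of angles atan(2^-i), so that each step needs only shifts and adds. The following is a generic textbook rotation-mode sketch in floating point, not the code used by softieee.library; the function name and the iteration count are chosen here for illustration.

```python
import math


def cordic_sin_cos(theta, iterations=40):
    """Return (sin(theta), cos(theta)) using rotation-mode CORDIC.

    Valid for roughly |theta| <= 1.74 rad (the sum of all table angles);
    larger arguments would need range reduction first.
    """
    # Table of elementary rotation angles atan(2**-i).
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]

    # Each micro-rotation stretches the vector by sqrt(1 + 2**(-2i));
    # start with a pre-scaled vector so the final length is 1.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    x, y, z = 1.0 / gain, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0.0 else -1.0          # rotate towards z == 0
        factor = 2.0 ** -i                     # a right shift in fixed point
        x, y = x - d * y * factor, y + d * x * factor
        z -= d * angles[i]
    return y, x


s, c = cordic_sin_cos(0.5)
print(s, math.sin(0.5))
print(c, math.cos(0.5))
```

An integer implementation would replace the multiplications by `factor` with right shifts, which is what makes the method attractive on a CPU without a hardware FPU.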
Quote:
Originally Posted by pandy71 View Post
The MC68000 can use the 881/882, as Motorola pointed out in their application note AN947, and a similar scheme could be used for hybrid emulation - nowadays there are many 4...6$ SoCs with a HW FPU (usually single precision) but clocked at 100...400MHz.

This thread triggered my curiosity about the missing Amiga/Commodore documentation on this interesting topic - something like the Apple SANE documentation "Apple_Numerics_Manual_Second_Edition_1988.pdf"
Not sure what you expect, actually. The equivalent of Apple SANE is the mathieeedoubbas/doubtrans and singbas/singtrans libraries, and you find their autodocs in the RKRMs and the NDK. Or, if you like, the autodocs of softieee, which provides something similar to Apple SANE (actually, softieee is much closer to SANE than mathieeedoubbas/trans are).

The mathffp/mathtrans libraries are based on Motorola library code for math functions.
Old 10 January 2023, 19:38   #157
pandy71
Registered User
 
Join Date: Jun 2010
Location: PL?
Posts: 2,888
Quote:
Originally Posted by Thomas Richter View Post
Look, maybe that is too obvious, but the documentation is in the SoftIEEE.lha archive you get from Aminet. Where else would it be? Autodocs, pragmas, prototypes, all you need. That is only the call-overhead of MuRedox. Remember, you still need to connect to the actual hardware, fill its registers with the source operands, and get the result back. That is an order of magnitude slower than the 060. Yes, it is an old chip. Yet, it includes the transcendental functions. Whether you need them is another question. The softieee.library implements them through CORDIC.
Apologies, I downloaded SoftIEEE.lha not from Aminet (the recent version) but from your opening message.
With respect to the 060 - yes, but if you have an 060 then it seems this package is not for you but for people with an LC060.


Quote:
Originally Posted by Thomas Richter View Post
Not sure what you expect, actually. The equivalent of Apple SANE is the mathieeedoubbas/doubtrans and singbas/singtrans libraries, and you find their autodocs in the RKRMs and the NDK. Or, if you like, the autodocs of softieee, which provides something similar to Apple SANE (actually, softieee is much closer to SANE than mathieeedoubbas/trans are).

The mathffp/mathtrans libraries are based on Motorola library code for math functions.
To be honest, I don't expect anything - it was just an example of what I could hypothetically have expected if Commodore had, by accident, been a serious company.

Thx!
Old 20 January 2023, 11:47   #158
shelter
Registered User
 
Join Date: Nov 2022
Location: #Amigaland
Posts: 156
Just a heads up, with SoftIEEE enabled, MacOS crashes in Shapeshifter.
Old 27 January 2023, 00:07   #159
amifan
WhatIFF? Amiga Magazine
 
Join Date: Feb 2021
Location: Chiba, Japan
Age: 46
Posts: 500
Has anyone tried this with a TF1260 LC and Lightwave 3.5 FPU version? I followed the instructions for installation but get a guru error when running Lightwave. Not sure what to do next.
Old 27 January 2023, 06:36   #160
mfilos
Paranoid Amigoid
 
Join Date: Mar 2008
Location: Athens/Greece
Age: 45
Posts: 1,978
Quote:
Originally Posted by shelter View Post
Just a heads up, with SoftIEEE enabled, MacOS crashes in Shapeshifter.
I'm under the impression that you need to disable the FPU before running ShapeShifter, as it uses a SoftFPU as well.

At least in my Vampire, I disable it before running it via a script.
 

