10 April 2005, 05:03 | #1 |
Registered User
Join Date: Apr 2005
Location: digital hell, Germany, after 1984, but worse
Posts: 3,383
|
Is a faster WinUAE FPU emulation possible?
Is there a good chance to speed up some IEEE functions in the FPU emulation of WinUAE by a factor of 3 - 5, as in my MathLibs881? Many more apps could profit from WinUAE math.
Possible IEEE code optimizations for WinUAE which should work without losing precision, at least for SINGLE and DOUBLE floating point:
_____________________________________________
1.) Log2(x) works several times faster and also seems to have fewer rounding bugs than LogN(x) or Log10(x). Replacements:
LogN(x) = Log2(x) * 1/Log2(e)
1/Log2(e) = 0x 3FFE0000 B17217F7 D1CF7800
as Double = 0x 3FE62E42 FEFA39F0
Log10(x) = Log2(x) * 1/Log2(10)
1/Log2(10) = 0x 3FFD0000 9A209A84 FBCFF000
as Double = 0x 3FD34413 509F79FE
2.) Sin(x) and Cos(x) are much faster than SinCos(x) and Tan(x) and can substitute for them:
SinCos(x) = Sin(x) and Cos(x) separately
Tan(x) = Sin(x) / Cos(x) and cos(x)<>0
3.) The hyperbolic functions can be replaced by faster terms, which use Exp() instead:
SinH(x) = (Exp(x)-Exp(-x)) * 0.5
CosH(x) = (Exp(x)+Exp(-x)) * 0.5
TanH(x) = (Exp(x)-Exp(-x)) / (Exp(x)+Exp(-x)) |
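The replacements listed in this post can be checked numerically. What follows is only a minimal sketch in Python, not WinUAE or MathLibs881 code: all function names are made up for illustration, the hex constants are copied from the post, and the decoder for the three-word constants assumes the usual 68881 extended layout (16 bits sign and exponent, 16 unused bits, 64-bit mantissa with an explicit integer bit).

```python
import math
import struct


def double_from_hex(hex_digits):
    """Decode 16 hex digits as a big-endian IEEE 754 double."""
    return struct.unpack(">d", bytes.fromhex(hex_digits))[0]


def extended_from_words(w0, w1, w2):
    """Decode three 32-bit words (assumed 68881 extended layout) to a double."""
    sign = -1.0 if (w0 >> 31) & 1 else 1.0
    exponent = (w0 >> 16) & 0x7FFF      # 15-bit biased exponent
    mantissa = (w1 << 32) | w2          # 64 bits, explicit integer bit
    # value = mantissa / 2**63 * 2**(exponent - 16383), rounded to double
    return sign * math.ldexp(float(mantissa), exponent - 16383 - 63)


# Double constants exactly as posted (spaces removed).
LN2 = double_from_hex("3FE62E42FEFA39F0")      # 1/Log2(e)
LOG10_2 = double_from_hex("3FD34413509F79FE")  # 1/Log2(10)


def logn_via_log2(x):
    return math.log2(x) * LN2


def log10_via_log2(x):
    return math.log2(x) * LOG10_2


def tan_via_sin_cos(x):
    c = math.cos(x)
    if c == 0.0:                        # the cos(x)<>0 condition from the post
        raise ZeroDivisionError("cos(x) is zero")
    return math.sin(x) / c


def sinh_via_exp(x):
    return (math.exp(x) - math.exp(-x)) * 0.5


def cosh_via_exp(x):
    return (math.exp(x) + math.exp(-x)) * 0.5


def tanh_via_exp(x):
    ep, en = math.exp(x), math.exp(-x)
    return (ep - en) / (ep + en)


if __name__ == "__main__":
    print(logn_via_log2(10.0), math.log(10.0))
    print(tanh_via_exp(0.5), math.tanh(0.5))
```

One thing such a check makes visible: Exp(x)-Exp(-x) subtracts two nearly equal numbers when x is close to zero, so SinH and TanH built this way lose relative accuracy there, and Exp(x) overflows for large x where TanH itself is still well defined.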
10 April 2005, 11:15 | #2 |
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,553
|
Is this true with modern x86 FPUs too? FPU emulation basically "only" maps all FPU instructions to x86 FPU instructions via standard C-library functions. I guess most modern FPU libraries are optimized enough to handle these situations?
|
10 April 2005, 11:33 | #3 |
Junior Member
Join Date: Jan 2002
Location: Australia
Age: 45
Posts: 381
|
Hey PeterK, what are you running under WinUAE that requires so much FPU? RayTracing? Just interested.
|
11 April 2005, 01:16 | #4 | |||
Registered User
Join Date: Apr 2005
Location: digital hell, Germany, after 1984, but worse
Posts: 3,383
|
Quote:
just 240MHz, but I guess the x87 FPU functions are nearly the same in modern Pentium 4 or AMD 64 CPUs in order to stay compatible with the 486 technology. On my machine the speed differences really are up to 500% with these changes. Quote:
asked myself whether it might be a lack of optimization in the C libraries from Microsoft, or whether they have to do these slow emulations to ensure that the precision is still good enough for Extended floating-point calculations, because the x87 FPU works with 80 bits only and has no extra bits left to compensate for rounding errors, unlike the 680x0 CPU/FPUs, which have them. The main problem with the x87 is that it implements only very few FPU functions directly compared to the m68k. There is NO LogN(x), for example, but only the instruction FYL2X, which computes y * Log2(x), thus LogN(x) has to be emulated! But at least for SINGLE and DOUBLE precision it would be much faster to use my substitution for LogN(x); it's really five times faster than the direct command mapping. Quote:
The reason why I started this thread is that I know that most applications which heavily use IEEE functions don't go through the overhead of library calls, but use inline FPU instructions. That's the point where WinUAE could help them. The optimization in the MathIEEE libs is pretty useless here. The following paragraph is already obsolete now, see below! On the other hand, the C libs for x86 CPUs are obviously NOT optimized well enough to handle this, or WinUAE is not compiled with the best compiler configuration, but sorry, I don't have any experience in PC programming yet. Please don't misunderstand the last statement! I had better tell you now that WinUAE is without any doubt the very best application for any PC and the only real reason to use this shitty hardware, because it makes an Amiga platform out of it. Last edited by PeterK; 11 April 2005 at 05:46. |
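The precision question raised in this post (is Log2(x) times a constant good enough for DOUBLE?) can be put into numbers by counting the error in units in the last place (ULPs). The sketch below is an illustration only, with the host's math.log standing in as the reference; the function names and the sample grid are assumptions, and math.ulp needs Python 3.9 or later. It says nothing about Extended precision, where no spare bits are available.

```python
import math

LN2 = math.log(2.0)  # host value of 1/Log2(e), used as the multiplier


def logn_via_log2(x):
    """LogN(x) computed as Log2(x) * 1/Log2(e)."""
    return math.log2(x) * LN2


def ulp_error(approx, reference):
    """Distance between approx and reference, in ULPs of the reference."""
    return abs(approx - reference) / math.ulp(reference)


def sample_points(count=2000, step=0.01):
    """Sample grid 0.005, 0.015, ... (chosen to avoid x = 1.0 exactly)."""
    return [(k + 0.5) * step for k in range(count)]


def max_ulp_error(samples):
    """Worst-case ULP distance of the substitution over the samples."""
    return max(ulp_error(logn_via_log2(x), math.log(x)) for x in samples)


if __name__ == "__main__":
    print("worst error in ULPs:", max_ulp_error(sample_points()))
```

A result of a few ULPs would mean the substitution costs roughly the last one or two bits of a double, which an x87 working internally with 64-bit mantissas could absorb before rounding the result to DOUBLE.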
|||
11 April 2005, 01:30 | #5 | |
Registered User
Join Date: Apr 2005
Location: digital hell, Germany, after 1984, but worse
Posts: 3,383
|
Quote:
but of course there are other proggies too, like Digital AlmanacII, which also makes heavy use of the FPU functions. And it's normal that the programmers of such applications know how to use the FPU directly, which means my own MathLibs881 won't be needed. Whether RayTracing would also profit from the above-mentioned changes is a good question, because it will probably use only the basic FPU math functions for its calculations, but I haven't tried that out yet. |
|
11 April 2005, 05:40 | #6 |
Registered User
Join Date: Apr 2005
Location: digital hell, Germany, after 1984, but worse
Posts: 3,383
|
Some hours later...
Suddenly, late at night, I had the strange idea to check whether the JIT compiler of WinUAE might have some influence and be the reason for the mysterious behaviour of the FPU emulation. So I switched the JIT off. And indeed, the JIT compiler seems to destroy some of the properly optimized code which comes from the C libs. I really would prefer to blame the Microsoft C libs instead. But the facts are that without the JIT optimization the HsMathLibs benchmark shows me that the directly mapped FPU code (LogN(x) -> LogN(x) in the C libs) is really 50% or up to 100% faster than my replacement code. As soon as I switch the JIT on again, all functions get around 10 times faster in general, but my replacements can still gain another factor of 3-5. See above.
BTW, Asin(), Acos() and Atan() can be improved, too. I just couldn't find a suitable replacement function for them, but there is an instruction called FPATAN for the x87 FPU.
That means, all in all, there should be a potential speedup of maybe 1000% possible if the JIT gets optimized, because we don't need the extra overhead for making library calls. Isn't that a little bit of motivation, now? 1000% speed gain! Toni, I'm sure you will make that!! Last edited by PeterK; 11 April 2005 at 07:42. |
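The post names FPATAN but no replacement formula for Asin() and Acos(). FPATAN is a two-argument arctangent, and one textbook identity (not taken from this thread) expresses both functions through it. A minimal sketch, with math.atan2 standing in for the x87 instruction and with made-up function names:

```python
import math


def _check_domain(x):
    if not -1.0 <= x <= 1.0:
        raise ValueError("argument must be within [-1, 1]")


def asin_via_atan2(x):
    """Asin(x) = Atan2(x, Sqrt((1-x)*(1+x)))."""
    _check_domain(x)
    # (1-x)*(1+x) instead of 1-x*x keeps more digits when |x| is close to 1.
    return math.atan2(x, math.sqrt((1.0 - x) * (1.0 + x)))


def acos_via_atan2(x):
    """Acos(x) = Atan2(Sqrt((1-x)*(1+x)), x)."""
    _check_domain(x)
    return math.atan2(math.sqrt((1.0 - x) * (1.0 + x)), x)


if __name__ == "__main__":
    for x in (-1.0, -0.8, 0.0, 0.3, 1.0):
        print(x, asin_via_atan2(x), math.asin(x), acos_via_atan2(x), math.acos(x))
```

Whether this would be any faster inside the emulator than the C library's own asin/acos is exactly the kind of thing that would have to be benchmarked.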
11 April 2005, 10:56 | #7 |
Posts: n/a
|
Compatibility
Hello PeterK!
I am just a common user, so... I am already using your replacement in my WinUAE, but I haven't noticed any speed improvements... Probably because my PC is quite fast even without optimized libs. In the past, when working with a "real" Amiga 040/40, the main motivation to install the patched/optimized/whatever-modified replacements was to get more speed. Compatibility issues had lower priority. Now, as I said, I am more than satisfied with the speed, so I focus more on compatibility... Therefore I'd like to ask whether it is "safe" to use your replacement in ALL cases, in ALL situations, you know... Thank you! |
11 April 2005, 11:53 | #8 | |
Registered User
Join Date: Apr 2005
Location: digital hell, Germany, after 1984, but worse
Posts: 3,383
|
Quote:
to find, which makes very intensive use of the faster replacements in the libs; instead, these programs usually work with inline math instructions. What remains are the applications which make a library call every now and then, and their users will never ever notice the few milliseconds of speed gain. That's the reason why it makes much more sense to improve WinUAE instead, because all these clever inline functions finally have to use its FPU emulation. But there's no need to worry about the safety of my replacement functions! |
|
11 April 2005, 15:47 | #9 |
Registered User
Join Date: Apr 2005
Location: Glasgow, Scotland
Age: 47
Posts: 81
|
@PeterK
I have a number of newer games like Myst and Nightlong which do not play back the video clips that well under WinUAE, and I would love to see any improvement in the JIT FPU performance so I don't have to use my ageing 060-powered A1200T to play them. I also used to like running the version of VISTA PRO that was given away on an Amiga Format cover CD, but if I remember correctly it did not work the last time I tried it; I can't remember whether it did not like my 060 or whether it was WinUAE that it did not like. Anyway, the point I was trying to make is that I would also like to be able to play with it again under WinUAE with all the speed of my modern PC, and any optimization of the JIT code would be very welcome. |
11 April 2005, 23:46 | #10 | |
Registered User
Join Date: Apr 2005
Location: digital hell, Germany, after 1984, but worse
Posts: 3,383
|
Quote:
help you very much here, I guess. Are you sure that your games make any use of the FPU? Usually, I would expect that a typical game uses only the CPU integer calculations, because that's much faster and it can also run on any Amiga. |
|
12 April 2005, 11:33 | #11 |
Registered User
Join Date: Apr 2005
Location: Glasgow, Scotland
Age: 47
Posts: 81
|
Yes, I am sure. They require the FPU to play back the AVI video files they use for their cutscenes, and they refuse to launch if you disable the FPU emulation.
I expect that any work you do to speed up FPU emulation to run applications will also help to make video playback smoother. I also have Cinema4D, which I think would really benefit from any work you can do in this area. |
12 April 2005, 11:54 | #12 | |
Registered User
Join Date: Apr 2005
Location: digital hell, Germany, after 1984, but worse
Posts: 3,383
|
Quote:
please understand that it's not me who is doing the hard work here. If it really is possible to speed up the FPU emulation of WinUAE, then you have to direct your thanks to Toni and others! I think the developers of WinUAE are doing a great job, and my part here is only to make some suggestions, nothing more! |
|
12 April 2005, 12:17 | #13 | |
Registered User
Join Date: Apr 2005
Location: Glasgow, Scotland
Age: 47
Posts: 81
|
Quote:
|
|
12 April 2005, 12:21 | #14 | |
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,553
|
Quote:
Anyway, the bigger video playback bottleneck is most likely P96 emulation. It isn't made for special cases that transfer large chunks of data continuously. |
|
12 April 2005, 12:39 | #15 | |
Registered User
Join Date: Apr 2005
Location: Glasgow, Scotland
Age: 47
Posts: 81
|
Quote:
But I and a few other people with high-end Amigas would not complain if improvements come about as a side effect of work targeted at other, more important applications. Current performance is already adequate to make the games usable on a moderately fast PC such as my Athlon MP2800. As for PPC emulation, I am of the opinion that if people want PPC support they should buy a Peg2 or AmigaOne system to run PPC software. |
|
12 April 2005, 14:30 | #16 |
Registered User
Join Date: Apr 2005
Location: digital hell, Germany, after 1984, but worse
Posts: 3,383
|
@Toni Wilen
Hello Toni!
Hmm?? What's wrong? No comment concerning my posts #4 and #6? Did you read them? Or do you think my assumptions are sooo unrealistic that it's not worth talking about them any more? Maybe you get totally different benchmark results on your system. Tell me your opinion, even if you think that there is no chance to improve the JIT compiler or FPU code for more speed. |
12 April 2005, 15:54 | #17 |
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,553
|
I don't have much info about FPU emulation (I have only done some small changes); it is too easy to break something. JIT is another part that I don't want to touch at all, it is really confusing and I hate x86 assembly.
Do you have any small compiled testing programs that can be used to confirm possible speedups? |
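A test program of the kind asked for here could follow a simple best-of-N timing pattern: run the direct function and the replacement over the same inputs and compare wall-clock times. The sketch below only shows that pattern on the host in Python; it is not a 68k test program, its names are invented for illustration, and its numbers say nothing about how the same code behaves under the emulator's JIT.

```python
import math
import timeit

LN2 = math.log(2.0)


def direct_logn(x):
    return math.log(x)


def replaced_logn(x):
    return math.log2(x) * LN2


def time_function(func, values, repeats=5):
    """Best-of-N wall time in seconds for one pass over all values."""
    def run():
        for v in values:
            func(v)
    return min(timeit.repeat(run, number=1, repeat=repeats))


def compare(values):
    """Time both variants on identical inputs and report the ratio."""
    t_direct = time_function(direct_logn, values)
    t_replaced = time_function(replaced_logn, values)
    ratio = t_direct / t_replaced if t_replaced > 0 else float("inf")
    return {"direct_s": t_direct, "replaced_s": t_replaced, "ratio": ratio}


if __name__ == "__main__":
    inputs = [1.0 + i * 0.001 for i in range(100000)]
    print(compare(inputs))
```

Taking the minimum over several repeats filters out most scheduling noise; a real test for this thread would need the same loop compiled for the 68k FPU and run once with the JIT on and once with it off.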
12 May 2005, 13:40 | #18 | |
Registered User
Join Date: Aug 2002
Location: Nottingham England
Posts: 277
|
Quote:
Don't get me wrong, this isn't a complaint, and the work you do on WinUAE is amazing. You obviously spend an awful lot of time on this project, and I guess the aim is absolutely perfect emulation of the Amiga and increasing the potential usefulness of an emulated Amiga. I'm guessing that you and most of the people who use either WinUAE and/or 'real' Amigas are (unlike me) pretty diehard fanatics, trying to squeeze every last ounce of power and versatility out of their Amiga. I would love to see decent video running on the Amiga... just to say it can be done. A slightly off-topic question, aimed at Tony because he's a general programming genius: Is it possible to implement a system on UAE similar to the old 'Siamese system', whereby (if I remember correctly, but I never got my Siamese to work) launching a video on the Amiga actually transparently redirected it to a Windows media app, which would then play it? Being able to utilize a Win32 media player through UAE, without having to actually switch to Windows and launch it, would be pretty impressive. Perhaps if I say pretty please? |
|
12 May 2005, 13:47 | #19 |
Posts: n/a
|
I have a similar idea... What about adding a feature that allows running a PC game from the emulation? The game itself will run on the PC side, of course. On the Amiga side, you just click on a Doom3 icon that pauses the emulation and runs the PC executable in full screen. After the game is quit, it returns silently to the emulation... tricky, eh?
|
12 May 2005, 13:55 | #20 |
Registered User
Join Date: Aug 2002
Location: Nottingham England
Posts: 277
|
In effect using the PC as a glorified accelerator... a slave (albeit one that is running an emulated Amiga, which is quite a perverse scenario!) I'd love to see that idea come to fruition. Unfortunately my coding skills are limited to very basic 'BASIC' and Pascal programs. Any takers?
Toni... please forgive me for misspelling your name in my last post. Apologies for not checking it more carefully. I'll bash myself on your behalf and again... |