English Amiga Board

English Amiga Board (http://eab.abime.net/index.php)
-   support.FS-UAE (http://eab.abime.net/forumdisplay.php?f=122)
-   -   FS-UAE / WinUAE x86-64 JIT Compiler (http://eab.abime.net/showthread.php?t=79762)

FrodeSolheim 17 September 2015 00:01

FS-UAE / WinUAE x86-64 JIT Compiler
 
This is the main thread for issues and testing related to the new x86-64 JIT compiler (:xmas) in FS-UAE/WinUAE.

In order to test the 64-bit JIT compiler you need a 64-bit version of FS-UAE (2.7.0dev or newer) or a 64-bit version of WinUAE (3.2.0 beta 13 or newer). JIT should automatically be available the same way it is in 32-bit versions of *UAE. But always test with the latest versions available, of course.

This isn't a "new JIT compiler" as such, it is a modification of the old one (including a big merge of JIT code from the ARAnyM project, and fixing UAE-specific code for x86-64). As such, it is possible that the JIT support in the 32-bit FS-UAE/WinUAE is worse (or better) than before. So, if you discover that stuff stops working in the 32-bit versions (which worked well in previous beta/development versions), you are also encouraged to report that here.

In fact, the ideal testing procedure if you experience a problem with the 64-bit JIT compiler is:
- Test if the same problem occurs with the "new 32-bit JIT compiler".
- If so, test if the same problem occurs with the "old 32-bit JIT compiler" (in earlier versions of FS-UAE or WinUAE).

(If you experience WinUAE (or FS-UAE) 64-bit issues which have nothing to do with JIT, this is not the correct thread!)

64-bit versions of FS-UAE for OS X and Windows will be released as part of 2.7.1dev (soon-ish).

Happy testing :)

jbl007 17 September 2015 18:01

1 Attachment(s)
Quote:

Originally Posted by FrodeSolheim (Post 1041657)
Oh, and I am nearly done with JIT compiler support for the x86-64 versions of FS-UAE as well (code available in the "jit2" branch on github).

Couldn't wait to try it. ;) Tested direct mode only. Raw CPU speed is ~7% faster than the original JIT according to the "cpuspeed" benchmark. "Long" reads/writes are 1.5x-2x faster while "Byte"/"Word" are a little slower. Haven't had a single crash. Looks very promising. :great

Can it work for 68000 cpu also?

bernd roesch 18 September 2015 11:11

Quote:

Originally Posted by FrodeSolheim (Post 1041657)
Exception handler is now fixed for OS X as well, so JIT direct memory will work on OS X too in the next development version. 2.5.41dev will also *default* to using direct memory.

Oh, and I am nearly done with JIT compiler support for the x86-64 versions of FS-UAE as well (code available in the "jit2" branch on github).

That sounds good. I hope it all works.

If it does not work, you can set breakpoints in the debugger at these functions:

Code:

static inline void emit_byte(uae_u8 x)
static inline void emit_word(uae_u16 x)
static inline void emit_long(uae_u32 x)
static __inline__ void emit_quad(uae_u64 x)

in file https://github.com/FrodeSolheim/fs-u...mu_support.cpp

Then, when you continue execution (leaving the breakpoints always on), you can see all the JIT output and look at the asm output to see what goes wrong.
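A minimal sketch of why breaking on the emit functions shows everything the compiler generates: all output goes through one byte-level append. The buffer and the `sketch_` names below are hypothetical stand-ins for illustration, not the code in the fs-uae source file linked above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the JIT's output buffer (assumption, not UAE's real structure). */
uint8_t code_buf[64];
size_t  code_len = 0;

/* Append one byte of generated machine code. A debugger breakpoint placed
 * here is hit for every byte that the sketch emits. */
void sketch_emit_byte(uint8_t x)
{
    if (code_len < sizeof code_buf)
        code_buf[code_len++] = x;
}

/* x86 is little-endian, so a 32-bit value goes out low byte first. */
void sketch_emit_long(uint32_t x)
{
    for (int i = 0; i < 4; i++)
        sketch_emit_byte((uint8_t)(x >> (8 * i)));
}
```

For example, emitting the byte 0xB8 followed by the long 0x12345678 leaves the five bytes B8 78 56 34 12 in the buffer, which a disassembler shows as `mov eax, 0x12345678`.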

bernd roesch 18 September 2015 11:16

Quote:

Originally Posted by jbl007 (Post 1041752)
Couldn't wait to try it. ;) Tested direct mode only. Raw CPU speed is ~7% faster than the original JIT according to the "cpuspeed" benchmark. "Long" reads/writes are 1.5x-2x faster while "Byte"/"Word" are a little slower. Haven't had a single crash. Looks very promising. :great

Can it work for 68000 cpu also?

I think the measurement accuracy is too poor; if you try several times, maybe byte and word are a little faster. In theory they should be the same. It is best to use the HD-Rec benchmarks. I would also like to see a screenshot of the guit time benchmark ;-)

FrodeSolheim 18 September 2015 19:02

Quote:

Originally Posted by jbl007 (Post 1041752)
Couldn't wait to try it. ;) Tested direct mode only. Raw CPU speed is ~7% faster than the original JIT according to the "cpuspeed" benchmark. "Long" reads/writes are 1.5x-2x faster while "Byte"/"Word" are a little slower. Haven't had a single crash. Looks very promising. :great

Thanks for testing :) (More updated JIT code is now available in the *future* branch. The JIT updates have also been merged into WinUAE)

Quote:

Originally Posted by jbl007 (Post 1041752)
Can it work for 68000 cpu also?

JIT is explicitly unavailable for 68000. I'm not familiar with all the reasons. One reason can be due to minor CPU differences. Another can be that it is considered pointless, since you often want accurate-ish emulation when using 68000, and JIT will not give any kind of accurate CPU-chipset synchronization... (In any case, I don't plan to do anything about it).

Quote:

Originally Posted by bernd roesch (Post 1041839)
If it does not work, you can set breakpoints in the debugger at these functions

I know, I have had to resort to this several times (latest occurrence yesterday evening) ;)

bernd roesch 19 September 2015 09:56

I also think JIT for the 68000 CPU is not useful, because a PC is a lot faster than a 68000 CPU, and any programs that need more speed work with a 68020+ CPU. A real 68000 CPU does not give enough speed, so programmers always do a 68020+FPU version.

jbl007 19 September 2015 19:18

1 Attachment(s)
Quote:

Originally Posted by FrodeSolheim (Post 1041901)
(More updated JIT code is now available in the *future* branch. The JIT updates have also been merged into WinUAE)

Compiled from future branch now (currently at 9e807589eb). Unfortunately it doesn't run well. Tried different configs, jit direct/indirect - no change. Workbench starts, but applications do not run or behave incorrectly. Sometimes fs-uae crashes. Last tested version (from jit2 branch) worked much better. Log attached.

FrodeSolheim 19 September 2015 19:24

If you want to help... what would really help is to find the first commit where you experience these problems (i.e. work your way backwards through the latest commits...)

FrodeSolheim 19 September 2015 19:52

I forgot to mention one thing, JIT FPU is enabled in the latest commits, and this might not work properly for 64-bit yet. Please disable with --uae-compfpu=0 and see if this helps!

jbl007 19 September 2015 23:13

2 Attachment(s)
Quote:

Originally Posted by FrodeSolheim (Post 1042075)
I forgot to mention one thing, JIT FPU is enabled in the latest commits, and this might not work properly for 64-bit yet. Please disable with --uae-compfpu=0 and see if this helps!

Yes, uae_compfpu=0 helps! No more crashes.
With the 32-bit executable this option is not needed, but the fonts in the AmiKit start menu look bad if uae_compfpu=0 is not set. Does the JIT FPU compiler have lower accuracy, or is it somewhat faulty?

FrodeSolheim 20 September 2015 00:03

64-bit JIT FPU being broken wasn't surprising. This code hasn't been checked for 64-bit compatibility yet. So in future commits, the compfpu option will default to 0 for 64-bit versions.

Regarding the 32-bit JIT FPU and the font rendering - does the "future" version behave worse than the current development version? Or is the behavior the same?

Regarding accuracy, an important point is that there's completely different code involved with interpreter FPU emulation and JIT FPU emulation. I don't know if the JIT FPU is supposed to have the same accuracy. But there is an additional option, you can try
Code:

uae_fpu_strict = 1
(uae_fpu_strict = 0 means "faster, but less strict rounding" according to source code).

FrodeSolheim 20 September 2015 02:23

Found the issue causing the rendering problem, will push fix tomorrow

bernd roesch 20 September 2015 11:14

Quote:

Originally Posted by FrodeSolheim (Post 1042115)
64-bit JIT FPU being broken wasn't
Regarding accuracy, an important point is that there's completely different code involved with interpreter FPU emulation and JIT FPU emulation. I don't know if the JIT FPU is supposed to have the same accuracy. But there is an additional option, you can try
Code:

uae_fpu_strict = 1
(uae_fpu_strict = 0 means "faster, but less strict rounding" according to source code).

The JIT has less accuracy and returns different values than the CPU emulation, but in the real world this is no problem. The Amiga FPU uses 80 bits. In the JIT code there is an option to use 80-bit precision for JIT. I looked in your source, but I did not find "80 bit" or "80bit"; maybe it has all been removed. It slows everything down, and I know of no program that works better with it.

I know of only one program that seems to do heavy rounding-precision tricks: the raytracer Imagine. It does strange things when it is not emulated exactly, but I do not know of any other such program. UAE always executes the first pass through a code sequence before a branch instruction in interpreter mode, but compiles the block for JIT. Only when a backward branch is taken does it check whether the block the branch wants to jump to is compiled. If it is compiled, the JIT code is executed.
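A toy model of the behaviour described above, under the assumption that a block is interpreted on first sight, recorded as compiled, and run from the cache when it is entered again. The cache layout and the function name are hypothetical; this is not UAE's actual block cache.

```c
#include <stdint.h>

enum { CACHE_SIZE = 256 };

/* Hypothetical cache entry: which 68k address the slot holds, and whether
 * native code exists for it. */
typedef struct {
    uint32_t pc;
    int      compiled;
} block_entry;

block_entry block_cache[CACHE_SIZE];

/* Returns 1 if the block at pc would run as JIT code, 0 if it runs in the
 * interpreter (in which case it is marked as compiled for the next entry). */
int run_block(uint32_t pc)
{
    block_entry *e = &block_cache[(pc >> 1) % CACHE_SIZE];

    if (e->compiled && e->pc == pc)
        return 1;              /* branch target already compiled */

    e->pc = pc;                /* first pass: interpret, compile for later */
    e->compiled = 1;
    return 0;
}
```

In this model the first call for a given address returns 0 and the second returns 1, which is why a test that runs its code only once never exercises the compiled path.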

Also, it is best to use only 68020/68881 at first for the 64-bit JIT test. Maybe the problem comes from a 68k FPU exception problem.

UAE generates some files that are not in your source (gencomp), which determine which 68k instructions work with JIT, so everybody who can compile can help narrow down the problem. To find what the problem with the FPU is, it is useful to disable everything else and first check only whether the fmove instructions work OK. If so, then enable the fbne instructions, and then more and more. Once the failing instruction can be seen, a 68k test program can be written that verifies the instructions that make WinUAE crash.

If somebody can upload the UAE-generated files, I can show in the source how the deactivation can work.
It is easier, more flexible and gives much faster compiles when the generated files are modified directly. But of course gencomp should then not be run, because it overwrites the hand-modified generated files. Whether only the fmove instructions are JIT-compiled can be seen by looking at the code at emit_xxxxx. Instructions that are handled by the UAE CPU emulation can be recognized because they contain a jump.

FrodeSolheim 20 September 2015 22:43

In the latest commits in the "future" branch:
- Fixed the font rendering issue with the 32-bit FPU JIT
- The uae_compfpu setting defaults to 0 on x86-64 until some issues are fixed.

(I have already fixed some 64-bit FPU JIT crashes locally, so AmiKit almost boots now with the 64-bit FPU JIT enabled, but I'll want to look more at it before pushing any commits).

EDIT: future branch is merged into master (and removed).

bernd roesch 21 September 2015 10:15

Have you built a 64-bit Windows version of FS-UAE? I can then test with HD-Rec and FPU test programs (that check rounding) whether everything works OK, and maybe find the problematic FPU instructions.

What was the problem with 64-bit FPU support, in general?

There is a define

USE_X86_FPUCW

This is normally used. Is it enabled or disabled in your 64-bit builds?
I do not remember the FPU JIT ever working really OK when FPUCW is disabled. A long time ago I tried a build without it, and I noticed that the FPU worked wrongly in many programs. I did not search for why. x86 uses different default rounding than 68k.

And by the way, don't forget that a test program always needs to run its code in a loop with at least two passes; only then is the JIT code executed. That is important to know, because the test programs need to loop. But correct operation can be checked this way: the first pass outputs the values from the UAE interpreter, and the second pass outputs the values from the JIT code. Output both values and look at the difference; it shows the error.
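A minimal sketch of such a two-pass test, written in portable C for readability. It assumes, as described above, that the first pass runs in the interpreter and the second in JIT code; whether two passes are really enough depends on the emulator. The operation `scale_and_add` is only a placeholder for the FPU instruction under test.

```c
#include <string.h>

/* Placeholder for the floating-point operation being checked (assumption). */
double scale_and_add(double x)
{
    return x * 1.5 + 0.25;
}

/* Runs op twice in a loop and stores both results in out[0] and out[1].
 * Returns nonzero if the two results differ bit for bit. */
int passes_differ(double (*op)(double), double input, double out[2])
{
    volatile double in = input;   /* keep the compiler from folding the call away */

    for (int pass = 0; pass < 2; pass++)
        out[pass] = op(in);

    return memcmp(&out[0], &out[1], sizeof(double)) != 0;
}
```

Comparing with `memcmp` rather than `==` also catches differences in the last bits that a printed decimal value might hide.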

jbl007 21 September 2015 10:48

Quote:

Originally Posted by FrodeSolheim (Post 1042214)
- Fixed the font rendering issue with the 32-bit FPU JIT

Yes, they are. I made a visual diff of 2 screenshots (old-jit, new-jit). I also generated some images with ChaosPro. No difference visible.

I exported a demo project with HD-Rec. Waveforms are different. This is visible in an audio editor using a very high zoom level. The difference is not audible, as far as I can trust my ears. :) So precision for real world performance might be good enough.
BTW. Results of old-jit/no-jit are identical (checksums match).

Edit: I tested the 32bit versions. Looks like this is off topic in this thread now. ;-)

bernd roesch 21 September 2015 18:27

If the HD-Rec output is different, this looks like a rounding problem.

HD-Rec calculates everything in float and then converts to integer.

By default, the 68k FPU rounds the way everybody does when converting to integer: 1.51 is rounded to 2, 1.49 is rounded to 1, and 0.99 is rounded to 1.
But x86 by default rounds only 0.99 to 1.
Maybe the exception handler in the Windows/Linux part changes FPUCW and does not restore the rounding mode in the FPUCW register when it leaves.
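A small sketch of how one could check for the suspected problem. In the x87 control word, bits 10-11 are the rounding-control field (0 = nearest, 1 = down, 2 = up, 3 = toward zero). The helper names are hypothetical, and reading the real control word (for example with FNSTCW) is left out; the functions only compare two saved values.

```c
#include <stdint.h>

/* Values of the x87 rounding-control (RC) field. */
enum x87_rc { RC_NEAREST = 0, RC_DOWN = 1, RC_UP = 2, RC_TRUNCATE = 3 };

#define X87_RC_SHIFT 10
#define X87_RC_MASK  (3u << X87_RC_SHIFT)

/* Extract the rounding mode from a control word. */
unsigned x87_get_rc(uint16_t cw)
{
    return (cw & X87_RC_MASK) >> X87_RC_SHIFT;
}

/* Return cw with only the rounding mode replaced. */
uint16_t x87_set_rc(uint16_t cw, unsigned rc)
{
    return (uint16_t)((cw & ~X87_RC_MASK) | ((rc & 3u) << X87_RC_SHIFT));
}

/* Nonzero if the rounding mode changed between two saved control words,
 * e.g. one taken before and one after an exception handler ran. */
int x87_rc_clobbered(uint16_t before, uint16_t after)
{
    return x87_get_rc(before) != x87_get_rc(after);
}
```

With the control word 0x037F that FINIT loads, `x87_get_rc` returns `RC_NEAREST`, and switching it to `RC_TRUNCATE` gives 0x0F7F.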

SaphirJD 21 September 2015 19:48

What.... 64 Bit Jit?

That is AMAZING :D Can't wait to test it :D

FrodeSolheim 21 September 2015 20:12

Quote:

Originally Posted by jbl007 (Post 1042264)
BTW. Results of old-jit/no-jit are identical (checksums match). Edit: I tested the 32bit versions. Looks like this is off topic in this thread now. ;-)

I am also interested in changes in behavior for the 32-bit JIT :) So for now, we will define this as "on topic". I'll make a note of this, and perhaps/probably need more testing assistance later. Of course, if someone (tm) could find a minimal example of different FPU behavior that would be ideal (say a dedicated Amiga test program running just a few floating point operations with differing behavior). Probably wishful thinking, I know :)

Quote:

Originally Posted by bernd roesch (Post 1042257)
Have you built a 64-bit Windows version of FS-UAE? I can then test with HD-Rec and FPU test programs (that check rounding) whether everything works OK, and maybe find the problematic FPU instructions.

Coming soon :) But to be clear, there is no point in testing the 64-bit JIT FPU yet. I need to investigate and fix known issues first, and then - when it (looks like it) works for me - I'll enable the JIT FPU by default in the development releases.

Quote:

Originally Posted by bernd roesch (Post 1042257)
What was the problem with 64-bit FPU support, in general?

There are several issues, but mostly, 64-bit issues (in general, not specific to the FPU) are related to loading/storing from host memory via 32-bit pointers. Using 32-bit addresses works fine (as long as memory is mapped below 0xffffffff), but care is needed to make sure the addresses are zero-extended and not sign-extended. The latter is the most common cause of crashes.
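A minimal sketch of the difference, using two hypothetical helper functions rather than anything from the JIT source. The sign-extending variant relies on the usual two's-complement behaviour when converting to `int32_t`, which is what mainstream compilers do.

```c
#include <stdint.h>

/* Correct widening: the upper 32 bits become zero. */
uint64_t widen_zero_extended(uint32_t addr)
{
    return (uint64_t)addr;
}

/* The pitfall: going through a signed 32-bit type copies bit 31 into the
 * upper 32 bits, so any address at or above 0x80000000 turns into a
 * pointer far outside the mapped range. */
uint64_t widen_sign_extended(uint32_t addr)
{
    return (uint64_t)(int64_t)(int32_t)addr;
}
```

For the address 0x80001000 the first function gives 0x0000000080001000 and the second gives 0xFFFFFFFF80001000. For addresses below 0x80000000 both agree, which is why this kind of bug can stay hidden until memory happens to be mapped in the upper half of the 32-bit range.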

The latest FPU issue/crash I investigated was related to fldcw_m_indexed. If all the 32-bit x86 registers were already allocated, and the JIT register allocation routine chose one of the R8..R15 registers on x86-64, this would fail horribly, since FLDCW cannot be used with these registers (issue is fixed).
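Some background on why R8..R15 need different handling from the eight legacy registers: in x86-64 machine code, the ModRM and SIB bytes only have three bits per register, and the fourth bit lives in a separate REX prefix byte. The sketch below illustrates that general encoding rule with hypothetical helper names. It is not the fs-uae emitter, and it is only an assumption that this is connected to the fldcw_m_indexed failure.

```c
#include <stdint.h>

/* REX prefix byte needed for a memory operand [base + index*scale],
 * or 0 if both registers are among the eight legacy ones (0..7). */
uint8_t rex_for_mem_operand(unsigned base_reg, unsigned index_reg)
{
    uint8_t rex = 0x40;            /* fixed high nibble of every REX prefix */

    if (index_reg >= 8)
        rex |= 0x02;               /* REX.X extends the SIB index field */
    if (base_reg >= 8)
        rex |= 0x01;               /* REX.B extends the base field */

    return rex == 0x40 ? 0 : rex;
}

/* The three bits that actually go into the ModRM/SIB byte. */
uint8_t reg_low3(unsigned reg)
{
    return (uint8_t)(reg & 7);
}
```

Because `reg_low3(8)` is 0, an emitter that writes R8 into a SIB byte without the prefix ends up addressing through RAX instead.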

Quote:

Originally Posted by bernd roesch (Post 1042257)
There is a define USE_X86_FPUCW. This is normally used. Is it enabled or disabled in your 64-bit builds?

Yes, this is enabled.

bernd roesch 22 September 2015 10:15

Good. I see in the JIT source that more registers are used, even though it does not work as before. I think for the first tests you should not use more registers than the old JIT did; only when that works, use more. This makes problems easier to find, since there may be more things that do not work with more registers. And when everything works, then if it fails after you increase the number of registers, you can see that there is another problem related to the extra registers. I think on x64 all the special instructions that are seldom used do not work with R8-R15.

I guess this also applies to the FPU flags (which track greater/smaller for FPU compares), and maybe some others. The JIT needs to load and store this register too.


All times are GMT +2.
