08 December 2013, 20:08 | #1 |
Registered User
Join Date: Aug 2004
Location:
Posts: 3,344
|
Experimental/testing builds of WinUAE 2.7.0 with VS 2013
I have created experimental/testing builds of WinUAE 2.7.0 using Visual Studio 2013. (The official WinUAE binary is built with Visual Studio 2010.)
~14.5MB archive (nine executables and source code): 4-s-h-a-r-e-d. (Copy and paste the link into your browser, change hxxp to http and remove the - symbols before pressing Enter.)
These are not official builds, so don't bug Toni if they don't work properly; post here instead. They should be compatible with Windows XP SP3 and later. Please let me know if they crash or otherwise misbehave on your XP SP3 system.
The source code is unmodified, but the built-in AROS ROM is probably an earlier version than the one in the official build. (I used aros.rom.cpp_AROS-20131028.zip as uploaded by acd2001; Toni doesn't include aros.rom.cpp in the official source archive.)
They might be slightly faster than the official build, assuming Microsoft improved their compiler between VS2010 and VS2013. However, any difference probably won't be huge, or perhaps even noticeable at all, and I haven't done any testing. A difference is most likely to show up when using a C-language CPU core (68030/040/060 with MMU) or when running a demo with effects that are very CPU-heavy to emulate.
There are speed- and size-optimised executables for x86, SSE, SSE2 and AVX. It probably makes sense to use the "best" one that your computer's CPU supports. I didn't add any CPU check code, so if you run one which uses instructions your CPU doesn't support, it will probably crash. Pentium III doesn't support SSE2 and Core 2 Duo doesn't support AVX, for example. There probably isn't much point using the size-optimised ones, but I compiled them to see how much difference optimising for size made (about 1MB).
Optimisation settings used:
Full Optimization (/Ox)
Inline Function Expansion: Any suitable (/Ob2)
Favor fast code (/Ot)
Omit Frame Pointers: Yes (/Oy)
Whole Program Optimization: Yes (/GL)
Also included as an experiment is a build (for SSE2) using the "fast" floating-point model (instead of precise). I'm not sure how that difference could affect emulation, so why not test it out?
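For anyone wondering what the missing CPU check could look like, here is a minimal sketch. It is not part of these builds or of WinUAE; the function name best_build and the returned strings are made up for illustration. It only decodes the feature bits that CPUID leaf 1 reports (SSE, SSE2, OSXSAVE, AVX) plus the XCR0 value, which also has to show that the OS saves the AVX registers. With MSVC the inputs would presumably come from __cpuid(regs, 1) (regs[2] is ECX, regs[3] is EDX) and _xgetbv(0), and _xgetbv should only be called when the OSXSAVE bit is set.

```cpp
#include <cstdint>
#include <string>

// Feature bits reported by CPUID leaf 1.
constexpr std::uint32_t kEdxSse     = 1u << 25;
constexpr std::uint32_t kEdxSse2    = 1u << 26;
constexpr std::uint32_t kEcxOsxsave = 1u << 27;
constexpr std::uint32_t kEcxAvx     = 1u << 28;
// XCR0 bits 1 and 2: the OS saves and restores XMM and YMM state.
constexpr std::uint64_t kXcr0SseAvx = 0x6;

// Hypothetical helper: picks the most capable build variant that is safe to run.
// ecx/edx are the CPUID leaf 1 registers; xcr0 is the XGETBV(0) result
// (pass 0 when the OSXSAVE bit is clear, because XGETBV is not usable then).
std::string best_build(std::uint32_t ecx, std::uint32_t edx, std::uint64_t xcr0) {
    const bool avx = (ecx & kEcxAvx) && (ecx & kEcxOsxsave) &&
                     ((xcr0 & kXcr0SseAvx) == kXcr0SseAvx);
    if (avx) return "avx";
    if (edx & kEdxSse2) return "sse2";
    if (edx & kEdxSse) return "sse";
    return "x86";
}
```

A launcher or a startup check could call something like this once and refuse to start (or pick another executable) instead of crashing on an unsupported instruction.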
Other possible ideas for building faster executables (any comments Toni?):
Last edited by mark_k; 01 December 2014 at 13:24. Reason: Updated link due to crazy EAB policy... |
08 December 2013, 21:51 | #2 |
Amiga 500 User
Join Date: Jun 2013
Location: EU
Posts: 1,508
|
My config is: WinXP SP3 (regularly updated), Pentium M 1600 o/c @2133MHz (16x133), DDR2-533 @266 2x512 Dual Channel (128-bit), GeForce 6800
All except the AVX versions start and run fine on my config! I have only tested the speed versions, using the demo from this problem thread: http://eab.abime.net/showpost.php?p=922806&postcount=21 (Sine Intro V1.0 (Intro) by Excel). Only the "speed_fastFP" version produces no sound crackling on my config: winuae2700_vs2013_sse2_speed_fastFP.exe ... no sound crackling !!! All the other versions produce sound crackling, some more and some less noticeable. I have also tested the game Jim Power in Mutant Planet and the demo Silents/Maximum Velocity on this "speed_fastFP" version and didn't notice any problems so far. So this "speed_fastFP" version may be the best one. It probably needs more testing, but I don't have much time for a more detailed investigation, so that will be all for now. |
09 December 2013, 16:54 | #3 | ||||
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,532
|
Quote:
Quote:
Quote:
Quote:
Last edited by Toni Wilen; 09 December 2013 at 17:09. |
||||
09 December 2013, 19:39 | #4 |
Amiga 500 User
Join Date: Jun 2013
Location: EU
Posts: 1,508
|
I have done some more tests and found an interesting case:
When running the demo "Cylindric Scroll (Demo) by Concept" -> http://janeway.exotica.org.uk/release.php?id=6403 the "sse2_speed" version is the fastest one (86-88% CPU usage in Task Manager) and "sse2_speed_fastFP" is the slowest one (88-90% in Task Manager) !!! |
09 December 2013, 20:14 | #5 |
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,532
|
http://www.winuae.net/files/b/winuae2.zip is MSVC 2013 (same optimizations as in OP) + SSE2 + __fastcall compiled.
|
09 December 2013, 20:26 | #6 |
Amiga 500 User
Join Date: Jun 2013
Location: EU
Posts: 1,508
|
Well, winuae2.zip is somewhere in between "sse2_speed" and "sse2_speed_fastFP" when tested on the "Cylindric Scroll (Demo) by Concept" demo ...
so "sse2_speed" is still the fastest one, at least on my PC config. (It also still produces sound crackling on Sine Intro V1.0 (Intro) by Excel.) EDIT: On the Jim Power title screen the fastest is the "sse2_speed_fastFP" version! Last edited by amilo3438; 09 December 2013 at 21:10. |
09 December 2013, 21:02 | #7 | |||
Registered User
Join Date: Aug 2004
Location:
Posts: 3,344
|
Quote:
I might try using PGO, but only "train" it with one or two specific demos, then see if there's any noticeable change in CPU usage. Any recommendations for demos which need a lot of CPU power to emulate? I wonder whether it might be worth using #pragma float_control(...) for functions which don't require extreme accuracy (audio resampling maybe???) but forcing functions which should be as accurate as possible (like 68881/2 emulation) to use the precise behaviour. Quote:
Regarding the sound crackling with Sine Intro V1.0, if you change WinUAE audio settings does the crackling disappear or lessen? Maybe change sampling rate from 44100 to 48000 (or vice versa), set stereo separation to 100%, use a different interpolation method. Quote:
|
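To make the #pragma float_control(...) idea from the post above a bit more concrete, here is a minimal sketch. It is not WinUAE code: resample_linear is a made-up stand-in for "a function that doesn't need extreme accuracy". The pragma is MSVC-specific, so it is wrapped in an _MSC_VER check, and it assumes the project itself is built with the precise model, so that anything outside the push/pop pair (68881/2 emulation, for example) keeps the precise behaviour.

```cpp
#include <cstddef>
#include <vector>

#ifdef _MSC_VER
#pragma float_control(precise, off, push)  // relax FP rules for the code below (MSVC only)
#endif

// Hypothetical example routine: linear-interpolation resampling of a sample buffer.
std::vector<float> resample_linear(const std::vector<float>& in, std::size_t out_len) {
    std::vector<float> out(out_len, 0.0f);
    if (in.empty() || out_len == 0) return out;
    if (in.size() == 1 || out_len == 1) {
        // Nothing to interpolate between: repeat the first sample.
        for (std::size_t i = 0; i < out_len; ++i) out[i] = in[0];
        return out;
    }
    const float step = static_cast<float>(in.size() - 1) / static_cast<float>(out_len - 1);
    for (std::size_t i = 0; i < out_len; ++i) {
        const float pos = static_cast<float>(i) * step;
        const std::size_t idx = static_cast<std::size_t>(pos);
        if (idx >= in.size() - 1) {          // at or past the last input sample
            out[i] = in.back();
            continue;
        }
        const float frac = pos - static_cast<float>(idx);
        out[i] = in[idx] + frac * (in[idx + 1] - in[idx]);
    }
    return out;
}

#ifdef _MSC_VER
#pragma float_control(pop)                 // restore the project-wide FP model
#endif
```

The opposite arrangement (project built with the fast model, accuracy-critical functions wrapped in a precise section) should also be possible; which one is less work depends on how many functions fall on each side.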
|||
09 December 2013, 21:15 | #8 |
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,532
|
I had to recompile all of them with same settings. Boring but this time it worked. fastcalllibs.zip in same place as winuae2.zip.
|
09 December 2013, 21:16 | #9 | |
Amiga 500 User
Join Date: Jun 2013
Location: EU
Posts: 1,508
|
Quote:
EDIT: Final conclusion: on my PC config, "sse2_speed_fastFP" is the fastest version on:
Sine Intro V1.0 (Intro) by Excel ... (doesn't produce crackling sound)
Jim Power title screen ... (lowest CPU usage in Task Manager)
but the slowest on:
Cylindric Scroll (Demo) by Concept ... (highest CPU usage)
... there must be something internal to WinUAE that makes it use so much CPU power here, maybe some loop that it executes more often than the other versions do. I don't know, it's just a guess! ... which would then mean it's still the fastest version!
EDIT: Maybe "winuae2.zip" built with the "fast" floating-point model (like "sse2_speed_fastFP") would be the best one!
EDIT2: I have also noticed that running demos on "winuae2.zip" somehow produces slightly sharper scrolling text on my laptop LCD screen ... e.g. in the "PartyDemo/Ghostriders" demo the logo is for some reason less blurred on "winuae2.zip" than on any previous WinUAE test version !!!
Last edited by amilo3438; 12 December 2013 at 00:19. |
|
05 May 2014, 22:26 | #10 |
Zone Friend
Join Date: Apr 2005
Location: London
Posts: 1,176
|
Can someone with the necessary software compile WinUAE 2.8.0 with SSE2, if it is not too much trouble?
Even a few extra fps would be welcome on my old 50Hz-capable laptop. Better still, can I compile it myself? Would Visual Studio Express 2013 work? |
11 May 2014, 13:44 | #11 | |
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,532
|
Quote:
I don't bother with 2.8.0 anymore, but I can create an SSE2 version of the next beta, but only if you or someone else will do speed tests and confirm whether there is any gain. |
|
12 May 2014, 02:49 | #12 |
Zone Friend
Join Date: Apr 2005
Location: London
Posts: 1,176
|
That's great, thanks, when it is time for the next beta I'll ask about an sse2 version, and do any speed tests requested.
Thanks also for the tip about vs2013 express, I'll try it. |
12 May 2014, 11:43 | #13 |
Moderator
Join Date: Jan 2003
Location: ...
Age: 52
Posts: 1,838
|
The biggest gain you could possibly get is using a native x64 build as the architecture is very different, with lots of registers (no pressure) etc.
|
12 May 2014, 19:56 | #14 |
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,532
|
"Proper" x64 build would also use 64-bit integers where it would increase performance, like planar to chunky conversion and other parts of emulation that do "wide" operations divided in multiple 32-bit pieces.
Not really worth the trouble until everything (=JIT) is 64-bit compatible. |
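As a rough illustration of the "wide operations" point above, here is a minimal planar to chunky sketch that handles 8 pixels per step with a single 64-bit integer. It is not WinUAE's implementation; the function names are invented, and a real version would most likely replace spread_bits with a 256-entry lookup table. In a 32-bit build the same 64-bit OR/shift has to be split into two 32-bit halves, which is the kind of work a native x64 build would avoid.

```cpp
#include <array>
#include <cstdint>

// Spread the 8 bits of one bitplane byte into the 8 bytes of a 64-bit word.
// Pixel i (0 = leftmost, taken from bit 7 of the byte) lands in bits 8*i .. 8*i+7.
std::uint64_t spread_bits(std::uint8_t b) {
    std::uint64_t v = 0;
    for (int i = 0; i < 8; ++i)
        v |= static_cast<std::uint64_t>((b >> (7 - i)) & 1u) << (8 * i);
    return v;
}

// Convert 8 pixels. plane_bytes[p] holds bit p of each pixel's colour index;
// planes must be between 0 and 8.
std::array<std::uint8_t, 8> planar_to_chunky8(const std::uint8_t* plane_bytes, int planes) {
    std::uint64_t chunky = 0;
    for (int p = 0; p < planes; ++p)
        chunky |= spread_bits(plane_bytes[p]) << p;  // one 64-bit OR per bitplane
    std::array<std::uint8_t, 8> out{};
    for (int i = 0; i < 8; ++i)
        out[i] = static_cast<std::uint8_t>(chunky >> (8 * i));
    return out;
}
```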
03 June 2014, 20:38 | #15 |
Zone Friend
Join Date: Apr 2005
Location: London
Posts: 1,176
|
Is now a good time to test a beta version compiled for SSE2 against a normally compiled version? If so, what speed tests would you like done? Thanks
|
03 June 2014, 20:59 | #16 | |
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,532
|
Quote:
Tests: Fast CPU mode: Do not use SysInfo! A better idea is to time (use a stopwatch, do not use any emulated timers) some long (at least 10s), easily repeatable operation, like lha compression (from RAM disk to RAM disk) or 3D rendering. AIBB tests that do not finish immediately also work, and so on. For A500/A1200 compatible mode: for example, run some (100% non-interactive) demos in warp mode from start to end and compare how long they take to finish. |
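If it helps with comparing the stopwatch readings described above, here is a small sketch for turning several timed runs per build into one number. The helper names are made up, and taking the median of repeated runs is just one reasonable assumption for smoothing out stopwatch error; it is not something requested in the thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Median of several stopwatch readings in seconds (0.0 for an empty list).
double median_seconds(std::vector<double> runs) {
    std::sort(runs.begin(), runs.end());
    const std::size_t n = runs.size();
    if (n == 0) return 0.0;
    return (n % 2) ? runs[n / 2] : (runs[n / 2 - 1] + runs[n / 2]) / 2.0;
}

// Percentage by which the candidate build finished sooner than the baseline.
// A negative result means the candidate was slower.
double percent_faster(double baseline_s, double candidate_s) {
    return (baseline_s - candidate_s) / baseline_s * 100.0;
}
```

Feed it the median time of the normal build as baseline_s and the median time of the SSE2 build as candidate_s; a result within a few percent is probably indistinguishable from stopwatch error.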
|
05 June 2014, 01:49 | #17 |
Zone Friend
Join Date: Apr 2005
Location: London
Posts: 1,176
|
Thanks, I'll do some tests as directed with a stopwatch and report back.
|
06 June 2014, 16:24 | #18 |
Amiga 500 User
Join Date: Jun 2013
Location: EU
Posts: 1,508
|
I have run a speed test using the demo Coppermaster / Angels ... and the difference in speed is obvious, see the picture (as was expected).
But also, for some reason, it looks like the non-SSE2 version runs a little more stable than the SSE2 version. Last edited by amilo3438; 06 June 2014 at 18:30. |
09 September 2014, 22:26 | #19 |
Zone Friend
Join Date: Apr 2005
Location: London
Posts: 1,176
|
I did some tests as directed with a stopwatch and carefully noted the results. Then I lost my notes.
It doesn't matter: there was no difference in execution time in warp mode, aside from a couple of seconds per minute. I didn't look at CPU load during these tests. Perhaps my 2GHz CPU isn't 'slow' enough to show a difference with these tests. Thanks for providing the SSE2 exe to test with, it was interesting to do. |