English Amiga Board


Old 29 March 2011, 21:24   #1
Dr.Venom
Registered User
 
Join Date: Jul 2008
Location: Netherlands
Posts: 485
Is WinUAE optimized for lowest possible input lag?

Hi Toni,

I have a general question on input lag. I've been reading up on this topic with regards to emulation and have taken a few measures to lower it as much as possible on my WinUAE setup with Windows 7 64-bit.

The bottom line of my story below leads me to these two questions, which I hope you can answer:

  • Is WinUAE using 'DIRECTINPUT_VERSION 0x0500', or is it using another version? Apparently "Direct Input" of "Version 5(old)" generates the least amount of input lag in Windows. If WinUAE is using another version, could you enable Version5(old) as an option for us to test?
  • How do you feel about the two strategies proposed by Ootake to minimize possibilities of input lag with emulation on a PC?

Background:

I'm using my old Suzo Arcade joystick with WinUAE (through the excellent Stella adapter) and comparing the response in action games to playing them on my real Amiga. I've also set up my old 15 kHz Amiga monitor as a second monitor on my PC through Soft15khz and a VGA-->SCART cable (sold by wolfsoft in Germany). This gives me a quite accurate comparison of WinUAE to my real Amiga (A1200/060).

The measures I've taken to lower input lag as much as possible are:
  • In WinUAE use double buffering, v-sync off
  • set 'flip queue size' to zero for my ati radeon driver (with radeonpro utility)
  • overclock the USB port on which the joystick is set (shaving off ~additional 6-7 milliseconds)
Input response now seems very close to my real Amiga, but with very fast shoot 'em ups it still feels slightly more 'rubber band': for example in Apidya after a few speed-up power-ups, in X-Out, or in other fast shoot 'em ups, a real Amiga feels more precise than WinUAE. It's hard to test how much delay is still left in the PC, so I'm wondering whether it's only the 20ms of the double buffer or whether there is more delay from input drivers, emulation timing, etc. That last bit I'd like to find out, because maybe there is additional room to get even closer to "cycle exact" input? :-)

While reading up on ways to lower the input lag created by the PC system, I stumbled upon the page of the author of 'Ootake', a PC Engine emulator. He points to some interesting stuff that I thought might be of interest for WinUAE. His English is not so good, but I'll quote some of his findings below nonetheless.

Quote:
About the cause of becoming "Bigger Delay Problem"
In "Two strategies of Ootake" previously described, about one frame is prevented being delayed.

To my regret, there is a cause of "Big Delay Problem (about 2-4 frames)" that becomes a problem any more.
It is "Problem of DirectX(Direct Input) of Windows".

If "Direct Input of Windows" is not used well, a big delay is generated.
It seems to be different according to the library used. I am adding the following code for the development environment of Ootake (MinGW+DirectX library).

#define DIRECTINPUT_VERSION 0x0500

"Direct Input" of "Version 5(old)" is used. As a result, the least input delay has been achieved.
(confirm the operation on WindowsXP)

When the version above "Version 5(old)" is used, a big input delay is generated in Ootake, too.
There is a possibility that the person who had made "Direct Input of an old version" loved playing a game more. Or, there is a possibility that the delay occurs when the library is old.

To our regret, there are a lot of emulators where this delay problem occurs.
The problem is caused in "VirtuaNES (Excellent NES emulator)", too.
The problem is caused in "BuleMSX (Excellent MSX emulator)", too. *2008.9.20 It was considerably improved now (ver2.8 wonderful!).
These are very too good. In "Nestopia" and "ParaMSX(latest bata)", this problem is not caused. When the "action & shooting game" are played seriously and compared, the difference is remarkable. (In "VirtureNES" and "BuleMSX", if improved it is glad as one user.)

* The above-mentioned is a confirmation of operation in "WindowsXP SP2" environment.
Besides the DirectInput stuff Ootake has some general comments about optimizing emulation for lowest input lag. These are the following two "strategies":

Quote:
Hello. I am Kitao Nakamura producing "PC Engine" emulator "Ootake".
I have the request to the emulator authors. The request is a solution of "Delay Problem".

To my regret, the emulator with sticking to "Operation Sense of Joypad" is rare.

In the action game and the shooting game, the difficulty of the game goes up,
when there is "Delay Problem". This might be misunderstood , saying that "Became less capable"
and "It is not more interesting than old times".

Of course, it is likely to become less capable to age.
However, "Played with emulator with the delay problem" is the cause in most cases.
Even if humans becomes 30 years old or 40 years old, they do not become weak too much.

For instance, please play the high-score attack (2min mode) of "Super Star Soldier"
seriously by both "Ootake" and "MagicEngine" and compare.
In "Ootake", I can do play that exceeds 500,000 points also even by my arm.
However, in "MagicEngine", I felt that excess by 500,000 points might be absolutely impossible.
Because the joypad-reaction of "MagicEngine" is late.
(standard WindowsXP SP2 environment)

This greatly influences only no "Difference of the score" it and "Happiness of the game", too.
Honestly, even if "Super Star Soldier" is played by "MagicEngine",
happiness of "Original Super Star Soldier" cannot be tasted.

To our regret, also in the commodity (emulator used) for the active service game machine,
the commodity with big "Delay Problem" exists.
(I think that the manufacturer that puts the title with big "Delay Problem" on the market
doesn't have love to the game at all.)

The delay also has danger of making it to the one to which even "Evaluation of the game"
was mistaken. Therefore, "Ootake" is checked always severely
"Whether it is possible to enjoy it by the sense similar to a real machine or not?".
Because it is a respect of minimum to a past masterpiece "PC Engine(TG16)" game.

Method of "Delay Problem" solution executed by "Ootake"
In "Ootake", the "Delay Problem" is solved(reduced) by using two strategies.

The First Strategy...
When "Instruction that sees the state of joypad" comes, "Latest state of the Windows Joypad Input" is always acquired.

In a lot of emulators, "State of Windows Joypad Input" is acquired only once in one frame(1/60 seconds). As a result, if the joypad is input according to the timing immediately after acquisition, about one frame input will be delayed.

It can be solved by the above-mentioned strategy.
However, this strategy makes the emulation processing considerably heavy.

In "Ootake" for that, if "Instruction that sees the state of joypad" comes continuously, the frequency that "State of Windows Joypad Input" acquires is limited.

Concretely, when "State of Windows Joypad Input" is acquired, "Processed scanning lines No." is recorded. And, when the following "Instruction that sees the state of joypad" comes, if "Be still processing it at same scanning lines No.", the acquisition of "State of Windows Joypad Input" is omitted.

This method makes the reaction early enough, and processing doesn't become heavy too much.
* However, the emulator certainly becomes heavy for this processing. However, I think that it is necessary processing to "The happiness of the game is not ruined".
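
A minimal sketch of the first strategy as the quote describes it, written in Python rather than taken from Ootake's source. The host joypad is read whenever the emulated program reads the pad, but at most once per emulated scanline. The class name and the `read_host_joypad` callback are hypothetical stand-ins, not Ootake or Windows APIs.

```python
from typing import Callable, Optional


class ScanlineCachedJoypad:
    """Poll the host pad on demand, but at most once per emulated scanline."""

    def __init__(self, read_host_joypad: Callable[[], int]) -> None:
        # read_host_joypad is a hypothetical callback returning a button bitmask.
        self._read_host = read_host_joypad
        self._last_line: Optional[int] = None  # scanline of the last real poll
        self._state = 0
        self.host_polls = 0  # number of real (expensive) host reads so far

    def new_frame(self) -> None:
        # Forget the cached line so the same line number polls again next frame.
        self._last_line = None

    def read(self, scanline: int) -> int:
        """Called whenever the emulated CPU reads the joypad register."""
        if scanline != self._last_line:
            self._state = self._read_host()
            self._last_line = scanline
            self.host_polls += 1
        return self._state


if __name__ == "__main__":
    samples = iter([0b0001, 0b0011, 0b0010])
    pad = ScanlineCachedJoypad(lambda: next(samples))
    # Two reads on line 10 share one host poll; line 11 triggers a new one.
    print(pad.read(10), pad.read(10), pad.read(11), pad.host_polls)
```

The cache key is only the scanline number, so the emulator has to call `new_frame()` once per frame; otherwise a read on the same line of the next frame would return stale state.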


The Second Strategy...
The emulated processing is delimited by "about 1/240 seconds (1/60 seconds are divided into four)" unit. As a result, operation approaches "State of the passage of time in a real machine".

When the emulator works by "PC that there is room at the processing speed", it becomes as shown in figure below (Flow of 1/60 seconds).

->[ Emulate Processing ]->[ Wait V-Sync Processing (Rest) ]->[Draw Processing]->(To head)

In a word, the time of "Rest until the V-Sync signal comes" becomes long. (Oppositely, in "PC that there is no room in performance", this "Rest" decreases, and the time of "Emulate Processing" becomes long.)

At the time of this "Rest", the input judgment of joypad cannot be done. As a result, the period when joypad can be input narrows. It causes to be input delaying one frame.

Then, the method like the figure below (flow of 1/60 seconds) is executed in "Ootake".

->[Emu.]->[Rest]->[Emu.]->[Rest]->[Emu.]->[Rest]->[Emu.]->[Rest]->[Draw]->(To head)

In a word, the "Rest" is put every about 1/240 seconds (Divide without taking a rest to one degree). As a result, the flow of the passage of time near a real machine is made. The input judgment of joypad is prevented being delayed by this method.

Moreover, not only the reaction of the joypad but also the accuracy of emulation(Timer interruption etc.) rises by this processing.

Especially, this is important in the reproduction of the tone of PSG (wavy memory + noise) sound.
* In "Ootake", when "Light PSG" is chosen by "Volume" menu, it becomes usual operation every 1/60 seconds.
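
The second strategy above can be sketched roughly as follows. This is an illustration under assumptions, not Ootake's code: `emulate_slice` and `draw` are hypothetical callbacks, the frame is taken to be exactly 1/60 s, and the clock and sleep functions are injectable so the loop can be exercised without real waiting.

```python
import time
from typing import Callable, List

SLICES_PER_FRAME = 4          # the quote's "about 1/240 seconds" at 60 Hz
FRAME_SECONDS = 1.0 / 60.0


def run_frame(emulate_slice: Callable[[int], None],
              draw: Callable[[], None],
              clock: Callable[[], float] = time.perf_counter,
              sleep: Callable[[float], None] = time.sleep) -> List[float]:
    """Run one frame as [Emu][Rest] x 4 followed by [Draw].

    Returns the time (seconds since frame start) at which each slice began.
    """
    start = clock()
    slice_starts = []
    for i in range(SLICES_PER_FRAME):
        slice_starts.append(clock() - start)
        emulate_slice(i)      # hypothetical: emulate 1/4 of the frame, polling input
        deadline = start + (i + 1) * FRAME_SECONDS / SLICES_PER_FRAME
        remaining = deadline - clock()
        if remaining > 0:
            sleep(remaining)  # the "Rest" is spread out instead of taken in one block
    draw()
    return slice_starts
```

Compared with emulating the whole frame in one burst, each slice starts close to the wall-clock time it represents, so an input change arriving mid-frame can be seen by the next slice instead of waiting for the next frame.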
Thanks and best regards

Last edited by Dr.Venom; 06 April 2011 at 21:58.
Old 29 March 2011, 21:36   #2
Retro-Nerd
Missile Command Champion

 
Join Date: Aug 2005
Location: Germany
Age: 50
Posts: 12,268
Quote:
How do you feel about the two strategies proposed by Ootake to minimize possibilities of input lag with emulation on a PC?
I think this was a lot of nonsense. None of his suggestions improved anything input-lag related in Ootake+Windows 7. The controls work as well/badly as in all other emulators: a bit less precise than on real hardware.

Last edited by Retro-Nerd; 29 March 2011 at 21:52.
Old 30 March 2011, 08:46   #3
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 47
Posts: 25,382
I'll have to agree. IMHO too much generalization without any real proof. Probably only specific to Windows XP or used drivers or hardware or..

WinUAE checks input at least 4 times/frame. (it is pseudo-random to handle programs that read mouse/joystick more than once/frame)

Try setting WinUAE Display panel buffering to "no buffering". It may cause other issues but it should also decrease lag.
Old 31 March 2011, 09:39   #4
Minuous
Coder/webmaster/gamer
 
Join Date: Oct 2001
Location: Canberra/Australia
Posts: 2,375
It just goes to show, Windows has been getting gradually worse for the past 10 years or so. There's not much Toni or I can do, as emu coders, about these kinds of host-side issues; they need fixing at Microsoft's end.
Old 31 March 2011, 10:02   #5
Hewitson
Registered User
 
Join Date: Feb 2007
Location: Melbourne, Australia
Age: 39
Posts: 3,747
For the past 10 years? 3.1 was the last decent version as far as I'm concerned.
Old 31 March 2011, 10:27   #6
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 47
Posts: 25,382
That stupid "here we go again" OS discussion stops now.
Old 03 April 2011, 16:19   #7
Dr.Venom
Registered User
 
Join Date: Jul 2008
Location: Netherlands
Posts: 485
Hi Toni,

Thanks for your reply. It's good to know WinUAE already checks input status multiple times a frame. Can you briefly explain the "at least 4 times a frame pseudo randomly" part: why is it "at least", and what determines how many times a frame it actually polls the input state?

I've tested the "no buffering option". This gives the fastest input response but also leads to screen tearing, which makes me prefer the double buffer for now.

I've done some additional testing, which also gives rise to some additional questions. This is a bit technical but I hope you'll bear with me.

I found out that the input response seems to be dependent on the specific timings used in the screenmode:
  1. lag/floating movement is introduced when the vertical refresh deviates too much from 50 Hz
  2. and/or (possibly, I'm not fully sure) when there is a mismatch in the horizontal line time, vertical frame time and porch/sync timings versus a real Amiga.
I would like to know if I can enhance the timing specifications in the current modeline I use to get as close as possible to true Amiga/WinUAE PAL output. I also have a question/suggestion which might get the input lag to very low (i.e. no-buffering) in combination with no screen tearing. Here goes:
  1. What is the exact vertical refresh of the video output of a PAL Amiga? Is it 50hz, or (rounded) 50,08 hertz. The last one would be the case for a real Amiga driving a TV in progressive mode with 312 total lines and horizontal line time of 64us (microseconds), or would it not?
  2. What is the exact vertical refresh rate of WinUAE? Is it exact 50hz or 50,08hz or is it varying a bit because of cycle-exact issues?
  3. For the timings, can you help with the following:
    • What is horizontal line time (in microseconds). Is it 64 us (microseconds)?
    • How is horizontal line time built up from pixel clock and total pixels? Is the pixel clock 14,28571 MHz and is the total pixel number per line 916?
    • How long are the horizontal front- and backporch and sync time in microseconds?
    • If I translate those to 'pixel time' at the specified pixelclock (14,29Mhz), at what horizontal pixel number does the horizontal sync start and how many pixels does it take? Are these values 767 and 67?
    • For vertical specs: What are the total number of lines, is it 312 or 313?
    • How long is active frame time, is it 288 lines?
    • How long is the vertical blanking time? Is it 20 lines?
    • How long is the vertical sync time? Is it 8 lines?
    • Where does the vertical sync start, is it at line number 290? Comparing screen positioning with my real amiga seems to suggest this.
    • How many lines are the vertical front and back-porches?
  4. Final question ( the more tricky part? )
The modeline in combination with the way soft15khz / the graphic card driver handles it seems to round the specified pixelclock (14,28571Mhz) to two decimals (14,29) and probably gives rise to additional roundings somewhere, which makes actual refresh time slightly different from the theoretical requested time.

As example, the current modeline I use is:

modeline "736x288@49,995" 14,28571 736 767 834 916 288 290 298 312 -hsync -vsync

Theoretically this should give 49,995hz. But with the tool Freqtest (http://www..com/?lycrjcm55j37n) you can test for the exact vertical refresh rate, which comes out at a real refresh rate of 50,028087hz.

Would it be possible to feed this measured real refresh rate back into WinUAE, as long as it is within certain boundaries (e.g. allowing adjustments to the pixelclock rate in WinUAE slightly - within the timing specification range of the real analog timings) to get an EXACT match of the WINUAE output and the real screen refresh? Then by setting the display settings to "no buffering" AND enable vsync, would we then not get the closest thing possible to a real amiga, i.e. lowest possible input lag ("no buffering"), but with steady output display, like a real amiga because of the v-sync and EXACT match in refresh rate?

It might be a bit far fetched, but I would love to hear your thoughts on this.


Background information on the modeline
(you may already be familiar with this).

Soft15Khz creates custom screenmodes based on what is called a "modeline". This modeline specifies both the horizontal and vertical active pixel time, front and back porch time and the horizontal and vertical sync time (horizontal and vertical blanking). The modeline specifies it in the following format:

Modeline <NAME> <PIX_FREQ> <H_AKTIV> <H_START> <H_END> <H_TOTAL> <V_AKTIV> <V_START> <V_END> <V_TOTAL> <OPTIONS>

PIX_FREQ stands for the Pixel Frequency, in MHz. Important to know is that the frequency actually seems to get rounded to 2 decimals by the PC, so e.g. 14,28571Mhz seems to get rounded to 14,29.

H_AKTIV stands for the active pixels horizontally.
H_START stands for the start of the sync pulse on the horizontal line.
H_END stands for the end of the sync pulse on the horizontal line.
H_TOTAL stands for the total pixels on the horizontal line.

V_AKTIV stands for the active lines vertically.
V_START stands for the start of the sync pulse vertically.
V_END stands for the end of the sync pulse vertically.
V_TOTAL stands for the total number of lines vertically.
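
As a rough illustration of the format above, here is a small Python sketch that parses a modeline and derives the nominal line rate and refresh rate from it, using refresh = pixel clock / (H_TOTAL × V_TOTAL). The `Modeline` class and `parse_modeline` function are made-up names for this example. The parser assumes the mode name contains no spaces and accepts a decimal comma in the pixel clock, as used in this thread.

```python
from dataclasses import dataclass


@dataclass
class Modeline:
    name: str
    pixel_clock_mhz: float
    h_active: int
    h_sync_start: int
    h_sync_end: int
    h_total: int
    v_active: int
    v_sync_start: int
    v_sync_end: int
    v_total: int

    @property
    def line_rate_khz(self) -> float:
        # Lines per second = pixels per second / pixels per line.
        return self.pixel_clock_mhz * 1000.0 / self.h_total

    @property
    def refresh_hz(self) -> float:
        # Frames per second = pixels per second / pixels per frame.
        return self.pixel_clock_mhz * 1_000_000.0 / (self.h_total * self.v_total)


def parse_modeline(text: str) -> Modeline:
    """Parse 'modeline "<name>" <clock> <8 integers> [options]'."""
    fields = text.split()
    if len(fields) < 11 or fields[0].lower() != "modeline":
        raise ValueError("not a modeline: " + text)
    name = fields[1].strip('"')
    clock = float(fields[2].replace(",", "."))  # accept a decimal comma
    numbers = [int(f) for f in fields[3:11]]
    return Modeline(name, clock, *numbers)


if __name__ == "__main__":
    mode = parse_modeline(
        'modeline "736x288@49,995" 14,28571 736 767 834 916 288 290 298 312 -hsync -vsync'
    )
    print(mode.line_rate_khz, mode.refresh_hz)
```

The computed value is only the nominal rate for the requested pixel clock; it can then be compared with what a measurement tool reports for the mode the driver actually sets.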

With the tool FrequencyTest (http://www..com/?lycrjcm55j37n) you can actually test what the PC returns exactly as refresh rate for each created screenmode. This gives the ability to then tweak the modeline (and can be quite a time consuming effort...)


Quote:
Originally Posted by Toni Wilen
I'll have to agree. IMHO too much generalization without any real proof. Probably only specific to Windows XP or used drivers or hardware or..

WinUAE checks input at least 4 times/frame. (it is pseudo-random to handle programs that read mouse/joystick more than once/frame)

Try setting WinUAE Display panel buffering to "no buffering". It may cause other issues but it should also decrease lag.

Last edited by Dr.Venom; 03 April 2011 at 16:30.
Old 04 April 2011, 16:49   #8
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 47
Posts: 25,382
I may be missing something because I don't see the point of all those questions..

Maybe this is what you wanted to know:

"Real world" and emulated hardware rates are completely separate. You can set "real world" rate whatever you want it to be without having any effect on accuracy of emulated hardware.

You won't gain much by matching real PAL/NTSC modes. At least as long as "real world" refresh rate is close enough so that sound pitch and perceived emulation speed feels good enough.

Internally emulation runs in bursts (when not in fastest possible mode): emulate one frame as fast as possible, show the frame, wait until one "real world" frame time has passed, emulate the next frame, and so on..

This makes regular input polling quite impossible, requiring some tricks to get smooth mouse counter changes during single frame instead of one or more sudden jumps.

Do I still need to answer all those questions? (I don't even know all answers) Get the Agnus datasheet from EAB file server, it contains PAL and NTSC timing diagram.

Some quick answers:

312 and 313 are both correct (software selectable in all Amiga models); in interlace mode it automatically toggles between 312 and 313. 313 is the default.

Visible lines: 26 to last line (PAL), 21 (NTSC)

Vblank start: last line
Old 05 April 2011, 21:54   #9
Dr.Venom
Registered User
 
Join Date: Jul 2008
Location: Netherlands
Posts: 485
Many thanks for answering and pointing me to the Agnus datasheet on the EAB fileserver, very helpful. Also good to know how the "real world" and the emulated hardware are related.

So no, you don't have to answer all those questions. But I do have another set of questions.

Some additional explanation
I'll try and explain what I was/am after. As said, I have attached my old Amiga monitor as a second monitor to the PC. This is specifically for displaying native ECS/AGA screenmodes (games and demos); Workbench runs in a window on my HD LED monitor. The Amiga monitor is driven through a 15 kHz screenmode on the PC side, inserted in the ATI driver through Soft15khz. This display setup gives a fabulous replication of the old Amiga feeling for games and demos, without giving up the benefit of running a state of the art HD Workbench setup on the LED PC monitor.

To get the same pixel width and timing of a real Amiga/Agnus driving an old CRT TV or Monitor directly, it's (for me) necessary to insert a screenmode on the PC-side that also has the exact same timing specifications as the real Agnus. That was one reason why I was asking all those questions. Exact emulation to my mind also comes down to the timing of the emulated hardware and its output to the "real world" behaving the same as real hardware would.

OK, so what I found in the Agnus datasheet were the missing pieces for creating a screenmode with the exact video timing specifications of a real Agnus. Without going too much in depth on the exact synchronization timings, the most important timings are:

total horizontal line time (in microseconds) and how this converts to pixelclock (pixeltime) and total number of pixels, and total number of lines. Pixelclock determines how "square" the pixels look on screen.

Taking those from the datasheet:
pixeltime = 70,484184 ns (= 0,070484184 microseconds (uS)), or 1/0,070484184 = 14,18758 MHz, the latter being the relevant pixelclock for the modeline
total number of pixels on a line = 908
This means total line time of 908 x 0,070484184 uS = ~63,99964 uS
Total number of lines vertically is 313 (the default; I wonder if there is any commercial software that actually changes this to 312, as I cannot think of a benefit)
Actual screen refresh rate of a PAL Amiga then comes out at ~63,99964 x 313 = 20031,887 uS per frame. Number of frames per second then comes out at 1000000/20031,887 = 49,92041 Hz.
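
The arithmetic above can be reproduced with a few lines of Python. The constants are the figures quoted in this post; the function name is made up for the example, and the 312 and 312,5 line cases are included only for comparison.

```python
PIXEL_TIME_NS = 70.484184   # pixel time as quoted in the post
PIXELS_PER_LINE = 908       # total pixels per line as quoted in the post


def pixel_clock_mhz() -> float:
    # 1 / pixel time, converted from 1/ns to MHz.
    return 1000.0 / PIXEL_TIME_NS


def pal_refresh_hz(total_lines: float) -> float:
    """Refresh rate for a frame of total_lines lines (312.5 models interlace)."""
    line_time_us = PIXELS_PER_LINE * PIXEL_TIME_NS / 1000.0
    frame_time_us = line_time_us * total_lines
    return 1_000_000.0 / frame_time_us


if __name__ == "__main__":
    print("pixel clock (MHz):", round(pixel_clock_mhz(), 5))
    for lines in (312, 312.5, 313):
        print(lines, "lines ->", round(pal_refresh_hz(lines), 5), "Hz")
```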

The Questions:

Question 1: a) When WINUAE display setting is set to vsync-off, to what timer does it sync screen refresh rate then? and b) is it actually outputting 49,92041 frames per second in that timer mode?

Question 2: Theoretically, if I would set WinUAE display to 'no-buffering' and WINUAE is outputting exact 49,92041 fps AND the "real world" monitor is running at a refresh rate of 49,92041 fps, would that not lead to an exact match and thus ZERO screen-tearing?

This would be a dream as we could have the no-buffering option enabled, thus having almost non-existent input lag on the joystick*, AND have no screen-tearing! That would make the WINUAE setup in combination with an old 15Khz monitor indistinguishable in both look and feel from the real setup! ... I.e. My Dream (tm)

* i.e. when the other sources of lag have been killed on the PC side, see my earlier post

OK, before I go further I need to stress that I would like to have v-sync off in all cases, as in my experience having this enabled leads to unacceptable (IMHO) input lag. Especially in combination with Direct3D vsync. In DirectDraw mode it actually doesn't seem to add to input lag, but unfortunately it leads to "wobbly" screen when used in combination with "no-buffering". I'm still a bit puzzled why the v-sync so adversely impacts the input lag.

Question 3: The Dilemma
So now you probably understand my dream, but there is a dilemma. Inserting a 15Khz screenmode into the ATI driver set, based on Agnus timing specification, leads to some very small roundings in the pixelclock, which makes the real world rate differ slightly from the requested refresh rate.

As example, the current screenmode that I use requests the exact Amiga timing, thus being 49,92041 Hz. But if I use the previously mentioned FreqTest to measure the real world refresh rate of the monitor, then it comes out at a slightly different 49,91769 Hz. THUS, dilemma! Because if WinUAE is outputting exactly the PAL Agnus rate, i.e. 49,92041 Hz, then we have a mismatch which will result in screen tearing.

The above is probably why in my current setup I have blazing fast input response with the no-buffering option enabled (vsync-off), but I still suffer from slight screen-tearing. There is a tear line very slowly rolling up the screen, going out of sight and then starting again at the bottom.

OK, so finally, given that the answers to the above questions are yes (which I would welcome ;-) ), here comes

Question 4: The Solution?
There already is a FPS slider in the display panel of WinUAE. But it can only be incremented in 1 frame per second. Would it be possible to make it such, or add a (config file) option, which makes VERY granular adjustment to the FPS (or refresh rate timing) possible, such that people (like myself) would be able to adjust the WinUAE screen refresh rate just that tiny bit to make it exactly match the "real world" refresh rate? As said, that would lead to the BIG benefit of being able to run WinUAE with the no-buffering option, i.e. least amount of input lag, AND benefit from screen output with no screen tearing. The best of all worlds and the most close to real hardware

E.g. my current real world rate is ~49,91769 Hz, so the slider should let me make an adjustment as fine as [49,91769] minus [49,92041] = -0,00272 fps.
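
To put that difference in perspective, here is a small Python sketch. It assumes both rates are perfectly constant, which is an idealization: with unbuffered output the tear line drifts by one full frame height every 1/|difference| seconds. The function names are made up for the example.

```python
def required_adjustment_hz(emulated_hz: float, display_hz: float) -> float:
    """Change to apply to the emulated rate so that it matches the display."""
    return display_hz - emulated_hz


def tear_roll_period_s(emulated_hz: float, display_hz: float) -> float:
    """Seconds for the tear line to travel once over the whole frame."""
    beat = abs(emulated_hz - display_hz)
    if beat == 0:
        return float("inf")  # rates match: the tear line never moves
    return 1.0 / beat


if __name__ == "__main__":
    emulated, measured = 49.92041, 49.91769
    print("adjustment (Hz):", required_adjustment_hz(emulated, measured))
    print("tear roll period (s):", tear_roll_period_s(emulated, measured))
```

With the two figures from this post, the period comes out at several minutes, which is at least consistent with a tear line that rolls very slowly.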

Or maybe another possibility would be that WinUAE allows for an option like "real world refresh rate" to be set somewhere, which it then uses for timing the display output to?

I hope that I actually made some sense here. Looking forward to your thoughts on this.

Thanks and best

Last edited by Dr.Venom; 05 April 2011 at 22:36.
Old 09 April 2011, 17:07   #10
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 47
Posts: 25,382
Quote:
Originally Posted by Dr.Venom
Question 1: a) When WINUAE display setting is set to vsync-off, to what timer does it sync screen refresh rate then? and b) is it actually outputting 49,92041 frames per second in that timer mode?
In non-vsync modes it is something "close enough". WinUAE sleeps until the time left to wait is less than a few milliseconds, then it busy waits until the QueryPerformanceCounter() Windows API value minus the count at the previous frame >= one frame worth of counter ticks.

Note that time base of QueryPerformanceCounter() is hardware specific (it is listed in winuaebootlog.txt)
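
The sleep-then-spin wait described above can be sketched in Python as follows. This is not WinUAE's code. `time.perf_counter_ns()` stands in for QueryPerformanceCounter() (on Windows, Python's performance counter is built on it), and the 2 ms spin threshold is an assumption standing in for "a few milliseconds".

```python
import time

SPIN_THRESHOLD_NS = 2_000_000   # assumption: stop sleeping ~2 ms before the deadline


def wait_for_next_frame(prev_frame_ns: int, frame_ns: int) -> int:
    """Block until frame_ns have elapsed since prev_frame_ns; return the new timestamp."""
    deadline = prev_frame_ns + frame_ns
    while True:
        remaining = deadline - time.perf_counter_ns()
        if remaining <= SPIN_THRESHOLD_NS:
            break
        # Coarse wait: sleep, leaving the last couple of milliseconds for spinning.
        time.sleep((remaining - SPIN_THRESHOLD_NS) / 1e9)
    # Fine wait: busy-loop on the high-resolution counter.
    now = time.perf_counter_ns()
    while now < deadline:
        now = time.perf_counter_ns()
    return now


if __name__ == "__main__":
    frame_ns = round(1e9 / 49.92041)   # one PAL frame at the rate computed earlier
    stamp = time.perf_counter_ns()
    for _ in range(5):
        new_stamp = wait_for_next_frame(stamp, frame_ns)
        print("frame took", (new_stamp - stamp) / 1e6, "ms")
        stamp = new_stamp
```

Sleeping for most of the wait keeps CPU use down, while the final busy loop avoids depending on the scheduler's wake-up precision for the last stretch.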

Quote:
Question 2: Theoretically, if I would set WinUAE display to 'no-buffering' and WINUAE is outputting exact 49,92041 fps AND the "real world" monitor is running at a refresh rate of 49,92041 fps, would that not lead to an exact match and thus ZERO screen-tearing?
Close enough but 100% exact match is probably impossible.

Quote:
This would be a dream as we could have the no-buffering option enabled, thus having almost non-existent input lag on the joystick*, AND have no screen-tearing! That would make the WINUAE setup in combination with an old 15Khz monitor indistinguishable in both look and feel from the real setup! ... I.e. My Dream (tm)
It is impossible to have lag free software emulation. Time to emulate single frame is always there (which is only 5%-20% of whole frame time in modern PC in A500 mode but it is still there)

Sound is another problem, unbuffered sound output is impossible.

Quote:
Question 4: The Solution?
There already is a FPS slider in the display panel of WinUAE. But it can only be incremented in 1 frame per second. Would it be possible to make it such, or add a (config file) option, which makes VERY granular adjustment to the FPS (or refresh rate timing) possible, such that people (like myself) would be able to adjust the WinUAE screen refresh rate just that tiny bit to make it exactly match the "real world" refresh rate? As said, that would lead to the BIG benefit of being able to run WinUAE with the no-buffering option, i.e. least amount of input lag, AND benefit from screen output with no screen tearing. The best of all worlds and the most close to real hardware
This can be implemented.
Old 14 April 2011, 14:50   #11
Dr.Venom
Registered User
 
Join Date: Jul 2008
Location: Netherlands
Posts: 485
Quote:
Originally Posted by Toni Wilen
In non-vsync modes it is something "close enough". WinUAE sleeps until the time left to wait is less than a few milliseconds, then it busy waits until the QueryPerformanceCounter() Windows API value minus the count at the previous frame >= one frame worth of counter ticks.

Note that time base of QueryPerformanceCounter() is hardware specific (it is listed in winuaebootlog.txt)
Thanks for explaining. If I understand correctly then the time base for this function is formed by QueryPerformanceFrequency (QPF). Frame time is then specified/calculated as number of counter ticks divided by the QPF.

From a little bit of research I understand that the counter behind QPC/QPF (for example the High Precision Event Timer, HPET) is dependent on the hardware (as you pointed out), and on modern hardware it should provide a resolution of a few hundred nanoseconds or better per counter tick. At least if I may believe the Intel/Microsoft specification, which says that the HPET needs to run at 10 MHz or more. (I will be able to check the winuaebootlog.txt of my own setup tomorrow. My guess is that it provides good counter resolution since the hardware is quite new.)

Quote:
Close enough but 100% exact match is probably impossible.
I think I understand; the more/less precise the QPF the more/less close we’ll get to a 100% match?

I guess it also depends a bit on the refresh speed of the screenmode, which might line up with the time units from the QPF. So it might be worthwhile for users who are after an exact 100% match to create a few screenmodes for the CRT screen with slightly different timings and then line them up through the granularity of the WinUAE timing adjustments… Through such an iteration, the chances of an (almost) exact 100% match may be increased even further…

Quote:
It is impossible to have lag free software emulation. Time to emulate single frame is always there (which is only 5%-20% of whole frame time in modern PC in A500 mode but it is still there)

Sound is another problem, unbuffered sound output is impossible.
I understand that there will always be some loss in lag because of software emulation. Does the sound buffer setting directly affect the amount of joystick/keyboard input lag, or does it more or less only affect the time the sound lags to onscreen events?

Also, on a sidenote, I read on one of the threads that you’re using the Auzen X-Fi Prelude, is that specifically because of it being a card with low sound lag or was it for other reasons? (I’m using a Creative X-Fi.)

Quote:
This can be implemented.
COOL

I have some ideas/suggestions for the implementation, it would be great if you could consider them.

I think primarily it would be good to provide the finest possible granularity in adjusting the screen refresh timing. That would mean that it would be possible to adjust the frame time in granularity steps of +/- 1 counter tick.

So as an example (just to be clear on this), suppose a setup has a QPF that provides counter ticks of 1 microsecond; then by default a 50hz WinUAE screen refresh would take 20.000 counter ticks. This default value would then be adjustable through the config file, so the user can change it to e.g. 20.001 ticks per frame (leading to a refresh rate of 49,9975Hz). On a setup whose QPF gives a counter tick time of 0,2 microseconds, the default would be 100.000 ticks per frame. A change of 1 or more ticks in the config file would then provide even more granularity, e.g. an adjustment to 100.001 ticks would lead to a 49,9995Hz refresh rate.
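
The tick arithmetic in this example can be checked with a short Python sketch. The function names are made up; the counter frequencies are the two hypothetical cases from the paragraph above (1 microsecond ticks = 1 MHz, 0,2 microsecond ticks = 5 MHz), written with a decimal point as Python requires.

```python
def default_ticks_per_frame(counter_hz: int, refresh_hz: float = 50.0) -> int:
    """Counter ticks in one frame at the nominal refresh rate."""
    return round(counter_hz / refresh_hz)


def refresh_from_ticks(counter_hz: int, ticks_per_frame: int) -> float:
    """Refresh rate that results from waiting ticks_per_frame ticks per frame."""
    return counter_hz / ticks_per_frame


def step_size_hz(counter_hz: int, ticks_per_frame: int) -> float:
    """How much the refresh rate drops when one more tick is added per frame."""
    return (refresh_from_ticks(counter_hz, ticks_per_frame)
            - refresh_from_ticks(counter_hz, ticks_per_frame + 1))


if __name__ == "__main__":
    for counter_hz in (1_000_000, 5_000_000):
        ticks = default_ticks_per_frame(counter_hz)
        print(counter_hz, ticks,
              refresh_from_ticks(counter_hz, ticks + 1),
              step_size_hz(counter_hz, ticks))
```

The step size shrinks as the counter frequency grows, which is why a faster counter allows finer adjustment.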

Taking it one step further would be accessibility through the GUI (that would be really cool).

Basically you currently have an FPS slider that increments by +/- 1 frame and defaults to 50. What could be done is to add two sliders below it that add high precision adjustment. The first added slider adjusts the counter ticks per frame in increments of 10 ticks. The second slider below it provides granularity in increments of 1 counter tick per frame. Next to these sliders, it would show how an adjustment translates into the effective refresh rate.

A good idea would probably be to create a bandwidth/boundary for the high precision adjustments to about +/- 0,1Hz. That would also make sense from a hardware emulation perspective, because basically the native screenmodes of the Amiga vary within these boundaries:

Lowres/hires/superhires non-laced screenmodes vary depending on the total number of lines vertically (312/313). For 313 lines we’re at a refresh rate of ~49,92Hz. For 312 lines we’re at ~50,08Hz.

Interlaced mode has 625 lines interlaced, so refresh rate runs at exactly 50Hz (i.e. equivalent of progressive 312,5 lines x 64 microseconds per line equals 1/50th second per frame).

The boundaries for the high precision adjustments would thus range from ~49,92Hz --- 50Hz --- ~50,08Hz. To accommodate some rounding in practice with the modelines used to create screenmodes for CRT monitors/TVs, it would probably be good to make the range 49,90Hz --- 50Hz --- 50,10Hz.
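The refresh rates quoted above follow from the line counts if one assumes, as the post does, 64 microseconds per scanline. A minimal sketch, with a hypothetical clamp to the proposed 49,90 to 50,10Hz band:

```python
LINE_TIME_S = 64e-6  # assumed scanline duration, taken from the post


def pal_refresh_hz(lines_per_field: float) -> float:
    """Refresh rate for a field made of `lines_per_field` scanlines."""
    return 1.0 / (lines_per_field * LINE_TIME_S)


def clamp_refresh(hz: float, nominal: float = 50.0, span: float = 0.10) -> float:
    """Keep a requested rate inside nominal +/- span (hypothetical limit)."""
    return min(max(hz, nominal - span), nominal + span)


for lines in (312, 312.5, 313):
    print(f"{lines} lines -> {pal_refresh_hz(lines):.2f} Hz")
```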

Also to make it robust for the other settings in WinUAE, primarily the “Auto VSync”, which makes screenmodes automatically change from 50 to 60hz, it would be good to make it such that the high precision adjustments are saved per individual screenmode. That way users can make precision adjustments to both the PAL and NTSC refresh rates and have it work in combination with the AutoVSync feature.
Old 15 April 2011, 12:45   #12
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 47
Posts: 25,382
Quote:
Originally Posted by Dr.Venom
I think I understand; the more/less precise the QPF the more/less close we’ll get to a 100% match?
Not really. Main problem (which I forgot to explain previously) is that in non-vsync mode display is "synced to sound" which means refresh slightly changes depending on sound buffer state to prevent sound glitches.

Sound glitches are impossible to miss, tiny refresh rate changes are practically impossible to see (in non-vsync modes)

In vsync mode sound is synced to display and sound rate is slightly adjusted if it drifts too far.

Unfortunately it is impossible to have both display and sound in 100% sync when running on multitasking OS.
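To illustrate the "synced to sound" idea in general terms, here is a toy sketch of frame pacing that nudges the frame duration according to how full the sound buffer is. This is not WinUAE's code; the function name, the sign convention, and the 0.5% limit are all assumptions made for the example.

```python
def paced_frame_time(nominal_s: float, buffer_fill: float,
                     max_adjust: float = 0.005) -> float:
    """Frame duration nudged by the sound buffer state.

    buffer_fill is the deviation from the target fill level in -1..+1
    (positive means the buffer holds more audio than wanted). A fuller
    buffer means emulation runs ahead of the sound card, so the frame is
    stretched slightly; an emptier buffer shortens it.
    """
    fill = max(-1.0, min(1.0, buffer_fill))  # ignore out-of-range readings
    return nominal_s * (1.0 + max_adjust * fill)


for fill in (-0.3, 0.0, 0.3):
    seconds = paced_frame_time(0.020, fill)
    print(f"fill {fill:+.1f} -> frame {seconds * 1000:.3f} ms "
          f"({1.0 / seconds:.3f} Hz)")
```

With these assumed numbers the refresh rate moves by a few hundredths of a hertz, which fits the remark that such changes are hard to see while sound glitches are not.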

Quote:
I understand that there will always be some loss in lag because of software emulation. Does the sound buffer setting directly affect the amount of joystick/keyboard input lag, or does it more or less only affect the time the sound lags to onscreen events?
It has no effect on input but it can have an effect on display (see above)

Quote:
Also, on a sidenote, I read on one of the threads that you’re using the Auzen X-Fi Prelude, is that specifically because of it being a card with low sound lag or was it for other reasons? (I’m using a Creative X-Fi.)
I have Auzen X-FI Forte now (which is more or less same but PCIe connector). I only chose it because it has Dolby Digital Live/DTS Connect (on the fly multichannel to DD/DTS digital output) and it has X-FI chip without being standard Creative

Quote:
I have some ideas/suggestions for the implementation, it would be great if you could consider them.
It is highly unlikely I ever add any non-integer refresh rate GUI adjustments unless it can be done without adding new GUI elements.

Quote:
Also to make it robust for the other settings in WinUAE, primarily the “Auto VSync”, which makes screenmodes automatically change from 50 to 60hz, it would be good to make it such that the high precision adjustments are saved per individual screenmode. That way users can make precision adjustments to both the PAL and NTSC refresh rates and have it work in combination with the AutoVSync feature.
This does not need any adjustments because it is the display card that sets the refresh rate, not WinUAE. You need to adjust it using your display hardware specific utilities.

Note that Windows only shows (and all APIs return) integer refresh rates even if the real rate has a decimal part.

Non-integer refresh rates will be supported in next beta but only by manual configuration file editing.
Old 15 April 2011, 21:17   #13
andreas
Zone Friend
 
Join Date: Jun 2001
Location: Germany
Age: 48
Posts: 5,857

Quote:
Originally Posted by Toni Wilen
It is highly unlikely I ever add any non-integer refresh rate GUI adjustments unless it can be done without adding new GUI elements.
Without adding any will be near-impossible. But a simple "no-clutter" way:
Make a neat "Fine tune" checkbox which (obviously) is not checked by default.
Step = +/-1

Once user checks Fine tune, step will change in fractions (FIXED-value; the choice is yours)
The downer is that you might need a surgeon's hand to fine tune

"Deluxe version" utopia:
Clicking "Fine tune" opens "secret" GUI element with an input field to user-set the step, which the slider will use.

Last edited by andreas; 15 April 2011 at 21:23.
Old 15 April 2011, 21:38   #14
Dr.Venom
Registered User
 
Join Date: Jul 2008
Location: Netherlands
Posts: 485
Quote:
Originally Posted by Toni Wilen
Not really. Main problem (which I forgot to explain previously) is that in non-vsync mode display is "synced to sound" which means refresh slightly changes depending on sound buffer state to prevent sound glitches.
Does that mean that the refresh rate is never constant, given that the sound buffer seems to vary a bit (with the on-screen LEDs the sound buffer is averaging around 0%, but changes quickly between say +/-30% on sound buffer setting 3)?

Quote:
I have Auzen X-FI Forte now (which is more or less same but PCIe connector). I only chose it because it has Dolby Digital Live/DTS Connect (on the fly multichannel to DD/DTS digital output) and it has X-FI chip without being standard Creative
That's a good reason, and a good habit too

Quote:
It is highly unlikely I ever add any non-integer refresh rate GUI adjustments unless it can be done without adding new GUI elements
OK. (Edit: missed Andreas' suggestion while typing my own reaction. I do think his suggestion makes a lot of sense. The surgeon's hand to fine tune it would not be needed if the slider, once clicked, can be moved by the arrow keys.)
Quote:
Non-integer refresh rates will be supported in next beta but only by manual configuration file editing.
Thanks! Looking very much forward to testing it.

One last question I'm wondering a bit about (on-topic). Do you know by any chance how often a real amiga polls the joystick input? Or how input polling works in a real Amiga?

What I'm wondering is something you wrote previously. Namely that emulation of one frame takes about 5-20% of frame time on a modern PC. Is the input polling also done within that 5-20% emulation time frame? If so, does that mean that there's no input polling about 80% of frame time?

If it works that way, wouldn't the strategy proposed by Ootake, chopping up frame emulation time into a number of blocks or "sub-bursts", indeed make some sense? Such that the "wait" period is actually not a long empty waiting period, but instead is a chain of blocks/bursts of emulation, with regular input polling in between?

So instead of this method:

--[Emu(input)-20%] -> [Rest-80%]->[Draw]

one would have (for example)

-> [Emu(input)-10%]->[Rest-10%]->[Emu(input)-10%]->[Rest-10%]->[Emu(input)-10%]->[Rest-10%]->[Emu(input)-10%]->[Rest-10%]-> [Emu(input)-10%]->[Rest-10%]->[Draw]->

The advantage being that this way always the latest relevant user input would be known by the emulator, and chances of missing/delaying input reaction by a frame or more are minimized. Or would it not?
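A rough way to put numbers on the sub-burst idea is to ask how long a button press waits before the host polls it at all. The sketch below assumes one poll at the start of each equal slice of a 20ms frame; it says nothing about when the emulated program itself reads the joystick port, which is a separate question. All names are made up for illustration.

```python
import math

FRAME_MS = 20.0  # one PAL frame, as used in the thread


def wait_until_polled_ms(press_ms: float, slices: int,
                         frame_ms: float = FRAME_MS) -> float:
    """Time from a press until the next poll, with one poll per slice start."""
    slice_ms = frame_ms / slices
    next_poll_ms = math.ceil(press_ms / slice_ms) * slice_ms
    return next_poll_ms - press_ms


def average_wait_ms(slices: int, samples: int = 2000) -> float:
    """Average wait for presses spread evenly over one frame."""
    step = FRAME_MS / samples
    presses = [(i + 0.5) * step for i in range(samples)]
    return sum(wait_until_polled_ms(p, slices) for p in presses) / samples


for n in (1, 5):
    print(f"{n} poll(s) per frame: average wait {average_wait_ms(n):.1f} ms")
```

Under these assumptions the average wait shrinks in proportion to the number of slices, which is the advantage the question is pointing at.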

OK, forget it if I'm talking total bollocks here! I'm just curious how this stuff works.

Last edited by Dr.Venom; 15 April 2011 at 21:50.
Old 17 April 2011, 18:45   #15
Dr.Venom
Registered User
 
Join Date: Jul 2008
Location: Netherlands
Posts: 485
Quote:
Originally Posted by Toni Wilen
Non-integer refresh rates will be supported in next beta but only by manual configuration file editing.
Toni,

Many, many thanks for adding this option to the configuration file in the last beta.

I now understand what you meant when you said that a 100% match of refresh rates would probably be impossible, because of the relation with the sound sync. Nonetheless, after some iterations with various settings I found that this option helps greatly in minimizing screen tearing in the 'no-buffering' mode.

Also, to my great joy I found a little 'trick' which completely removes any visible screen tearing when playing a game. As said, with some iterations it's possible to match the refresh rates in such a way that the screen tearing is brought down to a minimum. In this case only about 2-3 lines shift back and forth; most importantly, these lines stay in the same place on the screen. That is already a welcome improvement over the 'rolling' screen tearing that occurred without the floating point option. Interestingly, if you press F12 and OK a few times while running a game, the 2-3 lines move upward or downward by a number of lines, but then stabilize again at that position. Now it's simply a case of pressing F12 and OK a few times until the "jittery" 2-3 horizontal lines move out of the visible screen area (into the vertical blanking area) and voila: with this trick I've just been playing fast shoot 'em ups in 'no-buffering' mode (and thus with fast input response), without any screen tearing at all! A great joy.

I can imagine you keeping this as a configuration file option only, as my guess is that it's mostly of benefit to users who are already willing to put some extra effort into configuring WinUAE.

There are some small issues, where the custom floating point refresh is lost or overwritten in the config, but I'll post them in the beta thread.

Cheers m8 and thanks again!

Last edited by Dr.Venom; 17 April 2011 at 19:05.
Old 01 July 2011, 01:25   #16
Dr.Venom
Registered User
 
Join Date: Jul 2008
Location: Netherlands
Posts: 485
Further thoughts on bringing input response to "perfection"

Toni,

Since you made the big step forward of implementing the new vsync, I have been thinking about a related topic: input response. With the new vsync in combination with the no-buffering option, mouse and joystick input response is working really, really well already, and I was thinking "is it perfect already, or is it close to perfection?" And I came to realise it's very close, but possibly still one step away from total "perfection".

I hope you're open to some thoughts on the current implementation and a suggestion to consider for possible future exploration or implementation.

This is 100% about the software side of things related to input response. This is explicitly not about lag caused by hardware or devices. These have been discussed a number of times in other threads on the forum.

Before I come to my thoughts and suggestion, I'll compare the real Amiga and the emulated Amiga by WinUAE below.

Input response is about how much time is between the user pushing a button and seeing it reflected on-screen. Ideally the moment he pushes a button or moves the stick to the left we see (for example) the spaceship in Hybris fire a bullet or move to the left.

So in the ideal world, there is no delay between moving the joystick in a direction and the computer outputting the image reflecting the change to direction. This ideal world will feel the most responsive to the user regardless of human reaction time (which is related but does not change this story or conclusion).

How a real Amiga comes close to the ideal world.

A real Amiga comes very close to the ideal world. It basically polls input and renders the screen image in the vertical blank, and starts displaying it immediately after. So given that a PAL Amiga runs at ~50Hz, every frame takes about 20 milliseconds (20ms). This is the same time a CRT monitor or TV (the devices Amigas were developed to display on) takes for the electron beam to draw the whole picture.

Code:
Real Amiga
I=Input
E=Emulate frame (read: 'render frame' in case of real Amiga)
O=Output/Display

      Button press    Visual response
               v         v
                IEO         IEO        IEO        IEO     
       #----------#----------#----------#----------#--
(ms)   0         20         40          60
The above example shows that if the user pushes a button just before vblank (IE), he'll see a response within 20ms.

But if the user pushes a button just after I, then he'll not see a response until (at most) 40ms later:
Code:
        Button press          Visual response
                   v                  v
                IEO         IEO        IEO        IEO     
       #----------#----------#----------#----------#--
(ms)   0         20         40          60
So for a real Amiga, the average lag will be about 30ms. What's important is that in the ideal world the "poll input, render frame" step is directly attached to "display frame". Basically, pushing a button is very directly followed by displaying the frame.

Now consider the emulated Amiga (WinUAE), which emulates each frame in a "burst" and then waits for vblank before displaying the image, runs the next burst in parallel, waits, etc. For a modern computer the burst (IE) takes about 20% of frame time when emulating an A500, so about 4ms. For the sake of simplicity I assume this 4ms to be ~0. This makes the example easier to follow, and it doesn't change the conclusion if I were to include it.

Code:
Emulated Amiga
      Button press    Visual response
        v                  v
       O IE       O IE       O IE       O IE       O     
       #----------#----------#----------#----------#--
(ms)   0         20          40         60
The above example shows that if the user pushes a button just before input polling (IE), he'll see a response within 40ms.

But if the user pushes a button just after I, then he'll not see a response until (at most) 60ms later:
Code:
           Button press      Visual response
            v                         v
       O IE       O IE       O IE       O IE       O     
       #----------#----------#----------#----------#--
(ms)   0         20          40         60
So for the emulated Amiga, the average lag will be about 50ms. Or 20ms longer than the Real Amiga / ideal world.
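The two averages can be reproduced with a small model of the diagrams above. It assumes a 20ms frame and a press that lands at a uniformly random moment, so it waits between 0 and 20ms for the next input poll; the only difference between the two cases is the time from that poll until the response is visible (20ms for the real Amiga, 40ms for the emulated one, with the ~4ms burst treated as zero as in the post). The function name is made up for illustration.

```python
FRAME_MS = 20.0  # one PAL frame


def latency_range_ms(poll_to_visible_ms: float, frame_ms: float = FRAME_MS):
    """Best, worst and average press-to-response time.

    Best case: the press lands just before the poll (no waiting).
    Worst case: the press lands just after the poll (one full frame of waiting).
    """
    best = poll_to_visible_ms
    worst = poll_to_visible_ms + frame_ms
    return best, worst, (best + worst) / 2.0


print("real Amiga    :", latency_range_ms(20.0))
print("emulated Amiga:", latency_range_ms(40.0))
```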

Now what's causing this additional lag? Basically, what makes the difference is that the emulated Amiga does input polling and frame emulation (burst) at the beginning of a frame, then waits for vblank, which can take up to 10 to 15ms. With a real Amiga the input polling and frame render are all done in vertical blanking and are thus directly followed by display of the frame.

Code:
Real Amiga

                   IEO        IEO        IEO        IEO     
         #----------#----------#----------#----------#--
  (ms)   0          20        40         60

  Emulated Amiga

          IE        O IE       O IE       O IE       O     
         #----------#----------#----------#----------#--
  (ms)   0          20        40         60

How to get the Emulated Amiga closer to the ideal world


Given the analysis above, it's clear that to get the Emulated Amiga close(r) to the ideal world / the Real Amiga, the IE needs to move to the right, that is, get as close as possible to the start of displaying the frame.
Code:
The road to improvement? 

           IE ------>O IE------>O IE------>O IE------>O     
          #----------#----------#----------#----------#--
   (ms)   0          20         40        60

such that the emulated Amiga equals the ideal world / a Real Amiga:

                    IEO         IEO        IEO        IEO     
           #----------#----------#----------#----------#--
    (ms)   0          20         40        60
This could lower input latency for WinUAE by up to 20ms! IMHO, a massive move towards replicating real hardware.

But Toni, I understand that it's not that easy. With emulation you basically don't know how long a frame is going to take to emulate. So if you were to start emulating the frame 5 ms before displaying it, and emulating the frame takes 6 ms, then it misses the vblank (continually) and you'll be further away from home... You also don't know how fast the host computer emulating the frame is, etc. This is why it currently needs to start emulating the next frame in parallel directly after displaying a frame, instead of directly before displaying a frame (the ideal world): to make sure the frame is emulated in time to catch vblank.


A suggestion for exploring possible improvement

I have been thinking about a possible solution to the above problem, and how to get the emulated world closer to the ideal world. Here's my idea, which IMHO is an elegant way to come closer to the real world the faster the host computer is, without being a disadvantage to slower computers.

As I understand it, WinUAE does the following: Poll input/emulate --> wait if time to next vblank >= 2/3 of frame time --> display

So for each frame it renders it actually knows how long it took to render the frame. Since with the new VSync the timing for vblank has become quite accurate this opens the door to new opportunities:

What if WinUAE renders the frame and instead of putting in a wait for the next frame it:
A) evaluates how long it took to render the frame
B) evaluates the time left to next vblank
C) re-renders the last frame (including fresh poll of input) if: A < B
D) re-iterates A-B-C until time to next vblank < A

Thus bringing the "IE" closer in front of the "O" as mentioned above, or in other words closer to the ideal world:
Code:
           ------>IE-O------>IE-O------>IE-O------>IE-O     
          #----------#----------#----------#----------#--
   (ms)   0          20         40         60
and by doing so lowering average input latency quite significantly, possibly in the order of ~10ms (rough guess..)!

This would be of particular use on host hardware that has the power to emulate (burst) a frame in a very short time, because the shorter the time, the closer the iteration will get to replicating real hardware (that is, the closer the "IE" will get in front of the "O"). I for one would be happy to "sacrifice" the currently unused CPU capacity when running in cycle exact A500 mode, to get input latency even lower than it already is, and closer to "perfection"...
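Steps A to D can be sketched with simulated timings instead of a real clock. This is only an illustration of the proposed loop, not WinUAE code: `emulate` is a hypothetical callable that returns how long one emulation pass took, and the sketch assumes a pass can be thrown away and redone from the same starting state with a fresh input poll.

```python
from typing import Callable, Tuple


def run_frame(emulate: Callable[[], float],
              time_to_vblank_ms: float) -> Tuple[int, float]:
    """Repeat emulation passes while another pass still fits before vblank.

    Returns (number of passes, idle time left before vblank in ms).
    """
    remaining = time_to_vblank_ms
    cost = emulate()            # A: measure how long the pass took
    remaining -= cost           # B: time left to the next vblank
    passes = 1
    while cost < remaining:     # C: redo the frame only if A < B
        cost = emulate()        #    (fresh input poll happens in here)
        remaining -= cost
        passes += 1
    return passes, remaining    # D: stop once time to vblank <= A


print(run_frame(lambda: 4.0, 20.0))   # light frame: several passes fit
print(run_frame(lambda: 12.0, 20.0))  # heavy frame: only the mandatory pass
```

The strict `A < B` test leaves up to one pass worth of idle time before vblank, so the last input poll is at most two pass durations old when the frame is shown.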

Last but not least: Would this be a viable option to explore within WinUAE?

Last edited by Dr.Venom; 01 July 2011 at 01:35.
Old 01 July 2011, 12:14   #17
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 47
Posts: 25,382
Quote:
Originally Posted by Dr.Venom
Would this be a viable option to explore within WinUAE?
Yes. But..

Emulation time usage can suddenly jump if program uses some chipset heavy effects which is impossible to predict.

I guess it is possible to configure static "delay" (for example 10ms) and hope that remaining time is enough to emulate complete frame.
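The trade-off of a static delay can be shown with a toy model. It assumes a 20ms frame, input polled right after the delay, and a frame that is shown at the first vblank after emulation finishes; the function name and the 10ms default are taken from the example above, not from any WinUAE option.

```python
from typing import Tuple


def static_delay_frame(emulation_ms: float, delay_ms: float = 10.0,
                       frame_ms: float = 20.0) -> Tuple[float, bool]:
    """Return (age of the polled input when the frame is shown, vblank missed?)."""
    finish_ms = delay_ms + emulation_ms
    missed = finish_ms > frame_ms
    # A frame that finishes late has to wait for the following vblank.
    shown_at_ms = 2 * frame_ms if missed else frame_ms
    return shown_at_ms - delay_ms, missed


print(static_delay_frame(4.0))               # fits: input is 10 ms old
print(static_delay_frame(12.0))              # too slow: a whole frame is lost
print(static_delay_frame(4.0, delay_ms=0.0)) # no delay: input is 20 ms old
```

The delay pays off only while emulation stays within the remaining time; one chipset heavy frame that overruns costs a full extra frame.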

(btw, nice to see someone who really understands emulation "theory" and knows the technical reasons why it must work that way)
Old 01 July 2011, 18:57   #18
Dr.Venom
Registered User
 
Join Date: Jul 2008
Location: Netherlands
Posts: 485
Quote:
Originally Posted by Toni Wilen
Yes. But..

Emulation time usage can suddenly jump if program uses some chipset heavy effects which is impossible to predict.
I understand, and that's also specifically what the proposed solution is addressing.

I'll try to explain/clarify the suggestion in my previous post a bit better. By always rendering at least frame "i" (for i=1 to ~50), you'll know at each frame-interval after emulating, how long it took to emulate the frame. So at each individual frame of "i" - let's take frame i=1 as example - after emulating it, you'll know how long it took to create the frame (whatever was emulated in the frame, heavy custom chipset stuff and all). Given that you know how long it took to render the frame, this then enables you to re-emulate "IE" for frame i=1, let's call it IE*, as long as time to next vblank is longer than time to re-emulate IE.

So you'll get as close as possible to "O" by the following algorithm:

For i=1 to ~50, example i=1
  1. Emulate Frame1; calculate time it took to emulate the frame -> Time1
  2. Calculate time left to next vblank -> Time2
  3. Re-emulate the current frame with new poll of input as long as Time1 < Time2
  4. "Next i"
This process is probably the only "safe" way to get "IE" as close as possible to "O", without taking chances on missing vsync, for instance by putting in a static delay and hoping for the best.

So to clarify further, the flow of emulation will be:

Code:
Current situation

            IE        O IE        O IE        O IE        O IE
          #-----------#-----------#-----------#-----------#
   (ms)   0          20          40          60

New situation (an example)

IE*= re-emulate current frame with fresh poll of input 

           IE IE* IE* O IE        O IE  IE*   O IE IE* IE*O    
          #-----------#-----------#-----------#-----------#
   (ms)   0          20          40          60

When there are less IE* in a frame, it means the frame emulation
time is long because of "heavy custom chipset" processing.
In the "worst" case there is no IE*, i.e. just rendering the plain
frame (IE) takes all frame time
Code:
In the most ideal situation "IE*" will end up attached to "O":

       IE*O        IE*O        IE*O        IE*O        IE*O
          #-----------#-----------#-----------#-----------#
   (ms)   0          20          40          60
As said, the suggested algorithm above will consume more CPU time, but on average the new situation will get input latency even closer to the "ideal world". A nice thing to think about is that theoretically, with this algorithm, the emulated Amiga can become even better than a real Amiga. This is because a real Amiga takes ~4ms for vertical blank before displaying, while more and more powerful PCs in the future will be able to emulate each Amiga frame in under 1ms and display directly after that. Or in other words: an "IEO" that surpasses the real thing in low latency response!
Old 01 July 2011, 19:07   #19
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 47
Posts: 25,382
Quote:
Originally Posted by Dr.Venom
re-emulate IE.
Going back in time is very complex operation (more or less same as restoring state file). _Everything_ needs to be reset to previous frame's state.
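What "resetting everything" means can be sketched with a toy machine. This is only the general snapshot and restore pattern, not how WinUAE state files work: `ToyMachine` and its three fields are invented for the example, while a real emulator would have to capture CPU, chipset, memory, disk and audio state.

```python
import copy


class ToyMachine:
    """Stand-in for an emulated machine; the state here is deliberately tiny."""

    def __init__(self) -> None:
        self.state = {"frame": 0, "ship_x": 100, "ram": [0] * 8}

    def emulate_frame(self, joystick_dx: int) -> None:
        self.state["frame"] += 1
        self.state["ship_x"] += joystick_dx
        self.state["ram"][self.state["frame"] % 8] = self.state["ship_x"]


def emulate_with_retry(machine: ToyMachine, first_input: int,
                       fresh_input: int) -> None:
    """Emulate a frame, roll back, then emulate it again with newer input."""
    snapshot = copy.deepcopy(machine.state)      # must capture everything
    machine.emulate_frame(first_input)
    machine.state = copy.deepcopy(snapshot)      # go back in time
    machine.emulate_frame(fresh_input)


machine = ToyMachine()
emulate_with_retry(machine, first_input=0, fresh_input=-2)
print(machine.state)
```

Anything left out of the snapshot would leak from the discarded pass into the repeated one, which is why the operation is comparable to loading a state file.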
Old 01 July 2011, 19:35   #20
Dr.Venom
Registered User
 
Join Date: Jul 2008
Location: Netherlands
Posts: 485
Quote:
Originally Posted by Toni Wilen
Going back in time is very complex operation (more or less same as restoring state file). _Everything_ needs to be reset to previous frame's state.
I already imagined it wouldn't be that easy... Apart from the work involved, would it be something that could theoretically work? Or would such an operation simply consume too much time/resources of the host computer to be effective for the purpose mentioned?
 

