English Amiga Board


08 April 2019, 11:55   #1
meeku
Registered User
 
Join Date: Apr 2019
Location: Kings Lynn
Posts: 17
C2P Performance issues

Hi,

I recently started playing with C2P routines. I've read all the available docs on the subject, so I thought I'd have a go at writing my own from scratch. I've also got copies of the Azure and Kalms C2Ps just for comparison.
What I'm struggling with is the following. I've run bustest on my A1200/030-50MHz and on WinUAE (set to emulate 060, no JIT, cycle-exact) to get some baseline numbers for maximum bandwidth, and so establish the achievable write and copy speeds.

From there I ran some tests with DMA on/off etc. and established the following numbers:
A1200 -> 7 MB/s chip bandwidth
UAE -> 6.5 MB/s chip bandwidth

320x256x8 buffer = 81,920 bytes
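
To make that buffer size concrete: an 8-bit chunky frame and its eight 1-bit planes occupy the same number of bytes, so a C2P has to push the same amount of data to chip memory as a straight copy. A naive reference conversion in Python might look like the sketch below. It is purely illustrative, nothing like the optimised Kalms or Azure routines, and the function name is made up.

```python
# Naive reference chunky-to-planar conversion (illustration only, not an optimised C2P).
WIDTH, HEIGHT, DEPTH = 320, 256, 8

def chunky_to_planar(chunky, width=WIDTH, height=HEIGHT, depth=DEPTH):
    """Return `depth` bitplanes; bit p of every pixel value goes to plane p."""
    assert len(chunky) == width * height
    plane_size = width * height // 8           # 1 bit per pixel per plane
    planes = [bytearray(plane_size) for _ in range(depth)]
    for i, pixel in enumerate(chunky):
        byte_index, bit = divmod(i, 8)
        mask = 0x80 >> bit                     # leftmost pixel is the most significant bit
        for p in range(depth):
            if (pixel >> p) & 1:
                planes[p][byte_index] |= mask
    return planes

# Arbitrary test pattern, one byte per pixel.
chunky = bytes((x + y) & 0xFF for y in range(HEIGHT) for x in range(WIDTH))
planes = chunky_to_planar(chunky)
total_planar_bytes = sum(len(p) for p in planes)
print(len(chunky), total_planar_bytes)         # both are 81920
```

Each plane comes out at 10,240 bytes, and eight of them add up to the 81,920 bytes quoted above.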

A1200 should be able to write the bitplanes(8) in AGA at 89FPS (verified with a tight write only loop) = 11.23ms for a full lores frame write.
UAE should be able to write bitplanes(8) in AGA at 84FPS (verified) = 11.9ms for a full lores frame write.

Fastmem on the A1200 has a longword bandwidth of 33 MB/s
Fastmem on UAE has a longword bandwidth of 113 MB/s

So for a copy-loop taking into account the fastmem reads:
A1200 should take 2.4ms to read 81,920 bytes from fastmem (I'm not taking any data-cache hits into account)
UAE should take 0.7ms to read 81,920 bytes from fastmem
In theory these should be faster, especially on the 060 with its bigger data cache, but as a ballpark we get:

A1200 = 11.23 + 2.4ms = 13.63ms
UAE = 11.9 + 0.7 = 12.6ms
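
For anyone who wants to redo these sums, here is a small Python sketch of the same arithmetic. It assumes 1 MB = 1,000,000 bytes, uses hypothetical helper names, and does not model caches or DMA contention.

```python
FRAME_BYTES = 320 * 256              # one byte per pixel at 8 bits deep

def frame_ms_from_fps(fps):
    """Time for one full-frame chip write, given the measured write-only FPS."""
    return 1000.0 / fps

def read_ms(num_bytes, mb_per_s):
    """Time to read num_bytes at the given bandwidth (assumes 1 MB = 1,000,000 bytes)."""
    return num_bytes / (mb_per_s * 1_000_000) * 1000.0

# (write-only FPS, fastmem longword bandwidth in MB/s), taken from the figures above
machines = {"A1200": (89, 33), "UAE": (84, 113)}

for name, (fps, fast_mb_s) in machines.items():
    write = frame_ms_from_fps(fps)
    read = read_ms(FRAME_BYTES, fast_mb_s)
    print(f"{name}: write {write:.2f} ms + read {read:.2f} ms = {write + read:.2f} ms")
```

The totals it prints land within about 0.1ms of the rounded figures above.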

Once again I verified these with a tight copy loop (DMA off) and that seems to be right, and in line with the 13ms referred to from other C2P docs.

So now, no matter which C2P routine I use (my own or the pre-existing ones), with DMA off just to measure, none of them get anywhere close to copy speed. All three (Kalms, Azure and mine) take about 26ms on UAE.
Either I've missed something obvious, or perhaps this is a known issue in WinUAE? (I don't have a real 060 to test on.)

Obviously none of these routines are optimised for 030, and would never achieve copy-speed there anyway so I'm ignoring the timing result there.

Can anyone think of a reason why the C2P would be so slow under WinUAE?

08 April 2019, 12:31   #2
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,437
I can't be sure here, but it's possible that WinUAE emulates the slow interface between faster processors and chip memory. On a real A1200 (and IIRC A4000), almost all accelerators do not actually manage to write to chip memory at the same speed as the base A1200.

Some, especially very fast, accelerators apparently barely manage more than 50% of that throughput.

I'm not entirely sure why this is, but my guess is that it has to do with the chip memory bus and accelerator fast memory bus running at different speeds, and the required synchronisation for writing taking up extra time.

08 April 2019, 15:16   #3
meeku
Registered User
 
Join Date: Apr 2019
Location: Kings Lynn
Posts: 17
I don't think that is my issue in this case, as the straight memory write and copy loops are achieving the correct bandwidth. It's only when the C2P runs that this is no longer the case. The idea in the C2P is that each chip memory write takes an awfully long time, so the writes are interleaved with the C2P processing code. This doesn't so much hide the latency as ensure that the merging/C2P logic is basically free while you're waiting for the chip writes, yielding a C2P that runs at the same speed as a normal copy loop. It's almost as if this memory write pipelining isn't happening in WinUAE, as if it stalls on the chip write even though the CPU could continue with useful work.
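
A toy timeline model of that idea, in Python, may make the difference clearer. It is only a sketch under loud assumptions: a one-deep write buffer, a fixed cost per chip write derived from the 11.9ms write-only figure, and a completely made-up per-longword cost for the C2P work. It is not a model of how the 060 or WinUAE actually behaves.

```python
def frame_time_ms(n_writes, write_us, compute_us, buffered):
    """Toy timeline: each chip write is preceded by compute_us of CPU work."""
    cpu_t = 0.0       # time at which the CPU is next free
    bus_free = 0.0    # time at which the chip bus finishes its current write
    for _ in range(n_writes):
        cpu_t += compute_us                # merge/shift work for this longword
        start = max(cpu_t, bus_free)       # a write cannot start until the bus is idle
        bus_free = start + write_us
        # Buffered: the CPU carries on as soon as the write has been accepted.
        # Unbuffered: the CPU stalls until the write has completed.
        cpu_t = start if buffered else bus_free
    return max(cpu_t, bus_free) / 1000.0

N_WRITES = 320 * 256 // 4                  # longword writes per 8-bit lores frame
WRITE_US = 11.9 * 1000 / N_WRITES          # from the 11.9ms write-only measurement
COMPUTE_US = 0.9 * WRITE_US                # ASSUMPTION: invented cost of the C2P logic

for buffered in (True, False):
    label = "writes overlap compute" if buffered else "CPU stalls on each write"
    t = frame_time_ms(N_WRITES, WRITE_US, COMPUTE_US, buffered)
    print(f"{label}: {t:.2f} ms")
```

In this model, as long as the compute cost per longword is below the write cost, the overlapped case stays at roughly the write-only time, while the stalled case is the sum of the two. Shrinking COMPUTE_US only shortens the stalled case.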

08 April 2019, 15:35   #4
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
You may try commenting out instructions or blocks of instructions, to see the impact of each on timing (= what's "covered" by chipmem writes and what's not).

08 April 2019, 15:42   #5
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,437
To be fair, WinUAE's emulation of faster processors with cache memory is known to not be entirely accurate. Apparently the time each instruction takes in all the different situations that exist is really complicated to emulate accurately (more so since there is apparently not enough reliable data on what causes each instruction to run at what speed, and when).

This could also play a role.

Note that this isn't a dig at WinUAE, it simply is very complicated to make it run cycle exact for the more advanced 68K processors.

08 April 2019, 15:48   #6
meeku
Registered User
 
Join Date: Apr 2019
Location: Kings Lynn
Posts: 17
Agreed, by and large WinUAE is doing an amazing job, and to be honest I wouldn't expect it to be able to get this sort of timing 100% right. It would need such a detailed simulation of the core, pipeline stages and scheduling that it would probably not be possible. Meynaf, I'll try your suggestion and start removing blocks to see which instructions (that should be covered by the chip write) aren't, and are adding to the time. I suspect it's all of them, so I'll start by turning the C2P back into a raw copy and add them back in one at a time.

08 April 2019, 15:59   #7
meeku
Registered User
 
Join Date: Apr 2019
Location: Kings Lynn
Posts: 17
Ok, tried removing various blocks of code. Removing anything reduces the time, so it would seem that nothing is being hidden behind the write-pipelining of the chip writes.

08 April 2019, 17:30   #8
meeku
Registered User
 
Join Date: Apr 2019
Location: Kings Lynn
Posts: 17
Tried the various C2Ps on the 1200, bearing in mind that they're CPU-only and 060-optimised. On there it averages 28fps (roughly 35ms) to do the C2P with DMA off.

09 April 2019, 16:19   #9
meeku
Registered User
 
Join Date: Apr 2019
Location: Kings Lynn
Posts: 17
Got access to some real 060s, tested on real H/W, and can confirm that the C2P does run at copy speed there. So this is definitely an issue in WinUAE: it doesn't seem to respect memory pipelining.

09 April 2019, 18:05   #10
Kalms
Registered User
 
Join Date: Nov 2006
Location: Stockholm, Sweden
Posts: 237
That is correct. Making (near) cycle accurate emulation is a lot of work and costs a lot of processing power. For a stock A500 it is worth it since the machine+CPU is not that complicated, many applications need cycle accurate emulation to work, and the performance differential between an A500 and a PC makes it feasible to run the detailed simulation at native speed. For newer-generation/faster Amigas and their software none of those three factors hold true - it's not worth the effort compared to just doing a fairly accurate chipset emulation and letting the CPU emulation run as quickly as possible. The memory pipelining that you refer to is on the CPU/accelerator board side of things, not on the chipset side, and thus not emulated cycle-exact for accelerated machines.

In practice, you can expect the performance characteristics of an emulated 060 to be off by several hundred percent compared with the real hardware - some constructs are slower than others on 060, other constructs are slower than others in JITed x64 code on an Intel CPU.

For performance benchmarking you will need to run on the real HW.

09 April 2019, 18:29   #11
meeku
Registered User
 
Join Date: Apr 2019
Location: Kings Lynn
Posts: 17
Agreed!

On Toni's suggestion I've just tried 4.2 beta, and it definitely helps narrow the gap as it has a new write buffer to chip, allowing the write to terminate early from the cpu's perspective. So this seems to bring them almost in line with expectation. Obviously there will be wild variation between emulation and reality, as you say some things slower, some faster..

I need to find a 1260.. that isn't selling for as much as a car!