26 May 2024, 16:14 | #101 |
Registered User
Join Date: Jul 2023
Location: Domsjö/Sweden
Posts: 56
|
|
26 May 2024, 16:14 | #102 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,469
|
|
26 May 2024, 16:27 | #103 |
Registered User
Join Date: Oct 2020
Location: Bicester
Posts: 2,022
|
I wonder how running the CPU asynchronously (like most accelerators do) would affect the result.
Still such a missed opportunity, not having Akiko DMA to chip RAM. |
26 May 2024, 16:57 | #104 | |
Registered User
Join Date: Jul 2023
Location: Domsjö/Sweden
Posts: 56
|
Quote:
DMA without fast RAM doesn't improve much though. DOOM on CD32 would have been good enough. Remember that back then 160 x 200 at 15 fps was considered a great Doom port. |
|
26 May 2024, 18:50 | #105 |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,213
|
Great! 3 cycles is an odd (no pun intended) number of cycles for the accesses to take though.
Time for phase 2. Does the 030 actually benefit from "burst reads", filling a complete cache line or something like that? Otherwise it might be better (and easier) to just keep the caches disabled during C2P, as the cache would be trashed anyway. |
26 May 2024, 19:40 | #106 |
Registered User
Join Date: Jul 2017
Location: San Jose
Posts: 676
|
Does the test also do the necessary chipmem writes? If not, it’s not a realistic scenario.
One opportunity could be to see if Akiko access and chipmem writes can somehow be scheduled in a clever way(?) |
26 May 2024, 19:43 | #107 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,469
|
No, not yet. I was curious about the, pardon the pun, bit by bit breakdown. I'll add that next. My thought is that maybe there's a hacky, cache-slapping way to improve it.
|
26 May 2024, 19:59 | #108 |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,213
|
It would be very interesting to get numbers from an 020 with and without fast RAM (if possible) for this simplified test. I'm almost willing to bet the access time is going to be an even number of 14 MHz cycles. In principle the chip write (in the best case) is just going to add 8 more 14 MHz cycles (2*CCK), so 14 in total per long word with this config.
Seems like (again, my math is probably off) Akiko is a win if C2P can't be done in 8*6*(50/14) ~ 171 cycles (at 50 MHz). Lots of effects make it more complicated (including what can and cannot overlap), but you need to read from (maybe fast) RAM and write to chip in either case. |
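For anyone wanting to redo the arithmetic above, here is a rough sketch in Python. It only uses the rounded figures from the post (50 MHz CPU, "14 MHz" bus clock, 6 bus cycles of Akiko access per long word, 8 bus cycles for a best-case chip write). Reading the 8 in 8*6 as "long words per Akiko pass" is an assumption, since the post doesn't spell it out, and all the names are made up for illustration.

```python
# Rough break-even estimate using the rounded clock figures from the post.
CPU_MHZ = 50.0   # accelerator clock assumed in the post
BUS_MHZ = 14.0   # rounded "14 MHz" clock used in the post

LONGWORDS_PER_PASS = 8     # assumption: the "8" in 8*6 is long words per pass
AKIKO_CYCLES_PER_LONG = 6  # 14 MHz cycles of Akiko access per long word
CHIP_WRITE_CYCLES = 8      # best-case chip write: 2 CCK = 8 cycles at 14 MHz


def bus_to_cpu_cycles(bus_cycles: float) -> float:
    """Convert 14 MHz bus cycles into 50 MHz CPU cycles."""
    return bus_cycles * CPU_MHZ / BUS_MHZ


# CPU-cycle budget a software C2P must beat for the same amount of data.
akiko_budget = bus_to_cpu_cycles(LONGWORDS_PER_PASS * AKIKO_CYCLES_PER_LONG)

# Bus cycles per long word once the chip write is included.
per_long_with_write = AKIKO_CYCLES_PER_LONG + CHIP_WRITE_CYCLES

print(f"Akiko access per pass: {akiko_budget:.1f} CPU cycles")
print(f"Per long word incl. chip write: {per_long_with_write} bus cycles")
```

This ignores anything that can overlap (cache hits, pipelined writes), so treat the result as a ballpark, not a measurement.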
26 May 2024, 20:14 | #109 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,469
|
I'm doubtful we'll come across some hitherto unknown speedup but it's fun to poke about.
|
26 May 2024, 20:26 | #110 |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,213
|
Definitely! And seeing real numbers of the low level stuff is very interesting (instead of FPS from various games).
|
26 May 2024, 21:56 | #111 |
Registered User
Join Date: Oct 2020
Location: Bicester
Posts: 2,022
|
I wonder if it would be better to use akiko for the c2p but to fast ram and then copy to chip.
|
26 May 2024, 22:11 | #112 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,469
|
|
26 May 2024, 22:19 | #113 | |
Registered User
Join Date: Jul 2014
Location: Warsaw/Poland
Posts: 195
|
nice investigation
thanks to you we now know that Akiko is much better than we thought, even on an accelerated machine. Quote:
|
|
26 May 2024, 22:26 | #114 |
Registered User
Join Date: Oct 2020
Location: Bicester
Posts: 2,022
|
|
27 May 2024, 14:23 | #115 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,469
|
On an 030, if I'm doing a write to chip RAM via an address pointer and I want to add an offset to the pointer immediately afterwards (e.g. calculating the next plane to write to), is the cost of that operation fully masked by the pending write? How many cycles should I expect to be able to execute while the write is happening, assuming operations that aren't doing any data memory accesses?
|
27 May 2024, 14:50 | #116 | |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
|
Quote:
Nearly every register-only instruction seems to 'pipeline' well, except iterative instructions such as mul & div, which stall like memory accesses. For a 50 MHz 030: at least 24 cycles, usually 26. Experiments have shown the exact number isn't easy to predict. |
|
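As a rough sanity check on the 24 to 26 figure, one can convert the best-case chip write mentioned earlier in the thread (2 CCK, i.e. 8 cycles at 14 MHz) into 50 MHz CPU cycles. The sketch below does just that, using the same rounded clocks as the thread. The function name is hypothetical, and attributing the gap to the bus cycle's own overhead is a guess, not something measured here.

```python
# How long is one best-case chip write, expressed in 50 MHz CPU cycles?
CPU_MHZ = 50.0
BUS_MHZ = 14.0             # rounded "14 MHz" clock, as used in the thread
BUS_CYCLES_PER_CCK = 4     # one colour clock (CCK) = 4 cycles at 14 MHz


def chip_write_slot_cpu_cycles(ccks: int = 2) -> float:
    """Length of a chip write of `ccks` colour clocks, in CPU cycles."""
    return ccks * BUS_CYCLES_PER_CCK * CPU_MHZ / BUS_MHZ


slot = chip_write_slot_cpu_cycles()
print(f"Best-case chip write: {slot:.1f} CPU cycles")

# Compare with the reported number of cycles that can run under the write.
for reported_free in (24, 26):
    gap = slot - reported_free
    print(f"{reported_free} free cycles leaves about {gap:.1f} unaccounted for")
```

So the reported numbers sit a few cycles under the full write slot, which at least looks plausible for register-only work hidden behind a pending write.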
27 May 2024, 15:07 | #117 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,469
|
How about using movem to transfer a number of registers' worth of data from a source buffer? Or is it better to just use separate moves? Thinking about instruction cache size here.
|
27 May 2024, 15:37 | #118 | |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
|
Quote:
But cache size isn't an issue here as the loop appears to be very small. |
|
27 May 2024, 15:39 | #119 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,469
|
|
27 May 2024, 16:16 | #120 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,469
|
I have just pushed an update to the branch that contains the most naive implementation possible as a test case. The lha file contains the updated binary.
|