21 May 2024, 22:52 | #21 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,843
|
That'll be cool, though I suspect the target application isn't going to be fast enough on an 020/14MHz. It might make a bigger relative difference on there than on 030/50. Time will tell, hopefully.
|
21 May 2024, 23:28 | #22 |
Registered User
Join Date: May 2020
Location: Figueira da Foz
Posts: 454
|
It could also be interesting to see how Akiko stands up against real RTG on the same CPU/RAM combo.
Last edited by pixie; 22 May 2024 at 09:37. |
22 May 2024, 01:22 | #23 |
Registered User
Join Date: Jul 2017
Location: San Jose
Posts: 686
|
In the case of AB3D2, you could also implement a C2P path that uses the OS's C2P routines, in the hope that on a CD32 the OS will use Akiko and on other systems it will use some other function. Back in the day it was very useful to install BlazeWCP to replace the OS routine. IDK if this is still the case with OS 3.1.4+.
|
22 May 2024, 07:05 | #24 | |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,432
|
Quote:
The OS 3.1 graphics.library of the CD32 still used Akiko, but in an awkward way: it first performed an off-screen C2P conversion into a side buffer and then used the blitter to blit the buffer to the screen. Needless to say, this is extra slow. 3.1.4 replaced the C2P function with a completely CPU-driven approach which does the conversion and clipping in one single function and is thus faster without Akiko than the old one with it. It is still not ideal, but it is quite OK given the limited ROM footprint the function has, and its generality. P96 has a similar function that is more optimized for the 68020 but otherwise follows the same principle. The latest version will again be a bit faster. |
|
22 May 2024, 09:05 | #25 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,843
|
I'm such a one trick pony, everyone just assumes this is for TKG...
It is for TKG. |
22 May 2024, 09:34 | #26 | ||
Thalion Webshrine
Join Date: Jan 2004
Location: Oxford
Posts: 14,599
|
Quote:
Quote:
The whole argument is: is it faster to write/read the Akiko C2P registers (which are uncacheable), or is it faster to use the CPU to perform C2P using the benefits of a data cache? There are three complex dynamics at work: ChipRAM contention, instruction cache utilisation and data cache utilisation. Can your code keep running while the planar data is being fetched from ChipRAM? Can your code fit in the instruction cache? Can you optimally use the data cache? The conclusion I've always read has been that when the source is ChipRAM and the destination is ChipRAM the winner is Akiko, but when the source is FastRAM and the destination is ChipRAM then it's the CPU. I don't have any evidence. Maybe we'll find out.... Over to Karlos.
Last edited by alexh; 22 May 2024 at 10:52. |
||
22 May 2024, 10:19 | #27 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,843
|
Gains, if any, are likely to be marginal. I'm OK with that; an extra 1fps is significant when you are already barely reaching double digits. What it does mean, however, is that I'm probably looking for a metal-banging solution rather than an OS routine. We already have one of the best-in-class 030 C2P routines available, if not the best, and we have direct RTG support for machines that have it. What I am looking for here is anything that might improve performance on an 030-class CD32.
Last edited by Karlos; 22 May 2024 at 12:53. |
22 May 2024, 12:43 | #28 |
Registered User
Join Date: Jun 2010
Location: PL?
Posts: 2,916
|
Akiko C2P is covered by patent http://www.freepatentsonline.com/5461680.html
I doubt the CPU can be faster than performing a 16-pixel C2P conversion in 16 R/W cycles (i.e. 1 pixel per clock). |
22 May 2024, 13:23 | #29 | |
Thalion Webshrine
Join Date: Jan 2004
Location: Oxford
Posts: 14,599
|
Quote:
The Akiko clock speed is going to be 14MHz? 32-bit data width? Can the CPU continuously W/R on every Akiko clock cycle? Or is there bus contention? Can it do anything else? (Presumably, for a faster processor, accessing Akiko registers results in processor wait states?) A CPU's clock speed can be 30-60MHz, with FastRAM being the same. If the C2P fits in the instruction cache and the operating data fits in the data cache, then we're looking at FastRAM->ChipRAM copy speed being the limiting factor? (This is what I've always read.)
Last edited by alexh; 22 May 2024 at 13:31. |
|
22 May 2024, 14:10 | #30 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,843
|
If it's anything like writing to chip RAM you might be able to do other instructions between the writes that can execute while they are ongoing but the truth is, what are you going to do there that's remotely useful?
|
22 May 2024, 14:31 | #31 | |
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,753
|
Quote:
Ideally, this would be set up with a 68040, and the destination aligned with the cache lines for the Copyback cache. For each small Akiko call, first load all registers so that they are primed for purely internal operations. Applications would be fixed and/or floating-point calculations, or better, C2P conversion of a small string of pixels elsewhere in the same buffer. The Akiko and the CPU operations are ideally sized to end on the same cycle. Write the CPU-converted pixels to memory, and repeat. This test case could then be compared to the best C2P routines running on the same HW setup. If there isn't much difference, this is an argument to use Akiko only if you detect an unexpanded CD32.
Last edited by Photon; 22 May 2024 at 14:36. |
|
22 May 2024, 15:13 | #32 |
Guru Meditating
Join Date: Jun 2014
Location: England
Posts: 2,365
|
Some Akiko testing was done when the TF330 came out, using DoomAttack, so the results in this thread are from a 50MHz 030 in a CD32.
https://www.exxosforum.co.uk/forum/v...t=akiko#p21169 |
22 May 2024, 15:13 | #33 | |
Thalion Webshrine
Join Date: Jan 2004
Location: Oxford
Posts: 14,599
|
Akiko can't "do" anything. It is a slave. You write data into it and then read it back using the CPU.
Quote:
I'm curious to know what this means? |
|
22 May 2024, 15:25 | #34 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,843
|
Just as a reminder, the context here is 030+Akiko. As far as I know, there's no physical hardware that has Akiko with an 040 or higher. I need to do typical PAL low-res 8-bit C2P and 2/3 size. Both sizes are 32-pixel aligned so there should be minimal fuss. I hope. I haven't had time to write any code but it will be along the lines of Super(), smash the 030-specific CACR bits for data cache disable (not icache), convert the pixels, re-enable. I'm not sure yet if it's safe or advisable to stay in supervisor state the whole time but I was thinking to Forbid() during the conversion in any case.
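For a rough sense of scale, the register traffic per frame can be counted. This is only a sketch: it assumes the 8-writes-then-8-reads per 32 pixels behaviour described elsewhere in this thread, and it assumes "PAL low-res" means 320x256. The struct and function names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: counts Akiko register traffic for one 8-bit frame.
 * Assumes one pass = 32 pixels = 8 longword writes + 8 longword reads,
 * and that width is a multiple of 32. */
typedef struct {
    uint32_t passes;  /* 32-pixel conversions per frame */
    uint32_t writes;  /* longword writes to the C2P register */
    uint32_t reads;   /* longword reads from the C2P register */
} akiko_frame_cost;

static akiko_frame_cost akiko_count_accesses(uint32_t width, uint32_t height)
{
    akiko_frame_cost cost;
    cost.passes = (width / 32u) * height;
    cost.writes = cost.passes * 8u;
    cost.reads  = cost.passes * 8u;
    return cost;
}
```

Under those assumptions, akiko_count_accesses(320, 256) gives 2,560 passes, i.e. 40,960 register accesses per frame, plus another 20,480 longword writes to put the planar result into chip RAM.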
|
22 May 2024, 15:26 | #35 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,843
|
Also, provide a means of switching between CPU only and Akiko for relative measurements on the same scenes.
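One simple way to make the two paths switchable is a function pointer chosen at runtime. The sketch below is not from the TKG source; every name in it is hypothetical, and the two converters are empty placeholders that only count calls so the selection logic can be checked off-target.

```c
#include <stdint.h>

/* All names here are hypothetical, for illustration only. */
typedef enum { C2P_MODE_CPU, C2P_MODE_AKIKO } c2p_mode;
typedef void (*c2p_fn)(const uint8_t *chunky, uint32_t width, uint32_t height);

static uint32_t cpu_calls;    /* how often the CPU path was taken */
static uint32_t akiko_calls;  /* how often the Akiko path was taken */

/* Placeholder for the existing CPU-only routine. */
static void c2p_cpu_placeholder(const uint8_t *chunky, uint32_t width, uint32_t height)
{
    (void)chunky; (void)width; (void)height;
    cpu_calls++;
}

/* Placeholder for the Akiko-assisted routine. */
static void c2p_akiko_placeholder(const uint8_t *chunky, uint32_t width, uint32_t height)
{
    (void)chunky; (void)width; (void)height;
    akiko_calls++;
}

/* Pick the converter once per frame (or per key press) so the same scene
 * can be timed with either path. */
static c2p_fn c2p_select(c2p_mode mode)
{
    return mode == C2P_MODE_AKIKO ? c2p_akiko_placeholder : c2p_cpu_placeholder;
}
```

Keeping the choice behind one pointer means the render loop stays identical for both measurements, so any difference in frame time comes from the conversion itself.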
|
22 May 2024, 17:12 | #37 | |
Registered User
Join Date: Jun 2010
Location: PL?
Posts: 2,916
|
Quote:
My assumption is that the registers are accessible from the same bus as CHIP RAM, i.e. general CHIP access limitations apply to Akiko. My assumption is also that C2P on the CPU is more than just writing and reading: some additional operations must be performed, like shift, mask, etc., so more than one CPU cycle is required per pixel. |
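The shift and mask work mentioned here usually takes the form of a "merge" (bit-swap) step, which CPU C2P routines repeat at several shift widths. A minimal sketch of one such step in C follows; it is not taken from any particular routine discussed in this thread, and the function name is hypothetical.

```c
#include <stdint.h>

/* One merge step: exchanges the bit groups of *a selected by (mask << shift)
 * with the bit groups of *b selected by mask. A full CPU C2P chains several
 * of these (typically with shifts such as 16, 8, 4, 2, 1) across 8 registers. */
static void c2p_merge(uint32_t *a, uint32_t *b, unsigned shift, uint32_t mask)
{
    uint32_t t = ((*a >> shift) ^ *b) & mask;  /* bits that differ */
    *b ^= t;                                   /* move a's group into b */
    *a ^= t << shift;                          /* move b's group into a */
}
```

Each step costs a shift, two or three EORs, an AND and a move per register pair, which is where the extra CPU cycles per pixel go; the counter-argument in the thread is that all of it runs from cache at CPU clock speed.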
|
22 May 2024, 17:35 | #38 | ||
Thalion Webshrine
Join Date: Jan 2004
Location: Oxford
Posts: 14,599
|
Quote:
Quote:
It is, but it is taking place in the CPU data cache at the CPU clock frequency (e.g. 50MHz).
Last edited by alexh; 22 May 2024 at 17:46. |
||
22 May 2024, 17:36 | #39 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,843
|
It's a single address that you can find via the graphics library. You make 8 writes to it and then you read back from it 8 times.
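To make the write-8/read-8 behaviour concrete, here is a small software model of one pass in portable C. It is a sketch of what the hardware is understood to do, not a dump of real behaviour: the ordering is an assumption (first chunky byte is the leftmost pixel and lands in bit 31 of each plane longword; plane n carries bit n of every pixel), and the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical software model of one Akiko C2P pass.
 * chunky: the 8 longwords written, 4 pixels each, first pixel in the top byte.
 * planar: the 8 longwords read back, one per bitplane (index = plane number).
 * The bit and byte ordering here is an assumption, see the text above. */
static void akiko_model_c2p(const uint32_t chunky[8], uint32_t planar[8])
{
    for (int plane = 0; plane < 8; plane++)
        planar[plane] = 0;

    for (int pixel = 0; pixel < 32; pixel++) {
        uint32_t word = chunky[pixel >> 2];
        uint8_t value = (uint8_t)(word >> (24 - 8 * (pixel & 3)));

        for (int plane = 0; plane < 8; plane++) {
            if (value & (1u << plane))
                planar[plane] |= 1u << (31 - pixel);  /* leftmost pixel = bit 31 */
        }
    }
}
```

A model like this is mainly useful for checking a real routine's output against known input, for example on an emulator, before worrying about speed.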
|
22 May 2024, 17:38 | #40 |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,286
|
From the schematics (https://www.amigawiki.org/doku.php?i...ice:schematics) it does look plausible that the same access restrictions as chipmem apply, and that you'd be able to do proper 32-bit accesses. It looks like it's clocked at 7MHz, but I'm not a HW person.
The DoomAttack source on Aminet (http://aminet.net/game/shoot/DoomAttack_src.lha) has C2P routines, and they are very, very simple: just write 8 longs to the chip and read them back. From the WinUAE source code I can see that the register in question is located at $b80038. It would be interesting to see measurements of the raw speed, i.e. with interrupts and DMA off, and just
Code:
	rept	8
	move.l	d0,(a0)		; write 8 chunky longwords (a0 assumed to point at the C2P register)
	endr
	rept	8
	move.l	(a0),d0		; read back 8 planar longwords
	endr
|