English Amiga Board


Old 21 May 2024, 22:52   #21
Karlos
Quote:
Originally Posted by hitchhikr View Post
I do have a CD32 with fast RAM; I can run some tests if you need, eventually.
That'll be cool, though I suspect the target application isn't going to be fast enough on an 020/14MHz. It might make a bigger relative difference on there than on 030/50. Time will tell, hopefully.
Old 21 May 2024, 23:28   #22
pixie
Quote:
Originally Posted by Karlos View Post
Right now, I'm going to be pitting it against Kalm's 030 routines. It seems self evident that the read/write throughput will be the limit for Akiko, but we aren't dealing with large resolution. Like you I have no HW to test on, so I'll be prototyping in UAE.
It could also be interesting to see how Akiko stands up against real RTG on the same CPU/RAM combo.

Last edited by pixie; 22 May 2024 at 09:37.
Old 22 May 2024, 01:22   #23
pipper
In the case of AB3D2 you could also implement a C2P path that uses the OS's C2P routines, in the hope that on the CD32 the OS will use Akiko and on other systems some other function. Back in the day it was very useful to install BlazeWCP to replace the OS routine. IDK if this is still the case with OS 3.1.4+.
Old 22 May 2024, 07:05   #24
Thomas Richter
Quote:
Originally Posted by pipper View Post
In the case of AB3D2 you could also implement a C2P path that uses the OS's C2P routines, in the hope that on the CD32 the OS will use Akiko and on other systems some other function. Back in the day it was very useful to install BlazeWCP to replace the OS routine. IDK if this is still the case with OS 3.1.4+.

The OS 3.1 graphics.library of the CD32 still used Akiko, but in an awkward way: it first performed an off-screen C2P conversion into a side buffer, and then used the blitter to copy that buffer to the screen. Needless to say, this is extra slow. 3.1.4 replaced the C2P function with a completely CPU-driven approach that does the conversion and clipping in one single function and is thus faster without Akiko than the old one was with it. It is still not ideal, but it is quite OK given the limited ROM footprint of the function and its generality. P96 has a similar function that is more optimized for the 68020 but otherwise follows the same principle. The latest version will again be a bit faster.
Old 22 May 2024, 09:05   #25
Karlos
I'm such a one trick pony, everyone just assumes this is for TKG...




It is for TKG.
Old 22 May 2024, 09:34   #26
alexh
Quote:
Originally Posted by derSammler View Post
Note that when the CPU does C2P, processing time is wasted that would otherwise be available to execute code.
Erm ok. Let's see where you're going with this.

Quote:
Originally Posted by derSammler View Post
It seems people often completely ignore that. The DoomAttack figures are a perfect example of that. The CPU is better used to render the 3d graphics. Even if Akiko is slower in doing C2P than the CPU would, you still get better speed in the end.
So your argument is that while the CPU is writing and reading the Akiko registers to perform C2P it could be doing something else? Yeah right.

The whole argument is: is it faster to write/read the Akiko C2P registers (which are uncacheable), or is it faster to use the CPU to perform C2P using the benefits of a data cache?

There are three complex dynamics at work. ChipRAM contention, instruction cache & data cache utilisation.

Can you run while the planar data is being fetched from ChipRAM? Can your code fit in the Instruction cache? Can you optimally use the data cache?

The conclusion I've always read has been when the source is ChipRAM and destination is ChipRAM the winner is Akiko but when the Source is FastRAM and destination is ChipRAM then it's the CPU.

I don't have any evidence.

Maybe we'll find out.... Over to Karlos

Last edited by alexh; 22 May 2024 at 10:52.
Old 22 May 2024, 10:19   #27
Karlos
Gains, if any, are likely to be marginal. I'm OK with that; an extra 1fps is significant when you are already barely reaching double digits. What it does mean, however, is that I'm probably looking for a metal-banging solution rather than an OS routine. We already have one of the best-in-class 030 C2P routines available, if not the best, and we have direct RTG support for machines that have it. What I am looking for here is anything that might improve performance on an 030-class CD32.

Last edited by Karlos; 22 May 2024 at 12:53.
Old 22 May 2024, 12:43   #28
pandy71
Akiko C2P is covered by patent http://www.freepatentsonline.com/5461680.html

I doubt the CPU can be faster than a 16-pixel C2P conversion done in 16 R/W cycles (i.e. 1 pixel per clock).
Old 22 May 2024, 13:23   #29
alexh
Quote:
Originally Posted by pandy71 View Post
I doubt the CPU can be faster than a 16-pixel C2P conversion done in 16 R/W cycles (i.e. 1 pixel per clock).
Presumably it depends how long a cycle is? And bus contention?

The Akiko clock speed is going to be 14MHz? 32-bit data width? Can the CPU continuously W/R on every Akiko clock cycle? Or is there bus contention? Can it do anything else? (Presumably for a faster processor, accessing Akiko registers results in processor wait states?)

A CPU's clock speed can be 30-60MHz, with FastRAM running at the same speed. If the C2P fits in the instruction cache and the working data fits in the data cache, then we're looking at FastRAM->ChipRAM copy speed being the limiting factor? (This is what I've always read.)

Last edited by alexh; 22 May 2024 at 13:31.
Old 22 May 2024, 14:10   #30
Karlos
If it's anything like writing to chip RAM you might be able to do other instructions between the writes that can execute while they are ongoing but the truth is, what are you going to do there that's remotely useful?
Old 22 May 2024, 14:31   #31
Photon
Quote:
Originally Posted by Karlos View Post
If it's anything like writing to chip RAM you might be able to do other instructions between the writes that can execute while they are ongoing but the truth is, what are you going to do there that's remotely useful?
I'm imagining some "ideal Akiko usage" test case where Akiko does quite small conversions from fastmem to chipmem, as small as possible without causing much overhead.

Ideally, this would be set up with a 68040, and the destination aligned with the cache lines for the Copyback cache.

For each small Akiko call, first load all registers so that they are primed for purely internal operations. Applications would be fixed and/or floating-point calculations, or better, C2P conversion of a small string of pixels elsewhere in the same buffer.

The Akiko and the CPU operations are ideally sized to end on the same cycle. Write the CPU-converted pixels to memory, and repeat.

This test case could then be compared to the best C2P routines running on the same Hw setup. If there isn't much difference, this is an argument to use Akiko only if you detect an unexpanded CD32.

Last edited by Photon; 22 May 2024 at 14:36.
Old 22 May 2024, 15:13   #32
trixster
Some Akiko testing was done with DoomAttack when the TF330 came out, so the results in this thread are from a 50MHz 030 in a CD32.

https://www.exxosforum.co.uk/forum/v...t=akiko#p21169
Old 22 May 2024, 15:13   #33
alexh
Quote:
Originally Posted by Photon View Post
Akiko does quite small conversions from fastmem to chipmem
Akiko can't "do" anything. It is a slave. You write data into it and then read it back using the CPU.

Quote:
Originally Posted by Photon View Post
Ideally, this would be set up with a 68040, and the destination aligned with the cache lines for the Copyback cache.
This is to optimise the software C2P?

Quote:
Originally Posted by Photon View Post
For each small Akiko call, first load all registers so that they are primed for purely internal operations.
I'm curious to know what this means?
Old 22 May 2024, 15:25   #34
Karlos
Just as a reminder, the context here is 030+Akiko. As far as I know, there's no physical hardware that has Akiko with an 040 or higher. I need to do typical PAL low-res 8-bit C2P and 2/3 size. Both sizes are 32-pixel aligned, so there should be minimal fuss. I hope. I haven't had time to write any code, but it will be along the lines of: Super(), smash the 030-specific CACR bits to disable the data cache (not the icache), convert the pixels, re-enable. I'm not sure yet if it's safe or advisable to stay in supervisor state the whole time, but I was thinking of calling Forbid() during the conversion in any case.
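A rough sketch of the cache part of that plan, written in C for readability. It only computes the CACR values: one with the data cache enable bit cleared and the instruction cache bits untouched, and one to restore afterwards. The bit positions are the 68030 ones (they match CACRF_EnableI, CACRF_EnableD and CACRF_ClearD in exec/execbase.h); the macro and function names are made up for illustration, and actually reading and writing CACR (movec in supervisor mode, or exec's CacheControl()) is not shown. Flushing the data cache on restore is an extra precaution assumed here, not something from the post.
Code:

```c
#include <stdint.h>

/* 68030 CACR bits. Names are hypothetical; positions as in exec/execbase.h. */
#define CACR030_ENABLE_I (1u << 0)   /* instruction cache enable */
#define CACR030_ENABLE_D (1u << 8)   /* data cache enable */
#define CACR030_CLEAR_D  (1u << 11)  /* clear (invalidate) the data cache */

/* CACR value to use during the conversion: data cache off, all else unchanged. */
uint32_t cacr_dcache_off(uint32_t cacr)
{
    return cacr & ~CACR030_ENABLE_D;
}

/* CACR value to write afterwards: the saved value plus a data cache clear,
   so that no stale entries survive the re-enable (assumed precaution). */
uint32_t cacr_restore(uint32_t saved)
{
    return saved | CACR030_CLEAR_D;
}
```

The instruction cache bit is deliberately left alone, in line with "not icache" above.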
Old 22 May 2024, 15:26   #35
Karlos
Also, provide a means of switching between CPU only and Akiko for relative measurements on the same scenes.
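One possible shape for such a switch, as a minimal C sketch: both converters sit behind the same function pointer type, and a key press steps to the next one. All names here are hypothetical, and the two backends are empty placeholders standing in for the real CPU and Akiko loops.
Code:

```c
#include <stddef.h>
#include <stdint.h>

/* Common signature for any C2P backend (hypothetical). */
typedef void (*c2p_fn)(const uint8_t *chunky, uint8_t *planar, size_t pixels);

typedef struct {
    const char *name;
    c2p_fn      convert;
} c2p_backend;

/* Placeholders: the real ones would hold the CPU and Akiko conversion loops. */
void c2p_cpu_stub(const uint8_t *chunky, uint8_t *planar, size_t pixels)
{
    (void)chunky; (void)planar; (void)pixels;
}

void c2p_akiko_stub(const uint8_t *chunky, uint8_t *planar, size_t pixels)
{
    (void)chunky; (void)planar; (void)pixels;
}

const c2p_backend c2p_backends[] = {
    { "cpu",   c2p_cpu_stub   },
    { "akiko", c2p_akiko_stub },
};

#define C2P_BACKEND_COUNT (sizeof c2p_backends / sizeof c2p_backends[0])

/* Step to the next backend, wrapping around (e.g. bound to a key). */
size_t c2p_next_backend(size_t current)
{
    return (current + 1) % C2P_BACKEND_COUNT;
}
```

The frame loop would then call c2p_backends[current].convert(...), so the same scene can be timed with either path without restarting.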
Old 22 May 2024, 16:55   #36
alexh
Quote:
Originally Posted by Karlos View Post
As far as I know, there's no physical hardware that has Akiko with an 040 or higher.
TF360
Old 22 May 2024, 17:12   #37
pandy71
Quote:
Originally Posted by alexh View Post
Presumably it depends how long a cycle is? And bus contention?

The Akiko clock speed is going to be 14MHz? 32-bit data width? Can the CPU continuously W/R on every Akiko clock cycle? Or is there bus contention? Can it do anything else? (Presumably for a faster processor, accessing Akiko registers results in processor wait states?)
TBH I don't even know what the absolute address of those registers is. Based on the patent, it seems 8x 32-bit dwords are converted to 8x 32-bit dwords.

My assumption is that the registers are accessed over the same bus as CHIP RAM, i.e. the general CHIP access limitations apply to Akiko.

Quote:
Originally Posted by alexh View Post
A CPU's clock speed can be 30-60MHz, with FastRAM running at the same speed. If the C2P fits in the instruction cache and the working data fits in the data cache, then we're looking at FastRAM->ChipRAM copy speed being the limiting factor? (This is what I've always read.)
My assumption is that C2P on the CPU is more than just writing and reading: some additional operations such as shifts and masks must be performed, so each pixel requires more CPU cycles.
Old 22 May 2024, 17:35   #38
alexh
Quote:
Originally Posted by pandy71 View Post
TBH I don't even know what the absolute address of those registers is. Based on the patent, it seems 8x 32-bit dwords are converted to 8x 32-bit dwords.
Just one 32-bit address : 0x00b8_0038

Quote:
Originally Posted by pandy71 View Post
My assumption is that the registers are accessed over the same bus as CHIP RAM, i.e. the general CHIP access limitations apply to Akiko.
I don't know. Looking at the CD32 schematic the Akiko must also contain the equivalent of the A1200 Budgie. It's a Zorro II FastRAM address but is it shared with accesses to the CHIP RAM bus? I'm not 100% sure.

Quote:
Originally Posted by pandy71 View Post
My assumption is that C2P on the CPU is more than just writing and reading: some additional operations such as shifts and masks must be performed, so each pixel requires more CPU cycles.
It is, but it is taking place in the CPU data cache at the CPU clock frequency (e.g. 50MHz).
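For reference, this is what a 32-pixel, 8-bitplane C2P has to compute, as a deliberately naive C sketch of the shift-and-mask work being discussed. Real 030 routines do merge/swap passes on whole longwords instead of a per-bit loop, so this shows the result, not how a fast routine gets there. The function name is made up for illustration; the leftmost pixel goes to the most significant bit of each plane longword.
Code:

```c
#include <stdint.h>

/* Naive chunky-to-planar for one group of 32 pixels, 8 bitplanes. */
void c2p_32px_naive(const uint8_t chunky[32], uint32_t planar[8])
{
    for (int plane = 0; plane < 8; plane++) {
        uint32_t bits = 0;
        for (int x = 0; x < 32; x++) {
            /* Mask out bit 'plane' of pixel x and shift it into the plane long. */
            bits = (bits << 1) | ((uint32_t)(chunky[x] >> plane) & 1u);
        }
        planar[plane] = bits;   /* pixel 0 ends up in bit 31 */
    }
}
```

That is 256 shift/mask steps for 32 pixels in this form, which is why optimised routines work on several pixels per instruction.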

Last edited by alexh; 22 May 2024 at 17:46.
Old 22 May 2024, 17:36   #39
Karlos
It's a single address that you can find via the graphics library. You make 8 writes to it and then you read back from it 8 times.
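In C that access pattern would look roughly like the sketch below. The function name is hypothetical, and the register pointer is passed in so the same code can be pointed at the real address or at a dummy for testing. The constant is the address quoted earlier in the thread; a real program would take the pointer from graphics.library instead (as far as I recall the V40 headers have a ChunkyToPlanarPtr field in GfxBase for this, but check the includes). Which plane comes back first is not stated in the thread, so the output order here is simply "as read".
Code:

```c
#include <stdint.h>

/* Address quoted in the thread; hard-coding it is for illustration only. */
#define AKIKO_C2P_REG ((volatile uint32_t *)0x00B80038UL)

/* Push 8 longs (32 chunky pixels) through the register, then pull 8 longs back. */
void akiko_c2p_32px(volatile uint32_t *reg,
                    const uint32_t chunky[8], uint32_t planar[8])
{
    for (int i = 0; i < 8; i++)
        *reg = chunky[i];     /* 8 consecutive writes to the same address */
    for (int i = 0; i < 8; i++)
        planar[i] = *reg;     /* 8 consecutive reads from the same address */
}
```

On hardware the call would be akiko_c2p_32px(AKIKO_C2P_REG, src, dst). Pointed at ordinary memory it just returns the last value written eight times, which is enough to check the access pattern but nothing more.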
Old 22 May 2024, 17:38   #40
paraj
From the schematics (https://www.amigawiki.org/doku.php?i...ice:schematics) it does look plausible that the same access restrictions as chipmem apply, and that you'd be able to do proper 32-bit accesses. It looks like it's clocked at 7MHz, but I'm not a HW person.

The doom attack source on aminet (http://aminet.net/game/shoot/DoomAttack_src.lha) has c2p routines, and they are very very simple, just write 8 longs to the chip, and read them back. From WinUAE source code I can see that the register in question is located at $b80038.

It would be interesting to see measurements of the raw speed, i.e. with interrupts and DMA off, and just
Code:
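  lea    $b80038,a0   ; Akiko C2P register (address as found in the WinUAE source)
  rept 8
  move.l d0,(a0)      ; write 8 longs of chunky data
  endr
  rept 8
  move.l (a0),d0      ; read back 8 longs of planar data
  endr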
in a loop as well as variations of the above reading from (chip/fast)/writing to chip.
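To put a number on what such a loop has to get through per frame, here is a small C sketch of the arithmetic, assuming each pass is 8 writes plus 8 reads for 32 pixels as described above. The function name is made up, and the 320x256 figure below is an assumed PAL low-res size, not a measurement.
Code:

```c
#include <assert.h>
#include <stdint.h>

/* Register accesses needed to push one frame through an 8-write/8-read converter. */
uint32_t akiko_accesses_per_frame(uint32_t width, uint32_t height)
{
    assert(width % 32 == 0);                  /* each pass handles 32 pixels */
    uint32_t passes = (width / 32) * height;  /* passes per frame */
    return passes * 16;                       /* 8 writes + 8 reads per pass */
}
```

For 320x256 that is 2560 passes, so 40960 longword accesses to the register per frame, before counting the source reads and the writes to chip RAM.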