22 May 2024, 19:19 | #42 | |
Registered User
Join Date: Jun 2008
Location: somewhere else
Posts: 549
|
Quote:
EDIT: I mean, you don't need to read back the register 8 times if you don't use 8 bitplanes.
Last edited by hitchhikr; 22 May 2024 at 19:40. |
|
22 May 2024, 20:05 | #43 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,918
|
|
22 May 2024, 21:04 | #44 |
Registered User
Join Date: Oct 2020
Location: Bicester
Posts: 2,087
|
Sure, the 060 has no need for C2P hardware, but if it is not a hindrance then why not use it? The 060 is probably just waiting around for chip RAM bus access, so it could equally just wait around for Akiko and bus access and do even less work.
|
22 May 2024, 21:19 | #45 | |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,309
|
Quote:
Accessing the Akiko C2P hardware is not free: you need to write and read stuff back. It would need to be essentially free to compete on 060, and very fast on 030/50. |
|
22 May 2024, 21:24 | #46 |
Registered User
Join Date: Oct 2020
Location: Bicester
Posts: 2,087
|
And I agree.
That is why I said as long as it is not a hindrance. The proof is in the testing, not just in the theory. |
22 May 2024, 21:36 | #47 |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,309
|
Ah yes, fully agree. Very annoying that even simple, raw numbers (R/W without DMA/ints) aren't available. Hopefully this effort will bring them.
|
22 May 2024, 21:51 | #48 |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,490
|
The hindrance is that Akiko has relatively narrow 8-bit input and output registers, and each write and read access requires a full synchronization with the chip clock. The CPU on its own, by contrast, can essentially park four long words in its push buffer and continue working (provided chip mem is marked as "imprecise" by the MMU); while the CPU keeps working, the push buffer is "retiring" the writes.
|
22 May 2024, 21:59 | #49 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,918
|
I feel like this is starting to go beyond what I originally intended.
Thinking about it, Photon makes an interesting suggestion: hybrid C2P. *If* the CPU is able to execute instructions while waiting on writes to Akiko and Chip memory, it's not beyond the realms of possibility that you might be able to craft a routine that uses both to perform C2P on different parts of the whole workload. A task for a hardcore optimisation expert. |
22 May 2024, 22:16 | #50 |
Registered User
Join Date: Oct 2020
Location: Bicester
Posts: 2,087
|
|
22 May 2024, 22:19 | #51 | |
Registered User
Join Date: Oct 2020
Location: Bicester
Posts: 2,087
|
Quote:
|
|
22 May 2024, 22:20 | #52 | |
Registered User
Join Date: Jul 2017
Location: San Jose
Posts: 688
|
Quote:
Nothing like a function call "GetAkikoInterface" or anything... |
|
22 May 2024, 22:21 | #53 | |
Registered User
Join Date: Jul 2014
Location: Warsaw/Poland
Posts: 196
|
Quote:
It would be cool to see figures. I wonder if accessing Akiko is similar to accessing the hardware registers (if I'm not mistaken, 2 cycles of 3.5MHz per access), or faster. |
|
23 May 2024, 00:50 | #54 | ||
Registered User
Join Date: Jun 2010
Location: PL?
Posts: 2,919
|
Oh... so you first write 8 DWORDs to this address and after that you just read 8 DWORDs back from the same address?
Strange. I would have used 8 registers, but perhaps there was a reason behind such an implementation.
Quote:
Quote:
Let's say a plain read and write can be the same speed as a write and read to and from Akiko; then the data shuffling will for sure take more cycles than the R/W. And the Akiko C2P hardware performs the data shuffling immediately, as it is hardwired. |
||
23 May 2024, 12:22 | #55 | |||
Thalion Webshrine
Join Date: Jan 2004
Location: Oxford
Posts: 14,641
|
Quote:
Quote:
I don't think so. Write and read to/from Akiko is at best 14MHz (but probably slower, 3.5MHz), whereas the C2P is happening in cache on 030@50MHz. I think that gives C2P ~28 CPU cycles to break even with Akiko @14MHz. (Possibly more if the write to ChipRAM is more efficient.)
Last edited by alexh; 23 May 2024 at 12:41. |
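As a rough check on the ~28-cycle figure: the post does not show its working, but a number of that size falls out if one assumes 8 Akiko accesses at one access per Akiko clock tick, counted in 50MHz CPU cycles. The sketch below only does that arithmetic. The function name and the one-access-per-tick assumption are illustrative, not measured values.

```c
#include <assert.h>

/* CPU cycles that elapse while Akiko services `accesses` bus accesses.
 * ASSUMPTION (hypothetical, not measured): each access costs exactly one
 * tick of the Akiko-side clock. Real hardware may need more. */
static double cpu_cycle_budget(double cpu_mhz, double akiko_mhz, int accesses)
{
    assert(cpu_mhz > 0.0 && akiko_mhz > 0.0 && accesses >= 0);
    return (double)accesses * (cpu_mhz / akiko_mhz);
}
```

Under that assumption, `cpu_cycle_budget(50.0, 14.0, 8)` comes out a little under 29 cycles, in line with the estimate in the post, while `cpu_cycle_budget(50.0, 3.5, 8)` gives roughly four times as much, which is the case where a cached software C2P would have far more room to break even.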
|||
23 May 2024, 14:17 | #56 |
ex. demoscener "Bigmama"
Join Date: Jun 2012
Location: Fyn / Denmark
Posts: 1,663
|
The chip itself runs at the same clock as the cpu (~14MHz) as far as I can see from the schematic. What it does with this internally, I don't know.
All the bus arbitration stuff in the CD32 is handled by Akiko (i.e. it "controls" access to the rest of the custom chips, not the other way round), so you would need to know the internals of it to determine what the rules are for accessing the C2P register. I would expect the rules are the same as for fastram, except that the address needs to be marked as non-cacheable. |
23 May 2024, 18:00 | #57 | |
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,758
|
Quote:
IIRC there's at least a way to get close to the write speed of the memory. The caching itself can't improve speed if you do an entire buffer conversion once per frame: you read each source address only once, and write each destination address only once.
It means that ideally, before every write, you have the CPU prepared to calculate internally immediately afterwards, from already cached or register data, with instructions that won't have to read from memory, using instructions already in the cache and partially already in the pipeline. This is just what would be ideal for a CPU that is several times faster than memory. The design of the individual model could deviate from the ideal for many reasons, or already detect write-throughs and defer them so as not to stall the pipeline.
Anyway, I thought you wrote an address to the Akiko register. Even if it completes a conversion in the time it takes to feed it data, it should assist soft C2P less than the Blitter. I'm starting to think this extra chip is best used only if a stock CD32 is detected.
Possibly you could scatter a few move.l (a0),(a1) somewhere in a C2P routine without terrible consequences. Then it could maybe help convert a few pixels per row. But I think only if you get them virtually for free, and only for full 8-bit C2P, since you have to write all 8. |
|
24 May 2024, 00:46 | #58 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,918
|
I may have hit a snag. I've written a small test C program that first tries to detect whether Akiko is present (it looks for the magic 0xCAFE ident at the hardware address that I've just forgotten, having turned off the computer). Setting the UAE chipset extra to CD32 results in this detection working as expected and reporting that Akiko exists. Reverting to the A1200 chipset fails the test and reports no Akiko, as expected, so I'm pretty sure this is fine.
Next, I have a tiny ASM function to write 8 ULONGs pointed to by a0 to the hardware address at $B80038 and then read them back to a buffer pointed to by a1. This is just for validation purposes so far, and I used assembler to ensure nothing could be optimised away by the compiler here. However, it seems all I'm getting back is zero (the destination buffer is prefilled with a different value). As I'm hitting the (virtual) metal directly, I naively assumed that under emulation conditions this would just work up to this stage. I've tried messing with the CPU cache on the Amiga side and various UAE settings in the emulator, but so far, no dice. |
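For validating the read-back it may help to have a host-side reference for what the eight returned longwords ought to contain. The sketch below is a plain C model with no hardware access. It assumes the conventional planar layout: plane p collects bit p of every pixel, the leftmost pixel lands in bit 31, and plane 0 is read first. That ordering is an assumption to be checked against akiko.cpp or real hardware, and the name `c2p_reference_32` is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model of one C2P pass over 32 chunky pixels.
 * chunky: 32 bytes in memory order (the same bytes the eight big-endian
 *         ULONG writes would carry on the Amiga side).
 * planes: receives 8 longwords, planes[0] being the first value read back.
 * ASSUMPTION: leftmost pixel maps to bit 31 of each plane longword. */
static void c2p_reference_32(const uint8_t chunky[32], uint32_t planes[8])
{
    assert(chunky != NULL && planes != NULL);

    for (int p = 0; p < 8; ++p)
        planes[p] = 0;

    for (int x = 0; x < 32; ++x) {          /* pixel index, left to right */
        for (int p = 0; p < 8; ++p) {       /* bit p of the pixel -> plane p */
            if (chunky[x] & (1u << p))
                planes[p] |= UINT32_C(1) << (31 - x);
        }
    }
}
```

Feeding the same 32 bytes to this model and to the register should give matching longwords if the layout assumption holds. An all-zero result from non-zero input would then point at the access path rather than at the data.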
24 May 2024, 07:01 | #59 |
Registered User
Join Date: Oct 2020
Location: Bicester
Posts: 2,087
|
This is the c2p from Adoom, don't know if it works under emulation.
Code:
        mc68020
        multipass
    if (_eval(DEBUG)&$8000)
        debug   on,lattice4
    endc

;void __asm c2p_akiko (register __a0 UBYTE *chunky_data,
;                      register __a1 PLANEPTR raster,
;                      register __a2 UBYTE *dirty_list,
;                      register __d1 ULONG plsiz,
;                      register __a5 UBYTE *akiko_address);

; a0 -> width*height chunky pixels in fastmem
; a1 -> contiguous bitplanes in chipmem
; a2 -> dirty list (1-byte flag for whether each 32 pixel "unit" needs updating)
; d1 = width*height/8   (width*height must be a multiple of 32)

    ifeq depth-8
        xdef    _c2p_8_akiko
_c2p_8_akiko:
    else
    ifeq depth-6
        xdef    _c2p_6_akiko
_c2p_6_akiko:
    else
        fail    "unsupported depth!"
    endc
    endc

        xref    _GfxBase

        movem.l a2/a3/a6,-(sp)

        move.l  d1,d0                   ; plsiz
        lsl.l   #3,d0                   ; 8*plsiz
        lea     (a0,d0.l),a3            ; a3 -> end of chunky data
        sub.l   d1,d0                   ; d0 = 7*plsiz
    ifle depth-6
        sub.l   d1,d0
        sub.l   d1,d0                   ; d0 = 5*plsiz if depth=6
    endc

        movem.l d0/d1/a0/a1,-(sp)
        movea.l (_GfxBase).l,a6
        jsr     (_LVOOwnBlitter,a6)     ; gain exclusive use of Akiko
        movem.l (sp)+,d0/d1/a0/a1

loop:   tst.b   (a2)+                   ; does next 32 pixel unit need updating?
        bne.b   c2p                     ; branch if yes

        adda.w  #32,a0                  ; skip 32 pixels on input
        addq.l  #4,a1                   ; skip 32 pixels on output

        cmpa.l  a3,a0
        bne.b   loop
        bra.b   exit                    ; exit if no changes

c2p:    move.l  (a0)+,(a5)              ; write 32 pixels to akiko
        move.l  (a0)+,(a5)
        move.l  (a0)+,(a5)
        move.l  (a0)+,(a5)
        move.l  (a0)+,(a5)
        move.l  (a0)+,(a5)
        move.l  (a0)+,(a5)
        move.l  (a0)+,(a5)

        move.l  (a5),(a1)               ; plane 0
        adda.l  d1,a1
        move.l  (a5),(a1)               ; plane 1
        adda.l  d1,a1
        move.l  (a5),(a1)               ; plane 2
        adda.l  d1,a1
        move.l  (a5),(a1)               ; plane 3
        adda.l  d1,a1
        move.l  (a5),(a1)               ; plane 4
        adda.l  d1,a1
    ifgt depth-6
        move.l  (a5),(a1)               ; plane 5
        adda.l  d1,a1
        move.l  (a5),(a1)               ; plane 6
        adda.l  d1,a1
    endc
        move.l  (a5),(a1)+              ; last plane

        suba.l  d0,a1                   ; -7*plsiz (or 5*plsiz) (or 3*plsiz)

        cmpa.l  a3,a0
        bne.b   loop

exit:   jsr     (_LVODisownBlitter,a6)  ; free Akiko

        movem.l (sp)+,a2/a3/a6
        rts
|
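The loop structure of the routine above can be mirrored in C for testing on a host machine. This is a sketch and not the ADoom source: `convert_unit` is a software stand-in for the eight writes and `depth` reads on the Akiko register, it assumes the leftmost pixel lands in bit 31 of each plane longword, and both function names are invented for illustration. What it keeps from the assembler is the dirty-list skip and the plane stride of `plsiz` bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software stand-in (assumption) for one Akiko pass: converts 32 chunky
 * pixels and stores one longword per plane, plane_longs longwords apart. */
static void convert_unit(const uint8_t *chunky, uint32_t *plane0,
                         size_t plane_longs, int depth)
{
    for (int p = 0; p < depth; ++p) {
        uint32_t bits = 0;
        for (int x = 0; x < 32; ++x)
            bits |= (uint32_t)((chunky[x] >> p) & 1u) << (31 - x);
        plane0[(size_t)p * plane_longs] = bits;
    }
}

/* chunky: 8*plsiz pixels, one byte each.
 * planes: `depth` contiguous bitplanes of plsiz bytes each.
 * dirty:  one flag byte per 32-pixel unit; zero means "skip this unit".
 * Returns the number of units actually converted. */
static size_t c2p_dirty(const uint8_t *chunky, uint32_t *planes,
                        const uint8_t *dirty, size_t plsiz, int depth)
{
    assert(plsiz % 4 == 0 && depth >= 1 && depth <= 8);

    size_t plane_longs = plsiz / 4;   /* longwords per bitplane */
    size_t units = plsiz / 4;         /* 8*plsiz pixels / 32 per unit */
    size_t converted = 0;

    for (size_t u = 0; u < units; ++u) {
        if (!dirty[u])
            continue;                 /* clean unit: leave output untouched */
        convert_unit(chunky + 32 * u, planes + u, plane_longs, depth);
        ++converted;
    }
    return converted;
}
```

Indexing the output by unit and plane replaces the `adda.l d1,a1` / `suba.l d0,a1` pointer walk in the assembler, so the same code covers both the 6-plane and 8-plane builds through the `depth` argument.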
24 May 2024, 09:10 | #60 |
Thalion Webshrine
Join Date: Jan 2004
Location: Oxford
Posts: 14,641
|
https://github.com/tonioni/WinUAE/blob/master/akiko.cpp
Line 300 for the code for the akiko emulation |