English Amiga Board


Old 22 May 2024, 17:39   #41
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,394
Quote:
Originally Posted by alexh View Post
There's flat out no way Akiko is going to be worth using on this though.
Karlos is online now  
Old 22 May 2024, 19:19   #42
hitchhikr
Registered User
 
Join Date: Jun 2008
Location: somewhere else
Posts: 523
Quote:
You make 8 writes to it and then you read back from it 8 times.
Not necessarily 8, can be less.

EDIT: I mean, you don't need to read back the register 8 times if you don't use 8 bitplanes.

Last edited by hitchhikr; 22 May 2024 at 19:40.
hitchhikr is offline  
Old 22 May 2024, 20:05   #43
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,394
Quote:
Originally Posted by hitchhikr View Post
Not necessarily 8, can be less.

EDIT: I mean, you don't need to read back the register 8 times if you don't use 8 bitplanes.
Sure, but we are in this case.
Karlos is online now  
Old 22 May 2024, 21:04   #44
abu_the_monkey
Registered User
 
Join Date: Oct 2020
Location: Bicester
Posts: 2,018
Quote:
Originally Posted by Karlos View Post
There's flat out no way Akiko is going to be worth using on this though.
sure, the 060 has no need for c2p hardware, but if it is not a hindrance then why not use it? the 060 is probably just waiting around for chip ram bus access, so it could equally just wait around for akiko and bus access and do even less work
abu_the_monkey is offline  
Old 22 May 2024, 21:19   #45
paraj
Registered User
 
Join Date: Feb 2017
Location: Denmark
Posts: 1,182
Quote:
Originally Posted by abu_the_monkey View Post
sure, the 060 has no need for c2p hardware, but if it is not a hindrance then why not use it? the 060 is probably just waiting around for chip ram bus access, so it could equally just wait around for akiko and bus access and do even less work
On 060 you can (mostly) overlap all C2P calculations while waiting for the chipmem writes to retire, so you're not just waiting around, instead you're fetching the next data from fast mem to be converted and doing so. For very simple stuff you can more or less render and C2P a complete frame just while waiting for chipmem!


Accessing the Akiko C2P hardware is not free: you need to write data and read it back. It would need to be essentially free to compete on 060, and very fast on 030/050.
paraj is offline  
Old 22 May 2024, 21:24   #46
abu_the_monkey
Registered User
 
Join Date: Oct 2020
Location: Bicester
Posts: 2,018
and I agree

that is why I said as long as it is not a hindrance.

the proof is in the testing, not just in the theory
abu_the_monkey is offline  
Old 22 May 2024, 21:36   #47
paraj
Registered User
 
Join Date: Feb 2017
Location: Denmark
Posts: 1,182
Quote:
Originally Posted by abu_the_monkey View Post
and I agree

that is why I said as long as it is not a hindrance.

the proof is in the testing, not just in the theory
Ah yes, fully agree. Very annoying that even simple, raw numbers (R/W without dma/ints) aren't available. Hopefully this effort will bring them
paraj is offline  
Old 22 May 2024, 21:51   #48
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,302
Quote:
Originally Posted by abu_the_monkey View Post
sure, the 060 has no need for c2p hardware, but if it is not a hindrance then why not use it?
The hindrance is that Akiko has relatively narrow 8-bit input and output registers, and each write and read access requires a full synchronization with the chip clock. For chip memory writes, by contrast, the CPU can essentially park four long words in its push buffer and continue working (provided chip mem is marked as "imprecise" by the MMU); while the CPU keeps working, the push buffer is "retiring" the writes.
Thomas Richter is offline  
Old 22 May 2024, 21:59   #49
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,394
I feel like this is starting to go beyond what I originally intended.

Thinking about it, Photon makes an interesting suggestion: hybrid C2P. *If* the CPU is able to execute instructions while waiting on writes to Akiko and Chip memory, it's not beyond the realms of possibility that you might be able to craft a routine that uses both to perform C2P on different parts of the whole workload. A task for a hardcore optimisation expert
Karlos is online now  
Old 22 May 2024, 22:16   #50
abu_the_monkey
Registered User
 
Join Date: Oct 2020
Location: Bicester
Posts: 2,018
Quote:
Originally Posted by Karlos View Post
I feel like this is starting to go beyond what I originally intended.
you do remember what site you are posting on
abu_the_monkey is offline  
Old 22 May 2024, 22:19   #51
abu_the_monkey
Registered User
 
Join Date: Oct 2020
Location: Bicester
Posts: 2,018
Quote:
Originally Posted by Karlos View Post
Thinking about it, Photon makes an interesting suggestion: hybrid C2P. *If* the CPU is able to execute instructions while waiting on writes to Akiko and Chip memory, it's not beyond the realms of possibility that you might be able to craft a routine that uses both to perform C2P on different parts of the whole workload. A task for a hardcore optimisation expert
this might be something to explore, but for me, I would start with the simplest implementation, see what happens, and keep an open mind on possible improvements.
abu_the_monkey is offline  
Old 22 May 2024, 22:20   #52
pipper
Registered User
 
Join Date: Jul 2017
Location: San Jose
Posts: 675
Quote:
It's a single address that you can find via the graphics library. You make 8 writes to it and then you read back from it 8 times.
Yeah, this is like the weirdest/laziest interface they could come up with.
Nothing like a function call "GetAkikoInterface" or anything...
pipper is offline  
Old 22 May 2024, 22:21   #53
Cyprian
Registered User
 
Join Date: Jul 2014
Location: Warsaw/Poland
Posts: 192
Quote:
Originally Posted by paraj View Post
From the schematics (https://www.amigawiki.org/doku.php?i...ice:schematics) it does look plausible that the same access restrictions as chipmem apply, and that you'd be able to do proper 32-bit accesses. It looks like it's clocked at 7MHz, but I'm not a HW person.

The doom attack source on aminet (http://aminet.net/game/shoot/DoomAttack_src.lha) has c2p routines, and they are very very simple, just write 8 longs to the chip, and read them back. From WinUAE source code I can see that the register in question is located at $b80038.

Would be interesting with measurements of the raw speed, i.e. interrupts and DMA off, and just
Code:
  rept 8
  move.l d0,(a0)
  endr
  rept 8
  move.l (a0),d0
  endr
in a loop as well as variations of the above reading from (chip/fast)/writing to chip.

It would be cool to see figures.
I wonder whether accessing Akiko is similar to accessing the hardware registers (if I'm not mistaken, 2 cycles of 3.5MHz per access) or faster.
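
Until someone posts real measurements, a guess like this can at least be turned into an upper bound. The following is a minimal sketch in C, assuming every 32-bit Akiko access costs a fixed number of bus clocks and that one 32-pixel unit needs 8 writes plus 8 reads. The function names and the PAL clock constant are illustrative choices, not figures measured on a CD32, and the cost of fetching chunky data and storing the planar result is ignored.

```c
#include <stdint.h>

/* PAL colour clock in Hz (the "3.5MHz" figure); NTSC differs slightly. */
#define PAL_CCK_HZ 3546895u

/* Assumption: every Akiko access costs `cycles_per_access` bus clocks and
 * nothing else (no DMA contention, no loop overhead). One 32-pixel unit
 * takes 8 longword writes plus 8 longword reads = 16 accesses. */
uint32_t akiko_pixels_per_second(uint32_t bus_hz, uint32_t cycles_per_access)
{
    uint64_t accesses_per_second = (uint64_t)bus_hz / cycles_per_access;
    return (uint32_t)(accesses_per_second * 32u / 16u);
}

/* Upper bound on whole frames per second for a width x height 8-bit screen. */
uint32_t akiko_max_fps(uint32_t bus_hz, uint32_t cycles_per_access,
                       uint32_t width, uint32_t height)
{
    return akiko_pixels_per_second(bus_hz, cycles_per_access) / (width * height);
}
```

With `PAL_CCK_HZ` and 2 cycles per access, `akiko_max_fps(PAL_CCK_HZ, 2, 320, 256)` comes out at 43 under these assumptions; a faster access rate scales the bound proportionally.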
Cyprian is offline  
Old 23 May 2024, 00:50   #54
pandy71
Registered User
 
Join Date: Jun 2010
Location: PL?
Posts: 2,867
Quote:
Originally Posted by alexh View Post
Just one 32-bit address : 0x00b8_0038
Oh... so you first write a DWORD to this address 8 times, and after that you just read a DWORD from the same address 8 times?
Strange. I would have used 8 registers, but perhaps there was an idea behind such an implementation.

Quote:
Originally Posted by alexh View Post
I don't know. Looking at the CD32 schematic the Akiko must also contain the equivalent of the A1200 Budgie. It's a Zorro II FastRAM address but is it shared with accesses to the CHIP RAM bus? I'm not 100% sure.
Yes, I also checked the CD32 schematics. Akiko is evidently accessible from the CPU and from the chip side via two independent 32-bit buses, so technically the C2P could be clocked with a higher clock. Akiko also seems to use the CPU clock (made from XORed 7MHz and CDAC), i.e. 14MHz. Besides this, weirdly to me, it appears to be accessible(?) from CPU reserved type space.

Quote:
Originally Posted by alexh View Post
It is, but it is taking place in the CPU data cache at the CPU clock frequency (e.g. 50MHz).
Yes, but the CPU still needs to read the data, perform the shuffling (costly in time), and write the data to CHIP or somewhere else.
Let's say the read and write can run at the same speed as the write and read to and from Akiko; then the data shuffling will surely take more cycles than the R/W.
And the Akiko C2P HW performs the data shuffling immediately, as it is hardwired.
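
To make "data shuffling" concrete, here is a small software model of one conversion: 8 longwords of chunky pixels in, 8 longwords of planar data out. It is only a sketch under assumptions: pixels are taken in big-endian order within each longword (first pixel in the most significant byte), the leftmost pixel lands in bit 31 of each plane, and plane 0 comes out first, as in the ADoom routine quoted in this thread. `c2p_model` is an illustrative name, and the loop is written for clarity, not for speed.

```c
#include <stdint.h>

/* Software model of the conversion: 8 longwords of chunky pixels in,
 * 8 longwords of planar data out (out[0] = plane 0).
 * Assumptions: big-endian pixel order within each longword (first pixel in
 * the most significant byte) and leftmost pixel in bit 31 of each plane. */
void c2p_model(const uint32_t in[8], uint32_t out[8])
{
    for (int p = 0; p < 8; p++)
        out[p] = 0;

    for (int i = 0; i < 32; i++) {
        /* Pixel i lives in longword i/4, byte (i%4) counted from the top. */
        uint32_t pixel = (in[i / 4] >> (24 - 8 * (i % 4))) & 0xFFu;

        /* Scatter the 8 bits of this pixel across the 8 planes. */
        for (int p = 0; p < 8; p++) {
            if (pixel & (1u << p))
                out[p] |= UINT32_C(1) << (31 - i);
        }
    }
}
```

A real software C2P does the same bit transpose with a handful of merge/swap passes on registers instead of 256 single-bit tests; the model is mainly useful as a reference to check other routines against.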
pandy71 is offline  
Old 23 May 2024, 12:22   #55
alexh
Thalion Webshrine
 
Join Date: Jan 2004
Location: Oxford
Posts: 14,448
Quote:
Originally Posted by pandy71 View Post
Quote:
Originally Posted by alexh
Quote:
Originally Posted by pandy71 View Post
My assumption is that C2P on the CPU is more than just writing and reading: some additional operations must be performed, like shift, mask etc., so more CPU cycles are required per pixel.
It is, but it is taking place in the CPU data cache at the CPU clock frequency (e.g. 50MHz).
Yes, but the CPU still needs to read the data, perform the shuffling (costly in time), and write the data to CHIP
Yes.

Quote:
Originally Posted by pandy71 View Post
Let's say the read and write can run at the same speed as the write and read to and from Akiko
Read from FastRAM into data cache which is the same. Write to ChipRAM which is the same (maybe?). One has a R/W to Akiko. The other has C2P. It's all down to the bandwidth from CPU cache to/from Akiko vs C2P running from processor cache.

Quote:
Originally Posted by pandy71 View Post
then data shuffling for sure will take more cycles than R/W.
I don't think so. Writing to and reading from Akiko happens at best at 14MHz (but probably slower, at 3.5MHz), whereas the C2P is happening in cache on an 030@50MHz. I think that gives C2P ~28 CPU cycles to break even with Akiko @14MHz. (Possibly more if the write to ChipRAM is more efficient)
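
The ~28 figure can be reproduced with a one-line calculation. This is a sketch under a stated assumption, namely that Akiko completes one access per clock; the function name is made up for the example. Counting 8 accesses at 14MHz on a 50MHz CPU gives 28 CPU cycles; counting all 16 accesses (8 writes plus 8 reads) would give 57, and at 3.5MHz the budget for 8 accesses grows to 114.

```c
#include <stdint.h>

/* CPU clocks that elapse while Akiko handles `accesses` bus accesses,
 * assuming (hypothetically) one access per Akiko clock. Rounded down. */
uint32_t cpu_cycles_during_akiko(uint32_t cpu_hz, uint32_t akiko_hz,
                                 uint32_t accesses)
{
    return (uint32_t)(((uint64_t)accesses * cpu_hz) / akiko_hz);
}
```

Whatever the real access timing turns out to be, the software C2P has to fit its shifts and masks for 32 pixels inside that budget to break even.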

Last edited by alexh; 23 May 2024 at 12:41.
alexh is offline  
Old 23 May 2024, 14:17   #56
hooverphonique
ex. demoscener "Bigmama"
 
Join Date: Jun 2012
Location: Fyn / Denmark
Posts: 1,636
The chip itself runs at the same clock as the cpu (~14MHz) as far as I can see from the schematic. What it does with this internally, I don't know.

All the bus arbitration stuff in the CD32 is handled by Akiko (i.e. it "controls" access to the rest of the custom chips, not the other way round), so you would need to know the internals of it to determine what the rules are for accessing the C2P register.

I would expect the rules are the same as for fastram, except that the address needs to be marked as non-cacheable.
hooverphonique is offline  
Old 23 May 2024, 18:00   #57
Photon
Moderator
 
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,650
Quote:
Originally Posted by alexh View Post
Akiko can't "do" anything. It is a slave. You write data into it and then read it back using the CPU.
Right. Sigh, you see some of the info sometimes, and then I forgot. This makes it "half-14MHz-copyspeed-with-some-extra-work-for-the-CPU". The reward is that hopefully the conversion only takes as long as the last write to the Akiko address.

Quote:
Originally Posted by alexh View Post
This is to optimise the software C2P?
IIRC there's at least a way to get close to the write speed of the memory. The caching itself can't improve speed if you do an entire buffer conversion once per frame - you read each source address only once, and write each destination address only once.

Quote:
Originally Posted by alexh View Post
I'm curious to know what this means?
It means that ideally before every write, have the CPU prepared to calculate internally immediately after, from already cached or register data, with instructions that won't have to read from memory, using instructions already in the cache and partially already in the pipeline.

This is just what would be ideal for a CPU that is several times faster than memory. The design of the individual model could deviate from the ideal for many reasons, or already detect write-throughs and defer them to not stall the pipeline.

Anyway, I thought you wrote an address to the Akiko register. Even if it completes a conversion in the time it takes to feed it data, it should assist soft C2P less than the Blitter.

I'm starting to think this extra chip is best used only if stock CD32 is detected. Possibly you could scatter a few move.l (a0),(a1) somewhere in a C2P routine without terrible consequences. Then it could help convert a few pixels per row maybe. But I think only if you get them virtually for free. And only for full 8-bit C2P, since you have to write all 8.
Photon is offline  
Old 24 May 2024, 00:46   #58
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,394
I may have hit a snag. I've written a small test C program that first tries to detect whether Akiko is present (it looks for the magic 0xCAFE ident at a hardware address that I've just forgotten, having turned off the computer). Setting the UAE "chipset extra" option to CD32 results in this detection working as expected and reporting that Akiko exists. Reverting to the A1200 chipset fails the test and reports no Akiko, as expected, so I'm pretty sure this is fine.

Next, I have a tiny ASM function to write the 8 ULONGs pointed to by a0 to the hardware address at $B80038 and then read them back to a buffer pointed to by a1. This is just for validation purposes so far, and I used assembler to ensure nothing could be optimised away by the compiler here.

However, it seems all I'm getting back is zero (the destination buffer is prefilled with a different value).

As I'm hitting the (virtual) metal directly, I naively assumed that under emulation conditions, this would just work up to this stage. I've tried messing with the CPU cache on the Amiga side and various UAE settings in the emulator, but so far, no dice.
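
For comparison, the same two steps can be sketched in C with `volatile` pointers, which also stops the compiler from merging or dropping the accesses. This is not the actual test program: `akiko_ident_matches`, `akiko_roundtrip` and their pointer parameters are hypothetical, and the ident register address is deliberately left to the caller since it isn't given here. On an Amiga the `reg` pointer would be the $B80038 address mentioned in the thread.

```c
#include <stdint.h>

#define AKIKO_ID_MAGIC 0xCAFEu  /* ident value mentioned in the post */

/* `id_reg` is a hypothetical pointer to the ident register; its address is
 * deliberately left to the caller. Returns 1 if the magic value is read. */
int akiko_ident_matches(const volatile uint16_t *id_reg)
{
    return *id_reg == AKIKO_ID_MAGIC;
}

/* Write 8 longwords to one register, then read 8 longwords back.
 * `volatile` forces all 16 accesses to happen, in this order. */
void akiko_roundtrip(volatile uint32_t *reg, const uint32_t in[8],
                     uint32_t out[8])
{
    for (int i = 0; i < 8; i++)
        *reg = in[i];
    for (int i = 0; i < 8; i++)
        out[i] = *reg;
}
```

Pointed at an ordinary memory location instead of the register, `akiko_roundtrip` returns the last value written 8 times over, which gives one more reference point when working out what a readback is actually hitting.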
Karlos is online now  
Old 24 May 2024, 07:01   #59
abu_the_monkey
Registered User
 
Join Date: Oct 2020
Location: Bicester
Posts: 2,018
This is the c2p from Adoom, don't know if it works under emulation.

Code:
		mc68020
		multipass
	if (_eval(DEBUG)&$8000)
		debug	on,lattice4
	endc

;void __asm c2p_akiko (register __a0 UBYTE *chunky_data,
;                      register __a1 PLANEPTR raster,
;                      register __a2 UBYTE *dirty_list,
;                      register __d1 ULONG plsiz,
;                      register __a5 UBYTE *akiko_address);

; a0 -> width*height chunky pixels in fastmem
; a1 -> contiguous bitplanes in chipmem
; a2 -> dirty list (1-byte flag for whether each 32 pixel "unit" needs updating)
; d1 = width*height/8   (width*height must be a multiple of 32)

	ifeq	depth-8
		xdef	_c2p_8_akiko
_c2p_8_akiko:
	else
	ifeq	depth-6
		xdef	_c2p_6_akiko
_c2p_6_akiko:
	else
		fail	"unsupported depth!"
	endc
	endc

		xref	_GfxBase

		movem.l	a2/a3/a6,-(sp)

		move.l	d1,d0		; plsiz
		lsl.l	#3,d0		; 8*plsiz
		lea	(a0,d0.l),a3	; a3 -> end of chunky data
		sub.l	d1,d0		; d0 = 7*plsiz
	ifle depth-6
		sub.l	d1,d0
		sub.l	d1,d0		; d0 = 5*plsiz if depth=6
	endc

		movem.l	d0/d1/a0/a1,-(sp)
		movea.l	(_GfxBase).l,a6
		jsr	(_LVOOwnBlitter,a6) ; gain exclusive use of Akiko
		movem.l	(sp)+,d0/d1/a0/a1

loop:		tst.b	(a2)+		; does next 32 pixel unit need updating?
		bne.b	c2p		; branch if yes

		adda.w	#32,a0		; skip 32 pixels on input
		addq.l	#4,a1		; skip 32 pixels on output

		cmpa.l	a3,a0
		bne.b	loop
		bra.b	exit		; exit if no changes

c2p:		move.l	(a0)+,(a5)	; write 32 pixels to akiko
		move.l	(a0)+,(a5)
		move.l	(a0)+,(a5)
		move.l	(a0)+,(a5)
		move.l	(a0)+,(a5)
		move.l	(a0)+,(a5)
		move.l	(a0)+,(a5)
		move.l	(a0)+,(a5)

		move.l	(a5),(a1)	; plane 0
		adda.l	d1,a1
		move.l	(a5),(a1)	; plane 1
		adda.l	d1,a1
		move.l	(a5),(a1)	; plane 2
		adda.l	d1,a1
		move.l	(a5),(a1)	; plane 3
		adda.l	d1,a1
		move.l	(a5),(a1)	; plane 4
		adda.l	d1,a1
	ifgt depth-6
		move.l	(a5),(a1)	; plane 5
		adda.l	d1,a1
		move.l	(a5),(a1)	; plane 6
		adda.l	d1,a1
	endc
		move.l	(a5),(a1)+	; last plane

		suba.l	d0,a1		; -7*plsiz (or 5*plsiz) (or 3*plsiz)

		cmpa.l	a3,a0
		bne.b	loop

exit:		jsr	(_LVODisownBlitter,a6) ; free Akiko

		movem.l	(sp)+,a2/a3/a6
		rts
Maybe it helps?
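
For anyone who doesn't read 68k assembly, the control flow of the routine above can be sketched in C. This is an illustration of the dirty-list loop only, for the 8-plane case: the converter is passed in as a function pointer (`unit_fn`, a made-up type), the `OwnBlitter`/`DisownBlitter` bracket is left out, and `copy_unit` is a stand-in that merely copies its input so the loop can be exercised without any hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Converts one 32-pixel unit: 8 chunky longwords in, 8 plane longwords out. */
typedef void (*unit_fn)(const uint32_t in[8], uint32_t out[8]);

/* Stand-in converter for exercising the loop: copies the input unchanged.
 * It is NOT a chunky-to-planar conversion. */
void copy_unit(const uint32_t in[8], uint32_t out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = in[i];
}

/* Walks the dirty list like the assembly loop (8-plane case only).
 * chunky: 8 longwords per 32-pixel unit
 * planar: 8 contiguous planes of `plane_longs` longwords each
 * dirty:  one flag byte per unit, non-zero = needs converting
 * Returns the number of units converted. */
size_t c2p_dirty_units(const uint32_t *chunky, uint32_t *planar,
                       const uint8_t *dirty, size_t plane_longs,
                       unit_fn convert)
{
    size_t converted = 0;

    for (size_t u = 0; u < plane_longs; u++) {
        if (!dirty[u])
            continue;               /* skip 32 pixels on input and output */

        uint32_t out[8];
        convert(&chunky[u * 8], out);

        /* One longword per plane, planes spaced plane_longs apart. */
        for (size_t p = 0; p < 8; p++)
            planar[p * plane_longs + u] = out[p];
        converted++;
    }
    return converted;
}
```

On real hardware the `convert` step would be the 8 writes to and 8 reads from the Akiko register; keeping it behind a function pointer makes it easy to swap in a software routine and compare the two.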
abu_the_monkey is offline  
Old 24 May 2024, 09:10   #60
alexh
Thalion Webshrine
 
Join Date: Jan 2004
Location: Oxford
Posts: 14,448
https://github.com/tonioni/WinUAE/blob/master/akiko.cpp

Line 300 for the code for the akiko emulation
alexh is offline  
 

