#201 | |
Registered User
Join Date: May 2018
Location: Ireland
Posts: 692
|
Quote:
|
|
#202 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,480
|
|
#203 | |
Registered User
Join Date: Jul 2023
Location: Domsjö/Sweden
Posts: 56
|
Quote:
Edit: I found the reason after reading all new posts.
Last edited by Lunda; 01 June 2024 at 10:06. Reason: new info |
|
#204 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,480
|
Well, it's fair to say that disabling write allocation fixes the Akiko read-back problem, but the routine is still clearly behind the software C2P on this machine: the software routine beats it by a clear 25%.
It certainly seems to be a pure I/O bandwidth limitation, i.e. the accepted wisdom. If the chip RAM writes were faster, Akiko's simplicity would theoretically allow it to beat the software conversion, which hides its ALU effort behind the slow writes. We can actually test that by just doing C2P from fast to fast. |
#205 |
Registered User
Join Date: Oct 2020
Location: Bicester
Posts: 2,022
|
It would still be nice to know where the crossover point is between using Akiko and using the CPU.
Is it an 030 @ 50MHz, or faster? |
#206 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,480
|
|
#207 |
Registered User
Join Date: Oct 2020
Location: Bicester
Posts: 2,022
|
I guess Akiko will still perform the same?
|
#208 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,480
|
I think it'll take the same number of cycles but the cycles will be longer. The total chip ram delay is the only real invariant. So they might just both end up converging to the same speed, limited by chip write bandwidth.
|
#209 |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,217
|
Unfortunately it looks to me like it's just not going to be worth it on 030 unless "normal" accelerator cards behave radically differently or some wizard comes up with a serious improvement to the instruction scheduling.
The time for "Naive (WA)" is still very close to "Null C2P + Akiko Limit (WA)", and Kalms - Null C2P is only 45314 ticks, so assuming just that part scales linearly with clock frequency, the CPU routine would start being faster at around 25MHz. |
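The shape of that estimate can be sketched in a few lines of Python. This is only one reading of the reasoning above, under loud assumptions: the software C2P time is split into a bus-bound part that does not change with CPU clock and a CPU-bound part that shrinks linearly as the clock rises, while the Akiko pass is treated as a fixed cost. The function name, parameter names and the numbers in the example call are all hypothetical; none of them are measurements from this thread.

```python
def crossover_mhz(base_mhz, cpu_bound_ticks, bus_bound_ticks, akiko_ticks):
    """Clock (MHz) at which the software C2P matches a fixed-cost Akiko pass.

    Assumed model (not from the thread's measurements):
      software_time(f) = bus_bound_ticks + cpu_bound_ticks * base_mhz / f
      akiko_time       = akiko_ticks, independent of CPU clock
    """
    headroom = akiko_ticks - bus_bound_ticks
    if headroom <= 0:
        # The bus-bound part alone already costs as much as Akiko,
        # so no clock speed makes the software routine faster.
        return None
    return base_mhz * cpu_bound_ticks / headroom


# Toy numbers purely for illustration, NOT timings from this thread.
print(crossover_mhz(base_mhz=10.0, cpu_bound_ticks=60,
                    bus_bound_ticks=50, akiko_ticks=80))
```

Plugging in real figures would need the measured Akiko time, the Null C2P time and the clock the benchmark ran at.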
#210 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,480
|
Not having DMA output to chip ram. What a missed opportunity. It's not as if there's much that runs on 020/14 + Fast that can use C2P that isn't just faster using chunky copper screen tricks, so it was only ever going to be truly useful with a faster CPU in the first place.
I know it was "for free", but it's also a bit of a chocolate teapot without being able to get the data out of it faster. |
#211 |
Registered User
Join Date: Oct 2020
Location: Bicester
Posts: 2,022
|
yep.
still, it would be nice to have the numbers from a range of setups. at least then it can be put to bed once and for all. |
#212 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,480
|
I think the bus is maxed out when talking to Akiko. If it's doing a transfer every 3 cycles and the bus is 14 MHz, that's 4*14/3 = 18.67 MB/s
The conversion does 9MB/s, but considering it's a write and read workload, that's your 18MB/s nommed up. |
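The arithmetic in that post can be checked with a short Python sketch. It takes the post's own figures (a 14 MHz bus, one longword transfer every 3 cycles, a 9 MB/s conversion rate) at face value and treats 1 MB as 10^6 bytes. The function names are made up for illustration.

```python
BUS_MHZ = 14.0            # nominal bus clock quoted in the post
BYTES_PER_TRANSFER = 4    # one longword per access
CYCLES_PER_TRANSFER = 3   # "a transfer every 3 cycles", as the post assumes


def peak_bus_mb_per_s(bus_mhz, bytes_per_transfer, cycles_per_transfer):
    """Peak throughput in MB/s, taking 1 MB as 10**6 bytes."""
    return bus_mhz * bytes_per_transfer / cycles_per_transfer


def bus_traffic_mb_per_s(conversion_mb_per_s):
    """Every converted byte crosses the bus twice: written in, read back."""
    return 2 * conversion_mb_per_s


peak = peak_bus_mb_per_s(BUS_MHZ, BYTES_PER_TRANSFER, CYCLES_PER_TRANSFER)
traffic = bus_traffic_mb_per_s(9.0)   # 9 MB/s conversion rate from the post
print(f"peak {peak:.2f} MB/s, traffic {traffic:.1f} MB/s, load {traffic / peak:.0%}")
```

On those assumptions the read-plus-write traffic sits within a few percent of the peak figure, which is the "maxed out" point being made.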
#213 |
Registered User
Join Date: Oct 2020
Location: Bicester
Posts: 2,022
|
random thunk.
Code:
; #############################################################################
    movem.l d1-d7/a2/a3/a6,-(sp)
    ; back up the inputs
    move.l  a0,a2
    move.l  a1,a3
    move.l  _SysBase,a6
    jsr     _LVOForbid(a6)
    jsr     _LVODisable(a6)
    move.l  #$00B80038,a0
    move.w  #2559-1,d0      ; was #2560-1 now an extra 1 less cos the last write falls through
    move.l  a3,a1
    ; a0 akiko
    ; a2 source
    ; a3
; #############################################################################
    move.l  (a2)+,(a0)
    move.l  (a2)+,(a0)
    move.l  (a2)+,(a0)
    move.l  (a2)+,(a0)
    move.l  (a2)+,(a0)
    move.l  (a2)+,(a0)
    move.l  (a2)+,(a0)
    move.l  (a2)+,(a0)
; #############################################################################
.loop:
    ; write plane 0
    move.l  (a0),(a1)
    add.w   #10240,a1
    move.l  (a0),d1
    move.l  (a0),d2
    move.l  (a0),d3
    move.l  (a0),d4
    move.l  (a0),d5
    move.l  (a0),d6
    move.l  (a0),d7
    move.l  (a2)+,(a0)
    move.l  (a2)+,(a0)
    move.l  (a2)+,(a0)
    move.l  (a2)+,(a0)
    move.l  (a2)+,(a0)
    move.l  (a2)+,(a0)
    move.l  (a2)+,(a0)
    move.l  (a2)+,(a0)
    move.l  d1,(a1)
    add.w   #10240,a1
    move.l  d2,(a1)
    add.w   #10240,a1
    move.l  d3,(a1)
    add.w   #10240,a1
    move.l  d4,(a1)
    add.w   #10240,a1
    move.l  d5,(a1)
    add.w   #10240,a1
    move.l  d6,(a1)
    add.w   #10240,a1
    add.w   #4,a3
    move.l  d7,(a1)
    add.w   #10240,a1
    move.l  a3,a1
    dbra    d0,.loop
; #############################################################################
    move.l  (a0),(a1)
    add.w   #10240,a1
    move.l  (a0),(a1)
    add.w   #10240,a1
    move.l  (a0),(a1)
    add.w   #10240,a1
    move.l  (a0),(a1)
    add.w   #10240,a1
    move.l  (a0),(a1)
    add.w   #10240,a1
    move.l  (a0),(a1)
    add.w   #10240,a1
    move.l  (a0),(a1)
    add.w   #10240,a1
    move.l  (a0),(a1)
    add.w   #10240,a1
    move.l  a3,a1
; #############################################################################
    jsr     _LVOEnable(a6)
    jsr     _LVOPermit(a6)
    movem.l (sp)+,d1-d7/a2/a3/a6
    rts
; #############################################################################
|
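As a sanity check on the constants in that routine, here is a small Python sketch of the loop bookkeeping. The 320x256, 8-bitplane screen is an assumption (the post does not state the geometry), chosen because it matches the 10240-byte plane stride and the 2560 figure in the comment. All constant names are made up for illustration.

```python
# Assumed screen geometry (not stated in the post): 320x256, 8 bitplanes.
WIDTH, HEIGHT, PLANES = 320, 256, 8

PLANE_BYTES = WIDTH * HEIGHT // 8      # bytes in one bitplane
CHUNKY_BYTES = WIDTH * HEIGHT          # one byte per chunky pixel
BATCH_BYTES = PLANES * 4               # eight longwords written to Akiko per batch
BATCHES = CHUNKY_BYTES // BATCH_BYTES  # batches needed for a full screen

# The first batch is written before the loop and the last one is read
# back after it, so the loop body only has to run BATCHES - 1 times.
LOOP_ITERATIONS = BATCHES - 1
# dbra executes the body (initial count + 1) times.
DBRA_INITIAL = LOOP_ITERATIONS - 1

# Each batch advances the per-plane destination by one longword (4 bytes),
# so a full pass should cover exactly one plane's worth of bytes.
assert BATCHES * 4 == PLANE_BYTES

print(PLANE_BYTES, BATCHES, LOOP_ITERATIONS, DBRA_INITIAL)
# prints: 10240 2560 2559 2558
```

Under that assumed geometry, `#2559-1` is the right `dbra` start value for a loop that is primed with one batch before it and drained with one batch after it.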
#214 |
Registered User
Join Date: May 2013
Location: Grimstad / Norway
Posts: 854
|
Would it be beneficial for some specs to do every other decode with Akiko and then cpu? You would have to interleave all 16 Akiko reads and writes in-between the cpu c2p (i.e. not blindly do one and then the other but both at the same time).
|
#215 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,480
|
@abu_the_monkey
Try it. The only things to say are you don't need Forbid/Permit since Disable achieves the same thing regardless. You will probably want to disable write allocate before talking to Akiko too. The latest code does this but it's basically identical to what paraj posted a bit earlier. |
#216 |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,317
|
Hardly. Akiko is synchronous, and the slow part is attempting to read from its registers, as the CPU needs to wait for the relatively slow chip bus. For a conversion from fast mem to chip mem, the CPU does not need to wait for anything: it can retire chip bus accesses in its push buffer while continuing to work. That does not help for Akiko.
|
#217 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,480
|
It has been fun but I think we've pretty effectively demonstrated the common wisdom. It's all in the bus, you just can't move the data around fast enough to beat code able to execute on the CPU behind pending writes.
|
#218 | |
Registered User
Join Date: Oct 2020
Location: Bicester
Posts: 2,022
|
Quote:
Yes, but the point/speed at which it becomes better to use the CPU is something I really wanted to know. |
|
#219 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,480
|
I don't know - isn't the bus logic busy servicing the chip ram write? I don't think you can just go and do a read from somewhere else (unless in cache I suppose) while you are waiting for it.
This isn't my area of expertise, mind.
Last edited by Karlos; 01 June 2024 at 22:49. |
#220 |
Registered User
Join Date: Oct 2020
Location: Bicester
Posts: 2,022
|
nor mine, just a thought that popped in my tired noggin