24 May 2024, 10:16 | #61 | |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,844
|
Quote:
@abu The more interesting thing about that routine is that it has a delta buffer containing a list of which spans of 32 pixels have changed. It's an interesting idea, but unless you are standing still somewhere just admiring the view, I can't imagine that many spans remain the same from one frame to the next.
Last edited by Karlos; 24 May 2024 at 10:30.
|
24 May 2024, 11:49 | #62 |
Registered User
Join Date: Oct 2020
Location: Bicester
Posts: 2,084
|
Yeah, I'm not even sure ADoom uses that routine, but it was included with the source code.
I did have some luck running DoomAttack in WinUAE with an expanded CD32 setup (68020 + 8 MB fast RAM) using the c2p_akiko2 C2P routine, which is below:
Code:
	MACHINE	68020
	INCDIR	AINCLUDE:
	INCLUDE	exec/libraries.i
	INCLUDE	lvo/exec_lib.i
	INCLUDE	c2p.i

;**************************************************************************

	MOVEQ	#-1,D0
	RTS

	DC.B	"C2P",0
	DC.L	Chunky2Planar
	DC.L	InitChunky
	DC.L	EndChunky
	DC.L	C2PF_VARIABLEHEIGHT|C2PF_VARIABLEWIDTH

;**************************************************************************
;Init routine
;4(sp) Width
;8(sp) Height
;12(sp) PlaneSize
;16(sp) C2PInit

InitChunky:
	move.l	a6,-(sp)
	move.l	4+12(sp),d0
	move.l	d0,bitplanesize
	cmp.l	#32767,d0		; displacements below are signed 16-bit
	bgt.s	.badplanesize
	sub	#4,d0			; (aN)+ has already advanced aN by 4
	move	d0,patch1+2		; self-modify the BPLSIZE displacements
	move	d0,patch2+2
	move	d0,patch3+2
	move.l	4.w,a6
	jsr	_LVOCacheClearU(a6)	; flush the patched code from the caches
	move.l	4+16(sp),a0
	move.l	c2pi_GfxBase(a0),a6
	cmp.w	#40,LIB_VERSION(a6)
	blt.s	.nogfx40
	move.l	508(a6),d0		; Akiko C2P address from GfxBase (V40)
	beq.s	.noakiko
	move.l	d0,C2Pp
	move.l	4+4(sp),d0
	move.l	4+8(sp),d1
	mulu	d0,d1			; width * height
	lsr.l	#5,d1			; / 32 pixels per Akiko block
	subq	#1,d1			; dbf counter
	move	d1,size
	move.l	#1,rc
.badplanesize:
.noakiko:
.nogfx40:
	move.l	(sp)+,a6
	move.l	rc(pc),d0
	rts

rc:	dc.l	0

;**************************************************************************
;4(sp) chunky
;8(sp) planes

Chunky2Planar:
	MOVEA.L	$4(SP),A0
	MOVEA.L	$8(SP),A1
	; a0 - chunky
	; a1 - bitplanes
	MOVEM.L	D2-D7/A2-A6,-(SP)
	jsr	_chunky2planar
return:
	MOVEM.L	(SP)+,D2-D7/A2-A6
	RTS
	NOP
EndChunky
	RTS

	section	c2p,code

BPLSIZE	equ	8000

_chunky2planar:
	move.l	C2Pp(pc),a2
	move.w	size(pc),d7
	move.l	bitplanesize(pc),d1
	;a1 = plane1
	lea	(a1,d1.w),a3		;a3 = plane2
	lea	(a3,d1.w),a4		;a4 = plane3
	lea	(a4,d1.w*2),a5		;a5 = plane5
	lea	(a5,d1.w*2),a6		;a6 = plane7
c2pal:
	move.l	(a0)+,(A2)		; 8 writes: 32 chunky pixels into Akiko
	move.l	(a0)+,(A2)
	move.l	(a0)+,(A2)
	move.l	(a0)+,(A2)
	move.l	(a0)+,(A2)
	move.l	(a0)+,(A2)
	move.l	(a0)+,(A2)
	move.l	(a0)+,(A2)
	move.l	(a2),(a1)+		;plane1
	move.l	(a2),(a3)+		;plane2
	move.l	(a2),(a4)+		;plane3
patch1:	move.l	(a2),BPLSIZE(a4)	;plane4
	move.l	(a2),(a5)+		;plane5
patch2:	move.l	(a2),BPLSIZE(a5)	;plane6
	move.l	(a2),(a6)+		;plane7
patch3:	move.l	(a2),BPLSIZE(a6)	;plane8
	dbf	d7,c2pal
	rts

	cnop	0,4
C2Pp:	dc.l	0
bitplanesize:	dc.l	0
size:	dc.w	0
24 May 2024, 12:21 | #63 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,844
|
I think my issue required restarting UAE rather than doing a cold reset. I am now seeing plausible data conversion from my test program.
Code:
Akiko Detected
C[0]: 0x80808080 P[0]: 0x0000000F
C[1]: 0x40404040 P[1]: 0x000000F0
C[2]: 0x20202020 P[2]: 0x00000F00
C[3]: 0x10101010 P[3]: 0x0000F000
C[4]: 0x08080808 P[4]: 0x000F0000
C[5]: 0x04040404 P[5]: 0x00F00000
C[6]: 0x02020202 P[6]: 0x0F000000
C[7]: 0x01010101 P[7]: 0xF0000000
24 May 2024, 12:37 | #64 |
Registered User
Join Date: Oct 2020
Location: Bicester
Posts: 2,084
|
Excellent
|
24 May 2024, 12:51 | #65 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,844
|
On a scale of 1-10, how dangerous is this? Assume we have a 68030 and at least have Akiko and the appropriate versions of Exec and Graphics for 3.1.
OwnBlitter()
Forbid() - to stay on task
Disable() - to prevent having to deal with interrupts after we ...
SuperState() - to allow direct CACR manipulation
C2P:
  Back up the cache control register
  Disable/freeze the 68030 data cache via the cache control register
  Perform the Akiko C2P loop
  Restore the cache control register
UserState()
Enable()
Permit()
DisownBlitter()
Last edited by Karlos; 24 May 2024 at 13:10.
24 May 2024, 13:04 | #66 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,844
|
Assuming the above path is not entirely suicidal, or at least not too Russian roulette, I am curious whether you could do something like this inside the loop:
Turn the data cache on.
Load up 8 registers, maybe via movem.l.
Turn the data cache off.
Write the 8 longs to Akiko one at a time.
Read and transfer the longs from Akiko to the target planes.
Repeat.
I haven't yet checked how many cycles it takes to enable and disable the data cache.
24 May 2024, 13:20 | #67 |
Thalion Webshrine
Join Date: Jan 2004
Location: Oxford
Posts: 14,613
|
Can't you use the MMU to mark the Akiko address as non-cacheable, so there's no need to switch the cache on and off? Or does that inherently slow everything down?
|
24 May 2024, 13:46 | #68 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,844
|
I am not sure, TBH. I don't have any CD32 or 030 hardware, so this is all hypothetical. According to the 030 manual, an instruction like movec d0,cacr looks like it takes 12 cycles when the instruction is in the cache. That's not great, but it's not as bad as I thought it might be. I don't want to assume an MMU, since we might be dealing with an EC030 part anyway.
|
24 May 2024, 13:48 | #69 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,844
|
Maybe the next thing to do is just measure the Akiko read/write bandwidth (no other memory accesses), as there seems to be some uncertainty around that.
This would require someone else to actually test it.
24 May 2024, 17:07 | #70 | |
Registered User
Join Date: Jul 2023
Location: Domsjö/Sweden
Posts: 65
|
Quote:
Akiko writes are 8 cycles. I'm not sure about reads; I used to see 4-cycle reads on the logic analyzer, but maybe that was because of some bug I introduced. I hope your cache trick works. It's possible to use the MMU or turn off the cache; both options produce equal results for me. I tried to disable the cache (using the cache disable signal) only when Akiko was accessed. This failed.
|
24 May 2024, 18:18 | #71 |
Registered User
Join Date: Jul 2014
Location: Warsaw/Poland
Posts: 196
|
|
24 May 2024, 21:10 | #72 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,844
|
Potentially dangerous binary attached. I haven't had much time, so here's what it tries to do:
1. Checks if Akiko is found.
2. Performs a quick single 32-byte sanity check. This doesn't mess with the cache yet (still to come), so it may or may not work.
3. Attempts to benchmark Akiko read/write performance. This is 8 writes from registers, followed by 8 reads into registers. The end result is reported in bytes/second for the complete round trip.
The test runs 100,000 iterations of 32 bytes, and while it runs we are in Forbid() and Disable() for the full duration, meaning there is no task switching and no interrupt servicing. For safety, you should run this only from the RAM disk, with nothing else trying to do any kind of disk I/O.
Below is from UAE, with JIT disabled and "approximate speed" set. This is just to get a measurable value and shouldn't be taken as remotely representative.
Code:
Akiko Detected
C[0]: 0x80808080 P[0]: 0x0000000F
C[1]: 0x40404040 P[1]: 0x000000F0
C[2]: 0x20202020 P[2]: 0x00000F00
C[3]: 0x10101010 P[3]: 0x0000F000
C[4]: 0x08080808 P[4]: 0x000F0000
C[5]: 0x04040404 P[5]: 0x00F00000
C[6]: 0x02020202 P[6]: 0x0F000000
C[7]: 0x01010101 P[7]: 0xF0000000
Benchmarking Akiko Read/Write (reg -> hw -> reg) with 100000 iterations, 32 bytes per iteration...
C.Freq: 709379 Hz
Begin: 260875852 ticks
Finish: 260917182 ticks
Elapsed: 41330 ticks, 58 ms

Perf: 54924093 bytes/second
Last edited by Karlos; 24 May 2024 at 21:15.
24 May 2024, 21:18 | #73 |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,459
|
That is done anyhow if the MMU is on. If it is off, the board likely pulls CIIN, indicating that caching should not be allowed, but the 68030 ignores that signal on longword writes.
|
25 May 2024, 07:20 | #74 |
Registered User
Join Date: Jan 2005
Location: Umeå
Age: 44
Posts: 962
|
There is no mirror of this $b80038 register, so 030 caching concerns could be avoided?
|
25 May 2024, 07:39 | #75 |
Registered User
Join Date: Jul 2023
Location: Domsjö/Sweden
Posts: 65
|
I was wrong. See attached pics.
The clock is 14 MHz.
25 May 2024, 08:16 | #76 | |
Registered User
Join Date: Jul 2023
Location: Domsjö/Sweden
Posts: 65
|
Quote:
Akiko2.jpg, from the top:
- No data cache. The last plane failed for some reason; this is the only time I got that issue.
- The rest use the same cache setting. For some reason, every other run is slower.
Akiko3.jpg:
- Data cache on. The last unconverted plane is read back from the cache, as expected.
- Data cache off.
Last edited by Lunda; 08 July 2024 at 14:46.
|
25 May 2024, 08:55 | #77 |
Thalion Webshrine
Join Date: Jan 2004
Location: Oxford
Posts: 14,613
|
Has to be something wrong there, no? Those numbers seem too big.
|
25 May 2024, 09:57 | #78 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,844
|
I'll double-check the code just to make sure I've not got a numeric overflow or something. It's using the E-clock for the timing and 64-bit arithmetic.
Let's sanity-check the calculation on the Beast's last run:
107580815 - 107527047 = 53768 ticks
53768 / 709379 = 0.0757958722 seconds
100000 * 32 = 3200000 bytes transformed
3200000 / 0.0757958722 = 42,218,658 bytes/second
That looks plausible, so I'm going to assume that, in an overtired and heavily distracted state, I got the number of bytes or loops wrong through a simple coding error. I've probably done something daft like right-shifting the loop counter by 5 in the ASM code to account for 32 pixels, and then forgotten about that in the calling scope.
25 May 2024, 11:03 | #79 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,844
|
The benchmark function doesn't appear to have that issue:
Code:
; count in d0
_bench_akiko_rw:
	movem.l	d2/a6,-(sp)
	move.l	d0,d2
	move.l	_SysBase,a6
	jsr	_LVOForbid(a6)
	jsr	_LVODisable(a6)
	;jsr	_LVOSuperState(a6)
	move.l	#$00B80038,a0
.loop:
	move.l	d0,(a0)
	move.l	d0,(a0)
	move.l	d0,(a0)
	move.l	d0,(a0)
	move.l	d0,(a0)
	move.l	d0,(a0)
	move.l	d0,(a0)
	move.l	d0,(a0)
	move.l	(a0),d0
	move.l	(a0),d0
	move.l	(a0),d0
	move.l	(a0),d0
	move.l	(a0),d0
	move.l	(a0),d0
	move.l	(a0),d0
	move.l	(a0),d0
	subq.l	#1,d2
	bgt.s	.loop
	;jsr	_LVOUserState(a6)
	jsr	_LVOEnable(a6)
	jsr	_LVOPermit(a6)
.done:
	movem.l	(sp)+,d2/a6
	rts
Code:
extern void bench_akiko_rw(REG(d0, ULONG reps));

#define BENCH_INTERATIONS 100000
#define PIXELS_PER_ITERATION 32

int main(void)
{
    if (have_akiko()) {
        puts("Akiko Detected");
        verify_c2p();
        if (get_timer()) {
            printf(
                "Benchmarking Akiko Read/Write (reg -> hw -> reg) with %d iterations, %d bytes per iteration...\n",
                BENCH_INTERATIONS,
                PIXELS_PER_ITERATION
            );
            ULONG freq = ReadEClock(&clk_begin.ecv);
            bench_akiko_rw(BENCH_INTERATIONS);
            ReadEClock(&clk_end.ecv);
            printf("C.Freq: %lu Hz\n", freq);
            printf("Begin: %llu ticks\n", clk_begin.ticks);
            printf("Finish: %llu ticks\n", clk_end.ticks);
            ULONG elapsed = (ULONG)(clk_end.ticks - clk_begin.ticks);
            /* 64-bit intermediate: elapsed * 1000 would wrap a ULONG after ~6 s of ticks */
            ULONG elapsed_ms = (ULONG)(((ULONG64)elapsed * 1000) / freq);
            printf("Elapsed: %lu ticks, %lu ms\n", elapsed, elapsed_ms);
            ULONG64 dividend = (BENCH_INTERATIONS * PIXELS_PER_ITERATION) * (ULONG64)freq;
            printf("\nPerf: %lu bytes/second\n", (ULONG)(dividend / elapsed));
            free_timer();
        }
    }
    return 0;
}
25 May 2024, 13:01 | #80 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,844
|
I'm going to add the cache manipulation code later, but running without the data cache enabled should at least guarantee it's not just measuring data cache I/O performance.
|
|
|