English Amiga Board


Old 24 May 2024, 10:16   #61
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,412
Quote:
Originally Posted by alexh
https://github.com/tonioni/WinUAE/blob/master/akiko.cpp

Line 300 for the code for the akiko emulation
As I said, I'm writing 8 longs to this location, then reading back, but my reads seem to be zero. I'll take a closer look later.

@abu

The more interesting thing about that routine is that it has a delta buffer that contains a list of which spans of 32 pixels have changed. It's an interesting idea but unless you are standing still somewhere just admiring the view, I can't imagine that many spans remain the same from one frame to the next.
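To make the delta-buffer idea concrete, here is a minimal C sketch. It is not the routine from the ADoom source; it only assumes two chunky frames of equal size and one flag byte per 32-pixel span. The name `mark_changed_spans` and the flag layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { SPAN_PIXELS = 32 };  /* one Akiko pass converts 32 chunky pixels */

/* Compare two chunky frames span by span.
 * changed[i] is set to 1 if span i differs, otherwise 0.
 * Returns the number of spans that differ. */
static size_t mark_changed_spans(const uint8_t *prev, const uint8_t *curr,
                                 size_t pixel_count, uint8_t *changed)
{
    assert(pixel_count % SPAN_PIXELS == 0);
    size_t spans = pixel_count / SPAN_PIXELS;
    size_t dirty = 0;

    for (size_t i = 0; i < spans; i++) {
        size_t off = i * SPAN_PIXELS;
        changed[i] = (uint8_t)(memcmp(prev + off, curr + off, SPAN_PIXELS) != 0);
        dirty += changed[i];
    }
    return dirty;
}
```

A converter would then skip every span whose flag is 0. Whether that pays off depends on how many spans survive unchanged between frames, which is the doubt raised above.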

Last edited by Karlos; 24 May 2024 at 10:30.
Old 24 May 2024, 11:49   #62
abu_the_monkey
Registered User
 
Join Date: Oct 2020
Location: Bicester
Posts: 2,018
yeah, I am not even sure ADoom uses that routine but it was included with the source code.

I did have some luck running DoomAttack in WinUAE with an expanded CD32 setup (020 + 8 MB fast RAM) using the c2p_akiko2 C2P routine, which is below:

Code:
					MACHINE 68020

					INCDIR AINCLUDE:

					INCLUDE exec/libraries.i
					INCLUDE lvo/exec_lib.i
					INCLUDE c2p.i

;**************************************************************************

					MOVEQ	#-1,D0
                    RTS

                    DC.B           "C2P",0
                    DC.L           Chunky2Planar
                    DC.L           InitChunky
                    DC.L           EndChunky
                    DC.L		   C2PF_VARIABLEHEIGHT|C2PF_VARIABLEWIDTH

;**************************************************************************

					;Init routine
					;4(sp) Width
					;8(sp) Height
					;12(sp) PlaneSize
					;16(sp) C2PInit 

InitChunky:
					move.l	a6,-(sp)

					move.l	4+12(sp),d0
					move.l	d0,bitplanesize
					cmp.l	#32767,d0
					bgt.s	.badplanesize
					
					sub		#4,d0
					move	d0,patch1 + 2
					move	d0,patch2 + 2
					move	d0,patch3 + 2
					
					move.l	4.w,a6
					jsr		_LVOCacheClearU(a6)

					move.l	4+16(sp),a0
					move.l	c2pi_GfxBase(a0),a6
					
					cmp.w	#40,LIB_VERSION(a6)
					blt.s	.nogfx40
					
					move.l	508(a6),d0
					beq.s	.noakiko

					move.l	d0,C2Pp

					move.l	4+4(sp),d0
					move.l	4+8(sp),d1
					mulu	d0,d1
					lsr.l	#5,d1
					subq	#1,d1
					move	d1,size
					
					move.l	#1,rc

.badplanesize:
.noakiko:
.nogfx40:			move.l	(sp)+,a6

					move.l	rc(pc),d0
					rts

rc:					dc.l	0

;**************************************************************************

					;4(sp) chunky
					;8(sp) planes

Chunky2Planar:      MOVEA.L        $4(SP),A0
                    MOVEA.L        $8(SP),A1

					; a0 - chunky
					; a1 - bitplanes

                    MOVEM.L        D2-D7/A2-A6,-(SP)

					jsr		_chunky2planar


return:             MOVEM.L        (SP)+,D2-D7/A2-A6
                    RTS

                    NOP
EndChunky           RTS

	section	c2p,code

BPLSIZE equ 8000

_chunky2planar:		move.l	C2Pp(pc),a2
					move.w	size(pc),d7
					
					move.l	bitplanesize(pc),d1
											;a1 = plane1
					lea		(a1,d1.w),a3	;a3 = plane2
					lea		(a3,d1.w),a4	;a4 = plane3
					lea		(a4,d1.w*2),a5	;a5 = plane5
					lea		(a5,d1.w*2),a6	;a6 = plane7
					
c2pal:
					move.l	(a0)+,(A2)
					move.l	(a0)+,(A2)
					move.l	(a0)+,(A2)
					move.l	(a0)+,(A2)
					move.l	(a0)+,(A2)
					move.l	(a0)+,(A2)
					move.l	(a0)+,(A2)
					move.l	(a0)+,(A2)

					move.l	(a2),(a1)+				;plane1
					move.l	(a2),(a3)+				;plane2
					move.l	(a2),(a4)+				;plane3
patch1:				move.l	(a2),BPLSIZE(a4)		;plane4
					move.l	(a2),(a5)+				;plane5
patch2:				move.l	(a2),BPLSIZE(a5)		;plane6
					move.l	(a2),(a6)+				;plane7
patch3:				move.l	(a2),BPLSIZE(a6)		;plane8
					dbf	d7,c2pal
					rts

					cnop	0,4

C2Pp:				dc.l	0
bitplanesize:		dc.l	0
size:				dc.w	0
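For anyone reading the listing cold, the arithmetic in InitChunky can be sketched in C. This is only a restatement of the setup maths, assuming the routine works as it reads: the loop counter is (width * height / 32) - 1 for dbf, and the displacement patched into patch1..patch3 is the plane size minus 4, because the pointer it is relative to has already been post-incremented by one longword. The struct and function names are hypothetical, and the graphics.library version and Akiko pointer checks are left out.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int      ok;          /* 1 if the plane size fits a signed 16-bit displacement */
    uint16_t loop_count;  /* value for d7: number of 32-pixel passes minus 1 (dbf) */
    int16_t  patch_disp;  /* displacement written into patch1, patch2, patch3 */
} C2PSetup;

static C2PSetup c2p_setup(uint16_t width, uint16_t height, uint32_t plane_size)
{
    C2PSetup s = {0, 0, 0};

    if (plane_size > 32767)            /* mirrors cmp.l #32767,d0 / bgt.s */
        return s;

    uint32_t pixels = (uint32_t)width * height;   /* mulu d0,d1 */
    assert(plane_size >= 4 && pixels >= 32);

    s.loop_count = (uint16_t)((pixels >> 5) - 1); /* lsr.l #5,d1 / subq #1,d1 */
    s.patch_disp = (int16_t)(plane_size - 4);     /* sub #4,d0 */
    s.ok = 1;
    return s;
}
```

For a 320x200 screen with 8000-byte planes this gives 2000 passes (counter 1999) and a patched displacement of 7996.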
Old 24 May 2024, 12:21   #63
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,412
I think my issue required me to restart UAE rather than just do a cold reset. I am now seeing plausible data conversion from my test program.
Code:
Akiko Detected
C[0]: 0x80808080 P[0]: 0x0000000F
C[1]: 0x40404040 P[1]: 0x000000F0
C[2]: 0x20202020 P[2]: 0x00000F00
C[3]: 0x10101010 P[3]: 0x0000F000
C[4]: 0x08080808 P[4]: 0x000F0000
C[5]: 0x04040404 P[5]: 0x00F00000
C[6]: 0x02020202 P[6]: 0x0F000000
C[7]: 0x01010101 P[7]: 0xF0000000
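The table above is consistent with a plain bit transpose: 32 chunky bytes in, 8 planar longs out, leftmost pixel in bit 31. A small software model in C, written only from that output and not from the Akiko hardware documentation or the WinUAE source, could look like this. The function name `c2p_model` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of one 32-pixel chunky-to-planar pass.
 * chunky[0..7]: 32 pixels, 4 per big-endian long, first pixel in the top byte.
 * planar[p]:    bit (31 - i) holds bit p of pixel i. */
static void c2p_model(const uint32_t chunky[8], uint32_t planar[8])
{
    assert(chunky != NULL && planar != NULL);

    for (int p = 0; p < 8; p++)
        planar[p] = 0;

    for (int i = 0; i < 32; i++) {
        /* extract pixel i from the big-endian stream of longs */
        uint8_t pixel = (uint8_t)(chunky[i / 4] >> (24 - 8 * (i % 4)));

        for (int p = 0; p < 8; p++) {
            if (pixel & (1u << p))
                planar[p] |= 0x80000000u >> i;
        }
    }
}
```

Feeding it the eight C[] values from the test program reproduces the eight P[] values listed, which is a handy cross-check when real hardware or a different emulator build returns something else.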
Old 24 May 2024, 12:37   #64
abu_the_monkey
Registered User
 
Join Date: Oct 2020
Location: Bicester
Posts: 2,018
Excellent
Old 24 May 2024, 12:51   #65
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,412
On a scale of 1-10, how dangerous is this? Assume we have a 68030, plus Akiko and the appropriate versions of Exec and Graphics for 3.1.

OwnBlitter()
Forbid() - to stay on task
Disable() - to prevent having to deal with interrupts after we ...
SuperState() - to allow direct CACR manipulation

C2P:
Backup cache control register
Disable/freeze 68030 data cache via cache control register
Perform Akiko C2P loop
Restore cache control register

UserState()
Enable()
Permit()
DisownBlitter()
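The "backup, disable, restore" part of the plan comes down to a few bit operations on the CACR value. The sketch below only computes the values to write, using the 68030 CACR bit positions; the actual movec in supervisor mode is not shown, and the helper names are hypothetical. The restore helper also sets the clear-data-cache bit, on the assumption that any entry allocated for the Akiko address before the cache was switched off should not survive.

```c
#include <assert.h>
#include <stdint.h>

/* 68030 cache control register bits */
enum {
    CACR_EI  = 1u << 0,   /* enable instruction cache */
    CACR_FI  = 1u << 1,   /* freeze instruction cache */
    CACR_CEI = 1u << 2,   /* clear entry in instruction cache */
    CACR_CI  = 1u << 3,   /* clear instruction cache */
    CACR_IBE = 1u << 4,   /* instruction burst enable */
    CACR_ED  = 1u << 8,   /* enable data cache */
    CACR_FD  = 1u << 9,   /* freeze data cache */
    CACR_CED = 1u << 10,  /* clear entry in data cache */
    CACR_CD  = 1u << 11,  /* clear data cache */
    CACR_DBE = 1u << 12,  /* data burst enable */
    CACR_WA  = 1u << 13,  /* write allocate */
    CACR_030_DEFINED = 0x3F1Fu
};

/* Value to write while the Akiko loop runs: data cache off, rest untouched. */
static uint32_t cacr_dcache_off(uint32_t saved)
{
    assert((saved & ~(uint32_t)CACR_030_DEFINED) == 0);
    return saved & ~(uint32_t)CACR_ED;
}

/* Value to write afterwards: original settings plus a one-shot data cache clear. */
static uint32_t cacr_restore_and_clear(uint32_t saved)
{
    assert((saved & ~(uint32_t)CACR_030_DEFINED) == 0);
    return saved | (uint32_t)CACR_CD;
}
```

exec.library also provides CacheControl() from V37, which changes cache bits without the caller entering supervisor mode. Whether it is cheap enough to call around a C2P pass is a separate question.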

Last edited by Karlos; 24 May 2024 at 13:10.
Old 24 May 2024, 13:04   #66
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,412
Assuming the above path is not entirely suicidal or at least not too Russian Roulette, I am curious as to whether or not you could do something like this inside the loop:

Turn the data cache on
Load up 8 registers, maybe via movem.l
Turn the data cache off
Write the 8 longs to the Akiko one at a time
Read and transfer the longs from Akiko to the target planes
Repeat

I haven't yet checked how many cycles are required for this enabling and disabling of the data cache.
Old 24 May 2024, 13:20   #67
alexh
Thalion Webshrine
 
Join Date: Jan 2004
Location: Oxford
Posts: 14,448
Can't you use MMU to mark Akiko address as non-cachable then no switching on/off cache? Or does that inherently slow down everything?
Old 24 May 2024, 13:46   #68
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,412
Quote:
Originally Posted by alexh
Can't you use MMU to mark Akiko address as non-cachable then no switching on/off cache? Or does that inherently slow down everything?
I am not sure TBH. I don't have any CD32 or 030 hardware so this is all hypothetical. According to the 030 manual, an instruction like movec d0,cacr looks like 12 cycles when the instruction is in cache. Which is not great, but it's not as bad as I thought it might be. I don't want to assume an MMU since we might be dealing with an EC030 part anyway.
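To put the 12-cycle figure in context, here is a rough cost estimate for toggling the data cache inside the loop. It assumes two movec writes per 32-pixel pass (one to switch the cache on, one to switch it off) and takes the 12 cycles quoted above at face value. The function name is hypothetical and nothing here has been measured.

```c
#include <assert.h>
#include <stdint.h>

/* Extra CPU cycles per frame spent on CACR writes when the data cache is
 * switched on and off once per 32-pixel pass. */
static uint32_t cacr_toggle_overhead_cycles(uint32_t width, uint32_t height,
                                            uint32_t movec_cycles)
{
    assert((width * height) % 32u == 0);
    uint32_t passes = (width * height) / 32u;
    return passes * 2u * movec_cycles;   /* two movec writes per pass */
}
```

For 320x200 that is 2000 passes and 48,000 cycles per frame, or 0.75 cycles per pixel, before counting the instructions needed to load the CACR values into a data register.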
Old 24 May 2024, 13:48   #69
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,412
Maybe the next thing to do is just measure the akiko read/write bandwidth (no other memory accesses) as there seems to be some uncertainty around that.

This would require someone else actually testing it
Old 24 May 2024, 17:07   #70
Lunda
Registered User
 
Join Date: Jul 2023
Location: Domsjö/Sweden
Posts: 56
Quote:
Originally Posted by Karlos
Maybe the next thing to do is just measure the akiko read/write bandwidth (no other memory accesses) as there seems to be some uncertainty around that.

This would require someone else actually testing it
I want to test it!

Akiko writes are 8 cycles.
I'm not sure about reads. I used to see 4 cycles reads on the logic analyzer, but maybe because of some bug I introduced.

I hope your cache trick works.

It's possible to use the MMU or turn off the cache. Both options produce equal results for me.
I tried to disable the cache (cache disable signal) only when the Akiko was accessed. This failed.
Old 24 May 2024, 18:18   #71
Cyprian
Registered User
 
Join Date: Jul 2014
Location: Warsaw/Poland
Posts: 192
Quote:
Originally Posted by Lunda
I want to test it!

Akiko writes are 8 cycles.
I'm not sure about reads. I used to see 4 cycles reads on the logic analyzer, but maybe because of some bug I introduced.
8 and 4 cycles at 14 MHz?
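The clock matters because cycle counts convert directly into a bandwidth ceiling. A minimal sketch of that conversion, assuming one longword (4 bytes) per access and a PAL CD32 bus clock of 14,187,580 Hz; both are assumptions for illustration, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on bytes per second when every longword access
 * takes cycles_per_long clock cycles. */
static uint32_t bytes_per_second(uint32_t clock_hz, uint32_t cycles_per_long)
{
    assert(cycles_per_long > 0);
    return (uint32_t)(((uint64_t)clock_hz * 4u) / cycles_per_long);
}
```

With those assumptions, 8-cycle accesses cap out at about 7.1 MB/s and 4-cycle accesses at about 14.2 MB/s, so any measured figure far above that is worth a second look.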
Old 24 May 2024, 21:10   #72
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,412
Potentially dangerous binary attached. I haven't had much time, so here's what it tries to do:

1. Checks if akiko is found
2. Performs a quick single 32-byte sanity check. This doesn't mess with the cache (still to come) so this may or may not work.
3. Attempts to benchmark the Akiko Read/Write performance. This is 8 from-register writes, followed by 8 to register reads. The end result is reported in bytes/second for the complete round trip

The test runs with 100,000 iterations of 32 bytes and while it runs, we are in Forbid() and Disable() for the full duration, meaning that there is no task switching and no interrupt servicing.

You should run this only from ram disk with nothing else trying to do any kind of disk IO, for safety.

Below is from UAE, with JIT disabled and "approximate speed" set. This is just to get a measurable value and shouldn't be taken as remotely quantitative.

Code:
Akiko Detected
C[0]: 0x80808080 P[0]: 0x0000000F
C[1]: 0x40404040 P[1]: 0x000000F0
C[2]: 0x20202020 P[2]: 0x00000F00
C[3]: 0x10101010 P[3]: 0x0000F000
C[4]: 0x08080808 P[4]: 0x000F0000
C[5]: 0x04040404 P[5]: 0x00F00000
C[6]: 0x02020202 P[6]: 0x0F000000
C[7]: 0x01010101 P[7]: 0xF0000000
Benchmarking Akiko Read/Write (reg -> hw -> reg) with 100000 iterations, 32 bytes per iteration...
C.Freq:  709379 Hz
Begin:   260875852 ticks
Finish:  260917182 ticks
Elapsed: 41330 ticks, 58 ms

Perf:    54924093 bytes/second
The code is far from best practice - it doesn't own or wait for the blitter on the assumption that nobody else is actually using Akiko for anything. Due to laziness the time also includes the time taken to Forbid()/Disable()/Enable()/Permit() (for reasons unknown, every attempt to call this from just the loop tended to freeze and I couldn't be bothered getting to the bottom of it).
Attached Files: akiko.lha (7.2 KB)

Last edited by Karlos; 24 May 2024 at 21:15.
Old 24 May 2024, 21:18   #73
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,302
Quote:
Originally Posted by alexh
Can't you use MMU to mark Akiko address as non-cachable then no switching on/off cache? Or does that inherently slow down everything?
That is done anyhow if the MMU is on. If it is off, the board likely pulls CIIN indicating that caching should not be allowed, but the 68030 ignores that signal on long word writes.
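Why this matters for Akiko in particular: the same address is written with chunky data and then read back as planar data, so a cache entry allocated by the write returns the wrong thing on the read. The toy model below illustrates just that effect. It is a single-entry, write-through, allocate-on-write cache in front of a made-up device whose read value differs from the last write; it is not a model of the 68030 cache or of Akiko, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Made-up device: reading returns something other than what was written. */
typedef struct { uint32_t last_written; } ToyDevice;

static void     dev_write(ToyDevice *d, uint32_t v) { d->last_written = v; }
static uint32_t dev_read(const ToyDevice *d)        { return ~d->last_written; }

/* One cache entry covering the device register. */
typedef struct { bool enabled; bool valid; uint32_t data; } ToyCacheLine;

static void cpu_write(ToyCacheLine *c, ToyDevice *d, uint32_t v)
{
    assert(c != NULL && d != NULL);
    dev_write(d, v);                 /* write-through: the device always sees it */
    if (c->enabled) {                /* allocate on write */
        c->valid = true;
        c->data  = v;
    }
}

static uint32_t cpu_read(const ToyCacheLine *c, const ToyDevice *d)
{
    assert(c != NULL && d != NULL);
    if (c->enabled && c->valid)
        return c->data;              /* hit: stale copy of the written value */
    return dev_read(d);              /* miss or cache off: real device value */
}
```

With the toy cache enabled, a read after a write returns the written value rather than the device's answer, which matches the "last unconverted long is read back" symptom one would expect from a cached Akiko register.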
Old 25 May 2024, 07:20   #74
patrik
Registered User
 
Join Date: Jan 2005
Location: Umeå
Age: 43
Posts: 933
Is there no mirror of this $b80038 register, so that the 030 caching concerns could be avoided?
Old 25 May 2024, 07:39   #75
Lunda
Registered User
 
Join Date: Jul 2023
Location: Domsjö/Sweden
Posts: 56
Quote:
Originally Posted by Cyprian
8 and 4 cycles at 14 MHz?
I was wrong. See attached pics.

Clock is 14 MHz.
Attached Thumbnails: AkikoDoomAttack.png, FastReadAkikoWrite.png, AkikoReadChipWrite.png
Old 25 May 2024, 08:16   #76
Lunda
Registered User
 
Join Date: Jul 2023
Location: Domsjö/Sweden
Posts: 56
Quote:
Originally Posted by Karlos
Potentially dangerous binary attached. I haven't had much time, so here's what it tries to do:

1. Checks if akiko is found
2. Performs a quick single 32-byte sanity check. This doesn't mess with the cache (still to come) so this may or may not work.
3. Attempts to benchmark the Akiko Read/Write performance. This is 8 from-register writes, followed by 8 to register reads. The end result is reported in bytes/second for the complete round trip

The test runs with 100,000 iterations of 32 bytes and while it runs, we are in Forbid() and Disable() for the full duration, meaning that there is no task switching and no interrupt servicing.

You should run this only from ram disk with nothing else trying to do any kind of disk IO, for safety.

Below is from UAE, with JIT disabled and "approximate speed" set. This is just to get a measurable value and shouldn't be taken as remotely quantitative.

Code:
Akiko Detected
C[0]: 0x80808080 P[0]: 0x0000000F
C[1]: 0x40404040 P[1]: 0x000000F0
C[2]: 0x20202020 P[2]: 0x00000F00
C[3]: 0x10101010 P[3]: 0x0000F000
C[4]: 0x08080808 P[4]: 0x000F0000
C[5]: 0x04040404 P[5]: 0x00F00000
C[6]: 0x02020202 P[6]: 0x0F000000
C[7]: 0x01010101 P[7]: 0xF0000000
Benchmarking Akiko Read/Write (reg -> hw -> reg) with 100000 iterations, 32 bytes per iteration...
C.Freq:  709379 Hz
Begin:   260875852 ticks
Finish:  260917182 ticks
Elapsed: 41330 ticks, 58 ms

Perf:    54924093 bytes/second
The code is far from best practice - it doesn't own or wait for the blitter on the assumption that nobody else is actually using Akiko for anything. Due to laziness the time also includes the time taken to Forbid()/Disable()/Enable()/Permit() (for reasons unknown, every attempt to call this from just the loop tended to freeze and I couldn't be bothered getting to the bottom of it).
See attached pics. Sorry for the bad quality.

Akiko2.jpg:

From the top:

No data cache. - Last plane failed for some reason. This is the only time I got that issue.

The rest. Same cache setting. - for some reason every other run is slower.


Akiko3.jpg:

Data cache on. - Last unconverted plane is read back from cache as expected.

Data cache off.
Attached Thumbnails: akiko2.jpg, akiko4.jpg
Old 25 May 2024, 08:55   #77
alexh
Thalion Webshrine
 
Join Date: Jan 2004
Location: Oxford
Posts: 14,448
Has to be something wrong there, no? Those numbers seem too big.
Old 25 May 2024, 09:57   #78
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,412
Quote:
Originally Posted by alexh
Has to be something wrong there, no? Those numbers seem too big.
I'll double check the code just to make sure I've not got a numeric overflow or something. It's using the EClock for the timing and 64-bit arithmetic.

Let's sanity check the calculation on the Beast's last run:

107580815 - 107527047 = 53768 ticks

53768 / 709379 = 0.0757958722 seconds

100000 * 32 = 3200000 bytes transformed

3200000 / 0.0757958722 = 42,218,658 bytes/second

That looks plausible, so I'm going to assume that in an overtired and heavily distracted state, I got the number of bytes or loops incorrect due to a simple coding error.

I've probably done something daft, like right-shifting the loop counter by 5 in the ASM code to account for 32 pixels and then forgetting about that in the calling scope.
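The same sanity check can be done in integer arithmetic, which avoids rounding through a float. This is a minimal sketch that takes elapsed EClock ticks, the EClock frequency, and the total byte count; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* bytes/second = total_bytes * eclock_hz / elapsed_ticks,
 * with a 64-bit intermediate so the product cannot overflow. */
static uint32_t perf_bytes_per_second(uint32_t elapsed_ticks, uint32_t eclock_hz,
                                      uint32_t total_bytes)
{
    assert(elapsed_ticks > 0);
    return (uint32_t)(((uint64_t)total_bytes * eclock_hz) / elapsed_ticks);
}
```

For 53768 ticks at 709379 Hz and 3,200,000 bytes this truncates to 42,218,657, one less than the rounded floating-point figure above, and for the 41330-tick UAE run it gives 54,924,093, matching the printed result.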
Old 25 May 2024, 11:03   #79
Karlos
Alien Bleed
 
Karlos's Avatar
 
Join Date: Aug 2022
Location: UK
Posts: 4,412
The benchmark function doesn't appear to have that issue

Code:
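; count in d0
_bench_akiko_rw:

	movem.l	d2/a6,-(sp)
	move.l	d0,d2			; d2 = loop counter (d0 is reused as data below)

	move.l	_SysBase,a6
	jsr		_LVOForbid(a6)		; no task switching
	jsr		_LVODisable(a6)		; no interrupt servicing
	;jsr _LVOSuperState(a6)

	move.l	#$00B80038,a0		; Akiko C2P register

.loop:
	; 8 longword writes from a register (32 bytes of chunky data in)
	move.l	d0,(a0)
	move.l	d0,(a0)
	move.l	d0,(a0)
	move.l	d0,(a0)
	move.l	d0,(a0)
	move.l	d0,(a0)
	move.l	d0,(a0)
	move.l	d0,(a0)

	; 8 longword reads back to a register (32 bytes of planar data out)
	move.l	(a0),d0
	move.l	(a0),d0
	move.l	(a0),d0
	move.l	(a0),d0
	move.l	(a0),d0
	move.l	(a0),d0
	move.l	(a0),d0
	move.l	(a0),d0

	subq.l	#1,d2			; one iteration = one 32-byte round trip
	bgt.s	.loop

	;jsr	_LVOUserState(a6)
	jsr _LVOEnable(a6)
	jsr _LVOPermit(a6)

.done:
	movem.l	(sp)+,d2/a6
	rts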
This is called from the following C code

Code:
extern void bench_akiko_rw(REG(d0, ULONG reps));

#define BENCH_INTERATIONS 100000
#define PIXELS_PER_ITERATION 32

int main(void) {
	if (have_akiko()) {
		puts("Akiko Detected");
		verify_c2p();
		if (get_timer()) {
			printf(
				"Benchmarking Akiko Read/Write (reg -> hw -> reg) with %d iterations, %d bytes per iteration...\n",
				BENCH_INTERATIONS,
				PIXELS_PER_ITERATION
			);

			ULONG freq = ReadEClock(&clk_begin.ecv);
			bench_akiko_rw(BENCH_INTERATIONS);
			ReadEClock(&clk_end.ecv);

			printf("C.Freq:  %u Hz\n", freq);
			printf("Begin:   %llu ticks\n", clk_begin.ticks);
			printf("Finish:  %llu ticks\n", clk_end.ticks);

			ULONG elapsed    = (ULONG)(clk_end.ticks - clk_begin.ticks);
			ULONG elapsed_ms = (elapsed * 1000) / freq;

			printf("Elapsed: %u ticks, %u ms\n", elapsed, elapsed_ms);

			ULONG64 dividend = (BENCH_INTERATIONS * PIXELS_PER_ITERATION) * (ULONG64)freq;

			printf("\nPerf:    %u bytes/second\n", (ULONG)(dividend/elapsed));

			free_timer();
		}
	}
	return 0;
}
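Two details of the timing code can be checked on any host. The listing reads the timestamp through both `.ecv` and `.ticks`, which suggests a union over the two 32-bit halves of an EClockVal (ev_hi, ev_lo) and only lines up that way on a big-endian CPU. Also, `elapsed * 1000` in 32 bits wraps once the run exceeds roughly 6 seconds at 709379 Hz. A portable sketch of both steps follows; the struct and function names are stand-ins, not taken from the posted source.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the two halves of an EClockVal (ev_hi, ev_lo). */
typedef struct { uint32_t hi; uint32_t lo; } EClockParts;

/* Combine the halves explicitly instead of relying on union layout. */
static uint64_t eclock_ticks(EClockParts v)
{
    return ((uint64_t)v.hi << 32) | v.lo;
}

/* Ticks to milliseconds with a 64-bit product, so long runs do not wrap. */
static uint32_t ticks_to_ms(uint64_t ticks, uint32_t eclock_hz)
{
    assert(eclock_hz > 0);
    return (uint32_t)((ticks * 1000u) / eclock_hz);
}
```

For the run shown earlier, 41330 ticks at 709379 Hz comes out as 58 ms, the same as the posted output.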
Old 25 May 2024, 13:01   #80
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,412
I'm going to add the cache manipulation code later, but running without the data cache enabled should at least guarantee it's not just measuring data cache I/O performance.