English Amiga Board


Old 04 June 2024, 14:56   #261
Lunda
Registered User
 
Join Date: Jul 2023
Location: Domsjö/Sweden
Posts: 56
Quote:
Originally Posted by Karlos View Post
What happens if you retest on an 8 bit low-res pal mode ? That's a better model when thinking of the true performance of chip writes for our use case.
Here are the results.

NTSC 320x200x8:
Case 0: 70fps
Case 1: 66fps
Case 7: 42fps
Case 8: 47fps

PAL 320x256x8:
Case 0: 69fps
Case 1: 69fps
Case 7: 42fps
Case 8: 46fps
Lunda is offline  
Old 07 June 2024, 17:53   #262
NorthWay
Registered User
 
Join Date: May 2013
Location: Grimstad / Norway
Posts: 852
BTW, does anyone remember how the Graffiti(?) made its chunky modes conversion?
How many pixels did it read before it did its conversion?

This thread made me think of it, and IIRC it had a slightly convoluted pixel ordering/layout.
I expect the reason for that was limited memory before the bit-shuffle was done; if you buffered a whole line you could just split the bits out while getting input and then dumping the whole thing at the end of the line?

So I was thinking about making a linear output of bits for something like Graffiti that would make a straight non-shuffled chunky source possible, and I think there should be at least a few ways to do so:
SHRES 2 bitplanes, FMODE=3, 64 pixels wide display, 8 byte modulo, bitplane 2 starting 8 bytes after bitplane 1, copper used to restart display every 64 pixels.
HRES 4 bitplanes, FMODE=1, 32 pixels wide display, 16 byte modulo, bitplanes start 4 bytes after each other, copper used to restart display every 32 pixels.
These would require the device to buffer 64 or 32 pixels worth of data, and I suspect the Graffiti only cares about 16. But someone could run with the idea and scale a device up to 32/64/320...
NorthWay is online now  
Old 08 June 2024, 02:20   #263
abu_the_monkey
Registered User
 
Join Date: Oct 2020
Location: Bicester
Posts: 2,018
found a few moments to shoehorn the random thunk I had into the test Karlos wrote (brain damaged out most of the known / consistent tests)

this only has:

0:Vanilla Fast to Chip Copy, 8 longwords at a time.

1:Akiko C2P (Limit) Register to Akiko to Register throughput, CACR Write Allocation Disabled.

2:Akiko C2P (Naive) Chunky read from Fast, planar write to Chip, CACR Write Allocation Disabled.

3:Akiko C2P (AL_Buffer) Chunky read from Fast, planar write to Chip, register buffer to/from Akiko, CACR Write Allocation Disabled.

4:Kalms C2P (c2p1x1_8_c5_030_2) Chunky read from Fast, planar write to Chip.

run it from ram disk to be on the safe side.
give it a go if you are brave.
Attached Files
File Type: lha akiko_abu.LHA (10.0 KB, 8 views)
abu_the_monkey is offline  
Old 08 June 2024, 11:40   #264
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,419
If possible, run while viewing a 320*256*8 bit screen as that will simulate the chip ram contention.

If you do find a useful speed up on 030/50MHz I'm all ears btw. Until then, I'm focusing on audio for a bit ..

("all ears" for video related issues, "focusing on" for audio. Weird phraseology)

Last edited by Karlos; 08 June 2024 at 12:59.
Karlos is offline  
Old 08 June 2024, 17:51   #265
abu_the_monkey
Registered User
 
Join Date: Oct 2020
Location: Bicester
Posts: 2,018
Quote:
Originally Posted by Karlos View Post
"all ears" for video related issues, "focusing on" for audio. Weird phraseology


Quote:
If possible, run while viewing a 320*256*8 bit screen as that will simulate the chip ram contention.
yes, I think this would be preferable.
abu_the_monkey is offline  
Old 15 June 2024, 14:39   #266
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,419
A wild thought appears:

Consider that the limits to Akiko are in the transfer of converted planes from its register to the chip ram.

GRIND only uses a 4-bit display depth. On an unexpanded CD32 you still have to write 8 longs to Akiko, but you only have to read back and transfer 4. So, it might give a significant boost to the already decent framerate on CD32.
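
As rough arithmetic for the saving (a sketch only; the 320x256 screen size and the helper name `chip_longs_per_frame` are assumptions for illustration, not Grind's actual figures), the number of longword writes to chip ram per frame scales with the number of planes read back:

```c
#include <assert.h>
#include <stdint.h>

/* Longword writes to chip ram per frame when `planes` bitplanes are
   read back from the converter for every group of 32 pixels.
   The 8 longword writes into Akiko per group are not counted here,
   since they do not touch chip ram. */
static uint32_t chip_longs_per_frame(uint32_t width, uint32_t height,
                                     uint32_t planes)
{
    assert((width * height) % 32 == 0);
    return (width * height / 32) * planes;
}
```

For a hypothetical 320x256 screen this gives 20480 chip ram longword writes with 8 planes and 10240 with 4 planes, i.e. half the contended traffic.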
Karlos is offline  
Old 16 June 2024, 18:57   #267
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,419
FWIW, to integrate Akiko into Grind requires a few assumptions on my part.

1. Chunky Pixel Framebuffer is 1 byte per pixel.
2. Just the lowest 4 bits are set in each byte.
3. It doesn't actually matter what the upper bits are.
4. Framebuffer is arranged in rows.

The Akiko workflow first involves detecting it, which is pretty easy. There's a magic value to read from a particular address. The code is in my test repo.

The C2P loop is trivial:

You read 8 consecutive longs from the chunky buffer, writing each one into the single register address. That's 32 input pixels.

You read back 4 consecutive longs from the same address. Each long holds 32 bits of one plane. You write those to your chip ram planes. You don't have to read back all 8, and planes are returned lowest plane first.

Despite the 14MHz bus, reading and writing the Akiko register location is basically uncontended, it's only chip ram writes that are the problem.

That's it.
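
The loop described above can be sketched in C as follows. This is not the code from the test repo: `AkikoModel`, `akiko_write`, `akiko_read` and `c2p_4plane` are hypothetical names, and the struct is a software stand-in for the hardware register so the sketch can run anywhere. The model assumes the first pixel sits in the most significant byte of each long and the leftmost pixel lands in bit 31 of each plane long.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software stand-in for the Akiko C2P register (hypothetical model, not
   the real hardware). On a CD32 both accessors would instead hit one
   memory-mapped longword, commonly documented at 0xB80038. */
typedef struct {
    uint32_t chunky[8]; /* last 8 longs written = 32 chunky pixels */
    int wr;             /* write index, 0..7 */
    int rd;             /* next plane to hand back, 0..7 */
} AkikoModel;

static void akiko_write(AkikoModel *a, uint32_t v)
{
    a->chunky[a->wr] = v;
    a->wr = (a->wr + 1) & 7;
    a->rd = 0; /* assumption: a new write restarts read-back at plane 0 */
}

static uint32_t akiko_read(AkikoModel *a)
{
    int p = a->rd;
    uint32_t out = 0;
    a->rd = (a->rd + 1) & 7;
    for (int i = 0; i < 32; i++) {
        /* pixel i is byte (i % 4) of long (i / 4), leftmost byte first */
        uint32_t pix = (a->chunky[i / 4] >> (24 - 8 * (i % 4))) & 0xFF;
        out |= ((pix >> p) & 1u) << (31 - i);
    }
    return out;
}

/* Convert `pixels` chunky bytes (a multiple of 32) into 4 bitplanes:
   8 longs in, only the lowest 4 plane longs out per group. */
static void c2p_4plane(AkikoModel *a, const uint32_t *chunky,
                       uint32_t *planes[4], size_t pixels)
{
    assert(pixels % 32 == 0);
    for (size_t g = 0; g < pixels / 32; g++) {
        for (int i = 0; i < 8; i++)
            akiko_write(a, chunky[g * 8 + i]);
        for (int p = 0; p < 4; p++)
            planes[p][g] = akiko_read(a); /* on hardware: chip ram write */
    }
}
```

Because only planes 0 to 3 are read back, whatever sits in the upper 4 bits of each chunky byte never reaches the output, which matches assumption 3 above.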
Karlos is offline  
Old 16 June 2024, 19:07   #268
paraj
Registered User
 
Join Date: Feb 2017
Location: Denmark
Posts: 1,194
Isn't grind 2x2? I also seem to recall that it uses (or used) an optimized "chunky" layout that would save or help one or more passes. Doesn't mean it's impossible or even hard to adapt, but I doubt implementing it will be as easy as it was in AB3DII (or say Breathless).

It's interesting to think about for sure, but my suspicion is that it will take a bit of effort (maybe much!) to get working, and would only be a late addition (probably via a plugin architecture thing where you might also be able to get better visuals on 030+ or something).
paraj is offline  
Old 16 June 2024, 19:38   #269
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,419
I think it's 1x2, but that could be an artefact.

I also haven't bothered implementing it in AB3D2 because I haven't really seen any evidence that it would help by the time you are on 030/50, which is the absolute MVP processor option (1x2 pixels, 2/3 size).

Last edited by Karlos; 16 June 2024 at 20:38.
Karlos is offline  
Old 16 June 2024, 20:15   #270
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,419
It goes without saying that if it is 2x2, Akiko may not be such a great idea.
Karlos is offline  
Old 16 June 2024, 23:40   #271
Tsak
Pixelglass/Reimagine
 
Join Date: Jun 2012
Location: Athens
Posts: 1,053
Quote:
Originally Posted by Karlos View Post
It goes without saying that if it is 2x2, Akiko may not be such a great idea.
Grind's logical pixel is 2x2. But we can use different colors for each horizontal pair, hence the 1x2 result.

You say using Akiko isn't a good idea for 2x2?
Also has anyone tested KK's blitter driven C2P against the others?
Obviously not ideal for 030+ but generally speaking.
Tsak is offline  
Old Yesterday, 00:19   #272
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,419
The reason it may not be such a good idea is that you'd have to do some pixel doubling either on the input to Akiko, or the output. Depending on how this is done, it might nullify any gains. As @paraj says, the current C2P method is quite specialised and what you said here also gives me pause.
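
To make the cost of doubling on the output side concrete, here is a minimal sketch (the helper name `double_bits16` is hypothetical, and this is a generic bit-spreading trick, not anything taken from Grind). It stretches 16 planar bits to 32 so every pixel appears twice horizontally; this work would have to be done for every plane word read back from Akiko.

```c
#include <stdint.h>

/* Horizontally double 16 planar pixels: each input bit becomes two
   adjacent output bits (bit 15 -> bits 31 and 30, and so on). */
static uint32_t double_bits16(uint16_t w)
{
    uint32_t x = w;
    /* Spread the 16 bits out so each lands on an even bit position. */
    x = (x | (x << 8)) & 0x00FF00FFu;
    x = (x | (x << 4)) & 0x0F0F0F0Fu;
    x = (x | (x << 2)) & 0x33333333u;
    x = (x | (x << 1)) & 0x55555555u;
    /* Copy every bit into its left neighbour to double it. */
    return x | (x << 1);
}
```

Doubling on the input side would instead mean writing every chunky byte twice, which doubles the number of Akiko writes per source pixel. Either way the extra shifts and masks per long eat into whatever Akiko saves.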
Karlos is offline  
Old Yesterday, 19:22   #273
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,419
I'm not sure I understand how Grind makes 2 horizontal pixels from 1 logical 2x2 one. I remember watching a video about the development where KK described an algorithm for that, but I'd have to revisit it.
Karlos is offline  
Old Today, 02:03   #274
lmimmfn
Registered User
 
Join Date: May 2018
Location: Ireland
Posts: 689
Quote:
Originally Posted by Karlos View Post
I'm not sure I understand how Grind makes 2 horizontal pixels from 1 logical 2x2 one. I remember watching a video about the development where KK described an algorithm for that, but I'd have to revisit it.
Quite sure it's just normal resolution vertically and horizontal rez is halved, i.e. 160 pixels wide and upscaled to 320 x 256

I guess it's direct chipram writes and may be more efficient than Akiko when not doing 320x256 i.e. no half horizontal resolution.
lmimmfn is offline  
Old Today, 04:51   #275
Samurai_Crow
Total Chaos forever!
 
Join Date: Aug 2007
Location: Waterville, MN, USA
Age: 49
Posts: 2,199
Quote:
Originally Posted by lmimmfn View Post
Quite sure it's just normal resolution vertically and horizontal rez is halved, i.e. 160 pixels wide and upscaled to 320 x 256

I guess it's direct chipram writes and may be more efficient than Akiko when not doing 320x256 i.e. no half horizontal resolution.
Copper stretching is used vertically so the vertical resolution is about 100 pixels actual in Grind.
Samurai_Crow is offline  
Old Today, 07:44   #276
britelite
Registered User
 
Join Date: Feb 2010
Location: Espoo / Finland
Posts: 821
Quote:
Originally Posted by Karlos View Post
I'm not sure I understand how Grind makes 2 horizontal pixels from 1 logical 2x2 one. I remember watching a video about the development where KK described an algorithm for that, but I'd have to revisit it.
Because you're moving bytes around, you can pack two 4bit pixels into one byte which in turn gives you a kind of dithering basically for free.
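
A minimal sketch of the packing idea, assuming the simplest possible arrangement (left colour in the high nibble, right colour in the low nibble). The helper name `pack_pair` is hypothetical and Grind's real bit arrangement may well differ:

```c
#include <stdint.h>

/* Pack two 4-bit colour indices into one byte. Moving that single
   byte around then carries both halves of a dithered pixel pair. */
static uint8_t pack_pair(uint8_t left, uint8_t right)
{
    return (uint8_t)(((left & 0x0F) << 4) | (right & 0x0F));
}
```

With a precomputed table of such bytes, the renderer still writes one byte per logical pixel but gets two independently coloured screen pixels out of it.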
britelite is offline  
Old Today, 08:34   #277
paraj
Registered User
 
Join Date: Feb 2017
Location: Denmark
Posts: 1,194
I seem to recall KK writing that the most significant pixel was already in place, so I think it's laid out something like this in memory (bits):
Code:
A3 a3 A2 a2 A1 a1 A0 a0 E3 e3 E2 e2 E1 e1 E0 e0 B3 b3 B2 b2 B1 b1 B0 b0 ...
So first byte is dithered pixel 0 followed by pixel 4, pixel 1, pixel 5, etc. Then you'd only need two blitter passes (4x2 and 2x1) for the C2P.

From the code it seems to at least be something close to that (but not exactly): https://github.com/Krzysiek-K/Dread-...a_framework.s/
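
For illustration only, the bit-interleaved byte in the layout sketched above (`A3 a3 A2 a2 A1 a1 A0 a0`) could be built like this. The helper name `interleave_pair` is hypothetical, and as noted the real Grind layout is only close to this, so treat it as a model of the idea rather than the actual format:

```c
#include <stdint.h>

/* Build one byte laid out as A3 a3 A2 a2 A1 a1 A0 a0 (bit 7 first),
   where A and a are the two 4-bit colours of a dithered pixel pair. */
static uint8_t interleave_pair(uint8_t A, uint8_t a)
{
    uint8_t out = 0;
    for (int i = 0; i < 4; i++) {
        out |= (uint8_t)(((A >> i) & 1u) << (2 * i + 1)); /* odd bits  */
        out |= (uint8_t)(((a >> i) & 1u) << (2 * i));     /* even bits */
    }
    return out;
}
```

With the bits pre-interleaved like this, each plane's bits for the pair are already adjacent, which fits the point that fewer merge passes are then needed. The byte ordering across pixels (0, 4, 1, 5, ...) is not modelled here.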
paraj is offline  
 

