10 April 2010, 21:48 | #1 |
Zone Friend
Join Date: Apr 2006
Location: Gothenburg/Sweden
Age: 48
Posts: 344
|
Any C2P experts here?
I'm thinking of rewriting the gfxdriver in A/NES, my NES emulator for classic AmigaOS, to make use of chunky graphics instead of planar. There are several benefits, but the c2p conversion on AGA/ECS systems is an obvious drawback.
I'd probably be using a 256x240x5 screen. A c2p using delta conversion would probably benefit quite a lot due to the nature of the graphics. I am, however, no c2p expert myself, and the last time I checked, the "market" of public c2p routines using delta buffers was quite limited.
So my first question is: what kind of speed can I expect on a 68030/50 or 68060/50 in the worst-case c2p scenario (full-screen conversion)? The emulation itself uses *a lot* of CPU time, and I wonder if it's worth the effort of implementing a chunky gfxdriver for classic Amiga systems.
My second question is: could a c2p be customized for the kind of graphics the NES uses, to optimize performance? NES graphics are tile based (8x8 pixels), with heavy restrictions on the number of colours that can be used in each tile.
Thanks in advance |
11 April 2010, 00:51 | #2 |
Registered User
Join Date: Nov 2006
Location: Stockholm, Sweden
Posts: 237
|
I'll just take 68060 as an example.
A full c2p conversion (without delta etc.) needs to do 3 things:
1) read 256x240 bytes = 60kB of data from fastmem
2) transform it
3) write 256x240x5 bits = 37.5kB of data to chipmem
Steps 1 and 3 cannot be done in parallel. Step 2 can mostly be done in parallel with steps 1 and 3.
Step 1 takes perhaps 20 scanlines.
Step 2 takes perhaps 50 scanlines.
Step 3 takes perhaps 125 scanlines.
[The figures above are rough estimates.]
Thus you can expect the full conversion to take 20+125=145 scanlines. The CPU's memory bus will be busy all the time. The CPU will be doing actual processing during about 50 of those 145 scanlines. The rest of the time, the CPU is stalled waiting for the bus interface to finish previous operations.
This means that what would make the most difference to you is reducing the number of chipmem writes. Trading extra computations or extra fastmem accesses (for instance, by doing delta c2p) for fewer chipmem writes might help. However, a full conversion will then be slightly slower than with the naive approach. Your call. |
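The arithmetic in this estimate can be written out as a small C sketch. The function names are made up for illustration, and the scanline figures passed in are the rough estimates from the post, not measurements. The model assumes only what the post states: fastmem reads and chipmem writes are serialized on the bus, and the transform overlaps with them.

```c
#include <stdio.h>

/* Bytes read from fastmem for one full chunky frame (one byte per pixel). */
static unsigned long chunky_bytes(unsigned width, unsigned height)
{
    return (unsigned long)width * height;
}

/* Bytes written to chipmem for the planar frame (depth bits per pixel). */
static unsigned long planar_bytes(unsigned width, unsigned height, unsigned depth)
{
    return (unsigned long)width * height * depth / 8;
}

/*
 * Rough frame cost in scanlines. Assumption taken from the post: reads and
 * writes cannot overlap, while the transform runs in parallel with them,
 * so the total is whichever is larger, bus time or transform time.
 */
static unsigned full_c2p_scanlines(unsigned read_lines,
                                   unsigned transform_lines,
                                   unsigned write_lines)
{
    unsigned bus_lines = read_lines + write_lines;
    return transform_lines > bus_lines ? transform_lines : bus_lines;
}

/* Hypothetical helper that prints the figures for a 256x240x5 screen. */
static void print_estimate(void)
{
    printf("fastmem reads:  %lu bytes\n", chunky_bytes(256, 240));
    printf("chipmem writes: %lu bytes\n", planar_bytes(256, 240, 5));
    printf("scanlines:      %u\n", full_c2p_scanlines(20, 50, 125));
}
```

With the post's figures, the bus time (20+125 scanlines) dominates the 50 scanlines of computation, which is why reducing chipmem writes matters more than speeding up the transform.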
11 April 2010, 11:11 | #3 |
Banned
Join Date: Jan 2007
Location: France
Posts: 655
|
@Kalms
Would you be OK with putting your fast c2p routines into graphics.library, for:
- 000/010
- 020
- 030
- 040/060
For coders, reinventing the wheel every time is no fun! |
11 April 2010, 16:17 | #4 |
Registered User
Join Date: Nov 2006
Location: Stockholm, Sweden
Posts: 237
|
which routines are you thinking of?
Generally, any program that runs on 000/010 needs the c2p conversion integrated into the rendering code to reach high enough framerates. As for 020/030, they can use the same routine, and 040/060 can usually share a routine as well with no drawbacks. If you were referring to some delta c2p, then sure, but the question at hand is whether or not a delta c2p would be useful (and at what stage the delta comparison should be performed). |
11 April 2010, 20:53 | #5 |
Zone Friend
Join Date: Apr 2006
Location: Gothenburg/Sweden
Age: 48
Posts: 344
|
@Kalms
Can you (or anyone else perhaps) point to a public c2p that's capable of doing a 256x240x5 (but configurable to 8) conversion? Everything on Aminet feels pretty ancient, sadly. Or even better, is anyone up for helping with a customized c2p? |
11 April 2010, 21:29 | #6 |
2 1200s in Wisconsin
|
Snipped the first question, as I have no idea how to speed that up. I think you hit the jackpot with the 2nd question, though. If the NES can only use 8x8 tiles, at 16 or 32 colors, you might be able to pre-convert the tile data to planar and just use the blitter to move the tiles about. This would require a bit more RAM, but I'm guessing not as much as a c2p dump of a whole screen, though that would depend on how many "tiles" are used. The only thing I would worry about is how NES games store tiles; if it's not standard, it might be problematic from game to game. It might be possible, though, to do an on-the-fly conversion as tiles are used, with the option of caching the converted data for later use if the tile is called again (requiring some sort of MMU table or custom alternative).
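As a rough illustration of what "pre-converting a tile to planar" means, here is a naive C sketch. It is not an optimized c2p and not taken from any existing routine; the names, the 5-plane depth, and the one-byte-per-row layout (leftmost pixel in the most significant bit, as Amiga bitplanes store it) are assumptions chosen for the example.

```c
#include <stdint.h>

#define TILE_SIZE  8   /* tiles are 8x8 pixels */
#define TILE_DEPTH 5   /* assumed screen depth, as in the 256x240x5 setup */

/*
 * Convert one chunky 8x8 tile (one byte per pixel) into TILE_DEPTH
 * bitplanes. planar[p][row] holds the 8 pixels of that row for plane p,
 * with the leftmost pixel in bit 7.
 */
static void tile_chunky_to_planar(const uint8_t chunky[TILE_SIZE * TILE_SIZE],
                                  uint8_t planar[TILE_DEPTH][TILE_SIZE])
{
    for (int plane = 0; plane < TILE_DEPTH; plane++) {
        for (int row = 0; row < TILE_SIZE; row++) {
            uint8_t bits = 0;
            for (int x = 0; x < TILE_SIZE; x++) {
                /* Shift in bit 'plane' of each pixel, left to right. */
                bits = (uint8_t)((bits << 1) |
                                 ((chunky[row * TILE_SIZE + x] >> plane) & 1));
            }
            planar[plane][row] = bits;
        }
    }
}
```

A converted tile occupies 5 x 8 = 40 bytes, so a cache of converted tiles stays small compared with a full-screen buffer; whether blitting from such a cache beats a full c2p depends on how many tiles change per frame.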
|
11 April 2010, 21:57 | #7 |
Zone Friend
Join Date: Apr 2006
Location: Gothenburg/Sweden
Age: 48
Posts: 344
|
The NES actually uses planar graphics, but as I've discovered, it's still a pain to use on the Amiga for several reasons.
NES background graphics consist of 4 planes (sort of): two planes with actual graphics and two more (called the attribute table) that just select the palette. The attribute table is 64 bytes, and each byte looks like this:
%AABBCCDD
AA represents the upper-left 16x16 block, with upper color bits AA
BB represents the lower-left 16x16 block
CC represents the upper-right 16x16 block
DD represents the lower-right 16x16 block
So each byte describes the palette for a 32x32 pixel block. This means the attribute bits change less often and could perhaps be handled separately in a clever c2p? |
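The byte layout described above can be unpacked with a couple of shifts. The sketch below follows the %AABBCCDD ordering exactly as given in this post; the quadrant order should be checked against NES hardware documentation before relying on it. The enum and function names are hypothetical.

```c
#include <stdint.h>

/* Quadrants in the order the post lists them, from the top bits down. */
enum {
    QUAD_UPPER_LEFT  = 0,  /* AA: bits 7-6 */
    QUAD_LOWER_LEFT  = 1,  /* BB: bits 5-4 */
    QUAD_UPPER_RIGHT = 2,  /* CC: bits 3-2 */
    QUAD_LOWER_RIGHT = 3   /* DD: bits 1-0 */
};

/* Extract the 2 palette-select bits for one 16x16 quadrant. */
static uint8_t attr_palette_bits(uint8_t attr, int quadrant)
{
    int shift = 6 - 2 * quadrant;
    return (uint8_t)((attr >> shift) & 3);
}

/*
 * Combine the 2 palette-select bits (upper) with a 2-bit tile pixel
 * (lower) into a 4-bit colour index, as a chunky renderer might do.
 */
static uint8_t colour_index(uint8_t attr, int quadrant, uint8_t pixel2)
{
    return (uint8_t)((attr_palette_bits(attr, quadrant) << 2) | (pixel2 & 3));
}
```

For example, with attr = %11100100 the four quadrants get palette bits 3, 2, 1 and 0 in the order listed above.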
11 April 2010, 23:12 | #8 |
Registered User
Join Date: Nov 2006
Location: Stockholm, Sweden
Posts: 237
|
Unpacking attribute table with lazy evaluation, 68020+/AGA style:
do it 100% on the CPU side. Code:
char currentAttributeTable[64] = ...;
char previousAttributeTable[64] = ...;
for (i = 0; i < 64; i++) {
    if (currentAttributeTable[i] != previousAttributeTable[i]) {
        /* write 32x32 pixels x 2 bitplanes to chipmem, with currentAttributeTable[i] as the source info */
    }
}
1) 64*2 = 128 bytes of fastmem reads, regardless of the number of changes
2) for every byte that has changed, generate pixelmasks to write to chipmem - should take no more than a handful of cycles
3) for every byte that has changed, write 32*4*2 = 256 bytes to chipmem
If there is a significant number of updates, then 90%+ of the time will be spent waiting for chipwrites to complete. You could actually build a copperlist (or use a blitter interrupt) and get the blitter to perform the memory writes. This would work out because the amount of info that you need to transfer fast->chip is vastly smaller than the amount of chipmem writing that needs to be done. |
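A compilable variant of the loop above might look like the following sketch. The callback type and all names are invented for illustration, and the actual chipmem write is left to the caller. One detail added here that the pseudocode leaves implicit is updating the previous table after a block is written, so that the same change is not written again on the next frame.

```c
#include <stdint.h>

#define ATTR_COUNT 64

/* Hypothetical callback: writes one 32x32 block x 2 bitplanes to chipmem. */
typedef void (*attr_block_writer)(int index, uint8_t attr, void *user);

/*
 * Compare the current attribute table with the previous one and call
 * write_block only for entries that changed. Returns the number of
 * changed entries.
 */
static int update_changed_attributes(const uint8_t current[ATTR_COUNT],
                                     uint8_t previous[ATTR_COUNT],
                                     attr_block_writer write_block,
                                     void *user)
{
    int changed = 0;
    for (int i = 0; i < ATTR_COUNT; i++) {
        if (current[i] != previous[i]) {
            write_block(i, current[i], user);
            previous[i] = current[i];  /* remember what is now on screen */
            changed++;
        }
    }
    return changed;
}

/* Example writer that only tallies chipmem traffic: 32 rows * 4 bytes * 2 planes. */
static void count_chip_bytes(int index, uint8_t attr, void *user)
{
    (void)index;
    (void)attr;
    *(unsigned long *)user += 32 * 4 * 2;
}
```

Passing count_chip_bytes as the writer gives a quick way to measure how many bytes a real game would push to chipmem per frame before committing to this design.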
12 April 2010, 00:14 | #9 | |
Registered User
Join Date: Jun 2008
Location: planet earth
Posts: 1,115
|
Quote:
@Kalms: is it ok posting it again? |
|
12 April 2010, 00:49 | #10 |
Registered User
Join Date: Nov 2006
Location: Stockholm, Sweden
Posts: 237
|
sure, or just google it - http://www.google.com/#hl=en&q=kalmsc2p.lha
you probably want c2p1x1_5_c5_030.s / c2p1x1_5_c5_060.s for starters. |
12 April 2010, 01:23 | #11 |
Registered User
Join Date: Jun 2008
Location: planet earth
Posts: 1,115
|
|
12 April 2010, 01:50 | #12 |
Registered User
Join Date: Nov 2006
Location: Stockholm, Sweden
Posts: 237
|
now also available at http://unitedstatesofamiga.googlecode.com/ (and it ought to survive for a while over there)
|
12 April 2010, 08:25 | #13 |
Zone Friend
Join Date: Apr 2006
Location: Gothenburg/Sweden
Age: 48
Posts: 344
|
Thanks for that. Worked great. However, I do miss having some c2p routines that implement delta buffers. :-)
|
12 April 2010, 09:39 | #14 |
Registered User
Join Date: Nov 2006
Location: Stockholm, Sweden
Posts: 237
|
Try it yourself first. Use an existing routine to C2P convert fast->fast, then do a delta copy fast->chip. It's going to be a tad slower than a C2P with deltacopy built in, but not all that bad.
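The second half of that suggestion, the delta copy, could be sketched in C as follows. This is an illustration under assumptions, not code from any c2p package: the names are invented, and the comparison is done against a shadow copy kept in fastmem (an added design choice, so that chipmem is only ever written, never read).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy only the longwords that differ from the previous frame.
 *   src    - freshly converted planar data in fastmem
 *   shadow - fastmem copy of what is currently in chipmem
 *   chip   - destination bitplane data in chipmem
 *   count  - number of 32-bit longwords
 * Returns the number of longwords actually written to chipmem.
 */
static size_t delta_copy(const uint32_t *src, uint32_t *shadow,
                         volatile uint32_t *chip, size_t count)
{
    size_t written = 0;
    for (size_t i = 0; i < count; i++) {
        uint32_t value = src[i];
        if (value != shadow[i]) {
            shadow[i] = value;  /* keep the shadow in sync with chipmem */
            chip[i] = value;    /* the only chipmem access in the loop */
            written++;
        }
    }
    return written;
}
```

The return value makes it easy to log how many chipmem writes a typical frame needs, which indicates whether a c2p with delta copy built in would be worth the extra work.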
|
13 April 2010, 01:38 | #15 |
Registered User
Join Date: Sep 2007
Location: Melbourne/Australia
Posts: 4,408
|
BlazeWCP already patches the AmigaOS chunky functions. A good idea would be to include BlazeWCP (or an improved, CPU-targeted version) in graphics.library (one less patch).
|
13 April 2010, 18:49 | #16 |
Zone Friend
Join Date: Apr 2006
Location: Gothenburg/Sweden
Age: 48
Posts: 344
|
@Kalms:
I'm having issues with what I suspect is your c2p. Check out this package I've put together: http://dl.dropbox.com/u/2590713/C2PIssue.lha
There are three files:
SMB.Chunky = 256x240 chunky screen (source)
C2P_Result = bitplane image converted by your c2p, saved as IFF24 by Artpro
Chunkysource = SMB.Chunky converted to IFF24 by Artpro using its chunky loader
Don't mind the generally buggy graphics. Check out the "Chunkysource" file. There's a line of dots where Mario usually stands. The dots go like 1,0,1,0,1,0,1,0. Now check out C2P_Result. The dots don't look the same. There's some kind of smearing and I don't understand why. I've traced my code, but at the moment it looks like something is wrong with the c2p conversion. I'm using the c2p1x1_5_c5_060.s code.
Last edited by oRBIT; 13 April 2010 at 19:14. |
13 April 2010, 19:47 | #17 |
move.l #$c0ff33,throat
Join Date: Dec 2005
Location: Berlin/Joymoney
Posts: 6,863
|
Looks like that is indeed a bug in the 5bpl c2p; when using the 6bpl c2p, everything looks fine. May do some debugging later.
|
13 April 2010, 19:59 | #18 |
Zone Friend
Join Date: Apr 2006
Location: Gothenburg/Sweden
Age: 48
Posts: 344
|
@Stingray: Thanks for verifying that. I thought I was going nuts.
|
14 April 2010, 00:08 | #19 |
Registered User
Join Date: Nov 2006
Location: Stockholm, Sweden
Posts: 237
|
Just a thought -- those white dots, are those color value 31 or 255? If you have pixels with color value higher than 31 in the buffer, then the 5bpl c2p will produce such smearing.
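One way to rule this out is to scan the chunky buffer before conversion. The sketch below uses made-up function names and assumes one byte per pixel; it counts pixels that do not fit in the given depth and, optionally, masks them down.

```c
#include <stddef.h>
#include <stdint.h>

/* Count pixels whose value does not fit in 'depth' bitplanes (depth 1..8). */
static size_t count_out_of_range(const uint8_t *chunky, size_t count,
                                 unsigned depth)
{
    uint8_t max = (uint8_t)((1u << depth) - 1);  /* 31 for 5 bitplanes */
    size_t bad = 0;
    for (size_t i = 0; i < count; i++) {
        if (chunky[i] > max)
            bad++;
    }
    return bad;
}

/* Force every pixel into range by discarding the unused upper bits. */
static void mask_to_depth(uint8_t *chunky, size_t count, unsigned depth)
{
    uint8_t mask = (uint8_t)((1u << depth) - 1);
    for (size_t i = 0; i < count; i++)
        chunky[i] &= mask;
}
```

If count_out_of_range(buffer, 256 * 240, 5) returns zero, stray upper bits are not the cause of the smearing and the problem lies elsewhere.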
|
14 April 2010, 07:17 | #20 |
Zone Friend
Join Date: Apr 2006
Location: Gothenburg/Sweden
Age: 48
Posts: 344
|
@Kalms: Nope, they have value $1D.
|