06 December 2021, 09:43 | #1 |
old chunk of coal
Join Date: Nov 2011
Location: Hungary
Posts: 1,300
|
ECS Amiga 4bpl C2P routine
I'm looking for a 1x1 or 2x1 4bpl C2P routine that would perform reasonably well on an A500. I'm aware of the 0000abcd -> aa00bb00cc00dd00 scrambling, but I was unable to find a routine that performs C2P using a buffer prepared in such a format. I've found code snippets, but due to my non-existent assembly skills I'm trying to find a complete routine. Any help would be greatly appreciated.
I'm thinking about a new Catacomb 3D port, but this time for the A500 and the C2P seems to be the first major roadblock. For the later games the 3D area takes up 320x120 pixels of the screen, the 2D part of the screen uses planar 4bpl graphics and only small parts are refreshed each frame. |
06 December 2021, 17:03 | #2 |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,204
|
What format do you mean *exactly*? The more "prescrambling" you do the faster things will be, but it can make other things more difficult (textures using too much RAM, rendering may be more difficult). KK has some info about what he's doing for Dread in the thread, but the format might not fit what you're doing. Maybe you can even convince him to share the C2P, but again the format might not be appropriate for what you're going for.
What frame rate are you targeting and how long do you think the other stuff will take? Could be a fun exercise to try to come up with something, but my gut feeling is that 320x120 at 1x1 is going to be tough. |
06 December 2021, 17:58 | #3 |
old chunk of coal
Join Date: Nov 2011
Location: Hungary
Posts: 1,300
|
I'm not sure about the format to be honest, I just read that scrambling the nybbles of the byte into a word like so lets you skip one pass of the C2P. If I could use the original textures (1 byte per pixel, 16 color chunky) without scrambling that would be even better. I've been following the Dread thread, but what KK is doing is much more complex than what I'd need.
I'm looking for something around 10-15 fps, which should be enough for these games. Most enemies move slowly and use either melee or projectile attacks. 2x1 would be fine, the unreleased Apple IIgs port of Catacomb Abyss also doubled the columns and it didn't look that bad. I could switch to 1x1 when the game is started on an 020 machine for example. |
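As a rough illustration of the 0000abcd -> aa00bb00cc00dd00 scrambling mentioned above, here is a minimal C sketch that builds a 16-entry lookup table for it. The function names `scramble_nibble` and `build_scramble_table` are hypothetical and not taken from any existing routine; the idea is only that textures could be pre-expanded once at load time, at the cost of 16 bits per pixel.

```c
#include <stdint.h>

/* Hypothetical helper: spread a 4-bit pixel %abcd into the word
 * %aa00bb00cc00dd00, where a is the most significant colour bit. */
static uint16_t scramble_nibble(uint8_t pix)
{
    uint16_t out = 0;
    int bit;

    for (bit = 0; bit < 4; bit++) {
        if (pix & (1u << bit)) {
            /* bit 3 (a) lands in bits 15-14, bit 0 (d) in bits 3-2 */
            out |= (uint16_t)(0xC000u >> (4 * (3 - bit)));
        }
    }
    return out;
}

/* Hypothetical helper: precompute all 16 scrambled values once. */
static void build_scramble_table(uint16_t table[16])
{
    int i;

    for (i = 0; i < 16; i++)
        table[i] = scramble_nibble((uint8_t)i);
}
```

With such a table a texture can be converted pixel by pixel with `table[pixel & 15]` when it is loaded, so the renderer never has to do the bit spreading itself.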
06 December 2021, 19:15 | #4 |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,204
|
For a C2P it basically comes down to how many places you need to move the bits, and how to do that efficiently.
%0000abcd needs to turn into %a0000000, %b0000000, %c0000000, %d0000000 (appropriately shifted for each 8 pixels) for a 1x1. If you look at the good old https://amycoders.org/sources/c2ptut.html the prescramble part is all about avoiding one (or more) of the parts mentioned in section 10.
I have some old code lying around that does 320x120x4 (1x1) with the blitter using 281 raster lines (with no CPU interleaving and no other DMA active), but it assumes this format: %a3b3c3d3a2b2c2d2a1b1c1d1a0b0c0d0, i.e. 4 different textures for each horizontal position necessary (or other trickery).
Might be worth it to play around with a quickish proof of concept that just assumes the basic texture format, and then work on trying to optimize it. You can just use a C2P written in C (or whatever) for this. Then you can ask the optimization gods in asm/hardware and you'll have a better idea about where it's possible to optimize. |
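For the proof-of-concept stage suggested above, a deliberately naive 1x1 C2P in C could look like the sketch below. It assumes the basic format (one byte per pixel, colour in the low nibble, %0000abcd) and four separate, non-interleaved bitplanes; the function name and parameters are made up for illustration. It is meant as a correctness reference to compare faster routines against, not as something that will be fast on a 68000.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference conversion: `count` chunky pixels (must be a multiple of 8)
 * into four bitplanes. planes[0] receives colour bit 0 (d) and
 * planes[3] receives colour bit 3 (a). */
static void c2p_1x1_4bpl_ref(const uint8_t *chunky, uint8_t *planes[4],
                             size_t count)
{
    size_t i;

    for (i = 0; i < count; i += 8) {
        uint8_t out[4] = { 0, 0, 0, 0 };
        int x, p;

        for (x = 0; x < 8; x++) {
            uint8_t pix = chunky[i + x];

            for (p = 0; p < 4; p++) {
                /* the leftmost pixel ends up in the most significant bit */
                if (pix & (1u << p))
                    out[p] |= (uint8_t)(0x80u >> x);
            }
        }
        for (p = 0; p < 4; p++)
            planes[p][i / 8] = out[p];
    }
}
```

For a 320-pixel-wide line, `count` would be 320 and each plane pointer would advance by 40 bytes per line.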
07 December 2021, 07:01 | #5 |
old chunk of coal
Join Date: Nov 2011
Location: Hungary
Posts: 1,300
|
Thanks, if you find it I'd love to check that out. I'll take your advice, and start with a CPU 2x1 C2P written with 68020 in mind, and see where that takes me.
edit: I tried this one and it can do about 11 fps. This is probably impossible without some parallel processing with the Blitter, but that is way out of my league.
Last edited by BSzili; 07 December 2021 at 17:07. |
08 December 2021, 10:06 | #6 |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,307
|
Your best bet is not to reinvent the wheel but to use whatever is already present. graphics/WriteChunkyPixels8() and friends already provide functions for that, and from 3.1.4 onwards, they are even optimized. (-;
|
08 December 2021, 12:10 | #7 |
old chunk of coal
Join Date: Nov 2011
Location: Hungary
Posts: 1,300
|
I have no doubt they are, but my goal is to get this running on an A500 with a 512KB slow RAM expansion. That also implies Kickstart 1.x, which is what most people have in their machine. If Kickstart 3+ is the minimum requirement, then I might as well stick with AGA machines.
|
08 December 2021, 12:17 | #8 |
Registered User
Join Date: Jun 2015
Location: Germany
Posts: 1,924
|
And actually this thread is not about reinventing the wheel but quite to the contrary about finding a fitting existing one...
|
08 December 2021, 12:35 | #9 | |
Registered User
Join Date: Mar 2018
Location: Hastings, New Zealand
Posts: 2,719
|
Quote:
I looked at the source code to Catacomb Abyss and saw that the scaling and pixel drawing routine is written in highly optimized self-modifying X86 machine code. Are you using this codebase, or something else? |
|
08 December 2021, 14:06 | #10 |
old chunk of coal
Join Date: Nov 2011
Location: Hungary
Posts: 1,300
|
They are Catacomb Abyss, Catacomb Armageddon and Catacomb Apocalypse, or the Catacomb Adventure Series as Gamer's Edge called them.
My previous attempt was based on refkeen, and to get it to work I wrote my own very poor 68k compiled scalers back in 2016. I have since completely rewritten it to make it suck less, but this version never got used. At its core it's just a big set of unrolled loops that copy bytes from one location to another with fixed offsets. |
08 December 2021, 14:22 | #11 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
|
|
08 December 2021, 16:37 | #12 | |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,039
|
Quote:
|
|
08 December 2021, 16:50 | #13 |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,204
|
Here's a fairly stupid blitter-only C2P for the following format: %aa00bb00cc00dd00 (where a is the most significant bit of the 4 bpl pixel).
It does 3 full passes over the data. The first one is to combine two adjacent pixels, and can be omitted if you can restructure your code to do it there (you probably want to do this for speed).
Without any other DMA active and blitter nasty set it takes (in rasterlines):
320x256: 836
320x120: 394
If you can elide the first pass that goes down to:
320x256: 558
320x120: 264
Maybe still not fast enough, and you probably don't want blitter nasty set for more than a full frame (which it'll end up being once other DMA is active). Should be pretty easy to transform into a blitter queue or otherwise break up though.
EDIT: Note it's still 2x1 so 320x120 means a 160x120 chunky screen, and yes it uses 16 bits for each of those pixels in the 3 pass mode.
Last edited by paraj; 08 December 2021 at 17:36. |
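The post does not spell out how the first pass combines two adjacent pixels. One plausible reading, shown here purely as an assumption, is that the second %aa00bb00cc00dd00 word is shifted right by two bits and ORed into the first, so the gaps in one word are filled by the other. The C sketch below (with the made-up name `combine_pairs`) shows what a renderer would have to produce itself if that pass were to be skipped.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed pass 1: merge pairs of %aa00bb00cc00dd00 words.
 * For pixels p0 and p1 the result is laid out as
 * %a0a0a1a1 b0b0b1b1 c0c0c1c1 d0d0d1d1, i.e. every nibble holds
 * four 2x1 screen pixels of one bitplane. */
static void combine_pairs(const uint16_t *src, uint16_t *dst, size_t pairs)
{
    size_t i;

    for (i = 0; i < pairs; i++) {
        uint16_t left  = src[2 * i];      /* even pixel, kept in place */
        uint16_t right = src[2 * i + 1];  /* odd pixel, moved into the gaps */

        dst[i] = (uint16_t)(left | (right >> 2));
    }
}
```

If the renderer wrote odd columns already shifted and ORed them into the word of the preceding even column, the output buffer would shrink to 8 bits per chunky pixel and only the two remaining passes would be needed.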
08 December 2021, 18:29 | #14 |
Moderator
Join Date: Dec 2010
Location: Wisconsin USA
Age: 60
Posts: 846
|
@Thread
I am considering a title change to this thread. Since the A500 does not own the monopoly on C2P code, some possible options for greater topic diversity are:
Low End Amiga
ECS Amiga
Amiga
or any better ideas?
I will wait a few days for feedback to be posted before making any changes... |
08 December 2021, 19:32 | #15 | ||
old chunk of coal
Join Date: Nov 2011
Location: Hungary
Posts: 1,300
|
Quote:
Quote:
|
||
08 December 2021, 20:19 | #16 | |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,204
|
Quote:
NP. I should probably have called it "packed" rather than "scrambled" (since it's already somewhat scrambled and uses more RAM...). The difference comes more down to whether you have to do special work for odd and even pixels. If the rendering is (mostly) done in vertical spans this shouldn't be too bad, but I agree it's a good idea to do a POC without it if the framerate is acceptable. It's worth noting that only the chunky output buffer needs to be in real chip RAM, so if you need to duplicate chunky textures they can still be kept in slow RAM.
Won't a '020 or better (especially with fast RAM) be the dividing line? I don't remember exactly, but wasn't the '030 around the point where you stopped wanting to use the blitter at all for C2Ps? Either way, later on you probably want to also provide a non-blitter version of the C2P for faster machines, but that's a longer term goal. |
|
08 December 2021, 21:53 | #17 |
old chunk of coal
Join Date: Nov 2011
Location: Hungary
Posts: 1,300
|
The walls and sprites are drawn as columns so if necessary they could be separated in the last drawing phase. For 020 I plan to make a separate version, because there I won't be able to use self-modifying code for the sprites. In the original version, for each run it jumps into the scaler at the start pixel, and patches in a return address after the end pixel. In my port I used a texel to screen pixel lookup table, which worked well enough, so with a better implementation that should work, but as you said this is a long term goal.
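To make the lookup-table alternative to the self-modifying scaler a bit more concrete, here is a minimal C sketch of the general technique. It is not the code from the port: `build_texel_map`, `draw_column` and their parameters are hypothetical, and it assumes a chunky buffer with one byte per pixel and textures stored as columns of at most 256 texels.

```c
#include <stdint.h>

/* For a column scaled to `height` screen pixels, record which texel
 * (0 .. tex_height-1) each screen pixel should sample. One such table
 * per possible height can be built up front. */
static void build_texel_map(uint8_t *map, int height, int tex_height)
{
    int y;

    for (y = 0; y < height; y++)
        map[y] = (uint8_t)((y * tex_height) / height);
}

/* Draw the visible run [y_start, y_end) of one scaled column.
 * `dst` points at the top of the column in the chunky buffer and
 * `stride` is the distance in bytes between two screen rows. */
static void draw_column(uint8_t *dst, int stride, const uint8_t *texcol,
                        const uint8_t *map, int y_start, int y_end)
{
    int y;

    for (y = y_start; y < y_end; y++)
        dst[y * stride] = texcol[map[y]];
}
```

Clipping a run to `y_start`/`y_end` plays the role that jumping into the unrolled scaler and patching in a return address plays in the original, without needing to modify code at run time.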
|