English Amiga Board


Old 06 December 2021, 10:43   #1
BSzili
Registered User

Join Date: Nov 2011
Location: Hungary
Posts: 647
ECS Amiga 4bpl C2P routine

I'm looking for a 1x1 or 2x1 4bpl C2P routine that would perform reasonably well on an A500. I'm aware of the 0000abcd -> aa00bb00cc00dd00 scrambling, but I was unable to find a routine that performs C2P using a buffer prepared in such a format. I've found code snippets, but due to my nonexistent assembly skills I'm trying to find a complete routine. Any help would be greatly appreciated.
I'm thinking about a new Catacomb 3D port, but this time for the A500 and the C2P seems to be the first major roadblock. For the later games the 3D area takes up 320x120 pixels of the screen, the 2D part of the screen uses planar 4bpl graphics and only small parts are refreshed each frame.
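For reference, the 0000abcd -> aa00bb00cc00dd00 scrambling mentioned above can be sketched in C as a 16-entry lookup. This is only a sketch: the names `scramble_pixel` and `build_scramble_table` are made up for illustration, and it assumes the doubled bits are there for 2x1 output and that `a` is the most significant bit of the pixel.

```c
#include <assert.h>
#include <stdint.h>

/* Spread a 4-bit pixel %0000abcd into the word %aa00bb00cc00dd00.
   Each bit is doubled and followed by a two-bit gap.
   Assumption: 'a' is the most significant bit of the pixel. */
static uint16_t scramble_pixel(uint8_t pixel)
{
    uint16_t out = 0;
    assert(pixel < 16);
    for (int bit = 0; bit < 4; bit++) {
        if (pixel & (1u << bit))
            out |= (uint16_t)(0xC000u >> (4 * (3 - bit)));
    }
    return out;
}

/* Precompute all 16 scrambled words so a renderer can write
   table[color] instead of the raw color value. */
static void build_scramble_table(uint16_t table[16])
{
    for (uint8_t i = 0; i < 16; i++)
        table[i] = scramble_pixel(i);
}
```

With a table like this the scrambling can happen once when textures are loaded or converted, at the cost of 16 bits per pixel instead of 8.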
Old 06 December 2021, 18:03   #2
paraj
Registered User

 
Join Date: Feb 2017
Location: Denmark
Posts: 167
What format do you mean *exactly*? The more "prescrambling" you do the faster things will be, but it can make other things more difficult (textures using too much RAM, rendering may be more difficult). KK has some info about what he's doing for Dread in the Dread thread, but the format might not fit what you're doing. Maybe you can even convince him to share the C2P, but again the format might not be appropriate for what you're going for.

What frame rate are you targeting and how long do you think the other stuff will take? Could be a fun exercise to try to come up with something, but my gut feeling is that 320x120 at 1x1 is going to be tough.
Old 06 December 2021, 18:58   #3
BSzili
Registered User

Join Date: Nov 2011
Location: Hungary
Posts: 647
I'm not sure about the format, to be honest. I just read that scrambling the nybbles of the byte into a word like that lets you skip one pass of the C2P. If I could use the original textures (1 byte per pixel, 16 color chunky) without scrambling that would be even better. I've been following the Dread thread, but what KK is doing is much more complex than what I'd need.

I'm looking for something around 10-15 fps, which should be enough for these games. Most enemies move slowly and use either melee or projectile attacks. 2x1 would be fine; the unreleased Apple IIgs port of Catacomb Abyss also doubled the columns and it didn't look that bad. I could switch to 1x1 when the game is started on an 020 machine, for example.
Old 06 December 2021, 20:15   #4
paraj
Registered User

 
Join Date: Feb 2017
Location: Denmark
Posts: 167
For a C2P it basically comes down to how many places you need to move the bits, and how to do that efficiently.

%0000abcd needs to turn into %a0000000, %b0000000, %c0000000, %d0000000 (appropriately shifted for each 8 pixels) for a 1x1. If you look at the good old https://amycoders.org/sources/c2ptut.html, the prescramble part is all about avoiding one (or more) of the parts mentioned in section 10.

I have some old code lying around that does 320x120x4 (1x1) with the blitter using 281 raster lines (with no CPU interleaving and no other DMA active), but it assumes this format: %a3b3c3d3a2b2c2d2a1b1c1d1a0b0c0d0, i.e. 4 different versions of each texture are necessary, one per horizontal position (or other trickery).

Might be worth it to play around with a quickish proof of concept that just assumes the basic texture format, and then work on trying to optimize it. You can just use a C2P written in C (or whatever) for this. Then you can ask the optimization gods in asm/hardware and you'll have a better idea about where it's possible to optimize.
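A plain C starting point for such a proof of concept could look like the sketch below. It assumes the basic texture format discussed above (one byte per pixel, values 0..15), four separate bitplanes, and a width that is a multiple of 8. The name `c2p_reference` and the plane layout are made up for illustration, and the routine is meant as a correctness baseline to compare optimized versions against, not as something fast enough for an A500.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PLANES 4

/* Convert width x height chunky pixels (one byte each, values 0..15)
   into PLANES separate bitplanes, 1x1.
   Plane p receives bit p of every pixel; the leftmost pixel of each
   group of 8 ends up in the most significant bit of the plane byte. */
static void c2p_reference(const uint8_t *chunky, uint8_t *planes[PLANES],
                          size_t width, size_t height)
{
    size_t bytes_per_row = width / 8;
    assert(width % 8 == 0);
    for (size_t y = 0; y < height; y++) {
        for (size_t xb = 0; xb < bytes_per_row; xb++) {
            const uint8_t *src = chunky + y * width + xb * 8;
            for (int p = 0; p < PLANES; p++) {
                uint8_t out = 0;
                for (int i = 0; i < 8; i++)
                    out = (uint8_t)((out << 1) | ((src[i] >> p) & 1));
                planes[p][y * bytes_per_row + xb] = out;
            }
        }
    }
}
```

Each output byte needs 8 shift-and-mask steps per plane here, which is exactly the bit shuffling that prescrambled formats and merge-based routines try to reduce.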
Old 07 December 2021, 08:01   #5
BSzili
Registered User

Join Date: Nov 2011
Location: Hungary
Posts: 647
Thanks, if you find it I'd love to check that out. I'll take your advice, and start with a CPU 2x1 C2P written with 68020 in mind, and see where that takes me.
edit: I tried this one and it can do about 11 fps. This is probably impossible without some parallel processing with the Blitter, but this is way out of my league

Last edited by BSzili; 07 December 2021 at 18:07.
Old 08 December 2021, 11:06   #6
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 1,590
Your best bet is not to reinvent the wheel but to use what is already there. WriteChunkyPixels() and friends in graphics.library already provide functions for that, and from 3.1.4 onwards they are even optimized. (-;
Old 08 December 2021, 13:10   #7
BSzili
Registered User

Join Date: Nov 2011
Location: Hungary
Posts: 647
I have no doubt they are, but my goal is to get this running on an A500 with a 512KB slow RAM expansion. That also implies Kickstart 1.x, which is what most people have in their machine. If Kickstart 3+ is the minimum requirement, then I might as well stick with AGA machines.
Old 08 December 2021, 13:17   #8
grond
Registered User

 
Join Date: Jun 2015
Location: Germany
Posts: 1,152
And actually this thread is not about reinventing the wheel but quite to the contrary about finding a fitting existing one...
Old 08 December 2021, 13:35   #9
Bruce Abbott
Registered User

Join Date: Mar 2018
Location: Hastings, New Zealand
Posts: 934
Quote:
Originally Posted by BSzili View Post
I'm thinking about a new Catacomb 3D port, but this time for the A500 and the C2P seems to be the first major roadblock. For the later games the 3D area takes up 320x120 pixels of the screen, the 2D part of the screen uses planar 4bpl graphics and only small parts are refreshed each frame.
Which games would those be?

I looked at the source code to Catacomb Abyss and saw that the scaling and pixel drawing routine is written in highly optimized self-modifying X86 machine code. Are you using this codebase, or something else?
Old 08 December 2021, 15:06   #10
BSzili
Registered User

Join Date: Nov 2011
Location: Hungary
Posts: 647
They are Catacomb Abyss, Catacomb Armageddon and Catacomb Apocalypse, or the Catacomb Adventure Series as Gamer's Edge called them.
My previous attempt was based on refkeen, and to get it to work I wrote my own very poor 68k compiled scalers back in 2016. I have completely rewritten it since to make it suck less, but this version never got used. At its core it's just a big set of unrolled loops that copy bytes from one location to another with fixed offsets.
Old 08 December 2021, 15:22   #11
meynaf
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 48
Posts: 4,445
Quote:
Originally Posted by BSzili View Post
If Kickstart 3+ is the minimum requirement, then I might as well stick with AGA machines.
Actually it's 3.1+ (not present in 3.0). So there are AGA machines without it, most notably unexpanded A1200.
Old 08 December 2021, 17:37   #12
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 53
Posts: 1,426
Quote:
Originally Posted by BSzili View Post
Thanks, if you find it I'd love to check that out. I'll take your advice, and start with a CPU 2x1 C2P written with 68020 in mind, and see where that takes me.
edit: I tried this one and it can do about 11 fps. This is probably impossible without some parallel processing with the Blitter, but this is way out of my league
This version is optimized for 68020+ (chip RAM writes are pipelined); I think a pure 68000 version could be a bit faster. Some code can be moved around to free temp registers for better use of the #$55555555 value. Perhaps other changes can be made too.
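For context on the #$55555555 value: the routine being discussed is not reproduced in the thread, but a mask like that typically shows up in the 1-bit "merge" step of CPU C2P routines, where two registers exchange alternating bits. A hedged C sketch of that generic step follows (the name `merge` is made up, and this is not taken from the routine above):

```c
#include <assert.h>
#include <stdint.h>

/* Generic C2P merge step: exchange the bits of *a selected by
   (mask << n) with the bits of *b selected by mask.
   With n = 1 and mask = 0x55555555 the odd bits of *a trade places
   with the even bits of *b. */
static void merge(uint32_t *a, uint32_t *b, unsigned n, uint32_t mask)
{
    uint32_t t;
    assert(n < 32);
    t = ((*a >> n) ^ *b) & mask;  /* bits that differ between the two */
    *b ^= t;                      /* flip them in b ...               */
    *a ^= t << n;                 /* ... and in a, which swaps them   */
}
```

On a 68000 the mask has to live in a data register to be ANDed cheaply, which is why freeing up a temp register for it matters.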
Old 08 December 2021, 17:50   #13
paraj
Registered User

 
Join Date: Feb 2017
Location: Denmark
Posts: 167
Here's a fairly stupid blitter-only C2P for the following format: %aa00bb00cc00dd00 (where a is the most significant bit of the 4 bpl pixel).

It does 3 full passes over the data. The first one is to combine two adjacent pixels, and can be omitted if you can restructure your code to do it there (you probably want to do this for speed).
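The attachment's code is not shown in the thread, so the following is only one plausible reading of that first pass, written as a CPU-side C sketch: the right-hand pixel is shifted into the two-bit gaps of the left-hand one, a shift-and-OR that the blitter can do on whole buffers. The name `combine_pair` is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Combine two adjacent pixels stored as %aa00bb00cc00dd00 into one
   word %aaAAbbBBccCCddDD (lower case = left pixel, upper case = right).
   Assumption: both inputs have their gap bits cleared. */
static uint16_t combine_pair(uint16_t left, uint16_t right)
{
    assert((left & 0x3333u) == 0);
    assert((right & 0x3333u) == 0);
    return (uint16_t)(left | (right >> 2));
}
```

If the renderer writes the right-hand pixel of each pair already shifted and ORs it into the same word, this pass disappears, which is the restructuring suggested above.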

Without any other DMA active and blitter nasty set it takes (in rasterlines):

320x256: 836
320x120: 394

If you can elide the first pass that goes down to:

320x256: 558
320x120: 264

Maybe still not fast enough, and you probably don't want blitter nasty set for more than a full frame (which it'll end up being once other DMA is active). Should be pretty easy to transform into a blitter queue or otherwise break up though.

EDIT: Note it's still 2x1 so 320x120 means a 160x120 chunky screen, and yes it uses 16 bits for each of those pixels in the 3 pass mode.
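To put the rasterline counts above into perspective, they can be converted into frames. The sketch below assumes a PAL display with roughly 313 rasterlines per frame at 50 Hz; the function names are made up for illustration.

```c
#include <assert.h>

/* How many display frames a given number of rasterlines spans. */
static double rasterlines_to_frames(unsigned lines, unsigned lines_per_frame)
{
    assert(lines_per_frame > 0);
    return (double)lines / (double)lines_per_frame;
}

/* Upper bound on the update rate if the conversion were the only
   work done per update (it never is, so real numbers are lower). */
static double max_update_rate(unsigned lines, unsigned lines_per_frame,
                              double refresh_hz)
{
    return refresh_hz / rasterlines_to_frames(lines, lines_per_frame);
}
```

With 313 lines per frame, 394 rasterlines is about 1.26 frames and 264 rasterlines is about 0.84 frames, so only the two-pass variant fits into a single PAL frame, and that is before any other DMA or rendering is taken into account.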
Attached Files
File Type: zip c2ptest.zip (22.3 KB, 29 views)

Last edited by paraj; 08 December 2021 at 18:36.
Old 08 December 2021, 19:29   #14
SpeedGeek
Moderator
Join Date: Dec 2010
Location: Wisconsin USA
Age: 58
Posts: 629
@Thread

I am considering a title change to this thread. Since the A500 does not own the monopoly on C2P code, some possible options for greater topic diversity are:

Low End Amiga
ECS Amiga
Amiga

or any better ideas?

I will wait a few days for feedback to be posted before making any changes...
Old 08 December 2021, 20:32   #15
BSzili
Registered User

Join Date: Nov 2011
Location: Hungary
Posts: 647
Quote:
Originally Posted by paraj View Post
Here's a fairly stupid blitter-only C2P for the following format: %aa00bb00cc00dd00 (where a is the most significant bit of the 4 bpl pixel).

It does 3 full passes over the data. The first one is to combine two adjacent pixels, and can be omitted if you can restructure your code to do it there (you probably want to do this for speed).

Without any other DMA active and blitter nasty set it takes (in rasterlines):

320x256: 836
320x120: 394

If you can elide the first pass that goes down to:

320x256: 558
320x120: 264

Maybe still not fast enough, and you probably don't want blitter nasty set for more than a full frame (which it'll end up being once other DMA is active). Should be pretty easy to transform into a blitter queue or otherwise break up though.

EDIT: Note it's still 2x1 so 320x120 means a 160x120 chunky screen, and yes it uses 16 bits for each of those pixels in the 3 pass mode.
Thanks, this will definitely be a good start for me. For the "proof of concept" I'll probably keep the textures in an unscrambled format to measure how much memory each level uses. Probably not a lot as they usually have some theme.

Quote:
Originally Posted by SpeedGeek View Post
@Thread

I am considering a title change to this thread. Since the A500 does not own the monopoly on C2P code, some possible options for greater topic diversity are:

Low End Amiga
ECS Amiga
Amiga

or any better ideas?

I will wait a few days for feedback to be posted before making any changes...
Fair enough, OCS Amiga or ECS Amiga is fine by me
Old 08 December 2021, 21:19   #16
paraj
Registered User

 
Join Date: Feb 2017
Location: Denmark
Posts: 167
Quote:
Originally Posted by BSzili View Post
Thanks, this will definitely be a good start for me. For the "proof of concept" I'll probably keep the textures in an unscrambled format to measure how much memory each level uses. Probably not a lot as they usually have some theme.

NP. I should probably have called it "packed" rather than "scrambled" (since it's already somewhat scrambled and uses more ram...). The difference comes more down to whether you have to do special work for odd and even pixels. If the rendering is (mostly) done in vertical spans this shouldn't be too bad, but I agree it's a good idea to do a POC without it if the framerate is acceptable.


It's worth noting that only the chunky output buffer needs to be in real chip ram, so if you need to duplicate chunky textures they can still be kept in slowram.


Quote:
Originally Posted by BSzili View Post
Fair enough, OCS Amiga or ECS Amiga is fine by me

Won't a '020 or better (especially with fast RAM) be the dividing line? I don't remember exactly, but wasn't the '030 around the point where you stopped wanting to use the blitter at all for C2Ps?


Either way, later on you probably want to also provide a non-blitter version of the C2P for faster machines, but that's a longer term goal.
Old 08 December 2021, 22:53   #17
BSzili
Registered User

Join Date: Nov 2011
Location: Hungary
Posts: 647
The walls and sprites are drawn as columns, so if necessary they could be separated in the last drawing phase. For the 020 I plan to make a separate version, because there I won't be able to use self-modifying code for the sprites. In the original version, for each run it jumps into the scaler at the start pixel and patches in a return address after the end pixel. In my port I used a texel to screen pixel lookup table, which worked well enough, so with a better implementation that should work, but as you said this is a long term goal.
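A lookup-table column scaler of the kind mentioned above could be sketched in C roughly as follows. This is not the code from the port: the names, the 64-texel column height and the direction of the table (screen row to texel row) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TEX_HEIGHT 64  /* assumed texture column height */

/* Fill lut[0 .. screen_height-1] with the texel row to sample for
   each screen row of a column scaled to screen_height pixels. */
static void build_scale_lut(uint8_t *lut, unsigned screen_height)
{
    assert(screen_height > 0);
    for (unsigned y = 0; y < screen_height; y++)
        lut[y] = (uint8_t)((y * TEX_HEIGHT) / screen_height);
}

/* Draw one scaled column into a chunky buffer. dst points at the top
   pixel of the column and dst_stride is the buffer width in bytes. */
static void draw_column(uint8_t *dst, size_t dst_stride,
                        const uint8_t *texture_column,
                        const uint8_t *lut, unsigned screen_height)
{
    for (unsigned y = 0; y < screen_height; y++)
        dst[y * dst_stride] = texture_column[lut[y]];
}
```

The tables only depend on the scaled height, so they can be built once at startup for every height that can occur and reused for walls and sprites alike, without any self-modifying code.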