Need help with use of Kalms-c2p routines for HAM8..

mateusz_s · 15 September 2022, 19:05

Hello,
I wanted to test kalms-c2p routines for converting a 32bit buffer into HAM8.
https://github.com/Kalmalyzer/kalms-c2p

So far I have had success using the "normal" and "bitmap" c2p routines to convert a u_int8 buffer to AGA.

Unfortunately, I haven't managed to get the HAM8 c2p working. Can you help me with that? I would like to use: c2p_4rgb888_4rgb666h8_040.s

Here is a summary of what I have tried so far:

01. First I declared the functions:

Code:

extern void c2p_4rgb888_4rgb666h8_040_init(int chunkyx __asm("d0"), int chunkyy __asm("d1"), int scroffsx __asm("d2"),  int scroffsy __asm("d3"),int rowlen  __asm("d4"), int bplsize __asm("d5"), int chunkylen __asm("d6"));
extern void c2p_4rgb888_4rgb666h8_040(void* c2pscreen __asm("a0"), void* bitplanes __asm("a1"));

02. I have a 32-bit buffer in FAST MEM that will contain ARGB, and an additional buffer in CHIP MEM.

Code:

u_int32* fast_buffer32 = (u_int32*)malloc(320*256*32);

ULONG* chip_buffer32 = (ULONG*)AllocMem(bplsize * depth, MEMF_CHIP); // <-- bplsize * depth = (320*256/8) * 8

03. Before entering the main loop I do the initialization.
I couldn't use screen.BitMap directly (I don't know why), so I used
ScreenBuffer instead.

Code:

const int chunkyx = 320;
const int chunkyy = 256;
const int scroffsx = 0;
const int scroffsy = 0;
const int rowlen = 320 * 8; // <----- not sure if this is ok
const int bplsize = 320 * 256 / 8;
const int chunkylen = 320 * 32;
const int depth = 8;

// Not sure if this is correct - it was working for normal c2p.
FRM_mbuf_screen_buffer[1]->sb_BitMap->Planes[0] = (PLANEPTR)chip_buffer32 + bplsize * 0;
FRM_mbuf_screen_buffer[1]->sb_BitMap->Planes[1] = (PLANEPTR)chip_buffer32 + bplsize * 1;
FRM_mbuf_screen_buffer[1]->sb_BitMap->Planes[2] = (PLANEPTR)chip_buffer32 + bplsize * 2;
FRM_mbuf_screen_buffer[1]->sb_BitMap->Planes[3] = (PLANEPTR)chip_buffer32 + bplsize * 3;
FRM_mbuf_screen_buffer[1]->sb_BitMap->Planes[4] = (PLANEPTR)chip_buffer32 + bplsize * 4;
FRM_mbuf_screen_buffer[1]->sb_BitMap->Planes[5] = (PLANEPTR)chip_buffer32 + bplsize * 5;
FRM_mbuf_screen_buffer[1]->sb_BitMap->Planes[6] = (PLANEPTR)chip_buffer32 + bplsize * 6;
FRM_mbuf_screen_buffer[1]->sb_BitMap->Planes[7] = (PLANEPTR)chip_buffer32 + bplsize * 7;

 ChangeScreenBuffer(FRM_screen, FRM_mbuf_screen_buffer[1]);

// init c2p ham8
c2p_4rgb888_4rgb666h8_040_init(chunkyx, chunkyy, scroffsx, scroffsy, rowlen, bplsize, chunkylen);

04. In the main loop, after my 32-bit buffer is filled with ARGB values, I run the conversion:

Code:

c2p_4rgb888_4rgb666h8_040(fast_buffer32 , chip_buffer32 );

05. Also, I am not sure which screen I should open for HAM8. I am using the requester to get the Display_ID of PAL: 320x256
and then I open the screen using OpenScreenTags().

Code:

FRM_screen = OpenScreenTags(NULL,
                            SA_DisplayID, FRM_requested_display_id,
                            SA_Depth, 8,
                            SA_Type, CUSTOMSCREEN,
                            SA_Quiet, TRUE,
                            SA_Behind, TRUE,
                            SA_ShowTitle, FALSE,
                            SA_Draggable, FALSE,
                            SA_Exclusive, TRUE,
                            SA_AutoScroll, FALSE,
                            // SA_Interleaved, TRUE,
                            TAG_END);

Thanks in advance for any help.

I noticed that some people here have been using these HAM8 c2p routines.

smack · 16 September 2022, 22:45

Your parameter values for the function call seem to be wrong.

; d4.l rowlen [bytes] -- offset between one row and the next in a bpl
; d5.l bplsize [bytes] -- offset between one row in one bpl and the next bpl
; d6.l chunkylen [bytes] -- offset between one row and the next in chunkybuf

I think the values should be:

const int rowlen = 320 / 8; // 8 pixels per byte in a bitplane
const int bplsize = 320 * 256 / 8;
const int chunkylen = 320 * 4; // 4 bytes per pixel in chunky ARGB

Your malloc of the fast mem buffer is off by a factor of 8, as well. (be careful with the bits vs. bytes)
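
To make the bits-versus-bytes point concrete, here is a minimal C sketch of the size arithmetic. The helper names (chunky_buffer_bytes, bitplane_row_bytes, bitplane_bytes) are made up for illustration and are not part of kalms-c2p. It assumes 32-bit ARGB pixels and a width that is a multiple of 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for a chunky ARGB buffer: 4 bytes (32 bits) per pixel. */
size_t chunky_buffer_bytes(size_t width, size_t height)
{
    return width * height * sizeof(uint32_t);
}

/* Bytes in one bitplane row: each byte holds 8 one-bit pixels. */
size_t bitplane_row_bytes(size_t width)
{
    assert(width % 8 == 0);  /* assumption: whole bytes per row */
    return width / 8;
}

/* Bytes in one complete bitplane. */
size_t bitplane_bytes(size_t width, size_t height)
{
    return bitplane_row_bytes(width) * height;
}
```

For 320x256 this gives 327680 bytes for the chunky buffer, 40 bytes per bitplane row and 10240 bytes per bitplane. The original 320*256*32 is exactly eight times the chunky size that is actually needed.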

mateusz_s · 17 September 2022, 18:13

Quote:

Originally Posted by smack

Your parameter values for the function call seem to be wrong.

; d4.l rowlen [bytes] -- offset between one row and the next in a bpl
; d5.l bplsize [bytes] -- offset between one row in one bpl and the next bpl
; d6.l chunkylen [bytes] -- offset between one row and the next in chunkybuf

I think the values should be:

const int rowlen = 320 / 8; // 8 pixels per byte in a bitplane
const int bplsize = 320 * 256 / 8;
const int chunkylen = 320 * 4; // 4 bytes per pixel in chunky ARGB

Your malloc of the fast mem buffer is off by a factor of 8, as well. (be careful with the bits vs. bytes)

Hi, thanks for the help. Yes, my bad.

I made the following corrections:

Code:

extern void c2p_4rgb888_4rgb666h8_040_init(int chunkyx __asm("d0"), int chunkyy __asm("d1"), int scroffsx __asm("d2"),  int scroffsy __asm("d3"),int rowlen  __asm("d4"), int bplsize __asm("d5"), int chunkylen __asm("d6"));
extern void c2p_4rgb888_4rgb666h8_040(void* c2pscreen __asm("a0"), void* bitplanes __asm("a1"));

const int chunkyx = 320;
const int chunkyy = 256;
const int scroffsx = 0;
const int scroffsy = 0;
const int rowlen = 320 / 8;
const int bplsize = 320 * 256 / 8;
const int chunkylen = 320 * 4;
const int depth = 8;

IO_prefs.output_buffer_32 = (u_int32*)malloc(FRM_requested_width * FRM_requested_height * 4);
UBYTE* chip_buffer = (UBYTE*)AllocMem(bplsize * depth, MEMF_CHIP);

FRM_mbuf_screen_buffer[1]->sb_BitMap->Planes[0] = (PLANEPTR)chip_buffer + bplsize * 0;
FRM_mbuf_screen_buffer[1]->sb_BitMap->Planes[1] = (PLANEPTR)chip_buffer + bplsize * 1;
FRM_mbuf_screen_buffer[1]->sb_BitMap->Planes[2] = (PLANEPTR)chip_buffer + bplsize * 2;
FRM_mbuf_screen_buffer[1]->sb_BitMap->Planes[3] = (PLANEPTR)chip_buffer + bplsize * 3;
FRM_mbuf_screen_buffer[1]->sb_BitMap->Planes[4] = (PLANEPTR)chip_buffer + bplsize * 4;
FRM_mbuf_screen_buffer[1]->sb_BitMap->Planes[5] = (PLANEPTR)chip_buffer + bplsize * 5;
FRM_mbuf_screen_buffer[1]->sb_BitMap->Planes[6] = (PLANEPTR)chip_buffer + bplsize * 6;
FRM_mbuf_screen_buffer[1]->sb_BitMap->Planes[7] = (PLANEPTR)chip_buffer + bplsize * 7;

ChangeScreenBuffer(FRM_screen, FRM_mbuf_screen_buffer[1]);

c2p_4rgb888_4rgb666h8_040_init(chunkyx, chunkyy, scroffsx, scroffsy, rowlen, bplsize, chunkylen);

// in loop
c2p_4rgb888_4rgb666h8_040(IO_prefs.output_buffer_32, chip_buffer);

Now the result is "much better": the image is no longer shattered, but it seems stretched. And what about the colors?

It looks like this:
(on the left, the image as generated in the original 32 bits; on the right, after the HAM8 c2p conversion)

in SuperHires

PS. I think HAM8 has to be turned on somehow, but I don't know how to do this using the system libraries in C.
http://amigadev.elowar.com/read/ADCD.../node0022.html

smack · 17 September 2022, 21:26

Two things seem to be missing.

First, the HAM control bitplanes must be filled with the special "RGBB" pattern.

see comment in the c2p routine:
; Bitplane data for control bitplane 0: $77777777
; Bitplane data for control bitplane 1: $cccccccc

This means that Planes[6] must be filled with 0x77 bytes and Planes[7] with 0xcc bytes.

Second, open a HAM8 screen using OpenScreenTags:
SA_DisplayID, FRM_requested_display_id | HAM_KEY
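
The control-plane fill described above could be sketched roughly like this in plain C. The function names are hypothetical, and the buffer is assumed to be one contiguous block of 8 planes of bplsize bytes each, as in the code posted earlier. The second helper decodes the 2-bit control code per pixel, assuming the usual HAM encoding (0 = set from palette, 1 = modify blue, 2 = modify red, 3 = modify green), which is where the "RGBB" name comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HAM8_CTRL0_FILL 0x77  /* control bitplane 0, from the c2p comment */
#define HAM8_CTRL1_FILL 0xcc  /* control bitplane 1, from the c2p comment */

/* Fill the two control planes of a contiguous 8-plane buffer.
   Planes 6 and 7 are used, following the advice above. */
void fill_ham8_control_planes(unsigned char *planes, size_t bplsize)
{
    assert(planes != NULL);
    memset(planes + 6 * bplsize, HAM8_CTRL0_FILL, bplsize);
    memset(planes + 7 * bplsize, HAM8_CTRL1_FILL, bplsize);
}

/* Return the 2-bit control code of pixel x (0 = leftmost) within one
   byte of each control plane. Bit 7 of a byte is the leftmost pixel. */
unsigned ham_control_code(unsigned char ctrl0, unsigned char ctrl1, unsigned x)
{
    unsigned shift, lo, hi;
    assert(x < 8);
    shift = 7u - x;
    lo = (ctrl0 >> shift) & 1u;  /* control bitplane 0 is the low bit */
    hi = (ctrl1 >> shift) & 1u;  /* control bitplane 1 is the high bit */
    return (hi << 1) | lo;
}
```

With 0x77 and 0xcc, the first four pixels decode to red, green, blue, blue, and the pattern repeats every four pixels. On the Amiga the fill only has to be done once per screen buffer, since the c2p routine writes just the six data planes.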

smack · 17 September 2022, 21:38

Oh I noticed that the sizes of the bitplanes are wrong!

This kind of HAM8 conversion uses 4 super-hires screen pixels for each source RGB pixel.

It means that you have to use:

const int rowlen = 320 * 4 / 8;
const int bplsize = rowlen * 256;

And of course you have to open a Super-HiRes screen with 1280x256 pixels.
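
As a rough sketch, the init values can be derived from the chunky size alone, given the 4-screen-pixels-per-source-pixel rule above. The struct and function names here are invented for illustration; only the field meanings follow the parameter comments quoted from the c2p source.

```c
#include <assert.h>

/* Values for the init call, derived from the chunky buffer size. */
struct ham8_c2p_params {
    int chunkyx;       /* source width in RGB pixels */
    int chunkyy;       /* source height in rows */
    int screen_width;  /* required screen width in pixels */
    int rowlen;        /* bytes per bitplane row */
    int bplsize;       /* bytes per bitplane */
    int chunkylen;     /* bytes per chunky row */
};

struct ham8_c2p_params ham8_c2p_params_for(int chunkyx, int chunkyy)
{
    struct ham8_c2p_params p;
    assert(chunkyx > 0 && chunkyy > 0 && chunkyx % 2 == 0);
    p.chunkyx = chunkyx;
    p.chunkyy = chunkyy;
    p.screen_width = chunkyx * 4;      /* 4 screen pixels per RGB pixel */
    p.rowlen = p.screen_width / 8;     /* 8 screen pixels per byte */
    p.bplsize = p.rowlen * chunkyy;
    p.chunkylen = chunkyx * 4;         /* 4 bytes per ARGB pixel */
    return p;
}
```

For a 320x256 chunky buffer this yields a 1280-pixel-wide screen, rowlen = 160, bplsize = 40960 and chunkylen = 1280. Note that chunkyx and chunkyy stay at the chunky size (320 and 256), not the screen size.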

mateusz_s · 17 September 2022, 21:59

Quote:

Originally Posted by smack

Two things seem to be missing.

First, the HAM control bitplanes must be filled with the special "RGBB" pattern.

see comment in the c2p routine:
; Bitplane data for control bitplane 0: $77777777
; Bitplane data for control bitplane 1: $cccccccc

This means that Planes[6] must be filled with 0x77 bytes and Planes[7] with 0xcc bytes.

Second, open a HAM8 screen using OpenScreenTags:
SA_DisplayID, FRM_requested_display_id | HAM_KEY

Thanks again, it's working and the colors are OK.
(The image is still stretched.)

smack · 17 September 2022, 22:15

Quote:

Originally Posted by mateusz_s

(image is still stretched)

Yes, look at my previous comment - you have to use 1280 pixels wide bitplanes and screen.

mateusz_s · 17 September 2022, 22:29

Quote:

Originally Posted by smack

Yes, look at my previous comment - you have to use 1280 pixels wide bitplanes and screen.

Yes, thanks again.

I had messed up the c2p init parameters (they were set to the screen size, not the chunky size).

Here is the final result: over 10 000 colors.

mateusz_s · 17 September 2022, 23:35

The final result, V1200 accelerated.

AGA, HAM8 mode, roughly 10 000+ colors.

[YouTube video]

Karlos · 18 September 2022, 02:14

Neat!

mateusz_s · 10 March 2023, 17:48

Hi,
I have another question.

Everything works fine when the source 32-bit buffer is 320 px wide and I open a WB screen at 1280x256, so the screen is 4 times wider than the buffer.

But I wanted to open a 640x256 WB screen instead, for better performance, so I made my source 32-bit buffer 160x256.
The speed is great and the image quality is nice, but unfortunately the image is a bit stretched.

Is it possible to display it correctly?
Or is this just how the Kalms function works?

alexh · 10 March 2023, 18:20

Nice.

I saw a HAM6 C2P video mode used in AmiQuake the other day. How does this work and not visibly display HAM fringing?

mateusz_s · 12 March 2023, 15:47

[YouTube video]

This is Quake running on an A500 in native HAM6 mode.

I got a similar speed (~30 fps) on a 640x256 WB HAM8 screen on an A1200. The source 32-bit buffer was 160x256.

But the screen was a bit stretched.
In this A500 Quake example the screen looks right.

mateusz_s · 12 March 2023, 18:42

https://streamable.com/e9oz5x

New test.
The source 32-bit buffer is 160x256.

It is converted on the fly into Amiga 1200 HAM8 mode on a 640x256 screen. Double buffered. Keeps 25 fps.

Everything is cool, except that the result is oddly stretched.

smack · 12 March 2023, 21:06

Quote:

Originally Posted by mateusz_s

The source 32bit buffer is 160x256.

Have you checked that your raycasting engine can properly render such non-square pixels? I mean, maybe the problem is that the image in the source buffer is already horizontally stretched?

mateusz_s · 12 March 2023, 22:56

Quote:

Originally Posted by smack

Have you checked that your raycasting engine can properly render such non-square pixels? I mean, maybe the problem is that the image in the source buffer is already horizontally stretched?

The image in the original buffer is OK, I mean nothing is stretched. But yeah, I guess I would need to render it
narrower to get a correct display, or maybe skip every second pixel in x to make it narrow. It would probably have the correct aspect ratio after conversion to HAM.

I guess I'll leave it for now.
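
For what it's worth, the "skip every second pixel in x" idea could look roughly like this. It is only a sketch under the assumption that the renderer produces a frame with square pixels at twice the target width (for example 320 wide for a 160-wide c2p source); halve_width is a made-up name, not part of kalms-c2p.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy every second pixel of each source row into a half-width destination.
   dst must have room for (src_width / 2) * height pixels. */
void halve_width(const uint32_t *src, size_t src_width, size_t height,
                 uint32_t *dst)
{
    size_t dst_width, x, y;
    assert(src != NULL && dst != NULL && src_width % 2 == 0);
    dst_width = src_width / 2;
    for (y = 0; y < height; y++) {
        const uint32_t *s = src + y * src_width;
        uint32_t *d = dst + y * dst_width;
        for (x = 0; x < dst_width; x++) {
            d[x] = s[2 * x];  /* keep even columns, drop odd ones */
        }
    }
}
```

Rendering directly at the narrower width with a 2:1 pixel aspect would avoid both the extra copy and the cost of rendering pixels that are thrown away, so this is more of a quick test than a final solution.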

mateusz_s · 20 April 2024, 19:15

Hi,
I have two more questions about using the Kalms c2p routines.

1. The Kalms c2p routines take a couple of parameters for init and for conversion, so they can be used in all cases.
My question is: if I know for certain that I will use them for a 320x256x8 screen and know all the values, does it make sense to hard-code those constants into the asm code so the routine runs a bit faster?
For example, take this one: https://github.com/Kalmalyzer/kalms-...1x1_8_c5_040.s

2. Do you know what these suffixes mean: "_c5", "_c3b1", "_c4b1"? (CPU, or CPU + blitter?)
For example here: c2p1x1_8_c5_040.s, c2p1x1_8_c3b1_030.h, c2p2x1_6_c4b1_gen.h
https://github.com/Kalmalyzer/kalms-...ee/main/normal

a/b · 20 April 2024, 19:36

1. Probably. Many instructions take only 1 cycle (best case, i-cached) on 040; even add.l #data32,d0 is as fast as add.l d1,d0. And the 040 is allergic to pc-relative addressing modes; oftentimes they take an extra cycle. Try and see.
2. cX = how many cpu passes, bX = how many blitter passes
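
To illustrate the naming scheme in point 2, here is a small sketch that pulls the pass counts out of a file name. The function names are hypothetical, and it assumes the pass token always has the form "cX" or "cXbY" between underscores, as in the three file names mentioned above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Parse one "cX" or "cXbY" token of the given length; returns 1 on success. */
static int parse_pass_token(const char *tok, size_t len, int *cpu, int *blit)
{
    size_t i = 1;
    int c = 0, b = 0;
    if (len < 2 || tok[0] != 'c' || !isdigit((unsigned char)tok[1]))
        return 0;
    while (i < len && isdigit((unsigned char)tok[i]))
        c = c * 10 + (tok[i++] - '0');
    if (i < len) {
        /* Anything after the CPU count must be "b" plus digits. */
        if (tok[i] != 'b')
            return 0;
        i++;
        if (i >= len || !isdigit((unsigned char)tok[i]))
            return 0;
        while (i < len && isdigit((unsigned char)tok[i]))
            b = b * 10 + (tok[i++] - '0');
        if (i != len)
            return 0;
    }
    *cpu = c;
    *blit = b;
    return 1;
}

/* Scan a name like "c2p1x1_8_c3b1_030.h" for its pass token.
   Returns 1 and fills *cpu / *blit if one is found, else 0. */
int c2p_pass_counts(const char *name, int *cpu, int *blit)
{
    const char *p = name;
    assert(name != NULL && cpu != NULL && blit != NULL);
    while (*p != '\0') {
        const char *end = p;
        while (*end != '\0' && *end != '_' && *end != '.')
            end++;
        if (parse_pass_token(p, (size_t)(end - p), cpu, blit))
            return 1;
        p = (*end != '\0') ? end + 1 : end;
    }
    return 0;
}
```

So c2p1x1_8_c5_040.s reads as 5 CPU passes and no blitter pass, and c2p1x1_8_c3b1_030.h as 3 CPU passes plus 1 blitter pass. The leading "c2p1x1" part is not mistaken for a pass token because "c2" is followed by "p" rather than "b" or the end of the token.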

mateusz_s · 20 April 2024, 20:15

Quote:

Originally Posted by a/b

1. Probably. Many instructions take only 1 cycle (best case, i-cached) on 040; even add.l #data32,d0 is as fast as add.l d1,d0. And the 040 is allergic to pc-relative addressing modes; oftentimes they take an extra cycle. Try and see.
2. cX = how many cpu passes, bX = how many blitter passes

Thank you for the answer

mateusz_s · 20 April 2024, 20:19

PS.
I also wanted to update this thread and show my final results of converting a real 32-bit RGBA display into a HAM8 AGA display.
In this final case there is no dynamic shading like in the previous one (the shadows are lightmapped), and the HAM8 result after conversion is much better:

[YouTube video]
