[ASM, C, general] Is it possible to speed up RGBA operations stored in int32?

mateusz_s · 10 April 2021, 17:01

Hi,
My color buffer is of type u_int32 (RGBA); A is of course not used.
Is it possible to somehow speed up operations on colors without peeling out the separate R, G, B channels and shifting them back?

The most important thing for me is to calculate the average of two such colors.

Until now, this is what I did when I wanted to change the intensity of an entire pixel:

Code:

output_buffer_32[index] = intensity_premultipled[(texture_pixel >> 16) & 0xff] << RC_ch1 |
                          intensity_premultipled[(texture_pixel >>  8) & 0xff] << RC_ch2 |
                          intensity_premultipled[ texture_pixel        & 0xff] << RC_ch3;

So first I picked out each of R, G, B separately and then put them back together with shifts (the shift amounts are the variables RC_ch1, RC_ch2, RC_ch3, set beforehand so that the colors always match the pixel format).

Now, instead of one LUT, I made three LUTs whose outputs already have the color shifted to the right place, so I got rid of the shifts and the efficiency increased significantly:

Code:

output_buffer_32[index] = intensity_premultipled_CH1[(texture_pixel >> 16) & 0xff] |
                          intensity_premultipled_CH2[(texture_pixel >>  8) & 0xff] |
                          intensity_premultipled_CH3[ texture_pixel        & 0xff];
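A minimal sketch of how the three pre-shifted tables might be built. The table names follow the post; the function names, the 0..256 fixed-point intensity and the shift parameters are assumptions made for illustration, not the poster's actual code.

```c
#include <stdint.h>

/* Table names are taken from the post; everything else here is hypothetical. */
static uint32_t intensity_premultipled_CH1[256];
static uint32_t intensity_premultipled_CH2[256];
static uint32_t intensity_premultipled_CH3[256];

/* intensity is an assumed fixed-point factor: 0 = black, 256 = unchanged.
   shift1..shift3 play the role of RC_ch1..RC_ch3 from the post. */
static void build_intensity_luts(unsigned intensity,
                                 unsigned shift1, unsigned shift2, unsigned shift3)
{
    for (unsigned v = 0; v < 256; ++v) {
        uint32_t scaled = (v * intensity) >> 8;      /* stays within 0..255 */
        intensity_premultipled_CH1[v] = scaled << shift1;
        intensity_premultipled_CH2[v] = scaled << shift2;
        intensity_premultipled_CH3[v] = scaled << shift3;
    }
}

/* Per-pixel lookup: three table reads and two ORs, no output shifts. */
static inline uint32_t shade_pixel(uint32_t texture_pixel)
{
    return intensity_premultipled_CH1[(texture_pixel >> 16) & 0xff] |
           intensity_premultipled_CH2[(texture_pixel >>  8) & 0xff] |
           intensity_premultipled_CH3[ texture_pixel        & 0xff];
}
```

The cost moves into the table setup: the tables have to be rebuilt whenever the intensity or the pixel format changes.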

But I would like to achieve the following goal. After the first pass, only every second pixel of my color buffer would be filled (these are the pixels obtained from the calculation). Then I would like to make a second pass that interpolates between the existing colors. Maybe it would make sense if the interpolation could somehow be done quickly. Maybe there are some assembly language tricks? Schematically, this is what is going on (see the attachment):

PS: changing the buffer from u_int32 to a union with separate char r, g, b, a doesn't change much...
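The second pass described in this post could be sketched as follows. It assumes that even indices of a row already hold computed pixels and that odd ones are to be filled. The function names are hypothetical, and the averaging helper is one well-known packed-byte identity (a + b = 2*(a & b) + (a ^ b)), not something taken from the post.

```c
#include <stddef.h>
#include <stdint.h>

/* Floor average of each byte of two packed pixels.
   Masking the XOR term with 0xfefefefe clears each channel's low bit
   before the shift, so nothing leaks into the neighbouring channel. */
static inline uint32_t average_packed(uint32_t a, uint32_t b)
{
    return (a & b) + (((a ^ b) & 0xfefefefeu) >> 1);
}

/* Assumption: row[0], row[2], row[4], ... were filled by the first pass. */
static void fill_odd_pixels(uint32_t *row, size_t width)
{
    for (size_t x = 1; x < width; x += 2) {
        if (x + 1 < width)
            row[x] = average_packed(row[x - 1], row[x + 1]);
        else
            row[x] = row[x - 1];   /* no right neighbour: repeat the left one */
    }
}
```

Because the helper works on all four bytes at once, it does not care which byte is the unused A channel.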

mateusz_s · 10 April 2021, 17:44

Found something here, not tested yet, maybe it will help:
https://stackoverflow.com/questions/...nto-an-integer

VladR · 13 April 2021, 01:25

There is no need to do any ASM/bitshift tricks. This is what I do:

You simply interpolate the two RGBA values: Color1 + (Color2-Color1)/2

Color1: (2,4,8) = 132,104
Color2: (4,8,16)= 264,208

If you manually extracted all RGB triplets, the interpolated color would be (3,6,12) = 198,156

Now use the formula above: 132,104 + (264,208 - 132,104)/2 = 132,104 + 66,052 = 198,156

Once you load both RGBA into registers (d1:Color1, d2:Color2), you only need these 3 ops: subtract, bitshift, add

Code:

d1:Color1
d2:Color2


sub.l d1,d2
lsr.l #1,d2
add.l d2,d1

Very fast
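For reference, the same three operations written in C, as a sketch that only reproduces the numbers of this example. The function names are hypothetical, and the 0x00RRGGBB packing is an assumption inferred from (2,4,8) = 132,104. In this example every channel difference is even and non-negative; the sketch makes no claim about other inputs.

```c
#include <stdint.h>

/* Pack an (r, g, b) triplet as 0x00RRGGBB, the layout the example's numbers imply. */
static inline uint32_t pack_rgb(uint32_t r, uint32_t g, uint32_t b)
{
    return (r << 16) | (g << 8) | b;
}

/* Subtract, shift, add on the whole packed value, mirroring the asm. */
static inline uint32_t packed_midpoint(uint32_t color1, uint32_t color2)
{
    uint32_t diff = color2 - color1;   /* sub.l d1,d2 */
    diff >>= 1;                        /* lsr.l #1,d2 */
    return color1 + diff;              /* add.l d2,d1 */
}
```

With Color1 = pack_rgb(2, 4, 8) and Color2 = pack_rgb(4, 8, 16), packed_midpoint returns 198,156, which is pack_rgb(3, 6, 12).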

a/b · 13 April 2021, 03:47

($040810-$020406)/2+$020408 = $03060c
($040810-$020506)/2+$020508 = $03068c

You have to make the color components even if you are going to use that trick.

VladR · 13 April 2021, 04:20

Quote:

Originally Posted by a/b

($040810-$020406)/2+$020408 = $03060c
($040810-$020506)/2+$020508 = $03068c

You have to make the color components even if you are going to use that trick.

You are not going to see that difference in a generic picture in motion.

Besides, even if you extract the separate components, a bitshift of an odd number will never produce completely correct result, as you can't have 0.5 without fixed-point/floating-point math.

And if I recall correctly, a floating-point framebuffer didn't run well even on a GeForce 5/6 - it took some serious horsepower to do that, let alone Amiga...

Jobbo · 13 April 2021, 04:31

I think you are missing the point: if the Red or Green difference is odd, then dividing by two pushes a one into the upper bit of the neighboring color component, which you definitely will notice!

Here is a/b's example, with the math fixed and also shown as color components:

($040810-$020406)/2+$020408 = $03060d (Red=3,Green=6,Blue=13)
($040810-$020506)/2+$020508 = $03068d (Red=3,Green=6,Blue=141) << Bad!

I would just do this instead:

result = ((color1 & 0xfefefe) + (color2 & 0xfefefe)) >> 1;

Not sure it can get much simpler.

Edit: of course the question was posed for RGBA with A being unused so really the mask should be 0xfefefe00.
Edit: also this is just averaging, the question is about generic interpolation which is different and harder.
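A sketch of the masked average in C, with hypothetical function names. The first function is the expression from this post for a 0x00RRGGBB pixel. The second is an assumption about how the 0xfefefe00 variant could be written for a 0xRRGGBBAA pixel: with that mask the sum of two pixels can need 33 bits, so in 32-bit C arithmetic the carry out of the top channel would be lost, and shifting before adding avoids that. On 68k the carry could presumably be recovered with a rotate through the extend bit instead.

```c
#include <stdint.h>

/* 0x00RRGGBB: clear each channel's low bit, add, halve.
   Bits 24..31 are free, so the carry out of Red survives until the shift. */
static inline uint32_t average_rgb24(uint32_t color1, uint32_t color2)
{
    return ((color1 & 0xfefefeu) + (color2 & 0xfefefeu)) >> 1;
}

/* 0xRRGGBBAA with A unused: halve each operand first so the sum fits in 32 bits.
   Each channel ends up as floor(c1 / 2) + floor(c2 / 2), same as above. */
static inline uint32_t average_rgba32(uint32_t color1, uint32_t color2)
{
    return ((color1 & 0xfefefe00u) >> 1) + ((color2 & 0xfefefe00u) >> 1);
}
```

For the example pair, average_rgb24(0x020508, 0x040810) gives 0x03060c: Green comes out as 6 and nothing spills into Blue.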

VladR · 13 April 2021, 05:00

I just tried my function on a more generic image and yeah, artifacts popped up, compared to the previous test scene.

I guess my framebuffer was too low-frequency to notice it before.

Jobbo · 13 April 2021, 05:19

For generic interpolation I don't think it's really practical to work on more than one channel at a time, so it would probably be best to use a byte array and do something like:

result[0] = color1[0] + (color2[0] - color1[0]) * blend;
result[1] = color1[1] + (color2[1] - color1[1]) * blend;
result[2] = color1[2] + (color2[2] - color1[2]) * blend;

Of course the multiplies will be slow so you can make a table:

result[0] = color1[0] + blendTable[255 + color2[0] - color1[0]];
result[1] = color1[1] + blendTable[255 + color2[1] - color1[1]];
result[2] = color1[2] + blendTable[255 + color2[2] - color1[2]];

Just set up the table before each image with the correct blend multiplied by each possible difference, -255 through 255 (the difference can be negative, hence the 255 offset into a 511-entry table).
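A minimal sketch of the table approach, assuming a fixed-point blend factor in 0..256 and a table indexed by the signed difference offset by 255, since color2 - color1 can be negative. blendTable is the name from the post; the function names and the fixed-point convention are hypothetical.

```c
#include <stdint.h>

/* One entry per possible channel difference, -255..255. */
static int16_t blendTable[511];

/* blend is an assumed fixed-point factor: 0 = all color1, 256 = all color2.
   Call this once per image, or whenever the blend factor changes. */
static void setup_blend_table(int blend)
{
    for (int diff = -255; diff <= 255; ++diff) {
        /* Integer division truncates toward zero, so the blended value
           never overshoots color2. */
        blendTable[diff + 255] = (int16_t)((diff * blend) / 256);
    }
}

/* Blend three byte channels with one table lookup each, no multiplies. */
static inline void blend_rgb(const uint8_t color1[3], const uint8_t color2[3],
                             uint8_t result[3])
{
    for (int ch = 0; ch < 3; ++ch) {
        int diff = color2[ch] - color1[ch];
        result[ch] = (uint8_t)(color1[ch] + blendTable[diff + 255]);
    }
}
```

With blend = 128 this blends (2,5,8) and (4,8,16) to (3,6,12).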

VladR · 13 April 2021, 06:57

This is a good thread.

I have just created a more generic version of my FrameBuffer blur without artifacts that I will use for a high-frequency FB.

On V4, it should take about 2.9 frames to do a 2x2 blur at 640x400, so that's still very usable for menus and messages.
2x1 is just about 1.6 frames. 320x200 would then be around 0.4, so that's basically almost real-time.

And that's without unrolling and chaining, which should knock off an additional 15-20%...
