English Amiga Board


Old 10 April 2021, 17:01   #1
mateusz_s
Registered User
 
Join Date: Jan 2020
Location: Poland
Posts: 181
[ASM, C, general] Is it possible to speed up RGBA operations stored in int32?

Hi,
My color buffer is of type u_int32 (RGBA); of course, A is not used.
Is it possible to somehow speed up operations on colors without peeling out separate channels R G B and shifting them back?

The most important thing for me is to calculate the average of two such colors.

Until now, I did this if I wanted to change the intensity of the entire pixel:

Code:
output_buffer_32[index] =
    intensity_premultipled[(texture_pixel >> 16) & 0x0ff] << RC_ch1 |
    intensity_premultipled[(texture_pixel >>  8) & 0x0ff] << RC_ch2 |
    intensity_premultipled[ texture_pixel        & 0x0ff] << RC_ch3;

So first I extracted each of R, G and B separately and then put them back together with shifts. The shift amounts are held in the variables RC_ch1, RC_ch2 and RC_ch3, which are set beforehand so that the colors always match the pixel format.

Now, instead of one LUT, I made three LUTs whose outputs already have the color shifted to the right place, so I got rid of the output shifts and the efficiency increased significantly:

Code:
output_buffer_32[index] =
    intensity_premultipled_CH1[(texture_pixel >> 16) & 0x0ff] |
    intensity_premultipled_CH2[(texture_pixel >>  8) & 0x0ff] |
    intensity_premultipled_CH3[ texture_pixel        & 0x0ff];
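
For illustration, here is a minimal sketch of how three pre-shifted tables like these might be filled. The thread only shows the tables being read, so the names lut_ch1..lut_ch3, build_intensity_luts and shade_pixel are hypothetical, and so is the assumption that intensity is an integer from 0 to 256 (256 meaning full brightness).

```c
#include <stdint.h>

/* Hypothetical tables: each entry already holds the scaled channel value
   shifted to its final bit position in the output pixel. */
static uint32_t lut_ch1[256], lut_ch2[256], lut_ch3[256];

/* Fill the tables for one intensity level (assumed range 0..256).
   shift1..shift3 play the role of RC_ch1..RC_ch3 from the first listing. */
static void build_intensity_luts(unsigned intensity,
                                 unsigned shift1, unsigned shift2,
                                 unsigned shift3)
{
    for (unsigned v = 0; v < 256; v++) {
        uint32_t scaled = (v * intensity) >> 8;   /* stays within 0..255 */
        lut_ch1[v] = scaled << shift1;
        lut_ch2[v] = scaled << shift2;
        lut_ch3[v] = scaled << shift3;
    }
}

/* Per-pixel work is now three lookups and two ORs, with no output shifts. */
static uint32_t shade_pixel(uint32_t texture_pixel)
{
    return lut_ch1[(texture_pixel >> 16) & 0xff] |
           lut_ch2[(texture_pixel >>  8) & 0xff] |
           lut_ch3[ texture_pixel        & 0xff];
}
```

The cost is three tables of 1 KB each per intensity level instead of one 256-byte table, which is the usual trade of memory for fewer instructions in the inner loop.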

But I would like to achieve the following goal. After the first pass, only every second pixel of my color buffer would be filled (these are the pixels obtained from the calculation). I would then like to make a second pass that interpolates between the existing colors. Maybe it would make sense if the interpolation could somehow be done quickly. Maybe there are some assembly language tricks? Schematically, this is what is going on (see the attachment):
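
The second pass described here could be sketched as follows. This is only a baseline under assumptions: the function names are hypothetical, the computed pixels are assumed to sit at even indices of a row, and the average deliberately unpacks every channel, so it is the slow but exact version that any packed trick would have to match.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference average: unpack each 8-bit channel, average, repack.
   Deliberately unoptimized. */
static uint32_t average_reference(uint32_t a, uint32_t b)
{
    uint32_t out = 0;
    for (unsigned shift = 0; shift < 32; shift += 8) {
        uint32_t ca = (a >> shift) & 0xff;
        uint32_t cb = (b >> shift) & 0xff;
        out |= ((ca + cb) >> 1) << shift;   /* rounds down */
    }
    return out;
}

/* Second pass over one row: even indices hold computed pixels,
   odd indices are filled with the average of their two neighbors. */
static void fill_odd_pixels(uint32_t *row, size_t width)
{
    for (size_t x = 1; x + 1 < width; x += 2)
        row[x] = average_reference(row[x - 1], row[x + 1]);

    /* An even width leaves a last pixel with no right neighbor:
       copy the pixel to its left. */
    if (width >= 2 && (width % 2) == 0)
        row[width - 1] = row[width - 2];
}
```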

P.S. Changing the buffer from u_int32 to a union with separate char r, g, b, a members doesn't change much...
Attached image: ppix.jpg
Old 10 April 2021, 17:44   #2
mateusz_s
Registered User
 
Join Date: Jan 2020
Location: Poland
Posts: 181
Found something here, not tested yet; maybe it will help:
https://stackoverflow.com/questions/...nto-an-integer
Old 13 April 2021, 01:25   #3
VladR
Registered User
 
Join Date: Dec 2019
Location: North Dakota
Posts: 741
There is no need to do any ASM/bitshift tricks. This is what I do:

You simply interpolate the two RGBA values: Color1 + (Color2-Color1)/2

Color1: (2,4,8) = 132,104
Color2: (4,8,16)= 264,208

If you manually extracted all RGB triplets, the interpolated color would be (3,6,12) = 198,156

Now use the formula above: 132,104 + (264,208 - 132,104)/2 = 132,104 + 66,052 = 198,156

Once you load both RGBA into registers (d1:Color1, d2:Color2), you only need these 3 ops: subtract, bitshift, add

Code:
; d1 = Color1, d2 = Color2
sub.l d1,d2   ; d2 = Color2 - Color1
lsr.l #1,d2   ; d2 = (Color2 - Color1) / 2
add.l d2,d1   ; d1 = Color1 + (Color2 - Color1) / 2
Very fast
Old 13 April 2021, 03:47   #4
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,039
($040810-$020406)/2+$020408 = $03060c
($040810-$020506)/2+$020508 = $03068c

You have to make the color components even if you are going to use that trick.
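
The effect can be reproduced with a small C sketch of the three-instruction sequence (sub.l, lsr.l, add.l). The function name is hypothetical, and the sketch assumes the same Color1 is both subtracted and added back, taking Color1 = $020408 and then $020508 against Color2 = $040810.

```c
#include <stdint.h>

/* C equivalent of: sub.l d1,d2 / lsr.l #1,d2 / add.l d2,d1
   applied to whole packed pixels rather than to single channels. */
static uint32_t average_sub_shift_add(uint32_t color1, uint32_t color2)
{
    uint32_t diff = color2 - color1;  /* sub.l d1,d2 */
    diff >>= 1;                       /* lsr.l #1,d2 (logical shift) */
    return color1 + diff;             /* add.l d2,d1 */
}
```

With Color1 = $020408 every channel difference is even and the result is $03060c, the expected (3,6,12). With Color1 = $020508 the green difference is 3, its low bit is shifted into the top bit of blue, and the result is $03068c, so blue comes out as $8c (140) instead of $0c (12).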
Old 13 April 2021, 04:20   #5
VladR
Registered User
 
Join Date: Dec 2019
Location: North Dakota
Posts: 741
Quote:
Originally Posted by a/b
($040810-$020406)/2+$020408 = $03060c
($040810-$020506)/2+$020508 = $03068c

You have to make the color components even if you are going to use that trick.

You are not going to see that difference in a generic picture in motion.

Besides, even if you extract the separate components, a bitshift of an odd number will never produce a completely correct result, as you can't have 0.5 without fixed-point/floating-point math.

And if I recall correctly, a floating-point framebuffer didn't run well even on a GeForce 5/6 - it took some serious horsepower to do that, let alone Amiga...
Old 13 April 2021, 04:31   #6
Jobbo
Registered User
 
Join Date: Jun 2020
Location: Druidia
Posts: 387
I think you are missing the point: if the Red or Green difference is odd, then dividing by two pushes a one into the upper bit of the neighboring lower color component, which you definitely will notice!

Here is a/b's example, with the math fixed and the results also shown as color components:

($040810-$020406)/2+$020408 = $03060d (Red=3,Green=6,Blue=13)
($040810-$020506)/2+$020508 = $03068d (Red=3,Green=6,Blue=141) << Bad!

I would just do this instead:

result = ((color1 & 0xfefefe) + (color2 & 0xfefefe)) >> 1;

Not sure it can get much simpler.

Edit: of course the question was posed for RGBA with A being unused so really the mask should be 0xfefefe00.
Edit: also this is just averaging, the question is about generic interpolation which is different and harder.
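
A caveat with the 0xfefefe00 mask: once the top byte is in use, the sum of two masked 32-bit values can exceed 32 bits (for example 0xfe000000 + 0xfe000000), so in plain C the carry is lost. The sketch below, with hypothetical function names, shows the masked average for 24-bit 0x00RRGGBB values next to a commonly used variant that never forms the full sum and so works on all four bytes.

```c
#include <stdint.h>

/* Masked average for 0x00RRGGBB pixels. Clearing the low bit of every
   channel first means no bit can cross into the neighboring channel
   when the sum is shifted right. */
static uint32_t average_rgb24(uint32_t a, uint32_t b)
{
    return ((a & 0xfefefeu) + (b & 0xfefefeu)) >> 1;
}

/* Variant for pixels that use all 32 bits. It relies on the identity
   a + b == 2 * (a & b) + (a ^ b), so the average per channel is
   (a & b) + ((a ^ b) >> 1); the mask stops the shifted-out low bits
   from landing in the channel below. No intermediate can overflow. */
static uint32_t average_rgba32(uint32_t a, uint32_t b)
{
    return (a & b) + (((a ^ b) & 0xfefefefeu) >> 1);
}
```

The two forms round slightly differently: the masked sum drops the low bit of each input, so averaging 1 and 1 gives 0, while the second form gives the exact rounded-down average, 1.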

Last edited by Jobbo; 13 April 2021 at 04:39.
Old 13 April 2021, 05:00   #7
VladR
Registered User
 
Join Date: Dec 2019
Location: North Dakota
Posts: 741
I just tried my function on a more generic image and yeah, artifacts popped up, compared to the previous test scene.

I guess my framebuffer was too low-frequency to notice it before
Old 13 April 2021, 05:19   #8
Jobbo
Registered User
 
Join Date: Jun 2020
Location: Druidia
Posts: 387
For generic interpolation I don't think it's really practical to work on more than one channel at a time, so it would probably be best to use a byte array and do something like:

result[0] = color1[0] + (color2[0] - color1[0]) * blend;
result[1] = color1[1] + (color2[1] - color1[1]) * blend;
result[2] = color1[2] + (color2[2] - color1[2]) * blend;

Of course the multiplies will be slow so you can make a table:

result[0] = color1[0] + blendTable[color2[0] - color1[0]];
result[1] = color1[1] + blendTable[color2[1] - color1[1]];
result[2] = color1[2] + blendTable[color2[2] - color1[2]];

Just set up the table before each image with the correct blend multiplied by 0 through 255.
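
One detail to watch in the table version: color2 - color1 is negative whenever color2 is the darker value, so a table covering only 0 through 255 would be indexed out of range. A minimal sketch that handles both signs, assuming a fixed-point blend factor from 0 to 256 and a hypothetical table layout offset by 255:

```c
#include <stdint.h>

/* Table indexed by (difference + 255), so differences -255..255 map to
   entries 0..510. The layout and names are assumptions for this sketch. */
static int16_t blend_table[511];

/* blend is a fixed-point factor: 0 selects color1, 256 selects color2. */
static void build_blend_table(int blend)
{
    for (int d = -255; d <= 255; d++)
        blend_table[d + 255] = (int16_t)((d * blend) / 256);
}

/* Per-channel interpolation with one table lookup instead of a multiply. */
static void blend_rgb(const uint8_t color1[3], const uint8_t color2[3],
                      uint8_t result[3])
{
    for (int i = 0; i < 3; i++) {
        int diff = (int)color2[i] - (int)color1[i];
        result[i] = (uint8_t)(color1[i] + blend_table[diff + 255]);
    }
}
```

Because the scaled difference never exceeds the original difference in magnitude, each result stays between the two input channel values and fits in a byte.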
Old 13 April 2021, 06:57   #9
VladR
Registered User
 
Join Date: Dec 2019
Location: North Dakota
Posts: 741
This is a good thread. I have just created a more generic version of my FrameBuffer blur, without artifacts, that I will use for a high-frequency FB.

On V4, it should take about 2.9 frames to do a 2x2 blur at 640x400, so that's still very usable for menus and messages.
2x1 is just about 1.6 frames. 320x200 would then be around 0.4, which is basically almost real-time.

And that's without unrolling and chaining, which should knock off an additional 15-20%...