10 April 2021, 17:01 | #1 |
Registered User
Join Date: Jan 2020
Location: Poland
Posts: 181
|
[ASM, C, general] Is it possible to speed up RGBA operations stored in int32?
Hi,
My color buffer is of type u_int32, RGBA; of course A is not used. Is it possible to somehow speed up operations on colors without peeling out the separate R, G and B channels and shifting them back? The most important thing for me is to calculate the average of two such colors. Until now, this is what I did when I wanted to change the intensity of the entire pixel: Code:
output_buffer_32[index] = intensity_premultipled[texture_pixel >> 16 & 0x0ff] << RC_ch1
                        | intensity_premultipled[texture_pixel >> 8 & 0x0ff] << RC_ch2
                        | intensity_premultipled[texture_pixel & 0x0ff] << RC_ch3;
So, first I picked out each of R, G and B separately and then put them together again with shifts (the shift amounts are the variables RC_ch1, RC_ch2 and RC_ch3, set beforehand so that the colors always match the pixel format). Now, instead of one LUT, I made three LUTs whose outputs already have the color shifted to the right place, so I got rid of the shifts and performance increased significantly: Code:
output_buffer_32[index] = intensity_premultipled_CH1[texture_pixel >> 16 & 0x0ff]
                        | intensity_premultipled_CH2[texture_pixel >> 8 & 0x0ff]
                        | intensity_premultipled_CH3[texture_pixel & 0x0ff];
But I would like to achieve the following goal. After the first pass, only every second pixel of my color buffer would be filled (these are the pixels obtained as a result of the calculation). Then I would like to make a second pass that interpolates between the existing colors. It would make sense if the interpolation could somehow be done quickly. Maybe there are some assembly language tricks? Below is a schematic of what is going on: (see the attachment)
PS: changing the buffer from u_int32 to a union with separate char r, g, b, a doesn't change much... |
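For readers following along, the three pre-shifted tables described in this post could be built roughly as below. This is only a sketch: the names `lut_ch1`, `lut_ch2`, `lut_ch3`, `build_intensity_luts` and `shade_pixel` are hypothetical, and the intensity scale of 0..256 (256 meaning "unchanged") is an assumption, since the post does not say how the original table was filled.

```c
#include <stdint.h>

/* Hypothetical tables: one per channel, each entry already shifted into place. */
static uint32_t lut_ch1[256], lut_ch2[256], lut_ch3[256];

/* intensity is assumed to be 0..256, where 256 leaves the channel unchanged.
   shift1..shift3 play the role of RC_ch1..RC_ch3 from the post. */
static void build_intensity_luts(unsigned intensity,
                                 unsigned shift1, unsigned shift2, unsigned shift3)
{
    for (unsigned v = 0; v < 256; v++) {
        uint32_t scaled = (v * intensity) >> 8;   /* stays within 0..255 */
        lut_ch1[v] = scaled << shift1;
        lut_ch2[v] = scaled << shift2;
        lut_ch3[v] = scaled << shift3;
    }
}

/* Per pixel: three table lookups and two ORs, no shifts on the output side. */
static uint32_t shade_pixel(uint32_t texture_pixel)
{
    return lut_ch1[(texture_pixel >> 16) & 0xff]
         | lut_ch2[(texture_pixel >>  8) & 0xff]
         | lut_ch3[ texture_pixel        & 0xff];
}
```

With `build_intensity_luts(128, 16, 8, 0)`, a texel of 0x00204080 would come out as 0x00102040, i.e. every channel halved.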
10 April 2021, 17:44 | #2 |
Registered User
Join Date: Jan 2020
Location: Poland
Posts: 181
|
Found something here, not tested yet, maybe it will help:
https://stackoverflow.com/questions/...nto-an-integer |
13 April 2021, 01:25 | #3 |
Registered User
Join Date: Dec 2019
Location: North Dakota
Posts: 741
|
There is no need to do any ASM/bitshift tricks. This is what I do:
You simply interpolate the two RGBA values: Color1 + (Color2-Color1)/2
Color1: (2,4,8) = 132,104
Color2: (4,8,16) = 264,208
If you manually extracted all RGB triplets, the interpolated color would be (3,6,12) = 198,156
Now use the formula above: 132,104 + (264,208 - 132,104)/2 = 132,104 + 66,052 = 198,156
Once you load both RGBA values into registers (d1: Color1, d2: Color2), you only need these 3 ops: subtract, bitshift, add. Code:
; d1 = Color1, d2 = Color2
sub.l d1,d2    ; d2 = Color2 - Color1
lsr.l #1,d2    ; d2 = (Color2 - Color1) / 2
add.l d2,d1    ; d1 = Color1 + (Color2 - Color1) / 2
|
13 April 2021, 03:47 | #4 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,039
|
($040810-$020406)/2+$020408 = $03060c
($040810-$020506)/2+$020508 = $03068c
You have to make the color components even if you are going to use that trick. |
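The effect is easy to reproduce in C. The sketch below assumes Color1 is $020408 in the first case and $020508 in the second (the results quoted in this post match that reading), and compares the whole-word formula with a per-channel reference. Both function names are made up for illustration.

```c
#include <stdint.h>

/* Whole-word midpoint, same operations as the sub/lsr/add sequence:
   c1 + ((c2 - c1) >> 1), with no per-channel masking. */
static uint32_t naive_midpoint(uint32_t c1, uint32_t c2)
{
    return c1 + ((c2 - c1) >> 1);
}

/* Reference: average each 8-bit channel on its own, rounding down. */
static uint32_t per_channel_midpoint(uint32_t c1, uint32_t c2)
{
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t a = (c1 >> shift) & 0xff;
        uint32_t b = (c2 >> shift) & 0xff;
        out |= ((a + b) >> 1) << shift;
    }
    return out;
}
```

For 0x020408 and 0x040810 both functions agree on 0x03060c. For 0x020508 and 0x040810 the green difference is odd, so the whole-word version yields 0x03068c while the per-channel reference still gives 0x03060c: the dropped half of green has landed in the top bit of blue.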
13 April 2021, 04:20 | #5 | |
Registered User
Join Date: Dec 2019
Location: North Dakota
Posts: 741
|
Quote:
Besides, even if you extract the separate components, a bitshift of an odd number will never produce a completely correct result, as you can't have 0.5 without fixed-point/floating-point math. And if I recall correctly, a floating-point framebuffer didn't run well even on a GeForce 5/6 - it took some serious horsepower to do that, let alone on an Amiga... |
|
13 April 2021, 04:31 | #6 |
Registered User
Join Date: Jun 2020
Location: Druidia
Posts: 387
|
I think you are missing the point: if the Red or Blue difference is odd, then when you divide by two it'll push a one into the upper bit of the neighboring color component, which you definitely will notice!
Here is A/B's example, with the math fixed and also shown as color components:
($040810-$020406)/2+$020408 = $03060d (Red=3,Green=6,Blue=13)
($040810-$020506)/2+$020508 = $03068d (Red=3,Green=6,Blue=141) << Bad!
I would just do this instead:
result = ((color1 & 0xfefefe) + (color2 & 0xfefefe)) >> 1;
Not sure it can get much simpler.
Edit: of course the question was posed for RGBA with A being unused so really the mask should be 0xfefefe00.
Edit: also this is just averaging, the question is about generic interpolation which is different and harder.
Last edited by Jobbo; 13 April 2021 at 04:39. |
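A C sketch of the masked average, plus one variant that is not from this thread. One caveat worth checking before using the 0xfefefe00 mask: with a channel in the top byte, adding two masked 32-bit values can carry out of bit 31 (0xfe000000 + 0xfe000000 does not fit in 32 bits), so the sum-then-shift form is only safe when the top byte is unused. The second function uses the identity a + b = 2*(a & b) + (a ^ b), which never needs the extra bit and also keeps the low bit of each channel. Both function names are hypothetical.

```c
#include <stdint.h>

/* Masked average as suggested in the post, for pixels whose top byte is
   unused (0x00RRGGBB-style layout). Drops the low bit of every channel
   before adding, so nothing can carry into a neighbouring channel. */
static uint32_t average_masked_24(uint32_t c1, uint32_t c2)
{
    return ((c1 & 0x00fefefeu) + (c2 & 0x00fefefeu)) >> 1;
}

/* Variant (not from the thread): per-byte floor((a + b) / 2) for all four
   bytes of the word, with no intermediate overflow. */
static uint32_t average_packed(uint32_t c1, uint32_t c2)
{
    return (c1 & c2) + (((c1 ^ c2) & 0xfefefefeu) >> 1);
}
```

The two differ only in rounding: averaging 0x010101 with itself gives 0 with the masked form and 0x010101 with the second form, which may or may not matter for a blur.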
13 April 2021, 05:00 | #7 |
Registered User
Join Date: Dec 2019
Location: North Dakota
Posts: 741
|
I just tried my function on a more generic image and yeah, artifacts popped up compared to the previous test scene.
I guess my framebuffer was too low-frequency to notice it before |
13 April 2021, 05:19 | #8 |
Registered User
Join Date: Jun 2020
Location: Druidia
Posts: 387
|
For generic interpolation I don't think it's really practical to work on more than one channel at a time, so it would probably be best to use a byte array and do something like:
result[0] = color1[0] + (color2[0] - color1[0]) * blend;
result[1] = color1[1] + (color2[1] - color1[1]) * blend;
result[2] = color1[2] + (color2[2] - color1[2]) * blend;
Of course the multiplies will be slow so you can make a table:
result[0] = color1[0] + blendTable[color2[0] - color1[0]];
result[1] = color1[1] + blendTable[color2[1] - color1[1]];
result[2] = color1[2] + blendTable[color2[2] - color1[2]];
Just set up the table before each image with the correct blend multiplied by 0 through 255. |
13 April 2021, 06:57 | #9 |
Registered User
Join Date: Dec 2019
Location: North Dakota
Posts: 741
|
This is a good thread. I have just created a more generic version of my framebuffer blur, without artifacts, that I will use for a high-frequency FB.
On V4, it should take about 2.9 frames to do a 2x2 blur at 640x400, so that's still very usable for menus and messages. 2x1 is just about 1.6 frames. 320x200 would then be around 0.4, which is basically almost real-time. And that's without unrolling and chaining, which should knock off an additional 15-20%... |