15 January 2008, 10:58   #39
meynaf
son of 68k
 
Quote:
Originally Posted by Thorham View Post
I gave the color space conversion a go, and this is what I came up with
You came up with something very interesting on your first try!

Quote:
Originally Posted by Thorham View Post
The code is pretty simple. It uses two multiplication tables, each with a subtraction (-128). It then just calculates green first and uses the old Cb and Cr values again by multiplying them by 5 and 2 respectively. This gives a good approximation of the values needed to calculate red and blue, and gets rid of some table reading. After that the rgb values just need to be clipped to fit in the range 0-255.
Hmm... I admit I don't like losses... I noticed there is further accuracy loss compared to the original code, because the adds for green (a*Cb + b*Cr) were done with 32-bit fixed-point values, not bytes.

Quote:
Originally Posted by Thorham View Post
Like the c code, I use a table for this, but it might be faster to use compares instead. I don't know, because I can't test it!
I didn't really test it, but from the timings I know, compares would be slower. Or can you do a range limit in fewer than 11 clock cycles?
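For readers unfamiliar with the trick: the range-limit table has guard zones on both sides of the 0..255 core, so an out-of-range sum indexes into a pre-clamped area instead of needing compares and branches. A rough C equivalent (the $180 guard size matches the offset used in the asm below, but the exact layout here is my guess):

```c
/* Guard zones of 384 ($180) bytes on each side of the 0..255 core, so any
 * sum in [-384, 255+384] can be looked up without a single branch. */
#define GUARD 384
static unsigned char range_table[GUARD + 256 + GUARD];
static unsigned char *range_limit;  /* points at range_table + GUARD */

static void init_range_limit(void)
{
    for (int i = -GUARD; i < 256 + GUARD; i++)
        range_table[GUARD + i] = i < 0 ? 0 : i > 255 ? 255 : (unsigned char)i;
    range_limit = range_table + GUARD;
}
```

A compare-and-branch version needs two tests and two conditional branches per component, three components per pixel, which is hard to do in fewer cycles than one indexed move.b through such a table.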

Quote:
Originally Posted by Thorham View Post
About the approximations: I've tested these with my YCbCr program in freebasic (damned handy), and the color differences are quite small, meaning the images look great; you literally have to see the original next to the encoded version to see any difference, otherwise you'd think it's the original! I've tested it with a straight gray scale in rgb as well, and there is no difference whatsoever. If this code is faster than what you've tried, it's perfect for fast viewing in high quality.
To be acceptable, such a loss must make the thing much faster.
I've counted clock cycles (including pipeline) and you should get something like 123 of them per pixel.
My actual code runs in 125 or so, with full accuracy. Not worth changing.
But you can get rid of 8 of them, going down to 115, if you replace:
- subq/bpl with a dbf (-2 cycles)
- move.b (a6,dn.l),dn / move.b dn,(a3)+ with a single move.b (a6,dn.l),(a3)+ (-2 cycles, three times)

However, it's still not enough IMHO (a bit less than 10% on the loop, and much less for the overall speed). A 40% gain could be good, though.

Quote:
Originally Posted by Thorham View Post
However, you will have to integrate the code yourself, and although I've tested the approximation, the asm code itself is untested and may contain a bug here and there. Nothing serious, though. Should be easy to fix if there are any. Also due to data format differences the code may not work as is, but I suppose this should still give you a good idea of what can be done.
The data formats are the same, except that you must write red, then green, then blue, not red last.

Here is my code, in case you spot something that can be done to accelerate it:
Code:
; parameters :
;    a0=input_buf[0]+input_row
;    a1=input_buf[1]+input_row
;    a2=input_buf[2]+input_row
;    a3=output_buf
;    d7=num_rows
;    d6=cinfo->output_width
ycc_rgb_convert
 subq.w #1,d6
 moveq #0,d0
 move.l #$100,d1     ; with *8, we will go to $800 bytes after the 1st array
 moveq #0,d2         ; (which will make us point on the 2nd array)
.yloop
 move.l d6,d5
 move.l (a0)+,a4
 move.l (a1)+,a5
 move.l (a2)+,a6
 movem.l a0-a3,-(a7)
 lea cxtab,a0
 lea range_limit2+$180,a2
 move.l (a3),a3
.xloop
; inner loop
 move.b (a4)+,d0           ; d0 = Y
 move.b (a5)+,d1           ; d1 = $100 + Cb (high byte preset for the 2nd array)
 move.b (a6)+,d2           ; d2 = Cr
 lea (a0,d2.w*8),a1        ; 8-byte cxtab entry for Cr
 move.l (a1)+,d4           ; d4 = green term of Cr (fixed point)
 move.l (a1),d3            ; d3 = red term of Cr
 add.l d0,d3               ; red = Y + red term
 move.b (a2,d3.w),(a3)+    ; range-limit and store red
 lea (a0,d1.w*8),a1        ; 8-byte cxtab entry for Cb (2nd array)
 add.l (a1)+,d2            ; add green term of Cb
 add.l d4,d2               ; add green term of Cr
 swap d2                   ; fixed point -> integer
 add.w d0,d2               ; green = Y + green terms
 move.b (a2,d2.w),(a3)+    ; range-limit and store green
 move.l d0,d3
 moveq #0,d2               ; clear d2 for the next Cr byte
 add.l (a1),d3             ; blue = Y + blue term of Cb
 move.b (a2,d3.w),(a3)+    ; range-limit and store blue
 dbf d5,.xloop
; end of inner loop
 movem.l (a7)+,a0-a3
 addq.l #4,a3
 subq.l #1,d7
 bne.s .yloop
 rts
Note: I'm using 4 arrays, of which 2 are interleaved, all with the same pointer. Of course only the inner loop really has to be optimized.
Other note: this one has been tested and works. But it doesn't give as much gain as we could have expected...
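For what it's worth, my reading of the interleaved layout is roughly the following C sketch (field names and constants are guesses; the real tables presumably also bake in a rounding bias and compensate for the raw Cr byte that is still sitting in d2 when the green sum starts):

```c
#include <stdint.h>

/* One 8-byte entry per possible chroma value.  The Cr array and the Cb
 * array sit back to back, so (base + index*8) reaches a Cr entry for
 * index 0..255 and a Cb entry for index 256..511 ($800 bytes further). */
struct cx_entry {
    int32_t green;  /* green contribution, 16.16 fixed point */
    int32_t other;  /* red contribution (Cr entries) or blue (Cb entries) */
};

static struct cx_entry cxtab[512];

static void init_cxtab(void)
{
    for (int i = 0; i < 256; i++) {
        /* Cr entries: indices 0..255 */
        cxtab[i].green = (int32_t)(-0.71414 * (i - 128) * 65536.0);
        cxtab[i].other = (int32_t)( 1.40200 * (i - 128));
        /* Cb entries: indices 256..511 */
        cxtab[256 + i].green = (int32_t)(-0.34414 * (i - 128) * 65536.0);
        cxtab[256 + i].other = (int32_t)( 1.77200 * (i - 128));
    }
}
```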

Quote:
Originally Posted by Thorham View Post
Ah, triangular interpolation eh? I'm going to do some yahooing on that one.
I've started this one in asm. Great fun to do (*cough*).

Quote:
Originally Posted by Thorham View Post
Lossless is indeed interesting as an extra option. Trying to predict the data is interesting, too. Hadn't thought of that. I'm going to have to see if I can come up with anything.
Sure, this is an area where little has been done.
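As a trivial illustration of the prediction idea (the general technique, nothing specific to this thread): predict each sample from its left neighbour and store only the difference, which clusters values around zero so a later entropy coder compresses them better.

```c
#include <stddef.h>

/* In-place left-neighbour delta: out[i] = in[i] - in[i-1] (mod 256). */
static void delta_encode(unsigned char *buf, size_t n)
{
    unsigned char prev = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char cur = buf[i];
        buf[i] = (unsigned char)(cur - prev);  /* wraps, so decode is exact */
        prev = cur;
    }
}

/* Exact inverse: running sum restores the original samples. */
static void delta_decode(unsigned char *buf, size_t n)
{
    unsigned char prev = 0;
    for (size_t i = 0; i < n; i++) {
        prev = (unsigned char)(prev + buf[i]);
        buf[i] = prev;
    }
}
```
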
Quote:
Originally Posted by Thorham View Post
Yes, it does. And it's indeed compiled as far as I know, with a compiler from 1992... Yep, it doesn't get any slower than that
I'm sure it can be made slower

Quote:
Originally Posted by Thorham View Post
As I think of options, I'll post them here. No problem.
I'm waiting...

Quote:
Originally Posted by Thorham View Post
Skipping just sucks rocks. One way is to convert the data to rgb while scaling, then just render to ham. Should be ok. Another one is to do the same and count how many times each color is used during scaling. Then quicksort the table. Since the table is max 256 entries, this should be fast. Once that's done, use the 64 most frequent colors as the ham palette. Obviously, the first method is faster, but will never look as good as the original 256 color image. I doubt it will look bad, though. This is the best I can come up with, since ham will be the only way, unless you want to do high quality rgb to 256 color conversion, which will never be as fast.
What I fear is the brutal color changes when rendering to ham: they're often too visible.
And when scaling down an iff, you have to p2c it, then scale it, then count colors to get the most used ones, then write them back into a buffer, then c2p it. Ouch! Ilbm displaying has never been so slow!
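The counting method described above (tally colour usage while scaling, sort, take the top 64 as the ham palette) boils down to a small frequency sort; a minimal C sketch, with names of my own choosing:

```c
#include <stdlib.h>
#include <stddef.h>

typedef struct { int index; unsigned long count; } color_freq;

/* qsort comparator: most frequent first. */
static int by_count_desc(const void *a, const void *b)
{
    unsigned long ca = ((const color_freq *)a)->count;
    unsigned long cb = ((const color_freq *)b)->count;
    return ca < cb ? 1 : ca > cb ? -1 : 0;
}

/* Count how often each 8-bit pixel value occurs, then return the 64
 * most frequent indices in palette[0..63]. */
static void pick_ham_palette(const unsigned char *pix, size_t n,
                             int palette[64])
{
    color_freq freq[256];
    for (int i = 0; i < 256; i++) { freq[i].index = i; freq[i].count = 0; }
    for (size_t i = 0; i < n; i++) freq[pix[i]].count++;
    qsort(freq, 256, sizeof freq[0], by_count_desc);
    for (int i = 0; i < 64; i++) palette[i] = freq[i].index;
}
```

Since the table is at most 256 entries, the sort is negligible next to the scaling and c2p work.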