English Amiga Board jpeg decoding in full asm

13 December 2007, 18:50   #21
Thorham
Computer Nerd

Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 41
Posts: 2,972
Quote:
 Originally Posted by meynaf I already know what's going on. The main problem for me is that dct trick: I know what it does, but I want the quickest way to do it without significantly losing precision. The haskell code, not quite readable, apparently contains direct square root and cosine computations, which I do not want to do.
The haskell code does suck, doesn't it? And I don't suppose you want to make your own algorithm based on the formula

Quote:
 Originally Posted by meynaf Hopefully you know what I feel about those formulas
The same as me, I suppose. Math is pretty much un-cool, and it always looks like it could be done just about 10 million times simpler.

Quote:
 Originally Posted by meynaf Maybe not source code, but a detailed algorithm to efficiently perform the computations. Of course, a correctly commented source code will do Things such as sqr/sin/cos are to avoid at all costs, muls should be reduced to the bare minimum. Else we'll end up with something sloooooow
I guess I'll try again then. How difficult is it to find what you need on the net

13 December 2007, 19:13   #22
meynaf
son of 68k

Join Date: Nov 2007
Location: Lyon / France
Age: 44
Posts: 2,459
Quote:
 Originally Posted by Thorham The haskell code does suck, doesn't it? And I don't suppose you want to make your own algorithm based on the formula
I don't know if it sucks or not, but I surely don't want to make my own algorithm on the formula !

Quote:
 Originally Posted by Thorham The same as me, I suppose. Math is pretty much un-cool, and it always looks like it could be done just about 10 million times simpler.
I can't imagine writing a cosine calculation in integer maths

Quote:
 Originally Posted by Thorham I guess I'll try again then. How difficult is it to find what you need on the net
I'm not sure if it will be useful right now, 'coz I finally dared to start the rewrite of the integer dct in asm, from jidctint.c (I prefer to convert c into asm, rather than compiling c to optimize the asm it produces).
Well, it turned out not to be that hard (once all those constants and macros have been replaced by what they mean).
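
For readers following along, here is a minimal C sketch of what "constants and macros replaced by what they mean" comes down to. It assumes the usual IJG setup of 13 fraction bits; the helper name `multiply` is made up for illustration. The three constants shown evaluate to the same numbers that appear as muls immediates later in the thread.

```c
#include <stdint.h>

/* Fixed-point setup in the style of the IJG "slow" integer IDCT:
   real constants are scaled by 2^13 and rounded to integers, so a
   multiply by a cosine-derived value becomes a plain integer muls. */
#define CONST_BITS 13
#define FIX(x) ((int32_t)((x) * (1L << CONST_BITS) + 0.5))

/* Three of the constants and the integers they evaluate to. */
#define FIX_0_390180644 FIX(0.390180644) /*  3196 */
#define FIX_1_175875602 FIX(1.175875602) /*  9633 */
#define FIX_1_961570560 FIX(1.961570560) /* 16069 */

/* Hypothetical helper: multiply a sample by a scaled constant. The
   result carries CONST_BITS extra fraction bits until it is descaled. */
int32_t multiply(int32_t value, int32_t constant)
{
    return value * constant;
}
```

Since the scaled constants fit in 16 bits, each multiply maps onto one 16x16 muls as long as the other operand also fits in a word.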

But... muls, muls, muls and more muls (might remind you of something you've read recently ).
Oh, and muls again.
Did I forget muls ?

What I have now isn't a carbon copy of the original code, I had to move things to reduce register usage.
There is no init/exit code for now, and I only have the first half (columns).

I've included it here, so that you can have a look at it.
Not quite optimized yet, there is still some unneeded data movement.

Of course I dunno if it works

Last edited by meynaf; 12 May 2011 at 09:32.

14 December 2007, 17:55   #23
Thorham
Computer Nerd

Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 41
Posts: 2,972
Quote:
 Originally Posted by meynaf I'm not sure if it will be useful right now, 'coz I finally dared to start the rewrite of the integer dct in asm, from jidctint.c
Good luck. I've taken a look at the c code, and it shouldn't be that hard. Seems perfectly doable.

Quote:
 Originally Posted by meynaf (I prefer to convert c into asm, rather than compiling c to optimize the asm it produces).
Oh yeah, that is just much better, as everything gets named properly and you can easily comment everything.

Quote:
 Originally Posted by meynaf But... muls, muls, muls and more muls (might remind you of something you've read recently ). Oh, and muls again. Did I forget muls ?
That's just completely uncool. It will be very hard to reduce the number of muls, if it's possible in the first place.

Quote:
 Originally Posted by meynaf What I have now isn't a carbon copy of the original code, I had to move things to reduce register usage. There is no init/exit code for now, and I only have the first half (columns). I've included it here, so that you can have a look at it. Not quite optimized yet, there is still some unneeded data movement.
Looking good! Is it my imagination, or does the asm look a lot cleaner than the c code? Anyway, that code looks like it's going to be pretty good, so keep up the good work

Quote:
 Originally Posted by meynaf Of course I dunno if it works
Impossible to say at this stage

14 December 2007, 18:25   #24
meynaf
son of 68k

Join Date: Nov 2007
Location: Lyon / France
Age: 44
Posts: 2,459
Quote:
 Originally Posted by Thorham Good luck. I've taken a look at the c code, and it shouldn't be that hard. Seems perfectly doable.
That might take some time before I come out with something usable, so don't worry if I remain silent for a few days

Quote:
 Originally Posted by Thorham Oh yeah, that is just much better, as everything gets named properly and you can easily comment everything.
And you know who to blame if it doesn't work...

Quote:
 Originally Posted by Thorham That's just completely uncool. It will be very hard to reduce the number of muls, if it's possible in the first place.
Don't look at the muls, their number has already been reduced to its minimum by choosing the algorithm. Same goes for the adds (I think).
But there can be some unneeded moves, and it is possible that the register usage can be reduced as well.
I also strongly doubt it could be useful to replace those muls by tables, because there are just too many different constants.

Quote:
 Originally Posted by Thorham Looking good! Is it my imagination, or does the asm look a lot cleaner than the c code? Anyway, that code looks like it's going to be pretty good, so keep up the good work
Not hard to look a lot cleaner, as c code is always dirty

Quote:
 Originally Posted by Thorham Impossible to say at this stage
Oh, you didn't see a bug already ? Curious

17 December 2007, 17:18   #25
Thorham
Computer Nerd

Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 41
Posts: 2,972
Quote:
 Originally Posted by meynaf That might take some time before I come out with something usable, so don't worry if I remain silent for a few days
I hope it's been a productive few days. Productivity on my side of the net has been zero, because I've had a nasty cold (which is still not completely over).

Quote:
 Originally Posted by meynaf And you know who to blame if it doesn't work...
It's an advantage if every mistake is your own. No one else to rely on.

Quote:
 Originally Posted by meynaf Don't look at the muls, their number has already been reduced to its minimum by choosing the algorithm. Same goes for the adds (I think). But there can be some unneeded moves, and it is possible that the register usage can be reduced as well. I also strongly doubt it could be useful to replace those muls by tables, because there are just too many different constants.
That's what I was thinking, too. Ridding yourself of those muls is probably not possible. The numbers they multiply can't be optimized to a bunch of adds, either, so you're definitely right. Still a shame, though...

Quote:
 Originally Posted by meynaf Not hard to look a lot cleaner, as c code is always dirty
You got that right

Quote:
 Originally Posted by meynaf Oh, you didn't see a bug already ? Curious
So am I. Also looking forward to what you did this weekend.

21 December 2007, 11:47   #26
meynaf
son of 68k

Join Date: Nov 2007
Location: Lyon / France
Age: 44
Posts: 2,459
Quote:
 Originally Posted by Thorham I hope it's been a productive few days. Productivity on my side of the net has been zero, because I've had a nasty cold (which is still not completely over).
I've had one too. Still coughing a little, but nothing more.

Quote:
 Originally Posted by Thorham It's an advantage if every mistake is your own. No one else to rely on.
Not entirely an advantage : you also have no one else to blame

Quote:
 Originally Posted by Thorham That's what I was thinking, too. Ridding yourself of those muls is probably not possible. The numbers they multiply can't be optimized to a bunch of adds, either, so you're definitely right. Still a shame, though...
Multiplies that could be removed already have been, thanks to the IJG... The constants are all derived from cosine/square root stuff, so, yes, they look like random values.

Quote:
 Originally Posted by Thorham So am I. Also looking forward to what you did this weekend.
I finished it, but I found a mistake (register confusion) in the code.
This :
Code:
```
move.w d6,d1
muls #9633,d1

muls #-16069,d6
muls #-3196,d7
```
Should be replaced by this :
Code:
```
move.w d6,d4
muls #9633,d4

muls #-16069,d6
muls #-3196,d7
```
Now it works. Gone from 233 to 197 frames for my test image
If you want to have a look at it, it's in the zone, along with all modified files.

In the archive you'll find jidctint.s - the asm version of jidctint.c, which now is nothing but a wrapper for the asm version.

You'll also find a pre-compiled version ; after the c-code for dct has vanished the exe's size has dropped.

That code is probably tougher than the ham code you're used to, so to make things easier I've kept some (modified) c code as comments, and translated my comments for you.

Hint : try to free regs by moving things around, that is, output something right after it is computed, to free a reg for the next computation.
Then there could be a lot of possible opts if you have free regs (not only Dn but also An).

Last edited by meynaf; 21 December 2007 at 15:07.

21 December 2007, 15:20   #27
meynaf
son of 68k

Join Date: Nov 2007
Location: Lyon / France
Age: 44
Posts: 2,459
For the next step - the colorspace conversion may be my next victim - I'm looking for jpeg files with unusual color spaces, to test them, and to check whether they're worth supporting or not (certainly not if they are extremely rare).

Can someone fire up a photoshop and save RGB/YCCK/CMYK encoded jpeg files for me ? (as I don't have photoshop and those are apparently adobe specific)
Please...

Last edited by meynaf; 21 December 2007 at 15:32.

23 December 2007, 00:02   #28
Thorham
Computer Nerd

Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 41
Posts: 2,972
Quote:
 Originally Posted by meynaf Now it works. Gone from 233 to 197 frames for my test image
Impressive
Quote:
 Originally Posted by meynaf If you want to have a look at it, it's in the zone, along with all modified files.
Thanks man
Quote:
 Originally Posted by meynaf That code is probably tougher than the ham code you're used to, so to make things easier I've kept some (modified) c code as comments, and translated my comments for you.
Yes, it is. But that's only logical, considering the ham renderer is mathematically a lot simpler than an idct routine. Still a shame that I don't understand those (i)dct formulas.

Thanks for the translation, greatly appreciated. Using those translators works, but they are annoying to use, and of course, having to fill in the context manually doesn't always help, either.
Quote:
 Originally Posted by meynaf Hint : try to free regs by moving things around, that is, output something right after it is computed, to free a reg for the next computation. Then there could be a lot of possible opts if you have free regs (not only Dn but also An).
Oh yes, I can definitely give that a try.

I guess it was a bad idea to search for a full explanation. Even when you do completely understand the subject, it's going to be very tough to optimize the idct routine.

Anyway, good job, and great looking code. Keep up the good work.
Quote:
 Originally Posted by meynaf For the next step - the colorspace conversion may be my next victim - I'm looking for jpeg files with unusual color spaces, to test them, and to check whether they're worth supporting or not (certainly not if they are extremely rare). Can someone fire up a photoshop and save RGB/YCCK/CMYK encoded jpeg files for me ? (as I don't have photoshop and those are apparently adobe specific) Please...
Aren't all jpeg images, except gray scale, yuv color space encoded? Furthermore, those adobe specific formats are probably very rare indeed, so if you ask me, only gray scale and yuv are needed (as far as I can tell from the explanations). But hey, that's just my opinion

Last edited by Thorham; 23 December 2007 at 00:06. Reason: Forgot something...

24 December 2007, 11:29   #29
meynaf
son of 68k

Join Date: Nov 2007
Location: Lyon / France
Age: 44
Posts: 2,459
Quote:
 Originally Posted by Thorham Yes, it is. But that's only logical, considering the ham renderer is mathematically a lot simpler than an idct routine. Still a shame that I don't understand those (i)dct formulas.
What you have here is the algorithm from C. Loeffler, A. Ligtenberg and G. Moschytz (LL&M). But don't ask me more about it
If someone wants to optimize that, then it's better to just do what they do, regardless of what it mathematically means.
As in the original IJG code, it must also perform the dequantization.

Maybe the comments in jidctint.c (the original one) can be useful for you.
Quote:
 Originally Posted by Thorham Thanks for the translation, greatly appreciated Using those translators works, but they are annoying to use, and of course, having to fill in the context manually doesn't always help, either. Oh yes, I can definitely give that a try.
There is some important register pressure in here, but maybe it's possible to free one (I already did it with the variable z5). Did you find something already ?

Quote:
 Originally Posted by Thorham I guess it was a bad idea to search for a full explanation. Even when you do completely understand the subject, it's going to be very tough to optimize the idct routine.
The basis is that the dct is (a particular case of) a Fourier transform, and its inverse is the idct. The dct takes you from spatial values to frequencies (well, here, with the idct, we're going from frequencies back to values).
In jpegs, the high frequencies are stored with less precision (-> fewer bits) than the lower ones, because they are less visible. That's why jpegs can look somewhat blurred.
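
To make the "less precision" part concrete, here is a small C sketch of the quantize/dequantize round trip. The function names are made up for illustration and the step sizes in the comments are arbitrary examples, not values from a real quantization table. The decoder only needs the multiply, which is the dequantization the asm idct has to perform.

```c
#include <stdint.h>

/* Encoder side: divide by the step and round to nearest. Larger steps
   (typically used for high frequencies) need fewer bits but lose
   more precision. */
int16_t quantize(int32_t coef, int32_t step)
{
    if (coef >= 0)
        return (int16_t)((coef + step / 2) / step);
    return (int16_t)(-((-coef + step / 2) / step));
}

/* Decoder side: just multiply back by the same step. */
int32_t dequantize(int16_t level, int32_t step)
{
    return (int32_t)level * step;
}

/* Absolute error left after a quantize/dequantize round trip;
   it never exceeds half the step. */
int32_t roundtrip_error(int32_t coef, int32_t step)
{
    int32_t diff = dequantize(quantize(coef, step), step) - coef;
    return diff < 0 ? -diff : diff;
}
```

With a step of 2 a coefficient of 100 survives intact, while with a step of 16 it comes back as 96.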

Quote:
 Originally Posted by Thorham Anyway, good job, and great looking code. Keep up the good work.
I sure will

Quote:
 Originally Posted by Thorham Aren't all jpeg images, except gray scale, yuv color space encoded? Furthermore, those adobe specific formats are probably very rare indeed, so if you ask me, only gray scale and yuv are needed (as far as I can tell from the explanations). But hey, that's just my opinion
Most jpegs are yuv encoded (though they call it YCbCr in the code), but the format itself supports a bigger set of color spaces (as I've seen in the code).
However if nothing using them can be found, then they're not worth supporting in my asm code (except by throwing an error message in the face of the unfortunate user who accidentally stumbled upon such a file ).

24 December 2007, 16:38   #30
Thorham
Computer Nerd

Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 41
Posts: 2,972
Quote:
 Originally Posted by meynaf There is some important register pressure in here, but maybe it's possible to free one (I already did it with the variable z5). Did you find something already ?
Yes, I think I did. It seems that the part which writes the columns can have addi.l #1<<10,d7 replaced by add.l d4,d7 if you add move.l #1024,d4 to the code, as d4 isn't used in this part of the routine, and gets set to another value in the beginning of the first loop. This is the first thing I've found, and I'm still looking for more.
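
For context, the `#1<<10` added there looks like the rounding bias of the descale step at the end of the column pass. A minimal C sketch, assuming the IJG values of 13 constant bits and 2 extra pass-1 bits (the function names are illustrative only):

```c
#include <stdint.h>

#define CONST_BITS 13 /* fraction bits added by the constant multiplies */
#define PASS1_BITS 2  /* extra precision kept after the column pass */

/* Round-to-nearest right shift: add half of the divisor, then shift.
   Assumes >> on a negative value is an arithmetic shift, which is
   implementation-defined in C but is what asr does on the 68k. */
int32_t descale(int32_t x, int n)
{
    return (x + ((int32_t)1 << (n - 1))) >> n;
}

/* Column-pass output: shift by 13 - 2 = 11, so the bias is 1 << 10. */
int32_t descale_column(int32_t x)
{
    return descale(x, CONST_BITS - PASS1_BITS);
}
```

Keeping the bias in a register, as suggested above, changes nothing in the arithmetic; it only swaps an immediate add for a register add.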
Quote:
 Originally Posted by meynaf Most jpegs are yuv encoded (though they call it YCbCr in the code), but the format itself supports a bigger set of color spaces (as I've seen in the code). However if nothing using them can be found, then they're not worth supporting in my asm code (except by throwing an error message in the face of the unfortunate user who accidentally stumbled upon such a file ).
I'm completely convinced most jpegs are yuv encoded, and that you really don't need any other color spaces except gray, because that's what I keep reading in jpeg docs. Y'CbCr is just the digital variant of yuv, as yuv is for analog video.

I'll let you know if I find more, and I'm pretty sure I will, since the first versions of a piece of code are usually not completely optimized.

24 December 2007, 16:50   #31
alexh
Thalion Webshrine

Join Date: Jan 2004
Location: Oxford
Posts: 11,942
Quote:
 Originally Posted by Thorham Y'CbCr is just the digital variant of yuv, as yuv is for analog video.
Wow, someone who actually knows the truth. I've been telling people this for years but they never listen.

Charles Poynton is my hero when it comes to this stuff.

Last edited by alexh; 24 December 2007 at 16:56.

24 December 2007, 17:03   #32
Thorham
Computer Nerd

Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 41
Posts: 2,972
Quote:
 Originally Posted by alexh Wow, someone who actually knows the truth. I've been telling people this for years but they never listen.
Really? Even Wikipedia's explanation about yuv clearly states this. It's amazing how people can just dismiss these facts.
Quote:
 Originally Posted by alexh Charles Poynton is my hero when it comes to this stuff.
Thanks for the name. I've been interested in this kind of stuff for a while, and he seems to have a pretty cool site about video related stuff. Great!

24 December 2007, 17:14   #33
meynaf
son of 68k

Join Date: Nov 2007
Location: Lyon / France
Age: 44
Posts: 2,459
Does that guy have asm code for YCbCr -> RGB conversion ?
What, no 68k version ? Well, ok, I'll do it... That's the next thing I've spotted that's not too difficult and can give us an important speed increase.

I promise I won't use the term yuv if it's digital video

29 December 2007, 03:18   #34
Thorham
Computer Nerd

Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 41
Posts: 2,972
Quote:
 Originally Posted by meynaf Does that guy have asm code for YCbCr -> RGB conversion ? What, no 68k version ? Well, ok, I'll do it... That's the next thing I've spotted that's not too difficult and can give us an important speed increase. I promise I won't use the term yuv if it's digital video
That code is a cake walk, as you've probably found out by now. I've done a version in free basic, and it was pretty simple.

Going off-topic a bit now. Ultimately I'm still wondering if there isn't a plain and simple way to effectively crunch gfx, something which doesn't require 'advanced' math, and can be implemented algorithmically. Surely something is possible, it's not as if everything has been thought of in the wonderful world of algorithms (your ham rendering engine seems to be a good example, haven't seen it before).

Last edited by Thorham; 29 December 2007 at 03:36.

04 January 2008, 12:12   #35
meynaf
son of 68k

Join Date: Nov 2007
Location: Lyon / France
Age: 44
Posts: 2,459
Of course the YCbCr->RGB conversion is simple. But it becomes more interesting if you try to do it without multiplies

The problem of gfx data is that it doesn't crunch very well. The more colourful it is, the worse the compression will be.
The one and only thing I see here is that we could exploit the fact that adjacent pixels are often close in color ; they're closer in YCbCr than in RGB but changing color spaces is a lossy process because of rounding errors.

My ham engine is more standard issue than it looks. I'm pretty sure nearly all high quality renderers use the very same algorithm.

04 January 2008, 19:36   #36
Thorham
Computer Nerd

Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 41
Posts: 2,972
Quote:
 Originally Posted by meynaf Of course the YCbCr->RGB conversion is simple. But it becomes more interesting if you try to do it without multiplies
Now that's a very good idea, meynaf. Can't wait to try it! Of course I'm going to try it in freebasic first (more convenient because of easy to use gfx functions which all just work in 24bit).
Quote:
 Originally Posted by meynaf The problem of gfx data is that it doesn't crunch very well. The more colourful it is, the worse the compression will be.
True, but it also applies to gray scale: The more visible detail the less compression.
Quote:
 Originally Posted by meynaf The one and only thing I see here is that we could exploit the fact that adjacent pixels are often close in color ; they're closer in YCbCr than in RGB but changing color spaces is a lossy process because of rounding errors.
That's true. There is a problem with this, though. I've tried (in freebasic, what else ) scaling the color information by 50%x50%, then, even with interpolation, the final rendered image is not as good as the original because aliasing is introduced. Could be fixed by somehow not scaling the whole image, just parts of it. And, of course, my interpolation algorithm isn't the best, although it does yield better results than none at all.
Quote:
 Originally Posted by meynaf My ham engine is more standard issue than it looks. I'm pretty sure nearly all high quality renderers use the very same algorithm.
Oh, I hadn't realized this. Probably because I use Adpro as a quality reference, and this program is terribly slow, as you know, while your program is very fast

14 January 2008, 16:30   #37
meynaf
son of 68k

Join Date: Nov 2007
Location: Lyon / France
Age: 44
Posts: 2,459
Quote:
 Originally Posted by Thorham Now that's a very good idea, meynaf. Can't wait to try it! Of course I'm going to try it in freebasic first (more convenient because of easy to use gfx functions which all just work in 24bit).
You could do it right in asm, because else it's already done. Look in jdcolor.c to see how.

I tried this already, but it was quite disappointing. I ran into a lot of bugs (you can't imagine the -ahem- beautiful images I've seen) and it didn't give a good speed gain. Too many tables to peek (4), 3 data sources for 1 destination -> too many address registers used -> too much swapping. Gosh !
Maybe you'll be more lucky if you give it a try...
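
For anyone who doesn't have jdcolor.c at hand, here is a condensed C sketch of the four-table approach it uses, written from memory, so treat the details as assumptions. The names `floor_shift`, `clamp255` and `ycc_to_rgb_packed` are made up here, and the clamp uses compares instead of the range-limit table to keep the example short.

```c
#include <stdint.h>

#define SCALEBITS 16
#define ONE_HALF  ((int32_t)1 << (SCALEBITS - 1))
#define FIX(x)    ((int32_t)((x) * (1L << SCALEBITS) + 0.5))

static int32_t cr_r_tab[256]; /*  1.40200 * (Cr - 128), already rounded */
static int32_t cb_b_tab[256]; /*  1.77200 * (Cb - 128), already rounded */
static int32_t cr_g_tab[256]; /* -0.71414 * (Cr - 128), scaled by 2^16 */
static int32_t cb_g_tab[256]; /* -0.34414 * (Cb - 128), scaled, plus bias */

/* Floor shift that does not rely on >> of negative values. */
static int32_t floor_shift(int32_t v, int n)
{
    return v >= 0 ? v >> n : -((-v + ((int32_t)1 << n) - 1) >> n);
}

static uint8_t clamp255(int32_t v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* All multiplies happen here, once, instead of once per pixel. */
void build_ycc_tables(void)
{
    for (int i = 0; i < 256; i++) {
        int32_t x = i - 128;
        cr_r_tab[i] = floor_shift(FIX(1.40200) * x + ONE_HALF, SCALEBITS);
        cb_b_tab[i] = floor_shift(FIX(1.77200) * x + ONE_HALF, SCALEBITS);
        cr_g_tab[i] = -FIX(0.71414) * x;
        cb_g_tab[i] = -FIX(0.34414) * x + ONE_HALF;
    }
}

/* Per pixel: four table reads, a few adds, one shift, three clamps. */
void ycc_to_rgb(uint8_t y, uint8_t cb, uint8_t cr, uint8_t rgb[3])
{
    rgb[0] = clamp255(y + cr_r_tab[cr]);
    rgb[1] = clamp255(y + floor_shift(cb_g_tab[cb] + cr_g_tab[cr], SCALEBITS));
    rgb[2] = clamp255(y + cb_b_tab[cb]);
}

/* Convenience for quick checks: pack the result as 0xRRGGBB. */
uint32_t ycc_to_rgb_packed(uint8_t y, uint8_t cb, uint8_t cr)
{
    uint8_t rgb[3];
    ycc_to_rgb(y, cb, cr, rgb);
    return ((uint32_t)rgb[0] << 16) | ((uint32_t)rgb[1] << 8) | rgb[2];
}
```

The two green tables stay at full 16-bit fraction so that their sum is rounded only once, which is the accuracy that byte-sized tables give up.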

Quote:
 Originally Posted by Thorham True, but it also applies to gray scale: The more visible detail the less compression.
Of course.

Quote:
 Originally Posted by Thorham That's true. There is a problem with this, though. I've tried (in freebasic, what else ) scaling the color information by 50%x50%, then, even with interpolation, the final rendered image is not as good as the original because aliasing is introduced. Could be fixed by somehow not scaling the whole image, just parts of it. And, of course, my interpolation algorithm isn't the best, although it does yield better results than none at all.
You're doing here something jpeg already does. They're using a triangle filter for upsampling, maybe you can try that too (look in jdsample.c for more info).
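
As a rough idea of what that triangle filter does for the horizontal 2:1 case, here is a C sketch modelled on my recollection of the "fancy" upsampler in jdsample.c. The function names are invented for this example and the edge handling is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Double the width of one chroma row with a triangle filter: each output
   pixel is 3/4 of the nearer input sample plus 1/4 of the further one.
   The rounding biases (+1, +2) alternate so errors do not pile up.
   in has width samples (width >= 1), out must hold 2 * width. */
void upsample_h2_triangle(const uint8_t *in, size_t width, uint8_t *out)
{
    if (width == 1) {               /* nothing to interpolate with */
        out[0] = out[1] = in[0];
        return;
    }
    /* First column: no left neighbour, so the outer pixel is a copy. */
    out[0] = in[0];
    out[1] = (uint8_t)((in[0] * 3 + in[1] + 2) >> 2);

    for (size_t i = 1; i + 1 < width; i++) {
        int v = in[i] * 3;
        out[2 * i]     = (uint8_t)((v + in[i - 1] + 1) >> 2);
        out[2 * i + 1] = (uint8_t)((v + in[i + 1] + 2) >> 2);
    }
    /* Last column: no right neighbour. */
    out[2 * width - 2] = (uint8_t)((in[width - 1] * 3 + in[width - 2] + 1) >> 2);
    out[2 * width - 1] = in[width - 1];
}

/* Demo: upsample the row {0, 100, 200} and return one output sample. */
int upsample_demo(size_t index)
{
    static const uint8_t row[3] = { 0, 100, 200 };
    uint8_t out[6];
    upsample_h2_triangle(row, 3, out);
    return out[index];
}
```

The multiply by 3 is a shift and an add in asm, so the filter itself needs no muls.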

I'm more interested in lossless compression, though. What if you could predict what the image will be by computing it with whatever you've already decoded, and only store the difference between the reality and your prediction ?
(this has already been applied to audio, but afaik not to gfx data)
(and, oh, yes, I don't have a clue on the predictors to use )
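
A minimal C sketch of that idea, with one simple predictor picked purely as an assumption: the average of the left and upper neighbours, which is the same idea as the Average filter in PNG. All function names are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Predictor: average of the left and upper neighbours, both of which the
   decoder already has. Pixels outside the image count as 0. */
static uint8_t predict(const uint8_t *img, size_t w, size_t x, size_t y)
{
    unsigned left = x > 0 ? img[y * w + x - 1] : 0;
    unsigned up   = y > 0 ? img[(y - 1) * w + x] : 0;
    return (uint8_t)((left + up) / 2);
}

/* Encoder: store only (actual - prediction), modulo 256. */
void encode_residuals(const uint8_t *img, size_t w, size_t h, uint8_t *res)
{
    for (size_t y = 0; y < h; y++)
        for (size_t x = 0; x < w; x++)
            res[y * w + x] = (uint8_t)(img[y * w + x] - predict(img, w, x, y));
}

/* Decoder: rebuild pixels in the same order, so each prediction only
   uses pixels that are already reconstructed. */
void decode_residuals(const uint8_t *res, size_t w, size_t h, uint8_t *img)
{
    for (size_t y = 0; y < h; y++)
        for (size_t x = 0; x < w; x++)
            img[y * w + x] = (uint8_t)(res[y * w + x] + predict(img, w, x, y));
}

/* Round-trip check on a small gradient: returns 1 if decoding is exact. */
int roundtrip_is_lossless(void)
{
    enum { W = 4, H = 3 };
    uint8_t src[W * H], res[W * H], dst[W * H];
    for (size_t i = 0; i < W * H; i++)
        src[i] = (uint8_t)(10 + 3 * i);
    encode_residuals(src, W, H, res);
    decode_residuals(res, W, H, dst);
    for (size_t i = 0; i < W * H; i++)
        if (src[i] != dst[i])
            return 0;
    return 1;
}
```

The residuals by themselves save nothing; the gain only shows up when they are fed to an entropy coder, because on smooth images most of them cluster around zero.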

Quote:
 Originally Posted by Thorham Oh, I hadn't realized this. Probably because I use Adpro as a quality reference, and this program is terribly slow, as you know, while your program is very fast
FastJpeg also uses a similar algorithm, but probably not Visage because its rendering is quite ugly (though it is fast).

Adpro probably makes a lot of analysis to adapt its palette before rendering, which is terribly slow (that's why I didn't want to do it too).
Furthermore, if it's 100% compiled code then it's likely to be up to 4x slower...

Answer here for the viewer options (off-topic in the mpega thread ) : just list a few here and we'll see. Maybe they're already planned.

About the scaling, there is something that annoys me quite a lot : what to do on a palettized display ? Skipping pixels will be ugly and we don't have enough colors to get all the rgb combinations an average would make, but ham display may be even uglier than pixel skipping on some images.

15 January 2008, 04:50   #38
Thorham
Computer Nerd

Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 41
Posts: 2,972
Quote:
 Originally Posted by meynaf You could do it right in asm, because else it's already done. Look in jdcolor.c to see how. I tried this already, but it was quite disappointing. I ran into a lot of bugs (you can't imagine the -ahem- beautiful images I've seen) and it didn't give a good speed gain. Too many tables to peek (4), 3 data sources for 1 destination -> too many address registers used -> too much swapping. Gosh ! Maybe you'll be more lucky if you give it a try...
I gave the color space conversion a go, and this is what I came up with:
Code:
```
    move.l In_Y,a0
    move.l In_CB,a1
    move.l In_CR,a2
    move.l Out,a3
    move.l CB_Table,a4
    move.l CR_Table,a5
    move.l Range_Table,a6
    move.l YCBCR_Buf_Size,d5

;Free regs: d6 and d7

.lp moveq  #0,d0
    moveq  #0,d1
    moveq  #0,d2

    move.b (a0)+,d0      ;Y
    move.b (a1)+,d1      ;CB
    move.b (a2)+,d2      ;CR

    move.b (a4,d1.l),d1  ;0.34414*Cb-128
    move.b (a5,d2.l),d2  ;0.71414*Cr-128

    move.l d0,d4         ;Calc green
    sub.l  d1,d4
    sub.l  d2,d4
    move.b (a6,d4.l),d4  ;Clip green
    move.b d4,(a3)+      ;Write green

    move.l d2,d4         ;Calc blue
    move.b (a6,d4.l),d4  ;Clip blue
    move.b d4,(a3)+      ;Write blue

    move.b (a6,d0.l),d0  ;Clip red
    move.b d0,(a3)+      ;Write red

    subq.l #1,d5
    bpl    .lp
```
The code is pretty simple. It uses two multiplication tables, each with a subtraction (-128). It then just calculates green first and uses the old Cb and Cr values again by multiplying them by 5 and 2 respectively. This gives a good approximation of the values needed to calculate red and blue, and gets rid of some table reading. After that the rgb values just need to be clipped to fit in the range 0-255. Like the c code, I use a table for this, but it might be faster to use compares instead. I don't know, because I can't test it!

About the approximations: I've tested these with my YCbCr program in freebasic (damned handy), and the color differences are quite small, meaning the images look great. You literally have to see the original next to the encoded version to see any difference, otherwise you'd think it's the original! I've tested it with a straight gray scale in rgb as well, and there is no difference whatsoever. If this code is faster than what you've tried, it's perfect for fast viewing in high quality.
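
To put a number on "quite small", here is a C sketch that compares the x5 / x2 shortcut with the usual coefficients over all 256 chroma values. It reflects one reading of the description above (5 x 0.34414 = 1.7207 standing in for 1.772 on blue, 2 x 0.71414 = 1.4283 standing in for 1.402 on red); the function names and the rounding of the table entries are assumptions.

```c
/* Round to nearest without needing libm. */
static int round_to_int(double v)
{
    return (int)(v < 0 ? v - 0.5 : v + 0.5);
}

/* Green-channel table entries, stored as small integer offsets. */
static int green_cb(int cb) { return round_to_int(0.34414 * (cb - 128)); }
static int green_cr(int cr) { return round_to_int(0.71414 * (cr - 128)); }

/* Approximate blue and red offsets by reusing the green tables. */
int blue_offset_approx(int cb) { return 5 * green_cb(cb); }
int red_offset_approx(int cr)  { return 2 * green_cr(cr); }

/* Reference offsets from the usual coefficients. */
int blue_offset_exact(int cb) { return round_to_int(1.772 * (cb - 128)); }
int red_offset_exact(int cr)  { return round_to_int(1.402 * (cr - 128)); }

/* Largest absolute difference over all 256 chroma values. */
int max_blue_error(void)
{
    int worst = 0;
    for (int c = 0; c < 256; c++) {
        int e = blue_offset_approx(c) - blue_offset_exact(c);
        if (e < 0) e = -e;
        if (e > worst) worst = e;
    }
    return worst;
}

int max_red_error(void)
{
    int worst = 0;
    for (int c = 0; c < 256; c++) {
        int e = red_offset_approx(c) - red_offset_exact(c);
        if (e < 0) e = -e;
        if (e > worst) worst = e;
    }
    return worst;
}
```

For gray input (Cb = Cr = 128) every offset is zero, which fits the observation that a gray scale comes through unchanged. The error grows with the distance of the chroma from 128, so it is the most saturated colours that drift.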

However, you will have to integrate the code yourself, and although I've tested the approximation, the asm code itself is untested and may contain a bug here and there. Nothing serious, though. Should be easy to fix if there are any. Also due to data format differences the code may not work as is, but I suppose this should still give you a good idea of what can be done.
Quote:
 Originally Posted by meynaf You're doing here something jpeg already does. They're using a triangle filter for upsampling, maybe you can try that too (look in jdsample.c for more info). I'm more interested in lossless compression, though. What if you could predict what the image will be by computing it with whatever you've already decoded, and only store the difference between the reality and your prediction ? (this has already been applied to audio, but afaik not to gfx data) (and, oh, yes, I don't have a clue on the predictors to use )
Ah, triangular interpolation eh? I'm going to do some yahooing on that one.

Lossless is indeed interesting as an extra option. Trying to predict the data is interesting, too. Hadn't thought of that. I'm going to have to see if I can come up with anything.
Quote:
 Originally Posted by meynaf Adpro probably makes a lot of analysis to adapt its palette before rendering, which is terribly slow (that's why I didn't want to do it too). Furthermore, if it's 100% compiled code then it's likely to be up to 4x slower...
Yes, it does. And it's indeed compiled as far as I know, with a compiler from 1992... Yep, it doesn't get any slower than that

Quote:
 Originally Posted by meynaf Answer here for the viewer options (off-topic in the mpega thread ) : just list a few here and we'll see. Maybe they're already planned.
As I think of options, I'll post them here. No problem.
Quote:
 Originally Posted by meynaf About the scaling, there is something that annoys me quite a lot : what to do on a palettized display ? Skipping pixels will be ugly and we don't have enough colors to get all the rgb combinations an average would make, but ham display may be even uglier than pixel skipping on some images.
Skipping just sucks rocks. One way is to convert the data to rgb while scaling, then just render to ham. Should be ok. Another one is to do the same and count how many times each color is used during scaling. Then quick sort the table. Since the table is max 256 entries, this should be fast. Once that's done, use the 64 most frequent colors as the ham palette. Obviously, the first method is faster, but will never look as good as the original 256 color image. I do doubt it will look bad, though. This is the best I can come up with, since ham will be the only way, unless you want to do high quality rgb to 256 color conversion, which will never be as fast.
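
The second method can be sketched in C roughly like this. The names `ColorUse`, `pick_base_colors` and `demo_base_color` are invented for the example, and the tie-break on the palette index is an arbitrary choice to keep the result deterministic.

```c
#include <stdlib.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t  index; /* palette entry of the source image */
    uint32_t count; /* how many pixels use it */
} ColorUse;

/* qsort comparator: most frequently used colours first. */
static int by_count_desc(const void *a, const void *b)
{
    const ColorUse *ca = a, *cb = b;
    if (ca->count != cb->count)
        return ca->count < cb->count ? 1 : -1;
    return (int)ca->index - (int)cb->index; /* deterministic tie-break */
}

/* Fill base[0..63] with the 64 most used palette indices. */
void pick_base_colors(const uint8_t *pixels, size_t n, uint8_t base[64])
{
    ColorUse use[256];
    for (int i = 0; i < 256; i++) {
        use[i].index = (uint8_t)i;
        use[i].count = 0;
    }
    for (size_t i = 0; i < n; i++)
        use[pixels[i]].count++;
    qsort(use, 256, sizeof use[0], by_count_desc);
    for (int i = 0; i < 64; i++)
        base[i] = use[i].index;
}

/* Demo on six pixels: returns the palette index ranked at position rank. */
int demo_base_color(int rank)
{
    static const uint8_t pixels[] = { 7, 7, 7, 200, 200, 3 };
    uint8_t base[64];
    pick_base_colors(pixels, sizeof pixels, base);
    return base[rank];
}
```

Counting can be done while scaling, so the only extra cost is the sort of 256 entries.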

15 January 2008, 11:58   #39
meynaf
son of 68k

Join Date: Nov 2007
Location: Lyon / France
Age: 44
Posts: 2,459
Quote:
 Originally Posted by Thorham I gave the color space conversion a go, and this is what I came up with
You came up with something very interesting at your first try

Quote:
 Originally Posted by Thorham The code is pretty simple. It uses two multiplication tables, each with a subtraction (-128). It then just calculates green first and uses the old Cb and Cr values again by multiplying them by 5 and 2 respectively. This gives a good approximation of the values needed to calculate red and blue, and gets rid of some table reading. After that the rgb values just need to be clipped to fit in the range 0-255.
Hmm... I admit I don't like losses... I noted that there is further accuracy loss as compared to the original code, because the adds for green pixels (a*Cb + b*Cr) were done with 32-bit fixed-point values, not bytes.

Quote:
 Originally Posted by Thorham Like the c code, I use a table for this, but it might be faster to use compares instead. I don't know, because I can't test it!
I didn't really test it, but from the timings I know, compares would be slower. Or can you do a range-limit in less than 11 clock cycles ???
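
For comparison, here are the two clipping variants as a C sketch. The table layout (256 entries of margin on each side) and all names are assumptions made for the example, not the layout of the range_limit tables in the real code.

```c
#include <stdint.h>

#define MARGIN 256 /* how far below 0 or above 255 an index may stray */

static uint8_t limit_storage[MARGIN + 256 + MARGIN];
/* Pointer to the entry for value 0, so negative indices are valid. */
static const uint8_t *range_limit = limit_storage + MARGIN;

void build_range_limit(void)
{
    for (int i = 0; i < MARGIN + 256 + MARGIN; i++) {
        int v = i - MARGIN;
        limit_storage[i] = (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
    }
}

/* Table version: one indexed read, no branches.
   v must stay within [-MARGIN, 255 + MARGIN]. */
uint8_t clip_table(int v)
{
    return range_limit[v];
}

/* Compare version: two tests per value. */
uint8_t clip_compare(int v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return (uint8_t)v;
}
```

In asm the table version is a single move with an indexed source, which is why it is hard to beat with compares and branches.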

Quote:
 Originally Posted by Thorham About the approximations: I've tested these with my YCbCr program in freebasic (damned handy), and the color differences are quite small, meaning the images look great. You literally have to see the original next to the encoded version to see any difference, otherwise you'd think it's the original! I've tested it with a straight gray scale in rgb as well, and there is no difference whatsoever. If this code is faster than what you've tried, it's perfect for fast viewing in high quality.
To be acceptable, such a loss must make the thing much faster.
I've counted clock cycles (including pipeline) and you should get something like 123 of them per pixel.
My current code runs in 125 or so, with full accuracy. Not worth changing.
But you can get rid of 8 of them, going down to 115, if you replace :
- subq/bpl by a dbf (-2)
- move.b (a6,dn.l),dn / move.b dn,(a3)+ by move.b (a6,dn.l),(a3)+ (-2, 3 times)

However it's still not enough IMHO (a bit less than 10%, but much less for the overall speed). A 40% gain could be good though.

Quote:
 Originally Posted by Thorham However, you will have to integrate the code yourself, and although I've tested the approximation, the asm code itself is untested and may contain a bug here and there. Nothing serious, though. Should be easy to fix if there are any. Also due to data format differences the code may not work as is, but I suppose this should still give you a good idea of what can be done.
The data formats are the same, except that you must write red, then green, then blue, not red last.

Here is my code, should you spot something that can be done to accelerate it :
Code:
```
; parameters :
;    a0=input_buf[0]+input_row
;    a1=input_buf[1]+input_row
;    a2=input_buf[2]+input_row
;    a3=output_buf
;    d7=num_rows
;    d6=cinfo->output_width
ycc_rgb_convert
    subq.w #1,d6        ; dbf loops count+1 times
    moveq #0,d0
    move.l #$100,d1     ; with *8, we will go to $800 bytes after the 1st array
    moveq #0,d2         ; (which will make us point to the 2nd array)
.yloop
    move.l d6,d5
    move.l (a0)+,a4     ; row pointer from input_buf[0]
    move.l (a1)+,a5     ; row pointer from input_buf[1]
    move.l (a2)+,a6     ; row pointer from input_buf[2]
    movem.l a0-a3,-(a7)
    lea cxtab,a0
    lea range_limit2+$180,a2
    move.l (a3),a3
.xloop
; inner loop
    move.b (a4)+,d0
    move.b (a5)+,d1
    move.b (a6)+,d2
    lea (a0,d2.w*8),a1
    move.l (a1)+,d4
    move.l (a1),d3
    move.b (a2,d3.w),(a3)+
    lea (a0,d1.w*8),a1
    swap d2
    move.b (a2,d2.w),(a3)+
    move.l d0,d3
    moveq #0,d2
    move.b (a2,d3.w),(a3)+
    dbf d5,.xloop
; end of inner loop
    movem.l (a7)+,a0-a3
    subq.l #1,d7
    bne.s .yloop
    rts
```
Note : I'm using 4 arrays, of which 2 are interleaved, all with the same pointer. Of course only the inner loop really has to be optimized.
Other note : this one has been tested and works. But it doesn't give as much gain as we could have expected...
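The addressing trick from the comments at the top of the routine (preloading $100 into d1 so that the *8 scale lands $800 bytes into the table) can be modelled in a few lines of C. This is just a sketch of the index arithmetic; the function names are invented, and what each table entry actually holds is not modelled.

```c
#include <stdint.h>

/* Model of move.b <value>,dn: only the low byte of the register changes. */
static uint32_t load_low_byte(uint32_t reg, uint8_t value)
{
    return (reg & 0xFFFFFF00u) | value;
}

/* Model of the (a0,dn.w*8) index: low word of the register times 8.
   Sign extension is ignored because the indices used here stay positive. */
static uint32_t table_offset(uint32_t reg)
{
    return (reg & 0xFFFFu) * 8u;
}
```

A register cleared with moveq #0 gives offsets 0..$7F8, and one preloaded with $100 gives $800..$FF8, so a single base pointer reaches both blocks without any extra add in the inner loop.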

Quote:
 Originally Posted by Thorham Ah, triangular interpolation eh? I'm going to do some yahooing on that one.
I've started this one in asm. Very funny to do (*cough*).

Quote:
 Originally Posted by Thorham Lossless is indeed interesting as an extra option. Trying to predict the data is interesting, too. Hadn't thought of that. I'm going to have to see if I can come up with anything.
Sure, this is an area where little has been done.
Quote:
 Originally Posted by Thorham Yes, it does. And it's indeed compiled as far as I know, with a compiler from 1992... Yep, it doesn't get any slower than that
I'm sure it can be made slower

Quote:
 Originally Posted by Thorham As I think of options, I'll post them here. No problem.
I'm waiting...

Quote:
 Originally Posted by Thorham Skipping just sucks rocks. One way, is to convert the data to rgb while scaling, then just render to ham. Should be ok. Another one is to do the same and count how many times each color is used during scaling. Then quick sort the table. Since the table is max 256 entries, this should be fast. Once that's done, use the 64 most frequent colors as the ham palette. Obviously, the first method is faster, but will never look as good as the original 256 color image. I do doubt it will look bad, though. This is the best I can come up with, since ham will be the only way, unless you want to do high quality rgb to 256 color conversion, which will never be as fast.
What I fear is the brutal color changes when rendering to ham : they're often too visible.
And when scaling down an iff, you have to p2c it, then scale it, then count colors to get the most used ones, then write them back into a buffer, then c2p it. Ouch ! Ilbm displaying has never been so slow !
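The count-and-sort idea from the quote could look like this in C. It is a sketch only, assuming the image is already in chunky 8-bit form; the struct and function names are invented for the example.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    uint32_t count;   /* how many pixels use this palette entry */
    uint8_t  index;   /* the palette entry itself */
} ColorUse;

/* Sort by descending count; ties go to the lower palette index. */
static int by_count_desc(const void *a, const void *b)
{
    const ColorUse *x = a, *y = b;
    if (x->count != y->count)
        return x->count < y->count ? 1 : -1;
    return (int)x->index - (int)y->index;
}

/* pixels: chunky 8-bit palette indices.
   out: receives the 64 most used palette indices, most used first. */
static void pick_base_colors(const uint8_t *pixels, size_t n, uint8_t out[64])
{
    ColorUse use[256];

    for (int i = 0; i < 256; i++) {
        use[i].count = 0;
        use[i].index = (uint8_t)i;
    }
    for (size_t i = 0; i < n; i++)
        use[pixels[i]].count++;

    qsort(use, 256, sizeof use[0], by_count_desc);

    for (int i = 0; i < 64; i++)
        out[i] = use[i].index;
}
```

The sort itself is cheap with only 256 entries; the cost discussed above is in the p2c, scale and c2p passes around it.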

15 January 2008, 21:32   #40
Thorham
Quote:
 Originally Posted by meynaf You came up with something very interesting at your first try
Thank you!
Quote:
 Originally Posted by meynaf Hmm... I admit I don't like losses... I noted that there is further accuracy loss as compared to the original code, because the adds for green pixels (a*Cb + b*Cr) were done with 32-bit fixed-point values, not bytes.
I know you don't, but you really have to see this in action, so I've made a bunch of test images and put them in the zone. The images are in 24bit png format. Each image has the original image on the right and the approximation on the left. Both images are 640x512 and have been fitted in 1280x512 to make it easy to compare them. Note that the originals were all in the jpeg format.
Quote:
 Originally Posted by meynaf I didn't really test it, but from the timings I know, compares would be slower. Or can you do a range-limit in less than 11 clock cycles ???
No, I don't think it can be done, since that would require a cmp, three branches and two moves.
Quote:
 Originally Posted by meynaf To be acceptable, such a loss must make the thing much faster. I've counted clock cycles (including pipeline) and you should get something like 123 of them per pixel. My actual code runs in 125 or so, with full accuracy. Not worth changing. But you can get rid of 8 of them, going down to 115, if you replace : - subq/bpl by a dbf (-2) - move.b (a6,dn.l),dn / move.b dn,(a3)+ by move.b (a6,dn.l),(a3)+ (-2, 3 times) However it's still not enough IMHO (a bit less than 10%, but much less for the overall speed). A 40% gain could be good though.
As you have seen, the loss is hardly noticeable, if at all. IMHO this is quite acceptable.

Silly me, I forgot about the moves. The subq/bpl cannot be changed to dbf, since dbf works on words and the input can be larger than 64kb. Unless the 68030 can handle 32bit dbf (wouldn't be surprised). On the other hand, you could just use two of them, since there are two unused data regs.
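To show why two counters are enough, here is a small C model. It assumes the usual dbf behaviour (decrement the low word, fall through when it wraps to -1) and splits count-1 into a low and a high word; the function names are made up for the example.

```c
#include <stdint.h>

/* Model of dbf: decrement a 16-bit counter and report "branch taken"
   unless the counter has just wrapped to 0xFFFF (that is, -1). */
static int dbf(uint16_t *counter)
{
    *counter = (uint16_t)(*counter - 1);
    return *counter != 0xFFFF;
}

/* Run a loop body n times (n >= 1) using two nested 16-bit counters. */
static uint32_t count_iterations(uint32_t n)
{
    uint32_t m    = n - 1;
    uint16_t lo   = (uint16_t)(m & 0xFFFF);   /* inner counter */
    uint16_t hi   = (uint16_t)(m >> 16);      /* outer counter */
    uint32_t done = 0;

    do {
        do {
            done++;               /* stands in for the per-pixel work */
        } while (dbf(&lo));       /* inner dbf */
    } while (dbf(&hi));           /* outer dbf, lo is 0xFFFF here */

    return done;
}
```

The first inner pass runs lo+1 times and every later pass runs 65536 times, so the total is (lo+1) + hi*65536 = n, and the second dbf is only executed once per 65536 pixels.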
Quote:
 Originally Posted by meynaf The data formats are the same, except that you must write red, then green, then blue, not red last.
Since we're rendering to ham, the order of the gun colors is not important:
Code:
```
;Code in ham rendering routine:
move.b (a0)+,d1  ;red
move.b (a0)+,d2  ;green
move.b (a0)+,d3  ;blue

;Changes to:
move.b (a0)+,d2  ;green
move.b (a0)+,d3  ;blue
move.b (a0)+,d1  ;red
```
This doesn't affect the rest of the ham rendering routine at all.

I am a bit surprised I got the data formats just right, I was quite unsure about it. Cool
Quote:
 Originally Posted by meynaf Here is my code, should you spot something that can be done to accelerate it :
Before I pain my brain, I want to know what you think about the acceptability of the losses. If you like it, you might be able to come up with a faster method than the one used in the c code!
Quote:
 Originally Posted by meynaf I've started this one in asm. Very funny to do (*cough*).
Good luck
Quote:
 Originally Posted by meynaf Sure, this is an area where little has been done.
After thinking about it, I came to the conclusion that this is a bit like adaptive interpolation. But I still have to try it.
Quote:
 Originally Posted by meynaf What I fear is the brutal color changes when rendering to ham : they're often too much visible.
The only way to know, is to try it, you might be in for a surprise. The thing is, you have to convert to full rgb anyway. This just makes the image 24bit. There will be differences, but I really doubt they're going to be very big. But again, only way to know is to try it.

Edited: Testing will be easy. All you have to do is convert 256 color images to jpeg in the highest quality setting, and use your viewer to see what it looks like! Furthermore, I tried ADPro's ham rendering on 256 color images, and although there is a loss, it's really not bad. However, that is to be expected, and it can't be helped. I've also tried it with Visage, and that is just plain ugly. Since your ham rendering routine is much better, it might just be ok. If you don't have the time, I can make some test images for you, since I've got a whole bunch of 256 color bmps which I ripped from the Final Fantasy 6 Playstation cd edition.

Quote:
 Originally Posted by meynaf And when scaling down an iff, you have to p2c it, then scale it, then count colors to get the most used ones, then write them back into a buffer, then c2p it. Ouch ! Ilbm displaying has never been so slow !
Yep, that's true (although counting can be done while scaling). However, don't you agree 24bit iffs are a silly format? I mean, planar 24bit? I don't think there's any hardware capable of displaying this directly. It's all chunky rgb. IMHO 24bit iff should never have been created, and is not worthy of being supported. I know some amiga software uses it, but it's far better to store images as bmp or png.

As for iffs up to 8bit per pixel, these are typical amiga format images, and probably none of them need scaling. For those, scaling would be optional, and I seriously doubt anyone would use it.

Last edited by Thorham; 15 January 2008 at 22:20.
