jpeg decoding in full asm - Page 3

meynaf · 16 January 2008, 13:35

Quote:

Originally Posted by Thorham

I know you don't, but you really have to see this in action, so I've made a bunch of test images and put them in the zone. The images are in 24bit png format. Each image has the original image on the right and the approximation on the left. Both images are 640x512 and have been fitted into 1280x512 to make them easy to compare. Note that the originals were all in the jpeg format.

I've seen some differences, but not many. Will we see the speed gain ?

Quote:

Originally Posted by Thorham

No, I don't think it can be done since that would require a cmp, three branches and two moves.

Branches aren't necessary, but it's still slower than a table lookup (in the following code, assume we've just read the value into d0 so the N flag reflects its sign, and d6 contains the $100 constant):

Code:

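spl d5          ; d5.b = $FF if N is clear (value >= 0), else $00
cmp.l d6,d0     ; compare the value against $100
subx.b d7,d7    ; d7.b = $00 if X=0, $FF if X=1
and.b d5,d0     ; zero the byte when the value was negative
or.b d7,d0      ; force $FF when the overflow mask is set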

It's not much slower, but it is slower (12 cycles); with a table lookup we can also write the value directly (a 2-cycle gain), which we can't do here.
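For readers who don't read 68k, here is a rough C model of what the mask sequence is meant to compute, next to a table-based variant. It is only a sketch: the function names and the assumed input range of the table are made up for illustration and do not come from the viewer's source. One caveat worth checking before reusing the asm literally: on the 68000, cmp updates N, Z, V and C but leaves X untouched, so the overflow mask may need to come from an instruction that does set X (or from an Scc). The C below models the intended result only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clamp a signed value to 0..255 using two byte
   masks instead of explicit branches, mirroring the intent of the asm. */
static uint8_t range_limit_masks(int32_t v)
{
    uint8_t keep = (uint8_t)(-(v >= 0));     /* like spl: 0xFF when v is not negative */
    uint8_t sat  = (uint8_t)(-(v >= 0x100)); /* 0xFF when v does not fit in a byte */
    return (uint8_t)(((uint8_t)v & keep) | sat);
}

/* Table variant. The input range is an assumption made for this sketch. */
enum { RANGE_MIN = -256, RANGE_MAX = 511 };
static uint8_t range_table[RANGE_MAX - RANGE_MIN + 1];

static void init_range_table(void)
{
    for (int32_t v = RANGE_MIN; v <= RANGE_MAX; v++)
        range_table[v - RANGE_MIN] = range_limit_masks(v);
}

static uint8_t range_limit_table(int32_t v)
{
    assert(v >= RANGE_MIN && v <= RANGE_MAX); /* outside the assumed range */
    return range_table[v - RANGE_MIN];
}
```

The table version does a single indexed read per sample, which is why it wins on cycle count as long as the input range is bounded.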

Range-limit contest : who can do a faster one ?

Quote:

Originally Posted by Thorham

As you have seen, the loss is hardly noticeable, if at all. IMHO this is quite acceptable.

One acceptable loss + one acceptable loss + one acceptable loss = unacceptable losses.
Jpeg is lossy enough by itself, and the ham rendering adds its own losses. I want quality, you know. Frankly I would accept the deal without hesitation... if it were the other way round (a little slower -> a little more quality).

Quote:

Originally Posted by Thorham

Silly me, I forgot about the moves

The subq/bpl can not be changed to dbf since dbf works on words, and the input can be larger than 64kb.

Unless 68030 can handle 32bit dbf (wouldn't be surprised). On the other hand you could just use two of them, since there are two unused data regs.

The input will never be larger than 64kb because we'll never do more than one scanline at a time, and the jpeg lib limits those to 65500 pixels. So a dbf is ok.
And, no, the 030 can't handle 32-bit dbfs, but you can still do dbf followed by sub.l #$10000/bcc on the same register.
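The dbf followed by sub.l #$10000/bcc idea can be checked with a small C model. This is a sketch under the usual convention that the counter register starts at count-1; the function name is hypothetical, and the model only counts how often the loop body would run.

```c
#include <assert.h>
#include <stdint.h>

/* Models:  loop: <body> / dbf d0,loop / sub.l #$10000,d0 / bcc loop
   and returns how many times <body> runs for a given initial d0. */
static uint64_t dbf32_iterations(uint32_t d0)
{
    uint64_t runs = 0;
    for (;;) {
        runs++;                               /* the loop body */
        uint16_t low = (uint16_t)(d0 - 1);    /* dbf decrements the low word only */
        d0 = (d0 & 0xFFFF0000u) | low;
        if (low != 0xFFFFu)
            continue;                         /* dbf branches until the low word hits -1 */
        int borrow = d0 < 0x10000u;           /* sub.l sets carry when the high word was 0 */
        d0 -= 0x10000u;                       /* low word stays $FFFF for the next round */
        if (borrow)
            break;                            /* bcc not taken: loop is done */
    }
    return runs;
}
```

In this model the body runs d0+1 times for any 32-bit start value, and the extra sub.l/bcc pair is only paid once every 65536 iterations.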

Quote:

Originally Posted by Thorham

Since we're rendering to ham, the order of the gun colors is not important:

Code:

;Code in ham rendering routine:
    move.b (a0)+,d1  ;red
    move.b (a0)+,d2  ;green
    move.b (a0)+,d3  ;blue

;Changes to:
    move.b (a0)+,d2  ;green
    move.b (a0)+,d3  ;blue
    move.b (a0)+,d1  ;red

This doesn't affect the rest of the ham rendering routine at all.

Sure. Now try to display a 24-bit bmp.
I don't want to kill the modularity of my code for a little gain in one codec.

Quote:

Originally Posted by Thorham

I am a bit surprised I got the data formats just right, I was quite unsure about it. Cool

What else could they have been ?

Quote:

Originally Posted by Thorham

Before I pain my brain, I want to know what you think about the acceptability of the losses. If you like it, you might be able to come up with a faster method than the one used in the c code!

What I think is simple :

the colorspace conversion accounts for 10 to 15% of the overall cpu time (from counting clock cycles vs the global time taken)
the speed gain is something like 10% at best on that part
so we'll end up with a 1.5% (max) overall gain, which is not noticeable either.
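Written out, the estimate is just the product of the two percentages. The symbols f (share of total time spent in colorspace conversion) and g (gain on that part) are notation introduced here, not from the thread:

```latex
\text{overall gain} \;\le\; f \cdot g \;=\; 0.15 \times 0.10 \;=\; 0.015 \;=\; 1.5\%
```

With the lower figure f = 0.10 the bound drops to 1%.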

When making my viewer it was clear to me that I wanted reliability (-> no crash whatever bogus file is fed in) over everything else, quality over speed (unless the quality gain is too small and the speed suffers much) and speed over code size (while remaining under 32k or so for the exe). I should have made this clear earlier.

Now I can't accept trading away quality, even when the loss is hardly noticeable, for such a small gain. Sorry again.

Quote:

Originally Posted by Thorham

Good luck

I sure need it

Maybe you could have a look too if you want to have fun

Quote:

Originally Posted by Thorham

After thinking about it, I came to the conclusion that this is a bit like adaptive interpolation. But I still have to try it.
The only way to know, is to try it, you might be in for a surprise. The thing is, you have to convert to full rgb anyway. This just makes the image 24bit. There will be differences, but I really doubt they're going to be very big. But again, only way to know is to try it.

Oh, for a 256 color image there will be no problem. But try this with a grabbed workbench screen

Quote:

Originally Posted by Thorham

Edited: Testing will be easy. All you have to do is convert 256 color images to jpeg in the highest quality setting, and use your viewer to see what it looks like! Furthermore, I tried adpro's ham rendering on 256 color images, and although there is a loss, it's really not bad. However, that is to be expected, and it can't be helped. I've also tried it with visage, and that is just plain ugly

Since your ham rendering routine is much better, it might just be ok. If you don't have the time, I can make some test images for you, since I've got a whole bunch of 256 color bmps which I ripped from the Final Fantasy 6 Playstation cd edition.

I have several megabytes of 256 color gifs, so that's no big deal. But I don't worry for the quality of 256 color images.
It's the <=32 color ones that could be really nasty looking in ham.

Quote:

Originally Posted by Thorham

Yep, that's true (although counting can be done while scaling). However, don't you agree 24bit iffs are a silly format? I mean, planar 24bit

I don't think there's any hardware capable of displaying this directly. It's all chunky rgb. IMHO 24bit iff should have never been created, and is not worthy of being supported. I know some amiga software uses it, but it's far better to store images as bmp or png.

Silly format it is, sure !

Maybe they thought about a possible future hardware when they made it.

But the png isn't a clean format either from what I've read in the specs. Too complex for what it does. Iff is much simpler.

Quote:

Originally Posted by Thorham

As for iffs up to 8bit per pixel, these are typical amiga format images, and probably none of them need scaling. For those, scaling would be optional, and I seriously doubt anyone would use it.

Well... maybe I can just say in the docs that scaling is supported only for true color images

Thorham · 16 January 2008, 18:01

Quote:

Originally Posted by meynaf

I've seen some differences, but not many. Will we see the speed gain ?

Well, seeing that you are the more experienced coder here, you might be able to put this to your advantage. On the other hand, the color space conversion seems to not allow a whole lot of playing around. It's just that you can get away with lowering the accuracy, without reducing the quality. If I were you, I'd still give it a shot, though. May turn out to be worth it.

Quote:

Originally Posted by meynaf

Branches aren't necessary, but it's still slower than a table lookup (in the following code, assume we've just read the value into d0 so the N flag reflects its sign, and d6 contains the $100 constant):

Code:

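 spl d5          ; d5.b = $FF if N is clear (value >= 0), else $00
 cmp.l d6,d0     ; compare the value against $100
 subx.b d7,d7    ; d7.b = $00 if X=0, $FF if X=1
 and.b d5,d0     ; zero the byte when the value was negative
 or.b d7,d0      ; force $FF when the overflow mask is set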

It's not much slower, but it is slower (12 cycles); with a table lookup we can also write the value directly (a 2-cycle gain), which we can't do here.

I really wouldn't have ever come up with that

I've never, ever used subx (hardly know what it does) and have never even seen spl (68020+ maybe???).... Goes to show I should learn my 680x0 better.

Quote:

Originally Posted by meynaf

Range-limit contest : who can do a faster one ?

Without a table, you win hands down, man. I can't beat that.

Quote:

Originally Posted by meynaf

One acceptable loss + one acceptable loss + one acceptable loss = unacceptable losses.
Jpeg is lossy enough by itself, and the ham rendering adds its own losses. I want quality, you know. Frankly I would accept the deal without hesitation... if it were the other way round (a little slower -> a little more quality).

Ah, yes, that's funny, because for me speed is most important, as long as the quality doesn't drop below a certain point, e.g. becomes like a lot of other viewers. Only when the user would want plain 24bit output to do highest quality rendering with, say, adpro, would I go for maximum accuracy, by not making a single concession. To me, small color variations you can only see when comparing to the original directly, are of no importance. Of course, it's not my viewer, but yours, so it really isn't my call, is it

Quote:

Originally Posted by meynaf

The input will never be larger than 64kb because we'll never do more than one scanline at a time, and the jpeg lib limits those to 65500 pixels. So a dbf is ok.

Oh, man, I forgot the way your rendering function works! I'm so used to just doing everything in one go, I had failed to take this into account. Doh

I can tell you I'm not making that mistake again.

Quote:

Originally Posted by meynaf

Sure. Now try to display a 24-bit bmp.
I don't want to kill the modularity of my code for a little gain in one codec.

Doesn't your bmp code convert the data from bgr to rgb format? Or am I confused

If so, just output gbr. However, I'm probably wrong again

Even if I'm not, a little further down my reply, you'll see why I do agree that all this is useless, anyway.

Quote:

Originally Posted by meynaf

What else could they have been ?

Well, I can't test it, and I've only read a small part of the source code, so I really could have gotten it wrong, since I don't know what's going on exactly.

Quote:

Originally Posted by meynaf

What I think is simple :

the colorspace conversion accounts for 10 to 15% of the overall cpu time (from counting clock cycles vs the global time taken)
the speed gain is something like 10% at best on that part
so we'll end up with a 1.5% (max) overall gain, which is not noticeable either.

When making my viewer it was clear to me that I wanted reliability (-> no crash whatever bogus file is fed in) over everything else, quality over speed (unless the quality gain is too small and the speed suffers much) and speed over code size (while remaining under 32k or so for the exe). I should have made this clear earlier.

Now I can't accept trading away quality, even when the loss is hardly noticeable, for such a small gain. Sorry again.

That's ok, meynaf. As said, it's not my call anyway, and it's just how I would've done it. On the other hand, seeing how small the speed gain is, and taking into account the fact you'd have to change other codecs just for this one, I actually have to agree that there is no point in doing it my way.

Quote:

Originally Posted by meynaf

I sure need it

Maybe you could have a look too if you want to have fun

For now, I'll have to pass on that one, as I'm still experimenting in freebasic (I love that compiler). Once I'm done with the experiments, it becomes a whole different ball game altogether. Ultimately, I do want some sort of useful program to come out of this, and this must be written in assembler.

Quote:

Originally Posted by meynaf

Oh, for a 256 color image there will be no problem. But try this with a grabbed workbench screen

Oh, I will. I can do that very easily, indeed, takes only a few seconds.

Quote:

Originally Posted by meynaf

But the png isn't a clean format either from what I've read in the specs. Too complex for what it does. Iff is much simpler.

They should have made 24bit iffs chunky. That would have actually made sense. Is png really that complex? Going to take a look at it (would be cool to have support for it in your viewer).

Quote:

Originally Posted by meynaf

Well... maybe I can just say in the docs that scaling is supported only for true color images

Or warn the user that the quality can vary between different images. Some users may still want to use it in cases where they don't care about the quality. Just a thought.

meynaf · 17 January 2008, 11:18

Quote:

Originally Posted by Thorham

Well, seeing that you are the more experienced coder here, you might be able to put this to your advantage. On the other hand, the color space conversion seems to not allow a whole lot of playing around. It's just that you can get away with lowering the accuracy, without reducing the quality. If I were you, I'd still give it a shot, though. May turn out to be worth it.

I will, but on the Miga, not on the pc. Flipping screens is the best method I know.

Quote:

Originally Posted by Thorham

I really wouldn't have ever come up with that I've never, ever used subx (hardly know what it does) and have never even seen spl (68020+ maybe???).... Goes to show I should learn my 680x0 better.

You don't know the s<cc> instructions ? sne, seq ? That's not 020+, will work on a plain 68000.

spl is much like bpl, but instead of branching it sets the byte to FF if the condition is true, 00 otherwise.

subx is like sub, but it also subtracts the X flag.
And so, subx on a register with itself will give 0 if X=0, -1 if X=1.
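The two instructions described above can be modelled in a few lines of C. This is only a sketch of the byte-sized forms; the helper names are hypothetical and the flags are passed in as plain 0/1 integers rather than read from a real condition code register.

```c
#include <assert.h>
#include <stdint.h>

/* Scc with cc = PL: the destination byte becomes $FF when N is clear
   (the last result was not negative), $00 otherwise. No branch is taken. */
static uint8_t model_spl(int n_flag)
{
    assert(n_flag == 0 || n_flag == 1);
    return n_flag ? 0x00 : 0xFF;
}

/* subx.b dN,dN: dN - dN - X, so the old register contents cancel out
   and only the X flag decides the result ($00 or $FF). */
static uint8_t model_subx_self(uint8_t reg, int x_flag)
{
    assert(x_flag == 0 || x_flag == 1);
    return (uint8_t)(reg - reg - x_flag);
}
```

Both give a ready-made byte mask from a flag, which is what makes them handy for replacing short conditional branches.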

Quote:

Originally Posted by Thorham

Without a table, you win hands down, man. I can't beat that.

No other coder in here ? Don't tell me I'm the best out there

Quote:

Originally Posted by Thorham

Ah, yes, that's funny, because for me speed is most important, as long as the quality doesn't drop below a certain point, e.g. becomes like a lot of other viewers. Only when the user would want plain 24bit output to do highest quality rendering with, say, adpro, would I go for maximum accuracy, by not making a single concession. To me, small color variations you can only see when comparing to the original directly, are of no importance. Of course, it's not my viewer, but yours, so it really isn't my call, is it

Speed is important to me also, maybe the "certain point" simply isn't at the same place

Quote:

Originally Posted by Thorham

Oh, man, I forgot the way your rendering function works! I'm so used to just doing everything in one go, I had failed to take this into account. Doh I can tell you I'm not making that mistake again.

I generally don't mix rows, doing two loops (one for x, one for y). Anyway with the current data representation of the jpeg library it wouldn't have been possible to do otherwise, because there's no guarantee the rows are contiguous in memory.

Quote:

Originally Posted by Thorham

Doesn't your bmp code convert the data from bgr to rgb format? Or am I confused If so, just output gbr. However, I'm probably wrong again Even if I'm not, a little further down my reply, you'll see why I do agree that all this is useless, anyway.

Yes, of course, I convert bgr to rgb (silly bmp's !). Would be easy to change. But I love modularity, so...
Furthermore, I've added a pbmplus module...

Quote:

Originally Posted by Thorham

They should have made 24bit iffs chunky. That would have actually made sense. Is png really that complex? Going to take a look at it (would be cool to have support for it in your viewer).

Maybe the 24 bit iffs made more sense in planar than in chunky to the designers at the time they made it. Of course now it's senseless.

I looked at the png for possible support in my viewer, and what I saw didn't please me. It's not very complex but it's sure more complex than it really needs to be. If you have a look at the specs (they're easy to find), please tell me what you think about it.

Quote:

Originally Posted by Thorham

Or warn the user that the quality can vary between different images. Some users may still want to use it in cases where they don't care about the quality. Just a thought.

Good idea.

Sometimes I wanted that scale option just to get an answer to "what's this awfully big image ?" - and this doesn't need quality.

meynaf · 17 January 2008, 18:26

While profiling my code, I found out that the biggest part (in terms of cpu use) is undoubtedly the dct.
Of course it is due to all those muls, but maybe it can still be optimized.

That thing amounts to 33% of the overall time on ordinary images (much more on grayscale ones because there are no upsample/colorspace passes). And it will be more (in percentage only !) once the rest is optimized, no doubt.

So I have posted my latest version here, with translated comments, hoping someone could find something...

Thorham · 18 January 2008, 09:44

Quote:

Originally Posted by meynaf

I will, but on the Miga, not on the pc. Flipping screens is the best method I know.

Yes, it is! Displaying the images next to each other, like in my test pics, can make it difficult to see those small color differences. Especially with winxp's image viewer with huge white borders

Quote:

Originally Posted by meynaf

You don't know the s<cc> instructions ? sne, seq ? That's not 020+, will work on a plain 68000.

spl is much like bpl, but instead of branching it sets the byte to FF if the condition is true, 00 otherwise.

subx is like sub, but it also subtracts the X flag.
And so, subx on a register with itself will give 0 if X=0, -1 if X=1.

Noooo

Scc is indeed plain 68000. I have a book called 'Assembly language programming for the 68000', and guess what: I've looked up spl as a stand alone instruction

I must admit I've overlooked both scc and subx, probably 'never' needed them!

Quote:

Originally Posted by meynaf

No other coder in here ? Don't tell me I'm the best out there

Yeah, come on guys/girls! Hey, maybe we should have an eab asm coding contest to see who can write the most optimized code

Quote:

Originally Posted by meynaf

Speed is important to me also, maybe the "certain point" simply isn't at the same place

Definitely!

Quote:

Originally Posted by meynaf

I generally don't mix rows, doing two loops (one for x, one for y). Anyway with the current data representation of the jpeg library it wouldn't have been possible to do otherwise, because there's no guarantee the rows are contiguous in memory.

Right, I didn't know that. That does make it rather impossible.

Quote:

Originally Posted by meynaf

Yes, of course, I convert bgr to rgb (silly bmp's !). Would be easy to change. But I love modularity, so...
Furthermore, I've added a pbmplus module...

Yes, the bmp format, although really simple to use, is very silly indeed: Rounding off pixel rows with extra padding bytes, wtf

And to make it even sillier: They're all upside down
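The three bmp quirks mentioned in this exchange (padded rows, bottom-up storage, bgr byte order) each come down to a line or two of arithmetic. The sketch below assumes an uncompressed 24-bit bmp with a positive height; the function names are made up for illustration and are not from the viewer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes per stored row: each row is padded up to a multiple of 4 bytes. */
static size_t bmp_row_stride(uint32_t width, uint32_t bits_per_pixel)
{
    return (((size_t)width * bits_per_pixel + 31) / 32) * 4;
}

/* Offset (from the start of the pixel data) of display row y, where y = 0
   is the top of the picture. Rows are stored bottom-up. */
static size_t bmp_row_offset(uint32_t y, uint32_t height, size_t stride)
{
    assert(y < height);
    return (size_t)(height - 1 - y) * stride;
}

/* Convert one row of 24-bit pixels from bmp's b,g,r order to r,g,b. */
static void bmp_row_to_rgb(const uint8_t *src, uint8_t *dst, uint32_t width)
{
    for (uint32_t x = 0; x < width; x++) {
        dst[3 * x + 0] = src[3 * x + 2]; /* red   */
        dst[3 * x + 1] = src[3 * x + 1]; /* green */
        dst[3 * x + 2] = src[3 * x + 0]; /* blue  */
    }
}
```

For a 5-pixel-wide 24-bit image the row holds 15 bytes of pixels and one padding byte, which is the "rounding off" being complained about.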

Quote:

Originally Posted by meynaf

Maybe the 24 bit iffs made more sense in planar than in chunky to the designers at the time they made it. Of course now it's senseless.

I wonder. Does planar 24bit hardware even exist? If so, it would've been uselessly slow! They must've known 24bit planar would be completely useless

Quote:

Originally Posted by meynaf

I looked at the png for possible support in my viewer, and what I saw didn't please me. It's not very complex but it's sure more complex than it really needs to be. If you have a look at the specs (they're easy to find), please tell me what you think about it.

There sure seems to be a lot of crap in there. Bmp may be a silly format, but at least it's very simple to implement all the important stuff. I'll let you know when I've got some time to seriously look into it (today or this weekend).

Quote:

Originally Posted by meynaf

Good idea.

Sometimes I wanted that scale option just to get an answer to "what's this awfully big image ?" - and this doesn't need quality.

Thanks. Me, too, and on more than one occasion. Quality really doesn't matter then. By the way, have you tried the 256 color -> 24bit -> ham conversion? I really want to know what you think.

Quote:

Originally Posted by meynaf

While profiling my code, I found out that the biggest part (in terms of cpu use) is undoubtedly the dct.
Of course it is due to all those muls, but maybe it can still be optimized.

That thing amounts to 33% of the overall time on ordinary images (much more on grayscale ones because there are no upsample/colorspace passes). And it will be more (in percentage only !) once the rest is optimized, no doubt.

So I have posted my latest version here, with translated comments, hoping someone could find something...

Great. Looking good, indeed

And thanks a lot for translating, I really appreciate it

Yesterday I only had a quick peek, but I'll let you know if I find anything. From what I have seen, it may not be that easy to optimize, though... Hope I can find something useful!

meynaf · 18 January 2008, 10:33

Quote:

Originally Posted by Thorham

Yes, it is! Displaying the images next to each other, like in my test pics, can make it difficult to see those small color differences. Especially with winxp's image viewer with huge white borders

And the colors aren't as good as they should be on an LCD screen such as the one I'm looking at right now...

Quote:

Originally Posted by Thorham

Noooo

Scc is indeed plain 68000. I have a book called 'Assembly language programming for the 68000', and guess what: I've looked up spl as a stand alone instruction

I must admit I've overlooked both scc and subx, probably 'never' needed them!

There are very few instructions I've never used, such as chk, trapv, rtr.
If you like strange 020+ instructions I have some code with a bfffo

Quote:

Originally Posted by Thorham

Yeah, come on guys/girls! Hey, maybe we should have an eab asm coding contest to see who can write the most optimized code

I would love an eab asm coding contest

Quote:

Originally Posted by Thorham

Right, I didn't know that. That does make it rather impossible.

Yes, even though I plan on changing the data representation if it's helpful, it's far from done, and since dbf is (slightly) faster, I'll sure keep it.

Quote:

Originally Posted by Thorham

Yes, the bmp format, although really simple to use, is very silly indeed: Rounding off pixel rows with extra padding bytes, wtf

And to make it even sillier: They're all upside down

I think they made it upside down to keep the higher coordinates on the top... how useless

Quote:

Originally Posted by Thorham

I wonder. Does planar 24bit hardware even exist? If so, it would've been uselessly slow! They must've known 24bit planar would be completely useless

They thought it would exist some day... but it never existed to the best of my knowledge.
I'm not sure it would have been slow with proper hardware. At least it's not slow up to 8 bits, so I don't see why it would be for 24.

The planar format may look completely stupid, but it has its advantages. You don't need a separate graphics routine for every bit depth; just loop over the planes and you're done.
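To make the "loop over the planes" point concrete, here is a minimal C sketch of a pixel write and read that works for any depth. The layout is an assumption made for the example (one pointer per plane to the current row, leftmost pixel in the top bit of each byte), and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write `color` at horizontal position x of one row, across `depth` planes.
   planes[p] points at that row's bytes in plane p (assumed layout). */
static void planar_put_pixel(uint8_t *const planes[], unsigned depth,
                             unsigned x, uint32_t color)
{
    size_t byte = x >> 3;
    uint8_t mask = (uint8_t)(0x80u >> (x & 7)); /* leftmost pixel = top bit */

    assert(depth <= 32);
    for (unsigned p = 0; p < depth; p++) {
        if ((color >> p) & 1u)
            planes[p][byte] |= mask;            /* bit p of the color is set */
        else
            planes[p][byte] &= (uint8_t)~mask;  /* bit p of the color is clear */
    }
}

/* Read the color back by gathering one bit from each plane. */
static uint32_t planar_get_pixel(uint8_t *const planes[], unsigned depth,
                                 unsigned x)
{
    size_t byte = x >> 3;
    uint8_t mask = (uint8_t)(0x80u >> (x & 7));
    uint32_t color = 0;

    for (unsigned p = 0; p < depth; p++)
        if (planes[p][byte] & mask)
            color |= (uint32_t)1 << p;
    return color;
}
```

The same routine serves 1, 5, 8 or 24 planes unchanged, which is the advantage being described; the flip side is that a single pixel touches one byte in every plane.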

Quote:

Originally Posted by Thorham

There sure seems to be a lot of crap in there. Bmp may be a silly format, but at least it's very simple to implement all the important stuff. I'll let you know when I've got some time to seriously look into it (today or this weekend).

I'm curious to know what you think of all this. I'm waiting...

That thing has been on my to-do list since the beginning, but if it ends up too complex I'll bail out.

Quote:

Originally Posted by Thorham

Thanks. Me, too, and on more than one occasion. Quality really doesn't matter then. By the way, have you tried the 256 color -> 24bit -> ham conversion? I really want to know what you think.

I tried converting some gifs into jpegs for some tests some time ago, and they looked good.

Quote:

Originally Posted by Thorham

Great. Looking good, indeed

And thanks a lot for translating, I really appreciate it

Yesterday I only had a quick peek, but I'll let you know if I find anything. From what I have seen, it may not be that easy to optimize, though... Hope I can find something useful!

I thought about you when I translated the comments, so you can't come without something

Thorham · 18 January 2008, 16:44

Quote:

Originally Posted by meynaf

And the colors aren't as good as they should be on an LCD screen such as the one I'm looking at right now...

Ouch

On my CRT screen the colors are almost identical

I sure hate this, as it means I can't experiment with this stuff reliably. Damn.

Quote:

Originally Posted by meynaf

There are very few instructions I've never used, such as chk, trapv, rtr.
If you like strange 020+ instructions I have some code with a bfffo

Oh, there are plenty of instructions I've never used before

Isn't bfffo a bitfield instruction? And guess what, I've never used bitfields, either! Actually I never really use '020+ instructions. This is a habit I need to get rid of on aga, since aga always has at least a '020. I guess I'm stuck in the past a bit

Quote:

Originally Posted by meynaf

I would love an eab asm coding contest

So would I. But I really don't want to organize one for some reason. A cool compo would be an old school intro contest of some sort. The sky is the limit. Now talk about off-topic

Quote:

Originally Posted by meynaf

Yes, even though I plan on changing the data representation if it's helpful, it's far from done, and since dbf is (slightly) faster, I'll sure keep it.

Yep, on '030 every cycle counts for idct and other heavy stuff, and I definitely can't blame you.

Quote:

Originally Posted by meynaf

I think they made it upside down to keep the higher coordinates on the top... how useless

Really? You got to be kidding me, man, what a bunch of complete and utter lamers. Then again, isn't bmp a Micro$oft format? Figures...

Quote:

Originally Posted by meynaf

I'm not sure it would have been slow with proper hardware. At least it's not slow up to 8 bits, so I don't see why it would be for 24.

It's slow when you compare it to a simple chunky buffer, or we wouldn't need transposes for c2p.

Quote:

Originally Posted by meynaf

The planar format may look completely stupid, but it has its advantages. You don't need a separate graphics routine for every bit depth; just loop over the planes and you're done.

Yes, it has its advantages. One is when using black and white text; you'd only have to scroll one plane, for example. But it's not implemented in AmigaOS that way, of course (Commodore's coders aren't exactly geniuses). When having to draw chunky graphics, however, bit planes do become quite slow.

Quote:

Originally Posted by meynaf

I'm curious to know what you think of all this. I'm waiting...

Don't worry, I will tell you. It's just that I've got some stuff to take care of which really can't wait. I'm only just able to fit in this reply in between chores. I'll look at it tonight.

Quote:

Originally Posted by meynaf

That thing has been on my to-do list since the beginning, but if it ends up too complex I'll bail out.

Baffling: How difficult is it to come up with a clean and simple, but powerful and versatile non-lossy format? I could probably come up with something good in a few minutes. But no, they always have to come up with all sorts of crap

Then again, this seems to be the trend these days. Same for software: Computers keep getting faster, while software keeps getting slower

Quote:

Originally Posted by meynaf

I tried converting some gifs into jpegs for some tests some time ago, and they looked good.

Then I don't think you'll have anything to worry about as far as scaling fixed palette images goes. I've tried something like that, too, and the quality wasn't at all bad, although everything depends on the quality of the ham rendering system used. Furthermore, for images with up to 64 colors it's simple; just use the palette as the base palette

Quote:

Originally Posted by meynaf

I thought about you when I translated the comments, so you can't come without something

Thanks again! But don't worry, I won't. Unless I can't find anything, of course! However, I really do like doing that sort of 'work', so I'll be sure to pain my brain as much as humanly possible

meynaf · 18 January 2008, 17:58

Quote:

Originally Posted by Thorham

Ouch

On my CRT screen the colors are almost identical

I sure hate this, as it means I can't experiment with this stuff reliably. Damn.

Maybe you should pay a little bit more attention to what happened to the greenish areas in your images...

Quote:

Originally Posted by Thorham

Oh, there are plenty of instructions I've never used before

Isn't bfffo a bitfield instruction? And guess what, I've never used bitfields, either!

What's funny about that one is that I'm not using it for bitfields at all !
(but for a bit counter it's not bad)
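For anyone who hasn't met bfffo: applied to a whole 32-bit register it reports the position of the first set bit counted from the most significant end, or the field width if no bit is set. The C model below covers just that full-register case. How the posted code actually uses it isn't shown in the thread, so the "bits needed" helper is only one plausible reading of "bit counter", and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of bfffo dN{0:32},dM: offset of the first 1 bit, scanning from
   the most significant bit; 32 when the value is zero. */
static unsigned model_bfffo32(uint32_t value)
{
    unsigned pos = 0;
    while (pos < 32 && !(value & (0x80000000u >> pos)))
        pos++;
    return pos;
}

/* Assumed use: how many bits are needed to hold `value`. */
static unsigned bits_needed(uint32_t value)
{
    unsigned n = 32 - model_bfffo32(value);
    assert(n <= 32);
    return n;
}
```

On the 020+ the scan is a single instruction, which is what makes it attractive compared with a shift-and-test loop.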

Quote:

Originally Posted by Thorham

Actually I never really use '020+ instructions. This is a habit I need to get rid of on aga, since aga always has at least a '020. I guess I'm stuck in the past a bit

Strange... I've got the habit of 030 coding since I have one... I was even eager to discover (and use) the new features of it...

Quote:

Originally Posted by Thorham

So would I. But I really don't want to organize one for some reason. A cool compo would be an old school intro contest of some sort. The sky is the limit. Now talk about off-topic

You would lose, that's the reason

(oops, sorry - please, don't hurt me !)

Quote:

Originally Posted by Thorham

Really? You got to be kidding me, man, what a bunch of complete and utter lamers. Then again, isn't bmp a Micro$oft format? Figures...

That's my guess about it, but I can't see any other reason !
And, yes, it's an m$ format. Moreover, it's the windows native format. Well, at least it's coherent with the rest

Quote:

Originally Posted by Thorham

It's slow when you compare it to a simple chunky buffer, or we wouldn't need transposes for c2p.

c2p is useful for mapped 3d or other platform's gfx viewing ; but it's not of much use for anything else.

Quote:

Originally Posted by Thorham

Yes, it has its advantages. One is when using black and white text; you'd only have to scroll one plane, for example. But it's not implemented in AmigaOS that way, of course (Commodore's coders aren't exactly geniuses). When having to draw chunky graphics, however, bit planes do become quite slow.

Yes, but when having to draw planar graphics, however, chunky does become quite slow too

Quote:

Originally Posted by Thorham

Don't worry, I will tell you. It's just that I've got some stuff to take care of which really can't wait. I'm only just able to fit in this reply in between chores. I'll look at it tonight.

Do it whenever you wish, no problem

Quote:

Originally Posted by Thorham

Baffling: How difficult is it to come up with a clean and simple, but powerful and versatile non-lossy format? I could probably come up with something good in a few minutes. But no, they always have to come up with all sorts of crap

Then again, this seems to be the trend these days. Same for software: Computers keep getting faster, while software keeps getting slower

Software no longer exists now. Only bloatware

Quote:

Originally Posted by Thorham

Then I don't think you'll have anything to worry about as far as scaling fixed palette images goes. I've tried something like that, too, and the quality wasn't at all bad, although everything depends on the quality of the ham rendering system used. Furthermore, for images with up to 64 colors it's simple; just use the palette as the base palette

Using the palette as the base palette isn't that easy. Computing the 12-bit table isn't fast...

Quote:

Originally Posted by Thorham

Thanks again! But don't worry, I won't. Unless I can't find anything, of course! However, I really do like doing that sort of 'work', so I'll be sure to pain my brain as much as humanly possible

If you do like it... well, okay, I like it too

So what did you find ?

Thorham · 21 January 2008, 10:49

Quote:

Originally Posted by meynaf

Maybe you should pay a little bit more attention to what happened to the greenish areas in your images...

Yes, I've noticed, but still, the difference is quite small on crt. However, seeing how ridiculously small the speed gain is, it's quite useless, and even more so when it's that obvious on tft.

Quote:

Originally Posted by meynaf

What's funny about that one is that I'm not using it for bitfields at all !
(but for a bit counter it's not bad)

Very creative

Quote:

Originally Posted by meynaf

Strange... I've got the habit of 030 coding since I have one... I was even eager to discover (and use) the new features of it...

Not really when you consider the fact that I didn't have access to any kind of 68030 docs at the time. It's only since I've had access to the net in various ways that I've obtained the docs. And guess what, I never got around to using them

Quote:

Originally Posted by meynaf

You would lose, that's the reason

(oops, sorry - please, don't hurt me !)

Yes I will

Quote:

Originally Posted by meynaf

That's my guess about it, but I can't see any other reason !
And, yes, it's an m$ format. Moreover, it's the windows native format. Well, at least it's coherent with the rest

Quote:

Originally Posted by meynaf

c2p is useful for mapped 3d or other platform's gfx viewing ; but it's not of much use for anything else.

That's true. Its applications are very limited. Still, for true color displays you'd have to change 24 planes just to write a single pixel, meaning you'd still have to use c2p. The iff designers should have figured that no one would even think of making planar 24bit hardware.

Quote:

Originally Posted by meynaf

Yes, but when you have to draw planar graphics, chunky does become quite slow too

These formats are just incompatible, and converting one to the other will always use up too much extra time.

Quote:

Originally Posted by meynaf

Do it whenever you wish, no problem

I've taken a good look at the png format, and have to agree that there is a lot of unnecessary stuff in there. None of it is hard, but it's just a bit much. Seems like a lot of work. You'll have to determine for yourself if including this format is worth the effort. I've got an idea: If you do want to include it, we could split the work up, I'll do some, and you do some. Let me know if you're interested.

Quote:

Originally Posted by meynaf

Software no longer exists now. Only bloatware

You sure got that right

Example: On my p3 550mhz, adobe's pdf viewer is a nightmare and completely unusable, while foxit zips along at great speed. It's not that it can't be done, they just refuse to

Quote:

Originally Posted by meynaf

Using the palette as the base palette isn't that easy. Computing the 12-bit table isn't fast...

True! I forgot you use a fast look-up table, and calculating this really isn't that fast. My guess would be that it takes at least 0.25-0.50 seconds to calculate it. On a standard a1200 that's at least eight times slower.
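To give an idea of why building such a table is slow, here is a minimal C sketch. It assumes the 12-bit table maps every 4-4-4 RGB value to the nearest entry of the base palette by squared distance; the thread does not show the actual table layout, and `build_table12` and its parameters are hypothetical names.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: palette entries are 12-bit 0x0RGB values, at most 256 of them.
   For each of the 4096 possible RGB values, store the nearest palette index. */
static void build_table12(const uint16_t *pal, size_t ncolors, uint8_t table[4096])
{
    for (unsigned rgb = 0; rgb < 4096; rgb++) {
        int r = (int)((rgb >> 8) & 15);
        int g = (int)((rgb >> 4) & 15);
        int b = (int)(rgb & 15);
        size_t best = 0;
        int best_d = INT_MAX;

        for (size_t i = 0; i < ncolors; i++) {
            int dr = r - (int)((pal[i] >> 8) & 15);
            int dg = g - (int)((pal[i] >> 4) & 15);
            int db = b - (int)(pal[i] & 15);
            int d = dr * dr + dg * dg + db * db;   /* squared distance */
            if (d < best_d) {
                best_d = d;
                best = i;
            }
        }
        table[rgb] = (uint8_t)best;
    }
}
```

With 64 colors this brute-force search evaluates 4096 x 64 = 262144 distances, each with three multiplies, which is consistent with it not being fast on a 68030.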

Quote:

Originally Posted by meynaf

If you do like it... well, okay, I like it too

So what did you find ?

Well, I have some disappointing news: After attempting to optimize the idct, I came to the conclusion that it's going to be very tough, indeed. Of course I didn't find a single thing... Either you've done an optimal job already, or I just need to try again. Did you find anything, I'm quite curious...

meynaf · 21 January 2008, 14:01

Quote:

Originally Posted by Thorham

Not really when you consider the fact that I didn't have access to any kind of 68030 docs at the time. It's only since I've had access to the net in various ways that I've obtained the docs. And guess what, I never got around to using them

Such docs were just too hard to find at the time

Quote:

Originally Posted by Thorham

The iff designers should have figured that no one would even think of making planar 24bit hardware.

Sure. But because a merge (in a c2p) is its own inverse, I may end up with a p2c and finally support those iffs anyway. The trick is that it's not 8 bits but 24.
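The "merge is its own inverse" remark can be checked with a small C model. This is a sketch of the usual c2p merge step (swap a masked bit group between two registers), not the code from the viewer; `merge` and its parameters are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* Swap the bits of *a selected by (mask << shift) with the bits of *b
   selected by mask. Running it twice with the same arguments restores
   both values, which is why the same step serves c2p and p2c. */
static void merge(uint32_t *a, uint32_t *b, unsigned shift, uint32_t mask)
{
    assert(shift < 32);
    uint32_t t = ((*a >> shift) ^ *b) & mask;   /* positions where the groups differ */
    *b ^= t;
    *a ^= t << shift;
}
```

For instance, with shift 4 and mask $0F0F0F0F the high nibbles of the first value trade places with the low nibbles of the second; calling it again puts them back.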

Quote:

Originally Posted by Thorham

I've taken a good look at the png format, and have to agree that there is a lot of unnecessary stuff in there. None of it is hard, but it's just a bit much. Seems like a lot of work. You'll have to determine for yourself if including this format is worth the effort. I've got an idea: If you do want to include it, we could split the work up, I'll do some, and you do some. Let me know if you're interested.

I dunno if splitting will be easy, but that's a good idea.
However I'm currently on the jpeg part and won't start the png right now (of course you can !).

If you intend to start something, then it's quite easy to do. A codec in my viewer consists of 3 parts :
- the check routine : simply says if (or not) it's the right filetype
- the init routine : gets the image dimensions (and palette if needed)
- the decode routine : decrunches and calls the final output function
bmp.s is very simple and can be used as a skeleton project.
Of course, don't hesitate to ask me if you need some info.
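The three-part layout could be modelled roughly like this in C. It is only a sketch: the viewer itself is in asm and its real calling convention is not shown in the thread, so the `Codec` struct, the function signatures and the `bmp_*` names are all hypothetical. The BMP part assumes a BITMAPINFOHEADER with a positive height.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor: one entry point per part listed in the post. */
typedef struct {
    int (*check)(const uint8_t *data, size_t len);   /* non-zero if the file type matches */
    int (*init)(const uint8_t *data, size_t len, uint32_t *width, uint32_t *height);
    int (*decode)(const uint8_t *data, size_t len, uint8_t *out);
} Codec;

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* check: a BMP file starts with the two bytes "BM". */
static int bmp_check(const uint8_t *data, size_t len)
{
    return len >= 2 && data[0] == 'B' && data[1] == 'M';
}

/* init: width and height sit at offsets 18 and 22, little-endian. */
static int bmp_init(const uint8_t *data, size_t len, uint32_t *width, uint32_t *height)
{
    if (len < 26)
        return 0;
    *width  = read_le32(data + 18);
    *height = read_le32(data + 22);
    return 1;
}

/* decode is left out of this sketch. */
static const Codec bmp_codec = { bmp_check, bmp_init, NULL };
```

A viewer built this way would try each codec's check routine in turn, then call init and decode on the first one that matches.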

Quote:

Originally Posted by Thorham

You sure got that right

Example: On my p3 550mhz, adobe's pdf viewer is a nightmare and completely unusable, while foxit zips along at great speed. It's not that it can't be done, they just refuse to

Fast machines -> lazy programmers

Quote:

Originally Posted by Thorham

True! I forgot you use a fast look-up table, and calculating this really isn't that fast. My guess would be that it takes at least 0.25-0.50 seconds to calculate it. On a standard a1200 that's at least eight times slower.

I have the code somewhere which computes that. I don't remember how much time it took, but it sure wasn't fast. Doh I'll have to fetch it someday...

Quote:

Originally Posted by Thorham

Well, I have some disappointing news: After attempting to optimize the idct, I came to the conclusion that it's going to be very tough, indeed. Of course I didn't find a single thing... Either you've done an optimal job already, or I just need to try again.

I did what I could, maybe optimal, maybe not. I hoped different eyes could spot different things...

Hmm... I found out that the range-limit code I posted here doesn't work because cmp won't set the x bit, only c

However I have a better one now :

Code:

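 cmp.l a2,d0        ; compare value with $100
 blo.s .nope        ; unsigned below: already in 0..255, keep it
 sge d0             ; else $ff if value >= $100 (signed), $00 if negative
.nope
 move.b d0,(a1)+    ; store the range-limited byte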

This code supposes a2=$100 (it previously pointed to the range-limit table), d0=value and a1=destination to be written. Same register usage at the end

Unfortunately I can't use it in the dct, because here the table access does more than range-limiting : it also rounds off the last bit and adds $80...
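For readers who don't speak 68k, the range-limit logic of the snippet above can be written as a small C model. This is a sketch of the same idea (one unsigned compare catches both out-of-range cases), with `range_limit` as an illustrative name; it does not include the rounding and the +$80 that the dct table also performs.

```c
#include <stdint.h>

/* Clamp a signed 32-bit value to 0..255. */
static uint8_t range_limit(int32_t v)
{
    if ((uint32_t)v < 0x100u)          /* cmp.l / blo.s: in range, negatives look huge when unsigned */
        return (uint8_t)v;
    return v >= 0x100 ? 0xFF : 0x00;   /* sge: $ff when too large, $00 when negative */
}
```

The single unsigned comparison is what lets the asm version get away with one branch.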

Thorham · 22 January 2008, 15:18

Quote:

Originally Posted by meynaf

Such docs were just too hard to find at the time

You can say that again. And the big library in Rotterdam didn't even have anything beyond 68000 documentation...

Quote:

Originally Posted by meynaf

Sure. But because a merge (in a c2p) is its own inverse, I may end up with a p2c and finally support those iffs anyway. The trick is that it's not 8 bits but 24.

If the iffs are in the basic rgb order, it would just be a matter of doing the 32bit c2p for eight bits per pixel in the opposite direction, since you're still dealing with eight bit chunks. Or am I missing something?

Quote:

Originally Posted by meynaf

I dunno if splitting will be easy, but that's a good idea.
However I'm currently on the jpeg part and won't start the png right now (of course you can !).

Then I'll probably start with the deflate algorithm, because that would be the most important part.

Quote:

Originally Posted by meynaf

If you intend to start something, then it's quite easy to do. A codec in my viewer consists of 3 parts :
- the check routine : simply says if (or not) it's the right filetype
- the init routine : gets the image dimensions (and palette if needed)
- the decode routine : decrunches and calls the final output function
bmp.s is very simple and can be used as a skeleton project.
Of course, don't hesitate to ask me if you need some info.

That sounds simple enough. The biggest thing is of course the deflate routine, but this is probably very well documented, so it shouldn't be that much of a problem.

Quote:

Originally Posted by meynaf

Fast machines -> lazy programmers

Yep, that's right. And I can't stand it. Think about it, my pc must be about 100 times faster than my miggy, yet for some strange reason some of the software on my miggy makes my pc seem like a bloody commodore 64

Quote:

Originally Posted by meynaf

I did what I could, maybe optimal, maybe not. I hoped different eyes could spot different things...

I'm not done yet, but I really doubt I'll be able to optimize this one.

meynaf · 22 January 2008, 16:59

Quote:

Originally Posted by Thorham

You can say that again.

If you ask for it, here it is :
Such docs were just too hard to find at the time.
(sorry, it was just too tempting)

Quote:

Originally Posted by Thorham

And the big library in Rotterdam didn't even have anything beyond 68000 documentation...

Ah, they had something about 68000 ? Strange

Quote:

Originally Posted by Thorham

If the iffs are in the basic rgb order, it would just be a matter of doing the 32bit c2p for eight bits per pixel in the opposite direction, since you're still dealing with eight bit chunks. Or am I missing something?

Huh ? in the opposite direction ? You mean the same operations in the reverse order ?

Anyway, the trick here is to write one byte out of 3 instead of all bytes like you would do for an 8-bit p2c. That is, one pass for red, one for green and one for blue.

Quote:

Originally Posted by Thorham

Then I'll probably start with the deflate algorithm, because that would be the most important part.

It sure is !

Quote:

Originally Posted by Thorham

That sounds simple enough. The biggest thing is of course the deflate routine, but this is probably very well documented, so it shouldn't be that much of a problem.

You can start there :
http://www.w3.org/Graphics/PNG/RFC-1951
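As a starting point for a deflate routine, here is a minimal C sketch of the first thing RFC 1951 asks for: reading bits least-significant-bit first and decoding the 3-bit block header (BFINAL, then BTYPE). The `BitReader` struct and function names are illustrative, and reading past the end simply yields zero bits here, which a real decoder would treat as an error.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *data;
    size_t len;
    size_t bitpos;      /* absolute bit position in the stream */
} BitReader;

/* Read n bits (n <= 16), starting with the least significant bit of each byte. */
static unsigned get_bits(BitReader *br, unsigned n)
{
    unsigned v = 0;
    for (unsigned i = 0; i < n; i++) {
        size_t byte = br->bitpos >> 3;
        unsigned bit = 0;
        if (byte < br->len)
            bit = (br->data[byte] >> (br->bitpos & 7)) & 1u;
        v |= bit << i;
        br->bitpos++;
    }
    return v;
}

/* BFINAL: 1 if this is the last block.
   BTYPE: 0 = stored, 1 = fixed Huffman, 2 = dynamic Huffman, 3 = reserved. */
static void read_block_header(BitReader *br, unsigned *bfinal, unsigned *btype)
{
    *bfinal = get_bits(br, 1);
    *btype  = get_bits(br, 2);
}
```

Note that inside a PNG the deflate stream is wrapped in a zlib header (RFC 1950), so the block header is not the very first byte of the IDAT data.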

Quote:

Originally Posted by Thorham

Yep, that's right. And I can't stand it. Think about it, my pc must be about 100 times faster than my miggy, yet for some strange reason some of the software on my miggy makes my pc seem like a bloody commodore 64

But, man, your pc is a bl... errrh... never mind

Quote:

Originally Posted by Thorham

I'm not done yet, but I really doubt I'll be able to optimize this one.

No hardcore coder in here to remove clock cycles ?

Thorham · 23 January 2008, 20:17

Quote:

Originally Posted by meynaf

If you ask for it, here it is :
Such docs were just too hard to find at the time.
(sorry, it was just too tempting)

No comment...

Quote:

Originally Posted by meynaf

Ah, they had something about 68000 ? Strange

They also had the rkms and a good hardware book. All this was back in the days when the Amiga was still big.

Quote:

Originally Posted by meynaf

Huh ? in the opposite direction ? You mean the same operations in the reverse order ?

Erm, yes, I did mean that. I didn't formulate that very well...

Quote:

Originally Posted by meynaf

Anyway, the trick here is to write one byte out of 3 instead of all bytes like you would do for an 8-bit p2c. That is, one pass for red, one for green and one for blue.

It would be faster to just do it all in one pass. Shouldn't really be a problem, since you can see 24bit graphics data as 8bit, where each color component is just viewed as a separate pixel.

Quote:

Originally Posted by meynaf

You can start there :
http://www.w3.org/Graphics/PNG/RFC-1951

Good link, thanks

Quote:

Originally Posted by meynaf

But, man, your pc is a bl... errrh... never mind

Hey, don't blame the pc for the mess 'programmers' create these days!

Quote:

Originally Posted by meynaf

No hardcore coder in here to remove clock cycles ?

Yeah, very strange, indeed.

meynaf · 24 January 2008, 14:22

Quote:

Originally Posted by Thorham

They also had the rkms and a good hardware book. All this was back in the days when the Amiga was still big.

The good old days <sigh>

Quote:

Originally Posted by Thorham

Erm, yes, I did mean that. I didn't formulate that very well...

Ok. Now we're sure we're talking about the same thing. But it isn't that simple. The c2p I have cuts the work in two halves : high 4 bits and low 4 bits. I can't do the exact same thing in the reverse order, otherwise half of the data would be missing, and writing 4 bits would be slower than writing a full byte.
I dunno if this is clear, at least I understand myself

Quote:

Originally Posted by Thorham

It would be faster to just do it all in one pass. Shouldn't really be a problem, since you can see 24bit graphics data as 8bit, where each color component is just viewed as a separate pixel.

Ahem... it looks like I don't have enough registers to hold all the 24-plane data at once...

Quote:

Originally Posted by Thorham

Good link, thanks

Any time.

Quote:

Originally Posted by Thorham

Hey, don't blame the pc for the mess 'programmers' create these days!

They don't produce mess by pure laziness. The machine's architecture has its part.

Quote:

Originally Posted by Thorham

Yeah, very strange, indeed.

Fortunately you're still here. Otherwise I would have felt soooooo alone.

Thorham · 25 January 2008, 06:14

Quote:

Originally Posted by meynaf

The good old days <sigh>

Yep, gone for ever

Quote:

Originally Posted by meynaf

Ok. Now we're sure we're talking about the same thing. But it isn't that simple. The c2p I have cuts the work in two halves : high 4 bits and low 4 bits. I can't do the exact same thing in the reverse order, otherwise half of the data would be missing, and writing 4 bits would be slower than writing a full byte.
I dunno if this is clear, at least I understand myself

Yes, it's crystal clear now! What I mean is: I use a five stage c2p (same as the 32bit c2p in the doc I've sent you). In my implementation the work is split up, too. It's just that I think you can simply do those five stages in the reverse order. The routine would have to be rewritten, of course.

Quote:

Originally Posted by meynaf

Ahem... it looks like I don't have enough registers to hold all the 24-plane data at once...

If you could tell me how iff24 stores the rgb data, then I could give it a go. I really wouldn't mind doing the p2c for this. Warning: I don't know yet how to do the fastest optimizations, but at least I would be able to tell you if my theory works.

Quote:

Originally Posted by meynaf

They don't produce mess by pure laziness. The machine's architecture has its part.

But how can a 550mhz p3 be slow with most normal applications? The thing doesn't even have a mips rating, as it's about 1.8bips! Some architectures may be clumsy, like pc, but that's no excuse for all the slow crap out there, especially if some amiga software beats the p3 while running on a '030 50mhz, it's just unheard of.

Quote:

Originally Posted by meynaf

Fortunately you're still here. Otherwise I would have felt soooooo alone.

Of course I'm still here, don't worry, meynaf, I'll never leave. I'm just not the most hardcore of all coders, that's all.

meynaf · 25 January 2008, 11:06

Quote:

Originally Posted by Thorham

Yep, gone for ever

Maybe the time will return, with another machine... who knows...

Quote:

Originally Posted by Thorham

Yes, it's crystal clear now! What I mean is: I use a five stage c2p (same as the 32bit c2p in the doc I've sent you). In my implementation the work is split up, too. It's just that I think you can simply do those five stages in the reverse order. The routine would have to be rewritten, of course.

A 5-stage c2p ? For me the p2c requires 6 stages : one for each nibble (1 nibble = 4 bits). Fortunately I've found out that it's possible to keep one nibble in registers while you're doing the other one. After that, putting them together is just another merge to do.

If reading 32 bits at once (and it's better to do so !) you can get the nibble data in 4 registers, which gives the following :
d0 data 0
d1 data 1
d2 data 2
d3 data 3
d4 temporary
d5 current AND value
d6 addy for next plane
d7 loop counter
a0 source ptr
a1 destination ptr (= rgb)
a2 save data 0
a3 save data 1
a4 save data 2
a5 save data 3
a6 (free)

Note that I haven't written it yet, I just looked at how it could be done.

Quote:

Originally Posted by Thorham

If you could tell me how iff24 stores the rgb data, then I could give it a go. I really wouldn't mind doing the p2c for this. Warning: I don't know yet how to do the fastest optimizations, but at least I would be able to tell you if my theory works.

iff24 stores data pretty much like iff does. Instead of having 8 planes, you have 24, and instead of meaning a palette entry#, the resulting value means direct rgb data.
It has to be checked, but I think it's red bit 0 to bit 7, green bit 0 to bit 7, and blue bit 0 to bit 7. Not hard to reorder if it's something else, though.
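To test the plane-order theory before writing any optimized asm, a naive reference p2c in C is enough. This sketch assumes the order guessed above (planes 0..7 = red bit 0..7, 8..15 = green, 16..23 = blue) and the usual ILBM convention that the most significant bit of each byte is the leftmost pixel; `p2c24_row` is an illustrative name.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert one scanline of 24 bitplanes into packed R,G,B bytes.
   planes[p] points at the row data of plane p; rgb must hold 3 * width bytes. */
static void p2c24_row(const uint8_t *const planes[24], size_t width, uint8_t *rgb)
{
    for (size_t x = 0; x < width; x++) {
        size_t byte = x >> 3;
        unsigned bit = 7u - (unsigned)(x & 7);        /* MSB = leftmost pixel */

        for (int c = 0; c < 3; c++) {                 /* 0 = red, 1 = green, 2 = blue */
            unsigned v = 0;
            for (int b = 0; b < 8; b++)
                v |= ((unsigned)(planes[c * 8 + b][byte] >> bit) & 1u) << b;
            rgb[3 * x + c] = (uint8_t)v;
        }
    }
}
```

If the real files use another order, only the `c * 8 + b` plane index needs to change. The output of a routine like this can then serve as a reference to compare a merge-based version against.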

The p2c may end up easier to do than a c2p, because you're not writing (nor reading) in chipmem, which requires great care. For this reason it's pointless to attempt a blitter p2c.
Also, it only has to be reasonably fast ; optimizing it to death isn't really useful.

Quote:

Originally Posted by Thorham

But how can a 550mhz p3 be slow with most normal applications? The thing doesn't even have a mips rating, as it's about 1.8bips! Some architectures may be clumsy, like pc, but that's no excuse for all the slow crap out there, especially if some amiga software beats the p3 while running on a '030 50mhz, it's just unheard of.

On the P3 you simply can't code like you would on the Amiga.
You never know how many software layers your data will travel through before it finally reaches the hardware.

Quote:

Originally Posted by Thorham

Of course I'm still here, don't worry, meynaf, I'll never leave. I'm just not the most hardcore of all coders, that's all.

Thanks for being there

But then who - and where - is the most hardcore of all coders ???

Thorham · 25 January 2008, 11:25

Quote:

Originally Posted by meynaf

Maybe the time will return, with another machine... who knows...

I certainly hope so!

Quote:

Originally Posted by meynaf

A 5-stage c2p ? For me the p2c requires 6 stages : one for each nibble (1 nibble = 4 bits). Fortunately I've found out that it's possible to keep one nibble in registers while you're doing the other one. After that, putting them together is just another merge to do.

If reading 32 bits at once (and it's better to do so !) you can get the nibble data in 4 registers, which gives the following :
d0 data 0
d1 data 1
d2 data 2
d3 data 3
d4 temporary
d5 current AND value
d6 addy for next plane
d7 loop counter
a0 source ptr
a1 destination ptr (= rgb)
a2 save data 0
a3 save data 1
a4 save data 2
a5 save data 3
a6 (free)

Note that I haven't written it yet, I just looked at how it could be done.

Yep, five stage c2p! Check out the doc I sent you to see how they do it, or look at my c2p code if you still have it (still needs optimizing). The doc does a great job at explaining how to do it in just five passes, and I've based my code on it. The author says he's been busy with this stuff for years, so his method is probably one of the best.

Quote:

Originally Posted by meynaf

iff24 stores data pretty much like iff does. Instead of having 8 planes, you have 24, and instead of meaning a palette entry#, the resulting value means direct rgb data.
It has to be checked, but I think it's red bit 0 to bit 7, green bit 0 to bit 7, and blue bit 0 to bit 7. Not hard to reorder if it's something else, though.

Well, I hope it's in that order. It actually wouldn't make any sense if it was in another order, but who knows, right!

Quote:

Originally Posted by meynaf

The p2c may end up easier to do than a c2p, because you're not writing (nor reading) in chipmem, which requires great care. For this reason it's pointless to attempt a blitter p2c.
Also, it only has to be reasonably fast ; optimizing it to death isn't really useful.

And it's already going to be fast because everything is done in fastmem anyway. Of course, the faster the better

Quote:

Originally Posted by meynaf

On the P3 you simply can't code like you would on the Amiga.
You never know how many software layers your data will travel through before it finally reaches the hardware.

That's just the operating system's fault. IMHO you can't really blame the pc hardware for anything, except for being a little irritating to start coding on. Have you read the ia32 docs from intel? Great read, and it shows that it's not that hard to start coding hit-the-hardware programs on the pc. Just use dos for booting!

Quote:

Originally Posted by meynaf

Thanks for being there

But then who - and where - is the most hardcore of all coders ???

You're welcome. But where this illustrious person is, who knows

meynaf · 25 January 2008, 12:07

Quote:

Originally Posted by Thorham

I certainly hope so!

Maybe we could start making it ourselves

Quote:

Originally Posted by Thorham

Yep, five stage c2p! Check out the doc I sent you to see how they do it, or look at my c2p code if you still have it (still needs optimizing). The doc does a great job at explaining how to do it in just five passes, and I've based my code on it. The author says he's been busy with this stuff for years, so his method is probably one of the best.

I don't have this ready at hand, but I still have it somewhere.
However, could you explain it a little ? My current c2p is a 2 pass one, and there's no need for more ?

Quote:

Originally Posted by Thorham

Well, I hope it's in that order. It actually wouldn't make any sense if it was in another order, but who knows, right!

Who knows... but swapping pointers isn't difficult anyway.

Quote:

Originally Posted by Thorham

And it's already going to be fast because everything is done in fastmem anyway. Of course, the faster the better

With a correctly done c2p, it's not much faster when done in fastmem, provided it's faster at all.

Quote:

Originally Posted by Thorham

That's just the operating system's fault. IMHO you can't really blame the pc hardware for anything, except for being a little irritating to start coding on.
Have you read the ia32 docs from intel? Great read, and it shows that it's not that hard to start coding hit-the-hardware programs on the pc. Just use dos for booting!

Oh yes, just use dos for booting and start coding. Please do that...
(I promise I won't laugh)

Quote:

Originally Posted by Thorham

You're welcome. But where this illustrious person is, who knows

Where, who knows, but it's apparently not on EAB.

Thorham · 28 January 2008, 13:20

Quote:

Originally Posted by meynaf

Maybe we could start making it ourselves

I wish we could

Quote:

Originally Posted by meynaf

I don't have this ready at hand, but I still have it somewhere.
However, could you explain it a little ? My current c2p is a 2 pass one, and there's no need for more ?

Sure, the doc says you need to perform the following transposes in that order (everything is bits x bits): 8x2, 4x1, 16x4, 2x4, 1x2. For p2c they have to be done in the opposite order. If you're saying your c2p uses only two transposes and not five as above then I'd really like to know how it's done.

Quote:

Originally Posted by meynaf

Who knows... but swapping pointers isn't difficult anyway.

Yeah, that's true. Besides, how brain dead would they make it, anyway?

Quote:

Originally Posted by meynaf

With a correctly done c2p, it's not much faster when done in fastmem, provided it's faster at all.

It should be faster, if you have a 1280x1024 24bit image, you'd have to copy almost four megabytes around. I'm pretty sure fastmem will make things faster here. Of course that only really applies to big images.

Quote:

Originally Posted by meynaf

Oh yes, just use dos for booting and start coding. Please do that...
(I promise I won't laugh)

Erm, well, I'm serious here

I was actually talking about hitting the hardware. Just try to turn off windows xp, it should be a little bit easier to do in ms-dos than under windows. With dos you can easily set up a whole custom environment.

Quote:

Originally Posted by meynaf

Where, who knows, but it's apparently not on EAB.

They're probably not interested. Shame on them

meynaf · 28 January 2008, 14:54

Quote:

Originally Posted by Thorham

I wish we could

We can.

Quote:

Originally Posted by Thorham

Sure, the doc says you need to perform the following transposes in that order (everything is bits x bits): 8x2, 4x1, 16x4, 2x4, 1x2. For p2c they have to be done in the opposite order. If you're saying your c2p uses only two transposes and not five as above then I'd really like to know how it's done.

We weren't speaking about the same things

A 5-pass c2p is indeed 5 blocks of merges (per 1,2,4,8,16 bits).
What I meant was completely different : do the whole set of merge blocks 6 times (twice for 8 bits)...

Quote:

Originally Posted by Thorham

Yeah, that's true. Besides, how brain dead would they make it, anyway?

Who knows

Quote:

Originally Posted by Thorham

It should be faster, if you have a 1280x1024 24bit image, you'd have to copy almost four megabytes around. I'm pretty sure fastmem will make things faster here. Of course that only really applies to big images.

You're not only copying, you're also performing a lot of other operations on this data. Those operations are pipelined during the chipmem writes ; using fastmem instead can't be slower of course but it won't be much faster.

Quote:

Originally Posted by Thorham

Erm, well, I'm serious here

I was actually talking about hitting the hardware. Just try to turn off windows xp, it should be a little bit easier to do in ms-dos than under windows. With dos you can easily set up a whole custom environment.

Obviously you don't know what you're talking about.
What can I say ? Just do it. Then you'll know the gruesome truth.

Alternatively, if you want to hit the hardware on a pc, then I suggest you use a hammer, as it's a much easier way (and it's a lot of fun). The OS makes no difference here

Quote:

Originally Posted by Thorham

They're probably not interested. Shame on them

Yeah. They deserve public humiliation.

To go back to the topic, I have the upsample code in asm. If you like bunches of incomprehensible move/add series with an occasional lsr in them, then you'll love it.
I thought I was past the age of writing such code, but no

meynaf · 17 January 2008, 18:26

While profiling my code, I found out that the biggest part (in terms of cpu use) is undoubtedly the dct. Of course it is due to all those muls, but maybe it can still be optimized. That thing amounts to 33% of the overall time on ordinary images (much more on grayscale ones because there is no upsample/colorspace passes). And it will be more (in percentage only !) when the rest will be optimized, no doubt. So I have posted my latest version here, with translated comments, hoping someone could find something...

Last edited by meynaf; 12 May 2011 at 08:32.
