16 January 2008, 13:35 | #41
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Code:
    spl     d5
    cmp.l   d6,d0
    subx.b  d7,d7
    and.b   d5,d0
    or.b    d7,d0
Range-limit contest: who can do a faster one?

JPEG is lossy enough by itself, and the HAM rendering adds its own losses. I want quality, you know. Frankly, I would accept the deal without hesitation... if it were the other way round (a little slower -> a little more quality).

And no, the 030 can't handle 32-bit dbfs, but you can still do a dbf followed by sub.l #$10000 / bcc on the same register.

I don't want to kill the modularity of my code for a small gain in one codec.

Now I can't accept a loss of quality, even a hardly noticeable one, for such a small gain. Sorry again.

I sure need it. Maybe you could have a look too, if you want to have fun.

It's the <=32 color ones that could look really nasty in HAM.

Maybe they were thinking of possible future hardware when they made it. But PNG isn't a clean format either, from what I've read in the specs: too complex for what it does. IFF is much simpler.

Well... maybe I can just say in the docs that scaling is supported only for true color images.
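The dbf / sub.l #$10000 / bcc combination mentioned above can be modelled in C to see how many times such a loop runs. This is only a sketch: the function name and the use of a plain variable for the counter register are assumptions, and it just mimics the fact that dbf decrements the low word only.

```c
#include <stdint.h>

/* C model of a 68k loop built from dbf plus sub.l #$10000 / bcc.
   Returns how many times the loop body runs for a 32-bit count n.
   (Hypothetical helper, for illustration only.) */
uint32_t loop32_iterations(uint32_t n)
{
    uint32_t d0 = n;           /* stands in for the counter register */
    uint32_t iterations = 0;

    for (;;) {
        do {
            iterations++;      /* the loop body would go here */
            /* dbf: decrement the low word only, high word untouched */
            d0 = (d0 & 0xFFFF0000u) | ((d0 - 1u) & 0x0000FFFFu);
        } while ((d0 & 0xFFFFu) != 0xFFFFu);  /* dbf falls through at -1 */

        int borrow = d0 < 0x10000u;  /* carry that sub.l would produce */
        d0 -= 0x10000u;              /* sub.l #$10000,d0 */
        if (borrow)
            break;                   /* bcc only loops while carry is clear */
    }
    return iterations;
}
```

As with a plain dbf, the body runs count + 1 times, so the register would be loaded with the wanted number of iterations minus one.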
16 January 2008, 18:01 | #42
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,751
17 January 2008, 11:18 | #43
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
spl is much like bpl, but instead of branching it sets the destination byte to $FF if the condition is true, $00 otherwise. subx is like sub, but it also subtracts the X flag. So subx on a register with itself gives 0 if X=0, -1 if X=1.

No other coder in here? Don't tell me I'm the best out there.

Furthermore, I've added a pbmplus module...

I looked at PNG for possible support in my viewer, and what I saw didn't please me. It's not very complex, but it's certainly more complex than it needs to be. If you have a look at the specs (they're easy to find), please tell me what you think.

Sometimes I wanted that scale option just to get an answer to "what's this awfully big image?" - and that doesn't need quality.
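The spl and subx behaviour described in this post can be sketched in C as two byte masks combined with and/or. All names here are made up for illustration, and how the X flag gets set is an assumption (X=1 when the value is above 255); the thread itself later notes that the cmp-based version does not set X.

```c
#include <stdint.h>

/* Scc-style helper: $FF if the condition holds, $00 otherwise. */
static uint8_t set_if(int condition)
{
    return condition ? 0xFF : 0x00;
}

/* subx.b dn,dn computes dn - dn - X: 0 when X=0, -1 ($FF) when X=1. */
static uint8_t subx_self(int x_flag)
{
    return (uint8_t)(0 - x_flag);
}

/* Branch-free range limit to 0..255 built from the two masks. */
uint8_t clamp_with_masks(int32_t v)
{
    uint8_t keep = set_if(v >= 0);      /* spl: $00 wipes negative values */
    uint8_t sat  = subx_self(v > 255);  /* assumed: X=1 on overflow       */
    return (uint8_t)(((uint8_t)v & keep) | sat);   /* and.b, then or.b    */
}
```

The and step forces negative values to 0 and the or step forces overflowing values to 255, so no branch is needed.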
17 January 2008, 18:26 | #44
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
While profiling my code, I found out that the biggest part (in terms of CPU use) is undoubtedly the DCT.
Of course this is due to all those muls, but maybe it can still be optimized. It amounts to 33% of the overall time on ordinary images (much more on grayscale ones, because there are no upsample/colorspace passes). And its share will grow (in percentage only!) once the rest is optimized, no doubt. So I have posted my latest version here, with translated comments, hoping someone can find something...
Last edited by meynaf; 12 May 2011 at 08:32.
18 January 2008, 09:44 | #45
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,751
18 January 2008, 10:33 | #46
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
If you like strange 020+ instructions, I have some code with a bfffo.

Yes, even though I plan on changing the data representation if it helps, that's far from done, and since dbf is (slightly) faster, I'll certainly keep it.

I'm not sure it would have been slow on proper hardware. At least it's not slow up to 8 bits, so I don't see why it would be for 24. The planar format may look completely stupid, but it has its advantages. You don't need to write a separate set of graphics routines for each number of bits per pixel; just loop over the planes and you're done.

That thing has been on my to-do list since the beginning, but if it ends up too complex I'll bail out.
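The "just loop over the planes" point can be illustrated with a small C sketch. The PlanarBitmap struct and the function names are hypothetical, not taken from the viewer; the only thing relied on is that the leftmost pixel of each byte is the most significant bit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical planar bitmap: one separate bit array per plane. */
typedef struct {
    int depth;             /* number of bitplanes (1..32) */
    size_t bytes_per_row;  /* row stride of each plane    */
    uint8_t **planes;      /* planes[p] = start of plane p */
} PlanarBitmap;

/* Write one pixel; the same routine serves every depth. */
void planar_put_pixel(PlanarBitmap *bm, int x, int y, uint32_t color)
{
    size_t offset = (size_t)y * bm->bytes_per_row + (size_t)(x >> 3);
    uint8_t mask = (uint8_t)(0x80u >> (x & 7));  /* leftmost pixel = MSB */

    for (int p = 0; p < bm->depth; p++) {
        if ((color >> p) & 1u)
            bm->planes[p][offset] |= mask;            /* set bit p   */
        else
            bm->planes[p][offset] &= (uint8_t)~mask;  /* clear bit p */
    }
}

/* Read one pixel back by collecting one bit from each plane. */
uint32_t planar_get_pixel(const PlanarBitmap *bm, int x, int y)
{
    size_t offset = (size_t)y * bm->bytes_per_row + (size_t)(x >> 3);
    unsigned shift = 7u - (unsigned)(x & 7);
    uint32_t color = 0;

    for (int p = 0; p < bm->depth; p++)
        color |= (uint32_t)((bm->planes[p][offset] >> shift) & 1u) << p;
    return color;
}
```

Changing the depth field is all it takes to go from a 5-plane to an 8-plane image; nothing else in the routines depends on the pixel size.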
18 January 2008, 16:44 | #47
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,751
18 January 2008, 17:58 | #48
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
(but for a bit counter it's not bad)

(oops, sorry - please, don't hurt me!)

And yes, it's an M$ format. Moreover, it's the native Windows format. Well, at least it's consistent with the rest.

So what did you find?
21 January 2008, 10:49 | #49
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,751
Last edited by Thorham; 21 January 2008 at 13:58.
21 January 2008, 14:01 | #50
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
However, I'm currently on the JPEG part and won't start the PNG right now (of course you can!). If you intend to start something, it's quite easy to do. A codec in my viewer consists of 3 parts:
- the check routine: simply says whether or not it's the right file type
- the init routine: gets the image dimensions (and palette if needed)
- the decode routine: decrunches and calls the final output function
bmp.s is very simple and can be used as a skeleton project. Of course, don't hesitate to ask me if you need some info.
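The three-part codec layout can be sketched in C as a table of function pointers. Everything here is hypothetical (the Codec and ImageInfo types, find_codec, the bmp_* names) and is not the viewer's actual interface; the BMP details assume the common Windows header where width and height are 32-bit little-endian values at offsets 18 and 22. The decode step is left out.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t width, height; } ImageInfo;

/* Hypothetical codec descriptor with the three parts from the post. */
typedef struct {
    const char *name;
    int (*check)(const uint8_t *data, size_t len);                  /* right file type? */
    int (*init)(const uint8_t *data, size_t len, ImageInfo *info);  /* get dimensions   */
    int (*decode)(const uint8_t *data, size_t len, const ImageInfo *info);
} Codec;

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* check: BMP files start with the two bytes "BM". */
static int bmp_check(const uint8_t *data, size_t len)
{
    return len >= 26 && data[0] == 'B' && data[1] == 'M';
}

/* init: read the image size (assumes the 40-byte Windows info header). */
static int bmp_init(const uint8_t *data, size_t len, ImageInfo *info)
{
    if (!bmp_check(data, len))
        return -1;
    uint32_t h = read_le32(data + 22);
    info->width = read_le32(data + 18);
    info->height = (h & 0x80000000u) ? 0u - h : h;  /* negative = top-down */
    return 0;
}

/* decode is NULL here: it is not part of this sketch. */
static const Codec codecs[] = {
    { "bmp", bmp_check, bmp_init, NULL },
};

/* Ask each codec's check routine in turn; NULL if none accepts the data. */
const Codec *find_codec(const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < sizeof codecs / sizeof codecs[0]; i++)
        if (codecs[i].check(data, len))
            return &codecs[i];
    return NULL;
}
```

With a layout like this, adding a format means adding one more entry to the table, which is roughly the modularity the thread is keen to keep.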
Hmm... I found out that the range-limit code I posted here doesn't work, because cmp doesn't set the X bit, only C. However, I have a better one now:
Code:
    cmp.l   a2,d0
    blo.s   .nope
    sge     d0
.nope
    move.b  d0,(a1)+
Unfortunately I can't use it in the DCT, because there the table access does more than range-limiting: it also rounds off the last bit and adds $80...
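For readers less used to 68k condition codes, the blo/sge version can be modelled in C like this. It is only a sketch under one assumption: that a2 holds 256, which the post does not state.

```c
#include <stdint.h>

/* C model of: cmp.l a2,d0 / blo.s .nope / sge d0 (a2 assumed to be 256). */
uint8_t range_limit(int32_t v)
{
    /* blo is an unsigned test: only 0..255 pass, because negative
       values look like huge unsigned numbers. */
    if ((uint32_t)v < 256u)
        return (uint8_t)v;

    /* sge reuses the flags of the same cmp, read as signed:
       $FF when v >= 256, $00 when v is negative. */
    return (v >= 256) ? 0xFF : 0x00;
}
```

A single compare serves both tests, so in-range values, presumably the common case, cost only the cmp and one taken branch.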
22 January 2008, 15:18 | #51
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,751
22 January 2008, 16:59 | #52
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
If you ask for it, here it is:
Such docs were just too hard to find at the time. (sorry, it was just too tempting)

Anyway, the trick here is to write one byte out of 3, instead of every byte as you would for an 8-bit p2c. That is, one pass for red, one for green and one for blue.

http://www.w3.org/Graphics/PNG/RFC-1951

No hardcore coder in here to shave off clock cycles?
23 January 2008, 20:17 | #53
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,751
24 January 2008, 14:22 | #54
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
OK. Now we're sure we're talking about the same thing. But it isn't that simple. The c2p I have splits the work into two halves: the high 4 bits and the low 4 bits. I can't do exactly the same thing in reverse, or half of the data would be missing, and writing 4 bits would be slower than writing a full byte. I dunno if this is clear; at least I understand myself.

Any time.

Fortunately you're still here. Otherwise I would have felt soooooo alone.
25 January 2008, 06:14 | #55
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,751
25 January 2008, 11:06 | #56
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Maybe the time will return, with another machine... who knows...

If you read 32 bits at once (and it's better to do so!), you can get the nibble data in 4 registers, which gives the following:
d0  data 0
d1  data 1
d2  data 2
d3  data 3
d4  temporary
d5  current AND value
d6  addy for next plane
d7  loop counter
a0  source ptr
a1  destination ptr (= rgb)
a2  save data 0
a3  save data 1
a4  save data 2
a5  save data 3
a6  (free)
Note that I haven't written it yet; I just looked at how it could be done.

It has to be checked, but I think it's red bit 0 to bit 7, green bit 0 to bit 7, and blue bit 0 to bit 7. Not hard to reorder if it's something else, though. The p2c may end up easier to do than a c2p, because you're neither writing nor reading chipmem, which requires great care. For this reason it's pointless to attempt a blitter p2c. Also, it only has to be reasonably fast; optimizing it to death isn't really useful.

You never know how many software layers your data will travel through before it finally reaches the hardware.

But then who - and where - is the most hardcore of all coders???
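A slow but simple C reference for the 24-bit p2c discussed here might look as follows. It is a sketch, not the planned asm routine: the function name is made up, and the plane order (red bit 0 to 7, then green, then blue) is the guess from the post, which itself says this has to be checked.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert one row of 24 bitplanes to packed RGB bytes.
   Assumed order: planes[c * 8 + b] holds bit b of channel c
   (c = 0 red, 1 green, 2 blue). */
void p2c_row_rgb(const uint8_t *const planes[24], size_t width, uint8_t *rgb)
{
    for (int c = 0; c < 3; c++) {              /* one pass per channel */
        for (size_t x = 0; x < width; x++) {
            unsigned shift = 7u - (unsigned)(x & 7);  /* leftmost pixel = MSB */
            uint8_t value = 0;

            /* gather bit b of this pixel from each of the channel's planes */
            for (int b = 0; b < 8; b++)
                value |= (uint8_t)(((planes[c * 8 + b][x >> 3] >> shift) & 1u) << b);

            rgb[3 * x + (size_t)c] = value;    /* writes one byte out of 3 */
        }
    }
}
```

A bit-by-bit version like this is mainly useful as a reference to compare a merge-based routine against; if the real plane order turns out different, only the planes[] index needs to change.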
25 January 2008, 11:25 | #57
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,751
25 January 2008, 12:07 | #58
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Maybe we could start making it ourselves.

However, could you explain it a little? My current c2p is a 2-pass one, and there's no need for more, is there?

(I promise I won't laugh)

Where, who knows, but it's apparently not on EAB.
28 January 2008, 13:20 | #59
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,751
28 January 2008, 14:54 | #60
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
We can.

A 5-pass c2p is indeed 5 blocks of merges (by 1, 2, 4, 8 and 16 bits). What I meant was completely different: do the whole set of merge blocks 6 times (twice for 8 bits)...

What can I say? Just do it. Then you'll know the gruesome truth. Alternatively, if you want to hit the hardware on a PC, I suggest you use a hammer, as it's a much easier way (and a lot of fun). The OS makes no difference here.

Yeah. They deserve public humiliation.

To get back to the topic, I have the upsample code in asm. If you like bunches of incomprehensible move/add series with an occasional lsr in them, you'll love it. I thought I was past the age for writing such code, but no.
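To make the "blocks of merges" idea concrete, here is a scaled-down C sketch: 8 pixels of 8 bits held in bytes, which needs only the 4, 2 and 1 bit passes (the 32-bit register version in the thread adds the 8 and 16 bit ones). The names are made up, and the bit order is a simplification: pixel i ends up in bit i of each plane byte, so real Amiga output, where the leftmost pixel is the MSB, would need the pixels fed in the opposite order.

```c
#include <stdint.h>

/* One merge: swap the bits of *a selected by (mask << shift)
   with the bits of *b selected by mask. */
static void merge(uint8_t *a, uint8_t *b, unsigned shift, uint8_t mask)
{
    uint8_t t = (uint8_t)(((*a >> shift) ^ *b) & mask);
    *b ^= t;
    *a ^= (uint8_t)(t << shift);
}

/* Chunky to planar for 8 pixels of 8 bits, in place.
   Afterwards bit i of px[j] is what bit j of px[i] was,
   so px[j] holds plane j for the 8 pixels. */
void c2p_8x8(uint8_t px[8])
{
    for (int i = 0; i < 4; i++)           /* pass 1: merge by 4 bits */
        merge(&px[i], &px[i + 4], 4, 0x0F);

    for (int i = 0; i < 8; i += 4) {      /* pass 2: merge by 2 bits */
        merge(&px[i], &px[i + 2], 2, 0x33);
        merge(&px[i + 1], &px[i + 3], 2, 0x33);
    }

    for (int i = 0; i < 8; i += 2)        /* pass 3: merge by 1 bit */
        merge(&px[i], &px[i + 1], 1, 0x55);
}
```

Since the whole thing is a bit-matrix transpose, running it twice gives back the original data, which is why the same merge blocks can in principle serve for a p2c as well.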