13 December 2007, 17:50 | #21 | |||
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,831
|
Quote:
Quote:
Quote:
|
|||
13 December 2007, 18:13 | #22 | |||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,351
|
Quote:
Quote:
Quote:
Well, it turned out not to be that hard (once all those constants and macros had been replaced by what they mean). But... muls, muls, muls and more muls (might remind you of something you've read recently). Oh, and muls again. Did I forget muls?
What I have now isn't a carbon copy of the original code; I had to move things around to reduce register usage. There is no init/exit code for now, and I only have the first half (columns). I've included it here so that you can have a look at it. It is not fully optimized yet; there is still some unneeded data movement. Of course I dunno if it works.
Last edited by meynaf; 12 May 2011 at 08:32. |
|||
14 December 2007, 16:55 | #23 | |||||
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,831
|
Quote:
Quote:
Quote:
Quote:
Quote:
|
|||||
14 December 2007, 17:25 | #24 | ||||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,351
|
Quote:
Quote:
Quote:
But there may be some unneeded moves, and the register usage can probably be reduced as well. I also strongly doubt that replacing those muls with tables would help, because there are just too many different constants.
Quote:
Oh, you haven't spotted a bug yet? Curious. |
||||
17 December 2007, 16:18 | #25 | |||||
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,831
|
Quote:
Quote:
Quote:
Quote:
Quote:
|
|||||
21 December 2007, 10:47 | #26 | |||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,351
|
Quote:
Quote:
Quote:
I finished it, but I found a mistake (register confusion) in the code. This:
Code:
 move.w d6,d1
 add.w d7,d1
 muls #9633,d1
 muls #-16069,d6
 muls #-3196,d7
 add.l d1,d6
 add.l d1,d7
should be:
Code:
 move.w d6,d4 ; the shared term gets its own register
 add.w d7,d4
 muls #9633,d4
 muls #-16069,d6
 muls #-3196,d7
 add.l d4,d6 ; the shared term is added to both results
 add.l d4,d7
If you want to have a look at it, it's in the zone, along with all modified files. In the archive you'll find jidctint.s - the asm version of jidctint.c, which is now nothing but a wrapper for the asm version. You'll also find a pre-compiled version; now that the c code for the dct has vanished, the exe's size has dropped.
That code is probably tougher than the ham code you're used to, so to make things easier I've kept some (modified) c code as comments, and translated my comments for you.
Hint: try to free regs by moving things around, that is, output something right after it is computed, to free a reg for the next computation. Then there could be a lot of possible opts if you have free regs (not only Dn but also An).
Last edited by meynaf; 21 December 2007 at 14:07. |
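For anyone following along without the archive, here is how the corrected sequence above reads in C. This is only a sketch: the constants are the ones from the asm (9633, 16069 and 3196, i.e. the IJG values scaled by 2^13), but the function name rotate_odd_pair and the z3/z4/z5 naming are assumptions for illustration, not the actual code from jidctint.s.

```c
#include <stdint.h>

/* Constants scaled by 2^13, same values as in the asm. */
#define FIX_1_175875602 9633
#define FIX_1_961570560 16069
#define FIX_0_390180644 3196

/* Hypothetical helper: one "shared term" step on two 16-bit inputs.
   Assumes in3 + in4 still fits in 16 bits, as the add.w/muls pair does. */
void rotate_odd_pair(int16_t in3, int16_t in4, int32_t *out3, int32_t *out4)
{
    /* Shared term: needs a register of its own (d4 in the fixed asm),
       because it is added to both results at the end. */
    int32_t z5 = (int32_t)(in3 + in4) * FIX_1_175875602;
    int32_t z3 = (int32_t)in3 * -FIX_1_961570560;
    int32_t z4 = (int32_t)in4 * -FIX_0_390180644;

    *out3 = z3 + z5;
    *out4 = z4 + z5;
}
```

Three multiplies instead of four is the whole point of the shared term, which is also why clobbering its register breaks both outputs at once.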
|||
21 December 2007, 14:20 | #27 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,351
|
For the next step - the colorspace conversion may be my next victim - I'm looking for jpeg files with unusual color spaces, to test them, and to check whether they're worth supporting or not (certainly not if they are extremely rare).
Can someone fire up Photoshop and save RGB/YCCK/CMYK-encoded jpeg files for me? (I don't have Photoshop, and those are apparently Adobe-specific.) Please... Last edited by meynaf; 21 December 2007 at 14:32. |
22 December 2007, 23:02 | #28 | |||||
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,831
|
Quote:
Quote:
Quote:
Thanks for the translation, greatly appreciated. Using those translators works, but they are annoying to use, and of course having to fill in the context manually doesn't always help, either.
Quote:
I guess it was a bad idea to search for a full explanation. Even when you do completely understand the subject, it's going to be very tough to optimize the idct routine. Anyway, good job, and great looking code. Keep up the good work.
Quote:
Last edited by Thorham; 22 December 2007 at 23:06. Reason: Forgot something... |
|||||
24 December 2007, 10:29 | #29 | |||||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,351
|
Quote:
If someone wants to optimize that, then it's better to just do what they do, regardless of what it means mathematically. As in the original IJG code, it must also perform the dequantization. Maybe the comments in jidctint.c (the original one) can be useful to you.
Quote:
Quote:
In jpegs, the high frequencies are stored with less precision (-> fewer bits) than the lower ones, because they are less visible. That's why they can look somewhat blurred.
Quote:
Quote:
However, if nothing using them can be found, then they're not worth supporting in my asm code (except by throwing an error message in the face of the unfortunate user who accidentally stumbles upon such a file). |
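On the dequantization and the "less precision" point above: a minimal C sketch, assuming a plain 8x8 block and one quantization step per coefficient. The name dequantize_block is hypothetical, and the IJG code applies this multiply inside the IDCT rather than as a separate pass like this.

```c
#include <stdint.h>

/* Hypothetical helper: scale each stored coefficient back up by its
   quantization step. coef[] holds the decoded values, quant[] the
   per-position step sizes from the file's quantization table. */
void dequantize_block(const int16_t coef[64], const uint16_t quant[64],
                      int32_t out[64])
{
    for (int i = 0; i < 64; i++)
        out[i] = (int32_t)coef[i] * (int32_t)quant[i];
}
```

A large step for a high-frequency coefficient means only a few distinct levels survive, which is where the lost precision comes from.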
|||||
24 December 2007, 15:38 | #30 | ||
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,831
|
Quote:
Quote:
I'll let you know if I find more, and I'm pretty sure I will, since the first versions of a piece of code are usually not completely optimized. |
||
24 December 2007, 15:50 | #31 | |
Thalion Webshrine
Join Date: Jan 2004
Location: Oxford
Posts: 14,448
|
Quote:
Charles Poynton is my hero when it comes to this stuff. Last edited by alexh; 24 December 2007 at 15:56. |
|
24 December 2007, 16:03 | #32 | ||
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,831
|
Quote:
Quote:
|
||
24 December 2007, 16:14 | #33 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,351
|
Does that guy have asm code for YCbCr -> RGB conversion ?
What, no 68k version ? Well, ok, I'll do it... That's the next thing I've spotted that's not too difficult and can give us an important speed increase. I promise I won't use the term yuv if it's digital video |
29 December 2007, 02:18 | #34 | |
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,831
|
Quote:
Going off-topic a bit now. Ultimately I'm still wondering if there isn't a plain and simple way to effectively crunch gfx, something which doesn't require 'advanced' math, and can be implemented algorithmically. Surely something is possible, it's not as if everything has been thought of in the wonderful world of algorithms (your ham rendering engine seems to be a good example, haven't seen it before). Last edited by Thorham; 29 December 2007 at 02:36. |
|
04 January 2008, 11:12 | #35 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,351
|
Of course the YCbCr->RGB conversion is simple. But it becomes more interesting if you try to do it without multiplies.
The problem with gfx data is that it doesn't crunch very well. The more colourful it is, the worse the compression will be. The one and only thing I see here is that we could exploit the fact that adjacent pixels are often close in color; they're closer in YCbCr than in RGB, but changing color spaces is a lossy process because of rounding errors. My ham engine is more standard issue than it looks. I'm pretty sure nearly all high quality renderers use the very same algorithm.
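One common way to keep multiplies out of the per-pixel loop is to precompute the four chroma terms in 256-entry tables. The sketch below is an assumption about how that could look in C, not the code used in the viewer; all names are made up, and each table entry is rounded to a whole number, so the result can be off by one step compared to keeping fractional bits.

```c
#include <stdint.h>

static int16_t cr_to_r[256], cb_to_b[256], cr_to_g[256], cb_to_g[256];

/* Integer division that rounds half away from zero. */
static int round_div(int num, int den)
{
    return (num >= 0 ? num + den / 2 : num - den / 2) / den;
}

/* Build the tables once; the multiplies happen only here. */
void build_tables(void)
{
    for (int i = 0; i < 256; i++) {
        int x = i - 128; /* chroma is stored with a +128 offset */
        cr_to_r[i] = (int16_t)round_div(1402 * x, 1000);    /* 1.402   */
        cb_to_b[i] = (int16_t)round_div(1772 * x, 1000);    /* 1.772   */
        cr_to_g[i] = (int16_t)round_div(71414 * x, 100000); /* 0.71414 */
        cb_to_g[i] = (int16_t)round_div(34414 * x, 100000); /* 0.34414 */
    }
}

static uint8_t clamp_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Per pixel: table reads, adds and a clamp, no multiply. */
void ycc_to_rgb(uint8_t y, uint8_t cb, uint8_t cr, uint8_t rgb[3])
{
    rgb[0] = clamp_u8(y + cr_to_r[cr]);
    rgb[1] = clamp_u8(y - cb_to_g[cb] - cr_to_g[cr]);
    rgb[2] = clamp_u8(y + cb_to_b[cb]);
}
```

In asm the clamp would normally be one more table lookup (a range-limit table) instead of compares.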
04 January 2008, 18:36 | #36 | ||||
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,831
|
Quote:
Quote:
Quote:
Quote:
|
||||
14 January 2008, 15:30 | #37 | ||||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,351
|
Quote:
I tried this already, but it was quite disappointing. I ran into a lot of bugs (you can't imagine the -ahem- beautiful images I've seen) and it didn't give a good speed gain. Too many tables to read (4), and 3 data sources for 1 destination -> too many address registers used -> too much swapping. Gosh! Maybe you'll have more luck if you give it a try...
Quote:
Quote:
I'm more interested in lossless compression, though. What if you could predict what the image will be by computing it from whatever you've already decoded, and only store the difference between reality and your prediction? (This has already been applied to audio, but afaik not to gfx data.) (And, oh, yes, I don't have a clue which predictors to use.)
Quote:
Adpro probably does a lot of analysis to adapt its palette before rendering, which is terribly slow (that's why I didn't want to do it too). Furthermore, if it's 100% compiled code then it's likely to be up to 4x slower...
Answer here for the viewer options (off-topic in the mpega thread): just list a few here and we'll see. Maybe they're already planned.
About the scaling, there is something that annoys me quite a lot: what to do on a palettized display? Skipping pixels will be ugly, and we don't have enough colors to get all the rgb combinations an average would make, but a ham display may be even uglier than pixel skipping on some images. |
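The prediction idea mentioned above (store only the difference between the real data and a guess made from what is already decoded) can be sketched with the simplest possible predictor: "the next sample equals the previous one". That predictor is an assumption picked purely for illustration, since the post does not name one, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Encoder: predict each sample as the previous one and keep only the
   difference, wrapped to 8 bits. */
void predict_encode(const uint8_t *src, uint8_t *residual, size_t n)
{
    uint8_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        residual[i] = (uint8_t)(src[i] - prev);
        prev = src[i];
    }
}

/* Decoder: make the same prediction from the samples already rebuilt
   and add the stored difference back. Lossless by construction. */
void predict_decode(const uint8_t *residual, uint8_t *dst, size_t n)
{
    uint8_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        dst[i] = (uint8_t)(residual[i] + prev);
        prev = dst[i];
    }
}
```

This step does not shrink anything by itself; the hope is that smooth areas turn into runs of values near zero, which a packer applied afterwards handles much better than the raw samples.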
||||
15 January 2008, 03:50 | #38 | |||||
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,831
|
Quote:
Code:
 move.l In_Y,a0
 move.l In_CB,a1
 move.l In_CR,a2
 move.l Out,a3
 move.l CB_Table,a4
 move.l CR_Table,a5
 move.l Range_Table,a6
 move.l YCBCR_Buf_Size,d5
 ;Free regs: d6 and d7
.lp
 moveq #0,d0
 moveq #0,d1
 moveq #0,d2
 move.b (a0)+,d0 ;Y
 move.b (a1)+,d1 ;CB
 move.b (a2)+,d2 ;CR
 move.b (a4,d1.l),d1 ;0.34414*(Cb-128)
 move.b (a5,d2.l),d2 ;0.71414*(Cr-128)
 move.l d0,d4 ;Calc green
 sub.l d1,d4
 sub.l d2,d4
 move.b (a6,d4.l),d4 ;Clip green
 move.b d4,(a3)+ ;Write green
 move.l d1,d4 ;Calc blue
 add.l d1,d1 ;5*0.34414=1.7207*Cb instead of 1.772*Cb
 add.l d1,d1
 add.l d1,d4
 add.l d0,d4
 move.b (a6,d4.l),d4 ;Clip blue
 move.b d4,(a3)+ ;Write blue
 add.l d2,d0 ;Calc red
 add.l d2,d0 ;2*0.71414=1.42828*Cr instead of 1.402*Cr
 move.b (a6,d0.l),d0 ;Clip red
 move.b d0,(a3)+ ;Write red
 subq.l #1,d5
 bpl .lp
About the approximations: I've tested these with my YCbCr program in FreeBASIC (damned handy), and the color differences are quite small, meaning the images look great; you literally have to see the original next to the encoded version to see any difference, otherwise you'd think it's the original! I've tested it with a straight gray scale in rgb as well, and there is no difference whatsoever. If this code is faster than what you've tried, it's perfect for fast viewing in high quality.
However, you will have to integrate the code yourself, and although I've tested the approximation, the asm code itself is untested and may contain a bug here and there. Nothing serious, though. Should be easy to fix if there are any. Also, due to data format differences the code may not work as is, but I suppose this should still give you a good idea of what can be done.
Quote:
Lossless is indeed interesting as an extra option. Trying to predict the data is interesting, too. Hadn't thought of that. I'm going to have to see if I can come up with anything. Quote:
Quote:
Quote:
|
|||||
15 January 2008, 10:58 | #39 | |||||||||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,351
|
Quote:
Quote:
Quote:
Quote:
I've counted clock cycles (including pipeline) and you should get something like 123 of them per pixel. My current code runs in 125 or so, with full accuracy. Not worth changing. But you can get rid of 8 of them, going down to 115, if you replace:
- subq/bpl by a dbf (-2)
- move.b (a6,dn.l),dn / move.b dn,(a3)+ by move.b (a6,dn.l),(a3)+ (-2, 3 times)
However it's still not enough IMHO (a bit less than 10%, and much less for the overall speed). A 40% gain could be good, though.
Quote:
Here is my code, should you spot something that can be done to accelerate it:
Code:
; parameters :
; a0=input_buf[0]+input_row
; a1=input_buf[1]+input_row
; a2=input_buf[2]+input_row
; a3=output_buf
; d7=num_rows
; d6=cinfo->output_width
ycc_rgb_convert
 subq.w #1,d6
 moveq #0,d0
 move.l #$100,d1 ; with *8, we will go to $800 bytes after the 1st array
 moveq #0,d2 ; (which will make us point on the 2nd array)
.yloop
 move.l d6,d5
 move.l (a0)+,a4
 move.l (a1)+,a5
 move.l (a2)+,a6
 movem.l a0-a3,-(a7)
 lea cxtab,a0
 lea range_limit2+$180,a2
 move.l (a3),a3
.xloop ; inner loop
 move.b (a4)+,d0
 move.b (a5)+,d1
 move.b (a6)+,d2
 lea (a0,d2.w*8),a1
 move.l (a1)+,d4
 move.l (a1),d3
 add.l d0,d3
 move.b (a2,d3.w),(a3)+
 lea (a0,d1.w*8),a1
 add.l (a1)+,d2
 add.l d4,d2
 swap d2
 add.w d0,d2
 move.b (a2,d2.w),(a3)+
 move.l d0,d3
 moveq #0,d2
 add.l (a1),d3
 move.b (a2,d3.w),(a3)+
 dbf d5,.xloop ; end of inner loop
 movem.l (a7)+,a0-a3
 addq.l #4,a3
 subq.l #1,d7
 bne.s .yloop
 rts
Other note: this one has been tested and works. But it doesn't give as much gain as we could have expected...
Quote:
Quote:
Quote:
I'm waiting... Quote:
And when scaling down an iff, you have to p2c it, then scale it, then count colors to get the most used ones, then write them back into a buffer, then c2p it. Ouch ! Ilbm displaying has never been so slow ! |
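Coming back to the 5x and 2x approximations from the table-based routine earlier in the thread (5 x 0.34414 standing in for 1.772, and 2 x 0.71414 for 1.402), the worst-case error is easy to bound. A small C sketch, under the assumption that chroma offsets span -128..127 and ignoring the rounding of the table entries; worst_case_error is a hypothetical helper.

```c
/* Worst-case channel error, in 8-bit steps, when an exact coefficient
   is replaced by an approximate one. The chroma offset (value - 128)
   is at most 128 in magnitude, so that is the extreme case. */
double worst_case_error(double exact, double approx)
{
    double diff = exact > approx ? exact - approx : approx - exact;
    return diff * 128.0;
}
```

Calling worst_case_error(1.772, 5 * 0.34414) gives roughly 6.6 steps for blue, and worst_case_error(1.402, 2 * 0.71414) roughly 3.4 steps for red, both at fully saturated chroma. For grays (chroma = 128) the offset is zero, so the error is zero too, which fits the observation that a gray scale shows no difference at all.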
|||||||||
15 January 2008, 20:32 | #40 | ||||||||||
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,831
|
Quote:
Quote:
Quote:
Quote:
Silly me, I forgot about the moves. The subq/bpl cannot be changed to dbf, since dbf works on words and the input can be larger than 64kb. Unless the 68030 can handle 32bit dbf (wouldn't be surprised). On the other hand, you could just use two of them, since there are two unused data regs.
Quote:
Code:
;Code in ham rendering routine:
 move.b (a0)+,d1 ;red
 move.b (a0)+,d2 ;green
 move.b (a0)+,d3 ;blue
;Changes to:
 move.b (a0)+,d2 ;green
 move.b (a0)+,d3 ;blue
 move.b (a0)+,d1 ;red
I am a bit surprised I got the data formats just right; I was quite unsure about it. Cool.
Quote:
Quote:
Quote:
Quote:
Edited: Testing will be easy. All you have to do is convert 256 color images to jpeg at the highest quality setting, and use your viewer to see what it looks like! Furthermore, I tried Adpro's ham rendering on 256 color images, and although there is a loss, it's really not bad. However, that is to be expected, and it can't be helped. I've also tried it with Visage, and that is just plain ugly. Since your ham rendering routine is much better, it might just be ok. If you don't have the time, I can make some test images for you, since I've got a whole bunch of 256 color bmps which I ripped from the Final Fantasy 6 Playstation cd edition.
Quote:
As for iffs up to 8bit per pixel, these are typical amiga format images, and probably none of them need scaling. For those, scaling would be optional, and I seriously doubt anyone would use it. Last edited by Thorham; 15 January 2008 at 21:20. |
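About dbf and counts above 64K, mentioned earlier in this post: dbf only decrements and tests the low word of its register, so one usual workaround is to split the 32-bit count into a 16-bit inner counter and an outer counter for the high word. The C sketch below only simulates that loop structure to show that the iteration count comes out right; run_split_loop is a hypothetical name and the loop body is a placeholder.

```c
#include <stdint.h>

/* Runs a placeholder body `count` times using a 16-bit inner counter
   (the dbf part) and a 16-bit outer counter (the high word), and
   returns how many times the body actually ran. */
uint32_t run_split_loop(uint32_t count)
{
    if (count == 0)
        return 0;

    uint32_t n = count - 1;              /* dbf-style counters hold count-1 */
    uint16_t lo = (uint16_t)(n & 0xFFFFu);
    uint16_t hi = (uint16_t)(n >> 16);
    uint32_t iterations = 0;

    for (;;) {
        do {
            iterations++;                /* loop body goes here */
        } while (lo-- != 0);             /* like dbf: stop once the word wraps */
        /* lo is now 0xFFFF, i.e. primed for another 65536 passes */
        if (hi-- == 0)
            break;                       /* high word exhausted: done */
    }
    return iterations;
}
```

The extra test on the high word only runs once every 65536 pixels, so the inner loop keeps the cheaper dbf.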
||||||||||