28 January 2008, 16:55 | #61 | |||||||
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,839
|
Anyway, have you found anything to optimize in the jpeg part? |
|||||||
28 January 2008, 18:20 | #62 | |||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
|
Cool. Here it is. Triangular 2x2 upsampling code in asm. See the original jdsample.c in the jpeg library for more info. Code:
_asm_upsample22
 movem.l d0-d7/a0-a6,-(a7)   ; +60
 move.l 4+60(a7),a5          ; a5 = input_data
 move.l 8+60(a7),a6          ; a6 = output_data
 move.l 12+60(a7),d7         ; d7 = compptr->downsampled_width
 move.l 16+60(a7),d6         ; d6 = cinfo->max_v_samp_factor
 bsr.s h2v2_fancy_upsample
 movem.l (a7)+,d0-d7/a0-a6
 rts

; upsample "fancy" 2x2 : (the most frequent case)
; a5=input_data, a6=output_data, d7=nb cols, d6=nb rows
h2v2_fancy_upsample
 lsr.w #1,d6                 ; we're doing two of them at once
.yloop
 move.l -4(a5),a1            ; a1 = input_data[inrow-1]
 move.l (a5)+,a0             ; a0 = input_data[inrow]
 move.l (a5),a2              ; a2 = input_data[inrow+1]
 move.l (a6)+,a3             ; a3 = output_data[outrow]
 move.l (a6)+,a4             ; a4 = output_data[outrow+1]
 movem.l d6-d7/a5-a6,-(a7)
; here we have a0=src, a1=src-1, a2=src+1, a3=dest1, a4=dest2, d7=nb cols
; particular case of the 1st column ("old" values also needed for after)
 moveq #0,d1                 ; this can be out of the loop
 subq.w #3,d7                ; remove first/last columns and 1 for dbf
 moveq #0,d5
 move.b (a1)+,d5             ; a
 move.l d5,a5                ; [ a5 ok ]
 move.b (a1)+,d1             ; b [ d1 ok ]
 moveq #0,d2
 move.b (a2)+,d2             ; u
 move.l d2,a6                ; [ a6 ok ]
 moveq #0,d6
 move.b (a2)+,d6             ; v [ d6 ok ]
 moveq #0,d3
 move.b (a0)+,d3             ; k
 move.l d3,d0
 add.l d0,d0
 add.l d0,d3                 ; 3k [ d3 ok ]
 moveq #0,d4
 move.b (a0)+,d4             ; l
 move.l d4,d0
 add.l d0,d0
 add.l d0,d4                 ; 3l [ d4 ok ]
 add.l d3,d5                 ; 3k + 1a (this-up)
 add.l d3,d2                 ; 3k + 1u (this-dn)
 move.l d1,d0                ; 1b
 add.l d4,d0                 ; 3l + 1b (next-up)
 add.l d5,d0                 ; this + next (up)
 add.l d5,d5                 ; this *2
 add.l d5,d0                 ; this *3 + next *1
 add.l d5,d5                 ; this *4
 addq.l #8,d5                ; rounding with 8
 lsr.l #4,d5                 ; /16
 move.b d5,(a3)+             ; top-left pixel
 addq.l #7,d0                ; rounding with 7
 lsr.l #4,d0                 ; /16
 move.b d0,(a3)+             ; top-right pixel
 move.l d4,d0                ; 3l
 add.l d6,d0                 ; 3l + 1v (next-dn)
 add.l d2,d0                 ; this + next (dn)
 add.l d2,d2                 ; this *2
 add.l d2,d0                 ; this *3 + next *1
 add.l d2,d2                 ; this *4
 addq.l #7,d2                ; rounding with 7
 lsr.l #4,d2                 ; /16
 move.b d2,(a4)+             ; bottom-left pixel
 addq.l #8,d0                ; rounding with 8
 lsr.l #4,d0
 move.b d0,(a4)+             ; bottom-right pixel
; general case
.loop
 move.l d1,d2                ; b
 add.l d2,d2
 add.l d1,d2                 ; 3b
 move.l d3,d0                ; (save 3k)
 move.l d3,d5                ; (oops... forgot this one)
 add.l a5,d5                 ; 3k + 1a
 move.l d1,a5                ; b [ a5 ok ]
 move.b (a1)+,d1             ; c [ d1 ok ]
 move.l d4,d3                ; 3l [ d3 ok ]
 add.l d4,d4
 add.l d3,d4                 ; *3 -> 9l
 add.l d4,d2                 ; 9l + 3b
 add.l d2,d5                 ; 9l + 3b + 3k + 1a
; here : d4=9l, d2=9l+3b, d0=3k
 addq.l #8,d5                ; +8 to round
 lsr.l #4,d5                 ; >>4
 move.b d5,(a3)+             ; and here is our top-left pixel
 add.l a6,d0                 ; 3k + 1u
 move.l d6,a6                ; v [ a6 ok ]
 add.l d6,d6
 add.l a6,d6                 ; 3v
 add.l d4,d6                 ; 9l + 3v
 moveq #0,d5
 move.b (a0)+,d5             ; m
 move.l d5,d4
 add.l d5,d5
 add.l d5,d4                 ; 3m [ d4 ok ]
 add.l d4,d2                 ; 9l + 3b + 3m
 add.l d1,d2                 ; 9l + 3b + 3m + 1c
 addq.l #7,d2                ; +7 to round
 lsr.l #4,d2                 ; >>4
 move.b d2,(a3)+             ; and here is our top-right pixel
; here : d0=3k+1u, d4=3m, d6=9l+3v
 add.l d6,d0                 ; 9l + 3v + 3k + 1u
 addq.l #7,d0                ; +7 to round
 lsr.l #4,d0
 move.b d0,(a4)+             ; and here is our bottom-left pixel
; here : d4=3m, d6=9l+3v
 move.l d6,d2                ; 9l + 3v
 add.l d4,d2                 ; 9l + 3v + 3m
 moveq #0,d6
 move.b (a2)+,d6             ; w [ d6 ok ]
 add.l d6,d2
 addq.l #8,d2                ; +8 to round
 lsr.l #4,d2
 move.b d2,(a4)+             ; and here is our bottom-right pixel
 dbf d7,.loop
; particular case of the last column
 add.l d4,d1                 ; 3m + 1c (this-up)
 add.l d4,d6                 ; 3m + 1w (this-dn)
 move.l d3,d0
 add.l a5,d0                 ; 3l + 1b (last-up)
 add.l a6,d3                 ; 3l + 1v (last-dn)
 add.l d1,d0                 ; this + last (up)
 add.l d1,d1                 ; this *2
 add.l d1,d0                 ; this *3 + last *1
 add.l d1,d1                 ; this *4
 addq.l #8,d0                ; rounding with 8
 lsr.l #4,d0                 ; /16
 move.b d0,(a3)+             ; pixel top-left
 addq.l #7,d1                ; rounding with 7
 lsr.l #4,d1                 ; /16
 move.b d1,(a3)+             ; pixel top-right
 add.l d6,d3                 ; this + last (dn)
 add.l d6,d6                 ; this *2
 add.l d6,d3                 ; this *3 + last *1
 add.l d6,d6                 ; this *4
 addq.l #7,d3                ; rounding with 7
 lsr.l #4,d3                 ; /16
 move.b d3,(a4)+             ; pixel bottom-left
 addq.l #8,d6                ; rounding with 8
 lsr.l #4,d6
 move.b d6,(a4)+             ; pixel bottom-right
; line loop
 movem.l (a7)+,d6-d7/a5-a6
 subq.w #1,d6
 bne .yloop
 rts

Yep.
Those interested can find the code here : http://meynaf.free.fr/tmp/v.zip |
|||
29 January 2008, 10:01 | #63 | ||||||
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,839
|
|
||||||
29 January 2008, 11:05 | #64 | ||||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
|
That would require a new thread, if not a whole site !
Basically, it's about making an open, lightweight, efficient, cool to code computer architecture that stays in line with today's requirements. Remember : architectures persist longer than implementations. The machine has to be some sort of "generic box" like PCs are meant to be. It must be user friendly as well as programmer friendly, like Amigas are. I am not sure an Amiga board is the right place to discuss this. Quote:
Quote:
See how it's easy to bang on the hardware, how well it is documented... See the poetry of the various memory models... Good. I'm sure you feel much better now. Quote:
. 9/16 of current pixel value
. 3/16 of left or right pixel value
. 3/16 of top or bottom pixel value
. 1/16 of diagonal pixel value
How would a bilinear filter do that ? (you have 1 pixel and want to output 4) A box filter would simply copy them around ; not good.
Quote:
I don't have Dice or VBCC, and I don't like SasC's command-line stuff. However I have StormC, maybe not the latest version, but I may see whether I can make a project file, so you can compile the project. The jpeg library compiles literally everywhere, but linking with asm (especially mine) is something else (I wouldn't try this with gcc). On the other hand you can simply disable the jpeg support and assemble the program (you would notice a major exe size drop then). |
||||
29 January 2008, 11:46 | #65 | |||||||
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,839
|
Code:
For yy=0 To 511 Step 2
  If InKey$<>"" Then Stop
  For xx=0 To 639 Step 2
    xxx=xx\2+640:yyy=yy\2
    p1=pointg(xxx-1,yyy-1):p2=Pointg(xxx,yyy-1):p3=Pointg(xxx+1,yyy-1)
    p4=pointg(xxx-1,yyy):p5=Pointg(xxx,yyy):p6=Pointg(xxx+1,yyy)
    p7=pointg(xxx-1,yyy+1):p8=Pointg(xxx,yyy+1):p9=Pointg(xxx+1,yyy+1)
    p=(p1+p2+p4+p5)\4:plot(xx,yy+512,p)
    p=(p2+p3+p5+p6)\4:plot(xx+1,yy+512,p)
    p=(p4+p5+p7+p8)\4:plot(xx,yy+1+512,p)
    p=(p5+p6+p8+p9)\4:plot(xx+1,yy+1+512,p)
  Next
Next
Quote:
|
|||||||
29 January 2008, 12:41 | #66 | ||||||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
|
Quote:
http://eab.abime.net/showthread.php?t=34571 Quote:
Quote:
But finally you did it. Now you're a Man Quote:
You have 4 pixels to write : up-left, up-right, down-left, down-right. All of them get 9/16 of (x,y), and :
- For up-left : 1/16 of (x-1,y-1), 3/16 of (x-1,y), 3/16 of (x,y-1)
- For up-right : 1/16 of (x+1,y-1), 3/16 of (x+1,y), 3/16 of (x,y-1)
- For down-left : 1/16 of (x-1,y+1), 3/16 of (x-1,y), 3/16 of (x,y+1)
- For down-right : 1/16 of (x+1,y+1), 3/16 of (x+1,y), 3/16 of (x,y+1)
Quote:
Quote:
|
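For readers following along, the weighting described in this post can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `upsample_2x2_triangular`, the list-of-rows image layout, the edge replication at the borders and the uniform +8 rounding are all choices made for this sketch. The asm posted earlier in the thread alternates rounding constants of 8 and 7 instead.

```python
def upsample_2x2_triangular(src):
    """Double width and height of a grayscale image given as a list of rows.

    Each output pixel gets 9/16 of the source pixel it sits in, 3/16 of the
    nearest horizontal neighbour, 3/16 of the nearest vertical neighbour and
    1/16 of the diagonal neighbour. Borders replicate the edge pixel
    (an assumption of this sketch).
    """
    h, w = len(src), len(src[0])

    def px(x, y):
        # Clamp coordinates so that out-of-range reads reuse the edge pixel.
        x = min(max(x, 0), w - 1)
        y = min(max(y, 0), h - 1)
        return src[y][x]

    out = [[0] * (2 * w) for _ in range(2 * h)]
    for y in range(h):
        for x in range(w):
            for dy in (0, 1):          # 0 = upper output row, 1 = lower
                for dx in (0, 1):      # 0 = left output column, 1 = right
                    nx = x - 1 if dx == 0 else x + 1
                    ny = y - 1 if dy == 0 else y + 1
                    total = (9 * px(x, y) + 3 * px(nx, y)
                             + 3 * px(x, ny) + px(nx, ny))
                    out[2 * y + dy][2 * x + dx] = (total + 8) >> 4
    return out
```

A flat image stays flat, and a step from 0 to 16 across a single row becomes the ramp 0, 4, 12, 16 in both output rows.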
||||||
29 January 2008, 13:56 | #67 | |||||||
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,839
|
By the way, you may still try bilinear for a speed gain on a plain a1200. I'll try it with my ycbcr program in basic. Quote:
Code:
p=(p1*1+p2*3+p4*3+p5*9)\16:plot(xx+640,yy+512,p)
p=(p2*3+p3*1+p5*9+p6*3)\16:plot(xx+1+640,yy+512,p)
p=(p4*3+p5*9+p7*1+p8*3)\16:plot(xx+640,yy+1+512,p)
p=(p5*9+p6*3+p8*3+p9*1)\16:plot(xx+1+640,yy+1+512,p)
Quote:
Last edited by Thorham; 29 January 2008 at 14:02. |
|||||||
29 January 2008, 14:54 | #68 | ||||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
|
Quote:
(do I sound credible ?) Quote:
Yep. Now you are allowed to bash other people's peecees Quote:
(mine is actually 118 if I counted right) Quote:
At worst you can compile sources separately, then link them manually with e.g. phxlnk. Not very practical but better than nothing. |
||||
29 January 2008, 16:05 | #69 | ||||||
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,839
|
Code:
;Bilinear 2x2
;
;For triangular the averaging blocks should
;look something like this:
;
; move.l d0,d7
; lsl.l #3,d7
; add.l d0,d7
; add.l d1,d7
; add.l d1,d1
; add.l d1,d7
; add.l d2,d7
; add.l d2,d2
; add.l d2,d7
; add.l d3,d7
; lsr.l #4,d7
;
;Note that for equal weights of 1, the order
;is not important. For triangular in the above
;example they have to be done in the right order.
;But, of course, you knew that, lol.
;
Filter
 move.l In,a0
 sub.l #Width,a0
 move.l In,a1
 move.l In,a2
 add.l #Width,a2
 move.l Out,a3
 move.l Out,a4
 add.l #Width*2,a4
 move.l #Width-1,d6
 moveq #0,d0
 moveq #0,d1
 moveq #0,d2
 moveq #0,d3
 moveq #0,d4
 moveq #0,d5
.lp
 move.b (a0)+,d0    ;Read block 1
 move.b (a0)+,d1
 move.b (a1)+,d2
 move.b (a1)+,d3
 move.l d0,d7       ;Calc averages
 add.l d1,d7
 add.l d2,d7
 add.l d3,d7
 lsr.l #2,d7
 move.b d7,(a3)+    ;Write pixel 1
 move.b (a0)+,d4    ;Read block 2
 move.b (a1)+,d5
 move.l d1,d7       ;Calc averages
 add.l d4,d7
 add.l d3,d7
 add.l d5,d7
 lsr.l #2,d7
 move.b d7,(a3)+    ;Write pixel 2
 move.b (a2)+,d0    ;Read block 3
 move.b (a2)+,d1
 move.l d2,d7       ;Calc averages
 add.l d3,d7
 add.l d0,d7
 add.l d1,d7
 lsr.l #2,d7
 move.b d7,(a4)+    ;Write pixel 3
 move.b (a2)+,d0    ;Read block 4
 move.l d3,d7       ;Calc averages
 add.l d5,d7
 add.l d1,d7
 add.l d0,d7
 lsr.l #2,d7
 move.b d7,(a4)+    ;Write pixel 4
 dbra d6,.lp
Quote:
Quote:
|
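The commented-out block at the top of that listing builds the 9/3/3/1 weighting from shifts and adds only. A minimal Python sketch of the same arithmetic, assuming 8-bit inputs; the function name `weighted_9331` is hypothetical, and like the commented asm it truncates instead of adding a rounding term:

```python
def weighted_9331(c, a, b, d):
    """Return (9*c + 3*a + 3*b + d) >> 4 using only shifts and adds."""
    total = (c << 3) + c       # 9*c: shift left by 3, then add c once more
    total += a + (a << 1)      # 3*a: a plus its double
    total += b + (b << 1)      # 3*b: b plus its double
    total += d                 # 1*d
    return total >> 4          # divide by 16, truncating
```

With all four inputs at 255 the sum is 16*255 = 4080, so the intermediate value fits comfortably in a 16-bit register.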
||||||
29 January 2008, 17:23 | #70 | ||||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
|
Quote:
And lots of in return. Even more fun Quote:
You are reading 3 bytes for each source line in each loop ; you should only read 1, or adjust the pointers afterwards (or, the funniest way : keep the old values). But, pal, people having a plain a1200 are already prepared to wait ages before the image shows up, so a very slightly faster version won't change much for them. (put another way : when you have to wait a century, you don't care about a few years...) And, oh, yes, I've counted the clock cycles of your version and ended up with 144/loop (slower than mine, heheh). What will the optimized version look like ? Quote:
Quote:
But even if you're successful in that way, I will try the StormC project. |
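The "keep the old values" advice in this post can be illustrated with a sliding window over column sums, so that every source byte is read exactly once per output row. The Python below is a sketch under assumptions, not the code from the thread: the name `upsample_row_sliding`, the two-row interface, the edge replication and the uniform +8 rounding are chosen for illustration, and the input rows are assumed to be non-empty and of equal length.

```python
def upsample_row_sliding(near, far):
    """Produce one output row (twice as wide) from two source rows.

    near is the source row the output row belongs to, far is the adjacent
    row above or below. Each column sum 3*near + far is computed once and
    then carried along as last/this/next, like values kept in registers.
    """
    cols = zip(near, far)
    n, f = next(cols)
    this = 3 * n + f
    last = this                  # left edge: replicate the first column
    out = []
    for n, f in cols:
        nxt = 3 * n + f          # the only new read in this iteration
        out.append((3 * this + last + 8) >> 4)   # left output pixel
        out.append((3 * this + nxt + 8) >> 4)    # right output pixel
        last, this = this, nxt   # slide the window one column
    # Last column: replicate the right edge.
    out.append((3 * this + last + 8) >> 4)
    out.append((3 * this + this + 8) >> 4)
    return out
```

Expanding 3*this + last gives 9*near[x] + 3*far[x] + 3*near[x-1] + far[x-1], which is the 9/3/3/1 weighting discussed earlier in the thread.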
||||
29 January 2008, 17:51 | #71 | ||||||
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,839
|
|
||||||
29 January 2008, 18:26 | #72 | ||||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
|
Quote:
Quote:
(/me tries to look innocent and fails) Quote:
Quote:
The compiler which produces the best code on 68k is gcc, but you can't use it to link with asm because of its incompatible object format (with hunk2gcc and gcc's linker it might be possible though). For the others I frankly don't know. They are the same (1) to me. (1) : add the "crap" word here if you like, else leave it blank No problem. |
||||
30 January 2008, 16:32 | #73 | |
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,839
|
Quote:
Code:
Filter
 move.l In,a0
 sub.l #Width,a0
 move.l In,a1
 move.l In,a2
 add.l #Width,a2
 move.l Out,a3
 move.l Out,a4
 add.l #Width*2,a4
 move.l #Width/2-1,d6
 moveq #0,d0
 moveq #0,d1
 moveq #0,d2
 moveq #0,d3
 moveq #0,d4
 moveq #0,d5
.lpen                 ;Entry code (unoptimized)
 move.b (a0)+,d0
 move.b (a0)+,d1
 move.b (a1)+,d2
 move.b (a1)+,d3
 move.l d0,d7
 lsl.l #3,d7
 add.l d3,d7
 add.l d3,d7
 add.l d1,d7
 add.l d1,d7
 add.l d1,d7
 add.l d2,d7
 add.l d2,d7
 add.l d2,d7
 lsr.l #4,d7
 move.b d7,(a3)+      ;Write top-left
 move.l d0,d7
 add.l d0,d7
 add.l d0,d7
 add.l d1,d7
 add.l d1,d7
 add.l d1,d7
 add.l d2,d7
 add.l d3,d7
 lsr.l #3,d7
 move.b d7,(a3)+      ;Write top-right
 move.l d0,d7
 add.l d0,d7
 add.l d0,d7
 add.l d1,d7
 add.l d2,d7
 add.l d2,d7
 add.l d2,d7
 add.l d3,d7
 lsr.l #3,d7
 move.b d7,(a4)+      ;Write bottom-left
 move.l d0,d7
 add.l d1,d7
 add.l d2,d7
 add.l d3,d7
 lsr.l #2,d7
 move.b d7,(a4)+      ;Write bottom-right
.lp                   ;Rest of row. Here d1 and d2 contain old values
 move.b (a0)+,d0
 move.b (a1)+,d2
 move.l d1,d7         ;x8 x3 x3 x2
 lsl.l #3,d7
 add.l d2,d7
 add.l d2,d7
 move.l d0,a5
 add.l a5,a5
 add.l d0,a5
 add.l a5,d7
 move.l d3,d4
 add.l d4,d4
 add.l d3,d4
 add.l d4,d7
 lsr.l #4,d7
 move.b d7,(a3)+      ;Write top-left
 move.l d1,d5         ;x3 x3 x1 x1
 add.l d5,d5
 add.l d1,d5
 move.l d5,d7
 add.l a5,d7
 add.l d3,d7
 add.l d2,d7
 lsr.l #3,d7
 move.b d7,(a3)+      ;Write top-right
 add.l d0,d5          ;x3 x1 x3 x1
 add.l d4,d5
 add.l d2,d5
 lsr.l #3,d5
 move.b d5,(a4)+      ;Write bottom-left
 move.l d1,d7         ;x1 x1 x1 x1
 add.l d0,d7
 add.l d3,d7
 add.l d2,d7
 lsr.l #2,d7
 move.b d7,(a4)+      ;Write bottom-right
;Next four pixels. Here d0 and d2 contain old values.
 move.b (a0)+,d1
 move.b (a1)+,d3
 move.l d0,d7         ;x8 x3 x3 x2
 lsl.l #3,d7
 add.l d3,d7
 add.l d3,d7
 move.l d1,a5
 add.l a5,a5
 add.l d1,a5
 add.l a5,d7
 move.l d2,d4
 add.l d4,d4
 add.l d2,d4
 add.l d4,d7
 lsr.l #4,d7
 move.b d7,(a3)+      ;Write top-left
 move.l d0,d5         ;x3 x3 x1 x1
 add.l d5,d5
 add.l d0,d5
 move.l d5,d7
 add.l a5,d7
 add.l d2,d7
 add.l d3,d7
 lsr.l #3,d7
 move.b d7,(a3)+      ;Write top-right
 add.l d1,d5          ;x3 x1 x3 x1
 add.l d4,d5
 add.l d3,d5
 lsr.l #3,d5
 move.b d5,(a4)+      ;Write bottom-left
 move.l d0,d7         ;x1 x1 x1 x1
 add.l d1,d7
 add.l d2,d7
 add.l d3,d7
 lsr.l #2,d7
 move.b d7,(a4)+      ;Write bottom-right
 dbf d6,.lp
;Here some exit code for the last pixels in the row is needed.

Furthermore the inner loop is somewhat optimized, while the entry code is not, although it can be optimized in the same way as I did for the rest of the code. If this is as good as it's supposed to be (which I can't tell now) then try to beat it |
|
30 January 2008, 17:29 | #74 | |||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
|
Quote:
Maybe you could get your hands on a 1083S or similar monitor... Oh yes some fresh code to look at ! At first glance I'd say that you're reading from 2 sources, not 3. Shouldn't you access 3 lines (previous, current, next) ? Quote:
Quote:
All lemm... errrh... clock cycles accounted for : 100 per 4-pixel write. Ok it's fast (about 18% faster than mine). But for the quality I have serious doubts (see my remark above about reading only 2 lines). Anyway it doesn't perform the exact same work. |
|||
30 January 2008, 17:43 | #75 | |||||
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,839
|
Man, this sucks. I didn't want to do this today, but I'm going to try and repair the cable. I really can't work like this, argh |
|||||
30 January 2008, 18:06 | #76 | ||||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
|
Quote:
Quote:
However if it saves you some moves then it's ok. Quote:
Quote:
I'm sad for your cable. R.I.P. |
||||
30 January 2008, 19:07 | #77 | |||||
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,839
|
|
|||||
31 January 2008, 09:51 | #78 | ||||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
|
Quote:
Quote:
(Well, in chipmem things are a little bit more complex, but here we're accessing fastmem only.) Quote:
(Btw why do you always write "then" instead of "than" ?) Quote:
But, please tell me : where does your algorithm come from ? A broken cable with a working monitor, and a working cable with a broken monitor... so you'll end up with a broken cable and a broken monitor |
||||
31 January 2008, 11:01 | #79 | |||||
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,839
|
|
|||||
31 January 2008, 14:13 | #80 | ||||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
|
Note that if you can't beat mine (and you won't, heheh ) there is still the 2:1 version to check (also triangular interpolation but writes 2 horizontal pixels and 1 vertical). A more common case than I first expected. Resurrected ! It's miraculous. You're a wizard, man |
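For the 2:1 case mentioned in this post, only the horizontal direction is interpolated, so the weights reduce to 3/4 of the current pixel and 1/4 of its left or right neighbour. A hedged Python sketch follows; the function name `upsample_h2v1_triangular`, the edge replication and the uniform +2 rounding are assumptions made for illustration and are not taken from the thread.

```python
def upsample_h2v1_triangular(row):
    """Double the width of one row: each source pixel yields two outputs.

    The left output takes 3/4 of the current pixel and 1/4 of the left
    neighbour, the right output 3/4 current and 1/4 right neighbour.
    Edges replicate the border pixel (an assumption of this sketch).
    """
    w = len(row)
    out = []
    for x, cur in enumerate(row):
        left = row[max(x - 1, 0)]
        right = row[min(x + 1, w - 1)]
        out.append((3 * cur + left + 2) >> 2)    # left output pixel
        out.append((3 * cur + right + 2) >> 2)   # right output pixel
    return out
```

The number of output rows is unchanged in this case, so the function would simply be called once per source row.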
||||