jpeg decoding in full asm - Page 4

Thorham · 28 January 2008, 16:55

Quote:

Originally Posted by meynaf

We can.

Well, my electronics knowledge is just way too limited for building a computer, or did you have something else in mind?

Quote:

Originally Posted by meynaf

We weren't speaking about the same things

A 5-pass c2p is indeed 5 blocks of merges (per 1, 2, 4, 8, 16 bits).
What I meant was completely different: do the whole merge blocks 6 times (twice for 8 bits)...

Oops

That can happen!
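Since "merge" is jargon: in most chunky-to-planar (c2p) routines, one merge is a masked bit swap between two 32-bit registers, and a 5-pass c2p runs blocks of these with shifts of 16, 8, 4, 2 and 1. Here is a minimal C sketch of a single merge step. The helper name `c2p_merge` is hypothetical, and which register pairs get merged and in which pass order varies from one c2p routine to another.

```c
#include <stdint.h>

/* One c2p "merge" step (hypothetical helper, for illustration only).
 * Swaps the bits of *b selected by `mask` with the bits of *a selected
 * by (mask << shift), using the usual XOR-swap trick:
 *   t = ((a >> shift) ^ b) & mask;  b ^= t;  a ^= t << shift;
 * Typical shift/mask pairs in a 5-pass c2p:
 *   16: 0x0000FFFF   8: 0x00FF00FF   4: 0x0F0F0F0F
 *    2: 0x33333333   1: 0x55555555 */
static void c2p_merge(uint32_t *a, uint32_t *b, unsigned shift, uint32_t mask)
{
    uint32_t t = ((*a >> shift) ^ *b) & mask;
    *b ^= t;
    *a ^= t << shift;
}
```

With shift 16, the high word of `a` and the low word of `b` trade places; with shift 8, bytes 3 and 1 of `a` trade places with bytes 2 and 0 of `b`.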

Quote:

Originally Posted by meynaf

You're not only copying, you're also performing a lot of other operations on this data. Those operations are pipelined during the chipmem writes; using fastmem instead can't be slower, of course, but it won't be much faster.

Hmm, yes, if you look at it from that angle, I have to agree with you. I'll have to get into this whole pipeline thing. Boy, am I stuck with that plain 68000 code, or what

Quote:

Originally Posted by meynaf

Obviously you don't know what you're talking about.
What can I say? Just do it. Then you'll know the gruesome truth.

Doesn't it involve setting up things like the segment table so that you can switch from real mode to protected mode? It would be a pain to write, I guess. With the IA-32 docs from Intel, it couldn't be that hard, though. And it would only have to be done once if you do it properly. But sure, I don't know a lot about it.

Quote:

Originally Posted by meynaf

Alternatively, if you want to hit the hardware on a PC, then I suggest you use a hammer, as it's a much easier way (and it's a lot of fun). The OS makes no difference here

Yay, I'm going to try that today

Quote:

Originally Posted by meynaf

Yeah. They deserve public humiliation.

Indeed

Quote:

Originally Posted by meynaf

To go back to the topic, I have the upsample code in asm. If you like bunches of incomprehensible move/add series with an occasional lsr in them, then you'll love it.
I thought I was past the age of writing such code, but no.

Oh, cool. Post it, please

Anyway, have you found anything to optimize in the jpeg part?

meynaf · 28 January 2008, 18:20

Quote:

Originally Posted by Thorham

Well, my electronics knowledge is just way too limited for building a computer, or did you have something else in mind?

A new computer, a child of the Amiga, is a dream of mine. I have some specs on what it could look like from a programmer's point of view, but I'm no electronics engineer either.

Quote:

Originally Posted by Thorham

Hmm, yes, if you look at it from that angle, I have to agree with you. I'll have to get into this whole pipeline thing. Boy, am I stuck with that plain 68000 code, or what

Basically, when the CPU writes something, it doesn't wait for the write to complete before doing something else in registers. So it's a good habit to put register-only code after memory writes if you can.

Quote:

Originally Posted by Thorham

Doesn't it involve setting up things like the segment table so that you can switch from real mode to protected mode? It would be a pain to write, I guess. With the IA-32 docs from Intel, it couldn't be that hard, though. And it would only have to be done once if you do it properly. But sure, I don't know a lot about it.

As I already said: just do it.

Quote:

Originally Posted by Thorham

Yay, I'm going to try that today

Cool.

Quote:

Originally Posted by Thorham

Oh, cool. Post it, please

Here it is. Triangular 2x2 upsampling code in asm. See the original jdsample.c in the jpeg library for more info.

Code:

_asm_upsample22
 movem.l d0-d7/a0-a6,-(a7)        ; +60
 move.l 4+60(a7),a5            ; a5 = input_data
 move.l 8+60(a7),a6            ; a6 = output_data
 move.l 12+60(a7),d7            ; d7 = compptr->downsampled_width
 move.l 16+60(a7),d6            ; d6 = cinfo->max_v_samp_factor
 bsr.s h2v2_fancy_upsample
 movem.l (a7)+,d0-d7/a0-a6
 rts

; upsample "fancy" 2x2 : (the most frequent case)
; a5=input_data, a6=output_data, d7=nb cols, d6=nb rows
h2v2_fancy_upsample
 lsr.w #1,d6            ; we're doing two of them at once
.yloop
 move.l -4(a5),a1        ; a1 = input_data[inrow-1]
 move.l (a5)+,a0        ; a0 = input_data[inrow]
 move.l (a5),a2            ; a2 = input_data[inrow+1]
 move.l (a6)+,a3        ; a3 = output_data[outrow]
 move.l (a6)+,a4        ; a4 = output_data[outrow+1]
 movem.l d6-d7/a5-a6,-(a7)

; here we have a0=src, a1=src-1, a2=src+1, a3=dest1, a4=dest2, d7=nb cols
; special case for the 1st column ("old" values are also needed afterwards)
 moveq #0,d1            ; this can be out of the loop
 subq.w #3,d7            ; remove first/last columns and 1 for dbf
 moveq #0,d5
 move.b (a1)+,d5        ; a
 move.l d5,a5            ;            [ a5 ok ]
 move.b (a1)+,d1        ; b            [ d1 ok ]
 moveq #0,d2
 move.b (a2)+,d2        ; u
 move.l d2,a6            ;            [ a6 ok ]
 moveq #0,d6
 move.b (a2)+,d6        ; v            [ d6 ok ]
 moveq #0,d3
 move.b (a0)+,d3        ; k
 move.l d3,d0
 add.l d0,d0
 add.l d0,d3            ; 3k            [ d3 ok ]
 moveq #0,d4
 move.b (a0)+,d4        ; l
 move.l d4,d0
 add.l d0,d0
 add.l d0,d4            ; 3l            [ d4 ok ]
 add.l d3,d5            ; 3k + 1a    (this-up)
 add.l d3,d2            ; 3k + 1u    (this-dn)
 move.l d1,d0            ; 1b
 add.l d4,d0            ; 3l + 1b    (next-up)
 add.l d5,d0            ; this + next (up)
 add.l d5,d5            ; this *2
 add.l d5,d0            ; this *3 + next *1
 add.l d5,d5            ; this *4
 addq.l #8,d5            ; rounding with 8
 lsr.l #4,d5            ; /16
 move.b d5,(a3)+        ; top-left pixel
 addq.l #7,d0            ; rounding with 7
 lsr.l #4,d0            ; /16
 move.b d0,(a3)+        ; top-right pixel
 move.l d4,d0            ; 3l
 add.l d6,d0            ; 3l + 1v    (next-dn)
 add.l d2,d0            ; this + next (dn)
 add.l d2,d2            ; this *2
 add.l d2,d0            ; this *3 + next *1
 add.l d2,d2            ; this *4
 addq.l #7,d2            ; rounding with 7
 lsr.l #4,d2            ; /16
 move.b d2,(a4)+        ; bottom-left pixel
 addq.l #8,d0            ; rounding with 8
 lsr.l #4,d0
 move.b d0,(a4)+        ; bottom-right pixel

; general case
.loop
 move.l d1,d2            ; b
 add.l d2,d2
 add.l d1,d2            ; 3b
 move.l d3,d0            ; (save 3k)
 move.l d3,d5            ; (oops... forgot this one)
 add.l a5,d5            ; 3k + 1a
 move.l d1,a5            ; b            [ a5 ok ]
 move.b (a1)+,d1        ; c            [ d1 ok ]
 move.l d4,d3            ; 3l            [ d3 ok ]
 add.l d4,d4
 add.l d3,d4            ; *3 -> 9l
 add.l d4,d2            ; 9l + 3b
 add.l d2,d5            ; 9l + 3b + 3k + 1a
; here : d4=9l, d2=9l+3b, d0=3k
 addq.l #8,d5            ; +8 to round
 lsr.l #4,d5            ; >>4
 move.b d5,(a3)+        ; and here is our top-left pixel
 add.l a6,d0            ; 3k + 1u
 move.l d6,a6            ; v            [ a6 ok ]
 add.l d6,d6
 add.l a6,d6            ; 3v
 add.l d4,d6            ; 9l + 3v
 moveq #0,d5
 move.b (a0)+,d5        ; m
 move.l d5,d4
 add.l d5,d5
 add.l d5,d4            ; 3m            [ d4 ok ]
 add.l d4,d2            ; 9l + 3b + 3m
 add.l d1,d2            ; 9l + 3b + 3m + 1c
 addq.l #7,d2            ; +7 to round
 lsr.l #4,d2            ; >>4
 move.b d2,(a3)+        ; and here is our top-right pixel
; here : d0=3k+1u, d4=3m, d6=9l+3v
 add.l d6,d0            ; 9l + 3v + 3k + 1u
 addq.l #7,d0            ; +7 to round
 lsr.l #4,d0
 move.b d0,(a4)+        ; and here is our bottom-left pixel
; here : d4=3m, d6=9l+3v
 move.l d6,d2            ; 9l + 3v
 add.l d4,d2            ; 9l + 3v + 3m
 moveq #0,d6
 move.b (a2)+,d6        ; w            [ d6 ok ]
 add.l d6,d2
 addq.l #8,d2            ; +8 to round
 lsr.l #4,d2
 move.b d2,(a4)+        ; and here is our bottom-right pixel
 dbf d7,.loop

; special case for the last column
 add.l d4,d1            ; 3m + 1c    (this-up)
 add.l d4,d6            ; 3m + 1w    (this-dn)
 move.l d3,d0
 add.l a5,d0            ; 3l + 1b    (last-up)
 add.l a6,d3            ; 3l + 1v    (last-dn)
 add.l d1,d0            ; this + last (up)
 add.l d1,d1            ; this *2
 add.l d1,d0            ; this *3 + last *1
 add.l d1,d1            ; this *4
 addq.l #8,d0            ; rounding with 8
 lsr.l #4,d0            ; /16
 move.b d0,(a3)+        ; top-left pixel
 addq.l #7,d1            ; rounding with 7
 lsr.l #4,d1            ; /16
 move.b d1,(a3)+        ; top-right pixel
 add.l d6,d3            ; this + last (dn)
 add.l d6,d6            ; this *2
 add.l d6,d3            ; this *3 + last *1
 add.l d6,d6            ; this *4
 addq.l #7,d3            ; rounding with 7
 lsr.l #4,d3            ; /16
 move.b d3,(a4)+        ; bottom-left pixel
 addq.l #8,d6            ; rounding with 8
 lsr.l #4,d6
 move.b d6,(a4)+        ; bottom-right pixel

; line loop
 movem.l (a7)+,d6-d7/a5-a6
 subq.w #1,d6
 bne .yloop
 rts

Note: I translated the comments, but they're not really readable even now...
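For readers who find the asm hard to follow, here is a rough C sketch of what one pass of `.yloop` computes: one input row in, two output rows out. The names (`upsample22_row`, and `up`/`cur`/`dn` for the rows the asm keeps in a1/a0/a2) are hypothetical, and the sketch recomputes the column sums for clarity where the asm carries them over in registers. It follows the asm's rounding (+8/+7 on the top row, +7/+8 on the bottom row) and its edge handling, where the missing neighbour column turns 3*this + last into 4*this. The sketch assumes width >= 2; the asm itself needs width >= 3, since `subq.w #3,d7` followed by `dbf` would otherwise wrap around.

```c
#include <stddef.h>

/* Hypothetical C reference for one input row of h2v2_fancy_upsample above.
 * up/cur/dn:       input rows inrow-1, inrow, inrow+1 (width samples each)
 * out_top/out_bot: output rows 2*inrow and 2*inrow+1 (2*width samples each) */
static void upsample22_row(const unsigned char *up, const unsigned char *cur,
                           const unsigned char *dn, unsigned char *out_top,
                           unsigned char *out_bot, size_t width)
{
    for (size_t x = 0; x < width; x++) {
        /* At the edges the missing column is replaced by the current one. */
        size_t xl = (x > 0) ? x - 1 : x;
        size_t xr = (x + 1 < width) ? x + 1 : x;

        /* Vertical "column sums": 3 * current row + 1 * neighbour row. */
        unsigned this_up = 3u * cur[x] + up[x],   this_dn = 3u * cur[x] + dn[x];
        unsigned last_up = 3u * cur[xl] + up[xl], last_dn = 3u * cur[xl] + dn[xl];
        unsigned next_up = 3u * cur[xr] + up[xr], next_dn = 3u * cur[xr] + dn[xr];

        /* Horizontal pass: 3 * this + neighbour, rounded, divided by 16. */
        out_top[2 * x]     = (unsigned char)((3u * this_up + last_up + 8) >> 4);
        out_top[2 * x + 1] = (unsigned char)((3u * this_up + next_up + 7) >> 4);
        out_bot[2 * x]     = (unsigned char)((3u * this_dn + last_dn + 7) >> 4);
        out_bot[2 * x + 1] = (unsigned char)((3u * this_dn + next_dn + 8) >> 4);
    }
}
```

A single bright sample spreads into its neighbours with the 9/3/1 falloff: with `cur = {0, 16, 0}` and black rows above and below, both output rows come out as `{0, 3, 9, 9, 3, 0}`.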

Quote:

Originally Posted by Thorham

Anyway, have you found anything to optimize in the jpeg part?

Yep. Those interested can find the code here:
http://meynaf.free.fr/tmp/v.zip

Thorham · 29 January 2008, 10:01

Quote:

Originally Posted by meynaf

A new computer, a child of the Amiga, is a dream of mine. I have some specs on what it could look like from a programmer's point of view, but I'm no electronics engineer either.

I would love to see your ideas, could you post them, please?

Quote:

Originally Posted by meynaf

Basically, when the CPU writes something, it doesn't wait for the write to complete before doing something else in registers. So it's a good habit to put register-only code after memory writes if you can.

That's simple enough! Should be no problem to make this a habit.

Quote:

Originally Posted by meynaf

As I already said: just do it.

It's something I've been wanting to try for a long time. Setting this thing up is probably a bit of a pain, but it should be no problem; Intel's own docs explain it quite well. Of course, I'd first have to learn i386 assembler as a bare minimum, and that should be easy enough: there are a variety of tools out there (I even have one of them) that allow Win32 coding in asm, and I have good docs on this.

Quote:

Originally Posted by meynaf

Cool.

Done

I'm now using my miggy to get online

Quote:

Originally Posted by meynaf

Here it is. Triangular 2x2 upsampling code in asm. See the original jdsample.c in the jpeg library for more info.
Note: I translated the comments, but they're not really readable even now...

Cool. I'll just read up on what's really happening in this type of upsampling. It seems a little different from bilinear.

Quote:

Originally Posted by meynaf

Yep. Those interested can find the code here:

There's a lot to look at, right now! Should I just concentrate on doing the png codec, and wait until you've done the jpeg codec in full asm? Or should I try to get the jpeg codec to compile with Storm/Sas/Dice/VBCC? Any pointers are appreciated!

meynaf · 29 January 2008, 11:05

Quote:

Originally Posted by Thorham

I would love to see your ideas, could you post them, please?

That would require a new thread, if not a whole site!

Basically, it's about making an open, lightweight, efficient, cool-to-code computer architecture that stays in line with today's requirements.
Remember: architectures persist longer than implementations.

The machine has to be some sort of "generic box", like PCs are meant to be.
It must be user-friendly as well as programmer-friendly, like Amigas are.

I am unsure an Amiga board is the right place to discuss this.

Quote:

Originally Posted by Thorham

That's simple enough! Should be no problem to make this a habit.

On fastmem it works as expected; however, there are some (usually bad) surprises with chipmem. See this thread: http://eab.abime.net/showthread.php?t=34481

Quote:

Originally Posted by Thorham

It's something I've been wanting to try for a long time. Setting this thing up is probably a bit of a pain, but it should be no problem; Intel's own docs explain it quite well. Of course, I'd first have to learn i386 assembler as a bare minimum, and that should be easy enough: there are a variety of tools out there (I even have one of them) that allow Win32 coding in asm, and I have good docs on this.

Yeah, learn i386 assembler, and see its beautiful syntax, its numerous general-purpose registers, its powerful addressing modes...
See how it's easy to bang on the hardware, how well it is documented...
See the poetry of the various memory models...

Quote:

Originally Posted by Thorham

Done

I'm now using my miggy to get online

Good. I'm sure you feel much better now.

Quote:

Originally Posted by Thorham

Cool. I'll just read up on what's really happening in this type of upsampling. It seems a little different from bilinear.

For 2x2 you basically make a weighted average of:
. 9/16 of current pixel value
. 3/16 of left or right pixel value
. 3/16 of top or bottom pixel value
. 1/16 of diagonal pixel value

How would a bilinear filter do that? (You have 1 pixel and want to output 4.)
A box filter would simply copy them around; not good.
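The 9/3/3/1 weights are separable: they are the outer product of a per-axis (3, 1)/4 pair, which is linear interpolation with each output pixel center sitting 1/4 of the way from the current input pixel's center toward a neighbour's. That is why the asm first builds vertical sums like 3k + a and then combines them as 3*this + last. Written out for the up-left output pixel, before the asm's +8/+7 rounding (the other three are mirror images):

```latex
\[
\begin{pmatrix} 1 & 3 \\ 3 & 9 \end{pmatrix}
= \begin{pmatrix} 1 \\ 3 \end{pmatrix}\begin{pmatrix} 1 & 3 \end{pmatrix},
\qquad
o_{\mathrm{UL}}
= \frac{9\,p_{x,y} + 3\,p_{x-1,y} + 3\,p_{x,y-1} + p_{x-1,y-1}}{16}
= \frac{3\,s_x + s_{x-1}}{16},
\quad s_i = 3\,p_{i,y} + p_{i,y-1}.
\]
```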

Quote:

Originally Posted by Thorham

There's a lot to look at, right now! Should I just concentrate on doing the png codec, and wait until you've done the jpeg codec in full asm? Or should I try to get the jpeg codec to compile with Storm/Sas/Dice/VBCC? Any pointers are appreciated!

Feel free to do whatever you wish.

I don't have DICE or VBCC, and I don't like SAS/C's command-line stuff.
However, I have StormC (maybe not the latest version), so I may see if I can make a project file for you to compile the project.
The jpeg library compiles literally everywhere, but linking it with asm (especially mine) is something else (I wouldn't try this with gcc).

On the other hand, you can simply disable the jpeg support and assemble the program (you would notice a major drop in exe size then).

Thorham · 29 January 2008, 11:46

Quote:

Originally Posted by meynaf

That would require a new thread, if not a whole site!

Basically, it's about making an open, lightweight, efficient, cool-to-code computer architecture that stays in line with today's requirements.
Remember: architectures persist longer than implementations.

The machine has to be some sort of "generic box", like PCs are meant to be.
It must be user-friendly as well as programmer-friendly, like Amigas are.

I am unsure an Amiga board is the right place to discuss this.

Yes, it is quite off-topic! But a discussion could go in the off-topic section. Basically, ot-stupidity is the forum where all the silly stuff goes; the ot-general section should do fine!

Quote:

Originally Posted by meynaf

On fastmem it works as expected; however, there are some (usually bad) surprises with chipmem. See this thread: http://eab.abime.net/showthread.php?t=34481

Yes, I've read it, and there are some mighty strange things happening. This really deserves a closer look.

Quote:

Originally Posted by meynaf

Yeah, learn i386 assembler, and see its beautiful syntax, its numerous general-purpose registers, its powerful addressing modes...
See how it's easy to bang on the hardware, how well it is documented...
See the poetry of the various memory models...

They've done it for Linux/Windows (and 4GW, if I got the name right), etc. If they can do it, then so can I

Of course, it won't be pretty. And that's probably an understatement. I still want to get into this, sooner or later, so I'm just going to have to cope!

Quote:

Originally Posted by meynaf

Good. I'm sure you feel much better now.

Yeah, man, I should've done this way sooner

Quote:

Originally Posted by meynaf

For 2x2 you basically make a weighted average of:
. 9/16 of current pixel value
. 3/16 of left or right pixel value
. 3/16 of top or bottom pixel value
. 1/16 of diagonal pixel value

With diagonal I suppose you mean x+1,y+1, where the current pixel is just x,y?

Quote:

Originally Posted by meynaf

How would a bilinear filter do that? (You have 1 pixel and want to output 4.)
A box filter would simply copy them around; not good.

If I've got it right, then this code should do bilinear interpolation for 2x2:

Code:

For yy=0 To 511 Step 2
    If InKey$<>"" Then Stop
    For xx=0 To 639 Step 2
        xxx=xx\2+640:yyy=yy\2
        p1=Pointg(xxx-1,yyy-1):p2=Pointg(xxx,yyy-1):p3=Pointg(xxx+1,yyy-1)
        p4=Pointg(xxx-1,yyy):p5=Pointg(xxx,yyy):p6=Pointg(xxx+1,yyy)
        p7=Pointg(xxx-1,yyy+1):p8=Pointg(xxx,yyy+1):p9=Pointg(xxx+1,yyy+1)

        p=(p1+p2+p4+p5)\4:plot(xx,yy+512,p)
        p=(p2+p3+p5+p6)\4:plot(xx+1,yy+512,p)
        p=(p4+p5+p7+p8)\4:plot(xx,yy+1+512,p)
        p=(p5+p6+p8+p9)\4:plot(xx+1,yy+1+512,p)
    Next
Next

Of course, this is FreeBASIC. Pointg is a procedure that simply returns the gray value of an RGB pixel, and plot writes a grayscale (0-255) pixel to the screen. The image to be interpolated is located at 640,0 and is 320x256 grayscale pixels. The interpolated result is located at 0,512 and is 640x512 grayscale pixels. At location 0,0 there is the original, which is scaled down. Of course, the downscaling is not in this code. The \ is integer division: it is the same as /, except that \ truncates the result instead of rounding it. I sure hope that's all clear...

Quote:

Originally Posted by meynaf

Feel free to do whatever you wish.

I don't have DICE or VBCC, and I don't like SAS/C's command-line stuff.
However, I have StormC (maybe not the latest version), so I may see if I can make a project file for you to compile the project.
The jpeg library compiles literally everywhere, but linking it with asm (especially mine) is something else (I wouldn't try this with gcc).

On the other hand, you can simply disable the jpeg support and assemble the program (you would notice a major drop in exe size then).

I would really appreciate it if you could do a Storm project. I have Storm 3, should be good enough, right? It would really be nice if I can compile the whole program as a simple Storm project; it would mean that when I find some optimizations that I can actually test them, instead of guessing if it's going to work or not. Not being able to run tests, just plain sucks, as you know. Again, if you could do it, then many, many thanks

meynaf · 29 January 2008, 12:41

Quote:

Originally Posted by Thorham

Yes, it is quite off-topic! But a discussion could go in the off-topic section. Basically, ot-stupidity is the forum where all the silly stuff goes; the ot-general section should do fine!

Ot-general thread opened :
http://eab.abime.net/showthread.php?t=34571

Quote:

Originally Posted by Thorham

Yes, I've read it, and there are some mighty strange things happening. This really deserves a closer look.

I think my program to measure execution times will run a lot next week-end.

Quote:

Originally Posted by Thorham

They've done it for Linux/Windows (and 4GW, if I got the name right), etc. If they can do it, then so can I

Of course, it won't be pretty. And that's probably an understatement. I still want to get into this, sooner or later, so I'm just going to have to cope!

Let me know when you achieve something...

Quote:

Originally Posted by Thorham

Yeah, man, I should've done this way sooner

But finally you did it. Now you're a Man

Quote:

Originally Posted by Thorham

With diagonal I suppose you mean x+1,y+1, where the current pixel is just x,y?

Not really but it's close.

You have 4 pixels to write : up-left, up-right, down-left, down-right.
All of them get 9/16 of (x,y), and :
- For up-left : 1/16 of (x-1,y-1), 3/16 of (x-1,y), 3/16 of (x,y-1)
- For up-right : 1/16 of (x+1, y-1), 3/16 of (x+1,y), 3/16 of (x,y-1)
- For down-left : 1/16 of (x-1,y+1), 3/16 of (x-1,y), 3/16 of (x,y+1)
- For down-right : 1/16 of (x+1,y+1), 3/16 of (x+1,y), 3/16 of (x,y+1)

Quote:

Originally Posted by Thorham

If I've got it right, then this code should do bilinear interpolation for 2x2:

Of course, this is FreeBASIC. Pointg is a procedure that simply returns the gray value of an RGB pixel, and plot writes a grayscale (0-255) pixel to the screen. The image to be interpolated is located at 640,0 and is 320x256 grayscale pixels. The interpolated result is located at 0,512 and is 640x512 grayscale pixels. At location 0,0 there is the original, which is scaled down. Of course, the downscaling is not in this code. The \ is integer division: it is the same as /, except that \ truncates the result instead of rounding it. I sure hope that's all clear...

It's clear enough. Apparently bilinear interpolation is the same as triangular, but it weights the pixels with 1/1/1/1 instead of 9/3/3/1.

Quote:

Originally Posted by Thorham

I would really appreciate it if you could do a Storm project. I have Storm 3, should be good enough, right? It would really be nice if I can compile the whole program as a simple Storm project; it would mean that when I find some optimizations that I can actually test them, instead of guessing if it's going to work or not. Not being able to run tests, just plain sucks, as you know. Again, if you could do it, then many, many thanks

I don't know if I will succeed, but I'm going to try it this week-end.

Thorham · 29 January 2008, 13:56

Quote:

Originally Posted by meynaf

Ot-general thread opened :
http://eab.abime.net/showthread.php?t=34571

Cool

It's just a pity that posting there doesn't increase the post count!

Quote:

Originally Posted by meynaf

I think my program to measure execution times will run a lot next week-end.

I'd say. I've seen the thread, and there's a lot to test.

Quote:

Originally Posted by meynaf

Let me know when you achieve something...

Well, I wouldn't wait for it if I were you. It's not going to happen any time soon. I have a little compiler project I'm doing on the Amiga, and one of the goals is to get it to compile PC code. Only when the 680x0 part is done will I ever learn IA-32 code. So that really could take a while.

Quote:

Originally Posted by meynaf

But finally you did it. Now you're a Man

Yippee, I'm a MAN now, hurray

Quote:

Originally Posted by meynaf

Not really but it's close.

You have 4 pixels to write : up-left, up-right, down-left, down-right.
All of them get 9/16 of (x,y), and :
- For up-left : 1/16 of (x-1,y-1), 3/16 of (x-1,y), 3/16 of (x,y-1)
- For up-right : 1/16 of (x+1, y-1), 3/16 of (x+1,y), 3/16 of (x,y-1)
- For down-left : 1/16 of (x-1,y+1), 3/16 of (x-1,y), 3/16 of (x,y+1)
- For down-right : 1/16 of (x+1,y+1), 3/16 of (x+1,y), 3/16 of (x,y+1)

Right. I've implemented it, and it does a much better job than simple bilinear. I'm simply not using that anymore. The image with this one is much sharper! Another plus is that it's not even going to be much slower than bilinear when implemented in asm. I might just do an asm version for the fun of it; maybe I can beat your version.

By the way, you may still try bilinear for a speed gain on a plain A1200. I'll try it with my YCbCr program in BASIC.

Quote:

Originally Posted by meynaf

It's clear enough. Apparently bilinear interpolation is the same as triangular, but it weights the pixels with 1/1/1/1 instead of 9/3/3/1.

Yes, it does seem very similar. Here it is in BASIC:

Code:

p=(p1*1+p2*3+p4*3+p5*9)\16:plot(xx+640,yy+512,p)
p=(p2*3+p3*1+p5*9+p6*3)\16:plot(xx+1+640,yy+512,p)
p=(p4*3+p5*9+p7*1+p8*3)\16:plot(xx+640,yy+1+512,p)
p=(p5*9+p6*3+p8*3+p9*1)\16:plot(xx+1+640,yy+1+512,p)

It was very simple to modify the code, and the results are very good, too!
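Here is the same 9/3/3/1 filter for a whole 8-bit grayscale image, sketched in C. The function names and the row-major `src`/`dst` buffer layout are assumptions of this sketch. One deliberate difference from the BASIC test: at the image borders the BASIC code reads neighbours from outside the 320x256 source area, whereas this sketch clamps coordinates and reuses the edge pixel. Like the BASIC version, it truncates (sum / 16) instead of rounding.

```c
#include <stddef.h>

/* Hypothetical helper: read src[y][x], with coordinates clamped to the image. */
static unsigned px(const unsigned char *src, int w, int h, int x, int y)
{
    if (x < 0) x = 0;
    if (x >= w) x = w - 1;
    if (y < 0) y = 0;
    if (y >= h) y = h - 1;
    return src[(size_t)y * w + x];
}

/* 2x triangular upsampling of a w*h grayscale image into a (2w)*(2h) buffer,
 * with weights 9 (this), 3 (horizontal), 3 (vertical), 1 (diagonal) / 16. */
static void upsample2x_triangle(const unsigned char *src, int w, int h,
                                unsigned char *dst)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            unsigned c = px(src, w, h, x, y);
            /* dx/dy = -1 selects the left/top output pixel, +1 right/bottom. */
            for (int dy = -1; dy <= 1; dy += 2) {
                for (int dx = -1; dx <= 1; dx += 2) {
                    unsigned sum = 9 * c
                                 + 3 * px(src, w, h, x + dx, y)
                                 + 3 * px(src, w, h, x, y + dy)
                                 + 1 * px(src, w, h, x + dx, y + dy);
                    int ox = 2 * x + (dx > 0);
                    int oy = 2 * y + (dy > 0);
                    dst[(size_t)oy * (2 * w) + ox] = (unsigned char)(sum / 16);
                }
            }
        }
    }
}
```

Because the weights add up to 16, a flat area stays exactly flat; a 2x1 source `{0, 160}` becomes two identical rows `{0, 40, 120, 160}`.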

Quote:

Originally Posted by meynaf

I don't know if I will succeed, but I'm going to try it this week-end.

While I do hope you succeed, even if you don't, I really appreciate the fact that you're going to try it. Thank you

meynaf · 29 January 2008, 14:54

Quote:

Originally Posted by Thorham

Cool

It's just a pity that posting there doesn't increase the post count!

I don't care a lot about the post count.

(do I sound credible ?)

Quote:

Originally Posted by Thorham

Well, I wouldn't wait for it if I were you. It's not going to happen any time soon. I have a little compiler project I'm doing on the Amiga, and one of the goals is to get it to compile PC code. Only when the 680x0 part is done will I ever learn IA-32 code. So that really could take a while.

That could take more than a while! I dunno what you intend to compile, but writing a compiler is one of the toughest things there is. You've set yourself a hard task.

Quote:

Originally Posted by Thorham

Yippee, I'm a MAN now, hurray

Yep. Now you are allowed to bash other people's peecees

Quote:

Originally Posted by Thorham

Right. I've implemented it, and it does a much better job than simple bilinear. I'm simply not using that anymore. The image with this one is much sharper! Another plus is that it's not even going to be much slower than bilinear when implemented in asm. I might just do an asm version for the fun of it; maybe I can beat your version.

If you can do it with the exact same quality in fewer than 120 clock cycles per source pixel (-> 4 destination pixels), then let me know; I'll include it in the project asap!
(mine is actually 118 if I counted right)

Quote:

Originally Posted by Thorham

By the way, you may still try bilinear for a speed gain on a plain A1200. I'll try it with my YCbCr program in BASIC.

You like damaging quality, don't you ?

Quote:

Originally Posted by Thorham

While I do hope you succeed, even if you don't, I really appreciate the fact that you're going to try it. Thank you

At worst, you can compile the sources separately, then link them manually with e.g. phxlnk. Not very practical, but better than nothing.

Thorham · 29 January 2008, 16:05

Quote:

Originally Posted by meynaf

I don't care a lot about the post count.

(do I sound credible ?)

Quote:

Originally Posted by meynaf

That could take more than a while! I dunno what you intend to compile, but writing a compiler is one of the toughest things there is. You've set yourself a hard task.

It's for an object-oriented language that I want to handle both low-level and high-level programming properly. Some of the stuff in such a compiler is pretty easy, such as the object handling; other things I haven't figured out yet, and some of them could be quite hard. Any comments on what the hard parts are?

Quote:

Originally Posted by meynaf

Yep. Now you are allowed to bash other people's peecees

Yeah, cool. Lots of

Quote:

Originally Posted by meynaf

If you can do it with the exact same quality in fewer than 120 clock cycles per source pixel (-> 4 destination pixels), then let me know; I'll include it in the project asap!
(mine is actually 118 if I counted right)

Here's a quick version:

Code:

;Bilinear 2x2
;
;For triangular, the averaging blocks should
;look something like this:
;
;    move.l    d0,d7
;    lsl.l    #3,d7
;    add.l    d0,d7
;    add.l    d1,d7
;    add.l    d1,d1
;    add.l    d1,d7
;    add.l    d2,d7
;    add.l    d2,d2
;    add.l    d2,d7
;    add.l    d3,d7
;    lsr.l    #4,d7
;
;Note that for equal weights of 1, the order
;is not important. For triangular in the above
;example they have to be done in the right order.
;But, of course, you knew that, lol.
;

Filter
    move.l    In,a0
    sub.l    #Width,a0
    move.l    In,a1
    move.l    In,a2
    add.l    #Width,a2
    move.l    Out,a3
    move.l    Out,a4
    add.l    #Width*2,a4

    move.l    #Width-1,d6
    
    moveq    #0,d0
    moveq    #0,d1
    moveq    #0,d2
    moveq    #0,d3
    moveq    #0,d4
    moveq    #0,d5
.lp
    move.b    (a0)+,d0    ;Read block 1
    move.b    (a0)+,d1
    move.b    (a1)+,d2
    move.b    (a1)+,d3
    
    move.l    d0,d7        ;Calc averages
    add.l    d1,d7
    add.l    d2,d7
    add.l    d3,d7
    lsr.l    #2,d7
    
    move.b    d7,(a3)+    ;Write pixel 1

    move.b    (a0)+,d4    ;Read block 2
    move.b    (a1)+,d5
    
    move.l    d1,d7        ;Calc averages
    add.l    d4,d7
    add.l    d3,d7
    add.l    d5,d7
    lsr.l    #2,d7

    move.b    d7,(a3)+    ;Write pixel 2

    move.b    (a2)+,d0    ;Read block 3
    move.b    (a2)+,d1
    
    move.l    d2,d7        ;Calc averages
    add.l    d3,d7
    add.l    d0,d7
    add.l    d1,d7
    lsr.l    #2,d7
    
    move.b    d7,(a4)+    ;Write pixel 3
    
    move.b    (a2)+,d0    ;Read block 4
    
    move.l    d3,d7        ;Calc averages
    add.l    d5,d7
    add.l    d1,d7
    add.l    d0,d7
    lsr.l    #2,d7
    
    move.b    d7,(a4)+    ;Write pixel 4
    
    dbra    d6,.lp

The reason I posted it like this is that it could be pretty easy to have two versions of the jpeg codec: one for a plain 1200 with all the quality sacrifices, and one for fast Amigas in full quality. Furthermore, even with the YCbCr accuracy reduced plus bilinear, the quality should still be more than acceptable on a plain 1200. Just a thought.

Oh, by the way, this is still completely unoptimized

Quote:

Originally Posted by meynaf

You like damaging quality, don't you ?

Yes, but only if it doesn't get ugly. I've already tried it, and it really doesn't look bad. It's just the same with the YCbCr degradation; the loss is minimal, and it should be great for a second version of the codec. This can be done easily when the HQ codec is finished. See, there's actually a use for this.

Quote:

Originally Posted by meynaf

At worst, you can compile the sources separately, then link them manually with e.g. phxlnk. Not very practical, but better than nothing.

If you think that will work, I'll try that first. If it does, then you can skip making a Storm project. Since it's only Tuesday, I've got plenty of time left.

meynaf · 29 January 2008, 17:23

Quote:

Originally Posted by Thorham

It's for an object-oriented language that I want to handle both low-level and high-level programming properly. Some of the stuff in such a compiler is pretty easy, such as the object handling; other things I haven't figured out yet, and some of them could be quite hard. Any comments on what the hard parts are?

The hard parts are probably in the code generation. You probably knew that compilers generate ugly code; now you'll discover why.

Quote:

Originally Posted by Thorham

Yeah, cool. Lots of

And lots of

in return. Even more fun

Quote:

Originally Posted by Thorham

The reason I posted it like this is that it could be pretty easy to have two versions of the jpeg codec: one for a plain 1200 with all the quality sacrifices, and one for fast Amigas in full quality. Furthermore, even with the YCbCr accuracy reduced plus bilinear, the quality should still be more than acceptable on a plain 1200. Just a thought.

Oh, by the way, this is still completely unoptimized

Unoptimized, and certainly untested: it won't work.

You are reading 3 bytes from each source line in each loop iteration; you should only read 1, or adjust the pointers afterwards (or, the funniest way: keep the old values).
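A sketch of the "keep the old values" approach, in C for readability: a 3-column window is kept per source row and slid along, so each iteration reads exactly one new sample from each of the three rows. It computes the same 2x2 averages as Thorham's BASIC test (p1+p2+p4+p5 and so on); the function name and the clamping at the row ends are assumptions of this sketch, not part of the posted asm.

```c
#include <stddef.h>

/* Hypothetical sketch: 2x2-average upsampling of one source row, keeping a
 * sliding 3-column window per source row instead of re-reading old bytes.
 * up/cur/dn: rows y-1, y, y+1 (width >= 1 samples each)
 * top/bot:   output rows 2y and 2y+1 (2*width samples each) */
static void avg22_row(const unsigned char *up, const unsigned char *cur,
                      const unsigned char *dn, unsigned char *top,
                      unsigned char *bot, size_t width)
{
    /* Window: *_l = column x-1 (clamped at the left edge), *_c = column x. */
    unsigned up_l = up[0], up_c = up[0];
    unsigned cu_l = cur[0], cu_c = cur[0];
    unsigned dn_l = dn[0], dn_c = dn[0];

    for (size_t x = 0; x < width; x++) {
        size_t xr = (x + 1 < width) ? x + 1 : x;  /* clamp at the right edge */
        /* The only memory reads in the loop: one new sample per source row. */
        unsigned up_r = up[xr], cu_r = cur[xr], dn_r = dn[xr];

        *top++ = (unsigned char)((up_l + up_c + cu_l + cu_c) >> 2);
        *top++ = (unsigned char)((up_c + up_r + cu_c + cu_r) >> 2);
        *bot++ = (unsigned char)((cu_l + cu_c + dn_l + dn_c) >> 2);
        *bot++ = (unsigned char)((cu_c + cu_r + dn_c + dn_r) >> 2);

        /* Slide the window one column to the right. */
        up_l = up_c; up_c = up_r;
        cu_l = cu_c; cu_c = cu_r;
        dn_l = dn_c; dn_c = dn_r;
    }
}
```

In asm the window would live in data registers, which removes two of the three loads per row per iteration.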

But, pal, people with a plain A1200 are already prepared to wait ages before the image shows up, so a very slightly faster version won't make a difference to them.
(In other words: when you have to wait a century, a few years more or less don't matter...)

And, oh, yes, I've counted the clock cycles of your version and ended up with 144 per loop (slower than mine, heheh). What will the optimized version look like?

Quote:

Originally Posted by Thorham

Yes, but only if it doesn't get ugly. I've already tried it, and it really doesn't look bad. It's just the same with the YCbCr degradation; the loss is minimal, and it should be great for a second version of the codec. This can be done easily when the HQ codec is finished. See, there's actually a use for this.

A quick-and-dirty version will be better with a box filter IMO. And yes, I like to contradict people
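For comparison, a 2x box filter just replicates each source sample into a 2x2 block, with no arithmetic at all. A minimal C sketch for one source row (the function name is hypothetical):

```c
#include <stddef.h>

/* Hypothetical sketch: 2x "box filter" (pixel replication) for one source row.
 * Each source sample is copied to two adjacent columns in both output rows. */
static void box22_row(const unsigned char *src, unsigned char *top,
                      unsigned char *bot, size_t width)
{
    for (size_t x = 0; x < width; x++) {
        unsigned char v = src[x];
        top[2 * x] = top[2 * x + 1] = v;
        bot[2 * x] = bot[2 * x + 1] = v;
    }
}
```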

Quote:

Originally Posted by Thorham

If you think that will work, I'll try that first. If it does, then you can skip making a Storm project. Since it's only Tuesday, I've got plenty of time left.

That sure will work. The asm has to appear first in the list of objects and there mustn't be any C startup/cleanup code.

But even if you're successful in that way, I will try the StormC project.

Thorham · 29 January 2008, 17:51

Quote:

Originally Posted by meynaf

Unoptimized, and certainly untested : it won't work

You are reading 3 bytes for each source line in each loop ; you should only read 1 or adjust pointers afterwards (or the funniest way : keep the old values).

Hadn't thought of that one! Well, that's what you get from a plain version like mine. I think I posted it too quickly.

Quote:

Originally Posted by meynaf

But, pal, people with a plain a1200 are already prepared to wait ages before the image shows up, so a very slightly faster version won't make much difference to them.
(Put another way: when you already have to wait a century, a few years more or less won't matter...)

Good point, I rest my case, and will not suggest anything that sacrifices quality again. I swear it on my ....

Quote:

Originally Posted by meynaf

And, oh, yes, I've counted the clock cycles of your version and ended up with 144/loop (slower than mine, heheh). What will the optimized version look like?

This was only a quick version to see what you think. I'll try optimizing it just for the fun of it, but first I'll fix it and make sure I haven't made any mistakes before posting.

Quote:

Originally Posted by meynaf

A quick-and-dirty version will be better with a box filter IMO. And yes, I like to contradict people

Nearest neighbor, right? Yeah, that's the fastest! And if I were you, I wouldn't stop contradicting people, since they really aren't always right

Quote:

Originally Posted by meynaf

That sure will work. The asm has to appear first in the list of objects and there mustn't be any C startup/cleanup code.

Good. That's pretty easy, I guess. Or so I hope! Which compiler do you think produces the best code: storm, sas, dice or vbcc? Would be nice to use the best one.

Quote:

Originally Posted by meynaf

But even if you're successful in that way, I will try the StormC project.

Thanks again

meynaf · 29 January 2008, 18:26

Quote:

Originally Posted by Thorham

Hadn't thought of that one! Well, that's what you get from a plain version like mine. I think I posted it too quickly.

Anyway you know next time I won't miss you

Quote:

Originally Posted by Thorham

Good point, I rest my case, and will not suggest anything that sacrifices quality again. I swear it on my ....

You swear on your what ?

(/me tries to look innocent and fails)

Quote:

Originally Posted by Thorham

This was only a quick version to see what you think. I'll try optimizing it just for the fun of it, but first I'll fix it and make sure I haven't made any mistakes before posting.

Damn. It will be harder for me next time then.

Quote:

Originally Posted by Thorham

Nearest neighbor, right? Yeah, that's the fastest! And if I were you, I wouldn't stop contradicting people, since they really aren't always right

Good. That's pretty easy, I guess. Or so I hope! Which compiler do you think produces the best code: storm, sas, dice or vbcc? Would be nice to use the best one.

Yes, the box filter simply replicates the pixels, so it ought to be fast...
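At 2x, a box filter is just pixel replication: each source pixel becomes a 2x2 block of identical output pixels. Here is a minimal C sketch, assuming an 8-bit component plane stored row by row. The name `upsample_box_2x` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Box-filter (pixel replication) 2x upsampling of one component plane.
   Each source pixel becomes a 2x2 block of identical output pixels.
   dst must hold (2*width) * (2*height) bytes. */
void upsample_box_2x(const unsigned char *src, size_t width, size_t height,
                     unsigned char *dst)
{
    assert(src != NULL && dst != NULL);
    size_t out_w = 2 * width;
    for (size_t y = 0; y < height; y++) {
        unsigned char *top = dst + (2 * y) * out_w; /* first output row  */
        unsigned char *bot = top + out_w;           /* second output row */
        for (size_t x = 0; x < width; x++) {
            unsigned char v = src[y * width + x];
            top[2 * x] = top[2 * x + 1] = v;
            bot[2 * x] = bot[2 * x + 1] = v;
        }
    }
}
```

There is no arithmetic at all, only one read and four writes per source pixel, which is why it is the natural choice for a quick-and-dirty mode.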

The compiler which produces the best code on 68k is gcc, but you can't use it to link with asm because of its incompatible object format (with hunk2gcc and gcc's linker it might be possible though).

For the others I frankly don't know. They are the same (1) to me.

(1) : add the "crap" word here if you like, else leave it blank

Quote:

Originally Posted by Thorham

Thanks again

No problem.

Thorham · 30 January 2008, 16:32

Quote:

Originally Posted by meynaf

Damn. It will be harder for me next time then.

Well, it might be. I've redone the interpolation code. The method is tested in basic, and it seems to be as good as it's supposed to be, except that I'm testing on an old monitor. I had an accident with my LG Studioworks, and now its cable is broken. Until I can sort that out, I can't test properly. I can make some test images if you want, though. Here's the new code:

Code:

Filter
    move.l    In,a0
    sub.l    #Width,a0
    move.l    In,a1
    move.l    In,a2
    add.l    #Width,a2
    move.l    Out,a3
    move.l    Out,a4
    add.l    #Width*2,a4

    move.l    #Width/2-1,d6
    
    moveq    #0,d0
    moveq    #0,d1
    moveq    #0,d2
    moveq    #0,d3
    moveq    #0,d4
    moveq    #0,d5
.lpen                ;Entry code (unoptimized)
    move.b    (a0)+,d0
    move.b    (a0)+,d1
    move.b    (a1)+,d2
    move.b    (a1)+,d3

    move.l    d0,d7
    lsl.l    #3,d7
    add.l    d3,d7
    add.l    d3,d7
    add.l    d1,d7
    add.l    d1,d7
    add.l    d1,d7
    add.l    d2,d7
    add.l    d2,d7
    add.l    d2,d7
    lsr.l    #4,d7
    move.b    d7,(a3)+    ;Write top-left

    move.l    d0,d7
    add.l    d0,d7
    add.l    d0,d7
    add.l    d1,d7
    add.l    d1,d7
    add.l    d1,d7
    add.l    d2,d7
    add.l    d3,d7
    lsr.l    #3,d7
    move.b    d7,(a3)+    ;Write top-right
    
    move.l    d0,d7
    add.l    d0,d7
    add.l    d0,d7
    add.l    d1,d7
    add.l    d2,d7
    add.l    d2,d7
    add.l    d2,d7
    add.l    d3,d7
    lsr.l    #3,d7
    move.b    d7,(a4)+    ;Write bottom-left

    move.l    d0,d7
    add.l    d1,d7
    add.l    d2,d7
    add.l    d3,d7
    lsr.l    #2,d7
    move.b    d7,(a4)+    ;Write bottom-right

.lp                ;Rest of row. Here d1 and d3 contain old values
    move.b    (a0)+,d0
    move.b    (a1)+,d2
    move.l    d1,d7        ;x8 x3 x3 x2
    lsl.l    #3,d7
    add.l    d2,d7
    add.l    d2,d7
    move.l    d0,a5
    add.l    a5,a5
    add.l    d0,a5
    add.l    a5,d7
    move.l    d3,d4
    add.l    d4,d4
    add.l    d3,d4
    add.l    d4,d7
    lsr.l    #4,d7
    move.b    d7,(a3)+    ;Write top-left
    move.l    d1,d5        ;x3 x3 x1 x1
    add.l    d5,d5
    add.l    d1,d5
    move.l    d5,d7
    add.l    a5,d7
    add.l    d3,d7
    add.l    d2,d7
    lsr.l    #3,d7
    move.b    d7,(a3)+    ;Write top-right
    add.l    d0,d5        ;x3 x1 x3 x1
    add.l    d4,d5
    add.l    d2,d5
    lsr.l    #3,d5
    move.b    d5,(a4)+    ;Write bottom-left
    move.l    d1,d7        ;x1 x1 x1 x1
    add.l    d0,d7
    add.l    d3,d7
    add.l    d2,d7
    lsr.l    #2,d7
    move.b    d7,(a4)+    ;Write bottom-right

;Next four pixels. Here d0 and d2 contain old values.

    move.b    (a0)+,d1
    move.b    (a1)+,d3
    move.l    d0,d7        ;x8 x3 x3 x2
    lsl.l    #3,d7
    add.l    d3,d7
    add.l    d3,d7
    move.l    d1,a5
    add.l    a5,a5        ;a5 = d1*3 (with the next add)
    add.l    d1,a5
    add.l    a5,d7
    move.l    d2,d4
    add.l    d4,d4
    add.l    d2,d4
    add.l    d4,d7
    lsr.l    #4,d7
    move.b    d7,(a3)+    ;Write top-left
    move.l    d0,d5        ;x3 x3 x1 x1
    add.l    d5,d5
    add.l    d0,d5
    move.l    d5,d7
    add.l    a5,d7
    add.l    d2,d7
    add.l    d3,d7
    lsr.l    #3,d7
    move.b    d7,(a3)+    ;Write top-right
    add.l    d1,d5        ;x3 x1 x3 x1
    add.l    d4,d5
    add.l    d3,d5
    lsr.l    #3,d5
    move.b    d5,(a4)+    ;Write bottom-left
    move.l    d0,d7        ;x1 x1 x1 x1
    add.l    d1,d7
    add.l    d2,d7
    add.l    d3,d7
    lsr.l    #2,d7
    move.b    d7,(a4)+    ;Write bottom-right
    dbf    d6,.lp

;Here some exit code for the last pixels in the row is needed.

Notice how the loop now does eight pixels in one go, for only four reads! It handles eight pixels per iteration so that old values can be reused easily, and it still fits in the cache.

Furthermore, the inner loop is somewhat optimized, while the entry code is not, although it can be optimized in the same way as the rest of the code. If this is as good as it's supposed to be (which I can't tell right now), then try to beat it
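Asm like the above is hard to check by eye. A plain C reference model could be compared against it byte for byte. The sketch below uses the weights from the loop comments: x8 x3 x3 x2 (/16) for top-left, x3 x3 x1 x1 (/8) for top-right, x3 x1 x3 x1 (/8) for bottom-left and x1 x1 x1 x1 (/4) for bottom-right. It truncates with shifts, like the asm's `lsr`. The function names are hypothetical. The last-column handling (reusing the pixel as its own right neighbour) is an assumption, since the posted code leaves the exit code unwritten.

```c
#include <assert.h>
#include <stddef.h>

/* One 2x2 output block from four source samples, with the weights from the asm
   comments: s = current pixel, r = pixel to its right, b = pixel below,
   d = diagonal (below-right). Shifts truncate, like lsr. */
static void block_2x2(unsigned s, unsigned r, unsigned b, unsigned d,
                      unsigned char out[4])
{
    out[0] = (unsigned char)((8 * s + 3 * r + 3 * b + 2 * d) >> 4); /* top-left     */
    out[1] = (unsigned char)((3 * s + 3 * r + b + d) >> 3);         /* top-right    */
    out[2] = (unsigned char)((3 * s + r + 3 * b + d) >> 3);         /* bottom-left  */
    out[3] = (unsigned char)((s + r + b + d) >> 2);                 /* bottom-right */
}

/* Upsample one pair of source lines (line0 above line1) into two output lines of
   2*width bytes each. Assumption: the last column has no right neighbour, so it
   reuses itself. */
void upsample_pair(const unsigned char *line0, const unsigned char *line1,
                   size_t width, unsigned char *out_top, unsigned char *out_bot)
{
    assert(width >= 1);
    for (size_t x = 0; x < width; x++) {
        size_t xr = (x + 1 < width) ? x + 1 : x;
        unsigned char q[4];
        block_2x2(line0[x], line0[xr], line1[x], line1[xr], q);
        out_top[2 * x] = q[0];
        out_top[2 * x + 1] = q[1];
        out_bot[2 * x] = q[2];
        out_bot[2 * x + 1] = q[3];
    }
}
```

A mismatch between this model and the asm on a test image would show whether both halves of the unrolled loop compute the same weights.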

meynaf · 30 January 2008, 17:29

Quote:

Originally Posted by Thorham

Well, it might be. I've redone the interpolation code. The method is tested in basic, and it seems to be as good as it's supposed to be, except that I'm testing on an old monitor. I had an accident with my LG Studioworks, and now its cable is broken. Until I can sort that out, I can't test properly. I can make some test images if you want, though.

Too bad. To avoid this I have 2 working monitors and a 3rd which nearly works.
Maybe you could get your hands on a 1083S or similar monitor...

Quote:

Originally Posted by Thorham

Here's the new code:

Oh yes some fresh code to look at !

At first glance I'd say that you're reading from 2 sources, not 3.
Shouldn't you access 3 lines (previous, current, next) ?

Quote:

Originally Posted by Thorham

Notice how the loop now does eight pixels in one go, for only four reads! It handles eight pixels per iteration so that old values can be reused easily, and it still fits in the cache.

No need to unroll loops; speed would have been exactly the same if you hadn't duplicated the code.

Quote:

Originally Posted by Thorham

Furthermore, the inner loop is somewhat optimized, while the entry code is not, although it can be optimized in the same way as the rest of the code. If this is as good as it's supposed to be (which I can't tell right now), then try to beat it

How hard is it to beat it ? Let's see...

All lemm... errrh... clock cycles accounted for : 100 per 4-pixel write.
Ok it's fast (18% as compared to mine). But for the quality I have serious doubts (see my remark above about reading only 2 lines).

Anyway it doesn't perform the exact same work.

Thorham · 30 January 2008, 17:43

Quote:

Originally Posted by meynaf

Too bad. To avoid this I have 2 working monitors and a 3rd which nearly works.
Maybe you could get your hands on a 1083S or similar monitor...

Yeah, it is. I'll need a new cable. The monitor has to be for the pc, though I wouldn't mind having an amiga monitor...

Quote:

Originally Posted by meynaf

Oh yes some fresh code to look at !

At first glance I'd say that you're reading from 2 sources, not 3.
Shouldn't you access 3 lines (previous, current, next) ?

Not for this algorithm.

Quote:

Originally Posted by meynaf

No need to unroll loops; speed would have been exactly the same if you hadn't duplicated the code.

Well, this actually saves a few instructions, and as a side effect, the dbf instruction gets executed only half as often as it normally would. So it is a little faster in this case; just look at the register usage.

Quote:

Originally Posted by meynaf

But for the quality I have serious doubts (see my remark above about reading only 2 lines).

It looks just as good on this end, but the monitor is a lot smaller and less sharp, so if there is a quality difference, I can't see it!

Quote:

Originally Posted by meynaf

Anyway it doesn't perform the exact same work.

If a replacement algorithm delivers equal quality, then it doesn't have to. Of course, this remains to be seen.

Man, this sucks. I didn't want to do this today, but I'm going to try and repair the cable. I really can't work like this, argh

meynaf · 30 January 2008, 18:06

Quote:

Originally Posted by Thorham

Yeah, it is. I'll need a new cable. The monitor has to be for the pc, though I wouldn't mind having an amiga monitor...

You use a pc monitor on your amiga ?

Quote:

Originally Posted by Thorham

Not for this algorithm.
Well, this actually saves a few instructions, and as a side effect, the dbf instruction gets executed only half as often as it normally would. So it is a little faster in this case; just look at the register usage.

The dbf instruction is pipelined with the last write and effectively costs 0 cycles, so if you divide it by two it'll still be 0...
However if it saves you some moves then it's ok.

Quote:

Originally Posted by Thorham

It looks just as good on this end, but the monitor is a lot smaller and less sharp, so if there is a quality difference, I can't see it!

On a pc screen you have to damage an image quite a lot before seeing a real difference...

Quote:

Originally Posted by Thorham

If a replacement algorithm delivers equal quality, then it doesn't have to. Of course, this remains to be seen.

This has to be seen, for sure.

Quote:

Originally Posted by Thorham

Man, this sucks. I didn't want to do this today, but I'm going to try and repair the cable. I really can't work like this, argh

I'm sad for your cable. R.I.P.

Thorham · 30 January 2008, 19:07

Quote:

Originally Posted by meynaf

You use a pc monitor on your amiga ?

Yep, unfortunately. When my last 1084 broke down, I decided not to buy a 'new' one, because they're all old, of course. Around the same time I also got a pc, and so I decided to buy an svga monitor and a video/tv box. Now I have to make do with using my miggy's video out. Not ideal.

Quote:

Originally Posted by meynaf

The dbf instruction is pipelined with the last write and effectively costs 0 cycles, so if you divide it by two it'll still be 0...
However if it saves you some moves then it's ok.

Cool

That means I can pipeline some more stuff, actually. I can fit some register only instructions after the first read. Great!

Quote:

Originally Posted by meynaf

On a pc screen you have to damage an image quite a lot before seeing a real difference...

Not on my studioworks in 1280x1024x24bit! Although some color differences are a bit hard to spot. Also, testing things on the pc in super hires 24bit is much better then doing it on the amiga.

Quote:

Originally Posted by meynaf

This has to be seen, for sure.

Yes, and I can't wait. It shouldn't be too difficult to use an extra read, though.

Quote:

Originally Posted by meynaf

I'm sad for your cable. R.I.P.

Thank you

Anyway, someone I know has a broken monitor, so I can use his cable! Tomorrow I'll have it fixed

meynaf · 31 January 2008, 09:51

Quote:

Originally Posted by Thorham

Yep, unfortunately. When my last 1084 broke down, I decided not to buy a 'new' one, because they're all old, of course. Around the same time I also got a pc, and so I decided to buy an svga monitor and a video/tv box. Now I have to make do with using my miggy's video out. Not ideal.

So you have some sort of a scan doubler in your video/tv box ?

Quote:

Originally Posted by Thorham

Cool

That means I can pipeline some more stuff, actually. I can fit some register only instructions after the first read. Great!

After the first read ? You won't gain anything by doing so. It's the writes that can be pipelined.
(Well, in chipmem things are a little bit more complex, but here we're accessing fastmem only.)

Quote:

Originally Posted by Thorham

Not on my studioworks in 1280x1024x24bit! Although some color differences are a bit hard to spot. Also, testing things on the pc in super hires 24bit is much better then doing it on the amiga.

On yours probably. But on a good monitor, amiga colors are much brighter than on a pc. Anyway the code is intended to work on an amiga, not on a pc.
(Btw why do you always write "then" instead of "than" ?)

Quote:

Originally Posted by Thorham

Yes, and I can't wait. It shouldn't be too difficult to use an extra read, though.

I'm using 9 values in my code, where you're using 4, so it could be a little bit more than an extra read.
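meynaf's code isn't posted here, so this is only an assumption about what a 9-value "triangular" upsampler looks like. It follows the scheme used by libjpeg's "fancy" h2v2 upsampling. Each output pixel takes 9/16 of its source pixel, 3/16 each of the nearer horizontal and vertical neighbours, and 1/16 of the nearer diagonal one. The 2x2 output block of one source pixel therefore draws on its 3x3 neighbourhood. This is a deliberately unoptimized C sketch: the name `upsample_triangle_h2v2` is hypothetical, edges are replicated, and it uses a fixed +8 rounding term (libjpeg alternates its rounding bias; this does not).

```c
#include <assert.h>
#include <stddef.h>

/* Fetch a sample with edge clamping (edge pixels are replicated). */
static unsigned px(const unsigned char *src, size_t w, size_t h, long x, long y)
{
    if (x < 0) x = 0;
    if (y < 0) y = 0;
    if (x >= (long)w) x = (long)w - 1;
    if (y >= (long)h) y = (long)h - 1;
    return src[(size_t)y * w + (size_t)x];
}

/* Triangular 2x upsample: each output pixel mixes its source pixel (9) with the
   nearer horizontal (3), vertical (3) and diagonal (1) neighbours, /16.
   dst must hold (2*w) * (2*h) bytes. */
void upsample_triangle_h2v2(const unsigned char *src, size_t w, size_t h,
                            unsigned char *dst)
{
    assert(w > 0 && h > 0);
    for (size_t y = 0; y < h; y++)
        for (size_t x = 0; x < w; x++)
            for (int dy = 0; dy < 2; dy++)
                for (int dx = 0; dx < 2; dx++) {
                    long nx = (long)x + (dx ? 1 : -1); /* nearer column */
                    long ny = (long)y + (dy ? 1 : -1); /* nearer row    */
                    unsigned c  = px(src, w, h, (long)x, (long)y);
                    unsigned hz = px(src, w, h, nx, (long)y);
                    unsigned vt = px(src, w, h, (long)x, ny);
                    unsigned dg = px(src, w, h, nx, ny);
                    unsigned v = (9 * c + 3 * hz + 3 * vt + dg + 8) >> 4;
                    dst[(2 * y + dy) * (2 * w) + 2 * x + dx] = (unsigned char)v;
                }
}
```

An asm version would of course not re-fetch all four samples per output pixel. It would carry column sums across iterations, as in the "keep the old values" discussion earlier in the thread.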

But, please tell me : where does your algorithm come from ?

Quote:

Originally Posted by Thorham

Thank you

Anyway, someone I know has a broken monitor, so I can use his cable! Tomorrow I'll have it fixed

A broken cable with a working monitor, and a working cable with a broken monitor... so you'll end up with a broken cable and a broken monitor

Thorham · 31 January 2008, 11:01

Quote:

Originally Posted by meynaf

So you have some sort of a scan doubler in your video/tv box ?

Unfortunately not. But it does deinterlace, and it also 'upgrades' everything to at least 60 hertz. Plus, even in that mode, super smooth bugs

It's a crappy b-grade product

Quote:

Originally Posted by meynaf

After the first read ? You won't gain anything by doing so. It's the writes that can be pipelined.
(Well, in chipmem things are a little bit more complex, but here we're accessing fastmem only.)

Ok, very odd though. Perhaps I should read the original Motorola docs

Quote:

Originally Posted by meynaf

On yours probably. But on a good monitor, amiga colors are much brighter than on a pc. Anyway the code is intended to work on an amiga, not on a pc.
(Btw why do you always write "then" instead of "than" ?)

Really? I haven't noticed! About the 'than' thing: it's probably just a stupid typo. Of course, in the Netherlands everyone speaks Dutch, so I never have to use English for anything here. Having been raised with English just makes it easy to understand; it doesn't make you flawless at using it. There are cases where it's 'then' and cases where it's 'than'; I just don't know exactly when to use which.

Quote:

Originally Posted by meynaf

I'm using 9 values in my code, where you're using 4, so it could be a little bit more than an extra read.

But, please tell me : where does your algorithm come from ?

True. The algorithm is quite different. I came up with it myself. And while it's better than bilinear, now that I can see clearly again, I know that it is not as good as triangular. Damn, what a shame

Quote:

Originally Posted by meynaf

A broken cable with a working monitor, and a working cable with a broken monitor... so you'll end up with a broken cable and a broken monitor

No, it works again, now. It was pretty easy, too. Nothing more than 20 minutes of work, haha

meynaf · 31 January 2008, 14:13

Quote:

Originally Posted by Thorham

Unfortunately not. But it does deinterlace, and it also 'upgrades' everything to at least 60 hertz. Plus, even in that mode, super smooth bugs

It's a crappy b-grade product

I wondered whether it might be the source of your machine's curious behavior. No driver needed on the amiga side?

Quote:

Originally Posted by Thorham

Ok, very odd though. Perhaps I should read the original Motorola docs

It can be understood like this : when you write a value to memory, you don't need it to be actually written before going on. On the other hand, how can the program continue without knowing what we have read ?

Quote:

Originally Posted by Thorham

Really? I haven't noticed! About the 'than' thing: it's probably just a stupid typo. Of course, in the Netherlands everyone speaks Dutch, so I never have to use English for anything here. Having been raised with English just makes it easy to understand; it doesn't make you flawless at using it. There are cases where it's 'then' and cases where it's 'than'; I just don't know exactly when to use which.

So "than" and "then" are the same in Dutch, am I right ?

Quote:

Originally Posted by Thorham

True. The algorithm is quite different. I came up with it myself. And while it's better than bilinear, now that I can see clearly again, I know that it is not as good as triangular. Damn, what a shame

How unfortunate. Now what's left for you to do is to write a faster triangular one

Note that if you can't beat mine (and you won't, heheh), there is still the 2:1 version to check (also triangular interpolation, but it writes 2 horizontal pixels and 1 vertical). That's a more common case than I first expected.
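The 2:1 case can be sketched the same way. This again assumes libjpeg-style triangular weights rather than meynaf's actual code. Each source pixel produces two horizontal output pixels, each 3/4 of itself plus 1/4 of the neighbour on that side, and there is no vertical doubling. The name `upsample_triangle_h2v1`, the edge replication and the fixed +2 rounding are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Triangular 2:1 horizontal upsample of one line: each source pixel yields two
   output pixels, each (3*self + nearer neighbour) / 4, rounded.
   Edge pixels use themselves as the missing neighbour.
   dst must hold 2*w bytes. */
void upsample_triangle_h2v1(const unsigned char *src, size_t w, unsigned char *dst)
{
    assert(w > 0);
    for (size_t x = 0; x < w; x++) {
        unsigned c = src[x];
        unsigned left  = (x > 0) ? src[x - 1] : c;
        unsigned right = (x + 1 < w) ? src[x + 1] : c;
        dst[2 * x]     = (unsigned char)((3 * c + left + 2) >> 2);
        dst[2 * x + 1] = (unsigned char)((3 * c + right + 2) >> 2);
    }
}
```

Since 3*c can be formed as c + 2*c, an asm version needs only adds and one shift per output pixel, in the same spirit as the move/add series discussed in this thread.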

Quote:

Originally Posted by Thorham

No, it works again, now. It was pretty easy, too. Nothing more than 20 minutes of work, haha

Resurrected ! It's miraculous. You're a wizard, man
