English Amiga Board


28 January 2008, 16:55   #61
Thorham
Computer Nerd
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,770
Quote:
Originally Posted by meynaf
We can.
Well, my electronics knowledge is just way too limited for building a computer, or did you have something else in mind?
Quote:
Originally Posted by meynaf
We weren't speaking about the same things

A 5-pass c2p is indeed 5 blocks of merges (per 1,2,4,8,16 bits).
What I meant was completely different: do the whole merge blocks 6 times (twice for 8 bits)...
Oops. That can happen!
Quote:
Originally Posted by meynaf
You're not only copying, you're also performing a lot of other operations on this data. Those operations are pipelined during the chipmem writes ; using fastmem instead can't be slower of course but it won't be much faster.
Hmm, yes, if you look at it from that angle, I have to agree with you. I'll have to get into this whole pipeline thing. Boy, am I stuck with that plain 68000 code, or what?
Quote:
Originally Posted by meynaf
Obviously you don't know what you're talking about.
What can I say ? Just do it. Then you'll know the gruesome truth.
Doesn't it involve setting up things like the segment table so that you can switch from real mode to protected mode? It would be a pain to write, I guess. With the IA-32 docs from Intel, it couldn't be that hard, though. And it would only have to be done once if you do it properly. But sure, I don't know a lot about it.
Quote:
Originally Posted by meynaf
Alternatively, if you want to hit the hardware on a pc, then I suggest you use a hammer, as it's a much easier way (and it's a lot of fun). The OS makes no difference here
Yay, I'm going to try that today
Quote:
Originally Posted by meynaf
Yeah. They deserve public humiliation.
Indeed
Quote:
Originally Posted by meynaf
To go back to the topic, I have the upsample code in asm. If you like bunches of incomprehensible move/add series with an occasional lsr in them, then you'll love it.
I thought I was past the age for writing such code, but no.
Oh, cool. Post it, please.

Anyway, have you found anything to optimize in the jpeg part?

28 January 2008, 18:20   #62
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by Thorham
Well, my electronics knowledge is just way too limited for building a computer, or did you have something else in mind?
A new computer, child of the Amiga, is a dream of mine. I have some specs on what it could look like from a programmer's point of view, but I'm no electronics engineer either.

Quote:
Originally Posted by Thorham
Hmm, yes, if you look at it from that angle, I have to agree with you. I'll have to get into this whole pipeline thing. Boy, am I stuck with that plain 68000 code, or what?
Basically, when the cpu writes something, it doesn't wait for the write to complete before doing something else in registers. So it's a good habit to put register-only code after memory writes if you can.

Quote:
Originally Posted by Thorham
Doesn't it involve setting up things like the segment table so that you can switch from real mode to protected mode? It would be a pain to write, I guess. With the IA-32 docs from Intel, it couldn't be that hard, though. And it would only have to be done once if you do it properly. But sure, I don't know a lot about it.
As I already said : just do it.

Quote:
Originally Posted by Thorham
Yay, I'm going to try that today
Cool.

Quote:
Originally Posted by Thorham
Oh, cool. Post it, please.
Here it is. Triangular 2x2 upsampling code in asm. See the original jdsample.c in the jpeg library for more info.
Code:
_asm_upsample22
 movem.l d0-d7/a0-a6,-(a7)        ; +60
 move.l 4+60(a7),a5            ; a5 = input_data
 move.l 8+60(a7),a6            ; a6 = output_data
 move.l 12+60(a7),d7            ; d7 = compptr->downsampled_width
 move.l 16+60(a7),d6            ; d6 = cinfo->max_v_samp_factor
 bsr.s h2v2_fancy_upsample
 movem.l (a7)+,d0-d7/a0-a6
 rts

; upsample "fancy" 2x2 : (the most frequent case)
; a5=input_data, a6=output_data, d7=nb cols, d6=nb rows
h2v2_fancy_upsample
 lsr.w #1,d6            ; we're doing two of them at once
.yloop
 move.l -4(a5),a1        ; a1 = input_data[inrow-1]
 move.l (a5)+,a0        ; a0 = input_data[inrow]
 move.l (a5),a2            ; a2 = input_data[inrow+1]
 move.l (a6)+,a3        ; a3 = output_data[outrow]
 move.l (a6)+,a4        ; a4 = output_data[outrow+1]
 movem.l d6-d7/a5-a6,-(a7)

; here we have a0=src, a1=src-1, a2=src+1, a3=dest1, a4=dest2, d7=nb cols
; particular case of the 1st column ("old" values also needed for after)
 moveq #0,d1            ; this can be out of the loop
 subq.w #3,d7            ; remove first/last columns and 1 for dbf
 moveq #0,d5
 move.b (a1)+,d5        ; a
 move.l d5,a5            ;            [ a5 ok ]
 move.b (a1)+,d1        ; b            [ d1 ok ]
 moveq #0,d2
 move.b (a2)+,d2        ; u
 move.l d2,a6            ;            [ a6 ok ]
 moveq #0,d6
 move.b (a2)+,d6        ; v            [ d6 ok ]
 moveq #0,d3
 move.b (a0)+,d3        ; k
 move.l d3,d0
 add.l d0,d0
 add.l d0,d3            ; 3k            [ d3 ok ]
 moveq #0,d4
 move.b (a0)+,d4        ; l
 move.l d4,d0
 add.l d0,d0
 add.l d0,d4            ; 3l            [ d4 ok ]
 add.l d3,d5            ; 3k + 1a    (this-up)
 add.l d3,d2            ; 3k + 1u    (this-dn)
 move.l d1,d0            ; 1b
 add.l d4,d0            ; 3l + 1b    (next-up)
 add.l d5,d0            ; this + next (up)
 add.l d5,d5            ; this *2
 add.l d5,d0            ; this *3 + next *1
 add.l d5,d5            ; this *4
 addq.l #8,d5            ; rounding with 8
 lsr.l #4,d5            ; /16
 move.b d5,(a3)+        ; top-left pixel
 addq.l #7,d0            ; rounding with 7
 lsr.l #4,d0            ; /16
 move.b d0,(a3)+        ; top-right pixel
 move.l d4,d0            ; 3l
 add.l d6,d0            ; 3l + 1v    (next-dn)
 add.l d2,d0            ; this + next (dn)
 add.l d2,d2            ; this *2
 add.l d2,d0            ; this *3 + next *1
 add.l d2,d2            ; this *4
 addq.l #7,d2            ; rounding with 7
 lsr.l #4,d2            ; /16
 move.b d2,(a4)+        ; bottom-left pixel
 addq.l #8,d0            ; rounding with 8
 lsr.l #4,d0
 move.b d0,(a4)+        ; bottom-right pixel

; general case
.loop
 move.l d1,d2            ; b
 add.l d2,d2
 add.l d1,d2            ; 3b
 move.l d3,d0            ; (save 3k)
 move.l d3,d5            ; (oops... forgot this one)
 add.l a5,d5            ; 3k + 1a
 move.l d1,a5            ; b            [ a5 ok ]
 move.b (a1)+,d1        ; c            [ d1 ok ]
 move.l d4,d3            ; 3l            [ d3 ok ]
 add.l d4,d4
 add.l d3,d4            ; *3 -> 9l
 add.l d4,d2            ; 9l + 3b
 add.l d2,d5            ; 9l + 3b + 3k + 1a
; here : d4=9l, d2=9l+3b, d0=3k
 addq.l #8,d5            ; +8 to round
 lsr.l #4,d5            ; >>4
 move.b d5,(a3)+        ; and here is our top-left pixel
 add.l a6,d0            ; 3k + 1u
 move.l d6,a6            ; v            [ a6 ok ]
 add.l d6,d6
 add.l a6,d6            ; 3v
 add.l d4,d6            ; 9l + 3v
 moveq #0,d5
 move.b (a0)+,d5        ; m
 move.l d5,d4
 add.l d5,d5
 add.l d5,d4            ; 3m            [ d4 ok ]
 add.l d4,d2            ; 9l + 3b + 3m
 add.l d1,d2            ; 9l + 3b + 3m + 1c
 addq.l #7,d2            ; +7 to round
 lsr.l #4,d2            ; >>4
 move.b d2,(a3)+        ; and here is our top-right pixel
; here : d0=3k+1u, d4=3m, d6=9l+3v
 add.l d6,d0            ; 9l + 3v + 3k + 1u
 addq.l #7,d0            ; +7 to round
 lsr.l #4,d0
 move.b d0,(a4)+        ; and here is our bottom-left pixel
; here : d4=3m, d6=9l+3v
 move.l d6,d2            ; 9l + 3v
 add.l d4,d2            ; 9l + 3v + 3m
 moveq #0,d6
 move.b (a2)+,d6        ; w            [ d6 ok ]
 add.l d6,d2
 addq.l #8,d2            ; +8 to round
 lsr.l #4,d2
 move.b d2,(a4)+        ; and here is our bottom-right pixel
 dbf d7,.loop

; particular case of the last column
 add.l d4,d1            ; 3m + 1c    (this-up)
 add.l d4,d6            ; 3m + 1w    (this-dn)
 move.l d3,d0
 add.l a5,d0            ; 3l + 1b    (last-up)
 add.l a6,d3            ; 3l + 1v    (last-dn)
 add.l d1,d0            ; this + last (up)
 add.l d1,d1            ; this *2
 add.l d1,d0            ; this *3 + last *1
 add.l d1,d1            ; this *4
 addq.l #8,d0            ; rounding with 8
 lsr.l #4,d0            ; /16
 move.b d0,(a3)+        ; top-left pixel
 addq.l #7,d1            ; rounding with 7
 lsr.l #4,d1            ; /16
 move.b d1,(a3)+        ; top-right pixel
 add.l d6,d3            ; this + last (dn)
 add.l d6,d6            ; this *2
 add.l d6,d3            ; this *3 + last *1
 add.l d6,d6            ; this *4
 addq.l #7,d3            ; rounding with 7
 lsr.l #4,d3            ; /16
 move.b d3,(a4)+        ; bottom-left pixel
 addq.l #8,d6            ; rounding with 8
 lsr.l #4,d6
 move.b d6,(a4)+        ; bottom-right pixel

; line loop
 movem.l (a7)+,d6-d7/a5-a6
 subq.w #1,d6
 bne .yloop
 rts
Note : I translated the comments, but they're not really readable even now...
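
For anyone who finds the register juggling hard to follow, here is a rough C sketch of what the inner code computes for one pair of output rows. It is written from the asm comments above, not copied from jdsample.c: the name upsample_row_pair, its parameters and the plain unsigned char rows are made up for illustration, and it assumes at least two columns. The rounding constants (8/7 on the upper row, 7/8 on the lower row) are the ones stated in the asm comments, so check them against jdsample.c if bit-exact output matters.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the jpeg library.
 * above/cur/below: input rows of `width` samples each (width >= 2).
 * top/bottom: output rows, 2 * width samples each. */
void upsample_row_pair(const unsigned char *above, const unsigned char *cur,
                       const unsigned char *below, size_t width,
                       unsigned char *top, unsigned char *bottom)
{
    /* Column sums: 3 * current row + 1 * neighbouring row. */
    int last_up, this_up = 3 * cur[0] + above[0];
    int last_dn, this_dn = 3 * cur[0] + below[0];
    int next_up = 3 * cur[1] + above[1];
    int next_dn = 3 * cur[1] + below[1];
    size_t x;

    /* First column: no left neighbour, so the left pixels use this * 4. */
    top[0]    = (unsigned char)((this_up * 4 + 8) >> 4);
    top[1]    = (unsigned char)((this_up * 3 + next_up + 7) >> 4);
    bottom[0] = (unsigned char)((this_dn * 4 + 7) >> 4);
    bottom[1] = (unsigned char)((this_dn * 3 + next_dn + 8) >> 4);

    /* General case: slide the three column sums one step to the right. */
    for (x = 1; x + 1 < width; x++) {
        last_up = this_up; this_up = next_up;
        last_dn = this_dn; this_dn = next_dn;
        next_up = 3 * cur[x + 1] + above[x + 1];
        next_dn = 3 * cur[x + 1] + below[x + 1];
        top[2 * x]        = (unsigned char)((this_up * 3 + last_up + 8) >> 4);
        top[2 * x + 1]    = (unsigned char)((this_up * 3 + next_up + 7) >> 4);
        bottom[2 * x]     = (unsigned char)((this_dn * 3 + last_dn + 7) >> 4);
        bottom[2 * x + 1] = (unsigned char)((this_dn * 3 + next_dn + 8) >> 4);
    }

    /* Last column: no right neighbour, so the right pixels use this * 4. */
    last_up = this_up; this_up = next_up;
    last_dn = this_dn; this_dn = next_dn;
    x = width - 1;
    top[2 * x]        = (unsigned char)((this_up * 3 + last_up + 8) >> 4);
    top[2 * x + 1]    = (unsigned char)((this_up * 4 + 7) >> 4);
    bottom[2 * x]     = (unsigned char)((this_dn * 3 + last_dn + 7) >> 4);
    bottom[2 * x + 1] = (unsigned char)((this_dn * 4 + 8) >> 4);
}
```

The three running sums (last, this, next) play the same role as the values the asm keeps alive in registers between loop iterations, which is why each source byte only has to be read once.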

Quote:
Originally Posted by Thorham
Anyway, have you found anything to optimize in the jpeg part?
Yep. Those interested can find the code here :
http://meynaf.free.fr/tmp/v.zip

29 January 2008, 10:01   #63
Thorham
Quote:
Originally Posted by meynaf
A new computer, child of the Amiga, is a dream of mine. I have some specs on what it could look like from a programmer's point of view, but I'm no electronics engineer either.
I would love to see your ideas, could you post them, please?
Quote:
Originally Posted by meynaf
Basically, when the cpu writes something, it doesn't wait for the write to complete before doing something else in registers. So it's a good habit to put register-only code after memory writes if you can.
That's simple enough! Should be no problem to make this a habit.
Quote:
Originally Posted by meynaf
As I already said : just do it.
It's something I have been wanting to try for a long time. Setting this thing up is probably a bit of a pain, but should be no problem; Intel's own docs explain this quite well. Of course, I'd first have to learn i386 assembler as a bare minimum, and that should be easy enough since there are a variety of tools out there (I even have one of them) which allow win32 coding in asm, and I have good docs on this.
Quote:
Originally Posted by meynaf
Cool.
Done. I'm now using my miggy to get online.
Quote:
Originally Posted by meynaf
Here it is. Triangular 2x2 upsampling code in asm. See the original jdsample.c in the jpeg library for more info.
Note : I translated the comments, but they're not really readable even now...
Cool. I'll just read up on what's really happening for this type of upsampling. It seems a little different from bilinear.
Quote:
Originally Posted by meynaf
Yep. Those interested can find the code here :
There's a lot to look at, right now! Should I just concentrate on doing the png codec, and wait until you've done the jpeg codec in full asm? Or should I try to get the jpeg codec to compile with Storm/Sas/Dice/VBCC? Any pointers are appreciated!

29 January 2008, 11:05   #64
meynaf
Quote:
Originally Posted by Thorham
I would love to see your ideas, could you post them, please?
That would require a new thread, if not a whole site !

Basically, it's about making an open, lightweight, efficient, cool-to-code computer architecture that remains in line with today's requirements.
Remember : architectures persist longer than implementations.

The machine has to be some sort of "generic box" like PCs are meant to be.
It must be user friendly as well as programmer friendly, like Amigas are.

I am unsure an Amiga board is the right place to discuss this.
Quote:
Originally Posted by Thorham
That's simple enough! Should be no problem to make this a habit.
On fastmem it works as expected, however there are some (usually bad) surprises with chipmem ; see this thread : http://eab.abime.net/showthread.php?t=34481

Quote:
Originally Posted by Thorham
It's something I have been wanting to try for a long time. Setting this thing up is probably a bit of a pain, but should be no problem; Intel's own docs explain this quite well. Of course, I'd first have to learn i386 assembler as a bare minimum, and that should be easy enough since there are a variety of tools out there (I even have one of them) which allow win32 coding in asm, and I have good docs on this.
Yeah, learn i386 assembler, and see its beautiful syntax, its numerous general-purpose registers, its powerful addressing modes...
See how it's easy to bang on the hardware, how well it is documented...
See the poetry of the various memory models...

Quote:
Originally Posted by Thorham
Done. I'm now using my miggy to get online.
Good. I'm sure you feel much better now.

Quote:
Originally Posted by Thorham
Cool. I'll just read up on what's really happening for this type of upsampling. It seems a little different from bilinear.
For 2x2 you basically make an average with :
. 9/16 of current pixel value
. 3/16 of left or right pixel value
. 3/16 of top or bottom pixel value
. 1/16 of diagonal pixel value

How would a bilinear filter do that ? (you have 1 pixel and want to output 4)
A box filter would simply copy them around ; not good.
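
The 9/3/3/1 pattern is not arbitrary: it is what you get from weighting the nearer sample 3/4 and the farther one 1/4 in each direction, which is also what plain linear interpolation gives a quarter of a pixel away from a sample centre. A tiny C sketch of that multiplication (triangle_kernel is a made-up name, used only for illustration):

```c
/* Fill k[2][2] with the 2-D weights, in sixteenths, obtained by applying
 * the 1-D weights {3, 1} (in quarters) horizontally and vertically.
 * Index 0 = nearer sample, index 1 = farther sample. */
void triangle_kernel(int k[2][2])
{
    static const int w[2] = { 3, 1 };
    int i, j;

    for (i = 0; i < 2; i++)
        for (j = 0; j < 2; j++)
            k[i][j] = w[i] * w[j];
}
```

The four entries come out as 9, 3, 3 and 1, and they add up to 16, which is why the sum is shifted right by 4 at the end.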

Quote:
Originally Posted by Thorham
There's a lot to look at, right now! Should I just concentrate on doing the png codec, and wait until you've done the jpeg codec in full asm? Or should I try to get the jpeg codec to compile with Storm/Sas/Dice/VBCC? Any pointers are appreciated!
Feel free to do whatever you wish.

I don't have Dice or VBCC, and I don't like SasC's command-line stuff.
However I have StormC, maybe not the latest version, but I'll see whether I can make a project file so you can compile the project.
The jpeg library compiles literally everywhere, but linking with asm (especially mine) is something else (I wouldn't try this with gcc).

On the other hand you can simply disable the jpeg support and assemble the program (you would notice a major exe size drop then).

29 January 2008, 11:46   #65
Thorham
Quote:
Originally Posted by meynaf
That would require a new thread, if not a whole site !

Basically, it's about making an open, lightweight, efficient, cool-to-code computer architecture that remains in line with today's requirements.
Remember : architectures persist longer than implementations.

The machine has to be some sort of "generic box" like PCs are meant to be.
It must be user friendly as well as programmer friendly, like Amigas are.

I am unsure an Amiga board is the right place to discuss this.
Yes, it is quite off-topic! But a discussion could go in the off-topic section. The ot-stupidity forum is basically where all the silly stuff goes, so the ot-general section should do fine!
Quote:
Originally Posted by meynaf
On fastmem it works as expected, however there are some (usually bad) surprises with chipmem ; see this thread : http://eab.abime.net/showthread.php?t=34481
Yes, I've read it, and there are some mighty strange things happening. This really deserves a closer look.
Quote:
Originally Posted by meynaf
Yeah, learn i386 assembler, and see its beautiful syntax, its numerous general-purpose registers, its powerful addressing modes...
See how it's easy to bang on the hardware, how well it is documented...
See the poetry of the various memory models...
They've done it for Linux/Windows (and 4gw, if I got the name right) etc. If they can do it, then so can I. Of course, it won't be pretty. And that's probably an understatement. I still want to get into this, sooner or later, so I'm just going to have to cope!
Quote:
Originally Posted by meynaf
Good. I'm sure you feel much better now.
Yeah, man, I should've done this way sooner
Quote:
Originally Posted by meynaf
For 2x2 you basically make an average with :
. 9/16 of current pixel value
. 3/16 of left or right pixel value
. 3/16 of top or bottom pixel value
. 1/16 of diagonal pixel value
With diagonal I suppose you mean x+1,y+1, where the current pixel is just x,y?
Quote:
Originally Posted by meynaf
How would a bilinear filter do that ? (you have 1 pixel and want to output 4)
A box filter would simply copy them around ; not good.
If I've got it right, then this code should do bilinear interpolation for 2x2:
Code:
For yy=0 To 511 Step 2
    If InKey$<>"" Then Stop
    For xx=0 To 639 Step 2

        ' source pixel for this 2x2 output block (the source image starts at x=640)
        xxx=xx\2+640:yyy=yy\2

        ' 3x3 neighbourhood, p5 is the current pixel
        p1=Pointg(xxx-1,yyy-1):p2=Pointg(xxx,yyy-1):p3=Pointg(xxx+1,yyy-1)
        p4=Pointg(xxx-1,yyy):p5=Pointg(xxx,yyy):p6=Pointg(xxx+1,yyy)
        p7=Pointg(xxx-1,yyy+1):p8=Pointg(xxx,yyy+1):p9=Pointg(xxx+1,yyy+1)

        ' each output pixel is the plain average of four source pixels
        p=(p1+p2+p4+p5)\4:plot(xx,yy+512,p)
        p=(p2+p3+p5+p6)\4:plot(xx+1,yy+512,p)
        p=(p4+p5+p7+p8)\4:plot(xx,yy+1+512,p)
        p=(p5+p6+p8+p9)\4:plot(xx+1,yy+1+512,p)

    Next
Next
Of course, this is FreeBASIC. Pointg is a procedure that simply returns the gray value for an RGB pixel, and plot writes a grayscale (0-255) pixel to the screen. The image to be interpolated is located at 640,0 and is 320x256 grayscale pixels. The interpolated result is located at 0,512 and is 640x512 grayscale pixels. At location 0,0 there is the original, which gets scaled down. Of course, the downscaling is not in this code. The \ is integer division: the same as /, except that the result is truncated to a whole number rather than rounded. I sure hope that's all clear...
Quote:
Originally Posted by meynaf
Feel free to do whatever you wish.

I don't have Dice or VBCC, and I don't like SasC's command-line stuff.
However I have StormC, maybe not the latest version, but I'll see whether I can make a project file so you can compile the project.
The jpeg library compiles literally everywhere, but linking with asm (especially mine) is something else (I wouldn't try this with gcc).

On the other hand you can simply disable the jpeg support and assemble the program (you would notice a major exe size drop then).
I would really appreciate it if you could do a Storm project. I have Storm 3, should be good enough, right? It would really be nice if I could compile the whole program as a simple Storm project; it would mean that when I find some optimizations, I can actually test them instead of guessing whether they're going to work or not. Not being able to run tests just plain sucks, as you know. Again, if you could do it, then many, many thanks.

29 January 2008, 12:41   #66
meynaf
Quote:
Originally Posted by Thorham
Yes, it is quite off-topic! But a discussion could go in the off-topic section. Basically it's the ot-stupidity forum where all the silly stuff goes, the ot-general section should do fine!
Ot-general thread opened :
http://eab.abime.net/showthread.php?t=34571

Quote:
Originally Posted by Thorham
Yes, I've read it, and there are some mighty strange things happening. This really deserves a closer look.
I think my program to measure execution times will run a lot next week-end.

Quote:
Originally Posted by Thorham
They've done it for Linux/Windows (and 4gw, if I got the name right) etc. If they can do it, then so can I. Of course, it won't be pretty. And that's probably an understatement. I still want to get into this, sooner or later, so I'm just going to have to cope!
Let me know when you achieve something...

Quote:
Originally Posted by Thorham
Yeah, man, I should've done this way sooner
But finally you did it. Now you're a Man

Quote:
Originally Posted by Thorham
With diagonal I suppose you mean x+1,y+1, where the current pixel is just x,y?
Not really but it's close.

You have 4 pixels to write : up-left, up-right, down-left, down-right.
All of them get 9/16 of (x,y), and :
- For up-left : 1/16 of (x-1,y-1), 3/16 of (x-1,y), 3/16 of (x,y-1)
- For up-right : 1/16 of (x+1, y-1), 3/16 of (x+1,y), 3/16 of (x,y-1)
- For down-left : 1/16 of (x-1,y+1), 3/16 of (x-1,y), 3/16 of (x,y+1)
- For down-right : 1/16 of (x+1,y+1), 3/16 of (x+1,y), 3/16 of (x,y+1)
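
Spelled out in C, the list above might look like the sketch below. The function name, the 3x3 array layout (p[row][col], with p[1][1] as the current pixel, row 0 being y-1 and column 0 being x-1) and the output order are assumptions made for illustration. It also simply truncates, without the +7/+8 rounding constants that the asm version adds before the shift.

```c
/* Hypothetical helper: the four output pixels for one source pixel.
 * out[0] = up-left, out[1] = up-right, out[2] = down-left, out[3] = down-right. */
void triangular_2x2(unsigned char p[3][3], unsigned char out[4])
{
    int centre = 9 * p[1][1];   /* every output gets 9/16 of (x,y) */

    out[0] = (unsigned char)((centre + 3 * p[1][0] + 3 * p[0][1] + p[0][0]) >> 4);
    out[1] = (unsigned char)((centre + 3 * p[1][2] + 3 * p[0][1] + p[0][2]) >> 4);
    out[2] = (unsigned char)((centre + 3 * p[1][0] + 3 * p[2][1] + p[2][0]) >> 4);
    out[3] = (unsigned char)((centre + 3 * p[1][2] + 3 * p[2][1] + p[2][2]) >> 4);
}
```

Each output pixel leans towards the corner of the source pixel it sits in: the two edge neighbours on that side get 3/16 each, and the diagonal neighbour in that corner gets 1/16.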

Quote:
Originally Posted by Thorham
If I've got it right, then this code should do bilinear interpolation for 2x2:

Of course, this is FreeBASIC. Pointg is a procedure that simply returns the gray value for an RGB pixel, and plot writes a grayscale (0-255) pixel to the screen. The image to be interpolated is located at 640,0 and is 320x256 grayscale pixels. The interpolated result is located at 0,512 and is 640x512 grayscale pixels. At location 0,0 there is the original, which gets scaled down. Of course, the downscaling is not in this code. The \ is integer division: the same as /, except that the result is truncated to a whole number rather than rounded. I sure hope that's all clear...
It's clear enough. Apparently bilinear interpolation is the same as triangular, but it weights the pixels with 1/1/1/1 instead of 9/3/3/1.

Quote:
Originally Posted by Thorham
I would really appreciate it if you could do a Storm project. I have Storm 3, should be good enough, right? It would really be nice if I could compile the whole program as a simple Storm project; it would mean that when I find some optimizations, I can actually test them instead of guessing whether they're going to work or not. Not being able to run tests just plain sucks, as you know. Again, if you could do it, then many, many thanks.
I don't know if I will succeed, but I'm going to try it this week-end.

29 January 2008, 13:56   #67
Thorham
Quote:
Originally Posted by meynaf
Cool. It's just a pity that posting there doesn't increase the post count!
Quote:
Originally Posted by meynaf
I think my program to measure execution times will run a lot next week-end.
I'd say. I've seen the thread and there's a lot to test.
Quote:
Originally Posted by meynaf
Let me know when you achieve something...
Well, I wouldn't wait on it if I were you. It's not going to happen any time soon. I have a little compiler project I'm doing on the amiga, and one of the goals is to get this to compile pc code. Only when the 680x0 part is done will I ever learn ia32 code. So, that really could take a while.
Quote:
Originally Posted by meynaf
But finally you did it. Now you're a Man
Yippy, I'm a MAN now, hurray
Quote:
Originally Posted by meynaf
Not really but it's close.

You have 4 pixels to write : up-left, up-right, down-left, down-right.
All of them get 9/16 of (x,y), and :
- For up-left : 1/16 of (x-1,y-1), 3/16 of (x-1,y), 3/16 of (x,y-1)
- For up-right : 1/16 of (x+1, y-1), 3/16 of (x+1,y), 3/16 of (x,y-1)
- For down-left : 1/16 of (x-1,y+1), 3/16 of (x-1,y), 3/16 of (x,y+1)
- For down-right : 1/16 of (x+1,y+1), 3/16 of (x+1,y), 3/16 of (x,y+1)
Right. I've implemented it, and it does a much better job than simple bilinear. I'm simply not using that anymore. The image with this one is much sharper! Another plus is that it's not even going to be much slower than bilinear when implemented in asm. I might just do an asm version for the fun of it; maybe I can beat your version.

By the way, you may still try bilinear for a speed gain on a plain a1200. I'll try it with my ycbcr program in basic.
Quote:
Originally Posted by meynaf
It's clear enough. Apparently bilinear interpolation is the same as triangular, but it weights the pixels with 1/1/1/1 instead of 9/3/3/1.
Yes, it does seem very similar. Here it is in basic:
Code:
p=(p1*1+p2*3+p4*3+p5*9)\16:plot(xx+640,yy+512,p)
p=(p2*3+p3*1+p5*9+p6*3)\16:plot(xx+1+640,yy+512,p)
p=(p4*3+p5*9+p7*1+p8*3)\16:plot(xx+640,yy+1+512,p)
p=(p5*9+p6*3+p8*3+p9*1)\16:plot(xx+1+640,yy+1+512,p)
Was very simple to modify the code, and the results are very good, too!
Quote:
Originally Posted by meynaf
I don't know if I will succeed, but I'm going to try it this week-end.
While I do hope you succeed, even if you don't, I really appreciate the fact that you're going to try it. Thank you

Last edited by Thorham; 29 January 2008 at 14:02.

29 January 2008, 14:54   #68
meynaf
Quote:
Originally Posted by Thorham
Cool. It's just a pity that posting there doesn't increase the post count!
I don't care a lot about the post count.
(do I sound credible ?)

Quote:
Originally Posted by Thorham
Well, I wouldn't wait on it if I were you. It's not going to happen any time soon. I have a little compiler project I'm doing on the amiga, and one of the goals is to get this to compile pc code. Only when the 680x0 part is done will I ever learn ia32 code. So, that really could take a while.
That could take more than a while! I dunno what you intend to compile, but writing a compiler is one of the toughest things there is. You've set yourself a hard task.

Quote:
Originally Posted by Thorham
Yippy, I'm a MAN now, hurray
Yep. Now you are allowed to bash other people's peecees

Quote:
Originally Posted by Thorham
Right. I've implemented it, and it does a much better job than simple bilinear. I'm simply not using that anymore. The image with this one is much sharper! Another plus is that it's not even going to be much slower than bilinear when implemented in asm. I might just do an asm version for the fun of it; maybe I can beat your version.
If you can do it with the exact same quality in less than 120 clock cycles per source pixel (-> 4 destination pixels), then let me know, I'll include it in the project asap !
(mine is actually 118 if I counted right)

Quote:
Originally Posted by Thorham
By the way, you may still try bilinear for a speed gain on a plain a1200. I'll try it with my ycbcr program in basic.
You like damaging quality, don't you ?

Quote:
Originally Posted by Thorham
While I do hope you succeed, even if you don't, I really appreciate the fact that you're going to try it. Thank you
At worst you can compile sources separately, then link them manually with e.g. phxlnk. Not very practical but better than nothing.

29 January 2008, 16:05   #69
Thorham
Quote:
Originally Posted by meynaf
I don't care a lot about the post count.
(do I sound credible ?)

Quote:
Originally Posted by meynaf
That could take more than a while! I dunno what you intend to compile, but writing a compiler is one of the toughest things there is. You've set yourself a hard task.
It's for an object oriented language which I want to be able to handle both low-level and high-level programming properly. Some of the stuff in such a compiler is pretty easy, such as the object handling, other things, I haven't figured out, but some of them could be quite hard. Any comments on what the hard parts are?
Quote:
Originally Posted by meynaf
Yep. Now you are allowed to bash other people's peecees
Yeah, cool. Lot's of
Quote:
Originally Posted by meynaf
If you can do it with the exact same quality in less than 120 clock cycles per source pixel (-> 4 destination pixels), then let me know, I'll include it in the project asap !
(mine is actually 118 if I counted right)
Here's a quick version:
Code:
;Bilinear 2x2
;
;For triangular the averaging blocks should
;look something like this:
;
;    move.l    d0,d7
;    lsl.l    #3,d7
;    add.l    d0,d7
;    add.l    d1,d7
;    add.l    d1,d1
;    add.l    d1,d7
;    add.l    d2,d7
;    add.l    d2,d2
;    add.l    d2,d7
;    add.l    d3,d7
;    lsr.l    #4,d7
;
;Note that for equal weights of 1, the order
;is not important. For triangular in the above
;example they have to be done in the right order.
;But, of course, you knew that, lol.
;

Filter
    move.l    In,a0
    sub.l    #Width,a0
    move.l    In,a1
    move.l    In,a2
    add.l    #Width,a2
    move.l    Out,a3
    move.l    Out,a4
    add.l    #Width*2,a4

    move.l    #Width-1,d6
    
    moveq    #0,d0
    moveq    #0,d1
    moveq    #0,d2
    moveq    #0,d3
    moveq    #0,d4
    moveq    #0,d5
.lp
    move.b    (a0)+,d0    ;Read block 1
    move.b    (a0)+,d1
    move.b    (a1)+,d2
    move.b    (a1)+,d3
    
    move.l    d0,d7        ;Calc averages
    add.l    d1,d7
    add.l    d2,d7
    add.l    d3,d7
    lsr.l    #2,d7
    
    move.b    d7,(a3)+    ;Write pixel 1

    move.b    (a0)+,d4    ;Read block 2
    move.b    (a1)+,d5
    
    move.l    d1,d7        ;Calc averages
    add.l    d4,d7
    add.l    d3,d7
    add.l    d5,d7
    lsr.l    #2,d7

    move.b    d7,(a3)+    ;Write pixel 2

    move.b    (a2)+,d0    ;Read block 3
    move.b    (a2)+,d1
    
    move.l    d2,d7        ;Calc averages
    add.l    d3,d7
    add.l    d0,d7
    add.l    d1,d7
    lsr.l    #2,d7
    
    move.b    d7,(a4)+    ;Write pixel 3
    
    move.b    (a2)+,d0    ;Read block 4
    
    move.l    d3,d7        ;Calc averages
    add.l    d5,d7
    add.l    d1,d7
    add.l    d0,d7
    lsr.l    #2,d7
    
    move.b    d7,(a4)+    ;Write pixel 4
    
    dbra    d6,.lp
The reason I posted it like this is that it could be pretty easy to have two versions of the jpeg codec: one for a plain 1200 with all the quality sacrifices, and one for fast amigas in full quality. Furthermore, even with the ycbcr accuracy reduced plus bilinear, the quality should still be more than acceptable on a plain 1200. Just a thought. Oh, by the way, this is still completely unoptimized.
Quote:
Originally Posted by meynaf
You like damaging quality, don't you ?
Yes, but only if it doesn't get ugly. I've already tried it, and it really doesn't look bad. It's just the same with the ycbcr degradation; the loss is minimal, and should be great for a second version of the codec. This can be done easily when the hq codec is finished. See, there's actually a use for this
Quote:
Originally Posted by meynaf
At worst you can compile sources separately, then link them manually with e.g. phxlnk. Not very practical but better than nothing.
If you think that will work, I'll try that first. If it does, then you can skip making a storm project. Since it's only Tuesday, I've got plenty of time left.

29 January 2008, 17:23   #70
meynaf
Quote:
Originally Posted by Thorham
It's for an object oriented language which I want to be able to handle both low-level and high-level programming properly. Some of the stuff in such a compiler is pretty easy, such as the object handling, other things, I haven't figured out, but some of them could be quite hard. Any comments on what the hard parts are?
The hard parts are probably in the code generation. You probably knew that compilers generate ugly code ; now you'll discover why

Quote:
Originally Posted by Thorham
Yeah, cool. Lot's of
And lots of in return. Even more fun

Quote:
Originally Posted by Thorham
The reason I posted it like this is that it could be pretty easy to have two versions of the jpeg codec: one for a plain 1200 with all the quality sacrifices, and one for fast amigas in full quality. Furthermore, even with the ycbcr accuracy reduced plus bilinear, the quality should still be more than acceptable on a plain 1200. Just a thought. Oh, by the way, this is still completely unoptimized.
Unoptimized, and certainly untested: it won't work.
You are reading 3 bytes for each source line in each loop ; you should only read 1 or adjust pointers afterwards (or the funniest way : keep the old values).
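The "keep the old values" idea can be sketched in C roughly as follows. This is only an illustration under assumptions: the 1-2-1 smoothing filter and the name `smooth_row_sliding` are hypothetical and are not the code discussed in the thread. The point is that each source byte is loaded once and then shifted through local variables, instead of being read three times per iteration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 1-2-1 horizontal smoothing, used only to illustrate the
   "keep the old values" idea: each source byte is loaded exactly once. */
void smooth_row_sliding(const uint8_t *src, uint8_t *dst, size_t width)
{
    if (width == 0)
        return;

    unsigned left = src[0];    /* the edge pixel is replicated on the left */
    unsigned centre = src[0];

    for (size_t x = 0; x < width; x++) {
        /* The only memory read in the loop: the pixel to the right
           (replicated at the right edge). */
        unsigned right = (x + 1 < width) ? src[x + 1] : centre;

        dst[x] = (uint8_t)((left + 2 * centre + right + 2) >> 2);

        /* Slide the window instead of re-reading src[x-1] and src[x]. */
        left = centre;
        centre = right;
    }
}
```

In assembly the same effect comes from leaving the previous values in registers between iterations, so no pointer adjustment is needed.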

But, pal, people having a plain a1200 are already prepared to wait ages before the image shows up, so a very slightly faster version won't fit them.
(said otherwise : when you have to wait a century, you're not after a few years...)

And, oh, yes, I've counted the clock cycles of your version and ended up with 144/loop (slower than mine, heheh). What will the optimized version look like ?

Quote:
Originally Posted by Thorham View Post
Yes, but only if it doesn't get ugly. I've already tried it, and it really doesn't look bad. It's just the same with the ycbcr degradation; the loss is minimal, and should be great for a second version of the codec. This can be done easily when the hq codec is finished. See, there's actually a use for this
A quick-and-dirty version will be better with a box filter IMO. And yes, I like to contradict people

Quote:
Originally Posted by Thorham View Post
If you think that will work, I'll try that first. If it does, then you can skip making a storm project. Since it's only Tuesday, I've got plenty of time left.
That sure will work. The asm has to appear first in the list of objects and there mustn't be any C startup/cleanup code.

But even if you're successful in that way, I will try the StormC project.
meynaf is offline  
Old 29 January 2008, 17:51   #71
Thorham
Computer Nerd
 
Thorham's Avatar
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,770
Quote:
Originally Posted by meynaf
Unoptimized, and certainly untested : it won't work
You are reading 3 bytes for each source line in each loop ; you should only read 1 or adjust pointers afterwards (or the funniest way : keep the old values).
Hadn't thought of that one! Well, that's what you get from a plain version like mine. I think I posted it too quickly.
Quote:
Originally Posted by meynaf
But, pal, people having a plain a1200 are already prepared to wait ages before the image shows up, so a very slightly faster version won't fit them.
(said otherwise : when you have to wait a century, you're not after a few years...)
Good point, I rest my case, and will not suggest anything that sacrifices quality again. I swear it on my ....
Quote:
Originally Posted by meynaf
And, oh, yes, I've counted the clock cycles of your version and ended up with 144/loop (slower than mine, heheh). What will the optimized version look like ?
This was only a quick version to see what you think. I guess I'll try optimizing just for the fun of it. I guess I'll fix it first, and make sure I haven't made any mistakes before posting.
Quote:
Originally Posted by meynaf
A quick-and-dirty version will be better with a box filter IMO. And yes, I like to contradict people
Nearest neighbor, right? Yeah, that's the fastest! And if I were you, I wouldn't stop contradicting people, since they really aren't always right
Quote:
Originally Posted by meynaf
That sure will work. The asm has to appear first in the list of objects and there mustn't be any C startup/cleanup code.
Good. That's pretty easy, I guess. Or so I hope! Which compiler do you think produces the best code: storm, sas, dice or vbcc? Would be nice to use the best one.
Quote:
Originally Posted by meynaf
But even if you're successful in that way, I will try the StormC project.
Thanks again
Thorham is offline  
Old 29 January 2008, 18:26   #72
meynaf
son of 68k
 
meynaf's Avatar
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by Thorham View Post
Hadn't thought of that one! Well, that's what you get from a plain version like mine. I think I posted it too quickly.
Anyway you know next time I won't miss you

Quote:
Originally Posted by Thorham View Post
Good point, I rest my case, and will not suggest anything that sacrifices quality again. I swear it on my ....
You swear on your what ?
(/me tries to look innocent and fails)

Quote:
Originally Posted by Thorham View Post
This was only a quick version to see what you think. I guess I'll try optimizing just for the fun of it. I guess I'll fix it first, and make sure I haven't made any mistakes before posting.
Damn. It will be harder for me next time then.

Quote:
Originally Posted by Thorham View Post
Nearest neighbor, right? Yeah, that's the fastest! And if I were you, I wouldn't stop contradicting people, since they really aren't always right
Good. That's pretty easy, I guess. Or so I hope! Which compiler do you think produces the best code: storm, sas, dice or vbcc? Would be nice to use the best one.
Yes, the box filter simply replicates the pixels, so it ought to be fast...
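A box-filter 2x upsample in this sense is plain pixel replication. A minimal C sketch, assuming 8-bit samples stored row by row with no padding (the function name `upsample_box_2x2` is made up for the example):

```c
#include <stddef.h>
#include <stdint.h>

/* Replicate every source pixel into a 2x2 block of the output.
   dst must hold (2 * width) * (2 * height) bytes. */
void upsample_box_2x2(const uint8_t *src, uint8_t *dst,
                      size_t width, size_t height)
{
    size_t out_width = width * 2;

    for (size_t y = 0; y < height; y++) {
        uint8_t *row0 = dst + (2 * y) * out_width; /* upper output row */
        uint8_t *row1 = row0 + out_width;          /* lower output row */

        for (size_t x = 0; x < width; x++) {
            uint8_t p = src[y * width + x];        /* one read ...      */

            row0[2 * x]     = p;                   /* ... four writes   */
            row0[2 * x + 1] = p;
            row1[2 * x]     = p;
            row1[2 * x + 1] = p;
        }
    }
}
```

There is no arithmetic at all, only one read and four writes per source pixel, which is why it is the cheapest option.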

The compiler which produces the best code on 68k is gcc, but you can't use it to link with asm because of its incompatible object format (with hunk2gcc and gcc's linker it might be possible though).

For the others I frankly don't know. They are the same (1) to me.

(1) : add the "crap" word here if you like, else leave it blank
Quote:
Originally Posted by Thorham View Post
Thanks again
No problem.
meynaf is offline  
Old 30 January 2008, 16:32   #73
Thorham
Computer Nerd
 
Thorham's Avatar
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,770
Quote:
Originally Posted by meynaf
Damn. It will be harder for me next time then.
Well, it might be. I've redone the interpolation code. The method is tested in basic, and it seems to be as good as it's supposed to be, except for the fact that I'm testing on an old monitor. I had an accident with my LG Studioworks, and now its cable is broken. Until I can sort that out, I can't test properly. I can make some test images if you want, though. Here's the new code:
Code:
Filter
    move.l    In,a0
    sub.l    #Width,a0
    move.l    In,a1
    move.l    In,a2
    add.l    #Width,a2
    move.l    Out,a3
    move.l    Out,a4
    add.l    #Width*2,a4

    move.l    #Width/2-1,d6
    
    moveq    #0,d0
    moveq    #0,d1
    moveq    #0,d2
    moveq    #0,d3
    moveq    #0,d4
    moveq    #0,d5
.lpen                ;Entry code (unoptimized)
    move.b    (a0)+,d0
    move.b    (a0)+,d1
    move.b    (a1)+,d2
    move.b    (a1)+,d3

    move.l    d0,d7
    lsl.l    #3,d7
    add.l    d3,d7
    add.l    d3,d7
    add.l    d1,d7
    add.l    d1,d7
    add.l    d1,d7
    add.l    d2,d7
    add.l    d2,d7
    add.l    d2,d7
    lsr.l    #4,d7
    move.b    d7,(a3)+    ;Write top-left

    move.l    d0,d7
    add.l    d0,d7
    add.l    d0,d7
    add.l    d1,d7
    add.l    d1,d7
    add.l    d1,d7
    add.l    d2,d7
    add.l    d3,d7
    lsr.l    #3,d7
    move.b    d7,(a3)+    ;Write top-right
    
    move.l    d0,d7
    add.l    d0,d7
    add.l    d0,d7
    add.l    d1,d7
    add.l    d2,d7
    add.l    d2,d7
    add.l    d2,d7
    add.l    d3,d7
    lsr.l    #3,d7
    move.b    d7,(a4)+    ;Write bottom-left

    move.l    d0,d7
    add.l    d1,d7
    add.l    d2,d7
    add.l    d3,d7
    lsr.l    #2,d7
    move.b    d7,(a4)+    ;Write bottom-right

.lp                ;Rest of row. Here d1 and d3 contain old values
    move.b    (a0)+,d0
    move.b    (a1)+,d2
    move.l    d1,d7        ;x8 x3 x3 x2
    lsl.l    #3,d7
    add.l    d2,d7
    add.l    d2,d7
    move.l    d0,a5
    add.l    a5,a5
    add.l    d0,a5
    add.l    a5,d7
    move.l    d3,d4
    add.l    d4,d4
    add.l    d3,d4
    add.l    d4,d7
    lsr.l    #4,d7
    move.b    d7,(a3)+    ;Write top-left
    move.l    d1,d5        ;x3 x3 x1 x1
    add.l    d5,d5
    add.l    d1,d5
    move.l    d5,d7
    add.l    a5,d7
    add.l    d3,d7
    add.l    d2,d7
    lsr.l    #3,d7
    move.b    d7,(a3)+    ;Write top-right
    add.l    d0,d5        ;x3 x1 x3 x1
    add.l    d4,d5
    add.l    d2,d5
    lsr.l    #3,d5
    move.b    d5,(a4)+    ;Write bottom-left
    move.l    d1,d7        ;x1 x1 x1 x1
    add.l    d0,d7
    add.l    d3,d7
    add.l    d2,d7
    lsr.l    #2,d7
    move.b    d7,(a4)+    ;Write bottom-right

;Next four pixels. Here d0 and d2 contain old values.

    move.b    (a0)+,d1
    move.b    (a1)+,d3
    move.l    d0,d7        ;x8 x3 x3 x2
    lsl.l    #3,d7
    add.l    d3,d7
    add.l    d3,d7
    move.l    d1,a5
    add.l    a5,a5        ;a5 = 2*d1 (d5 holds a stale result here)
    add.l    d1,a5
    add.l    a5,d7
    move.l    d2,d4
    add.l    d4,d4
    add.l    d2,d4
    add.l    d4,d7
    lsr.l    #4,d7
    move.b    d7,(a3)+    ;Write top-left
    move.l    d0,d5        ;x3 x3 x1 x1
    add.l    d5,d5
    add.l    d0,d5
    move.l    d5,d7
    add.l    a5,d7
    add.l    d2,d7
    add.l    d3,d7
    lsr.l    #3,d7
    move.b    d7,(a3)+    ;Write top-right
    add.l    d1,d5        ;x3 x1 x3 x1
    add.l    d4,d5
    add.l    d3,d5
    lsr.l    #3,d5
    move.b    d5,(a4)+    ;Write bottom-left
    move.l    d0,d7        ;x1 x1 x1 x1
    add.l    d1,d7
    add.l    d2,d7
    add.l    d3,d7
    lsr.l    #2,d7
    move.b    d7,(a4)+    ;Write bottom-right
    dbf    d6,.lp

;Here some exit code for the last pixels in the row is needed.
Notice how the loop does eight pixels in one go now, for only four reads! It does eight pixels so some old values can be used easily, and it still fits in the cache easily.

Furthermore the inner loop is somewhat optimized, while the entry code is not, although it can be optimized in the same way as I did for the rest of the code. If this is as good as it's supposed to be (which I can't tell now) then try to beat it
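For checking the assembly against something readable, the weights used in the entry code can be written as a small C reference model. This is a sketch derived from the listing above, not a separate specification: `p00`/`p01` stand for the two pixels read through a0, `p10`/`p11` for the two read through a1, and the struct and function names are hypothetical. Like the assembly, it truncates instead of rounding.

```c
#include <stdint.h>

/* Four output pixels produced from one 2x2 group of source pixels. */
typedef struct {
    uint8_t top_left, top_right, bottom_left, bottom_right;
} quad;

/* Reference model of the weights in the entry code:
   top-left     = (8*p00 + 3*p01 + 3*p10 + 2*p11) / 16
   top-right    = (3*p00 + 3*p01 +   p10 +   p11) / 8
   bottom-left  = (3*p00 +   p01 + 3*p10 +   p11) / 8
   bottom-right = (  p00 +   p01 +   p10 +   p11) / 4 */
quad filter_quad(uint8_t p00, uint8_t p01, uint8_t p10, uint8_t p11)
{
    unsigned a = p00, b = p01, c = p10, d = p11;
    quad q;

    q.top_left     = (uint8_t)((8 * a + 3 * b + 3 * c + 2 * d) >> 4);
    q.top_right    = (uint8_t)((3 * a + 3 * b + c + d) >> 3);
    q.bottom_left  = (uint8_t)((3 * a + b + 3 * c + d) >> 3);
    q.bottom_right = (uint8_t)((a + b + c + d) >> 2);
    return q;
}
```

Each set of weights sums to the divisor, so a flat area stays flat; feeding the same inputs to both versions is a quick way to catch a wrong register in the unrolled loop.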
Thorham is offline  
Old 30 January 2008, 17:29   #74
meynaf
son of 68k
 
meynaf's Avatar
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by Thorham View Post
Well, it might be. I've redone the interpolation code. The method is tested in basic, and it seems to be as good as it's supposed to be, except for the fact that I'm testing on an old monitor. I had an accident with my LG Studioworks, and now its cable is broken. Until I can sort that out, I can't test properly. I can make some test images if you want, though.
Too bad. To avoid this I have 2 working monitors and a 3rd which nearly works.
Maybe you could get your hands on a 1083S or similar monitor...

Quote:
Originally Posted by Thorham View Post
Here's the new code:
Oh yes some fresh code to look at !

At first glance I'd say that you're reading from 2 sources, not 3.
Shouldn't you access 3 lines (previous, current, next) ?

Quote:
Originally Posted by Thorham View Post
Notice how the loop does eight pixels in one go now, for only four reads! It does eight pixels so some old values can be used easily, and it still fits in the cache easily.
No need to unroll loops, speed would have been exactly the same if you didn't duplicate the code.

Quote:
Originally Posted by Thorham View Post
Furthermore the inner loop is somewhat optimized, while the entry code is not, although it can be optimized in the same way as I did for the rest of the code. If this is as good as it's supposed to be (which I can't tell now) then try to beat it
How hard is it to beat it ? Let's see...

All lemm... errrh... clock cycles accounted for : 100 per 4-pixel write.
Ok it's fast (18% faster than mine). But for the quality I have serious doubts (see my remark above about reading only 2 lines).

Anyway it doesn't perform the exact same work.
meynaf is offline  
Old 30 January 2008, 17:43   #75
Thorham
Computer Nerd
 
Thorham's Avatar
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,770
Quote:
Originally Posted by meynaf
Too bad. To avoid this I have 2 working monitors and a 3rd which nearly works.
Maybe you could get your hands on a 1083S or similar monitor...
Yeah, it is. I'll need a new cable. The monitor has to be for the pc, though, although I wouldn't mind having an amiga monitor...
Quote:
Originally Posted by meynaf
Oh yes some fresh code to look at !

At first glance I'd say that you're reading from 2 sources, not 3.
Shouldn't you access 3 lines (previous, current, next) ?
Not for this algorithm.
Quote:
Originally Posted by meynaf
No need to unroll loops, speed would have been exactly the same if you didn't duplicate the code.
Well, this actually saves a few instructions, and as a side effect, the dbf instruction gets executed only half as often as it normally would. So it is a little faster in this case, just look at the register usage.
Quote:
Originally Posted by meynaf
But for the quality I have serious doubts (see my remark above about reading only 2 lines).
It looks just as good on this end, but the monitor is a lot smaller and less sharp, so if there is a quality difference, I can't see it!
Quote:
Originally Posted by meynaf
Anyway it doesn't perform the exact same work.
If a replacement algorithm delivers equal quality, then it doesn't have to. Of course, this remains to be seen.

Man, this sucks. I didn't want to do this today, but I'm going to try and repair the cable. I really can't work like this, argh
Thorham is offline  
Old 30 January 2008, 18:06   #76
meynaf
son of 68k
 
meynaf's Avatar
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by Thorham View Post
Yeah, it is. I'll need a new cable. The monitor has to be for the pc, though, although I wouldn't mind having an amiga monitor...
You use a pc monitor on your amiga ?

Quote:
Originally Posted by Thorham View Post
Not for this algorithm.
Well, this actually saves a few instructions, and as a side effect, the dbf instruction gets executed only half as often as it normally would. So it is a little faster in this case, just look at the register usage.
The dbf instruction is pipelined in the last write and actually amounts to 0, so if you divide it by two it'll still be 0...
However if it saves you some moves then it's ok.

Quote:
Originally Posted by Thorham View Post
It looks just as good on this end, but the monitor is a lot smaller and less sharp, so if there is a quality difference, I can't see it!
On a pc screen you have to damage an image quite a lot before seeing a real difference...

Quote:
Originally Posted by Thorham View Post
If a replacement algorithm delivers equal quality, then it doesn't have to. Of course, this remains to be seen.
This has to be seen, for sure.

Quote:
Originally Posted by Thorham View Post
Man, this sucks. I didn't want to do this today, but I'm going to try and repair the cable. I really can't work like this, argh
I'm sad for your cable. R.I.P.
meynaf is offline  
Old 30 January 2008, 19:07   #77
Thorham
Computer Nerd
 
Thorham's Avatar
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,770
Quote:
Originally Posted by meynaf
You use a pc monitor on your amiga ?
Yep, unfortunately. When my last 1084 broke down, I decided not to buy a 'new' one, because they're all old, of course. Around the same time I also got a pc, and so I decided to buy an svga monitor and a video/tv box. Now I have to make do with using my miggy's video out. Not ideal.
Quote:
Originally Posted by meynaf
The dbf instruction is pipelined in the last write and actually amounts to 0, so if you divide it by two it'll still be 0...
However if it saves you some moves then it's ok.
Cool! That means I can pipeline some more stuff, actually. I can fit some register only instructions after the first read. Great!
Quote:
Originally Posted by meynaf
On a pc screen you have to damage an image quite a lot before seeing a real difference...
Not on my studioworks in 1280x1024x24bit! Although some color differences are a bit hard to spot. Also, testing things on the pc in super hires 24bit is much better then doing it on the amiga.
Quote:
Originally Posted by meynaf
This has to be seen, for sure.
Yes, and I can't wait. It shouldn't be too difficult to use an extra read, though.
Quote:
Originally Posted by meynaf
I'm sad for your cable. R.I.P.
Thank you. Anyway, someone I know has a broken monitor, so I can use his cable! Tomorrow I'll have it fixed.
Thorham is offline  
Old 31 January 2008, 09:51   #78
meynaf
son of 68k
 
meynaf's Avatar
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by Thorham View Post
Yep, unfortunately. When my last 1084 broke down, I decided not to buy a 'new' one, because they're all old, of course. Around the same time I also got a pc, and so I decided to buy an svga monitor and a video/tv box. Now I have to make do with using my miggy's video out. Not ideal.
So you have some sort of a scan doubler in your video/tv box ?

Quote:
Originally Posted by Thorham View Post
Cool! That means I can pipeline some more stuff, actually. I can fit some register only instructions after the first read. Great!
After the first read ? You won't gain anything by doing so. It's the writes that can be pipelined.
(Well, in chipmem things are a little bit more complex, but here we're accessing fastmem only.)

Quote:
Originally Posted by Thorham View Post
Not on my studioworks in 1280x1024x24bit! Although some color differences are a bit hard to spot. Also, testing things on the pc in super hires 24bit is much better then doing it on the amiga.
On yours probably. But on a good monitor, amiga colors are much brighter than on a pc. Anyway the code is intended to work on an amiga, not on a pc.
(Btw why do you always write "then" instead of "than" ?)

Quote:
Originally Posted by Thorham View Post
Yes, and I can't wait. It shouldn't be too difficult to use an extra read, though.
I'm using 9 values in my code, where you're using 4, so it could be a little bit more than an extra read.
But, please tell me : where does your algorithm come from ?
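To make the "9 values" concrete, here is a C sketch of a 2x2 triangular upsample. It rests on an assumption: that "triangular" means the separable filter giving 3/4 weight to the nearer source pixel and 1/4 to the farther one in each direction (weights 9, 3, 3, 1 over 16), which, as far as I know, is what the IJG JPEG library calls "fancy upsampling". It is not meynaf's assembly; the function names, the edge clamping and the +8 rounding are choices made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a source sample, clamping coordinates to the image edges. */
static unsigned sample_clamped(const uint8_t *src, size_t w, size_t h,
                               ptrdiff_t x, ptrdiff_t y)
{
    if (x < 0) x = 0;
    if (y < 0) y = 0;
    if (x >= (ptrdiff_t)w) x = (ptrdiff_t)w - 1;
    if (y >= (ptrdiff_t)h) y = (ptrdiff_t)h - 1;
    return src[(size_t)y * w + (size_t)x];
}

/* 2x2 triangular upsample; dst must hold (2*w) * (2*h) bytes.
   Each output uses 4 samples, but the four outputs belonging to one
   source pixel together touch its whole 3x3 neighbourhood (9 values). */
void upsample_tri_2x2(const uint8_t *src, uint8_t *dst, size_t w, size_t h)
{
    for (ptrdiff_t y = 0; y < (ptrdiff_t)h; y++) {
        for (ptrdiff_t x = 0; x < (ptrdiff_t)w; x++) {
            unsigned c = sample_clamped(src, w, h, x, y);

            for (int dy = 0; dy < 2; dy++) {
                ptrdiff_t ny = y + (dy ? 1 : -1);   /* nearer row    */
                for (int dx = 0; dx < 2; dx++) {
                    ptrdiff_t nx = x + (dx ? 1 : -1); /* nearer column */
                    unsigned hn = sample_clamped(src, w, h, nx, y);
                    unsigned vn = sample_clamped(src, w, h, x, ny);
                    unsigned dn = sample_clamped(src, w, h, nx, ny);

                    dst[(size_t)(2 * y + dy) * (2 * w) + (size_t)(2 * x + dx)] =
                        (uint8_t)((9 * c + 3 * hn + 3 * vn + dn + 8) >> 4);
                }
            }
        }
    }
}
```

Compared with the 2x2 read pattern of the posted routine, the extra cost is the third source line and the wider neighbourhood, not just a single additional read.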

Quote:
Originally Posted by Thorham View Post
Thank you. Anyway, someone I know has a broken monitor, so I can use his cable! Tomorrow I'll have it fixed.
A broken cable with a working monitor, and a working cable with a broken monitor... so you'll end up with a broken cable and a broken monitor
meynaf is offline  
Old 31 January 2008, 11:01   #79
Thorham
Computer Nerd
 
Thorham's Avatar
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,770
Quote:
Originally Posted by meynaf
So you have some sort of a scan doubler in your video/tv box ?
Unfortunately not. But it does deinterlace, and also, it 'upgrades' everything to at least 60 hertz. Plus, even in that mode, super smooth bugs. It's a crappy b-grade product.
Quote:
Originally Posted by meynaf
After the first read ? You won't gain anything by doing so. It's the writes that can be pipelined.
(Well, in chipmem things are a little bit more complex, but here we're accessing fastmem only.)
Ok, very odd though. Perhaps I should read the original Motorola docs
Quote:
Originally Posted by meynaf
On yours probably. But on a good monitor, amiga colors are much brighter than on a pc. Anyway the code is intended to work on an amiga, not on a pc.
(Btw why do you always write "then" instead of "than" ?)
Really? I haven't noticed! About the 'than' thing: It's probably just a stupid typo. Of course in the Netherlands everyone speaks Dutch, so I never have to use English for anything here. Having been raised with English just makes it easy to understand, it doesn't make you flawless at using it. There are cases where it's 'then' and when it's 'than', I just don't know exactly when to use them
Quote:
Originally Posted by meynaf
I'm using 9 values in my code, where you're using 4, so it could be a little bit more than an extra read.
But, please tell me : where does your algorithm come from ?
True. The algorithm is quite different. I've come up with it myself. And while it's better than bilinear, now that I can see clearly again, I know now that it is not as good as triangular. Damn, what a shame
Quote:
Originally Posted by meynaf
A broken cable with a working monitor, and a working cable with a broken monitor... so you'll end up with a broken cable and a broken monitor
No, it works again, now. It was pretty easy, too. Nothing more than 20 minutes of work, haha
Thorham is offline  
Old 31 January 2008, 14:13   #80
meynaf
son of 68k
 
meynaf's Avatar
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by Thorham View Post
Unfortunately not. But it does deinterlace, and also, it 'upgrades' everything to at least 60 hertz. Plus, even in that mode, super smooth bugs. It's a crappy b-grade product.
I wondered if it couldn't be the source of your curious machine behaviors. No driver needed on the amiga side ?

Quote:
Originally Posted by Thorham View Post
Ok, very odd though. Perhaps I should read the original Motorola docs
It can be understood like this : when you write a value to memory, you don't need it to be actually written before going on. On the other hand, how can the program continue without knowing what we have read ?

Quote:
Originally Posted by Thorham View Post
Really? I haven't noticed! About the 'than' thing: It's probably just a stupid typo. Of course in the Netherlands everyone speaks Dutch, so I never have to use English for anything here. Having been raised with English just makes it easy to understand, it doesn't make you flawless at using it. There are cases where it's 'then' and when it's 'than', I just don't know exactly when to use them
So "than" and "then" are the same in Dutch, am I right ?

Quote:
Originally Posted by Thorham View Post
True. The algorithm is quite different. I've come up with it myself. And while it's better than bilinear, now that I can see clearly again, I know now that it is not as good as triangular. Damn, what a shame
How unfortunate. Now what's left for you to do is to write a faster triangular one
Note that if you can't beat mine (and you won't, heheh ) there is still the 2:1 version to check (also triangular interpolation but writes 2 horizontal pixels and 1 vertical). A more common case than I first expected.
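The 2:1 case (two output pixels horizontally, one vertically) is the one-dimensional form of the same idea. A minimal C sketch, again assuming the 3/4 : 1/4 triangular weights, edge replication and +2 rounding; the name `upsample_tri_2x1_row` is hypothetical and this is not the routine from the thread:

```c
#include <stddef.h>
#include <stdint.h>

/* Horizontal-only triangular upsample of one row.
   dst must hold 2 * width bytes. */
void upsample_tri_2x1_row(const uint8_t *src, uint8_t *dst, size_t width)
{
    for (size_t x = 0; x < width; x++) {
        unsigned c     = src[x];
        unsigned left  = (x > 0) ? src[x - 1] : c;          /* clamp at edges */
        unsigned right = (x + 1 < width) ? src[x + 1] : c;

        dst[2 * x]     = (uint8_t)((3 * c + left  + 2) >> 2); /* leans left  */
        dst[2 * x + 1] = (uint8_t)((3 * c + right + 2) >> 2); /* leans right */
    }
}
```

Only one source line is involved here, so each output row depends on three values per source pixel instead of nine.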

Quote:
Originally Posted by Thorham View Post
No, it works again, now. It was pretty easy, too. Nothing more than 20 minutes of work, haha
Resurrected ! It's miraculous. You're a wizard, man
meynaf is offline  
 

