Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,796
Quote:
Originally Posted by meynaf
Damn. It will be harder for me next time then.
Well, it might be. I've redone the interpolation code. I tested the method in BASIC, and it seems to be as good as it's supposed to be, except that I'm testing on an old monitor. I had an accident with my LG Studioworks and now its cable is broken; until I can sort that out, I can't test properly. I can make some test images if you want, though. Here's the new code:
Code:
Filter
	move.l In,a0
	sub.l #Width,a0	;a0 = upper source row (the row above In)
	move.l In,a1	;a1 = lower source row
	move.l In,a2
	add.l #Width,a2	;a2 = row below In (not used in this routine)
	move.l Out,a3	;a3 = upper output row
	move.l Out,a4
	add.l #Width*2,a4	;a4 = lower output row (output rows are 2*Width bytes)
	move.l #Width/2-2,d6	;dbf count: Width/2-1 passes (Width even, >= 4)
	moveq #0,d0	;clear the registers so that the move.b loads
	moveq #0,d1	;below give zero-extended pixel values
	moveq #0,d2
	moveq #0,d3
	moveq #0,d4
	moveq #0,d5
.lpen	;Entry code (unoptimized)
	move.b (a0)+,d0	;d0/d1 = upper row, pixels 0 and 1
	move.b (a0)+,d1
	move.b (a1)+,d2	;d2/d3 = lower row, pixels 0 and 1
	move.b (a1)+,d3
	move.l d0,d7	;x8 x3 x3 x2
	lsl.l #3,d7
	add.l d3,d7
	add.l d3,d7
	add.l d1,d7
	add.l d1,d7
	add.l d1,d7
	add.l d2,d7
	add.l d2,d7
	add.l d2,d7
	lsr.l #4,d7
	move.b d7,(a3)+	;Write top-left
	move.l d0,d7	;x3 x3 x1 x1
	add.l d0,d7
	add.l d0,d7
	add.l d1,d7
	add.l d1,d7
	add.l d1,d7
	add.l d2,d7
	add.l d3,d7
	lsr.l #3,d7
	move.b d7,(a3)+	;Write top-right
	move.l d0,d7	;x3 x1 x3 x1
	add.l d0,d7
	add.l d0,d7
	add.l d1,d7
	add.l d2,d7
	add.l d2,d7
	add.l d2,d7
	add.l d3,d7
	lsr.l #3,d7
	move.b d7,(a4)+	;Write bottom-left
	move.l d0,d7	;x1 x1 x1 x1
	add.l d1,d7
	add.l d2,d7
	add.l d3,d7
	lsr.l #2,d7
	move.b d7,(a4)+	;Write bottom-right
.lp	;Rest of row. Here d1 and d3 contain old values
	move.b (a0)+,d0
	move.b (a1)+,d2
	move.l d1,d7	;x8 x3 x3 x2
	lsl.l #3,d7
	add.l d2,d7
	add.l d2,d7
	move.l d0,a5	;a5 = 3*d0
	add.l a5,a5
	add.l d0,a5
	add.l a5,d7
	move.l d3,d4	;d4 = 3*d3
	add.l d4,d4
	add.l d3,d4
	add.l d4,d7
	lsr.l #4,d7
	move.b d7,(a3)+	;Write top-left
	move.l d1,d5	;x3 x3 x1 x1
	add.l d5,d5
	add.l d1,d5	;d5 = 3*d1
	move.l d5,d7
	add.l a5,d7
	add.l d3,d7
	add.l d2,d7
	lsr.l #3,d7
	move.b d7,(a3)+	;Write top-right
	add.l d0,d5	;x3 x1 x3 x1
	add.l d4,d5
	add.l d2,d5
	lsr.l #3,d5
	move.b d5,(a4)+	;Write bottom-left
	move.l d1,d7	;x1 x1 x1 x1
	add.l d0,d7
	add.l d3,d7
	add.l d2,d7
	lsr.l #2,d7
	move.b d7,(a4)+	;Write bottom-right
;Next four pixels. Here d0 and d2 contain old values.
	move.b (a0)+,d1
	move.b (a1)+,d3
	move.l d0,d7	;x8 x3 x3 x2
	lsl.l #3,d7
	add.l d3,d7
	add.l d3,d7
	move.l d1,a5	;a5 = 3*d1
	add.l a5,a5
	add.l d1,a5
	add.l a5,d7
	move.l d2,d4	;d4 = 3*d2
	add.l d4,d4
	add.l d2,d4
	add.l d4,d7
	lsr.l #4,d7
	move.b d7,(a3)+	;Write top-left
	move.l d0,d5	;x3 x3 x1 x1
	add.l d5,d5
	add.l d0,d5	;d5 = 3*d0
	move.l d5,d7
	add.l a5,d7
	add.l d2,d7
	add.l d3,d7
	lsr.l #3,d7
	move.b d7,(a3)+	;Write top-right
	add.l d1,d5	;x3 x1 x3 x1
	add.l d4,d5
	add.l d3,d5
	lsr.l #3,d5
	move.b d5,(a4)+	;Write bottom-left
	move.l d0,d7	;x1 x1 x1 x1
	add.l d1,d7
	add.l d2,d7
	add.l d3,d7
	lsr.l #2,d7
	move.b d7,(a4)+	;Write bottom-right
	dbf d6,.lp
;Here some exit code for the last source pixel of the row is needed
;(it must produce the last two output pixels of both output rows).
Notice how the loop now produces eight output pixels per pass for only four reads! It is unrolled to eight pixels so that the registers holding the previous column can be reused directly without shuffling values around, and it still fits easily in the instruction cache.
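As a cross-check for the assembly, here is a minimal C reference model of one row pair, written under a few assumptions: the weights are copied from the routine above, `block2x2` and `filter_row` are hypothetical names, and the last source column uses its own pixels as the missing right neighbours (edge replication), which is only one possible choice for the exit code the listing still lacks. Like the unrolled loop, it reads each source byte once, because the right-hand pixels of one column become the left-hand pixels of the next.

```c
#include <stdint.h>
#include <stddef.h>

/* One 2x2 output block from source pixels tl = top[x], tr = top[x+1],
   bl = bottom[x], br = bottom[x+1]. Each weight set sums to a power of two. */
static void block2x2(unsigned tl, unsigned tr, unsigned bl, unsigned br,
                     uint8_t *out_top, uint8_t *out_bottom)
{
    out_top[0]    = (uint8_t)((8*tl + 3*tr + 3*bl + 2*br) >> 4); /* x8 x3 x3 x2 */
    out_top[1]    = (uint8_t)((3*tl + 3*tr + bl + br) >> 3);     /* x3 x3 x1 x1 */
    out_bottom[0] = (uint8_t)((3*tl + tr + 3*bl + br) >> 3);     /* x3 x1 x3 x1 */
    out_bottom[1] = (uint8_t)((tl + tr + bl + br) >> 2);         /* x1 x1 x1 x1 */
}

/* Hypothetical reference for one row pair: `width` source pixels per row
   give 2*width output pixels in each of the two output rows.
   Returns the number of source bytes read, to show the reuse. */
static size_t filter_row(const uint8_t *top, const uint8_t *bottom, size_t width,
                         uint8_t *out_top, uint8_t *out_bottom)
{
    size_t reads = 0;
    unsigned tl = top[0], bl = bottom[0];   /* left column of the first block */
    reads += 2;
    for (size_t x = 0; x < width; x++) {
        unsigned tr = tl, br = bl;          /* assumption: replicate the edge */
        if (x + 1 < width) {
            tr = top[x + 1];                /* only the new column is read */
            br = bottom[x + 1];
            reads += 2;
        }
        block2x2(tl, tr, bl, br, out_top + 2*x, out_bottom + 2*x);
        tl = tr;                            /* right pixels become next left pixels */
        bl = br;
    }
    return reads;
}
```

Comparing the assembly's output byte for byte with this model (for the columns the loop covers) is one way to check the routine while a proper monitor is not available.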
The inner loop is partly optimized; the entry code is not, although it could be optimized in the same way as the rest of the code. If this is as good as it's supposed to be (which I can't tell right now), then try to beat it!
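Optimizing the entry code "in the same way" would mean building each x3 term once as x + 2x and reusing it in more than one output, as the inner loop does with a5, d4 and d5. The small C sketch below (hypothetical names `block_naive` and `block_shared`) shows the two forms side by side and checks on a grid of inputs that they give identical results; translating `block_shared` back into 68000 instructions would give an optimized entry block.

```c
#include <stdint.h>

/* Naive form: every weight built separately, as in the unoptimized entry code. */
static void block_naive(unsigned tl, unsigned tr, unsigned bl, unsigned br, uint8_t o[4])
{
    o[0] = (uint8_t)((8*tl + 3*tr + 3*bl + 2*br) >> 4);
    o[1] = (uint8_t)((3*tl + 3*tr + bl + br) >> 3);
    o[2] = (uint8_t)((3*tl + tr + 3*bl + br) >> 3);
    o[3] = (uint8_t)((tl + tr + bl + br) >> 2);
}

/* Shared form mirroring the inner loop: each x3 term is computed once
   and reused across the outputs that need it. */
static void block_shared(unsigned tl, unsigned tr, unsigned bl, unsigned br, uint8_t o[4])
{
    unsigned tr3 = tr + tr + tr;   /* plays the role of a5 */
    unsigned bl3 = bl + bl + bl;   /* plays the role of d4 */
    unsigned tl3 = tl + tl + tl;   /* plays the role of d5 */
    o[0] = (uint8_t)(((tl << 3) + br + br + tr3 + bl3) >> 4);
    o[1] = (uint8_t)((tl3 + tr3 + bl + br) >> 3);
    o[2] = (uint8_t)((tl3 + tr + bl3 + br) >> 3);
    o[3] = (uint8_t)((tl + tr + bl + br) >> 2);
}

/* Compare both forms for every pixel value in 0, 17, 34, ..., 255. */
static int forms_agree(void)
{
    for (unsigned a = 0; a < 256; a += 17)
        for (unsigned b = 0; b < 256; b += 17)
            for (unsigned c = 0; c < 256; c += 17)
                for (unsigned d = 0; d < 256; d += 17) {
                    uint8_t x[4], y[4];
                    block_naive(a, b, c, d, x);
                    block_shared(a, b, c, d, y);
                    for (int i = 0; i < 4; i++)
                        if (x[i] != y[i])
                            return 0;
                }
    return 1;
}
```

Since the entry code runs only once per row in this routine, the saving there is small compared with the inner loop.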