30 January 2008, 16:32   #73
Thorham
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,796
Quote:
Originally Posted by meynaf
Damn. It will be harder for me next time then.
Well, it might be. I've redone the interpolation code. I've tested the method in BASIC and it seems to be as good as it's supposed to be, but I'm testing on an old monitor. I had an accident with my LG Studioworks and now its cable is broken, so until I can sort that out I can't test properly. I can make some test images if you want, though. Here's the new code:
Code:
Filter
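    move.l    In,a0
    sub.l    #Width,a0    ;a0 = source row above (In-Width)
    move.l    In,a1        ;a1 = current source row
    move.l    In,a2
    add.l    #Width,a2    ;a2 = source row below (not used yet)
    move.l    Out,a3       ;a3 = first output row
    move.l    Out,a4
    add.l    #Width*2,a4  ;a4 = second output row (output is 2*Width wide)

    move.l    #Width/2-2,d6    ;Entry does column 0, each pass does two more,
                               ;the exit code does the last column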
    
    moveq    #0,d0
    moveq    #0,d1
    moveq    #0,d2
    moveq    #0,d3
    moveq    #0,d4
    moveq    #0,d5
.lpen                ;Entry code (unoptimized)
    move.b    (a0)+,d0    ;d0/d1 = row above, columns 0 and 1
    move.b    (a0)+,d1
    move.b    (a1)+,d2    ;d2/d3 = current row, columns 0 and 1
    move.b    (a1)+,d3

    move.l    d0,d7
    lsl.l    #3,d7
    add.l    d3,d7
    add.l    d3,d7
    add.l    d1,d7
    add.l    d1,d7
    add.l    d1,d7
    add.l    d2,d7
    add.l    d2,d7
    add.l    d2,d7
    lsr.l    #4,d7
    move.b    d7,(a3)+    ;Write top-left

    move.l    d0,d7
    add.l    d0,d7
    add.l    d0,d7
    add.l    d1,d7
    add.l    d1,d7
    add.l    d1,d7
    add.l    d2,d7
    add.l    d3,d7
    lsr.l    #3,d7
    move.b    d7,(a3)+    ;Write top-right
    
    move.l    d0,d7
    add.l    d0,d7
    add.l    d0,d7
    add.l    d1,d7
    add.l    d2,d7
    add.l    d2,d7
    add.l    d2,d7
    add.l    d3,d7
    lsr.l    #3,d7
    move.b    d7,(a4)+    ;Write bottom-left

    move.l    d0,d7
    add.l    d1,d7
    add.l    d2,d7
    add.l    d3,d7
    lsr.l    #2,d7
    move.b    d7,(a4)+    ;Write bottom-right

.lp                ;Rest of row. Here d1 and d3 contain old values
    move.b    (a0)+,d0
    move.b    (a1)+,d2
    move.l    d1,d7        ;x8 x3 x3 x2
    lsl.l    #3,d7
    add.l    d2,d7
    add.l    d2,d7
    move.l    d0,a5
    add.l    a5,a5
    add.l    d0,a5
    add.l    a5,d7
    move.l    d3,d4
    add.l    d4,d4
    add.l    d3,d4
    add.l    d4,d7
    lsr.l    #4,d7
    move.b    d7,(a3)+    ;Write top-left
    move.l    d1,d5        ;x3 x3 x1 x1
    add.l    d5,d5
    add.l    d1,d5
    move.l    d5,d7
    add.l    a5,d7
    add.l    d3,d7
    add.l    d2,d7
    lsr.l    #3,d7
    move.b    d7,(a3)+    ;Write top-right
    add.l    d0,d5        ;x3 x1 x3 x1
    add.l    d4,d5
    add.l    d2,d5
    lsr.l    #3,d5
    move.b    d5,(a4)+    ;Write bottom-left
    move.l    d1,d7        ;x1 x1 x1 x1
    add.l    d0,d7
    add.l    d3,d7
    add.l    d2,d7
    lsr.l    #2,d7
    move.b    d7,(a4)+    ;Write bottom-right

;Next four pixels. Here d0 and d2 contain old values.

    move.b    (a0)+,d1
    move.b    (a1)+,d3
    move.l    d0,d7        ;x8 x3 x3 x2
    lsl.l    #3,d7
    add.l    d3,d7
    add.l    d3,d7
    move.l    d1,a5
    add.l    a5,a5        ;a5 = 2*d1, then 3*d1 on the next line
    add.l    d1,a5
    add.l    a5,d7
    move.l    d2,d4
    add.l    d4,d4
    add.l    d2,d4
    add.l    d4,d7
    lsr.l    #4,d7
    move.b    d7,(a3)+    ;Write top-left
    move.l    d0,d5        ;x3 x3 x1 x1
    add.l    d5,d5
    add.l    d0,d5
    move.l    d5,d7
    add.l    a5,d7
    add.l    d2,d7
    add.l    d3,d7
    lsr.l    #3,d7
    move.b    d7,(a3)+    ;Write top-right
    add.l    d1,d5        ;x3 x1 x3 x1
    add.l    d4,d5
    add.l    d3,d5
    lsr.l    #3,d5
    move.b    d5,(a4)+    ;Write bottom-left
    move.l    d0,d7        ;x1 x1 x1 x1
    add.l    d1,d7
    add.l    d2,d7
    add.l    d3,d7
    lsr.l    #2,d7
    move.b    d7,(a4)+    ;Write bottom-right
    dbf    d6,.lp

;Exit code for the last pixels in the row is still needed here.
Notice that the loop now produces eight output pixels per iteration for only four reads! It handles two source columns per pass so that each half can reuse the values the other half has just read, and it still fits easily in the cache.
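For anyone who doesn't read 68000 assembly, here is a rough C model of what the routine above computes for one pair of source rows. It's a sketch under assumptions, not a line-by-line translation: the names `upscale_row`, `top`, `bot`, `out0` and `out1` are made up for illustration, and repeating the last source column at the right edge is an assumption standing in for the exit code that hasn't been written. The weights are the ones from the asm (8/3/3/2 over 16 for top-left, 3/3/1/1 over 8 for top-right, 3/1/3/1 over 8 for bottom-left, a plain average for bottom-right), and like the asm it carries the previous column over so each source column costs only two new reads.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical reference model of the 2x filter (names are illustrative).
   top/bot:   two adjacent source rows, `width` bytes each.
   out0/out1: two output rows, 2*width bytes each.
   Assumption: the last source column is repeated at the right edge,
   in place of the exit code the asm still needs. */
void upscale_row(const uint8_t *top, const uint8_t *bot, size_t width,
                 uint8_t *out0, uint8_t *out1)
{
    if (width == 0)
        return;

    /* Like the entry code: load the first column once. */
    unsigned t0 = top[0], b0 = bot[0];

    for (size_t x = 0; x < width; x++) {
        /* Only two new bytes per column; the old pair is carried over,
           as the asm does with its "old values" registers. */
        size_t nx = (x + 1 < width) ? x + 1 : x;   /* edge replication */
        unsigned t1 = top[nx], b1 = bot[nx];

        /* Largest sum is 16*255, so every result fits in a byte. */
        out0[2 * x]     = (uint8_t)((8 * t0 + 3 * t1 + 3 * b0 + 2 * b1) >> 4);
        out0[2 * x + 1] = (uint8_t)((3 * t0 + 3 * t1 + b0 + b1) >> 3);
        out1[2 * x]     = (uint8_t)((3 * t0 + t1 + 3 * b0 + b1) >> 3);
        out1[2 * x + 1] = (uint8_t)((t0 + t1 + b0 + b1) >> 2);

        t0 = t1;   /* the new column becomes the old one */
        b0 = b1;
    }
}
```

With `top = {0, 16}` and `bot = {0, 16}` (width 2) this gives `out0 = {5, 8, 16, 16}` and `out1 = {4, 8, 16, 16}`, and a flat input comes out unchanged, which is a quick sanity check that the weights in each output sum to the divisor.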

The inner loop is somewhat optimized, while the entry code is not, although it could be optimized the same way as the rest. If this is as good as it's supposed to be (which I can't tell right now), then try to beat it!
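The optimization in the inner loop boils down to computing each "times 3" term once with a shift and an add, then reusing it across the four outputs, instead of summing every output from scratch as the entry code does. The small C sketch below (hypothetical function names `block_naive` and `block_shared`) checks that both forms give identical results, which is what makes it safe to rewrite the entry code the same way. The comments map the shared terms to the registers used in the first half of `.lp`, where d1/d0 are the old/new pixels of the row above and d3/d2 the old/new pixels of the current row.

```c
/* Four outputs for one source column. p00/p01 = row above at x and x+1,
   p10/p11 = current row at x and x+1 (illustrative names). */
typedef struct { unsigned tl, tr, bl, br; } Block;

/* Naive form: one independent sum per output, like the entry code. */
Block block_naive(unsigned p00, unsigned p01, unsigned p10, unsigned p11)
{
    Block b;
    b.tl = (8 * p00 + 3 * p01 + 3 * p10 + 2 * p11) >> 4;
    b.tr = (3 * p00 + 3 * p01 + p10 + p11) >> 3;
    b.bl = (3 * p00 + p01 + 3 * p10 + p11) >> 3;
    b.br = (p00 + p01 + p10 + p11) >> 2;
    return b;
}

/* Shared form: each 3*x is built once and reused, as the inner loop
   keeps them in d5, a5 and d4 (mapping assumes the first half of .lp). */
Block block_shared(unsigned p00, unsigned p01, unsigned p10, unsigned p11)
{
    unsigned t00 = (p00 << 1) + p00;   /* 3*p00, the asm's d5 (from d1) */
    unsigned t01 = (p01 << 1) + p01;   /* 3*p01, the asm's a5 (from d0) */
    unsigned t10 = (p10 << 1) + p10;   /* 3*p10, the asm's d4 (from d3) */
    Block b;
    b.tl = ((p00 << 3) + p11 + p11 + t01 + t10) >> 4;
    b.tr = (t00 + t01 + p10 + p11) >> 3;
    b.bl = (t00 + p01 + t10 + p11) >> 3;   /* reuses t00 like d5 */
    b.br = (p00 + p01 + p10 + p11) >> 2;
    return b;
}
```

Only three multiply-by-3 terms are needed for all four outputs, which is where the inner loop saves its adds compared with the entry code.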