Old 31 October 2016, 19:34   #11
buggs
Registered User

 
Join Date: May 2016
Location: Rostock/Germany
Posts: 44
Well Meynaf, you'd like to see some code? Here you go. Core loop in horizontal interpolation as an example. Hope the post ain't too long.

Original (core loop over two pixels, without proper rounding):
Code:
.y_xloop            move.b  (a1)+,d2        ;d2: --- --- ---  1
                        add.l   d2,d1           ;d1: --- --- --- 0+1
                        lsr.l   #1,d1           ;d1: --- --- --- 0\1
                        move.b  d1,(a2)+
                        move.b  (a1)+,d1        ;d1: --- --- ---  2
                        add.l   d1,d2           ;d2: --- --- --- 1+2
                        lsr.l   #1,d2           ;d2: --- --- --- 1\2
                        move.b  d2,(a2)+
                        dbf     d6,.y_xloop
Mine (core loop as poor man's SIMD over 8 pixels, with rounding):
Code:
                        move.l  (a1),d1         ; P00 P01 P02 P03
                        move.l  1(a1),d2        ; P01 P02 P03 P04
                        move.l  d1,d3
                        or.l    d2,d3           ; P00|P01 P01|P02 P02|P03 P03|P04 -> meaning: we need to add "1" whenever any of the operands has its LSB set
                        and.l   d6,d1           ; upper 7 bits P00 P01 P02 P03
                        and.l   d6,d2           ; upper 7 bits P01 P02 P03 P04
                        lsr.l   #1,d1           ; P00>>1 .. .. ..
                        lsr.l   #1,d2           ; P01>>1 .. .. ..
                        and.l   d0,d3           ; keep the 1
                        add.l   d1,d2           ; (P00>>1)+(P01>>1) .. .. ..
                        move.l  4(a1),d1        ; P04 P05 P06 P07
                        add.l   d3,d2           ; (P00+P01+1)>>1 .. .. ..
                        move.l  5(a1),d7        ; P05 P06 P07 P08
                        move.l  d2,(a2)+
                        move.l  d1,d3
                        or.l    d7,d3           ; P04|P05 P05|P06 P06|P07 P07|P08 -> same rounding bit as above
                        and.l   d6,d1           ; upper 7 bits P04 P05 P06 P07
                        and.l   d6,d7           ; upper 7 bits P05 P06 P07 P08
                        lsr.l   #1,d1           ; P04>>1 .. .. ..
                        lsr.l   #1,d7           ; P05>>1 .. .. ..
                        and.l   d0,d3           ; keep the 1
                        add.l   d1,d7           ; (P04>>1)+(P05>>1) .. .. ..
                        add.l   d3,d7           ; (P04+P05+1)>>1 .. .. ..
                        move.l  d7,(a2)+
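For anyone who wants to convince themselves that the masking trick really rounds correctly, here is a minimal C sketch of the same idea. It is only a cross-check, not the routine itself. The mask values are my assumption (the listing above does not show how d6 and d0 are set up): 0xFEFEFEFE for the "upper 7 bits" mask and 0x01010101 for the "keep the 1" mask. The function names are made up for illustration. It relies on the identity (a+b+1)>>1 = (a>>1) + (b>>1) + ((a|b)&1), which holds per byte and never overflows a lane, since 127+127+1 = 255.

```c
#include <stdint.h>

/* Assumed register contents; the post does not state them explicitly. */
#define MASK_HI7 0xFEFEFEFEu   /* presumably d6: clears the LSB of every byte */
#define MASK_LSB 0x01010101u   /* presumably d0: keeps only the LSB of every byte */

/* Rounded average of four packed bytes, lane by lane: (a + b + 1) >> 1. */
uint32_t swar_avg4(uint32_t a, uint32_t b)
{
    uint32_t round  = (a | b) & MASK_LSB;     /* 1 where either operand has its LSB set */
    uint32_t half_a = (a & MASK_HI7) >> 1;    /* a >> 1 per lane, no bits leak across lanes */
    uint32_t half_b = (b & MASK_HI7) >> 1;    /* b >> 1 per lane */
    return half_a + half_b + round;           /* at most 127 + 127 + 1 = 255 per lane */
}

/* Scalar references: with rounding, and truncating like the original loop. */
unsigned avg_round(unsigned a, unsigned b) { return (a + b + 1u) >> 1; }
unsigned avg_trunc(unsigned a, unsigned b) { return (a + b) >> 1; }

/* Counts byte pairs where the packed version disagrees with avg_round. */
unsigned swar_mismatches(void)
{
    unsigned bad = 0;
    for (unsigned a = 0; a < 256; a++) {
        for (unsigned b = 0; b < 256; b++) {
            uint32_t pa   = a * 0x01010101u;               /* replicate a into all four lanes */
            uint32_t pb   = b * 0x01010101u;               /* replicate b into all four lanes */
            uint32_t want = avg_round(a, b) * 0x01010101u; /* expected result in every lane */
            if (swar_avg4(pa, pb) != want)
                bad++;
        }
    }
    return bad;
}
```

Calling swar_mismatches() walks all 65536 byte pairs. avg_trunc is only there to show the difference to the original loop, which drops the rounding bit.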
My AMMX variant, also 8 pixels with proper rounding:
Code:
        LOADAB   1,0                 ; LOAD    (A1),B0
        PAVGd16ABB 1,1,0,1       ; PAVG.B 1(A1),B0,B1
        STOREApB 2,1                ; STORE    B1,(A2)+
You might dislike data parallelism for whatever reason. I respect that. But sometimes it just comes in quite handy.

Last edited by buggs; 31 October 2016 at 19:44. Reason: restored the "+" in the last code statement, got lost in c+p