Well Meynaf, you'd like to see some code? Here you go: the core loop of the horizontal interpolation as an example. Hope the post ain't too long.

Original (core loop over two pixels, without proper rounding):

Code:

.y_xloop move.b (a1)+,d2 ;d2: --- --- --- 1
add.l d2,d1 ;d1: --- --- --- 0+1
lsr.l #1,d1 ;d1: --- --- --- 0\1
move.b d1,(a2)+
move.b (a1)+,d1 ;d1: --- --- --- 2
add.l d1,d2 ;d2: --- --- --- 1+2
lsr.l #1,d2 ;d2: --- --- --- 1\2
move.b d2,(a2)+
dbf d6,.y_xloop

Mine (core loop as poor man's SIMD over 8 pixels, with rounding):

Code:

move.l (a1),d1 ; P00 P01 P02 P03
move.l 1(a1),d2 ; P01 P02 P03 P04
move.l d1,d3
or.l d2,d3 ; P00|P01 P01|P02 P02|P03 P03|P04 -> meaning: we need to add "1" whenever either operand has its LSB set
and.l d6,d1 ; upper 7 bits P00 P01 P02 P03
and.l d6,d2 ; upper 7 bits P01 P02 P03 P04
lsr.l #1,d1 ;
lsr.l #1,d2 ;
and.l d0,d3 ; keep the 1
add.l d1,d2 ; (P00>>1)+(P01>>1) .. .. ..
move.l 4(a1),d1 ; P04 P05 P06 P07
add.l d3,d2 ; (P00+P01+1)>>1 .. .. ..
move.l 5(a1),d7 ; P05 P06 P07 P08
move.l d2,(a2)+
move.l d1,d3
or.l d7,d3 ; P04|P05 P05|P06 P06|P07 P07|P08 -> meaning: we need to add "1" whenever either operand has its LSB set
and.l d6,d1 ; upper 7 bits P04 P05 P06 P07
and.l d6,d7 ; upper 7 bits P05 P06 P07 P08
lsr.l #1,d1 ;
lsr.l #1,d7 ;
and.l d0,d3 ; keep the 1
add.l d1,d7 ; (P04>>1)+(P05>>1) .. .. ..
add.l d3,d7 ; (P04+P05+1)>>1 .. .. ..
move.l d7,(a2)+
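
For anyone who wants to check the rounding trick without firing up an assembler, here is a rough C model of one half of the loop above (four pixels in one 32-bit word). It is only a sketch: the function names are made up for illustration, and the mask values $FEFEFEFE for d6 and $01010101 for d0 are my assumption, inferred from the "upper 7 bits" and "keep the 1" comments, since the setup code isn't shown here.

```c
#include <stdint.h>

/* Scalar reference: rounded average of two 8-bit pixels, (a+b+1)>>1. */
static uint8_t avg_round_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* Four pixels per 32-bit word, same steps as the 68k loop.
 * hi7 and lsb are assumed to be what d6 and d0 hold. */
static uint32_t avg_round_4x8(uint32_t a, uint32_t b)
{
    const uint32_t hi7 = 0xFEFEFEFEu;  /* keeps the upper 7 bits of each byte */
    const uint32_t lsb = 0x01010101u;  /* keeps bit 0 of each byte */

    /* Rounding term: 1 wherever either pixel has its LSB set. */
    uint32_t round = (a | b) & lsb;

    /* Clearing the LSBs first means the shift cannot drag a bit
     * into the neighbouring byte. Each byte sum is at most
     * 127 + 127 + 1 = 255, so no carry crosses a byte boundary. */
    return ((a & hi7) >> 1) + ((b & hi7) >> 1) + round;
}
```

The reason it works: with a = 2x+p and b = 2y+q (p, q being the LSBs), (a+b+1)>>1 equals x + y + ((p+q+1)>>1), and that last term is 1 exactly when p or q is set, i.e. p|q.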

My AMMX variant, also 8 pixels, with proper rounding:

Code:

LOADAB 1,0 ; LOAD (A1),B0
PAVGd16ABB 1,1,0,1 ; PAVG.B 1(A1),B0,B1
STOREApB 2,1 ; STORE B1,(A2)+

You might dislike data parallelism for whatever reason. I respect that. But sometimes it just comes in quite handy.