Getting too old for this, should've noticed earlier. This one is 2 cycles faster, so the same as yours, but without the extra mem accesses.
Code:
...
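move.w #%1010110011010101,d0 ; example input word
moveq #0,d1
move.b d0,d1 ; d1 = low byte, zero-extended
; lsr.w #8,d0 ; old index computation, replaced by the two lines below
; add.w d0,d0
clr.b d0 ; drop the low byte
lsr.w #7,d0 ; d0 = high byte * 2 (word offset into lut)
move.l lut(pc,d0.w),d2 ; upper word of d2 = lut[high byte], lower word = lut[high byte+1] (overwritten below)
add.w d1,d1 ; d1 = low byte * 2
move.w lut(pc,d1.w),d2 ; lower word of d2 = lut[low byte]
rts
lut ds.w 256+1 ; extra word so the move.l at index 255 stays inside the table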
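
For anyone checking where the 2 cycles come from, here is a rough count for the index computation only. It assumes a plain 68000, where a register shift by an immediate count n costs 6+2n cycles for byte/word sizes, and it ignores any bus contention. It is a sketch, not a measurement.

```
; old version (the two commented-out lines)
 lsr.w #8,d0 ; 6+2*8 = 22 cycles
 add.w d0,d0 ; 4 cycles, 26 in total
; new version
 clr.b d0 ; 4 cycles
 lsr.w #7,d0 ; 6+2*7 = 20 cycles, 24 in total
```

Both variants leave the high byte times two in d0.w, so the lookups that follow are unchanged. Only the condition codes can differ.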