Quote:
Originally Posted by Thorham
I somehow doubt that that's going to be faster than doing it in code. The 4x indexed addressing mode alone seems slower than the code I posted.
Hi Thorham, your routine doesn't work.
Here is a corrected version:
(I haven't thought much about whether it can be optimized further.)
Code:
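; reverse the 32 bits of d0 (trashes d1/d2)
; step 1: swap adjacent bits
move.l d0,d1
move.l #$55555555,d2
lsr.l #1,d0 ; odd bits move down
add.l d1,d1 ; same as lsl.l #1: even bits move up
and.l d2,d0
add.l d2,d2 ; mask becomes $aaaaaaaa
and.l d2,d1
or.l d1,d0
; step 2: swap adjacent bit pairs
move.l d0,d1
move.l #$33333333,d2
lsr.l #2,d0
lsl.l #2,d1
and.l d2,d0
lsl.l #2,d2 ; mask becomes $cccccccc
and.l d2,d1
or.l d1,d0
; step 3: swap the nibbles of each byte
move.l d0,d1
move.l #$0f0f0f0f,d2
lsr.l #4,d0
lsl.l #4,d1
and.l d2,d0
lsl.l #4,d2 ; mask becomes $f0f0f0f0
and.l d2,d1
or.l d1,d0
; step 4: reverse the byte order of the longword
rol.w #8,d0
swap d0
rol.w #8,d0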
I seriously doubt that this routine is faster than a LUT version, especially if it is meant for a CD32 (a chipmem-only 020).
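For comparison, a byte-table LUT variant could look roughly like the sketch below. This is not the LUT routine discussed earlier in the thread, just a minimal illustration under my own assumptions: a0 points to a hypothetical 256-byte table (called rev8 here) whose entry i holds i with its 8 bits reversed, d0 is both input and output, and d1 is scratch. I have not timed it.

```asm
; hypothetical LUT sketch: reverse the 32 bits of d0
; assumes a0 -> rev8, a 256-byte table with rev8[i] = i bit-reversed
moveq #0,d1 ; clear the index, only the low byte is loaded below
move.b d0,d1 ; index = current low byte
move.b (a0,d1.w),d0 ; replace it with its reversed value
ror.l #8,d0 ; bring the next byte into the low position
move.b d0,d1
move.b (a0,d1.w),d0
ror.l #8,d0
move.b d0,d1
move.b (a0,d1.w),d0
ror.l #8,d0
move.b d0,d1
move.b (a0,d1.w),d0
ror.l #8,d0 ; all 4 bytes reversed and back in their original places
rol.w #8,d0 ; now reverse the byte order, as in the routine above
swap d0
rol.w #8,d0
```

The table itself could be filled once at startup, for example by running the shift-and-mask routine on the values 0 to 255 and keeping the top byte of each result.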