English Amiga Board - View Single Post

ross · 02 June 2018, 23:07

Quote:

Originally Posted by Thorham

I somehow doubt that that's going to be faster than doing it in code. The 4x indexed addressing mode alone seems slower than the code I posted.

Hi Thorham, your routine doesn't work.

Here is a correct version:
(I haven't given much thought to whether it can be optimized further)

Code:

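    ; in:  d0 = 32-bit value
    ; out: d0 = value with its bit order reversed (d1/d2 trashed)

    ; swap adjacent bits
    move.l  d0,d1
    move.l  #$55555555,d2   ; even-bit mask
    lsr.l   #1,d0
    add.l   d1,d1           ; d1 <<= 1
    and.l   d2,d0
    add.l   d2,d2           ; d2 = $aaaaaaaa
    and.l   d2,d1
    or.l    d1,d0

    ; swap adjacent bit pairs
    move.l  d0,d1
    move.l  #$33333333,d2
    lsr.l   #2,d0
    lsl.l   #2,d1
    and.l   d2,d0
    lsl.l   #2,d2           ; d2 = $cccccccc
    and.l   d2,d1
    or.l    d1,d0

    ; swap nibbles within each byte
    move.l  d0,d1
    move.l  #$0f0f0f0f,d2
    lsr.l   #4,d0
    lsl.l   #4,d1
    and.l   d2,d0
    lsl.l   #4,d2           ; d2 = $f0f0f0f0
    and.l   d2,d1
    or.l    d1,d0

    ; reverse the order of the four bytes
    rol.w   #8,d0
    swap    d0
    rol.w   #8,d0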
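
For anyone who wants to check the routine off-target, here is a minimal C sketch of the same sequence of swaps, next to a naive bit-by-bit loop to compare against. The function names are made up for illustration; this is only a reference model and says nothing about 68020 timing.

```c
#include <stdint.h>

/* Reference model of the 68k routine above: three mask-and-shift
   swap stages followed by a byte reversal. Names are hypothetical. */
static uint32_t reverse_bits32(uint32_t x)
{
    /* swap adjacent bits */
    x = ((x >> 1) & 0x55555555u) | ((x << 1) & 0xAAAAAAAAu);
    /* swap adjacent bit pairs */
    x = ((x >> 2) & 0x33333333u) | ((x << 2) & 0xCCCCCCCCu);
    /* swap nibbles within each byte */
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x << 4) & 0xF0F0F0F0u);
    /* rol.w #8 / swap / rol.w #8 amounts to reversing the four bytes */
    x = (x >> 24) | ((x >> 8) & 0x0000FF00u)
      | ((x << 8) & 0x00FF0000u) | (x << 24);
    return x;
}

/* Naive version: move one bit per iteration, used only as a cross-check. */
static uint32_t reverse_bits32_naive(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}
```

Both functions should agree for any input, for example $12345678 reverses to $1E6A2C48, and applying the reversal twice gives back the original value.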

I seriously doubt that it's faster than a LUT version, especially if it's intended for a CD32 (a chipmem-only 020).
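
To make the comparison concrete, a LUT approach could look roughly like the C sketch below. It assumes a 256-entry byte table and four lookups per longword; that layout is an assumption for illustration, not necessarily what was posted earlier in the thread, and the names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical 256-byte table: rev8[b] holds b with its 8 bits reversed. */
static uint8_t rev8[256];

/* Fill the table once at startup. */
static void init_rev8(void)
{
    for (unsigned i = 0; i < 256; i++) {
        unsigned b = i, r = 0;
        for (int bit = 0; bit < 8; bit++) {
            r = (r << 1) | (b & 1u);
            b >>= 1;
        }
        rev8[i] = (uint8_t)r;
    }
}

/* Reverse 32 bits with four table lookups: each byte is bit-reversed
   and placed at the mirrored byte position. */
static uint32_t reverse_bits32_lut(uint32_t x)
{
    return ((uint32_t)rev8[x & 0xFFu] << 24)
         | ((uint32_t)rev8[(x >> 8) & 0xFFu] << 16)
         | ((uint32_t)rev8[(x >> 16) & 0xFFu] << 8)
         |  (uint32_t)rev8[(x >> 24) & 0xFFu];
}
```

Which approach wins on a real machine depends on memory speed and caching, so it would need to be measured on the target hardware.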