Old 02 June 2018, 23:07   #12
ross
Defendit numerus
 
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by Thorham
I somehow doubt that that's going to be faster than doing it in code. The 4x indexed addressing mode alone seems slower than the code I posted.
Hi Thorham, your routine doesn't work.

Here is a working version:
(I haven't thought much about whether it can be optimized)
Code:
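    ; in/out: d0 (32-bit value, bits reversed on exit); trashes d1/d2

    ; stage 1: swap adjacent bits
    move.l  d0,d1
    move.l  #$55555555,d2
    lsr.l   #1,d0
    add.l   d1,d1           ; same as lsl.l #1,d1
    and.l   d2,d0
    add.l   d2,d2           ; mask becomes $aaaaaaaa
    and.l   d2,d1
    or.l    d1,d0

    ; stage 2: swap adjacent bit pairs
    move.l  d0,d1
    move.l  #$33333333,d2
    lsr.l   #2,d0
    lsl.l   #2,d1
    and.l   d2,d0
    lsl.l   #2,d2           ; mask becomes $cccccccc
    and.l   d2,d1
    or.l    d1,d0

    ; stage 3: swap nibbles within each byte
    move.l  d0,d1
    move.l  #$0f0f0f0f,d2
    lsr.l   #4,d0
    lsl.l   #4,d1
    and.l   d2,d0
    lsl.l   #4,d2           ; mask becomes $f0f0f0f0
    and.l   d2,d1
    or.l    d1,d0

    ; stage 4: reverse the order of the four bytes
    rol.w   #8,d0
    swap    d0
    rol.w   #8,d0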
I seriously doubt this routine is faster than a LUT version, especially if it's meant for a CD32 (a chipmem-only 020).
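
For anyone who wants to check the routine on a host machine rather than on the Amiga, here is a rough C model of the same four stages, plus a naive bit-by-bit reversal to compare against. This is only a sketch: the names rev32_swap and rev32_naive are made up for illustration, and it says nothing about 68020 timing, only about whether the result is right.

```c
#include <stdint.h>

/* Hypothetical host-side model of the mask-and-shift routine above.
   Each statement mirrors one stage of the assembly. */
static uint32_t rev32_swap(uint32_t x)
{
    /* stage 1: swap adjacent bits ($55555555 / $aaaaaaaa) */
    x = ((x >> 1) & 0x55555555u) | ((x << 1) & 0xAAAAAAAAu);
    /* stage 2: swap adjacent bit pairs ($33333333 / $cccccccc) */
    x = ((x >> 2) & 0x33333333u) | ((x << 2) & 0xCCCCCCCCu);
    /* stage 3: swap nibbles within each byte ($0f0f0f0f / $f0f0f0f0) */
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x << 4) & 0xF0F0F0F0u);
    /* stage 4: reverse byte order (rol.w #8 / swap / rol.w #8) */
    x = (x >> 24) | ((x >> 8) & 0x0000FF00u)
      | ((x << 8) & 0x00FF0000u) | (x << 24);
    return x;
}

/* Naive reference: move one bit per iteration from the bottom of x
   to the bottom of r, shifting r left each time. */
static uint32_t rev32_naive(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}
```

Comparing rev32_swap against rev32_naive over a handful of values (or all 2^32, if you have a minute to spare) is a cheap way to catch a wrong mask or shift before timing anything on real hardware.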
