English Amiga Board

KONEY 08 December 2019 20:38

Stretch bit in word into longword
 
Here I am again, stuck on a very silly problem.

I'm trying to write a piece of code to transform a word into a longword by "stretching" the bits.

For example:

%1010110011010101

should become:

%11001100111100001111001100110011

so every bit is repeated twice.

But I have no idea how to proceed. Actually it's an exercise to learn to manipulate data, and I really need some help here.

I tried copying the data, shifting it right by 1 bit with LSR and combining the two copies with an XOR, but all I got was a big mess :banghead

mcgeezer 08 December 2019 21:09

There is probably an easier way to do it... but here is a five-minute shot.

Code:

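        moveq   #0,d0
        moveq   #0,d1           ; d1 collects the stretched result
        moveq   #0,d2
        move.w  #%1010110011010101,d0
        moveq   #15,d7          ; walk the source bits from 15 down to 0
.loop:  btst    d7,d0           ; is source bit n (= d7) set?
        beq.s   .next
        move.w  d7,d2
        add.w   d2,d2           ; d2 = 2n
        bset    d2,d1           ; set result bit 2n
        addq.w  #1,d2
        bset    d2,d1           ; set result bit 2n+1
.next:  dbf     d7,.loop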

Your result will be in d1.

mcgeezer
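For readers who want something to check the assembly against, here is a minimal C sketch of the same bit-test idea. The function name `stretch_bits` is made up for this illustration and is not part of the thread.

```c
#include <stdint.h>

/* Portable model of the btst/bset loop above: for each set bit n of the
   input word, set bits 2n and 2n+1 of the result. */
uint32_t stretch_bits(uint16_t w)
{
    uint32_t out = 0;                     /* plays the role of d1 */
    for (int n = 15; n >= 0; n--) {       /* d7 counts from 15 down to 0 */
        if (w & (1u << n)) {              /* btst d7,d0 */
            out |= 1u << (2 * n);         /* bset with d2 = 2n */
            out |= 1u << (2 * n + 1);     /* bset with d2 = 2n+1 */
        }
    }
    return out;
}
```

Called with the sample word `0xACD5` (the binary value from the first post), it returns `0xCCF0F333`, which is the requested bit pattern.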

ross 08 December 2019 23:50

This is faster and the number of processor cycles used is constant:
Code:

    move.w  #%1010110011010101,d0
    move.w  d0,d1
    add.w  d0,d0
    addx.l  d2,d2
    add.w  d1,d1
    addx.l  d2,d2
    add.w  d0,d0
    addx.l  d2,d2
    add.w  d1,d1
    addx.l  d2,d2
    add.w  d0,d0
    addx.l  d2,d2
    add.w  d1,d1
    addx.l  d2,d2
    add.w  d0,d0
    addx.l  d2,d2
    add.w  d1,d1
    addx.l  d2,d2
    add.w  d0,d0
    addx.l  d2,d2
    add.w  d1,d1
    addx.l  d2,d2
    add.w  d0,d0
    addx.l  d2,d2
    add.w  d1,d1
    addx.l  d2,d2
    add.w  d0,d0
    addx.l  d2,d2
    add.w  d1,d1
    addx.l  d2,d2
    add.w  d0,d0
    addx.l  d2,d2
    add.w  d1,d1
    addx.l  d2,d2
    add.w  d0,d0
    addx.l  d2,d2
    add.w  d1,d1
    addx.l  d2,d2
    add.w  d0,d0
    addx.l  d2,d2
    add.w  d1,d1
    addx.l  d2,d2
    add.w  d0,d0
    addx.l  d2,d2
    add.w  d1,d1
    addx.l  d2,d2
    add.w  d0,d0
    addx.l  d2,d2
    add.w  d1,d1
    addx.l  d2,d2
    add.w  d0,d0
    addx.l  d2,d2
    add.w  d1,d1
    addx.l  d2,d2
    add.w  d0,d0
    addx.l  d2,d2
    add.w  d1,d1
    addx.l  d2,d2
    add.w  d0,d0
    addx.l  d2,d2
    add.w  d1,d1
    addx.l  d2,d2
    add.w  d0,d0
    addx.l  d2,d2
    add.w  d1,d1
    addx.l  d2,d2
    rts

Result in D2.
Yes, it's ugly :)

With a LUT it could probably be made much faster.
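The add/addx pairs can be modelled in C as a sketch: `add.w dN,dN` shifts the source left and leaves its old top bit in the X flag, and `addx.l d2,d2` shifts the result left while adding X into bit 0. The helper names below are invented for the illustration. The second parameter is there only to show that d2 does not need clearing, since 32 shifts push every old bit out.

```c
#include <stdint.h>

/* One "add.w src,src / addx.l dst,dst" pair. */
void shift_bit_across(uint16_t *src, uint32_t *dst)
{
    unsigned x = (*src >> 15) & 1u;       /* X flag after add.w src,src */
    *src = (uint16_t)(*src << 1);
    *dst = (*dst << 1) | x;               /* addx.l dst,dst = 2*dst + X */
}

uint32_t stretch_addx(uint16_t w, uint32_t d2_initial)
{
    uint16_t d0 = w, d1 = w;              /* move.w d0,d1 */
    uint32_t d2 = d2_initial;             /* the routine never clears d2 */
    for (int i = 0; i < 16; i++) {        /* 16 source bits, each taken twice */
        shift_bit_across(&d0, &d2);
        shift_bit_across(&d1, &d2);
    }
    return d2;
}
```

Whatever value is passed as `d2_initial`, the result depends only on `w`.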

KONEY 09 December 2019 00:14

makes sense and actually works, thanks!

KONEY 09 December 2019 00:33

Quote:

Originally Posted by ross (Post 1364170)
This is faster and the number of processor cycles used is constant:
Result in D2.
Yes, it's ugly :)
With a LUT it could probably be made much faster.

LOL yes quite ugly but still a trick to learn, thanks :great

ross 09 December 2019 00:38

A simple trick to make it much faster on bare 68k:

Code:

    move.w  #%1010110011010101,d0
    move.w  d0,d1
    add.w  d0,d0
    addx.w  d2,d2
    add.w  d1,d1
    addx.w  d2,d2
    add.w  d0,d0
    addx.w  d2,d2
    add.w  d1,d1
    addx.w  d2,d2
    add.w  d0,d0
    addx.w  d2,d2
    add.w  d1,d1
    addx.w  d2,d2
    add.w  d0,d0
    addx.w  d2,d2
    add.w  d1,d1
    addx.w  d2,d2
    add.w  d0,d0
    addx.w  d2,d2
    add.w  d1,d1
    addx.w  d2,d2
    add.w  d0,d0
    addx.w  d2,d2
    add.w  d1,d1
    addx.w  d2,d2
    add.w  d0,d0
    addx.w  d2,d2
    add.w  d1,d1
    addx.w  d2,d2
    add.w  d0,d0
    addx.w  d2,d2
    add.w  d1,d1
    addx.w  d2,d2
    swap    d2
    add.w  d0,d0
    addx.w  d2,d2
    add.w  d1,d1
    addx.w  d2,d2
    add.w  d0,d0
    addx.w  d2,d2
    add.w  d1,d1
    addx.w  d2,d2
    add.w  d0,d0
    addx.w  d2,d2
    add.w  d1,d1
    addx.w  d2,d2
    add.w  d0,d0
    addx.w  d2,d2
    add.w  d1,d1
    addx.w  d2,d2
    add.w  d0,d0
    addx.w  d2,d2
    add.w  d1,d1
    addx.w  d2,d2
    add.w  d0,d0
    addx.w  d2,d2
    add.w  d1,d1
    addx.w  d2,d2
    add.w  d0,d0
    addx.w  d2,d2
    add.w  d1,d1
    addx.w  d2,d2
    add.w  d0,d0
    addx.w  d2,d2
    add.w  d1,d1
    addx.w  d2,d2
    rts

mcgeezer's routine: 678 cycles*
my previous one: 396 cycles
this one: 272 cycles**

*of course only for this number of 1 bits
**watch out for the swap ;)
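A C sketch of why the word-sized version still works: `addx.w` only touches the low word of d2, so the first eight source bits are collected there, `swap` moves them to the high word, and the next sixteen shifts flush whatever ended up in the low word. The cycle constants assume the usual 68000 register timings (8 for `move.w #imm,dn`, 4 for `move.w dn,dn`, `add.w`, `addx.w` and `swap`, 8 for `addx.l`) and leave out `rts`; with those assumptions the totals match the figures quoted above. All names are hypothetical.

```c
#include <stdint.h>

uint32_t stretch_addx_w(uint16_t w, uint32_t d2_initial)
{
    uint16_t d0 = w, d1 = w;
    uint32_t d2 = d2_initial;
    for (int half = 0; half < 2; half++) {
        uint16_t lo = (uint16_t)d2;               /* addx.w sees only this word */
        for (int i = 0; i < 8; i++) {             /* 8 source bits, each taken twice */
            lo = (uint16_t)((lo << 1) | (d0 >> 15)); d0 = (uint16_t)(d0 << 1);
            lo = (uint16_t)((lo << 1) | (d1 >> 15)); d1 = (uint16_t)(d1 << 1);
        }
        d2 = (d2 & 0xFFFF0000u) | lo;
        if (half == 0)
            d2 = (d2 << 16) | (d2 >> 16);         /* swap d2 */
    }
    return d2;
}

/* Cycle tallies under the timing assumptions stated above. */
enum {
    CYCLES_ADDX_L = 8 + 4 + 32 * (4 + 8),         /* long version */
    CYCLES_ADDX_W = 8 + 4 + 32 * (4 + 4) + 4      /* word version plus swap */
};
```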

ross 09 December 2019 01:14

LUT version: 82 cycles :D

Code:

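    lea     lut(pc),a0      ; build the table: one stretched word per byte value
    moveq   #0,d0
.md move.w  d0,d4
    moveq   #7,d3
    move.w  d4,d1
.ml add.b   d4,d4           ; top bit of the byte -> X
    addx.w  d2,d2           ; shift it into d2
    add.b   d1,d1           ; same bit again, from the copy
    addx.w  d2,d2
    dbf     d3,.ml
    move.w  d2,(a0)+
    addq.b  #1,d0
    bne.b   .md             ; stop when d0.b wraps to 0 (256 entries)

    move.w  #%1010110011010101,d0
    moveq   #0,d1
    move.b  d0,d1
    add.w   d1,d1           ; low byte * 2 = table offset
    move.w  lut(pc,d1.l),d2 ; stretched low byte
    lsr.w   #8,d0
    swap    d2              ; park it in the high word for now
    add.w   d0,d0           ; high byte * 2 = table offset
    move.w  lut(pc,d0.w),d2 ; stretched high byte
    swap    d2              ; high byte on top, low byte below
    rts

lut ds.w    256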
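As a rough C equivalent of the table approach: a 256-entry table maps each byte to its stretched word, and the result is assembled from two lookups. This sketch fills the table with a plain bit loop instead of the add/addx sequence, and the names `lut8`, `build_lut8` and `stretch_lut8` are assumptions made for the example.

```c
#include <stdint.h>

static uint16_t lut8[256];               /* 512 bytes: byte -> stretched word */

void build_lut8(void)
{
    for (int b = 0; b < 256; b++) {
        uint16_t out = 0;
        for (int n = 0; n < 8; n++)
            if (b & (1 << n))
                out |= (uint16_t)(3u << (2 * n));
        lut8[b] = out;
    }
}

uint32_t stretch_lut8(uint16_t w)
{
    uint32_t hi = lut8[w >> 8];          /* high byte -> upper word */
    uint32_t lo = lut8[w & 0xFF];        /* low byte  -> lower word */
    return (hi << 16) | lo;
}
```

`build_lut8()` has to run once before the first call to `stretch_lut8()`.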


malko 09 December 2019 01:30

Quote:

Originally Posted by ross (Post 1364186)
LUT version: 82 cycles :D [...]

You're the best :great
:)

a/b 09 December 2019 03:34

Slightly lower cycle count (-4):
Code:

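...
        move.w  #%1010110011010101,d0

        moveq   #0,d1
        move.b  d0,d1
        lsr.w   #8,d0
        add.w   d0,d0
        move.l  lut(pc,d0.w),d2 ; high word = stretched high byte, low word = next entry
        add.w   d1,d1
        move.w  lut(pc,d1.w),d2 ; overwrite the low word with the stretched low byte

        rts

lut ds.w    256+1        ; extra word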
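The long load can be pictured in C as reading two neighbouring table words at once (big-endian, so the indexed entry lands in the high word) and then replacing the low word. The extra table word exists only so that the read at index 255 stays inside the table. A sketch with hypothetical names:

```c
#include <stdint.h>

static uint16_t lut[256 + 1];            /* 256 entries plus the spare word */

void build_lut(void)
{
    for (int b = 0; b < 256; b++) {
        uint16_t out = 0;
        for (int n = 0; n < 8; n++)
            if (b & (1 << n))
                out |= (uint16_t)(3u << (2 * n));
        lut[b] = out;
    }
    lut[256] = 0;                        /* never ends up in the result */
}

uint32_t stretch_long_load(uint16_t w)
{
    unsigned hi = w >> 8, lo = w & 0xFFu;
    /* move.l lut(pc,d0.w),d2: entry hi in the high word, entry hi+1 below it */
    uint32_t d2 = ((uint32_t)lut[hi] << 16) | lut[hi + 1];
    /* move.w lut(pc,d1.w),d2: only the low word is replaced */
    d2 = (d2 & 0xFFFF0000u) | lut[lo];
    return d2;
}
```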


ross 09 December 2019 09:48

Quote:

Originally Posted by a/b (Post 1364194)
Slightly lower cycle count (-4)

:great

Another little gain (-2):
Code:

        move.w  #%1010110011010101,d0

        moveq   #0,d1
        move.w  d0,-(sp)
        move.b  (sp)+,d1        ; high byte of d0 (byte access via sp still steps by 2)
        add.w   d1,d1
        move.l  lut(pc,d1.l),d2
        moveq   #0,d1
        move.b  d0,d1
        add.w   d1,d1
        move.w  lut(pc,d1.l),d2
        rts

lut ds.w    256+1        ; extra word

:)
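The stack trick relies on big-endian byte order: a word pushed to memory has its high byte at the lower address, so a byte read from the same address picks up the high byte without any shifting. On the 68000, byte accesses through a7 with predecrement or postincrement adjust the pointer by 2, so the stack ends up balanced. A tiny C model of the memory layout, with an invented function name:

```c
#include <stdint.h>

/* Models "move.w d0,-(sp)" followed by "move.b (sp)+,d1" on a big-endian CPU. */
uint8_t high_byte_via_stack(uint16_t d0)
{
    uint8_t stack[2];
    stack[0] = (uint8_t)(d0 >> 8);       /* big-endian store: high byte first */
    stack[1] = (uint8_t)d0;
    return stack[0];                     /* the byte read at (sp) */
}
```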

a/b 09 December 2019 11:07

Getting too old for this :(, should've noticed earlier. 2 cycles faster, so the same as yours, but without extra mem accesses.
Code:

...
        move.w  #%1010110011010101,d0

        moveq   #0,d1
        move.b  d0,d1
;       lsr.w   #8,d0
;       add.w   d0,d0
        clr.b   d0              ; drop the low byte...
        lsr.w   #7,d0           ; ...so a single shift yields high byte * 2
        move.l  lut(pc,d0.w),d2
        add.w   d1,d1
        move.w  lut(pc,d1.w),d2

        rts

lut ds.w    256+1        ; extra word
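The replacement works because clearing the low byte first means a right shift by 7 leaves exactly the high byte times two. A short C sketch comparing both index computations (function names are made up for the example):

```c
#include <stdint.h>

/* lsr.w #8,d0 followed by add.w d0,d0 */
uint16_t index_shift8_double(uint16_t d0)
{
    return (uint16_t)((d0 >> 8) * 2);
}

/* clr.b d0 followed by lsr.w #7,d0 */
uint16_t index_clear_shift7(uint16_t d0)
{
    return (uint16_t)((d0 & 0xFF00u) >> 7);
}
```

Both functions return the same table offset for every 16-bit input.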


ross 09 December 2019 11:11

Quote:

Originally Posted by a/b (Post 1364217)
Getting too old for this :(, should've noticed earlier. 2 cycles faster, so the same as yours, but without extra mem accesses.

:cool

Damn, that's why last night I dreamed of a lsr #7, but this morning I forgot it! :p

KONEY 09 December 2019 11:12

Anyone shorter? :)

ross 09 December 2019 11:22

Quote:

Originally Posted by KONEY (Post 1364219)
Anyone shorter? :)

Be satisfied with that: not counting the initial setup of d0 and the LUT calculation, it takes 68 cycles. Not bad at all, I would say ;)
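The cycle figures quoted for the LUT variants can be re-added by hand. The sketch below assumes the usual 68000 timings (noted next to each constant) and does not count `rts`; under those assumptions the sums agree with the numbers in the thread. The constant names are only labels for this tally.

```c
enum {
    LOAD_D0       = 8,           /* move.w #imm,d0 */
    MOVEQ         = 4,
    MOVE_B_REG    = 4,           /* move.b dn,dn */
    ADD_W         = 4,
    SWAP          = 4,
    CLR_B         = 4,
    LSR_W_8       = 6 + 2 * 8,   /* lsr.w #n,dn costs 6+2n */
    LSR_W_7       = 6 + 2 * 7,
    MOVE_W_PC_IDX = 14,          /* move.w d8(pc,xn),dn */
    MOVE_L_PC_IDX = 18,          /* move.l d8(pc,xn),dn */
    PUSH_W        = 8,           /* move.w dn,-(sp) */
    POP_B         = 8            /* move.b (sp)+,dn */
};

enum {
    CYCLES_LUT_SWAP  = LOAD_D0 + MOVEQ + MOVE_B_REG + ADD_W + MOVE_W_PC_IDX
                     + LSR_W_8 + SWAP + ADD_W + MOVE_W_PC_IDX + SWAP,
    CYCLES_LONG_LOAD = LOAD_D0 + MOVEQ + MOVE_B_REG + LSR_W_8 + ADD_W
                     + MOVE_L_PC_IDX + ADD_W + MOVE_W_PC_IDX,
    CYCLES_STACK     = LOAD_D0 + MOVEQ + PUSH_W + POP_B + ADD_W + MOVE_L_PC_IDX
                     + MOVEQ + MOVE_B_REG + ADD_W + MOVE_W_PC_IDX,
    CYCLES_LSR7      = LOAD_D0 + MOVEQ + MOVE_B_REG + CLR_B + LSR_W_7
                     + MOVE_L_PC_IDX + ADD_W + MOVE_W_PC_IDX
};
```

This gives 82, 78, 76 and 76 cycles respectively, and 76 minus the 8-cycle load of d0 leaves the 68 cycles mentioned above.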

meynaf 09 December 2019 12:30

Quote:

Originally Posted by KONEY (Post 1364219)
Anyone shorter? :)

Absolute shortest (in some way :D) :
Code:

 move.w #%1010110011010101,d0

 move.l (lut+32768*4,pc,d0.w*4),d0      ; d0.w is sign-extended, hence the +32768*4 bias

lut ds.l 65536

Obviously needs 68020+.
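One detail worth spelling out: because the index register is used as a signed word, the table is addressed from its middle, so the entry for `$8000` sits in slot 0 and the entry for `$7FFF` in slot 65535. A C sketch of that layout, with hypothetical names and a table filled by a plain bit loop:

```c
#include <stdint.h>

static uint32_t lut16[65536];            /* 256 KB: one longword per input word */

/* Slot addressed by (lut+32768*4,pc,d0.w*4): sign-extend, then add the bias. */
uint32_t slot_for(uint16_t w)
{
    int32_t s = (w < 0x8000u) ? (int32_t)w : (int32_t)w - 65536;
    return (uint32_t)(s + 32768);
}

void build_lut16(void)
{
    for (uint32_t v = 0; v < 65536u; v++) {
        uint32_t out = 0;
        for (int n = 0; n < 16; n++)
            if (v & (1u << n))
                out |= 3u << (2 * n);
        lut16[slot_for((uint16_t)v)] = out;
    }
}

uint32_t stretch_lut16(uint16_t w)
{
    return lut16[slot_for(w)];
}
```

The code that fills the table has to use the same biased order, otherwise the two halves of the table end up exchanged.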

ross 09 December 2019 13:04

Quote:

Originally Posted by meynaf (Post 1364227)
Absolute shortest (in some way :D) :

:cheese
To think that a 512-byte LUT already seemed big to me...
It would be interesting to calculate how many cycles it takes to fill the lut ;)

meynaf 09 December 2019 13:50

Quote:

Originally Posted by ross (Post 1364231)
It would be interesting to calculate how many cycles it takes to fill the lut ;)

Indeed. :)
This is a rather typical example of code where we need to know what the program does and how badly it needs to be fast...

grond 09 December 2019 14:13

The non-LUT code can be shortened and sped up quite a bit:


Code:

   
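    move.w  #%1010110011010101,d0
    moveq   #0,d1

    add.w   d0,d0
    addx.w  d1,d1
REPT 7
    add.w   d1,d1       ; leave a zero bit between source bits
    add.w   d0,d0
    addx.w  d1,d1
ENDR

    swap    d1          ; upper 8 source bits done, move them to the high word

    add.w   d0,d0
    addx.w  d1,d1
REPT 7
    add.w   d1,d1
    add.w   d0,d0
    addx.w  d1,d1
ENDR

    move.l  d1,d0       ; source bits now sit on the even positions
    add.l   d1,d1       ; copy shifted left by one
    or.l    d1,d0       ; fill the gaps: result in d0
    rts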

No idea about the actual cycle count.
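The idea behind this version, sketched in C: first spread the 16 source bits onto the even bit positions of a longword, then OR the value with a copy of itself shifted left by one. The function name is an assumption for the example, and the loop stands in for the two unrolled REPT sections.

```c
#include <stdint.h>

uint32_t stretch_spread_or(uint16_t w)
{
    uint32_t even = 0;                   /* d1: source bit n lands on bit 2n */
    for (int n = 0; n < 16; n++)
        even |= (uint32_t)((w >> n) & 1u) << (2 * n);
    return even | (even << 1);           /* move.l d1,d0 / add.l d1,d1 / or.l d1,d0 */
}
```

Since bit 31 of the spread value is always zero, the shift by one loses nothing.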

meynaf 09 December 2019 14:51

Another possibility is ye olde c2p merge trick (unverified, but it gives the idea) :
Code:

 move.w d0,d1           ; input word in d0
 ror.w #1,d1
 move.w d0,d2           ; exchange the $5555 bits of d0 and d1
 eor.w d1,d2
 andi.w #$5555,d2
 eor.w d2,d0
 eor.w d2,d1
 ror.w #1,d1
 move.w d0,d2           ; exchange the $3333 bits
 eor.w d1,d2
 andi.w #$3333,d2
 eor.w d2,d0
 eor.w d2,d1
 ror.w #2,d1
 move.w d0,d2           ; exchange the $0f0f bits
 eor.w d1,d2
 andi.w #$0f0f,d2
 eor.w d2,d0
 eor.w d2,d1
 ror.w #4,d1
 move.b d0,d2           ; exchange the low bytes of d0 and d1
 move.b d1,d0
 move.b d2,d1
 rol.w #8,d1
 swap d0
 move.w d1,d0           ; result in d0

Use of rol/ror is probably killing the timing on 68000, though...
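A line-by-line C transliteration of the merge sequence, offered as a way to try the idea on a PC. It assumes the input word starts in d0 and the result is read from d0 as a longword; the helper names `ror16` and `merge` are invented here. Each merge step exchanges the masked bits of the two registers with the classic XOR swap.

```c
#include <stdint.h>

uint16_t ror16(uint16_t v, unsigned n)   /* ror.w #n, for n = 1..15 */
{
    return (uint16_t)(((uint32_t)v >> n) | ((uint32_t)v << (16 - n)));
}

/* move.w / eor.w / andi.w / eor.w / eor.w: exchange the masked bits of a and b. */
void merge(uint16_t *a, uint16_t *b, uint16_t mask)
{
    uint16_t t = (uint16_t)((*a ^ *b) & mask);
    *a ^= t;
    *b ^= t;
}

uint32_t stretch_merge(uint16_t w)
{
    uint16_t d0 = w, d1 = ror16(w, 1);
    merge(&d0, &d1, 0x5555); d1 = ror16(d1, 1);
    merge(&d0, &d1, 0x3333); d1 = ror16(d1, 2);
    merge(&d0, &d1, 0x0F0F); d1 = ror16(d1, 4);
    /* move.b d1,d0: the high word of the result */
    uint16_t hi = (uint16_t)((d0 & 0xFF00u) | (d1 & 0x00FFu));
    /* move.b d2,d1 / rol.w #8,d1: the low word of the result */
    uint16_t lo = (uint16_t)(((d0 & 0x00FFu) << 8) | (d1 >> 8));
    return ((uint32_t)hi << 16) | lo;    /* swap d0 / move.w d1,d0 */
}
```

For the sample word `0xACD5` this model yields `0xCCF0F333`, the same pattern as the other routines.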

