English Amiga Board - Stretch bit in word into longword

KONEY

08 December 2019 20:38

Here I am again, stuck on a very silly problem.

I'm trying to write a piece of code to transform a word into a longword by "stretching" the bits.

For example:

%1010110011010101

should become:

%11001100111100001111001100110011

so every bit is repeated twice.

But I have no idea how to proceed. It's really an exercise to learn how to manipulate data, and I could use some help here.

I tried copying the data, shifting it right by 1 bit with LSR and merging it back in with an XOR, but all I got was a big mess :banghead

mcgeezer

08 December 2019 21:09

There is probably an easier way to do it, but here is a five-minute shot.

Code:

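        moveq   #0,d0
        moveq   #0,d1           ; d1 collects the 32-bit result
        moveq   #0,d2
        move.w  #%1010110011010101,d0
        moveq   #15,d7          ; test source bits 15 down to 0
.loop:  btst    d7,d0
        beq.s   .next           ; bit clear: both result bits stay 0
        move.w  d7,d2
        add.w   d2,d2           ; target bit = 2 * source bit
        bset    d2,d1
        addq.w  #1,d2           ; and its neighbour, 2 * source bit + 1
        bset    d2,d1
.next:  dbf     d7,.loop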

Your result will be in d1

mcgeezer

ross	08 December 2019 23:50

This is faster, and the number of processor cycles used is constant:

Code:

    move.w  #%1010110011010101,d0

    move.w  d0,d1

    add.w   d0,d0

    addx.l  d2,d2

    add.w   d1,d1

    addx.l  d2,d2

    add.w   d0,d0

    addx.l  d2,d2

    add.w   d1,d1

    addx.l  d2,d2

    add.w   d0,d0

    addx.l  d2,d2

    add.w   d1,d1

    addx.l  d2,d2

    add.w   d0,d0

    addx.l  d2,d2

    add.w   d1,d1

    addx.l  d2,d2

    add.w   d0,d0

    addx.l  d2,d2

    add.w   d1,d1

    addx.l  d2,d2

    add.w   d0,d0

    addx.l  d2,d2

    add.w   d1,d1

    addx.l  d2,d2

    add.w   d0,d0

    addx.l  d2,d2

    add.w   d1,d1

    addx.l  d2,d2

    add.w   d0,d0

    addx.l  d2,d2

    add.w   d1,d1

    addx.l  d2,d2

    add.w   d0,d0

    addx.l  d2,d2

    add.w   d1,d1

    addx.l  d2,d2

    add.w   d0,d0

    addx.l  d2,d2

    add.w   d1,d1

    addx.l  d2,d2

    add.w   d0,d0

    addx.l  d2,d2

    add.w   d1,d1

    addx.l  d2,d2

    add.w   d0,d0

    addx.l  d2,d2

    add.w   d1,d1

    addx.l  d2,d2

    add.w   d0,d0

    addx.l  d2,d2

    add.w   d1,d1

    addx.l  d2,d2

    add.w   d0,d0

    addx.l  d2,d2

    add.w   d1,d1

    addx.l  d2,d2

    add.w   d0,d0

    addx.l  d2,d2

    add.w   d1,d1

    addx.l  d2,d2

    add.w   d0,d0

    addx.l  d2,d2

    add.w   d1,d1

    addx.l  d2,d2

    rts

The result is in D2.
Yes, it's ugly :)

It can probably be made much faster with a LUT.

KONEY

09 December 2019 00:14

makes sense and actually works, thanks!

KONEY

09 December 2019 00:33

Quote:

Originally Posted by ross (Post 1364170)

This is faster, and the number of processor cycles used is constant:
The result is in D2.
Yes, it's ugly :)
It can probably be made much faster with a LUT.

LOL yes quite ugly but still a trick to learn, thanks :great

ross	09 December 2019 00:38

A simple trick to make it much faster on bare 68k:

Code:

    move.w  #%1010110011010101,d0

    move.w  d0,d1

    add.w   d0,d0

    addx.w  d2,d2

    add.w   d1,d1

    addx.w  d2,d2

    add.w   d0,d0

    addx.w  d2,d2

    add.w   d1,d1

    addx.w  d2,d2

    add.w   d0,d0

    addx.w  d2,d2

    add.w   d1,d1

    addx.w  d2,d2

    add.w   d0,d0

    addx.w  d2,d2

    add.w   d1,d1

    addx.w  d2,d2

    add.w   d0,d0

    addx.w  d2,d2

    add.w   d1,d1

    addx.w  d2,d2

    add.w   d0,d0

    addx.w  d2,d2

    add.w   d1,d1

    addx.w  d2,d2

    add.w   d0,d0

    addx.w  d2,d2

    add.w   d1,d1

    addx.w  d2,d2

    add.w   d0,d0

    addx.w  d2,d2

    add.w   d1,d1

    addx.w  d2,d2

    swap    d2

    add.w   d0,d0

    addx.w  d2,d2

    add.w   d1,d1

    addx.w  d2,d2

    add.w   d0,d0

    addx.w  d2,d2

    add.w   d1,d1

    addx.w  d2,d2

    add.w   d0,d0

    addx.w  d2,d2

    add.w   d1,d1

    addx.w  d2,d2

    add.w   d0,d0

    addx.w  d2,d2

    add.w   d1,d1

    addx.w  d2,d2

    add.w   d0,d0

    addx.w  d2,d2

    add.w   d1,d1

    addx.w  d2,d2

    add.w   d0,d0

    addx.w  d2,d2

    add.w   d1,d1

    addx.w  d2,d2

    add.w   d0,d0

    addx.w  d2,d2

    add.w   d1,d1

    addx.w  d2,d2

    add.w   d0,d0

    addx.w  d2,d2

    add.w   d1,d1

    addx.w  d2,d2

    rts

mcgeezer routine: 678 cycles*
my previous one: 396 cycles
this: 272 cycles**

*of course only for this number of 1 bits
**watch out for the swap ;)
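
For reference, the two constant totals can be reproduced from the standard 68000 register-to-register timings. This is a rough tally rather than a measurement, and it assumes no wait states and leaves the rts out:

previous one: move.w #imm,d0 (8) + move.w d0,d1 (4) + 32 x [add.w (4) + addx.l (8)] = 396
this one: move.w #imm,d0 (8) + move.w d0,d1 (4) + 32 x [add.w (4) + addx.w (4)] + swap (4) = 272

The whole gain comes from addx.w taking 4 cycles where addx.l takes 8; the swap is the price for building the result one word at a time.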

ross	09 December 2019 01:14

LUT version: 82 cycles :D

Code:

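    lea     lut(pc),a0
    moveq   #0,d0           ; d0 = byte value being expanded, 0..255
.md move.w  d0,d4           ; two working copies of the byte
    moveq   #7,d3           ; 8 bits per byte
    move.w  d4,d1
.ml add.b   d4,d4           ; top bit -> X
    addx.w  d2,d2           ; shift it into the result
    add.b   d1,d1           ; same bit again from the second copy
    addx.w  d2,d2
    dbf     d3,.ml
    move.w  d2,(a0)+        ; store the 16-bit stretched byte
    addq.b  #1,d0
    bne.b   .md             ; loop until d0 wraps back to 0 (256 entries)

    move.w  #%1010110011010101,d0
    moveq   #0,d1
    move.b  d0,d1           ; low byte
    add.w   d1,d1           ; * 2, table entries are words
    move.w  lut(pc,d1.l),d2 ; stretched low byte
    lsr.w   #8,d0           ; high byte
    swap    d2              ; park the low-byte result in the upper word
    add.w   d0,d0
    move.w  lut(pc,d0.w),d2 ; stretched high byte
    swap    d2              ; high-byte result on top, low-byte result below
    rts

lut ds.w    256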

malko

09 December 2019 01:30

Quote:

Originally Posted by ross (Post 1364186)

LUT version: 82 cycles :D [...]

You're the best :great
:)

a/b	09 December 2019 03:34

Slightly lower cycle count (-4):

Code:

...

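        move.w  #%1010110011010101,d0

        moveq   #0,d1
        move.b  d0,d1           ; low byte
        lsr.w   #8,d0           ; high byte
        add.w   d0,d0
        move.l  lut(pc,d0.w),d2 ; upper word = stretched high byte, lower word = next entry (overwritten below)
        add.w   d1,d1
        move.w  lut(pc,d1.w),d2 ; lower word = stretched low byte

        rts

lut ds.w    256+1        ; extra word so the long read at entry 255 stays inside the table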

ross	09 December 2019 09:48

Quote:

Originally Posted by a/b (Post 1364194)

Slightly lower cycle count (-4)

:great

Another little gain (-2):

Code:

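        move.w  #%1010110011010101,d0

        moveq   #0,d1
        move.w  d0,-(sp)
        move.b  (sp)+,d1        ; fetch the high byte via the stack (a byte pop still moves sp by 2)
        add.w   d1,d1
        move.l  lut(pc,d1.l),d2 ; upper word = stretched high byte
        moveq   #0,d1
        move.b  d0,d1           ; low byte
        add.w   d1,d1
        move.w  lut(pc,d1.l),d2 ; lower word = stretched low byte

        rts                     ; not in the original snippet; keeps execution from running into the table

lut ds.w    256+1        ; extra word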

a/b	09 December 2019 11:07

Getting too old for this :(, should've noticed earlier. 2 cycles faster, so the same as yours, but without extra mem accesses.

Code:

...

        move.w  #%1010110011010101,d0

        moveq   #0,d1
        move.b  d0,d1           ; low byte
;       lsr.w   #8,d0
;       add.w   d0,d0
        clr.b   d0              ; keep only the high byte
        lsr.w   #7,d0           ; = high byte * 2 in a single shift
        move.l  lut(pc,d0.w),d2
        add.w   d1,d1
        move.w  lut(pc,d1.w),d2

        rts

lut ds.w    256+1        ; extra word

ross	09 December 2019 11:11

Quote:

Originally Posted by a/b (Post 1364217)

Getting too old for this :(, should've noticed earlier. 2 cycles faster, so the same as yours, but without extra mem accesses.

:cool

Damn, that's why last night I dreamed of an lsr #7, but this morning I forgot it! :p

KONEY

09 December 2019 11:12

Anyone shorter? :)

ross	09 December 2019 11:22

Quote:

Originally Posted by KONEY (Post 1364219)

Anyone shorter? :)

Be satisfied with that: not counting the initial load of d0 and the LUT calculation, this is 68 cycles. Not bad at all, I would say ;)
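
As a rough cross-check of that figure, here is a tally for the lsr #7 variant using the standard 68000 timings. It assumes no wait states and leaves out the initial load of d0, the table setup and the rts:

moveq (4) + move.b (4) + clr.b (4) + lsr.w #7 (6 + 2 x 7 = 20) + move.l lut(pc,d0.w) (18) + add.w (4) + move.w lut(pc,d1.w) (14) = 68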

meynaf

09 December 2019 12:30

Quote:

Originally Posted by KONEY (Post 1364219)

Anyone shorter? :)

Absolute shortest (in some way :D) :

Code:

 move.w #%1010110011010101,d0

 move.l (lut+32768*4,pc,d0.w*4),d0 ; d0.w is sign-extended, so the base points at the middle of the table

lut ds.l 65536

Obviously needs 68020+.

ross	09 December 2019 13:04

Quote:

Originally Posted by meynaf (Post 1364227)

Absolute shortest (in some way :D) :

:cheese
And to think that a 512-byte LUT already seemed big to me...
It would be interesting to calculate how many cycles it takes to fill the lut ;)

meynaf

09 December 2019 13:50

Quote:

Originally Posted by ross (Post 1364231)

It would be interesting to calculate how many cycles it takes to fill the lut ;)

Indeed. :)
This is a rather typical example of code where we need to know what the program does and how badly it needs to be fast...
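
To give an idea of what filling the big table involves, here is a minimal, untested sketch. It reuses the add/addx idea from earlier in the thread and writes the entries in the order the signed d0.w index expects. The label fill_lut is made up for illustration, lut is assumed to be the 65536-longword table from the 68020 example, and d0-d4/a0 are trashed.

Code:

fill_lut:
        lea     lut,a0          ; first slot belongs to index -32768
        move.w  #$8000,d0       ; input value for that slot
.entry: move.w  d0,d1           ; two working copies of the input word
        move.w  d0,d4
        moveq   #15,d3          ; 16 source bits
.bit:   add.w   d1,d1           ; top bit -> X
        addx.l  d2,d2           ; shift it into d2
        add.w   d4,d4           ; same bit again from the second copy
        addx.l  d2,d2
        dbf     d3,.bit
        move.l  d2,(a0)+        ; all 32 bits of d2 were replaced, so no clear is needed
        addq.w  #1,d0           ; next value, wraps from $FFFF to $0000
        cmpi.w  #$8000,d0       ; back at the start after 65536 entries
        bne.s   .entry
        rts

Every entry costs a full 16-iteration loop, and there are 65536 of them, so the setup only pays off if the lookup is then used very often.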

grond

09 December 2019 14:13

The non-LUT code can be shortened and sped up quite a bit:

Code:

    

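    move.w  #%1010110011010101,d0
    moveq   #0,d1

    add.w   d0,d0       ; top bit -> X
    addx.w  d1,d1
REPT 7
    add.w   d1,d1       ; leave a 0 gap after the previous bit
    add.w   d0,d0
    addx.w  d1,d1
ENDR

    swap    d1          ; upper byte done, continue in the other word

    add.w   d0,d0
    addx.w  d1,d1
REPT 7
    add.w   d1,d1
    add.w   d0,d0
    addx.w  d1,d1
ENDR

    move.l  d1,d0       ; source bits now sit at the even positions
    add.l   d1,d1       ; shifted copy fills the odd positions
    or.l    d1,d0       ; result in d0
    rts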

No idea about the actual cycle count.
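
A rough tally from the standard 68000 timings, assuming no wait states and leaving the rts out (an estimate, not a measurement):

move.w #imm (8) + moveq (4) + 2 x [add.w + addx.w (8) + 7 x (add.w + add.w + addx.w) (84)] + swap (4) + move.l (4) + add.l (8) + or.l (8) = 220

Under the same assumptions that is about 52 cycles below the 272 of the addx.w version.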

meynaf

09 December 2019 14:51

Another possibility is ye olde c2p merge trick (unverified, but it gives the idea) :

Code:

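 move.w d0,d1
 ror.w #1,d1
 move.w d0,d2 ; exchange the bits under $5555 between d0 and d1
 eor.w d1,d2
 andi.w #$5555,d2
 eor.w d2,d0
 eor.w d2,d1
 ror.w #1,d1
 move.w d0,d2 ; same exchange for bit pairs ($3333)
 eor.w d1,d2
 andi.w #$3333,d2
 eor.w d2,d0
 eor.w d2,d1
 ror.w #2,d1
 move.w d0,d2 ; and for nibbles ($0f0f)
 eor.w d1,d2
 andi.w #$0f0f,d2
 eor.w d2,d0
 eor.w d2,d1
 ror.w #4,d1
 move.b d0,d2 ; exchange the low bytes of d0 and d1
 move.b d1,d0
 move.b d2,d1
 rol.w #8,d1
 swap d0 ; upper half of the result into the upper word
 move.w d1,d0 ; lower half, result in d0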

Use of rol/ror is probably killing the timing on 68000, though...
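
Tracing the routine by hand suggests it does produce the stretched longword. Writing only the source bit numbers, from bit 15 of each register down, with every number standing for two adjacent copies of that bit (a paper trace, not a run on real hardware):

after the $5555 step and ror #1: d0 = 15 13 11 9 7 5 3 1, d1 = 0 14 12 10 8 6 4 2
after the $3333 step and ror #2: d0 = 15 14 11 10 7 6 3 2, d1 = 1 0 13 12 9 8 5 4
after the $0f0f step and ror #4: d0 = 15 14 13 12 7 6 5 4, d1 = 3 2 1 0 11 10 9 8
after the byte exchange and rol #8: d0 = 15 14 13 12 11 10 9 8, d1 = 7 6 5 4 3 2 1 0

The final swap and move.w then put the d0 word on top and the d1 word below, so d0.l holds the pairs 15 down to 0 in order.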
