English Amiga Board


buzzybee 31 May 2021 01:02

Fast way of getting absolute value
 
Hi guys!

Wonder what you guys think of my code. Its purpose is to convert the value in register d7 to its absolute value:

Code:

        swap    d7              ; fetch x-acceleration
        tst.w   d7              ; test if negative
        smi     d4              ; Yes? Set d4
        ext.w   d4              ; create polarity converter
        eor.w   d4,d7           ; convert polarity (if x-acc is negative)

d7 is loaded with one longword-operation prior to this code, and contains x- (upper word) and y-acceleration (lower word) of a given sprite object. Code works. But this code is repeatedly executed a few times during one frame [in Reshoot Proxima 3], and I have a feeling that this can be optimized for speed. Any thoughts?

Antiriad_UK 31 May 2021 01:04

I've got a macro I stole from Kalms on another forum:

Code:

;(Kalms explained the method itself in the other thread.) Now, to use that you'd simply
;enter the macro like any other instruction, specifying a source register, a destination
; register and an available scratch register like so:
; ABS_W d1,d2,d6 ; d2.w <- abs(d1.w-d2.w), trashes d6.w
ABS_W        MACRO
        sub.w \1,\2
        subx.w \3,\3
        eor.w \3,\2
        sub.w \3,\2
        ENDM

Not worked out if it's faster but worth a look :)

Edit: Ah wait, it's the abs of two values, not sure this is applicable, but left here for others :)
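
For anyone checking the macro on a host machine, here is a minimal C sketch of what the four instructions do to the word registers. The function name abs_w_macro is made up for illustration, and the model assumes sub.w leaves the unsigned borrow in X, which subx.w then turns into a 0/$FFFF mask.

```c
#include <stdint.h>

/* Hypothetical host-side model of "ABS_W src,dst,tmp"; returns the new dst.w. */
uint16_t abs_w_macro(uint16_t src, uint16_t dst)
{
    uint16_t x   = (dst < src) ? 1u : 0u;   /* sub.w src,dst sets X on unsigned borrow */
    uint16_t tmp;

    dst = (uint16_t)(dst - src);            /* sub.w  src,dst */
    tmp = (uint16_t)(0u - x);               /* subx.w tmp,tmp : 0 or $FFFF */
    dst = (uint16_t)(dst ^ tmp);            /* eor.w  tmp,dst */
    dst = (uint16_t)(dst - tmp);            /* sub.w  tmp,dst */
    return dst;
}
```

Because the mask comes from the borrow rather than from the sign, this gives the absolute difference of two unsigned words. With signed inputs it still works when both have the same sign, but src=-1, dst=1 yields $FFFE instead of 2.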

Don_Adan 31 May 2021 02:31

Quote:

Originally Posted by buzzybee (Post 1487782)
Hi guys!

Wonder what you guys think of my code. Its purpose is to convert the value in register d7 to its absolute value:

Code:

        swap d7                ; fetch x-acceleration
        tst.w d7        ; test if negative
        smi d4                ; Yes? Set d4
        ext.w d4        ; create polarity converter
        eor d4,d7        ; convert polarity (if x-acc is negative)

d7 is loaded with one longword-operation prior to this code, and contains x- (upper word) and y-acceleration (lower word) of a given sprite object. Code works. But this code is repeatedly executed a few times during one frame [in Reshoot Proxima 3], and I have a feeling that this can be optimized for speed. Any thoughts?

I'm not an eor expert, but are you sure that it works correctly? As I read it, for D7=$FFFF you get D7=0, not D7=1.
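
The off-by-one can be reproduced on a host machine with a small C sketch of the tst/smi/ext/eor sequence. The function name eor_only_abs is invented for illustration and is not part of anyone's code in this thread.

```c
#include <stdint.h>

/* Hypothetical model of: tst.w d7 / smi d4 / ext.w d4 / eor.w d4,d7 */
uint16_t eor_only_abs(uint16_t d7)
{
    uint16_t d4 = (d7 & 0x8000u) ? 0xFFFFu : 0x0000u;  /* smi d4 + ext.w d4 */
    return (uint16_t)(d7 ^ d4);                        /* eor.w d4,d7 */
}
```

For negative inputs this is the one's complement, so the result is one less than the true absolute value: $FFFF (-1) becomes 0 and $FFFE (-2) becomes 1.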

Don_Adan 31 May 2021 03:17

I would use the following code, but it is not the fastest.

Code:


 swap D7
 move.w D7,D4
 add.w D4,D4
 subx.w D4,D4
 eor.w D4,D7
 sub.w D4,D7
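
A host-side C sketch of the sequence above, for checking values by hand. The name abs_w is made up here; the model assumes add.w D4,D4 pushes the sign bit into X and subx.w D4,D4 then produces 0 or $FFFF.

```c
#include <stdint.h>

/* Hypothetical model of: move.w D7,D4 / add.w D4,D4 / subx.w D4,D4 /
   eor.w D4,D7 / sub.w D4,D7 (after the swap). */
uint16_t abs_w(uint16_t d7)
{
    uint16_t x  = (uint16_t)(d7 >> 15);   /* add.w D4,D4: sign bit ends up in X */
    uint16_t d4 = (uint16_t)(0u - x);     /* subx.w D4,D4: 0 or $FFFF */

    d7 = (uint16_t)(d7 ^ d4);             /* eor.w D4,D7: one's complement if negative */
    d7 = (uint16_t)(d7 - d4);             /* sub.w D4,D7: subtracting -1 adds the missing 1 */
    return d7;
}
```

The final sub.w is what turns the one's complement into a true negation, so $FFFF now gives 1. The only input that does not become positive is $8000, which stays $8000.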


buzzybee 31 May 2021 08:15

Quote:

Originally Posted by Antiriad_UK (Post 1487783)
I've got a macro I stole from Kalms on another forum:

Code:

;(Kalms explained the method itself in the other thread.) Now, to use that you'd simply
;enter the macro like any other instruction, specifying a source register, a destination
; register and an available scratch register like so:
; ABS_W d1,d2,d6 ; d2.w <- abs(d1.w-d2.w), trashes d6.w
ABS_W        MACRO
        sub.w \1,\2
        subx.w \3,\3
        eor.w \3,\2
        sub.w \3,\2
        ENDM

Not worked out if it's faster but worth a look :)

Edit: Ah wait, it's the abs of two values, not sure this is applicable, but left here for others :)

Will have a closer look at this, as my code actually uses two values - same operation is done with y-acc too, and then both results compared to control animation frames. Thanks a lot!

Quote:

Originally Posted by Don_Adan (Post 1487788)
I'm not an eor expert, but are you sure that it works correctly? As I read it, for D7=$FFFF you get D7=0, not D7=1.

You are certainly more eor-expert than I am :-) Yes, the function is not bulletproof as is. Can live with that, as speed is more important than accuracy in this case. But it'd be nice to make it bulletproof and faster :-)

a/b 31 May 2021 09:17

When you are dealing with short code sequences a lot depends on how they are interfacing with the rest of the code (extreme example: super fast pixel draw but you call it with movem/jsr/movem for each pixel), and this opens up several questions...
Would it be faster to load each word individually to get rid of swap+tst? Is source addressing mode simple enough?
Do you need the result as individual words or as a longword (simple partial parallelization: tst.l if ccr not set by move, smi, extb.l, tst.w, smi, ext.w, eor.l, so 1 less eor)?
Assuming this is for 020+, so branching should be avoided...
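
The longword idea in the parenthesis can be sketched in C like this, as a rough host-side model only. The function name eor_both_halves is invented, and d7 is assumed to hold x in the upper word and y in the lower word, as loaded by one move.l.

```c
#include <stdint.h>

/* Hypothetical model of: tst.l d7 / smi d4 / extb.l d4 /
   tst.w d7 / smi d4 / ext.w d4 / eor.l d4,d7 */
uint32_t eor_both_halves(uint32_t d7)
{
    uint32_t d4;

    d4 = (d7 & 0x80000000u) ? 0xFFFFFFFFu : 0u;        /* sign of upper word, spread over the long */
    d4 = (d4 & 0xFFFF0000u)
       | ((d7 & 0x8000u) ? 0xFFFFu : 0x0000u);         /* lower word mask replaced by sign of y */
    return d7 ^ d4;                                    /* one eor.l flips both halves */
}
```

Both halves are flipped with a single eor.l, but each half is still only the one's complement of a negative input, so the same off-by-one applies to x and y.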

buzzybee 31 May 2021 09:41

Quote:

Originally Posted by a/b (Post 1487810)
When you are dealing with short code sequences a lot depends on how they are interfacing with the rest of the code (extreme example: super fast pixel draw but you call it with movem/jsr/movem for each pixel), and this opens up several questions...
Would it be faster to load each word individually to get rid of swap+tst? Is source addressing mode simple enough?
Do you need the result as individual words or as a longword (simple partial parallelization: tst.l if ccr not set by move, smi, extb.l, tst.w, smi, ext.w, eor.l, so 1 less eor)?
Assuming this is for 020+, so branching should be avoided...

See, the purpose of the code is to compare x-acceleration and y-acceleration of an object, and init animation frames which show y-axis-orientation or x-axis-orientation accordingly. Acceleration is stored like this:

0.w = x-acceleration (<0 = move left, >0 = move right)
2.w= y-acceleration (< 0 = move up, > 0 = move down)

So one longword-read can fetch both values. I tried to avoid absolute value conversion by simply comparing x-acc and y-acc, but cannot get it to work faultlessly. This is the complete code, with absolute conversion:

Code:

krakenSmall
        move.l  objectListAcc(a2),d7    ; get x- and y-acceleration
        move.w  d7,d0                   ; fetch y-acceleration in world
        move.w  viewPosition+vPyAccConvertWorldToView(pc),d6
        sub.w   d6,d0                   ; convert to y-acceleration in view
        smi     d4                      ; y-acc is negative (object goes up) -> set to $ff
        ext.w   d4                      ; 0->0, $ff->$ffff
        eor.w   d4,d0                   ; convert polarity if y-acc is negative

        swap    d7                      ; fetch x-acceleration
        tst.w   d7                      ; test if negative
        smi     d4                      ; Yes? Set d4
        ext.w   d4                      ; create polarity converter
        eor.w   d4,d7                   ; convert polarity if x-acc is negative

        cmp.w   d0,d7                   ; compare x-acceleration and y-acceleration
        shi     d6                      ; set d6=0   if object moves up/down    (y-acceleration > x-acceleration)
                                        ; set d6=$ff if object moves left/right (y-acceleration < x-acceleration)
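
To see where the eor-only conversion changes the outcome of the compare, here is a small C sketch of the routine. The names flip_if_negative and kraken_d6 are made up, and y_view stands for the y value after the sub.w conversion.

```c
#include <stdint.h>

/* Hypothetical model of smi/ext.w/eor.w: one's complement for negative words. */
uint16_t flip_if_negative(uint16_t v)
{
    return (v & 0x8000u) ? (uint16_t)~v : v;
}

/* Hypothetical model of the final cmp.w d0,d7 / shi d6; returns $ff or $00. */
uint8_t kraken_d6(uint16_t x, uint16_t y_view)
{
    uint16_t d0 = flip_if_negative(y_view);
    uint16_t d7 = flip_if_negative(x);

    return (d7 > d0) ? 0xFFu : 0x00u;   /* shi is an unsigned "higher" test */
}
```

With x=-6 and y_view=5 the converted values are 5 and 5, so d6 comes out as 0 although the x magnitude is larger. The error only shows when the two magnitudes differ by one and exactly one of the two values is negative.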


robinsonb5 31 May 2021 10:57

Quote:

Originally Posted by buzzybee (Post 1487812)
See, the purpose of the code is to compare x-acceleration and y-acceleration of an object, and init animation frames which show y-axis-orientation or x-axis-orientation accordingly.

If you don't actually need the absolute values once the comparison is done (and assuming the acceleration values are always less than half the range of a word), you could both add and subtract the values; if the absolute value of the second operand is larger than the first, then at least one of the two calculations will cross zero and set the carry flag. (untested, so beware of typos - but the logic should be sound.)

Edit: corrected typos - changed "scc" to "scs"!
Code:

    move.l objectListAcc(a2),d7    ; get x- and y-acceleration
    move.w d7,d0                ; fetch y-acceleration in world
    move.w viewPosition+vPyAccConvertWorldToView(pc),d6
    sub.w d6,d0 ; convert to y-acceleration in view
    swap    d7
    move.w    d7,d6
    add.w    d0,d6
    scs    d6
    sub.w    d0,d7
    scs    d7
    or.w    d7,d6 ; D6 is set if either the add or sub generated a carry.


buzzybee 31 May 2021 11:15

Quote:

Originally Posted by robinsonb5 (Post 1487819)
If you don't actually need the absolute values once the comparison is done (and assuming the acceleration values are always less than half the range of a word), you could both add and subtract the values; if the absolute value of the second operand is larger than the first, then at least one of the two calculations will cross zero and set the carry flag. (untested, so beware of typos - but the logic should be sound.)

Code:

    move.l objectListAcc(a2),d7    ; get x- and y-acceleration
    move.w d7,d0                ; fetch y-acceleration in world
    move.w viewPosition+vPyAccConvertWorldToView(pc),d6
    sub.w d6,d0 ; convert to y-acceleration in view
    swap    d7
    move.w    d7,d6
    add.w    d0,d6
    scc    d6
    sub.w    d0,d7
    scc    d7
    or.w    d7,d6 ; D6 is set if either the add or sub generated a carry.


Will test that later. Thanks a lot!

ross 31 May 2021 11:26

Quote:

Originally Posted by robinsonb5 (Post 1487819)
If you don't actually need the absolute values once the comparison is done (and assuming the acceleration values are always less than half the range of a word), you could both add and subtract the values; if the absolute value of the second operand is larger than the first, then at least one of the two calculations will cross zero and set the carry flag. (untested, so beware of typos - but the logic should be sound.)

I have serious doubts this will work :)
Just try it for positive y values greater than positive x values and you will have problems ..

---

For a bare 68k I would have no doubts and would simply do:
Code:

    move.w  objectListAcc+2(a2),d7
    sub.w  viewPosition+vPyAccConvertWorldToView(pc),d7
    bpl.b  .1
    neg.w  d7
.1  move.w  objectListAcc(a2),d4
    bpl.b  .2
    neg.w  d4
.2  cmp.w  d7,d4
    shi d6

For 020+ you could try:
Code:

    move.l  objectListAcc(a2),d7
    move.l  d7,d4
    swap    d4
    sub.w  viewPosition+vPyAccConvertWorldToView(pc),d7
    bpl.b  .1
    neg.w  d7
.1  tst.w  d4
    bpl.b  .2
    neg.w  d4
.2  cmp.w  d7,d4
    shi d6
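
Both branch versions compute the same thing, which can be modelled in C as follows. This is only a host-side sketch; the names neg_if_minus and x_dominates are invented, and conv stands for the vPyAccConvertWorldToView word.

```c
#include <stdint.h>

/* Hypothetical model of "bpl.b skip / neg.w": negate the word if its sign bit is set. */
uint16_t neg_if_minus(uint16_t v)
{
    return (v & 0x8000u) ? (uint16_t)(0u - v) : v;
}

/* Hypothetical model of the whole routine; returns $ff or $00 like shi d6. */
uint8_t x_dominates(uint16_t x, uint16_t y_world, uint16_t conv)
{
    uint16_t d7 = neg_if_minus((uint16_t)(y_world - conv)); /* sub.w conv,d7 / bpl / neg.w */
    uint16_t d4 = neg_if_minus(x);                          /* bpl / neg.w on x */

    return (d4 > d7) ? 0xFFu : 0x00u;                       /* cmp.w d7,d4 / shi d6 */
}
```

Since shi is an unsigned test, even $8000 (which neg.w leaves unchanged) is treated as 32768 and compares correctly against $7FFF.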


buzzybee 31 May 2021 12:03

Quote:

Originally Posted by ross (Post 1487825)
I have serious doubts this will work :)
Just try it for positive y values greater than positive x values and you will have problems ..

---

For a bare 68k I would have no doubts and would simply do:
Code:

    move.w  objectListAcc+2(a2),d7
    sub.w  viewPosition+vPyAccConvertWorldToView(pc),d7
    bpl.b  .1
    neg.w  d7
.1  move.w  objectListAcc(a2),d4
    bpl.b  .2
    neg.w  d4
.2  cmp.w  d7,d4
    shi d6

For 020+ you could try:
Code:

    move.l  objectListAcc(a2),d7
    move.l  d7,d4
    swap    d4
    sub.w  viewPosition+vPyAccConvertWorldToView(pc),d7
    bpl.b  .1
    neg.w  d7
.1  tst.w  d4
    bpl.b  .2
    neg.w  d4
.2  cmp.w  d7,d4
    shi d6


Game will run on 68020+, as AGA is the target platform. But code will not run in a cached loop as it is too big. Therefore: Could this really be faster, with all these branches? A solution with no branching certainly looks more elegant to me ...

robinsonb5 31 May 2021 12:05

Quote:

Originally Posted by ross (Post 1487825)
I have serious doubts this will work :)
Just try it for positive y values greater than positive x values and you will have problems ..


Gah - I said beware of typos - my "scc"s should be "scs"!


With that correction made, would it work, or is there still something I'm missing?

meynaf 31 May 2021 12:11

Quote:

Originally Posted by buzzybee (Post 1487828)
Game will run on 68020+, as AGA is the target platform. But code will not run in a cached loop as it is too big. Therefore: Could this really be faster, with all these branches? A solution with no branching certainly looks more elegant to me ...

If the code does not fit into the cache, it had better be short. The branch solution looks like it is the shortest.

ross 31 May 2021 12:25

Quote:

Originally Posted by buzzybee (Post 1487828)
Game will run on 68020+, as AGA is the target platform. But code will not run in a cached loop as it is too big. Therefore: Could this really be faster, with all these branches? A solution with no branching certainly looks more elegant to me ...

You have to choose whether to be elegant or fast (and my solution may not necessarily be, you have to try it on the real deal).
I also prefer to avoid branches (there are many examples of how I feel about it on the forum), but in some cases they are advantageous, especially on slower machines.


Quote:

Originally Posted by robinsonb5 (Post 1487829)
Gah - I said beware of typos - my "scc"s should be "scs"!


With that correction made, would it work, or is there still something I'm missing?

Nah :), you simply turn the tables, but the or brings you back to the wrong result.



Quote:

Originally Posted by meynaf (Post 1487831)
If the code does not fit into the cache, it had better be short. The branch solution looks like it is the shortest.

:great

robinsonb5 31 May 2021 12:44

Quote:

Originally Posted by ross (Post 1487833)
Nah :), you simply turn the tables, but the or brings you back to the wrong result.


Yeah, I see it now - I think I'm still confused about exactly how the carry flag works on 68k!

robinsonb5 31 May 2021 13:10

OK this one works, I think:

Code:

    move.l objectListAcc(a2),d7    ; get x- and y-acceleration
    move.w d7,d0                ; fetch y-acceleration in world
    move.w viewPosition+vPyAccConvertWorldToView(pc),d6
    sub.w d6,d0 ; convert to y-acceleration in view
    swap    d7
    move.w    d7,d6
    add.w    d0,d6
    smi    d6
    sub.w    d0,d7
    smi    d7
    eor.b    d7,d6 ; D6 is set if either, but not both, of the add and sub generated a negative result.


ross 31 May 2021 14:31

Quote:

Originally Posted by robinsonb5 (Post 1487844)
OK this one works, I think:

No, it does not ;).

EDIT:
The problem with your algorithm is that both the addition and the subtraction operate on signed values, while what is asked for is a comparison of the unsigned magnitudes.
That's why my algorithm first negates each value, if needed, and only then makes the comparison.

robinsonb5 31 May 2021 15:07

Quote:

Originally Posted by ross (Post 1487854)
No, it does not ;).

LOL - OK, what am I missing?

This testbench tests all eight permutations of +/- Lo/Hi for each operand, and gives the expected result? (0xaa in D1 at the end, or 0x55 if you reverse HI and LO)


The inputs do have to be within the range +/- 16383, however.


Code:

    ORG    $1000
   
HI equ 5
LO equ 4
 
START:                  ; first instruction of program
           
    moveq    #0,d1

    move.w  #-LO,d7
    swap    d7
    move.w #-HI,d7
    bsr    abscmp    ; set
    move.b    d6,d1
    lsl.w    #1,d1

    move.w #-HI,d7
    swap    d7
    move.w #-LO,d7
    bsr    abscmp    ; clr
    move.b    d6,d1
    lsl.w    #1,d1

    move.w #LO,d7
    swap    d7
    move.w #-HI,d7
    bsr    abscmp ; set
    move.b    d6,d1
    lsl.w    #1,d1

    move.w  #HI,d7
    swap    d7
    move.w  #-LO,d7
    bsr    abscmp ; clr
    move.b    d6,d1
    lsl.w    #1,d1

    move.w  #-LO,d7 ; lt
    swap    d7
    move.w #HI,d7
    bsr    abscmp ; set
    move.b    d6,d1
    lsl.w    #1,d1

    move.w #-HI,d7 ; gt
    swap    d7
    move.w #LO,d7
    bsr    abscmp ; clr
    move.b    d6,d1
    lsl.w    #1,d1

    move.w #LO,d7 ; lt
    swap    d7
    move.w #HI,d7
    bsr    abscmp ; set
    move.b    d6,d1
    lsl.w    #1,d1
               
    move.w #HI,d7 ; gt
    swap    d7
    move.w #LO,d7
    bsr    abscmp ; clr
    move.b    d6,d1

    lsr.w    #7,d1
             
    SIMHALT

abscmp:
    move.w d7,d0
    swap    d7
    move.w    d7,d6
    add.w    d0,d6
    smi    d6
    sub.w    d0,d7
    smi    d7
    eor.b    d7,d6
    rts

    END START

(Using EASy68k to run / sim the code)
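
The same check can be run over many more value pairs on a host machine. The C sketch below models the abscmp routine from the testbench and compares it with a plain magnitude test; count_mismatches and its parameters are made up for illustration, and ties (equal magnitudes) are skipped because the testbench does not cover them.

```c
#include <stdint.h>
#include <stdlib.h>

/* Model of abscmp: a is the word loaded first (upper), b the word loaded second (lower). */
uint8_t abscmp(int16_t a, int16_t b)
{
    uint16_t sum  = (uint16_t)((uint16_t)a + (uint16_t)b);  /* add.w d0,d6 */
    uint16_t diff = (uint16_t)((uint16_t)a - (uint16_t)b);  /* sub.w d0,d7 */
    uint8_t  d6   = (sum  & 0x8000u) ? 0xFFu : 0x00u;       /* smi d6 */
    uint8_t  d7   = (diff & 0x8000u) ? 0xFFu : 0x00u;       /* smi d7 */

    return (uint8_t)(d7 ^ d6);                              /* eor.b d7,d6 */
}

/* Hypothetical helper: counts pairs on a sampled grid where abscmp
   disagrees with "|a| < |b|". */
int count_mismatches(int limit, int step)
{
    int bad = 0;

    for (int a = -limit; a <= limit; a += step) {
        for (int b = -limit; b <= limit; b += step) {
            uint8_t want;

            if (abs(a) == abs(b))
                continue;                       /* ties are left out of the comparison */
            want = (abs(a) < abs(b)) ? 0xFFu : 0x00u;
            if (abscmp((int16_t)a, (int16_t)b) != want)
                bad++;
        }
    }
    return bad;
}
```

The sign of a+b differs from the sign of a-b exactly when a*a - b*b is negative, which is why the result is $ff for |a| < |b| as long as neither operation overflows a word. Outside the stated range it breaks: abscmp(30000, 10000) returns $ff because the sum wraps negative. On ties the result is $ff only when a is negative.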

ross 31 May 2021 15:24

I grab what Buzzybee wrote and give an example.

0.w = x-acceleration (<0 = move left, >0 = move right)
2.w= y-acceleration (< 0 = move up, > 0 = move down)

objectListAcc dc.w a,b ;x-acceleration,y-acceleration
Where b>a, generic low values, positive.

set d6=0 if object moves up/down - y-acceleration>x-acceleration
set d6=$ff if object moves left/right y-acceleration<x-acceleration

I'm expecting in this case d6=0 because y-acceleration>x-acceleration (b>a)

Now step by step in your algorithm.

move.l objectListAcc(a2),d7 ; get x- and y-acceleration
d7.h=a; d7.w=b

move.w d7,d0 ; fetch y-acceleration in world
d0.w=b

move.w viewPosition+vPyAccConvertWorldToView(pc),d6
sub.w d6,d0 ; convert to y-acceleration in view
we do not care of this

swap d7
d7.w=a

move.w d7,d6
d6.w=a

add.w d0,d6
d6=a+b=positive value

smi d6
d6.b=0

sub.w d0,d7
d7=a-b=negative value (as b>a)

smi d7
d7.b=$FF

eor.b d7,d6
d6=$00^$ff=$ff <- wrong!

I don't know how else to describe it to you :D

robinsonb5 31 May 2021 16:26

Quote:

Originally Posted by ross (Post 1487871)
I'm expecting in this case d6=0 because y-acceleration>x-acceleration (b>a)

...

I don't know how else to describe it to you :D

Oh I see - the sense of the output is inverted. I thought you were talking about something more fundamental that I was overlooking (and of course there still might be, but I do believe the algorithm itself works for input values in the range +/-16384.)

OK, in that case, switch the operands:
Code:

    move.l objectListAcc(a2),d7    ; get x- and y-acceleration
    move.w viewPosition+vPyAccConvertWorldToView(pc),d6
    sub.w d6,d7    ; convert to y-acceleration in view
    move.w d7,d0
    move.w    d7,d6
    swap    d7
    add.w    d7,d6
    smi    d6
    sub.w    d7,d0
    smi    d7
    eor.b    d7,d6

Alternatively, if buzzybee can tolerate the output being inverted, then the first subtraction can be reversed, saving a move, since its result being negated doesn't matter.

Code:

    move.l objectListAcc(a2),d7    ; get x- and y-acceleration
    move.w viewPosition+vPyAccConvertWorldToView(pc),d0
    sub.w d7,d0 ; negated y-acceleration in view (conversion value minus y)
    swap    d7
    move.w    d7,d6
    add.w    d0,d6
    smi    d6
    sub.w    d0,d7
    smi    d7
    eor.b    d7,d6
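
To compare the polarity of the two listings above, here is a host-side C sketch of both. The function names are invented for illustration, conv stands for the vPyAccConvertWorldToView word, and the inputs are assumed to stay within the range mentioned in the post.

```c
#include <stdint.h>

/* Hypothetical helper modelling smi: $ff if the word is negative, else $00. */
uint8_t sign_ff(uint16_t v)
{
    return (v & 0x8000u) ? 0xFFu : 0x00u;
}

/* First listing (operands switched): $ff when |x| > |y in view|. */
uint8_t variant_switched(uint16_t x, uint16_t y, uint16_t conv)
{
    uint16_t yv = (uint16_t)(y - conv);            /* sub.w d6,d7 */
    uint8_t  d6 = sign_ff((uint16_t)(yv + x));     /* add.w d7,d6 / smi d6 */
    uint8_t  d7 = sign_ff((uint16_t)(yv - x));     /* sub.w d7,d0 / smi d7 */

    return (uint8_t)(d7 ^ d6);                     /* eor.b d7,d6 */
}

/* Second listing (one move saved): output inverted, $ff when |x| < |y in view|. */
uint8_t variant_inverted(uint16_t x, uint16_t y, uint16_t conv)
{
    uint16_t d0 = (uint16_t)(conv - y);            /* sub.w d7,d0: minus y in view */
    uint8_t  d6 = sign_ff((uint16_t)(x + d0));     /* add.w d0,d6 / smi d6 */
    uint8_t  d7 = sign_ff((uint16_t)(x - d0));     /* sub.w d0,d7 / smi d7 */

    return (uint8_t)(d7 ^ d6);                     /* eor.b d7,d6 */
}
```

For x=4, y=5, conv=0 the first variant returns 0 and the second returns $ff, which matches the "d6=0 if y-acceleration > x-acceleration" convention only for the first one.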


