English Amiga Board


Old 31 May 2021, 01:02   #1
buzzybee
Registered User
 
Join Date: Oct 2015
Location: Landsberg / Germany
Posts: 526
Fast way of getting absolute value

Hi guys!

Wonder what you guys think of my code. Its purpose is to convert the value in register d7 to its absolute value:

Code:
	swap d7		; fetch x-acceleration
	tst.w d7	; test if negative
	smi d4		; Yes? Set d4
	ext.w d4	; create polarity converter
	eor d4,d7	; convert polarity (if x-acc is negative)
d7 is loaded with one longword-operation prior to this code, and contains x- (upper word) and y-acceleration (lower word) of a given sprite object. Code works. But this code is repeatedly executed a few times during one frame [in Reshoot Proxima 3], and I have a feeling that this can be optimized for speed. Any thoughts?
Old 31 May 2021, 01:04   #2
Antiriad_UK
OCS forever!
 
Join Date: Mar 2019
Location: Birmingham, UK
Posts: 418
I've got a macro I stole from Kalms on another forum:

Code:
;(Kalms explained the method itself in the other thread.) Now, to use that you'd simply 
;enter the macro like any other instruction, specifying a source register, a destination
; register and an available scratch register like so:
; ABS_W d1,d2,d6 ; d2.w <- abs(d1.w-d2.w), trashes d6.w
ABS_W	MACRO
	sub.w \1,\2
	subx.w \3,\3
	eor.w \3,\2
	sub.w \3,\2
	ENDM
Not worked out if it's faster but worth a look

Edit: Ah wait, it's the abs of two values, not sure this is applicable, but left here for others
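
For anyone trying the macro, a small usage sketch may help. As far as I can tell it relies on the borrow from the first sub.w, so it yields the distance between two unsigned words. The register choices and constants below are only illustrative.

```asm
; ABS_W src,dst,scratch : dst.w <- |dst.w - src.w| (unsigned), trashes scratch.w
ABS_W	MACRO
	sub.w	\1,\2		; dst = dst - src, X = borrow
	subx.w	\3,\3		; scratch = -X: $ffff on borrow, else 0
	eor.w	\3,\2		; invert dst on borrow
	sub.w	\3,\2		; add 1 on borrow, completing the negate
	ENDM

	move.w	#3,d1
	move.w	#10,d2
	ABS_W	d1,d2,d6	; 10-3: no borrow, d6.w=0, d2.w=7

	move.w	#10,d1
	move.w	#3,d2
	ABS_W	d1,d2,d6	; 3-10=$fff9 with borrow, d6.w=$ffff, d2.w=7
```

With a source of zero no borrow can occur, so a negative word passes through unchanged; the macro is not a drop-in abs() for a single signed value.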
Old 31 May 2021, 02:31   #3
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,959
Quote:
Originally Posted by buzzybee View Post
Hi guys!

Wonder what you guys think of my code. Its purpose is to convert the value in register d7 to its absolute value:

Code:
	swap d7		; fetch x-acceleration
	tst.w d7	; test if negative
	smi d4		; Yes? Set d4
	ext.w d4	; create polarity converter
	eor d4,d7	; convert polarity (if x-acc is negative)
d7 is loaded with one longword-operation prior to this code, and contains x- (upper word) and y-acceleration (lower word) of a given sprite object. Code works. But this code is repeatedly executed a few times during one frame [in Reshoot Proxima 3], and I have a feeling that this can be optimized for speed. Any thoughts?
I'm no eor expert, but are you sure that it works correctly? As I read it, for D7=$FFFF you get D7=0, not D7=1.
Old 31 May 2021, 03:17   #4
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,959
I would use the following code, but it is not the fastest.

Code:
 swap D7
 move.w D7,D4
 add.w D4,D4
 subx.w D4,D4
 eor.w D4,D7
 sub.w D4,D7
Old 31 May 2021, 08:15   #5
buzzybee
Registered User
 
Join Date: Oct 2015
Location: Landsberg / Germany
Posts: 526
Quote:
Originally Posted by Antiriad_UK View Post
I've got a macro I stole from Kalms on another forum:

Code:
;(Kalms explained the method itself in the other thread.) Now, to use that you'd simply 
;enter the macro like any other instruction, specifying a source register, a destination
; register and an available scratch register like so:
; ABS_W d1,d2,d6 ; d2.w <- abs(d1.w-d2.w), trashes d6.w
ABS_W	MACRO
	sub.w \1,\2
	subx.w \3,\3
	eor.w \3,\2
	sub.w \3,\2
	ENDM
Not worked out if it's faster but worth a look

Edit: Ah wait, it's the abs of two values, not sure this is applicable, but left here for others
Will have a closer look at this, as my code actually uses two values - same operation is done with y-acc too, and then both results compared to control animation frames. Thanks a lot!

Quote:
Originally Posted by Don_Adan View Post
I'm no eor expert, but are you sure that it works correctly? As I read it, for D7=$FFFF you get D7=0, not D7=1.
You are certainly more of an eor expert than I am :-) Yes, the function is not bulletproof as is. I can live with that, as speed is more important than accuracy in this case. But it'd be nice to make it bulletproof and faster :-)
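
The off-by-one comes from eor on its own producing the one's complement (-x-1) rather than -x. A minimal sketch of the smallest patch, assuming d4 is still free as scratch: subtract the mask afterwards, which adds 1 exactly when the value was negative.

```asm
	swap	d7		; fetch x-acceleration
	tst.w	d7		; negative?
	smi	d4		; d4.b = $ff if negative, else 0
	ext.w	d4		; d4.w = $ffff or 0
	eor.w	d4,d7		; negative: d7 = -x-1 (e.g. $ffff -> 0)
	sub.w	d4,d7		; negative: d7 - (-1) = -x (e.g. 0 -> 1)
```

The only value this cannot fix is $8000, which has no positive counterpart in a word and comes out as $8000 again.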
Old 31 May 2021, 09:17   #6
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,039
When you are dealing with short code sequences, a lot depends on how they interface with the rest of the code (extreme example: a super fast pixel draw that you call with movem/jsr/movem for each pixel), and this opens up several questions...
Would it be faster to load each word individually to get rid of swap+tst? Is source addressing mode simple enough?
Do you need the result as individual words or as a longword (simple partial parallelization: tst.l if ccr not set by move, smi, extb.l, tst.w, smi, ext.w, eor.l, so 1 less eor)?
Assuming this is for 020+, so branching should be avoided...
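
Spelled out, the partially parallel idea could look like the sketch below (68020+ because of extb.l). It assumes both words can be converted in place in d7 and, like the original, it only flips the bits, so negative inputs still come out one too small.

```asm
	move.l	objectListAcc(a2),d7	; N = sign of x (bit 31), so no tst.l needed
	smi	d4			; d4.b = $ff if x is negative
	extb.l	d4			; d4 = $ffffffff or 0
	tst.w	d7			; sign of y
	smi	d4			; rewrites the low byte only
	ext.w	d4			; low word = y mask, high word keeps x mask
	eor.l	d4,d7			; flip both words in one go
```

The saving is a single eor, so whether this beats two independent word sequences probably depends on how the results are consumed afterwards.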
Old 31 May 2021, 09:41   #7
buzzybee
Registered User
 
Join Date: Oct 2015
Location: Landsberg / Germany
Posts: 526
Quote:
Originally Posted by a/b View Post
When you are dealing with short code sequences, a lot depends on how they interface with the rest of the code (extreme example: a super fast pixel draw that you call with movem/jsr/movem for each pixel), and this opens up several questions...
Would it be faster to load each word individually to get rid of swap+tst? Is source addressing mode simple enough?
Do you need the result as individual words or as a longword (simple partial parallelization: tst.l if ccr not set by move, smi, extb.l, tst.w, smi, ext.w, eor.l, so 1 less eor)?
Assuming this is for 020+, so branching should be avoided...
See, the purpose of the code is to compare x-acceleration and y-acceleration of an object, and init animation frames which show y-axis-orientation or x-axis-orientation accordingly. Acceleration is stored like this:

0.w = x-acceleration (<0 = move left, >0 = move right)
2.w= y-acceleration (< 0 = move up, > 0 = move down)

So one longword-read can fetch both values. I tried to avoid absolute value conversion by simply comparing x-acc and y-acc, but cannot get it to work faultlessly. This is the complete code, with absolute conversion:

Code:
krakenSmall
	move.l objectListAcc(a2),d7	; get x- and y-acceleration
	move.w d7,d0			; fetch y-acceleration in world
	move.w viewPosition+vPyAccConvertWorldToView(pc),d6
	sub.w d6,d0	; convert to y-acceleration in view
	smi d4		; y-acc is negative (object goes up) -> set to $ff
	ext.w d4	; 0->0, $ff->$ffff
	eor.w d4,d0	; convert polarity if y-acc is negative

	swap d7		; fetch x-acceleration
	tst.w d7	; test if negative
	smi d4		; Yes? Set d4
	ext.w d4	; create polarity converter
	eor.w d4,d7	; convert polarity if x-acc is negative

	cmp.w d0,d7	; compare x-acceleration and y-acceleration
	shi d6		; set d6=0   if object moves up/down    - y-acceleration>x-acceleration
			; set d6=$ff if object moves left/right - y-acceleration<x-acceleration
Old 31 May 2021, 10:57   #8
robinsonb5
Registered User
 
Join Date: Mar 2012
Location: Norfolk, UK
Posts: 1,153
Quote:
Originally Posted by buzzybee View Post
See, the purpose of the code is to compare x-acceleration and y-acceleration of an object, and init animation frames which show y-axis-orientation or x-axis-orientation accordingly.
If you don't actually need the absolute values once the comparison is done (and assuming the acceleration values are always less than half the range of a word), you could both add and subtract the values; if the absolute value of the second operand is larger than the first, then at least one of the two calculations will cross zero and set the carry flag. (untested, so beware of typos - but the logic should be sound.)

Edit: corrected typos - changed "scc" to "scs"!
Code:
    move.l objectListAcc(a2),d7    ; get x- and y-acceleration
    move.w d7,d0                ; fetch y-acceleration in world 
    move.w viewPosition+vPyAccConvertWorldToView(pc),d6
    sub.w d6,d0 ; convert to y-acceleration in view
    swap    d7
    move.w    d7,d6
    add.w    d0,d6
    scs    d6
    sub.w    d0,d7
    scs    d7
    or.w    d7,d6 ; D6 is set if either the add or sub generated a carry.

Last edited by robinsonb5; 31 May 2021 at 12:06.
Old 31 May 2021, 11:15   #9
buzzybee
Registered User
 
Join Date: Oct 2015
Location: Landsberg / Germany
Posts: 526
Quote:
Originally Posted by robinsonb5 View Post
If you don't actually need the absolute values once the comparison is done (and assuming the acceleration values are always less than half the range of a word), you could both add and subtract the values; if the absolute value of the second operand is larger than the first, then at least one of the two calculations will cross zero and set the carry flag. (untested, so beware of typos - but the logic should be sound.)

Code:
    move.l objectListAcc(a2),d7    ; get x- and y-acceleration
    move.w d7,d0                ; fetch y-acceleration in world 
    move.w viewPosition+vPyAccConvertWorldToView(pc),d6
    sub.w d6,d0 ; convert to y-acceleration in view
    swap    d7
    move.w    d7,d6
    add.w    d0,d6
    scc    d6
    sub.w    d0,d7
    scc    d7
    or.w    d7,d6 ; D6 is set if either the add or sub generated a carry.
Will test that later. Thanks a lot!
Old 31 May 2021, 11:26   #10
ross
Defendit numerus
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by robinsonb5 View Post
If you don't actually need the absolute values once the comparison is done (and assuming the acceleration values are always less than half the range of a word), you could both add and subtract the values; if the absolute value of the second operand is larger than the first, then at least one of the two calculations will cross zero and set the carry flag. (untested, so beware of typos - but the logic should be sound.)
I have serious doubts this will work
Just try it for positive y values greater than positive x values and you will have problems ..

---

For a bare 68k I would have no doubts and would simply do:
Code:
    move.w  objectListAcc+2(a2),d7
    sub.w   viewPosition+vPyAccConvertWorldToView(pc),d7
    bpl.b   .1
    neg.w   d7
.1  move.w  objectListAcc(a2),d4
    bpl.b   .2
    neg.w   d4
.2  cmp.w   d7,d4
    shi d6
For 020+ you could try:
Code:
    move.l  objectListAcc(a2),d7
    move.l  d7,d4
    swap    d4
    sub.w   viewPosition+vPyAccConvertWorldToView(pc),d7
    bpl.b   .1
    neg.w   d7
.1  tst.w   d4
    bpl.b   .2
    neg.w   d4
.2  cmp.w   d7,d4
    shi d6
Old 31 May 2021, 12:03   #11
buzzybee
Registered User
 
Join Date: Oct 2015
Location: Landsberg / Germany
Posts: 526
Quote:
Originally Posted by ross View Post
I have serious doubts this will work
Just try it for positive y values greater than positive x values and you will have problems ..

---

For a bare 68k I would have no doubts and would simply do:
Code:
    move.w  objectListAcc+2(a2),d7
    sub.w   viewPosition+vPyAccConvertWorldToView(pc),d7
    bpl.b   .1
    neg.w   d7
.1  move.w  objectListAcc(a2),d4
    bpl.b   .2
    neg.w   d4
.2  cmp.w   d7,d4
    shi d6
For 020+ you could try:
Code:
    move.l  objectListAcc(a2),d7
    move.l  d7,d4
    swap    d4
    sub.w   viewPosition+vPyAccConvertWorldToView(pc),d7
    bpl.b   .1
    neg.w   d7
.1  tst.w   d4
    bpl.b   .2
    neg.w   d4
.2  cmp.w   d7,d4
    shi d6
Game will run on 68020+, as AGA is the target platform. But code will not run in a cached loop as it is too big. Therefore: Could this really be faster, with all these branches? A solution with no branching certainly looks more elegant to me ...
Old 31 May 2021, 12:05   #12
robinsonb5
Registered User
 
Join Date: Mar 2012
Location: Norfolk, UK
Posts: 1,153
Quote:
Originally Posted by ross View Post
I have serious doubts this will work
Just try it for positive y values greater than positive x values and you will have problems ..

Gah - I said beware of typos - my "scc"s should be "scs"!


With that correction made, would it work, or is there still something I'm missing?
Old 31 May 2021, 12:11   #13
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by buzzybee View Post
Game will run on 68020+, as AGA is the target platform. But code will not run in a cached loop as it is too big. Therefore: Could this really be faster, with all these branches? A solution with no branching certainly looks more elegant to me ...
If the code does not fit into the cache, it had better be short. The branch solution looks like it is the shortest.
Old 31 May 2021, 12:25   #14
ross
Defendit numerus
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by buzzybee View Post
Game will run on 68020+, as AGA is the target platform. But code will not run in a cached loop as it is too big. Therefore: Could this really be faster, with all these branches? A solution with no branching certainly looks more elegant to me ...
You have to choose whether to be elegant or fast (and my solution is not necessarily the faster one; you have to try it on the real thing).
I also prefer to avoid branches (there are many examples of how I feel about it on the forum), but in some cases they are advantageous, especially on slower machines.


Quote:
Originally Posted by robinsonb5 View Post
Gah - I said beware of typos - my "scc"s should be "scs"!


With that correction made, would it work, or is there still something I'm missing?
Nah, you simply turn the tables, but the or brings you back to the wrong result.



Quote:
Originally Posted by meynaf View Post
If the code does not fit into the cache, it had better be short. The branch solution looks like it is the shortest.
Old 31 May 2021, 12:44   #15
robinsonb5
Registered User
 
Join Date: Mar 2012
Location: Norfolk, UK
Posts: 1,153
Quote:
Originally Posted by ross View Post
Nah, you simply turn the tables, but the or brings you back to the wrong result.

Yeah, I see it now - I think I'm still confused about exactly how the carry flag works on 68k!
Old 31 May 2021, 13:10   #16
robinsonb5
Registered User
 
Join Date: Mar 2012
Location: Norfolk, UK
Posts: 1,153
OK this one works, I think:

Code:
    move.l objectListAcc(a2),d7    ; get x- and y-acceleration
    move.w d7,d0                ; fetch y-acceleration in world 
    move.w viewPosition+vPyAccConvertWorldToView(pc),d6
    sub.w d6,d0 ; convert to y-acceleration in view
    swap    d7
    move.w    d7,d6
    add.w    d0,d6
    smi    d6
    sub.w    d0,d7
    smi    d7
    eor.b    d7,d6 ; D6 is set if either but not both the add or sub generated a negative result.
Old 31 May 2021, 14:31   #17
ross
Defendit numerus
 
ross's Avatar
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by robinsonb5 View Post
OK this one works, I think:
No, it does not.

EDIT:
The problem with your algorithm is that both the addition and the subtraction operate on the signed values, whereas the comparison has to be made between their magnitudes (unsigned values).
That's why my algorithm first negates each value, if needed, and only then compares.

Last edited by ross; 31 May 2021 at 14:58.
Old 31 May 2021, 15:07   #18
robinsonb5
Registered User
 
Join Date: Mar 2012
Location: Norfolk, UK
Posts: 1,153
Quote:
Originally Posted by ross View Post
No, it does not.
LOL - OK, what am I missing?

This testbench tests all eight permutations of +/- LO/HI for the two operands and gives the expected result (0xaa in D1 at the end, or 0x55 if you swap HI and LO).


The inputs do have to be within the range +/- 16383, however.


Code:
    ORG    $1000
    
HI equ 5
LO equ 4 
   
START:                  ; first instruction of program
            
    moveq    #0,d1

    move.w  #-LO,d7
    swap    d7
    move.w #-HI,d7
    bsr    abscmp    ; set
    move.b    d6,d1
    lsl.w    #1,d1

    move.w #-HI,d7
    swap    d7
    move.w #-LO,d7
    bsr    abscmp    ; clr
    move.b    d6,d1
    lsl.w    #1,d1

    move.w #LO,d7
    swap    d7
    move.w #-HI,d7
    bsr    abscmp ; set
    move.b    d6,d1
    lsl.w    #1,d1

    move.w  #HI,d7
    swap    d7
    move.w  #-LO,d7
    bsr    abscmp ; clr
    move.b    d6,d1
    lsl.w    #1,d1

    move.w  #-LO,d7 ; lt
    swap    d7
    move.w #HI,d7
    bsr    abscmp ; set
    move.b    d6,d1
    lsl.w    #1,d1

    move.w #-HI,d7 ; gt
    swap    d7
    move.w #LO,d7
    bsr    abscmp ; clr
    move.b    d6,d1
    lsl.w    #1,d1

    move.w #LO,d7 ; lt
    swap    d7
    move.w #HI,d7
    bsr    abscmp ; set
    move.b    d6,d1
    lsl.w    #1,d1
                
    move.w #HI,d7 ; gt
    swap    d7
    move.w #LO,d7
    bsr    abscmp ; clr
    move.b    d6,d1

    lsr.w    #7,d1
               
    SIMHALT 

abscmp:
    move.w d7,d0
    swap    d7
    move.w    d7,d6
    add.w    d0,d6
    smi    d6
    sub.w    d0,d7
    smi    d7
    eor.b    d7,d6
    rts

    END START
(Using EASy68k to run / sim the code)
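
On the range limit: the restriction exists because add.w and sub.w can overflow, in which case N is the opposite of the true sign. A possible variant, untested, is to use slt (N xor V) in place of smi, which should recover the true sign even on overflow. The label name is made up for illustration.

```asm
abscmp_wide:			; hypothetical variant of abscmp
	move.w	d7,d0		; d0 = y
	swap	d7		; d7.w = x
	move.w	d7,d6
	add.w	d0,d6		; x + y
	slt	d6		; N xor V = true sign of x + y
	sub.w	d0,d7		; x - y
	slt	d7		; true sign of x - y
	eor.b	d7,d6		; $ff if the signs differ, i.e. |y| > |x| (ties aside)
	rts
```

For example, x = y = $4000 makes add.w produce $8000 with N and V both set, so slt leaves d6 clear where smi would set it. Ties (|x| = |y|) are not handled specially in either version.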
Old 31 May 2021, 15:24   #19
ross
Defendit numerus
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
I grab what Buzzybee wrote and give an example.

0.w = x-acceleration (<0 = move left, >0 = move right)
2.w= y-acceleration (< 0 = move up, > 0 = move down)

objectListAcc dc.w a,b ;x-acceleration,y-acceleration
Where b>a, generic low values, positive.

set d6=0 if object moves up/down - y-acceleration>x-acceleration
set d6=$ff if object moves left/right y-acceleration<x-acceleration

I'm expecting in this case d6=0 because y-acceleration>x-acceleration (b>a)

Now step by step in your algorithm.

move.l objectListAcc(a2),d7 ; get x- and y-acceleration
d7.h=a; d7.w=b

move.w d7,d0 ; fetch y-acceleration in world
d0.w=b

move.w viewPosition+vPyAccConvertWorldToView(pc),d6
sub.w d6,d0 ; convert to y-acceleration in view
we do not care of this

swap d7
d7.w=a

move.w d7,d6
d6.w=a

add.w d0,d6
d6=a+b=positive value

smi d6
d6.b=0

sub.w d0,d7
d7=a-b=negative value (as b>a)

smi d7
d7.b=$FF

eor.b d7,d6
d6=$00^$ff=$ff <- wrong!

I don't know how else to describe it to you
Old 31 May 2021, 16:26   #20
robinsonb5
Registered User
 
Join Date: Mar 2012
Location: Norfolk, UK
Posts: 1,153
Quote:
Originally Posted by ross View Post
I'm expecting in this case d6=0 because y-acceleration>x-acceleration (b>a)

...

I don't know how else to describe it to you
Oh I see - the sense of the output is inverted. I thought you were talking about something more fundamental that I was overlooking (and of course there still might be, but I do believe the algorithm itself works for input values in the range +/-16384.)

OK, in that case, switch the operands:
Code:
    move.l objectListAcc(a2),d7    ; get x- and y-acceleration
    move.w viewPosition+vPyAccConvertWorldToView(pc),d6
    sub.w d6,d7    ; convert to y-acceleration in view
    move.w d7,d0
    move.w    d7,d6
    swap    d7
    add.w    d7,d6
    smi    d6
    sub.w    d7,d0
    smi    d7
    eor.b    d7,d6
Alternatively, if buzzybee can tolerate the output being inverted, then the first subtraction can be reversed, saving a move, since its result being negated doesn't matter.

Code:
    move.l objectListAcc(a2),d7    ; get x- and y-acceleration
    move.w viewPosition+vPyAccConvertWorldToView(pc),d0
    sub.w d7,d0 ; convert to y-acceleration in view
    swap    d7
    move.w    d7,d6
    add.w    d0,d6
    smi    d6
    sub.w    d0,d7
    smi    d7
    eor.b    d7,d6
 

