English Amiga Board


Old 09 June 2022, 00:52   #221
ross
Defendit numerus
 
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by a/b View Post
How about this (020+, as you mentioned)?
This bfins version is nice.
Now someone needs to time it on a real machine against the generic one.

Quote:
Originally Posted by phx View Post
EDIT: Wow... ross and me posted in the same minute again. How likely is that?

Last edited by ross; 09 June 2022 at 00:59. Reason: because :p
Old 09 June 2022, 00:59   #222
phx
Natteravn
 
 
Join Date: Nov 2009
Location: Herford / Germany
Posts: 2,496
Quote:
Originally Posted by a/b View Post
How about this (020+, as you mentioned)?
Interesting. I wonder if something like that is also possible for more than 32 bit shifts?
Maybe jotd wants to replace LSR by ASR, unless he wants an unsigned shift.
Old 09 June 2022, 01:04   #223
ross
Defendit numerus
 
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
EDIT: removed the nocturnal nonsense about temporal coincidences

Quote:
Originally Posted by phx View Post
Interesting. I wonder if something like that is also possible for more than 32 bit shifts?
Yes, but that means putting D1:D0 in memory --> slow...

So this is better:
Code:
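; 64-bit logical shift right: d1:d0 >>= d3 (d1 = high, d0 = low longword)
; d3 = 1..63 assumed (n = 0 would give bfins a width of 0, which encodes 32)
        moveq   #32,d2
        sub.l   d3,d2           ; d2 = 32-n
        bgt.b   .1              ; n < 32: bit field path
        move.l  d1,d0           ; n >= 32: low = high >> (n-32)
        neg.l   d2              ; d2 = n-32
        moveq   #0,d1           ; high = 0
        lsr.l   d2,d0
        rts
.1:     bfins   d1,d0{d2:d3}    ; low n bits of d0 = low n bits of d1
        rol.l   d2,d0           ; rotate left by 32-n = rotate right by n
        lsr.l   d3,d1
        rts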
Provided that bfins is in fact faster than the three separate instructions...
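
For checking the bit logic off-target, here is a minimal C sketch of what the routine above computes. The names `lsr64_model` and `rol32` are made up for this illustration, and n = 0 is excluded on the assumption stated in the comments (a register width of 0 selects a 32-bit field). It models the result only, not the timing.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by r bits (r is taken modulo 32). */
static uint32_t rol32(uint32_t v, unsigned r)
{
    r &= 31u;
    return r ? (v << r) | (v >> (32u - r)) : v;
}

/* Hypothetical model of the routine: hi:lo stand for d1:d0, n for d3.
   Assumes 1 <= n <= 63. */
static void lsr64_model(uint32_t *hi, uint32_t *lo, unsigned n)
{
    assert(n >= 1 && n <= 63);
    if (n >= 32) {                        /* 32-n <= 0: bgt not taken */
        *lo = *hi >> (n - 32);            /* move.l d1,d0 / lsr.l d2,d0 */
        *hi = 0;                          /* moveq #0,d1 */
        return;
    }
    uint32_t mask = (1u << n) - 1u;       /* the field {32-n:n} is the low n bits */
    *lo = (*lo & ~mask) | (*hi & mask);   /* bfins d1,d0{d2:d3} */
    *lo = rol32(*lo, 32u - n);            /* rol.l d2,d0 */
    *hi >>= n;                            /* lsr.l d3,d1 */
}
```

Comparing the result against a native `uint64_t` shift for every n from 1 to 63 is a quick way to confirm the merge-then-rotate idea before timing anything on real hardware.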

Last edited by ross; 09 June 2022 at 08:56.
Old 09 June 2022, 09:02   #224
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
You don't necessarily need to do 32-n.
What about :
Code:
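 moveq #-1,d2
 lsl.l d3,d2    ; d2 = mask with the low n bits clear
 eor.l d1,d0
 and.l d2,d0
 eor.l d1,d0    ; d0 = high 32-n bits of d0, low n bits of d1
 ror.l d3,d0    ; bring d1's bits to the top of d0
 asr.l d3,d1    ; signed shift of the high longword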
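
The EOR/AND/EOR sequence is a masked merge: bits where the mask is set come from d0, the rest from d1. A small C sketch of the same steps follows, with `asr64_model` as a made-up name and the shift count limited to 0..31 as an assumption of this sketch. The arithmetic shift is written out by hand because right-shifting negative signed values is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: hi:lo stand for d1:d0 (hi is read as signed), n for d3.
   Assumes 0 <= n <= 31. */
static void asr64_model(uint32_t *hi, uint32_t *lo, unsigned n)
{
    assert(n <= 31);
    uint32_t mask   = 0xFFFFFFFFu << n;               /* moveq #-1,d2 / lsl.l d3,d2 */
    uint32_t merged = ((*lo ^ *hi) & mask) ^ *hi;     /* eor.l / and.l / eor.l */
    *lo = n ? (merged >> n) | (merged << (32u - n))   /* ror.l d3,d0 */
            : merged;
    /* asr.l d3,d1: shift right and replicate the sign bit into the top n bits */
    uint32_t sign_fill = (*hi & 0x80000000u) ? ~(0xFFFFFFFFu >> n) : 0u;
    *hi = (*hi >> n) | sign_fill;
}
```

For example, 0x87654321:0x9ABCDEF0 shifted by 4 gives 0xF8765432:0x19ABCDEF in this model.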
Old 09 June 2022, 09:09   #225
ross
Defendit numerus
 
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by meynaf View Post
You don't necessarily need to do 32-n.
Nice, I like the EOR tricks.

Speed is probably the same.
Old 02 August 2022, 04:27   #226
dansalvato
Registered User
 
Join Date: Jun 2009
Location: United States
Posts: 57
I'm pretty new and have a fear of being outclassed by the veterans, but I wanted to share this optimized sin/cos function I came up with: LINK

It takes your byte- or word-length angle and returns a word-length sin in d0, and cos in d1. It returns values from -256 (0xff00) to 255 (0x00ff). At 6 instructions, it's highly suitable for inline ASM to avoid the overhead of a subroutine. It takes 44 cycles and 10 memory reads. The lookup table is 256 bytes.

Code:
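; This snippet assumes the lookup table pointer is in a0
moveq     #64,d1
add.b     d0,d1             ; d1 = angle + 64 (a quarter turn), wraps at 256
ext.w     d1                ; sign-extend both indices to words
ext.w     d0
move.b    (a0,d1.w),d1      ; cos: low byte from table, high byte from ext.w
move.b    (a0,d0.w),d0      ; sin: likewise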
And if you don't need cos, it's only 2 instructions:

Code:
ext.w     d0
move.b    (a0,d0.w),d0
The "trick" is less in the code above and more in the lookup table and how it leverages signed values. Your input value is sign-extended so that angles 128 to 255 become negative (-128 to -1). Sine values are also negative in this range, so the high byte of your sign-extended input value is reused as the high byte of the resulting sin/cos. You can think of the lookup table as holding word-length signed values whose upper byte has been discarded from each entry, because it's provided by your input. The lookup table pointer is actually the center of the table, rather than the top, so that your signed index gets the correct data in either direction. Finally, the sign extension doubles as a convenient way of clearing the upper byte of your angle, something you'd otherwise have to do manually to avoid indexing out of bounds.
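
To make the table layout concrete, here is a rough C sketch of how such a table could be built and read. Everything in it is an assumption for illustration: the names, the clamping of 256 to 255 and of 0 to -1 at the edges, and the use of Bhaskara I's approximation (chosen only to keep the sketch integer-only; the actual table behind LINK may well hold more exact values).

```c
#include <stdint.h>

/* 256 bytes; entry [a + 128] belongs to the signed angle a = -128..127. */
static uint8_t sin_table[256];

/* Roughly 256*sin(pi*h/128) for h = 0..128, via Bhaskara I's approximation
   (an assumption of this sketch, not taken from the post). */
static int sin256_half(int h)
{
    int p = h * (128 - h);
    return (1024 * p) / (20480 - p);
}

static void build_sin_table(void)
{
    for (int a = -128; a < 128; a++) {
        int v;
        if (a >= 0) {
            v = sin256_half(a);            /* 0..256 */
            if (v > 255) v = 255;          /* high byte must stay $00 */
        } else {
            v = -sin256_half(-a);          /* -256..0 */
            if (v > -1) v = -1;            /* high byte must stay $FF */
        }
        sin_table[a + 128] = (uint8_t)v;   /* keep the low byte only */
    }
}

/* Model of "ext.w d0 / move.b (a0,d0.w),d0" with a0 at the table's center. */
static int sin_lookup(uint8_t angle)
{
    const uint8_t *center = sin_table + 128;
    int idx  = angle < 128 ? angle : angle - 256;   /* ext.w: 128..255 -> -128..-1 */
    int high = idx < 0 ? 0xFF00 : 0x0000;           /* high byte left by ext.w */
    int word = high | center[idx];                  /* move.b replaces the low byte */
    return word < 0x8000 ? word : word - 0x10000;   /* read as a signed word */
}

/* cos is sin a quarter turn (64 steps) later; the byte add wraps at 256. */
static int cos_lookup(uint8_t angle)
{
    return sin_lookup((uint8_t)(angle + 64));
}
```

After `build_sin_table()`, `sin_lookup(64)` gives 255 and `sin_lookup(192)` gives -256, matching the range quoted in the post.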

The tradeoff here is that you might want your resulting data to be in a different or more precise range than this provides, such as -0x8000 to 0x7fff. It kind of depends on how you need to shift/multiply the data after you retrieve it. I find this works especially well if you're using it for objects whose positions have 4- or 8-bit fractional values.

Last edited by dansalvato; 02 August 2022 at 09:21.
Old 02 August 2022, 07:49   #227
paraj
Registered User
 
 
Join Date: Feb 2017
Location: Denmark
Posts: 1,099
If you extend the table to cover 90 degrees more (repeat the 64 first entries at the end) you can do the double lookup in 32 cycles (and 7 memory accesses):

Code:
    ext.w     d0
    move.b    64(a0,d0.w),d1
    move.b    (a0,d0.w),d0
Doesn't extend as nicely to larger tables (or word sized values), but the idea of having an extra pi/2 values at the end (or start) of a sin/cos table can often be used for a slight speed-up at the cost of extra memory usage.
Old 02 August 2022, 09:16   #228
dansalvato
Registered User
 
Join Date: Jun 2009
Location: United States
Posts: 57
Quote:
Originally Posted by paraj View Post
If you extend the table to cover 90 degrees more (repeat the 64 first entries at the end) you can do the double lookup in 32 cycles (and 7 memory accesses):
It doesn't work for this specific implementation, because the sign of the angle is used as the sign of the resulting value, which is how I only retrieve a byte but have a range of -256 to 255. So, d1 needs to be separately sign-extended before fetching the result.

Even if I traded a bit of precision and made the table values signed bytes (-128 to 127), I'd have to sign-extend the result before I can use it in further calculations, so there's no gain. Another option is giving the table full word-sized values, but then there's the overhead of clearing the upper byte from your angle and doubling it to fetch from the table—plus, the cos value is then at an inconvenient displacement of 128.
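
The sign problem can be counted in a few lines of C. This sketch uses made-up names and is generous to the shortcut: it assumes d1 would at least inherit the high byte that ext.w leaves in d0, and then counts the angles for which that byte is still the wrong one for cos.

```c
#include <stdint.h>

/* High byte produced by ext.w on a byte angle: $00 for 0..127, $FF for 128..255. */
static unsigned ext_high_byte(uint8_t angle)
{
    return angle < 128 ? 0x00u : 0xFFu;
}

/* Counts the angles where reusing the high byte of the sign-extended angle
   for the cos result would be wrong. */
static int count_wrong_cos_high_bytes(void)
{
    int wrong = 0;
    for (int a = 0; a < 256; a++) {
        unsigned needed = ext_high_byte((uint8_t)(a + 64)); /* cos(a) = sin(a + 64) */
        unsigned reused = ext_high_byte((uint8_t)a);        /* high byte of ext.w d0 */
        if (needed != reused)
            wrong++;
    }
    return wrong;
}
```

The count comes out at 128, i.e. half the circle (angles 64..127 and 192..255), which is why the index for cos has to be sign-extended separately.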
Old 02 August 2022, 17:21   #229
paraj
Registered User
 
 
Join Date: Feb 2017
Location: Denmark
Posts: 1,099
D'oh, of course you're right. Brain fart on my side. Apologies.
Old 07 August 2022, 02:52   #230
Photon
Moderator
 
 
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,602
@dansalvato, it's a bad feeling to feel that way ("great code has already been written"), but it's also a good challenge, because you could be great and not all code has been written. And it's a good feeling that someone still cares about things like this nowadays, when one line of code gets translated to a kilobyte of code that needs megabytes or more of dependencies to even run at all. Pure binary math is very fun.

Current size record for sine calc is 32b (Raylight/PWL recently beat my Bhaskara-based 36b) and you can easily make an interpolating one in 50b (2008).

A nice goal is a real-time func(angle, ampl) that runs in a few cycles with a not too bad aberration.
Old 07 August 2022, 03:24   #231
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,039
Quote:
Originally Posted by Photon View Post
Current size record for sine calc is 32b (Raylight/PWL recently beat my Bhaskara-based 36b) and you can easily make an interpolating one in 50b (2008).
What are the sine parameters and error constraints? We did a compo a while ago here on EAB with: 1024 entries, 16384 amplitude, max. 5% error.
My 2nd order parabolic is 24 bytes, or 38 bytes very accurate (<1% error): http://eab.abime.net/showthread.php?t=106304
Old 07 August 2022, 03:31   #232
Photon
Moderator
 
 
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,602
@dansalvato, you see?

@a/b cool, what's the aberration for the 24b one?
Old 07 August 2022, 04:34   #233
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,039
Jobbo did some number crunching and put all the data into this spreadsheet: https://docs.google.com/spreadsheets...it?usp=sharing ...
Column D is a reference sine, my stuff is in V (24, lower acc) and Z (38, higher acc).
The lower accuracy version is probably not accurate enough, and the higher accuracy version is probably too accurate :P, since we also wanted to break the 1% threshold. A better size vs. accuracy trade-off is possible, which is why I asked about the error constraints.
Old 22 February 2023, 14:45   #234
NorthWay
Registered User
 
Join Date: May 2013
Location: Grimstad / Norway
Posts: 839
While working on a log2(pow2?) function I realized that the bitfield instructions can be used to test and clear a register in one instruction:
Code:
bfclr d0{#0:#32}   ; flags reflect the old value of d0, then d0 is cleared
bne NotClear       ; taken if d0 was non-zero
The alternative would be to store, clear and re-test the value (might be faster in some cases, but I like the compactness).
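
In C terms the instruction behaves like a "fetch old state, then clear" helper. The sketch below is only a model with a hypothetical name; the return value plays the role of the Z flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of "bfclr d0{#0:#32}": the flags are set from the field's old
   contents, then the whole register is cleared. Returns the Z flag. */
static bool bfclr_all(uint32_t *reg)
{
    bool z = (*reg == 0);   /* test happens before the modification */
    *reg = 0;               /* clear all 32 bits */
    return z;
}
```

So `if (!bfclr_all(&d0)) { ... }` corresponds to taking the `bne NotClear` branch: d0 held something, and it is zero by the time the branch target runs.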

My NextPowerOf2 function ended up being 6 instructions (valid for values 2 -> 2^31) or 6 non-branching instructions (valid for values 2 -> 2^31-1). (If the answer to 1 as input is 1 then that can be considered working too.)
Old 22 February 2023, 16:20   #235
hooverphonique
ex. demoscener "Bigmama"
 
Join Date: Jun 2012
Location: Fyn / Denmark
Posts: 1,624
Quote:
Originally Posted by NorthWay View Post
While working on a log2(pow2?) function I realized that the bitfield instructions can be used to test and clear a register in one instruction:
I just realized the bset/bchg/bclr instructions also test before modifying the destination
Old 22 February 2023, 17:37   #236
phx
Natteravn
 
 
Join Date: Nov 2009
Location: Herford / Germany
Posts: 2,496
Quote:
Originally Posted by hooverphonique View Post
I just realized the bset/bchg/bclr instructions also test before modifying the destination
Better late than never...
Old 02 March 2023, 11:25   #237
NorthWay
Registered User
 
Join Date: May 2013
Location: Grimstad / Norway
Posts: 839
If you want a conditional rotate by 1 and prefer it to be branch-free, then Scc is your friend:
Code:
; the upper 3 bytes of d0 do not need to be cleared first
sne d0          ; d0.b = $FF if the condition holds, else $00
rol.l d0,d1     ; the rotate count comes from the low bits of d0
Though 68000 might be doing all the rotates and not only 5 bits worth of it?

Last edited by NorthWay; 02 March 2023 at 12:35. Reason: I knew better it doesn't need to clear... thanks meynaf
Old 02 March 2023, 12:16   #238
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by NorthWay View Post
(needs already cleared upper 3 bytes of d0)
It does not. Upper bytes are unimportant here.


Quote:
Originally Posted by NorthWay View Post
Though 68000 might be doing all the rotates and not only 5 bits worth of it?
Even 68000 should do only 31 rotates. However these 31 shifts are gonna take very long, much longer than what a branch would.
Actually, on 020-030 the above isn't faster than a branch either.
Old 02 March 2023, 19:56   #239
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,039
It's only the low 6 bits that matter (modulo 64, so 0-63 shifts/rotates). And yes, it actually does 63 operations if the low 6 bits are all set to 1.
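
A short C model of the count handling, with made-up function names, shows why the sne trick from a few posts up ends up as a rotate by one position. With the low byte at $FF the count is 63, and rotating a 32-bit register left 63 times lands in the same place as rotating it right once. The model gives the result only; the cost of the 63 steps is not represented.

```c
#include <stdint.h>

/* Model of "rol.l Dx,Dy" with a register count: only the low 6 bits of the
   count register matter (0..63 steps). */
static uint32_t rol_l_reg(uint32_t count_reg, uint32_t value)
{
    unsigned steps = count_reg & 63u;
    unsigned r = steps & 31u;            /* 32 steps bring a longword back to itself */
    return r ? (value << r) | (value >> (32u - r)) : value;
}

/* Model of "sne Dx": low byte becomes $FF or $00, upper 3 bytes untouched. */
static uint32_t sne(uint32_t reg, int not_equal)
{
    return (reg & 0xFFFFFF00u) | (not_equal ? 0xFFu : 0x00u);
}
```

With whatever is left in the upper bytes, `rol_l_reg(sne(0x12345678, 1), 0x80000001)` yields 0xC0000000, and with the condition false the value comes back unchanged.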
Old 04 March 2023, 15:59   #240
remz
Registered User
 
Join Date: May 2022
Location: Canada
Posts: 138
Curiosity: why were 6 bits allocated to the count of shift and rotate instructions instead of 5?
Is there any practical use for rotating by more than 32?
 

