09 June 2022, 00:52 | #221 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,515
|
This bfins version is nice.
Now someone needs to time it on a real machine versus the generic one.
Last edited by ross; 09 June 2022 at 00:59. Reason: because :p
09 June 2022, 00:59 | #222 |
Natteravn
Join Date: Nov 2009
Location: Herford / Germany
Posts: 2,569
|
|
09 June 2022, 01:04 | #223 | |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,515
|
EDIT: removed the nocturnal nonsense about temporal coincidences
Quote:
D1:D0 in memory --> slow...
So it is better:
Code:
    moveq   #32,d2
    sub.l   d3,d2
    bgt.b   .1
    move.l  d1,d0
    neg.l   d2
    moveq   #0,d1
    lsr.l   d2,d0
    rts
.1: bfins   d1,d0{d2:d3}
    rol.l   d2,d0
    lsr.l   d3,d1
    rts
Last edited by ross; 09 June 2022 at 08:56.
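For anyone who wants to check the logic off-target, here is a rough Python model of the routine above. It reads the routine as a logical right shift of the 64-bit pair D1:D0 (d1 high, d0 low) by the count in d3, and it assumes a count of 1 to 63. The function names are made up for this sketch, and it says nothing about timing.

```python
MASK32 = 0xFFFFFFFF

def rol32(x, n):
    """Rotate a 32-bit value left by n (n taken modulo 32)."""
    n &= 31
    return ((x << n) | (x >> (32 - n))) & MASK32

def shift_right_64(d1, d0, n):
    """Model of the routine: logical right shift of D1:D0 by n, assuming 1 <= n <= 63."""
    d2 = 32 - n                      # moveq #32,d2 / sub.l d3,d2
    if d2 <= 0:                      # bgt.b not taken: n >= 32
        d0 = d1 >> (-d2)             # move.l d1,d0 / neg.l d2 / lsr.l d2,d0
        d1 = 0                       # moveq #0,d1
        return d1, d0
    low = (1 << n) - 1               # bfins field {32-n:n} covers the low n bits of d0
    d0 = (d0 & ~low & MASK32) | (d1 & low)
    d0 = rol32(d0, d2)               # rol.l d2,d0, same effect as rotating right by n
    d1 >>= n                         # lsr.l d3,d1
    return d1, d0
```

The bfins step drops the low n bits of d1 into the low n bits of d0, and the rotate then carries them to the top of d0, which is why no separate 32-n shift of d1 is needed.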
|
09 June 2022, 09:02 | #224 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,377
|
You don't necessarily need to do 32-n.
What about:
Code:
    moveq   #-1,d2
    lsl.l   d3,d2
    eor.l   d1,d0
    and.l   d2,d0
    eor.l   d1,d0
    ror.l   d3,d0
    asr.l   d3,d1
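A small Python model of the eor/and/eor merge above, for checking it against a plain 64-bit shift. It assumes a count of 0 to 31 and treats D1:D0 as a signed 64-bit value, since the last instruction is asr. The helper names are invented for this sketch.

```python
MASK32 = 0xFFFFFFFF

def ror32(x, n):
    """Rotate a 32-bit value right by n (n taken modulo 32)."""
    n &= 31
    return ((x >> n) | (x << (32 - n))) & MASK32

def asr32(x, n):
    """Arithmetic shift right of a 32-bit value (sign bit replicated)."""
    signed = x - (1 << 32) if x & 0x80000000 else x
    return (signed >> n) & MASK32

def shift_right_64_masked(d1, d0, n):
    """Model of the snippet: arithmetic right shift of D1:D0 by n, assuming 0 <= n <= 31."""
    d2 = (MASK32 << n) & MASK32      # moveq #-1,d2 / lsl.l d3,d2: low n bits clear
    d0 ^= d1                         # eor.l d1,d0
    d0 &= d2                         # and.l d2,d0
    d0 ^= d1                         # eor.l d1,d0: high bits from d0, low n bits from d1
    d0 = ror32(d0, n)                # ror.l d3,d0
    d1 = asr32(d1, n)                # asr.l d3,d1
    return d1, d0
```

The three-instruction eor/and/eor sequence is a masked merge: wherever the mask bit is 1 the result keeps d0, wherever it is 0 the result takes d1.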
09 June 2022, 09:09 | #225 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,515
|
|
02 August 2022, 04:27 | #226 |
Registered User
Join Date: Jun 2009
Location: United States
Posts: 57
|
I'm pretty new and have a fear of being outclassed by the veterans, but I wanted to share this optimized sin/cos function I came up with: LINK
It takes your byte- or word-length angle and returns a word-length sin in d0, and cos in d1. It returns values from -256 (0xff00) to 255 (0x00ff). At 6 instructions, it's highly suitable for inline ASM to avoid the overhead of a subroutine. It takes 44 cycles and 10 memory reads. The lookup table is 256 bytes.
Code:
    ; This snippet assumes the lookup table pointer is in a0
    moveq   #64,d1
    add.b   d0,d1
    ext.w   d1
    ext.w   d0
    move.b  (a0,d1.w),d1
    move.b  (a0,d0.w),d0
Code:
    ext.w   d0
    move.b  (a0,d0.w),d0
The tradeoff here is that you might want your resulting data to be in a different or more precise range than this provides, such as -0x8000 to 0x7fff. It kind of depends on how you need to shift/multiply the data after you retrieve it. I find this works especially well if you're using it for objects whose positions have 4- or 8-bit fractional values.
Last edited by dansalvato; 02 August 2022 at 09:21.
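To make the trick in the snippet explicit: ext.w supplies the high byte of the result (0x00 for angles 0 to 127, 0xff for 128 to 255) and the table only supplies the low byte. Below is a Python sketch of that idea. The table contents are an assumption, since the actual table sits behind the link and is not shown here, and the sketch assumes a0 points at the middle of the table because the index is signed. All names are made up for illustration.

```python
import math

def build_table():
    """Hypothetical 256-byte table covering signed indices -128..127 (stored at index + 128)."""
    table = []
    for index in range(-128, 128):
        value = round(256 * math.sin(2 * math.pi * index / 256))
        # The high byte is fixed by the sign of the index, so clamp to the reachable range
        if index >= 0:
            value = min(max(value, 0), 255)
        else:
            value = min(max(value, -256), -1)
        table.append(value & 0xFF)
    return table

def ext_w(byte):
    """Sign-extend a byte to a 16-bit word, like ext.w."""
    byte &= 0xFF
    return byte | 0xFF00 if byte & 0x80 else byte

def to_signed16(word):
    return word - 0x10000 if word & 0x8000 else word

def lookup(table, angle):
    word = ext_w(angle)                            # ext.w d0
    index = to_signed16(word)                      # (a0,d0.w) uses a signed word index
    word = (word & 0xFF00) | table[index + 128]    # move.b replaces only the low byte
    return to_signed16(word)

def sin_cos(table, angle):
    """Return (sin, cos) as signed words in the range -256..255."""
    return lookup(table, angle), lookup(table, (angle + 64) & 0xFF)
```

With a table built this way, a value that should be exactly 0 on the negative half (angle 128) comes out as -1, and +256 is clamped to 255, which matches the stated -256 to 255 range.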
02 August 2022, 07:49 | #227 |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,284
|
If you extend the table to cover 90 degrees more (repeat the 64 first entries at the end) you can do the double lookup in 32 cycles (and 7 memory accesses):
Code:
    ext.w   d0
    move.b  64(a0,d0.w),d1
    move.b  (a0,d0.w),d0
02 August 2022, 09:16 | #228 | |
Registered User
Join Date: Jun 2009
Location: United States
Posts: 57
|
Quote:
Even if I traded a bit of precision and made the table values signed bytes (-128 to 127), I'd have to sign-extend the result before I can use it in further calculations, so there's no gain. Another option is giving the table full word-sized values, but then there's the overhead of clearing the upper byte from your angle and doubling it to fetch from the table—plus, the cos value is then at an inconvenient displacement of 128. |
|
02 August 2022, 17:21 | #229 |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,284
|
D'oh, of course you're right. Brain fart on my side. Apologies.
|
07 August 2022, 02:52 | #230 |
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,752
|
@dansalvato, it's a bad feeling to feel that way ("great code has already been written"), and also a good challenge, because you could be great; not all code has been written. And a good feeling that someone cares about things like this nowadays with one line of code being translated to a kilobyte of code with megabytes and more of dependencies to even run at all. Pure binary math is very fun.
Current size record for sine calc is 32b (Raylight/PWL recently beat my Bhaskara-based 36b) and you can easily make an interpolating one in 50b (2008). A nice goal is func(angle, ampl) real-time in a few cycles within a not too bad aberration. |
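For readers who have not met it, Bhaskara I's approximation is sin(x) ≈ 16x(π−x) / (5π² − 4x(π−x)) for x between 0 and π. The sketch below is just the real-valued formula in Python, not the 36-byte 68k routine mentioned above, and the function name is made up.

```python
import math

def bhaskara_sin(x):
    """Bhaskara I's approximation of sin(x) for 0 <= x <= pi (x in radians)."""
    p = x * (math.pi - x)
    return 16 * p / (5 * math.pi ** 2 - 4 * p)

# Largest absolute deviation from math.sin over 1001 evenly spaced samples in [0, pi]
max_error = max(
    abs(bhaskara_sin(i * math.pi / 1000) - math.sin(i * math.pi / 1000))
    for i in range(1001)
)
```

The formula is exact at 0, π/6, π/2, 5π/6 and π, which is part of why it suits small fixed-point routines.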
07 August 2022, 03:24 | #231 | |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,087
|
Quote:
My 2nd order parabolic is 24 bytes, or 38 bytes very accurate (<1% error): http://eab.abime.net/showthread.php?t=106304 |
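As a point of reference for "2nd order parabolic", the simplest parabola through the zeros and the peak of a half sine wave is sin(x) ≈ 4x(π−x)/π². Whether the 24-byte routine uses exactly this form is an assumption; the linked thread has the real code. A quick Python sketch to see the size of the error:

```python
import math

def parabolic_sin(x):
    """Second-order (parabola) approximation of sin(x) for 0 <= x <= pi."""
    return 4 * x * (math.pi - x) / math.pi ** 2

# Largest absolute deviation from math.sin over 1001 evenly spaced samples in [0, pi]
max_error = max(
    abs(parabolic_sin(i * math.pi / 1000) - math.sin(i * math.pi / 1000))
    for i in range(1001)
)
```

The deviation of this plain parabola is well above 1% of full scale, so a correction term of some kind would be needed to reach the accuracy quoted for the 38-byte version.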
|
07 August 2022, 03:31 | #232 |
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,752
|
@dansalvato, you see?
@a/b cool, what's the aberration for the 24b one? |
07 August 2022, 04:34 | #233 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,087
|
Jobbo did some number crunching and put all the data into this spreadsheet: https://docs.google.com/spreadsheets...it?usp=sharing ...
Column D is a reference sine, my stuff is in V (24, lower acc) and Z (38, higher acc). The lower accuracy version is probably not accurate enough, the higher accuracy version is probably too accurate :P as we also wanted to break the 1% threshold (so a better size vs. accuracy trade off is possible, reason why I asked about error constraints). |
22 February 2023, 14:45 | #234 |
Registered User
Join Date: May 2013
Location: Grimstad / Norway
Posts: 862
|
While working on a log2(pow2?) function I realized that the bitfield instructions can be used to test and clear a register in one instruction:
Code:
    bfclr   d0{#0:#32}
    bne     NotClear
My NextPowerOf2 function ended up being 6 instructions (valid for values 2 -> 2^31) or 6 non-branching instructions (valid for values 2 -> 2^31-1). (If the answer to 1 as input is 1 then that can be considered working too.)
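A tiny Python model of what the bfclr trick does: the condition codes are set from the field's old contents, then the field is cleared, so a following bne sees whether the register was non-zero. The next-power-of-two function below is only a plain reference to compare results against, not the 6-instruction routine, which is not shown in the post. Both function names are invented here.

```python
MASK32 = 0xFFFFFFFF

def bfclr_full(d0):
    """Model of bfclr on a full 32-bit field: returns (new_d0, z_flag).

    z_flag reflects the value before clearing, so 'bne' branches when it is False."""
    z_flag = (d0 & MASK32) == 0
    return 0, z_flag

def next_power_of_2(x):
    """Smallest power of two >= x, assuming 2 <= x <= 2**31 (reference only)."""
    return 1 << (x - 1).bit_length()
```

For example, `bfclr_full(5)` gives a cleared register with Z not set, which is the case where the bne in the snippet is taken.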
22 February 2023, 16:20 | #235 |
ex. demoscener "Bigmama"
Join Date: Jun 2012
Location: Fyn / Denmark
Posts: 1,654
|
|
22 February 2023, 17:37 | #236 |
Natteravn
Join Date: Nov 2009
Location: Herford / Germany
Posts: 2,569
|
|
02 March 2023, 11:25 | #237 |
Registered User
Join Date: May 2013
Location: Grimstad / Norway
Posts: 862
|
If you want a conditional rotate by 1 and prefer it to be branch free then sCC is your friend:
Code:
    ; (needs already cleared upper 3 bytes of d0) (no it doesn't)
    sne     d0
    rol.l   d0,d1
Last edited by NorthWay; 02 March 2023 at 12:35. Reason: I knew better it doesn't need to clear... thanks meynaf
02 March 2023, 12:16 | #238 | |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,377
|
It does not. Upper bytes are unimportant here.
Quote:
Actually, on 020-030 the above isn't faster than a branch either. |
|
02 March 2023, 19:56 | #239 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,087
|
It's only the low 6 bits that matter (modulo 64, so 0-63 shifts/rotates). And yes, it actually does 63 operations if the low 6 bits are all set to 1.
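Putting the sne/rol idea and the modulo-64 rule together in a quick Python model (function names are made up for this sketch). sne writes 0xff or 0x00 into the low byte only, the rotate takes the low 6 bits of that as its count, and rotating a 32-bit value left 63 times ends up the same as rotating it right once.

```python
MASK32 = 0xFFFFFFFF

def sne(d0, z_flag):
    """Scc writes only the low byte: 0xFF when the condition holds (Z clear for NE), else 0x00."""
    return (d0 & 0xFFFFFF00) | (0x00 if z_flag else 0xFF)

def rol_l_reg(count_reg, value):
    """rol.l Dx,Dy: the count is the low 6 bits of Dx (0..63)."""
    count = count_reg & 63
    n = count % 32                   # 32 single-bit rotates of a long bring it back to the start
    return ((value << n) | (value >> (32 - n))) & MASK32

def conditional_rotate(d0, d1, z_flag):
    """Model of 'sne d0 / rol.l d0,d1'; the upper bytes of d0 may hold anything."""
    return rol_l_reg(sne(d0, z_flag), d1)
```

So with rol the pair gives a conditional rotate right by one; the same model with the direction flipped would describe ror giving a rotate left by one.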
|
04 March 2023, 15:59 | #240 |
Registered User
Join Date: May 2022
Location: Canada
Posts: 147
|
Curiosity: Why were 6 bits allocated to shift&rotate instructions instead of 5?
Is there any practicality to rotate more than 32? |