23 March 2021, 16:13 | #81
Lemon. / Core Design
Join Date: Mar 2016
Location: Tier 5
Posts: 1,209
24 March 2021, 05:56 | #82
Registered User
Join Date: Jun 2020
Location: Druidia
Posts: 386
I came up with my own higher-precision version in 40 bytes. Not sure if anyone can spot some extra size optimizations; it's late here!
Code:
        moveq   #0,d0           ; d0 = x
        move.w  #512,d1         ; d1 = 512-x
.loop:  move.w  d1,d2
        mulu    d0,d2           ; d2 = (512-x)*x
        move.l  d2,d3
        lsl.l   #8,d3
        lsl.l   #5,d3           ; d3 = 8192*(512-x)*x
        lsr.l   #3,d2
        move.w  #40960,d4
        sub.w   d2,d4           ; d4 = 40960-(512-x)*x/8
        divu    d4,d3           ; d3 = (8192*(512-x)*x) / (40960-(512-x)*x/8)
        move.w  d3,(a0)+        ; first half: positive entry
        neg.w   d3
        move.w  d3,(1022,a0)    ; same entry 512 words later, negated
        addq    #1,d0
        subq    #1,d1
        bgt.s   .loop
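For reference, a worked derivation of where the 8192 and 40960 in the comments come from. It assumes the routine is Bhaskara I's approximation (the poster confirms further down that it started from Bhaskara), with pi mapped to 512 table steps, p = x(512 - x), and a peak amplitude of 16384; the scaling steps are my reading of the code, not the author's notes.

```latex
\sin x \approx \frac{16\,x(\pi - x)}{5\pi^2 - 4\,x(\pi - x)}, \qquad 0 \le x \le \pi

16384 \cdot \frac{16p}{5 \cdot 512^2 - 4p}
  = \frac{262144\,p}{1310720 - 4p}
  = \frac{65536\,p}{327680 - p}
  = \frac{8192\,p}{40960 - p/8}
```

At the peak, p = 256*256 = 65536, the last form gives 8192*65536 / (40960 - 8192) = 16384, the table's amplitude. Because lsr.l truncates p/8 and divu truncates the quotient, the stored values should sit within about one unit of this ideal curve.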
24 March 2021, 09:18 | #83
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,039
d0/d1 is essentially what I'm doing with d0/a1, but without a multiply (my 26-byte version was doing this with a muls). You can eliminate the multiplication by increasing counter1 by counter2's value and then decreasing counter2 by 2. Counter1 then holds the product.
0*512 = 0       d = +511+2
1*511 = 511     d = +511
2*510 = 1020    d = +509
3*509 = 1527    d = +507
...
becomes
511+2    0
511      0+511 = 511
509      0+511+509 = 1020
507      0+511+509+507 = 1527
...
You still have to initialize and increment both counters; it only eliminates the multiplication (-2 bytes). Since you need the product in 2 registers (d2 and d3), you still need an additional move, so it goes back to 40 bytes. However, it's faster.
Last edited by a/b; 24 March 2021 at 09:23.
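A minimal Python sketch of the same strength reduction, assuming a half period of 512 steps: the products x*(512-x) are built by additions only, with the step shrinking by 2 each time (511, 509, 507, ...). The function name is hypothetical; the decrement-then-add order and the 511+2 start value match the counter table above.

```python
def products_by_addition(n=512):
    """Return x*(n-x) for x = 0..n-1 using only additions.

    'acc' plays the product register (d0) and 'step' the counter (a1),
    which starts at (n-1)+2 and is decremented by 2 before each use.
    """
    products = []
    acc = 0
    step = (n - 1) + 2
    for _ in range(n):
        products.append(acc)
        step -= 2          # next difference: n-1, n-3, n-5, ...
        acc += step        # acc is now (x+1)*(n-(x+1))
    return products


if __name__ == "__main__":
    print(products_by_addition()[:4])  # prints [0, 511, 1020, 1527]
```

The difference between consecutive products is (x+1)(n-x-1) - x(n-x) = n-1-2x, which is why a counter decreasing by 2 is enough.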
26 March 2021, 17:17 | #84
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,039
Divisor includes 5*pi^2; with pi being 512 that is 1310720, or 327680 once numerator and denominator are divided by 4. Swap does no good (kills precision), and divu needs a 16-bit divisor, so the numbers have to be further divided by more than 5, so 8. And then I remembered your code, 40960 and those crazy shifts. So I've integrated 2 optimizations into your code and it's 38 bytes now:
- first opt is what I said in my previous post (replace the multiplication with a1 increments), which then enables
- second opt: you can replace 40960-a*b/8 with a counter that starts at 40960 (actually 327680=5*65536 pre-shift) and then decrease it by the same amount you increase a*b (which is a1)
Code:
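        moveq   #0,d0           ; d0 = running product x*(512-x)
        moveq   #5,d1
        swap    d1              ; d1 = 5*65536 = 327680, tracks 327680-d0
        move.w  #511+2,a1       ; step, decremented by 2 before each use
.loop   move.l  d0,d3
        move.l  d1,d2
        lsl.l   #8,d3
        lsl.l   #8-3,d3         ; d3 = d0 << 13
        lsr.l   #3,d2           ; d2 = (327680-d0) >> 3
        divu.w  d2,d3
        move.w  d3,(a0)+
        neg.w   d3
        move.w  d3,(1022,a0)
        subq.l  #2,a1
        sub.l   a1,d1
        add.l   a1,d0           ; d0 returns to 0 after 512 steps
        bne.b   .loop
Last edited by a/b; 26 March 2021 at 17:37. Reason: typos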
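To check that the two optimizations keep the output on the Bhaskara curve, here is a register-level Python model of the 38-byte loop. It is my own emulation, not the poster's code, and it assumes a0 points at a 1024-word buffer and treats each 68000 register as a 32-bit integer. Since lsr.l truncates the denominator and divu.w truncates the quotient, each entry should stay within one unit of 65536p/(327680 - p).

```python
MASK32 = 0xFFFFFFFF


def to_signed16(v):
    """Interpret a 16-bit pattern as a signed word."""
    return v - 0x10000 if v & 0x8000 else v


def ab_38_byte_table():
    """Emulate the 38-byte loop; returns 1024 signed 16-bit entries."""
    words = [0] * 1024                  # the buffer a0 points at
    d0 = 0                              # moveq #0,d0: product x*(512-x)
    d1 = 5 << 16                        # moveq #5,d1 / swap d1: 327680 - d0
    a1 = 511 + 2                        # move.w #511+2,a1
    i = 0                               # word index of a0
    while True:
        d3 = (d0 << 13) & MASK32        # lsl.l #8 then lsl.l #8-3
        d2 = (d1 >> 3) & 0xFFFF         # lsr.l #3; divu.w reads the low word
        q = (d3 // d2) & 0xFFFF         # divu.w quotient (never overflows here)
        words[i] = q                    # move.w d3,(a0)+
        words[i + 512] = (-q) & 0xFFFF  # neg.w d3 / move.w d3,(1022,a0)
        i += 1
        a1 -= 2                         # subq.l #2,a1
        d1 = (d1 - a1) & MASK32         # sub.l a1,d1
        d0 = (d0 + a1) & MASK32         # add.l a1,d0
        if d0 == 0:                     # bne.b .loop
            break
    return [to_signed16(w) for w in words]


if __name__ == "__main__":
    table = ab_38_byte_table()
    print(table[0], table[256], table[768])  # prints 0 16384 -16384
```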
26 March 2021, 20:09 | #85
Registered User
Join Date: Jun 2020
Location: Druidia
Posts: 386
Excellent!
Yeah, I started from Bhaskara for my version. Funnily enough, I had the same moveq #5 and swap trick in one version but could not get the rest reorganized. The precision is still not as good as your last version.
26 March 2021, 21:59 | #86
Lemon. / Core Design
Join Date: Mar 2016
Location: Tier 5
Posts: 1,209
The Bhaskara approximation is great for its time, but it carries some degree of error.
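To put a rough number on that error, a quick Python check (my own; it samples [0, pi] on a uniform grid and the function names are hypothetical) compares Bhaskara I's formula with math.sin. By hand the worst case comes out a little above 0.0016 near 11 degrees, roughly 27 units in a table with amplitude 16384; treat that as an estimate rather than a published figure.

```python
import math


def bhaskara_sin(x):
    """Bhaskara I's rational approximation of sin(x), valid for 0 <= x <= pi."""
    p = x * (math.pi - x)
    return 16 * p / (5 * math.pi ** 2 - 4 * p)


def max_abs_error(samples=10_000):
    """Largest |bhaskara_sin(x) - sin(x)| over a uniform grid on [0, pi]."""
    worst = 0.0
    for i in range(samples + 1):
        x = math.pi * i / samples
        worst = max(worst, abs(bhaskara_sin(x) - math.sin(x)))
    return worst


if __name__ == "__main__":
    err = max_abs_error()
    print(f"max |error| = {err:.5f} ({err * 16384:.1f} units at amplitude 16384)")
```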
26 March 2021, 22:18 | #87
Registered User
Join Date: Jun 2020
Location: Druidia
Posts: 386
Mine is in the spreadsheet as "Jobbo 2"; it's totally respectable in terms of error.
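The spreadsheet isn't part of this thread, so as a stand-in here is a Python model of the 40-byte routine posted earlier (my own emulation, assuming a 1024-word output buffer at a0; function names are hypothetical) that measures the first half against 16384*sin(pi*x/512). The expectation is that Bhaskara's own error dominates (under about 30 units), with integer truncation adding at most about one more unit.

```python
import math


def jobbo_40_byte_table():
    """Model of the 40-byte loop: entry x is 8192*p // (40960 - p//8), p = x*(512-x)."""
    words = [0] * 1024
    d0, d1 = 0, 512                        # moveq #0,d0 / move.w #512,d1
    i = 0
    while True:
        d2 = d1 * d0                       # move.w d1,d2 / mulu d0,d2
        d3 = d2 << 13                      # lsl.l #8 then lsl.l #5
        d4 = (40960 - (d2 >> 3)) & 0xFFFF  # lsr.l #3,d2 / move.w #40960,d4 / sub.w d2,d4
        q = d3 // d4                       # divu d4,d3 (quotient only)
        words[i] = q                       # move.w d3,(a0)+
        words[i + 512] = -q                # neg.w d3 / move.w d3,(1022,a0)
        i += 1
        d0 += 1                            # addq #1,d0
        d1 -= 1                            # subq #1,d1
        if d1 <= 0:                        # bgt.s .loop
            break
    return words


def max_error_vs_sine(table):
    """Worst |entry - 16384*sin(pi*x/512)| over the positive half."""
    return max(abs(table[x] - 16384 * math.sin(math.pi * x / 512)) for x in range(512))


if __name__ == "__main__":
    print(f"max error: {max_error_vs_sine(jobbo_40_byte_table()):.1f} units")
```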
02 April 2021, 18:28 | #88
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
I'm writing a very tiny intro and I need a circular 256-byte array of unsigned two-quadrant sine values (to add to a bitmap pointer).
I ended up with:
Code:
;0->$ff->0 circular, len=256
        moveq   #0,d0
        movea.w #(256*4)+(8/2)-1,a1
.l      subq.l  #8,a1
        move.w  d0,-(sp)
        move.b  (sp)+,(a0)+     ; stores the high byte of d0.w, sp stays even
        add.l   a1,d0
        bgt.b   .l
This thread is full of excellent ideas and it's worth taking advantage of them.
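If I read the loop correctly, it never evaluates a sine: after k steps d0 holds the parabola k*(1023-4k), and keeping only its high byte gives a 0..$ff..0 hump that stands in for two quadrants of a sine. A minimal Python model under that reading (my own, not ross's code; the function name is hypothetical):

```python
def ross_parabola_table():
    """Model of the byte-table loop; returns 256 unsigned byte values."""
    table = []
    d0 = 0                                # moveq #0,d0
    a1 = (256 * 4) + (8 // 2) - 1         # movea.w #(256*4)+(8/2)-1,a1 -> 1027
    while True:
        a1 -= 8                           # subq.l #8,a1
        table.append((d0 >> 8) & 0xFF)    # move.w d0,-(sp) / move.b (sp)+,(a0)+
        d0 += a1                          # add.l a1,d0
        if d0 <= 0:                       # bgt.b .l loops only while d0 > 0
            break
    return table


if __name__ == "__main__":
    t = ross_parabola_table()
    print(t[:4], max(t))  # prints [0, 3, 7, 11] 255
```

The loop ends when d0 goes negative (-256 after the 256th add), which is what makes the length come out at exactly 256 without a separate counter.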
03 April 2021, 12:05 | #89
Registered User
Join Date: Sep 2009
Location: Norway
Posts: 1,709
Pushing a word to the stack and popping only a byte back. You absolute madman!
I assume you back up the stack pointer before the loop and restore it afterwards?
03 April 2021, 12:28 | #90
OCS forever!
Join Date: Mar 2019
Location: Birmingham, UK
Posts: 418
The stack pointer automatically stays word-aligned: byte accesses through -(sp) and (sp)+ still move it by 2. You can also do a fast left shift of a byte by 8 by pushing a byte and popping a word.
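A toy Python model of that A7 behaviour, as my own simplification: a big-endian byte array and a pointer that moves by 2 for both byte and word accesses (the class and method names are hypothetical). It also shows a detail worth knowing for the push-byte/pop-word shift: the low byte of the popped word is whatever was already in that stack slot, so the result equals byte << 8 only if you don't care about, or mask off, the low 8 bits.

```python
class Stack68k:
    """Toy model of 68000 A7 accesses: big-endian memory, sp kept even."""

    def __init__(self, size=64):
        self.mem = bytearray(size)
        self.sp = size                       # empty stack, pointer at the top

    def push_word(self, value):              # move.w dN,-(sp)
        self.sp -= 2
        self.mem[self.sp] = (value >> 8) & 0xFF
        self.mem[self.sp + 1] = value & 0xFF

    def push_byte(self, value):              # move.b dN,-(sp): sp still drops by 2
        self.sp -= 2
        self.mem[self.sp] = value & 0xFF     # lands at the even (high-byte) address

    def pop_byte(self):                      # move.b (sp)+,dN: sp still rises by 2
        value = self.mem[self.sp]
        self.sp += 2
        return value

    def pop_word(self):                      # move.w (sp)+,dN
        value = (self.mem[self.sp] << 8) | self.mem[self.sp + 1]
        self.sp += 2
        return value


if __name__ == "__main__":
    s = Stack68k()
    s.push_word(0x12AB)
    print(hex(s.pop_byte()))   # prints 0x12 (high byte, as in the table loop)
    s.push_byte(0x34)
    print(hex(s.pop_word()))   # prints 0x34ab (low byte is the stale 0xAB)
```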
03 April 2021, 13:39 | #91
Registered User
Join Date: Sep 2009
Location: Norway
Posts: 1,709
Is this really faster than lsl.w #8,Dn though? Maybe on a 68000 system with fastmem?
03 April 2021, 13:51 | #92
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,039
2*8 = 16 vs. 6+8*2 = 22 cycles (on a 68000).
03 April 2021, 14:03 | #93
Registered User
Join Date: Sep 2009
Location: Norway
Posts: 1,709
But it's slower on 68020+, right?
03 April 2021, 17:13 | #94
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Right.
But you need one more register and one more instruction.
---
The effect is this, just a couple of hundred bytes. Inspired by Alcatrax/Kefrens bars, with a massive blitter feedback effect and plane tricks. Notice the 'Spectrum' color clash. Of course I will use other colors and maybe other patterns as well. 50fps on A500 (full overscan), but all DMA cycles per line are used (and this is a problem; I will probably be forced to remove something). EDIT: removed, too many faintings have been reported.
Last edited by ross; 03 April 2021 at 18:41.
03 April 2021, 18:07 | #95
Lemon. / Core Design
Join Date: Mar 2016
Location: Tier 5
Posts: 1,209
Are you trying to model what the inside of my brain looked like last night?
03 April 2021, 18:40 | #96
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
03 April 2021, 18:42 | #97
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,214
20 May 2023, 17:03 | #98
Registered User
Join Date: Apr 2023
Location: "Hamcastle"
Posts: 20
I recall an antique sine calculation routine by ray. I've never used it, and I'm not sure it fits in here since it stores longwords, nor do I know anything about its accuracy, but here we go:
Code:
;==============================
;=
;= 32 bytes sinus table generator
;= for a 16.16 fixedpoint sinus table with 1024 entries
;=
;= by ray//.tSCc. 2001
;==============================
        section text
        lea.l   sinus(pc),a0
        move.w  #512-1,d0
.gen_loop
        move.w  d0,d1
        subi.w  #256,d1
        muls.w  d1,d1
        subi.l  #$10000,d1
        move.l  d1,512*4(a0)
        sub.l   d1,(a0)+
        dbra    d0,.gen_loop
;----
        section bss
sinus   ds.l    1024
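A quick Python model of the loop (my own reading, not ray's documentation; function names are hypothetical) suggests what it produces: entry i of the first half is (i+1)*(511-i), a parabola peaking at exactly $10000 (1.0 in 16.16) at index 255, and the second half is its negation. So it looks like a parabolic approximation rather than a true sine, shifted by one index (entry 0 is 511, not 0), and because of sub.l d1,(a0)+ it relies on the bss buffer starting out zeroed. Against a real sine at the matching angle, the worst error should be around 0.056 of full scale, as expected for a plain parabola.

```python
import math


def ray_32_byte_table():
    """Model of ray's loop: 1024 signed 16.16 longwords."""
    sinus = [0] * 1024                 # ds.l 1024 in bss: assumed zero-filled
    i = 0                              # longword index of a0
    for d0 in range(511, -1, -1):      # move.w #512-1,d0 ... dbra d0,.gen_loop
        d1 = d0 - 256                  # move.w d0,d1 / subi.w #256,d1
        d1 = d1 * d1                   # muls.w d1,d1
        d1 -= 0x10000                  # subi.l #$10000,d1 (result is <= 0)
        sinus[i + 512] = d1            # move.l d1,512*4(a0): negative half
        sinus[i] -= d1                 # sub.l d1,(a0)+: 0 - d1, positive half
        i += 1
    return sinus


def max_error_vs_sine(table):
    """Compare entry i with sin(pi*(i+1)/512), the angle the parabola matches."""
    return max(abs(table[i] / 0x10000 - math.sin(math.pi * (i + 1) / 512))
               for i in range(512))


if __name__ == "__main__":
    t = ray_32_byte_table()
    print(t[0], t[255], t[511], round(max_error_vs_sine(t), 4))
```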
20 May 2023, 20:46 | #99
Natteravn
Join Date: Nov 2009
Location: Herford / Germany
Posts: 2,496
21 May 2023, 02:19 | #100
Registered User
Join Date: Jul 2014
Location: Warsaw/Poland
Posts: 171