Coder Competition - Size Coding Sin Table Generator - Page 5

DanScott · 23 March 2021, 16:13

Quote:

Originally Posted by Jobbo

Do you have a fix for your accurate version? It's still in there despite the bug.

Go with my 58-byte version; the 54-byte version needs more bytes added to fix that overrun.

Jobbo · 24 March 2021, 05:56

I came up with my own higher precision version in 40 bytes. Not sure if anyone can spot some extra size optimizations, it's late here!

Code:

	moveq	#0,d0		; d0 = x
	move.w	#512,d1		; d1 = 512-x
.loop:
	move.w	d1,d2
	mulu	d0,d2		; d2 = (512-x)*x

	move.l	d2,d3
	lsl.l	#8,d3
	lsl.l	#5,d3		; d3 = 8192*(512-x)*x

	lsr.l	#3,d2
	move.w	#40960,d4
	sub.w	d2,d4		; d4 = 40960-(512-x)*x/8

	divu	d4,d3		; d3 = (8192*(512-x)*x) / (40960-(512-x)*x/8)

	move.w	d3,(a0)+	; first half of the table
	neg.w	d3
	move.w	d3,(1022,a0)	; negated entry, 512 words further on

	addq	#1,d0
	subq	#1,d1
	bgt.s	.loop
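
For anyone who wants to check the arithmetic off-target, here is a minimal Python sketch of what the loop above computes, assuming the register comments are accurate. It is Bhaskara's approximation sin(x) ~ 16x(pi-x)/(5pi^2-4x(pi-x)) with pi = 512 and amplitude 16384, with numerator and denominator reduced by a common factor of 32. The name `bhaskara_table` is made up for illustration and is not from the thread.

```python
import math

def bhaskara_table():
    """Model of the 40-byte loop: 1024 signed words, amplitude 16384."""
    table = [0] * 1024
    for x in range(512):
        p = (512 - x) * x              # mulu d0,d2
        num = p << 13                  # lsl.l #8 then #5: 8192*p
        den = 40960 - (p >> 3)         # move.w #40960,d4 / sub.w d2,d4
        q = num // den                 # divu d4,d3 (quotient fits in 16 bits)
        table[x] = q                   # move.w d3,(a0)+
        table[x + 512] = -q            # neg.w d3 / move.w d3,(1022,a0)
    return table

table = bhaskara_table()
# Largest absolute deviation from a true sine scaled to 16384.
worst = max(abs(v - 16384 * math.sin(math.pi * i / 512))
            for i, v in enumerate(table))
print(table[0], table[256], table[768], round(worst, 1))
```

The printed worst-case deviation is a quick way to compare against the other entries in the spreadsheet.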

a/b · 24 March 2021, 09:18

d0/d1 is essentially what I'm doing with d0/a1 (but without a muls; my 26-byter was doing this with a muls). You can eliminate the multiplication by increasing counter1 by counter2's value and then decreasing counter2 by 2. The product is now counter1's value.
0*512 = 0, d = +511+2
1*511 = 511, d = +511
2*510 = 1020, d = +509
3*509 = 1527, d = +507
...
becomes
511+2 0
511 0+511
509 0+511+509=1020
507 0+511+509+507=1527
...

You still have to initialize and increment both, it only eliminates the multiplication (-2 bytes), but since you need the product in 2 registers (d2 and d3) you still need an additional move so it goes back to 40 bytes. However it's faster.
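
A small Python sketch of the two-counter idea described above, written only as a check that the running sum really equals x*(512-x). The names `product` and `delta` stand in for counter1 and counter2 and are illustrative, not taken from anyone's source.

```python
def products_by_addition(n=512):
    """Yield x*(n-x) for x = 0..n-1 using only additions."""
    product = 0           # counter1: running product, starts at 0*n
    delta = n + 1         # counter2: 511+2 for n = 512
    for _ in range(n):
        yield product
        delta -= 2        # 511, 509, 507, ...
        product += delta  # 0+511, 0+511+509, ...

values = list(products_by_addition())
print(values[:4])  # [0, 511, 1020, 1527]
```

It works because consecutive products differ by (x+1)*(511-x) - x*(512-x) = 511-2x, so the difference itself drops by 2 each step.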

a/b · 26 March 2021, 17:17

Quote:

Originally Posted by Jobbo

I came up with my own higher precision version in 40 bytes. Not sure if anyone can spot some extra size optimizations, it's late here!

I was experimenting with Bhaskara's algorithm today, and with a 64-bit div it could *almost* be done in 32 bytes (it still overflows at +/-pi). It works with rather large numbers (once you include the 16384 amplitude and the 1024 table size).

The divisor includes 5*pi^2, which with pi being 512 comes to 327680 once the common factor of 4 is cancelled. Swap does no good (kills precision), so the numbers have to be divided further by more than 5 to fit a 16-bit divisor, hence 8. And then I remembered your code, 40960 and those crazy shifts.

So I've integrated 2 optimizations into your code and it's 38 bytes now:
- the first opt is what I described in my previous post (replace the multiplication with a1 increments), which then enables
- the second opt: you can replace 40960-a*b with a counter that starts at 40960 (actually 327680=5*65536 pre-shift) and is then decreased by the same amount you increase a*b by (which is a1)

Code:

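	moveq	#0,d0		; d0 = running product a*b, starts at 0*512
	moveq	#5,d1
	swap	d1		; d1 = 5*65536 = 327680
	move.w	#511+2,a1	; a1 = delta, pre-biased by 2
.loop	move.l	d0,d3
	move.l	d1,d2
	lsl.l	#8,d3
	lsl.l	#8-3,d3		; d3 = 8192*a*b
	lsr.l	#3,d2		; d2 = (327680-a*b)/8
	divu.w	d2,d3
	move.w	d3,(a0)+
	neg.w	d3
	move.w	d3,(1022,a0)
	subq.l	#2,a1		; delta = 511, 509, 507, ...
	sub.l	a1,d1		; divisor counter shrinks by delta
	add.l	a1,d0		; product grows by delta
	bne.b	.loop		; product is back at 0 after 512 entries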
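
As a sanity check, here is a hedged Python model of the counter-based loop next to the multiply-based formula it replaces. It assumes plain unsigned arithmetic and ignores cycle counts; the function names are invented for the comparison. Because the divisor is shifted after the subtraction rather than before it, the two versions can round differently, so the sketch reports the largest difference instead of assuming the tables are identical.

```python
def counter_version():
    """Model of the 38-byte loop; returns the first 512 entries."""
    d0 = 0                  # running product x*(512-x)
    d1 = 5 << 16            # moveq #5,d1 / swap d1: 327680
    a1 = 511 + 2
    out = []
    while True:
        q = (d0 << 13) // (d1 >> 3)   # the two lsl.l, lsr.l #3, divu.w
        out.append(q)
        a1 -= 2
        d1 -= a1            # divisor counter shrinks as the product grows
        d0 += a1
        if d0 == 0:         # bne.b .loop falls through here
            break
    return out

def multiply_version():
    """Reference: the mulu-based formula from the 40-byte version."""
    out = []
    for x in range(512):
        p = (512 - x) * x
        out.append((p << 13) // (40960 - (p >> 3)))
    return out

a, b = counter_version(), multiply_version()
max_diff = max(abs(i - j) for i, j in zip(a, b))
print(len(a), a[256], max_diff)
```

The loop ending on `d0 == 0` mirrors the `bne.b`: the product only returns to zero once x reaches 512.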

Jobbo · 26 March 2021, 20:09

Excellent!

Yeah, I started from Bhaskara for my version.

Funnily enough I had the same moveq #5 and swap trick in some version but could not get the rest reorganized.

The precision is still not as good as your last version.

DanScott · 26 March 2021, 21:59

Bhaskara approximation is great for its time, but carries some degree of error

Jobbo · 26 March 2021, 22:18

My one is in the spreadsheet "Jobbo 2", it's totally respectable in terms of error.

ross · 02 April 2021, 18:28

I'm writing a very tiny intro and I need a circular 256-byte array of unsigned two-quadrant sine values (to add to a bitmap pointer).

I ended up with:

Code:

	;0->$ff->0 circular, len=256
	moveq	#0,d0
	movea.w	#(256*4)+(8/2)-1,a1
.l	subq.l	#8,a1
	move.w	d0,-(sp)
	move.b	(sp)+,(a0)+
	add.l	a1,d0
	bgt.b	.l

16 bytes and it works pretty well!
This thread is full of excellent ideas and it's worth taking advantage of them
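
A rough Python model of the 16-byte loop above, for anyone who wants to look at the values without an emulator. It assumes the push/pop pair simply extracts the high byte of d0's low word, and the name `ross_table` is made up here.

```python
def ross_table():
    """Model of the 16-byte generator: 256 unsigned bytes, 0 -> $ff -> 0."""
    d0 = 0
    a1 = 256 * 4 + 8 // 2 - 1            # movea.w #(256*4)+(8/2)-1,a1
    table = []
    while True:
        a1 -= 8                          # subq.l #8,a1
        table.append((d0 >> 8) & 0xFF)   # move.w d0,-(sp) / move.b (sp)+,(a0)+
        d0 += a1                         # add.l a1,d0
        if d0 <= 0:                      # bgt.b .l falls through
            break
    return table

table = ross_table()
print(len(table), table[0], max(table), table[-1])
```

The accumulator follows 1023n - 4n^2, a parabola that peaks just below 65536 near n = 128 and only drops to zero or below once n reaches 256, which is what ends the loop.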

8bitbubsy · 03 April 2021, 12:05

Pushing a word to the stack, and popping only a byte back. You absolute madman!

I assume you back up the stack pointer before the loop and restore it afterwards?

Antiriad_UK · 03 April 2021, 12:28

Quote:

Originally Posted by 8bitbubsy

Pushing a word to the stack, and popping only a byte back. You absolute madman!

I assume you back up the stack pointer before the loop and restore it afterwards?

The stack automatically stays word aligned. You can fast left shift a byte by 8 by pushing a byte and popping a word too

8bitbubsy · 03 April 2021, 13:39

Quote:

Originally Posted by Antiriad_UK

The stack automatically stays word aligned. You can fast left shift a byte by 8 by pushing a byte and popping a word too

Wow, didn't know that! You learn something new every day.

Is this really faster than lsl.w #8,Dn though? Maybe on a 68000 system with fastmem?

a/b · 03 April 2021, 13:51

2*8 vs. 6+8*2 cycles, i.e. 16 for the push/pop pair vs. 22 for lsl.w #8,Dn on a 68000.

8bitbubsy · 03 April 2021, 14:03

But it's slower on 68020+, right?

ross · 03 April 2021, 17:13

Right.
But you need one more register and one more instruction

---
The effect is this, just a couple of hundred bytes. Inspired by Alcatrax/Kefrens bars, massive blitter feedback effect and plane tricks.
Notice the 'Spectrum' color clash. Of course I will use other colors and maybe other patterns as well.
50fps on A500 (full overscan), but all DMA per line used (and this is a problem, I will probably be forced to remove something..).

EDIT: removed, too many faintings have been reported

DanScott · 03 April 2021, 18:07

Are you trying to model what the inside of my brain looked like last night

ross · 03 April 2021, 18:40

Quote:

Originally Posted by DanScott

Are you trying to model what the inside of my brain looked like last night

Somewhere you have to take a cue

Better if I remove the image, I get a headache just looking at it

Thomas Richter · 03 April 2021, 18:42

Quote:

Originally Posted by Antiriad_UK

The stack automatically stays word aligned. You can fast left shift a byte by 8 by pushing a byte and popping a word too

That does not work in general. Pushing a byte will decrement the stack pointer by 2, but it will not clear the extra byte.
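
To make that concrete, here is a toy Python model of the stack memory, assuming a big-endian 68k where a byte push on a7 moves the pointer by 2 and writes only the even-address byte. The function and its parameters are illustrative only.

```python
def push_byte_pop_word(value, stale_low_byte):
    """Model move.b d0,-(sp) followed by move.w (sp)+,d0."""
    memory = bytearray(16)
    sp = 8
    memory[sp - 1] = stale_low_byte    # whatever was left just below the stack top
    sp -= 2                            # byte push on a7 still moves sp by 2
    memory[sp] = value & 0xFF          # only the even (high) byte is written
    word = (memory[sp] << 8) | memory[sp + 1]   # word pop reads both bytes
    sp += 2
    return word

print(hex(push_byte_pop_word(0x12, 0x00)))  # 0x1200
print(hex(push_byte_pop_word(0x12, 0xAB)))  # 0x12ab
```

So the result is a clean shift by 8 only when the low byte happens to be zero already, or when the caller does not care about it.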

Rotschi · 20 May 2023, 17:03

I recall an antique sine calc routine by ray. I've never used it and I'm not so sure if it fits in here since it stores longwords, nor do I know anything about its accuracy, but here we go:

Code:

;==============================
;=
;=  32 bytes sinus table generator
;= for a 16.16 fixedpoint sinus table with 1024 entries
;=
;= by ray//.tSCc.      2001
;==============================

        section    text

        lea.l    sinus(pc),a0
        move.w    #512-1,d0

.gen_loop
        move.w    d0,d1
        subi.w    #256,d1
        muls.w    d1,d1
        subi.l    #$10000,d1

        move.l    d1,512*4(a0)

        sub.l    d1,(a0)+

        dbra    d0,.gen_loop
;----
        section    bss
sinus   ds.l    1024
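
For a feel of what the routine produces, here is a small Python model of the loop. It assumes the BSS table starts out zeroed, which the `sub.l d1,(a0)+` relies on, and the name `ray_table` is invented for this sketch.

```python
def ray_table():
    """Model of the 32-byte generator: 1024 longwords in 16.16 fixed point."""
    table = [0] * 1024                   # assumes the BSS section starts zeroed
    a0 = 0
    for d0 in range(511, -1, -1):        # dbra runs d0 = 511 .. 0
        d1 = (d0 - 256) ** 2 - 0x10000   # subi.w / muls.w / subi.l
        table[a0 + 512] = d1             # move.l d1,512*4(a0)
        table[a0] -= d1                  # sub.l d1,(a0)+
        a0 += 1
    return table

table = ray_table()
print(table[0], table[255], table[511], table[767])
```

Each half of the table is a parabola, 65536 - (255 - i)^2 for the first 512 entries and its negation for the rest, so the peak sits at entry 255 and entry 0 is slightly above zero.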

phx · 20 May 2023, 20:46

Quote:

Originally Posted by Rotschi

nor do I know anything about its accuracy

Although it is not a real sine, it is probably good enough for simple effects.

Quote:

Code:

        lea.l    sinus(pc),a0

I highly doubt this is Amiga code! Looks like Atari, or X68000, or some console?

Cyprian · 21 May 2023, 02:19

Quote:

Originally Posted by phx

I highly doubt this is Amiga code! Looks like Atari, or X68000, or some console?

yep:

Code:

ray//.tSCc.

anyway, what's wrong with "lea.l sinus(pc),a0"?
