Any Sine Scroller Optimizations I've Missed?

Antiriad_UK · 09 April 2019, 11:24

Hi all,
I'm putting together a small demo. One of the features is a sine scroller and I'm looking for some more optimizations to try to make it faster.

Bit of background. I did some amiga demo coding in 1991/92 and I've decided that I wanted to put together a demo with some of my old routines "done proper" (much of my old source won't even compile, and looks awful compared to how I program now) and also some of the later stuff I never released. And also some things I never worked out how to do but want to try (glenz vectors I'm looking at you). The limits I've set myself for this are:
- Amiga A500 oldskool spec.
- Must run without glitches on A1200/A4000/accelerated machines with real fast RAM
- No self-modifying code
- No blitter nasty/Chipmem abuse with skipping of BlitWaits

So far in this demo I've got a sine scroller, wireframe/filled convex vectors, inconvex transforming vectors, bob snake.

The sine scroller looks as attached. The screen is 352 pixels wide and 178 high. There are two bitplanes active (for a lame shadow effect), with a 16 colour logo at the top. The font is 28 pixels high. Without music that leaves me with about 10 scanlines of time free; a font height of 30 starts skipping frames. There are two modulated sine waves to give those interesting effects rather than a straight sine.

Ideally I wanted 32 pixels high, which I can achieve by ditching the overscan and logo. But I really don't like scrollers that aren't overscan.

My basic routine is the same as one I did in 1990. Scrolling message in a back buffer is sliced into pixel wide masks and blitted to the screen buffer.

Current optimizations done:
- No screen clear, first blit of each word is a straight copy with some padding to "wipe" the screen.
- Sine values are premultiplied by the screen width (two sines complicates this slightly, and looking at screenshot I may have a rounding error here)
- Each word loop is unrolled (macro below)
- Loading blitter registers as fast as possible after blit wait
- Reduced memory access to bare minimum, all registers are filled. Some are using both high and low words to store data with swaps as needed.
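
The premultiplied sine table from the list above can be sketched roughly like this. It is a Python model rather than 68000 code, and the 44-byte row (352 pixels / 8, non-interleaved), the amplitude of 20 rows and the function name are assumptions for illustration, not values taken from the demo.

```python
import math

BYTES_PER_ROW = 352 // 8   # assumed: one non-interleaved 352-pixel bitplane row
ENTRIES = 1024             # sine table length mentioned in the macro comments

def premultiplied_sine(amplitude_rows, entries=ENTRIES, row_bytes=BYTES_PER_ROW):
    """Return byte offsets: (sine rounded to whole rows) * bytes per row."""
    table = []
    for i in range(entries):
        rows = round(amplitude_rows * math.sin(2 * math.pi * i / entries))
        table.append(rows * row_bytes)
    return table

# Each entry can be added straight to a screen address, with no multiply per column.
table = premultiplied_sine(20)
```

Because every entry is a whole number of rows, adding entries from two such tables still lands on a row boundary.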

Code:

; Do OR blit - used for the 3rd-16th blits of a word
BlitSine	MACRO
	move.l	a3,a6				;scr adr
	swap	d6				;sine1 offset to loword
	add.w 	(a0,d6.w),a6		        ;add sine 1 (premult)
	swap	d6				;sine2 offset to loword
	add.w	(a0,d6.w),a6		        ;add sine 2 (premult)
	add.l	d3,d6				;Increase sine1/sine2 offsets and mask to 0-2047 (1024 sine entries, in words)
	and.l	d5,d6				;And with SineScroller_Sine_Offset_Mask (two offsets anded at once)

	lea	BLTBPTH(a5),a1			;for quick blitter loading

	move.w	d7,DMACON(a5)			;Blitter nasty on
.bw\@	btst.b	d1,DMACONR(a5)			;Blitwait
	bne.s	.bw\@
	move.w	d4,DMACON(a5)			;Blitter nasty off

	; write blt registers as fast as possible
	move.w	d2,(a2)				;BLTAFWM($44)
	move.l	a6,(a1)+			;BLTBPTH - $4c
	move.l	a4,(a1)+			;BLTAPTH - $50
	move.l	a6,(a1)+			;BLTDPTH - $54
	move.w	d0,(a1)				;BLTSIZE - $58

	ror.w 	#1,d2				;rot mask
	ENDM
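
The add.l/and.l pair in the macro steps both sine offsets with a single add and a single mask. A small Python model of that register trick is below; the helper names and the example step values are made up for illustration, and the mask value $07FE07FE is inferred from the "0-2047, 1024 entries in words" comment rather than taken from the source.

```python
MASK_WORD = 0x07FE                          # even byte offsets 0..2046 into 1024 words
MASK_LONG = (MASK_WORD << 16) | MASK_WORD   # plays the role of d5

def pack(sine1, sine2):
    """Pack two 16-bit byte offsets into one 32-bit value (sine1 high, sine2 low)."""
    return ((sine1 & 0xFFFF) << 16) | (sine2 & 0xFFFF)

def step(offsets, steps):
    """Model of add.l d3,d6 followed by and.l d5,d6 on a 32-bit register."""
    return ((offsets + steps) & 0xFFFFFFFF) & MASK_LONG

def unpack(offsets):
    """Split the packed value back into (sine1, sine2)."""
    return (offsets >> 16) & 0xFFFF, offsets & 0xFFFF

# sine1 wraps past the end of the table, sine2 just advances.
result = unpack(step(pack(2040, 10), pack(12, 4)))
```

The low word never carries into the high word here: both the masked offset and the step stay at or below $07FE, so their sum fits in 16 bits before the mask is applied.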

I saw on Photon's site he made this statement:

Quote:

These optimizations together allow a full-screen, 25px high sine scroller, and there is rastertime to spare.
Note that to show the principle and general optimization techniques, I've chosen the second fastest sine scroller technique. Know that there is also the C2P-shift type sine scroller - and that the source here can be adapted to mask several bit-slices to single blits, too, to optimize it substantially. Try it!

I'm not familiar with the C2P method? What is that? I'd guess it's a complete rewrite so I doubt I'd look at that - but interested in the idea.

Also I can't see the meaning in the "mask several bit-slices to single blits" - I'm still not thinking in "insane optimizations mode" yet

Apart from that, any other tricks?

meeku · 09 April 2019, 12:09

Screenshot looks nice, I see what you mean with the precision issue. You could try using a small amount of fixed point for the offset with rounding to get rid of the odd pixel gaps.
What about using the cpu and blitter combined? Maybe 8 pixels with cpu and 24 with blitter assuming 68000 cpu (I've not personally done a sine scroll on Amiga.. only on pc back in the day)?
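
One way to read the fixed-point suggestion is to keep a few fractional bits in both sine values, add them, and round the sum once instead of rounding each sine separately. The sketch below compares the two in Python; the 4 fractional bits, the amplitudes and the second wave's frequency are assumptions, and whether this is the actual cause of the gaps in the screenshot is not established here.

```python
import math

FRAC_BITS = 4            # assumed: 4 fractional bits of fixed point
ONE = 1 << FRAC_BITS

def rows_rounded_separately(a, b):
    """Round each sine to whole rows first, then add (what premultiplied tables do)."""
    return round(a) + round(b)

def rows_fixed_point(a, b):
    """Add in fixed point, then round the sum once to whole rows."""
    total = round(a * ONE) + round(b * ONE)   # fixed-point sum
    return (total + ONE // 2) >> FRAC_BITS    # round half up

def worst_difference(amp1=20.0, amp2=9.0, entries=1024):
    """Largest row difference between the two methods over one table period."""
    worst = 0
    for i in range(entries):
        a = amp1 * math.sin(2 * math.pi * i / entries)
        b = amp2 * math.sin(2 * math.pi * 3 * i / entries)
        diff = abs(rows_rounded_separately(a, b) - rows_fixed_point(a, b))
        worst = max(worst, diff)
    return worst
```

The two methods can disagree by one row, for example when both sines are at 0.4 of a row. The trade-off is that a fixed-point sum can no longer be premultiplied by the row width, so the rounded row would need a multiply or a row-offset lookup.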

zero · 10 April 2019, 10:35

You are blitting the text one one-pixel column at a time, is that what you are saying?

The fastest way to do this is usually to plot the start and end pixels of each vertical span with the CPU, and then use the blitter to do a per-column vertical fill in one pass. Doing that I was able to do unlimited height text (whole screen filled).

Antiriad_UK · 10 April 2019, 11:11

Yes, I'm essentially blit copying 352 columns of 1 pixel.

Lol, I must admit I didn't even realise the blitter could do vertical fills. Have you got a screenshot of what that looked like to put it into perspective?

DanScott · 10 April 2019, 13:21

Have a look at Digital Innovation demo, I think I was one of the first to do the blitter vertical fill method.. albeit rather badly coded back then

Vertical fill is really useful for a lot of techniques... bitmap scaling etc..

You could also take a look at Blast From The Past.. 1 pixel zooming 2 bitplane sinescroll I did.

Innerloop was this:

Code:

.BlitLoop
		rept	16
		move.w	(a0)+,d1
		move.w	(a0)+,d0
		lea		(a1,d0.w),a3
		add.w	(a2,d1.w),a3
		BlitWait_Inline
		move.w	(a0)+,(a4)			;BltCon0
		move.w	(a0)+,BLTAFWM(a6)
		move.l	a3,BLTAPTH(a6)
		move.l	d5,BLTBPTH(a6)
		move.l	d5,BLTDPTH(a6)
		move.w	d6,(a5)				;BltSize
		endr
		
		addq.l	#2,d5
		dbf		d7,.BlitLoop

ross · 10 April 2019, 15:23

When you write wrong code and create interesting effect

Antiriad_UK · 11 April 2019, 00:00

Quote:

Originally Posted by ross

When you write wrong code and create interesting effect :D

:D

Antiriad_UK · 11 April 2019, 00:10

Quote:

Originally Posted by DanScott

Have a look at Digital Innovation demo, I think I was one of the first to do the blitter vertical fill method.. albeit rather badly coded back then

Vertical fill is really useful for a lot of techniques... bitmap scaling etc..

You could also take a look at Blast From The Past.. 1 pixel zooming 2 bitplane sinescroll I did.

Innerloop was this:

Code:

.BlitLoop
		rept	16
		move.w	(a0)+,d1
		move.w	(a0)+,d0
		lea		(a1,d0.w),a3
		add.w	(a2,d1.w),a3
		BlitWait_Inline
		move.w	(a0)+,(a4)			;BltCon0
		move.w	(a0)+,BLTAFWM(a6)
		move.l	a3,BLTAPTH(a6)
		move.l	d5,BLTBPTH(a6)
		move.l	d5,BLTDPTH(a6)
		move.w	d6,(a5)				;BltSize
		endr
		
		addq.l	#2,d5
		dbf		d7,.BlitLoop

Shit, you did that? Awesome. I remember seeing that scroller and having no idea wtf was going on, as my sine scroller was 15px at the time. 1990/91 was my fave time for getting demos in the post and being baffled.

I actually rewatched that a few weeks ago when I was getting back into this and archiving my amiga disks and still not understanding what was going on. I thought it looked like some vector letters with maybe some sine tricks in there. But I just re-watched it after what you said and noticed the letter B looked like a bitmap font. So I've still no idea what is going on there. It's like a copy and stretch but with a 1px sine in there.

I just got a glenz vector running, so I can cross that off my bucket list

zero · 11 April 2019, 11:06

My favourite example of this technique is the Crystal intro with the bunny at the start and the kind of water effect.

I can't remember exactly how you set the blitter up for it now, from memory you have first source as line 0, second source as line 1, and destination line 1, and XOR function.

Essentially it just keeps copying the same word from one line to the next, xoring with the next line so that it "picks up" bits and smears them downwards.
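
The setup described above (previous line XOR current line, written back to the current line) can be modelled in a few lines of Python. This is only a model of the idea, not blitter code; the function names and the (bit, top, bottom) span format are hypothetical.

```python
def span_markers(height, spans):
    """Plot one marker on the first row of each span and one on the row below it.

    spans: list of (column_bit, top, bottom), bottom inclusive (hypothetical format).
    """
    rows = [0] * (height + 1)          # one spare row for the closing marker
    for bit, top, bottom in spans:
        rows[top] ^= 1 << bit
        rows[bottom + 1] ^= 1 << bit
    return rows

def vertical_xor_fill(rows):
    """Model of D(line n) = A(line n-1, already filled) XOR B(line n)."""
    filled = list(rows)
    for y in range(1, len(filled)):
        filled[y] = filled[y - 1] ^ filled[y]
    return filled

# Column bit 0 is filled on rows 1..3, column bit 1 on row 2 only.
filled = vertical_xor_fill(span_markers(6, [(0, 1, 3), (1, 2, 2)]))
```

Each marker toggles its column on or off as the fill moves down, which is why only the start and end of every vertical span need plotting.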

ross · 11 April 2019, 19:43

Sininsanity:

No split, no modulotrick, no charanim, pure blitter craziness.

(and some magic)

Galahad/FLT · 11 April 2019, 20:08

Quote:

Originally Posted by ross

Sininsanity:

No split, no modulotrick, no charanim, pure blitter craziness.

(and some magic)

Sinecredible!

ross · 11 April 2019, 20:50

Quote:

Originally Posted by Galahad/FLT

Sinecredible!

Hi Sir Galahad!
Actually my post is a bit off-topic because it lacks the -scroll part.

Anyway only my laziness prevented me from using different tiles (16x*ny resolution supported).
And there are some glitches.. (they are well hidden).

One day maybe a real usable routine will be born.

ross · 11 April 2019, 21:17

Part of the wild loop:

Code:

.w1	btst	d3,(a6)
	bne.b	.w1
.b0	move.l	a5,bltdpt-2(a6)
	move.w	#$0100,bltcon0-2(a6)
	move.w	d1,bltsize-2(a6)

	lea	giro/8*1(a0),a5	
	move.w	d2,bltamod-2(a6)
	move.w	d2,bltbmod-2(a6)
	move.w	(a3)+,bltcdat-2(a6)	; mask
	move.l	a0,bltapt-2(a6)		; from big shift buffer
	move.l	a5,bltbpt-2(a6)
.w2	btst	d3,(a6)
	bne.b	.w2
	move.w	#$0DD8,bltcon0-2(a6)
	move.w	#(h-1)<<6|1,bltsize-2(a6)

.b1	move.w	(a3)+,d1
	beq.b	.nx
	lea	(a4),a5
	lea	-128/8*2(a5),a4
.w3	btst	d3,(a6)
	bne.b	.w3
	move.w	d3,bltamod-2(a6)
	move.w	d3,bltbmod-2(a6)
	move.w	d1,bltcdat-2(a6)
	move.l	a4,bltapt-2(a6)
	move.l	a5,bltbpt-2(a6)
	move.l	a4,bltdpt-2(a6)
	move.w	#(h+1)<<6|1,bltsize-2(a6)

Ugh, I'm writing to blitter registers while the previous operation is running

No destination update

Minterm $D8
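
For anyone decoding the minterm: with the usual blitter numbering (bit A*4 + B*2 + C of the minterm byte gives the output for that input combination), $D8 works out to "B where the C mask is set, A elsewhere". The Python sketch below checks that reading; the function names are made up, and it models only the logic function, not the blitter's shifting or masking.

```python
def apply_minterm(minterm, a, b, c, width=16):
    """Evaluate an 8-bit blitter-style minterm on three words, bit by bit."""
    out = 0
    for bit in range(width):
        index = (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) | ((c >> bit) & 1)
        out |= ((minterm >> index) & 1) << bit
    return out

def select(a, b, c, width=16):
    """B where the C mask is set, A elsewhere."""
    full = (1 << width) - 1
    return ((b & c) | (a & ~c)) & full

# Low byte comes from B (mask set), high byte from A (mask clear).
merged = apply_minterm(0xD8, 0xF0F0, 0x3333, 0x00FF)
```

In the loop above only channels A, B and D are enabled (the $0D in $0DD8), so C is not fetched by DMA and the mask comes from whatever was last written to bltcdat.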

Antiriad_UK · 12 April 2019, 09:20

Quote:

Originally Posted by ross

Sininsanity:

No split, no modulotrick, no charanim, pure blitter craziness.

(and some magic)

What the hell!

Rmt · 12 April 2019, 12:02

One of the most important optimisations for a sine scroll is to not wait for the blitter and to use the BLTPRI bit in the DMACON register. If you use several bitplanes, use interleaved bitplanes. You can also use movem for loading blitter registers.

My sine scroll inner loop uses only 6 instructions for each column:

move.w (a0)+,d3
lea (a1,d3.w),a4
move.l a4,a2
movem.l a2-a4,(a5)
move.w #1<<4,$44(a6)
move.w d0,$58(a6)

Example here:
https://github.com/se-bo/amiga/blob/...llerscroll_2.s

Antiriad_UK · 13 April 2019, 09:26

Quote:

Originally Posted by Rmt

One of the most important optimisation with sine scroll is to not wait for blitter and use BLTPRI bit in DMACON register. If you use several bitplanes, use interleaved bitplanes. You can also use movem for loading blitter registers.

Cheers. Although I'm targeting an old school A500 spec I want it to run OK on an A1200 with real fast mem (yeah I know this is a self imposed limitation). When I enabled blit nasty and removed the blitwait it was glitching on an A1200 (emulated). I read some other posts on here that said that there may be a free DMA cycle even with blit nasty on. I'll recheck.

ross · 13 April 2019, 10:29

Quote:

Originally Posted by Rmt

One of the most important optimisation with sine scroll is to not wait for blitter and use BLTPRI bit in DMACON register. If you use several bitplanes, use interleaved bitplanes. You can also use movem for loading blitter registers.

Even if this can work on an A500, you are asking for problems on bigger machines (there are different ways to respect the end of blitter operations,
in any case you have to do it and remember that your code must work everywhere).
Anyway these are not the most important optimizations, only instrumental for the 'big picture' and if you need to squeeze some cycles.
The right use of BLTPRI can be really useful to 'overlay' jobs between blitter and CPU, so it isn't an on/off rule.

Quote:

Originally Posted by Antiriad_UK

I read some other posts on here that said that there may be a free DMA cycle even with blit nasty on. I'll recheck.

Yep, see blitter cycles on HRM.
Sometimes I have also changed the channels used even if I had to do the same operation because some combinations leave free CPU/Chip access cycles.

hooverphonique · 13 April 2019, 16:38

Quote:

Originally Posted by Antiriad_UK

Cheers. Although I'm targeting an old school A500 spec I want it to run on an A1200 with real fast mem ok (yeah I know this is a self imposed limitation). When I enabled blit nasty and removed the blitwait it was glitching on an A1200 (emulated). I read some other posts on here that said that there may be a free DMA cycle even with blit nasty on. I'll recheck.

Nasty has no effect on code running from real fastmem.

Photon · 14 April 2019, 00:18

Quote:

Originally Posted by Antiriad_UK

I saw on Photon's site he made this statement:

Quote:

Originally Posted by Photon

These optimizations together allow a full-screen, 25px high sine scroller, and there is rastertime to spare.
Note that to show the principle and general optimization techniques, I've chosen the second fastest sine scroller technique. Know that there is also the C2P-shift type sine scroller - and that the source here can be adapted to mask several bit-slices to single blits, too, to optimize it substantially. Try it!

I'm not familiar with the C2P method? What is that? I'd guess it's a complete rewrite so I doubt I'd look at that - but interested in the idea.

Also I can't see the meaning in the "mask several bit-slices to single blits" - I'm still not thinking in "insane optimizations mode" yet

This was for a less steep learning curve, but I still wanted to make clear there was a more optimal technique. Sine is a vertical shift or skew operation; the shift-by-mask part of a C2P.

The article is written to allow easy modification of the source to combine mask %1 to %11 if two slices are the same Y. Or %10001 if the slices are apart, but same Y ("curve dip"). And so on.
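
The slice-combining idea can be illustrated with a short Python sketch: group the 16 columns of one word by their Y position and build one mask per distinct Y, so each group needs a single blit. The function name and the example Y values are invented for illustration and are not taken from the article's source.

```python
def combine_slices(y_per_column):
    """Group the 16 columns of one word by Y, giving one blit mask per distinct Y.

    y_per_column[0] is the leftmost pixel (mask bit 15).
    Returns (y, mask) pairs in order of first appearance.
    """
    masks = {}
    for column, y in enumerate(y_per_column):
        masks[y] = masks.get(y, 0) | (0x8000 >> column)
    return list(masks.items())

# A gentle "curve dip": the same Y shows up on both sides of the word.
ys = [5, 5, 6, 6, 6, 7, 7, 7, 7, 7, 7, 6, 6, 6, 5, 5]
blits = combine_slices(ys)
```

For this word the 16 single-pixel blits collapse to 3. The saving shrinks as the curve gets steeper, since fewer columns share a Y.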

Also, the article series selected effects from a certain year.

In 1988, sine scrollers started out with reduced wave resolution, e.g. slice width 2 or 4 to save time (and maybe put something else on screen), but 1px was soon reached. To demonstrate 1px, the curve must slant >=45 degrees at some point on the curve. Once you reach that, you can maximize screen height to demonstrate that you don't clear the whole screen I guess, but... the real measure remains the size of the scrollbuffer.

To go beyond ~320x25 you must combine bit-slices. This will increase the size a little to say ~320x64, but will not come near this faster technique giving 336x258 in Blu Sky.

ross · 14 April 2019, 01:03

Quote:

Originally Posted by Photon

but will not come near this faster technique giving 336x258..

A scroll of this size (seeing from the video it looks like a ~4 blits / 16px, mine is ~3 because of softer slope)
requires relatively less buffer cleaning than a multistring one.
When you have to combine several scrolls (also superimposed) then you really have to do dirty tricks..
