English Amiga Board


Old 09 April 2019, 11:24   #1
Antiriad_UK
OCS forever!
 
Join Date: Mar 2019
Location: Birmingham, UK
Posts: 418
Any Sine Scroller Optimizations I've Missed?

Hi all,
I'm putting together a small demo. One of the features is a sine scroller, and I'm looking for some more optimizations to try to get it faster.

Bit of background: I did some Amiga demo coding in 1991/92, and I've decided to put together a demo with some of my old routines "done proper" (much of my old source won't even compile, and looks awful compared to how I program now), plus some of the later stuff I never released and some things I never worked out how to do but want to try (glenz vectors, I'm looking at you). The limits I've set myself for this are:
- Amiga A500 oldskool spec.
- Must run without glitches on A1200/A4000/accelerated machines with real fast RAM
- No self-modifying code
- No blitter nasty/Chipmem abuse with skipping of BlitWaits

So far in this demo I've got a sine scroller, wireframe/filled convex vectors, non-convex transforming vectors, and a bob snake.

The sine scroller looks as attached. The screen is 352 pixels wide and 178 high. There are two bitplanes active (for a lame shadow effect), plus a 16-colour logo at the top. The font is 28 pixels high. Without music, that leaves me about 10 scanlines of free time. A font height of 30 starts skipping frames. There are two modulated sine waves to give those interesting effects rather than a straight sine.

Ideally I wanted 32 pixels high, which I can achieve by ditching the overscan and logo. But I really don't like scrollers that aren't overscan.

My basic routine is the same as one I did in 1990: the scrolling message in a back buffer is sliced into 1-pixel-wide columns with a mask and blitted to the screen buffer.

Current optimizations done:
- No screen clear, first blit of each word is a straight copy with some padding to "wipe" the screen.
- Sine values are premultiplied by the screen width (two sines complicates this slightly, and looking at screenshot I may have a rounding error here)
- Each word loop is unrolled (macro below)
- Loading blitter registers as fast as possible after blit wait
- Reduced memory access to bare minimum, all registers are filled. Some are using both high and low words to store data with swaps as needed.
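
A minimal sketch of the premultiplied-sine idea from the list above, not the poster's actual code. It assumes a hypothetical table `SineY` of 1024 Y values in whole scanlines, a hypothetical output table `SineOfs`, and non-interleaved bitplanes, so one row is 352/8 = 44 bytes. Because every entry is a whole number of rows before the multiply, two such offsets added at blit time still sum to a whole number of rows; scaling or rounding after the multiply is one way a rounding error like the one mentioned could creep in.

```asm
; Sketch only. SineY, SineOfs and the constants are hypothetical names.
SCR_BYTEWIDTH	equ	352/8			;44 bytes per bitplane row (non-interleaved assumed)
SINE_ENTRIES	equ	1024

PremultSine:
	lea	SineY,a0			;source: Y in whole scanlines, 0..177
	lea	SineOfs,a1			;dest: byte offsets into the bitplane
	move.w	#SINE_ENTRIES-1,d7
.loop	move.w	(a0)+,d0
	mulu.w	#SCR_BYTEWIDTH,d0		;rows -> bytes (177*44 = 7788, fits a word)
	move.w	d0,(a1)+
	dbf	d7,.loop
	rts
```

The inner loop can then add a table entry straight onto the screen pointer without any multiply per column.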

Code:
; Do OR blit - used for the 3rd-16th blits of a word
BlitSine	MACRO
	move.l	a3,a6				;scr adr
	swap	d6				;sine1 offset to loword
	add.w 	(a0,d6.w),a6		        ;add sine 1 (premult)
	swap	d6				;sine2 offset to loword
	add.w	(a0,d6.w),a6		        ;add sine 2 (premult)
	add.l	d3,d6				;Increase sine1/sine2 offsets and mask to 0-2047 (1024 sine entries, in words)
	and.l	d5,d6				;And with SineScroller_Sine_Offset_Mask (two offsets anded at once)

	lea	BLTBPTH(a5),a1			;for quick blitter loading

	move.w	d7,DMACON(a5)			;Blitter nasty on
.bw\@	btst.b	d1,DMACONR(a5)			;Blitwait
	bne.s	.bw\@
	move.w	d4,DMACON(a5)			;Blitter nasty off

	; write blt registers as fast as possible
	move.w	d2,(a2)				;BLTAFWM($44)
	move.l	a6,(a1)+			;BLTBPTH - $4c
	move.l	a4,(a1)+			;BLTAPTH - $50
	move.l	a6,(a1)+			;BLTDPTH - $54
	move.w	d0,(a1)				;BLTSIZE - $58

	ror.w 	#1,d2				;rot mask
	ENDM
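
The macro updates both sine positions with a single `add.l` and a single `and.l`, which relies on constants set up outside it. A sketch of what that setup could look like; `STEP1`, `STEP2`, `SINE_MASK` and the zero start value are assumptions, not the poster's values:

```asm
; Sketch only: the step values and start offsets are hypothetical.
SINE_MASK	equ	$07fe				;1024 word entries -> even byte offsets 0..2046
STEP1		equ	4				;sine 1 advance per column, in bytes (even)
STEP2		equ	6				;sine 2 advance per column, in bytes (even)

	move.l	#(STEP1<<16)|STEP2,d3		;both steps in one longword
	move.l	#(SINE_MASK<<16)|SINE_MASK,d5	;both masks in one longword
	moveq	#0,d6				;hi word = sine 1 offset, lo word = sine 2 offset
; add.l d3,d6 advances both halves at once. The low half cannot carry into
; the high half because 2046+STEP2 is far below $10000, and and.l d5,d6
; then wraps both halves back into the table.
```

The layout (sine 1 in the high word, sine 2 in the low word) follows from the two `swap` instructions in the macro, which leave `d6` as they found it.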
I saw on Photon's site he made this statement:
Quote:
These optimizations together allow a full-screen, 25px high sine scroller, and there is rastertime to spare.
Note that to show the principle and general optimization techniques, I've chosen the second fastest sine scroller technique. Know that there is also the C2P-shift type sine scroller - and that the source here can be adapted to mask several bit-slices to single blits, too, to optimize it substantially. Try it!
I'm not familiar with the C2P method. What is that? I'd guess it's a complete rewrite, so I doubt I'd look at that, but I'm interested in the idea.

Also, I can't see the meaning of "mask several bit-slices to single blits". I'm still not thinking in "insane optimizations mode" yet.

Apart from that, any other tricks?
Attachment: 28pxrasterlines.png
Old 09 April 2019, 12:09   #2
meeku
Registered User
 
Join Date: Apr 2019
Location: Kings Lynn
Posts: 17
The screenshot looks nice, and I see what you mean about the precision issue. You could try using a few bits of fixed point for the offset, with rounding, to get rid of the odd pixel gaps.
What about using the CPU and blitter combined? Maybe 8 pixels with the CPU and 24 with the blitter, assuming a 68000 CPU (I've not personally done a sine scroller on the Amiga, only on PC back in the day).
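
A minimal sketch of the fixed-point-with-rounding suggestion. The 8.8 format and the register use are assumptions made for illustration; the row size comes from the 352-pixel screen in the first post.

```asm
; Sketch only: the 8.8 format and register use are assumptions.
SCR_BYTEWIDTH	equ	352/8

; d0.w = signed Y displacement in 8.8 fixed point, |d0| well below $7f80
	add.w	#$0080,d0			;+0.5 so the shift rounds to nearest
	asr.w	#8,d0				;8.8 -> whole scanlines
	muls.w	#SCR_BYTEWIDTH,d0		;scanlines -> byte offset for the blit pointers
```

With two modulated sines, summing the fixed-point values and rounding once keeps the error within half a line, whereas rounding each sine separately can be off by a full line. The cost is that the conversion then has to happen per column instead of in a precalculated table.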
Old 10 April 2019, 10:35   #3
zero
Registered User
 
Join Date: Jun 2016
Location: UK
Posts: 428
You are blitting the text one one-pixel column at a time, is that what you are saying?

The fastest way to do this is usually to plot the start and end pixels of each vertical span with the CPU, and then use the blitter to do a per-column vertical fill in one pass. Doing that I was able to do unlimited height text (whole screen filled).
Old 10 April 2019, 11:11   #4
Antiriad_UK
Yes, I'm essentially blit-copying 352 columns of 1 pixel each.

Lol, I must admit I didn't even realise the blitter could do vertical fills. Have you got a screenshot of what that looked like to put it into perspective?
Old 10 April 2019, 13:21   #5
DanScott
Lemon. / Core Design
 
Join Date: Mar 2016
Location: Tier 5
Posts: 1,212
Have a look at Digital Innovation demo, I think I was one of the first to do the blitter vertical fill method.. albeit rather badly coded back then

Vertical fill is really useful for a lot of techniques... bitmap scaling etc..

You could also take a look at Blast From The Past.. 1 pixel zooming 2 bitplane sinescroll I did.

Innerloop was this:

Code:
.BlitLoop
		rept	16
		move.w	(a0)+,d1
		move.w	(a0)+,d0
		lea		(a1,d0.w),a3
		add.w	(a2,d1.w),a3
		BlitWait_Inline
		move.w	(a0)+,(a4)			;BltCon0
		move.w	(a0)+,BLTAFWM(a6)
		move.l	a3,BLTAPTH(a6)
		move.l	d5,BLTBPTH(a6)
		move.l	d5,BLTDPTH(a6)
		move.w	d6,(a5)				;BltSize
		endr
		
		addq.l	#2,d5
		dbf		d7,.BlitLoop
Old 10 April 2019, 15:23   #6
ross
Defendit numerus
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,475
When you write wrong code and create interesting effect

Attachment: output.gif
Old 11 April 2019, 00:00   #7
Antiriad_UK
Quote:
Originally Posted by ross View Post
When you write wrong code and create interesting effect :D

:D
Old 11 April 2019, 00:10   #8
Antiriad_UK
Quote:
Originally Posted by DanScott View Post
Have a look at Digital Innovation demo, I think I was one of the first to do the blitter vertical fill method.. albeit rather badly coded back then

Vertical fill is really useful for a lot of techniques... bitmap scaling etc..

You could also take a look at Blast From The Past.. 1 pixel zooming 2 bitplane sinescroll I did.
Shit, you did that? Awesome. I remember seeing that scroller and having no idea wtf was going on, as my sine scroller was 15px at the time. 1990/91 was my fave time for getting demos in the post and being baffled.

I actually rewatched that a few weeks ago when I was getting back into this and archiving my Amiga disks, and I still didn't understand what was going on. I thought it looked like some vector letters with maybe some sine tricks in there. But I just re-watched it after what you said and noticed the letter B looked like a bitmap font. So I've still no idea what is going on there. It's like a copy and stretch but with a 1px sine in there.

I just got a glenz vector running, so I can cross that off my bucket list.
Old 11 April 2019, 11:06   #9
zero
My favourite example of this technique is the Crystal intro with the bunny at the start and the kind of water effect.

I can't remember exactly how you set the blitter up for it now. From memory, you have the first source as line 0, the second source as line 1, the destination as line 1, and an XOR function.

Essentially it just keeps copying the same word from one line to the next, xoring with the next line so that it "picks up" bits and smears them downwards.
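
An untested sketch of the blitter setup described from memory above, written as one full-width blit over the whole area. The register equates are the standard custom-chip offsets; `Screen`, `SCR_BYTEWIDTH` and `SCR_HEIGHT` are hypothetical names, and the full-width layout is an assumption rather than what any of the demos mentioned actually do.

```asm
; Sketch only: Screen and the screen constants are hypothetical.
DMACONR		equ	$002
BLTCON0		equ	$040
BLTAFWM		equ	$044
BLTBPTH		equ	$04c
BLTAPTH		equ	$050
BLTDPTH		equ	$054
BLTSIZE		equ	$058
BLTBMOD		equ	$062
BLTAMOD		equ	$064
BLTDMOD		equ	$066
SCR_BYTEWIDTH	equ	352/8
SCR_HEIGHT	equ	178

VerticalFill:
	lea	$dff000,a5
	lea	Screen,a0			;row 0 of the plane holding the edge pixels
	lea	SCR_BYTEWIDTH(a0),a1		;row 1
	tst.w	DMACONR(a5)			;dummy read before polling
.bw	btst	#6,DMACONR(a5)			;wait for any previous blit
	bne.s	.bw
	move.l	#$0d3c0000,BLTCON0(a5)		;A+B+D enabled, D = A xor B, BLTCON1 = 0
	move.l	#$ffffffff,BLTAFWM(a5)		;no first/last word masking
	moveq	#0,d0
	move.w	d0,BLTAMOD(a5)			;full-width rows, so no modulo
	move.w	d0,BLTBMOD(a5)
	move.w	d0,BLTDMOD(a5)
	move.l	a0,BLTAPTH(a5)			;A = line above (already filled)
	move.l	a1,BLTBPTH(a5)			;B = current line (edge pixels)
	move.l	a1,BLTDPTH(a5)			;D = current line
	move.w	#((SCR_HEIGHT-1)<<6)|(SCR_BYTEWIDTH/2),BLTSIZE(a5)	;177 rows x 22 words
	rts
```

Each output line is the XOR of the line above and the edge pixels on the current line, so a pixel plotted at the top of a span switches the fill on and a second pixel further down switches it off again. The span therefore covers the lines from the first pixel down to the line just above the second one.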
Old 11 April 2019, 19:43   #10
ross
Sininsanity:


No split, no modulotrick, no charanim, pure blitter craziness.

(and some magic)
Attachment: output2.gif
Old 11 April 2019, 20:08   #11
Galahad/FLT
Going nowhere
 
Join Date: Oct 2001
Location: United Kingdom
Age: 50
Posts: 8,997
Quote:
Originally Posted by ross View Post
Sininsanity:


No split, no modulotrick, no charanim, pure blitter craziness.

(and some magic)
Sinecredible!
Old 11 April 2019, 20:50   #12
ross
Quote:
Originally Posted by Galahad/FLT View Post
Sinecredible!
Hi Sir Galahad!
Actually my post is a bit off-topic because it lacks the -scroll part.

Anyway, only my laziness prevented me from using different tiles (16x*ny resolution supported).
And there are some glitches.. (they are well hidden).

One day maybe a real usable routine will be born.
Old 11 April 2019, 21:17   #13
ross
Part of the wild loop:

Code:
.w1	btst	d3,(a6)
	bne.b	.w1
.b0	move.l	a5,bltdpt-2(a6)
	move.w	#$0100,bltcon0-2(a6)
	move.w	d1,bltsize-2(a6)

	lea	giro/8*1(a0),a5	
	move.w	d2,bltamod-2(a6)
	move.w	d2,bltbmod-2(a6)
	move.w	(a3)+,bltcdat-2(a6)	; mask
	move.l	a0,bltapt-2(a6)		; from big shift buffer
	move.l	a5,bltbpt-2(a6)
.w2	btst	d3,(a6)
	bne.b	.w2
	move.w	#$0DD8,bltcon0-2(a6)
	move.w	#(h-1)<<6|1,bltsize-2(a6)

.b1	move.w	(a3)+,d1
	beq.b	.nx
	lea	(a4),a5
	lea	-128/8*2(a5),a4
.w3	btst	d3,(a6)
	bne.b	.w3
	move.w	d3,bltamod-2(a6)
	move.w	d3,bltbmod-2(a6)
	move.w	d1,bltcdat-2(a6)
	move.l	a4,bltapt-2(a6)
	move.l	a5,bltbpt-2(a6)
	move.l	a4,bltdpt-2(a6)
	move.w	#(h+1)<<6|1,bltsize-2(a6)
Ugh, I'm writing to blitter registers while the previous operation is running
No destination update
Minterm $D8
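
For readers unfamiliar with minterms, the `$0DD8` written to BLTCON0 in the loop above can be unpacked as follows. The bit layout is the standard BLTCON0 one; the reading of how the loop uses it (BLTCDAT as a per-bit selector) is an interpretation, not something stated in the post.

```asm
; BLTCON0 = $0dd8
;   $0d00 = USEA+USEB+USED (no DMA on channel C; BLTCDAT is written by the CPU)
;   $d8   = %11011000 -> D is set for these input combinations:
;
;   bit  A B C
;    7   1 1 1
;    6   1 1 0
;    4   1 0 0
;    3   0 1 1
;
;   With C = 1 the result follows B; with C = 0 it follows A.
;   So D = (B and C) or (A and not C): the mask in BLTCDAT selects,
;   per bit, whether the pixel comes from B or from A.
```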

Old 12 April 2019, 09:20   #14
Antiriad_UK
Quote:
Originally Posted by ross View Post
Sininsanity:


No split, no modulotrick, no charanim, pure blitter craziness.

(and some magic)
What the hell!
Old 12 April 2019, 12:02   #15
Rmt
 
Posts: n/a
One of the most important optimisations for a sine scroller is to not wait for the blitter and to use the BLTPRI bit in the DMACON register. If you use several bitplanes, use interleaved bitplanes. You can also use movem for loading blitter registers.

My sine scroll inner loop uses only 6 instructions per column:

move.w (a0)+,d3			;next offset word from the list at a0
lea (a1,d3.w),a4
move.l a4,a2
movem.l a2-a4,(a5)		;three consecutive blitter pointers in one write (a5 presumably -> BLTBPTH)
move.w #1<<4,$44(a6)		;$44 = BLTAFWM
move.w d0,$58(a6)		;$58 = BLTSIZE, starts the blit

Example here:
https://github.com/se-bo/amiga/blob/...llerscroll_2.s
 
Old 13 April 2019, 09:26   #16
Antiriad_UK
Quote:
Originally Posted by Rmt View Post
One of the most important optimisations for a sine scroller is to not wait for the blitter and to use the BLTPRI bit in the DMACON register. If you use several bitplanes, use interleaved bitplanes. You can also use movem for loading blitter registers.
Cheers. Although I'm targeting an old-school A500 spec, I want it to run OK on an A1200 with real fast mem (yeah, I know this is a self-imposed limitation). When I enabled blit nasty and removed the blitwait, it was glitching on an (emulated) A1200. I read some other posts on here that said there may be a free DMA cycle even with blit nasty on. I'll recheck.
Old 13 April 2019, 10:29   #17
ross
Quote:
Originally Posted by Rmt View Post
One of the most important optimisations for a sine scroller is to not wait for the blitter and to use the BLTPRI bit in the DMACON register. If you use several bitplanes, use interleaved bitplanes. You can also use movem for loading blitter registers.
Even if this works on an A500, you are asking for trouble on bigger machines. There are different ways to respect the end of blitter operations, but in any case you have to do it, and keep in mind that your code must work everywhere.
Anyway, these are not the most important optimizations; they only serve the 'big picture' when you need to squeeze out some cycles.
Using BLTPRI correctly can be really useful for overlapping work between the blitter and the CPU, so it isn't a simple on/off rule.

Quote:
Originally Posted by Antiriad_UK View Post
I read some other posts on here that said that there may be a free DMA cycle even with blit nasty on. I'll recheck.
Yep, see the blitter cycle diagrams in the HRM.
Sometimes I have also changed the channels used, even when doing the same operation, because some channel combinations leave free cycles for CPU/chip access.
Old 13 April 2019, 16:38   #18
hooverphonique
ex. demoscener "Bigmama"
 
Join Date: Jun 2012
Location: Fyn / Denmark
Posts: 1,624
Quote:
Originally Posted by Antiriad_UK View Post
Cheers. Although I'm targeting an old-school A500 spec, I want it to run OK on an A1200 with real fast mem (yeah, I know this is a self-imposed limitation). When I enabled blit nasty and removed the blitwait, it was glitching on an (emulated) A1200. I read some other posts on here that said there may be a free DMA cycle even with blit nasty on. I'll recheck.
Nasty has no effect on code running from real fastmem.
Old 14 April 2019, 00:18   #19
Photon
Moderator
 
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,604
Quote:
Originally Posted by Antiriad_UK
I saw on Photon's site he made this statement:
Quote:
Originally Posted by Photon
These optimizations together allow a full-screen, 25px high sine scroller, and there is rastertime to spare.
Note that to show the principle and general optimization techniques, I've chosen the second fastest sine scroller technique. Know that there is also the C2P-shift type sine scroller - and that the source here can be adapted to mask several bit-slices to single blits, too, to optimize it substantially. Try it!
I'm not familiar with the C2P method. What is that? I'd guess it's a complete rewrite, so I doubt I'd look at that, but I'm interested in the idea.

Also, I can't see the meaning of "mask several bit-slices to single blits". I'm still not thinking in "insane optimizations mode" yet.
This was for a less steep learning curve, but I still wanted to make clear there was a more optimal technique. Sine is a vertical shift or skew operation; the shift-by-mask part of a C2P.

The article is written to allow easy modification of the source to combine mask %1 to %11 if two slices are the same Y. Or %10001 if the slices are apart, but same Y ("curve dip"). And so on.
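
A minimal sketch of the mask-combining idea for adjacent slices, assuming the slices of one 16-pixel word are blitted left to right. `ColumnOfs` and `BlitSlice` are hypothetical names: `ColumnOfs` holds the 16 premultiplied Y offsets of the word, and `BlitSlice` is assumed to blit one slice with BLTAFWM = `d2` at screen offset `d1` and to preserve all registers.

```asm
; Sketch only: ColumnOfs and BlitSlice are hypothetical.
MergeWord:
	lea	ColumnOfs,a0			;16 premultiplied Y offsets, left to right
	move.w	#$8000,d3			;bit of the leftmost column
	move.w	d3,d2				;mask of the current run
	move.w	(a0)+,d1			;Y offset of the current run
	moveq	#16-2,d7			;15 remaining columns
.col	lsr.w	#1,d3				;bit of the next column
	move.w	(a0)+,d0
	cmp.w	d1,d0
	bne.s	.flush
	or.w	d3,d2				;same Y: %1 grows to %11, %111, ...
	bra.s	.next
.flush	bsr	BlitSlice			;Y changed: blit the finished run
	move.w	d0,d1				;start a new run at this column
	move.w	d3,d2
.next	dbf	d7,.col
	bsr	BlitSlice			;blit the last run
	rts
```

This version only merges neighbouring columns. The "curve dip" case, where separated columns such as %10001 share a Y, would need the masks to be grouped by Y across the whole word instead of by runs.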

Also, the article series selected effects from a certain year.

In 1988, sine scrollers started out with reduced wave resolution, e.g. slice width 2 or 4 to save time (and maybe put something else on screen), but 1px was soon reached. To demonstrate 1px, the curve must slant >=45 degrees at some point on the curve. Once you reach that, you can maximize screen height to demonstrate that you don't clear the whole screen I guess, but... the real measure remains the size of the scrollbuffer.

To go beyond ~320x25 you must combine bit-slices. This will increase the size a little to say ~320x64, but will not come near this faster technique giving 336x258 in Blu Sky.

Old 14 April 2019, 01:03   #20
ross
Quote:
Originally Posted by Photon View Post
but will not come near this faster technique giving 336x258..


A scroll of this size (judging from the video it looks like ~4 blits per 16px; mine is ~3 because of the softer slope)
requires relatively less buffer cleaning than a multistring one.
When you have to combine several scrolls (also superimposed), then you really have to do dirty tricks..

Last edited by ross; 14 April 2019 at 01:59. Reason: little rephrasing