09 April 2019, 11:24 | #1 | |
OCS forever!
Join Date: Mar 2019
Location: Birmingham, UK
Posts: 418
Any Sine Scroller Optimizations I've Missed?
Hi all,
I'm putting together a small demo. One of the features is a sine scroller, and I'm looking for more optimizations to make it faster.
Bit of background: I did some Amiga demo coding in 1991/92, and I've decided to put together a demo with some of my old routines "done properly" (much of my old source won't even compile, and looks awful compared to how I program now). I also want to include some of the later stuff I never released, and some things I never worked out how to do but want to try (glenz vectors, I'm looking at you).
The limits I've set myself for this are:
- Amiga A500 oldskool spec.
- Must run without glitches on A1200/A4000/accelerated machines with real fast RAM.
- No self-modifying code.
- No blitter nasty/chip RAM abuse that relies on skipping BlitWaits.
So far in this demo I've got a sine scroller, wireframe/filled convex vectors, non-convex transforming vectors and a bob snake.
The sine scroller looks as attached. The screen is 352 pixels wide and 178 high. There are two bitplanes active (for a lame shadow effect), with a 16-colour logo at the top. The font is 28 pixels high. Without music, that leaves me about 10 scanlines of free time; a 30-pixel font starts skipping frames. There are two modulated sine waves to give more interesting effects than a straight sine. Ideally I wanted 32 pixels high, which I can achieve by ditching the overscan and logo, but I really don't like scrollers that aren't overscan.
My basic routine is the same as one I did in 1990: the scrolling message in a back buffer is sliced into pixel-wide masks and blitted to the screen buffer.
Current optimizations done:
- No screen clear: the first blit of each word is a straight copy with some padding to "wipe" the screen.
- Sine values are premultiplied by the screen width (two sines complicate this slightly, and looking at the screenshot I may have a rounding error here).
- Each word loop is unrolled (macro below).
- Blitter registers are loaded as fast as possible after the blit wait.
- Memory access is reduced to the bare minimum: all registers are in use, and some store data in both the high and low words, with swaps as needed.
Code:
; Do OR blit - used for the 3rd-16th blits of a word
BlitSine	MACRO
	move.l	a3,a6		;scr adr
	swap	d6		;sine1 offset to loword
	add.w	(a0,d6.w),a6	;add sine 1 (premult)
	swap	d6		;sine2 offset to loword
	add.w	(a0,d6.w),a6	;add sine 2 (premult)
	add.l	d3,d6		;Increase sine1/sine2 offsets and mask to 0-2047 (1024 sine entries, in words)
	and.l	d5,d6		;And with SineScroller_Sine_Offset_Mask (two offsets anded at once)
	lea	BLTBPTH(a5),a1	;for quick blitter loading
	move.w	d7,DMACON(a5)	;Blitter nasty on
.bw\@	btst.b	d1,DMACONR(a5)	;Blitwait
	bne.s	.bw\@
	move.w	d4,DMACON(a5)	;Blitter nasty off
	; write blt registers as fast as possible
	move.w	d2,(a2)		;BLTAFWM($44)
	move.l	a6,(a1)+	;BLTBPTH - $4c
	move.l	a4,(a1)+	;BLTAPTH - $50
	move.l	a6,(a1)+	;BLTDPTH - $54
	move.w	d0,(a1)		;BLTSIZE - $58
	ror.w	#1,d2		;rot mask
	ENDM
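A minimal Python model of the packed-offset trick in the macro above, assuming a single premultiplied table of 1024 words (both lookups use a0), 44-byte rows (352/8, one non-interleaved bitplane) and $07FE07FE as the value of SineScroller_Sine_Offset_Mask in d5. The amplitude, increments and mask value are hypothetical. It checks that one add.l/and.l pair advances both offsets correctly: with each half kept at or below $07FE and small even increments, the low word can never carry into the high word.

```python
import math

ROW_BYTES = 352 // 8          # assumption: 44-byte rows (one non-interleaved bitplane)
ENTRIES = 1024                # 1024 word entries -> byte offsets 0..2046
AMPLITUDE = 12                # hypothetical amplitude in rows
MASK = 0x07FE07FE             # assumed SineScroller_Sine_Offset_Mask (d5)

# Premultiplied table: every entry is already a byte offset (rows * ROW_BYTES).
SINE_TABLE = [round(AMPLITUDE * math.sin(2 * math.pi * i / ENTRIES)) * ROW_BYTES
              for i in range(ENTRIES)]


def lookup(byte_offset):
    """Emulate add.w (a0,d6.w): a byte offset into a table of 16-bit words."""
    return SINE_TABLE[byte_offset // 2]


def step(d6, d3):
    """add.l d3,d6 then and.l d5,d6: both packed offsets advance and wrap at once."""
    return ((d6 + d3) & 0xFFFFFFFF) & MASK


def column_offset(d6):
    """The two swap/add.w pairs: sine 1 from the high word, sine 2 from the low word."""
    return lookup(d6 >> 16) + lookup(d6 & 0xFFFF)


def ror16(x, n=1):
    """ror.w #n: rotate the 16-bit AFWM mask right."""
    return ((x >> n) | (x << (16 - n))) & 0xFFFF


if __name__ == "__main__":
    d6 = 512                  # sine 1 at entry 0 (high word), sine 2 at entry 256 (low word)
    d3 = (6 << 16) | 10       # hypothetical per-column increments (even byte steps)
    for _ in range(4):
        print(hex(d6), column_offset(d6))
        d6 = step(d6, d3)
```

Because each table entry is already a whole number of rows times 44, the summed offset always lands on a row boundary; the only cost per column is the two swaps.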
Also, I can't see the point of "mask several bit-slices to single blits"; I'm not thinking in "insane optimizations mode" yet. Apart from that, any other tricks?
09 April 2019, 12:09 | #2 |
Registered User
Join Date: Apr 2019
Location: Kings Lynn
Posts: 17
Screenshot looks nice; I see what you mean about the precision issue. You could try using a small amount of fixed point for the offset, with rounding, to get rid of the odd pixel gaps.
What about using the CPU and blitter combined? Maybe 8 pixels with the CPU and 24 with the blitter, assuming a 68000 (I've not personally done a sine scroll on the Amiga, only on the PC back in the day)?
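One way the fixed-point suggestion could work, sketched in Python: if each wave is rounded to whole rows before the two are added, the sum can be off by up to a full row, which shows up as a one-pixel step. Keeping a few fractional bits in the tables and rounding only once after the add (in 68000 terms, add #8 then asr #4) bounds the error to just over half a row. The 4 fractional bits, 44-byte rows and the row-to-offset table are assumptions; the extra table lookup replaces the premultiplication.

```python
import math

FRAC_BITS = 4                  # assumption: 4 fractional bits (1/16 row) per table entry
HALF = 1 << (FRAC_BITS - 1)    # added before the shift to round to nearest


def round_half_up(x):
    """Round to nearest, halves upwards (what add #HALF then asr does)."""
    return math.floor(x + 0.5)


def naive_rows(a, b):
    """Each wave rounded to whole rows, then summed: up to 1 row of error."""
    return round_half_up(a) + round_half_up(b)


def fixed_rows(a, b):
    """Each wave kept in 1/16-row units, summed, then rounded once."""
    fa = round_half_up(a * (1 << FRAC_BITS))
    fb = round_half_up(b * (1 << FRAC_BITS))
    return (fa + fb + HALF) >> FRAC_BITS   # >> on ints floors, like asr


ROW_BYTES = 44                 # assumption: 352-pixel rows
MAX_ROWS = 64                  # hypothetical bound on |a + b|
# Second lookup replacing the premultiplication: row -> byte offset.
ROW_OFFSET = {row: row * ROW_BYTES for row in range(-MAX_ROWS, MAX_ROWS + 1)}


if __name__ == "__main__":
    a, b = 0.4, 0.4            # exact sum is 0.8 rows
    print(naive_rows(a, b), fixed_rows(a, b))   # 0 1
```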
10 April 2019, 10:35 | #3 |
Registered User
Join Date: Jun 2016
Location: UK
Posts: 428
You are blitting the text one pixel column at a time, is that what you are saying?
The fastest way to do this is usually to plot the start and end pixels of each vertical span with the CPU, and then use the blitter to do a per-column vertical fill in one pass. Doing that, I was able to do unlimited-height text (whole screen filled).
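A logical model of the plot-then-fill approach, as a Python sketch over one 16-pixel-wide column of words: the CPU flips one bit at the top of each span and one just below its bottom, and a running XOR down the column (what the vertical fill does) turns each pair of markers into a solid span. The function names and span format are hypothetical, and this ignores blitter timing. Note that the CPU work is two writes per span regardless of height, which is why the height becomes effectively unlimited.

```python
def plot_span_markers(height, spans):
    """CPU pass: for each (column, top, bottom) span, flip the bit at 'top'
    and at 'bottom + 1' in a 16-pixel-wide column of words."""
    words = [0] * (height + 1)         # one spare row so bottom + 1 is always in range
    for column, top, bottom in spans:
        bit = 0x8000 >> column         # leftmost pixel is bit 15, as on the Amiga
        words[top] ^= bit
        words[bottom + 1] ^= bit
    return words


def xor_fill_down(words):
    """Blitter pass (logical model): each row becomes itself XOR the row above."""
    filled = list(words)
    for y in range(1, len(filled)):
        filled[y] ^= filled[y - 1]
    return filled


if __name__ == "__main__":
    rows = xor_fill_down(plot_span_markers(8, [(0, 2, 5), (3, 0, 0), (15, 6, 7)]))
    for word in rows:
        print(format(word, "016b"))
```

Using XOR for the markers also means two spans that touch in the same column (one ending where the next starts) simply merge instead of leaving a gap.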
10 April 2019, 11:11 | #4 |
OCS forever!
Join Date: Mar 2019
Location: Birmingham, UK
Posts: 418
Yes, I'm essentially blit-copying 352 one-pixel columns.
Lol, I must admit I didn't even realise the blitter could do vertical fills. Have you got a screenshot of what that looked like, to put it into perspective?
10 April 2019, 13:21 | #5 |
Lemon. / Core Design
Join Date: Mar 2016
Location: Tier 5
Posts: 1,212
Have a look at the Digital Innovation demo; I think I was one of the first to do the blitter vertical fill method, albeit rather badly coded back then.
Vertical fill is really useful for a lot of techniques... bitmap scaling, etc. You could also take a look at Blast From The Past, a 1-pixel zooming 2-bitplane sine scroll I did. The inner loop was this:
Code:
.BlitLoop
	rept 16
	move.w	(a0)+,d1
	move.w	(a0)+,d0
	lea	(a1,d0.w),a3
	add.w	(a2,d1.w),a3
	BlitWait_Inline
	move.w	(a0)+,(a4)		;BltCon0
	move.w	(a0)+,BLTAFWM(a6)
	move.l	a3,BLTAPTH(a6)
	move.l	d5,BLTBPTH(a6)
	move.l	d5,BLTDPTH(a6)
	move.w	d6,(a5)			;BltSize
	endr
	addq.l	#2,d5
	dbf	d7,.BlitLoop
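The loop above streams four precomputed words per column from a0: a sine table index (d1), a source offset (d0), a BLTCON0 value and an AFWM mask. A hypothetical Python sketch of how three of those could be built for a fixed-point zoom; the 8.8 format, 352 columns and 1024-entry sine table are assumptions. The BLTCON0 shift is deliberately left out, because it depends on how pixels that cross a word boundary are handled, which the post doesn't show.

```python
def zoom_records(scroll_pos_fp, step_fp, columns=352, sine_start=0):
    """Build (sine byte index, source word byte offset, AFWM) for each
    destination column.  Positions are 8.8 fixed-point source pixels
    (assumption), so step_fp = 0x100 is 1:1 and 0x80 doubles every pixel."""
    records = []
    src_fp = scroll_pos_fp
    for x in range(columns):
        src = src_fp >> 8                              # integer source pixel
        sine_index = ((sine_start + x) * 2) & 0x7FE    # word entries of a 1024-entry table
        source_offset = (src >> 4) * 2                 # byte offset of the source word
        afwm = 0x8000 >> (src & 15)                    # keep only that pixel's bit
        records.append((sine_index, source_offset, afwm))
        src_fp += step_fp
    return records


if __name__ == "__main__":
    for record in zoom_records(0, 0x80)[:4]:           # 2x zoom: each source pixel used twice
        print([hex(v) for v in record])
```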
10 April 2019, 15:23 | #6 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,475
When you write wrong code and create an interesting effect
11 April 2019, 00:00 | #7 |
OCS forever!
Join Date: Mar 2019
Location: Birmingham, UK
Posts: 418
11 April 2019, 00:10 | #8 | |
OCS forever!
Join Date: Mar 2019
Location: Birmingham, UK
Posts: 418
I actually rewatched that a few weeks ago when I was getting back into this and archiving my Amiga disks, and I still didn't understand what was going on. I thought it looked like vector letters with maybe some sine tricks in there. But I just re-watched it after what you said and noticed the letter B looked like a bitmap font, so I've still no idea what is going on there. It's like a copy-and-stretch but with a 1px sine in there. I just got a glenz vector running, so I can cross that off my bucket list.
11 April 2019, 11:06 | #9 |
Registered User
Join Date: Jun 2016
Location: UK
Posts: 428
My favourite example of this technique is the Crystal intro with the bunny at the start and the kind of water effect.
I can't remember exactly how you set the blitter up for it now; from memory, you have the first source as line 0, the second source as line 1, the destination as line 1, and an XOR function. Essentially it just keeps copying the same word from one line to the next, XORing it with the next line so that it "picks up" bits and smears them downwards.
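A Python model of the setup described above, as a sketch: a small minterm evaluator (A is $F0, B is $CC, C is $AA, so D = A XOR B is minterm $3C) applied to a one-word-wide column with A = line y-1, B = line y and D = line y, run top to bottom. It is a logical model only: it assumes each A read sees the line that was just written, and ignores the blitter's pipeline timing, which on real hardware decides whether that holds. The function names are hypothetical.

```python
def minterm(lf, a, b, c):
    """Apply a blitter minterm byte (LF) to 16-bit words A, B and C.
    For every bit, (A << 2) | (B << 1) | C picks which LF bit is the result."""
    d = 0
    for bit in range(16):
        index = (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) | ((c >> bit) & 1)
        d |= ((lf >> index) & 1) << bit
    return d


A_XOR_B = 0xF0 ^ 0xCC   # A is $F0, B is $CC, so D = A XOR B is minterm $3C


def smear_down(column):
    """One-word-wide blit, A = line y-1, B = line y, D = line y, top to bottom.
    Logical model: assumes each A read sees the line just written."""
    lines = list(column)
    for y in range(1, len(lines)):
        lines[y] = minterm(A_XOR_B, lines[y - 1], lines[y], 0)
    return lines


if __name__ == "__main__":
    print([hex(w) for w in smear_down([0x0010, 0, 0, 0])])          # a dot smears downwards
    print([hex(w) for w in smear_down([0x0180, 0, 0, 0x0180, 0])])  # marker pair fills
```

A single set bit keeps propagating to the bottom (the "smear"), while a second bit in the same column switches it off again, which is the same property the span-fill trick relies on.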
11 April 2019, 19:43 | #10 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,475
Sininsanity:
No split, no modulo trick, no char anim, pure blitter craziness (and some magic).
11 April 2019, 20:08 | #11 |
Going nowhere
Join Date: Oct 2001
Location: United Kingdom
Age: 50
Posts: 8,997
11 April 2019, 20:50 | #12 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,475
Hi Sir Galahad!
Actually my post is a bit off-topic because it lacks the "-scroll" part. Anyway, only my laziness prevented me from using different tiles (16x*ny resolution supported). And there are some glitches (they are well hidden). One day maybe a really usable routine will be born.
11 April 2019, 21:17 | #13 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,475
Part of the wild loop:
Code:
; a6 appears to point at DMACONR ($dff002), hence the -2 register offsets
.w1	btst	d3,(a6)
	bne.b	.w1
.b0	move.l	a5,bltdpt-2(a6)
	move.w	#$0100,bltcon0-2(a6)
	move.w	d1,bltsize-2(a6)
	lea	giro/8*1(a0),a5
	move.w	d2,bltamod-2(a6)
	move.w	d2,bltbmod-2(a6)
	move.w	(a3)+,bltcdat-2(a6)	; mask
	move.l	a0,bltapt-2(a6)		; from big shift buffer
	move.l	a5,bltbpt-2(a6)
.w2	btst	d3,(a6)
	bne.b	.w2
	move.w	#$0DD8,bltcon0-2(a6)
	move.w	#(h-1)<<6|1,bltsize-2(a6)
.b1	move.w	(a3)+,d1
	beq.b	.nx
	lea	(a4),a5
	lea	-128/8*2(a5),a4
.w3	btst	d3,(a6)
	bne.b	.w3
	move.w	d3,bltamod-2(a6)
	move.w	d3,bltbmod-2(a6)
	move.w	d1,bltcdat-2(a6)
	move.l	a4,bltapt-2(a6)
	move.l	a5,bltbpt-2(a6)
	move.l	a4,bltdpt-2(a6)
	move.w	#(h+1)<<6|1,bltsize-2(a6)
No destination update.
Minterm $D8.
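In the loop above, BLTCON0 $0100 enables only D with minterm 0 (a clear), while $0DD8 enables A, B and D but not C, so C comes from BLTCDAT, which the loop loads with a mask from (a3)+. A small Python check, with made-up helper names, that minterm $D8 is a bit-select: D takes B where the C mask bit is set and A everywhere else.

```python
def decode_bltcon0(value):
    """Split BLTCON0 into A shift, channel enables and minterm."""
    return {
        "ash": value >> 12,
        "use_a": bool(value & 0x0800),
        "use_b": bool(value & 0x0400),
        "use_c": bool(value & 0x0200),
        "use_d": bool(value & 0x0100),
        "lf": value & 0xFF,
    }


def minterm(lf, a, b, c):
    """Apply a blitter minterm byte to 16-bit words; (A<<2)|(B<<1)|C picks the LF bit."""
    d = 0
    for bit in range(16):
        index = (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) | ((c >> bit) & 1)
        d |= ((lf >> index) & 1) << bit
    return d


def select_by_c(a, b, c):
    """What $D8 computes: B where the C mask bit is set, A elsewhere."""
    return (b & c) | (a & ~c & 0xFFFF)


if __name__ == "__main__":
    print(decode_bltcon0(0x0DD8))
    print(hex(minterm(0xD8, 0x1111, 0xEEEE, 0x00FF)))   # low byte from B, high byte from A
```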
12 April 2019, 09:20 | #14 |
OCS forever!
Join Date: Mar 2019
Location: Birmingham, UK
Posts: 418
12 April 2019, 12:02 | #15 |
Posts: n/a
One of the most important optimisations for a sine scroller is to not wait for the blitter, and to use the BLTPRI bit in the DMACON register. If you use several bitplanes, use interleaved bitplanes. You can also use movem for loading blitter registers.
My sine scroll inner loop uses only 6 instructions per column:
	move.w	(a0)+,d3
	lea	(a1,d3.w),a4
	move.l	a4,a2
	movem.l	a2-a4,(a5)	; a2, a3, a4 to three consecutive longword registers at (a5)
	move.w	#1<<4,$44(a6)	; BLTAFWM
	move.w	d0,$58(a6)	; BLTSIZE
Example here: https://github.com/se-bo/amiga/blob/...llerscroll_2.s
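If a5 points at BLTBPTH ($dff04c), which the post doesn't state but the register order suggests, movem.l a2-a4,(a5) loads the B, A and D pointers in one instruction, and since a2 is a copy of a4, B and D both get the screen address. A small Python model of that mapping, plus the DMACON values for switching BLTPRI ("blitter nasty") on and off; the helper name is illustrative and the base address is an assumption.

```python
# Custom chip register offsets from $DFF000 (Amiga Hardware Reference Manual).
BLITTER_POINTERS = {0x048: "BLTCPT", 0x04C: "BLTBPT", 0x050: "BLTAPT", 0x054: "BLTDPT"}

# DMACON bits as named in the hardware/dmabits includes.
DMAF_SETCLR = 0x8000    # bit 15: 1 = set the listed bits, 0 = clear them
DMAF_BLITHOG = 0x0400   # bit 10: BLTPRI, "blitter nasty"

NASTY_ON = DMAF_SETCLR | DMAF_BLITHOG    # value written to DMACON to set BLTPRI
NASTY_OFF = DMAF_BLITHOG                 # value written to DMACON to clear it


def movem_l_store(base, registers):
    """movem.l <list>,(An): registers go out d0..d7 then a0..a7, lowest first,
    one longword (4 bytes) apart, starting at An."""
    order = sorted(registers, key=lambda r: ("d", "a").index(r[0]) * 8 + int(r[1]))
    return {BLITTER_POINTERS[base + 4 * i]: reg for i, reg in enumerate(order)}


if __name__ == "__main__":
    print(movem_l_store(0x04C, ["a2", "a3", "a4"]))   # assumed a5 = $dff04c
    print(hex(NASTY_ON), hex(NASTY_OFF))
```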
13 April 2019, 09:26 | #16 |
OCS forever!
Join Date: Mar 2019
Location: Birmingham, UK
Posts: 418
Cheers. Although I'm targeting an old-school A500 spec, I want it to run OK on an A1200 with real fast RAM (yeah, I know this is a self-imposed limitation). When I enabled blitter nasty and removed the blit wait, it was glitching on an A1200 (emulated). I've read some other posts on here saying there may be a free DMA cycle even with blitter nasty on. I'll recheck.
13 April 2019, 10:29 | #17 | ||
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,475
in any case you have to do it and think that your code must work everywhere). Anyway, these are not the most important optimizations, only instrumental to the 'big picture', and for when you need to squeeze out some cycles. The right use of BLTPRI can be really useful to 'overlay' jobs between the blitter and the CPU, so it isn't an on/off rule.
Sometimes I have also changed the channels used, even when I had to do the same operation, because some combinations leave free CPU/chip access cycles.
13 April 2019, 16:38 | #18 | |
ex. demoscener "Bigmama"
Join Date: Jun 2012
Location: Fyn / Denmark
Posts: 1,624
14 April 2019, 00:18 | #19 | ||
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,604
The article is written to allow easy modification of the source to combine mask %1 into %11 if two slices have the same Y, or %10001 if the slices are apart but have the same Y ("curve dip"), and so on. Also, the article series covered selected effects from a certain year. In 1988, sine scrollers started out with reduced wave resolution, e.g. slice width 2 or 4 to save time (and maybe put something else on screen), but 1px was soon reached. To demonstrate 1px, the curve must slant >=45 degrees at some point. Once you reach that, you can maximize screen height to demonstrate that you don't clear the whole screen, I guess, but... the real measure remains the size of the scroll buffer. To go beyond ~320x25 you must combine bit-slices. This will increase the size a little, to say ~320x64, but will not come near this faster technique giving 336x258 in Blu Sky.
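Combining bit-slices can be sketched as grouping the 16 columns of one source word by their destination Y: each distinct Y needs one blit, with every column at that Y set in its mask. A hypothetical Python helper (leftmost pixel = bit 15) showing why a curve that is steep (>=45 degrees) across a whole word still needs 16 blits for it, while flat stretches and dips collapse to a few.

```python
def combine_slices(y_offsets):
    """Group the 16 one-pixel slices of a source word by destination Y.
    Returns {y: mask}: one blit per distinct Y, with all matching columns in its mask."""
    masks = {}
    for column, y in enumerate(y_offsets):
        masks[y] = masks.get(y, 0) | (0x8000 >> column)
    return masks


if __name__ == "__main__":
    dip = combine_slices([0, 1, 2, 1, 0] + [5] * 11)
    for y, mask in sorted(dip.items()):
        print(y, format(mask, "016b"))
    print("blits for this word:", len(dip))
```

Here columns 0 and 4 share Y 0, giving a %10001-style mask, and the flat run at Y 5 becomes a single blit, so the word costs 4 blits instead of 16.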
14 April 2019, 01:03 | #20 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,475
A scroll of this size (judging from the video it looks like ~4 blits per 16px; mine is ~3 because of the softer slope) requires relatively less buffer cleaning than a multi-string one. When you have to combine several scrolls (possibly superimposed), then you really have to do dirty tricks.
Last edited by ross; 14 April 2019 at 01:59. Reason: little rephrasing