English Amiga Board


07 April 2019, 01:01   #1
ebenupton
 
Posts: n/a
Driving blitter from copper

I've been doing a bit of Amiga hacking recently and, having hit a bit of a motivation trough, thought I'd share where I've got to in the hope of getting a bit of feedback/constructive criticism.

My goal has been to explore how much performance I can squeeze out of the baseline OCS/ECS hardware, compared to what I could do as a teenager back in the 1990s. My sample application is a simple 50Hz horizontally scrolling platform game, using some old graphic assets I had to hand (credit to Nick Lee for those). Although the assets don't really require it, I'm running in EHB mode to make things a bit more challenging from a bandwidth perspective.



The approach I've been pursuing (and I know this has been done before in various ways) is to use the copper to perform all writes to blitter registers: where normally during frame n I would have directly controlled the blitter to generate the contents of frame n+1, instead I am writing a copper list to run during frame n+1 to generate frame n+2.
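Below is a minimal Python model of that pipeline. It assumes two copper-list buffers and two screen buffers, selected by frame parity. This is a hypothetical arrangement, and the tarball may organise its buffers differently. The model checks two things: the CPU never writes the list the copper is currently executing, and the blits never draw into the screen being displayed.

```python
# Hypothetical model: copper lists and screens are double-buffered by frame parity.
# During frame n the CPU writes the list that the copper runs in frame n+1,
# and the blits in that list draw the image shown in frame n+2.

def schedule(n: int) -> dict:
    """Buffer indices touched by each agent during frame n."""
    return {
        "cpu_writes_list": n % 2,          # will run in frame n+1
        "copper_runs_list": (n - 1) % 2,   # was written during frame n-1
        "blits_draw_screen": (n + 1) % 2,  # that list renders frame (n-1)+2 = n+1
        "display_shows_screen": n % 2,     # was drawn during frame n-1
    }


for frame in range(4):
    print(frame, schedule(frame))
```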

I like this approach, because if I run in blitter nice mode, then while there is work for it to do the blitter will soak up any spare cycles left by the 68k. This is efficient, and means that our goal in writing software becomes solely to reduce instruction fetch bandwidth. So for example:

mulu #33,d5

always beats the following, because it needs two words of instruction fetch rather than three:

move.w d5,d6
lsl.w #5,d5
add.w d6,d5
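Here is a small Python check of the two forms above (a model, not 68k code). It assumes the usual encodings: mulu #imm,Dn is an opcode word plus one immediate word, and each of the three register instructions is a single word. The multiply takes far longer on the CPU, but that time is internal and leaves the bus to the blitter. Note that the two forms are interchangeable only when the caller uses the low word. mulu leaves a 32-bit product in d5, while the shift-and-add sequence changes only d5.w and clobbers d6.

```python
MASK16 = 0xFFFF

# Instruction words fetched over the bus (assumed standard 68000 encodings).
FETCH_WORDS = {"mulu #33,d5": 2, "move.w/lsl.w/add.w": 3}


def mulu33(d5: int) -> int:
    """mulu #33,d5: unsigned 16x16 -> 32-bit product of d5.w and 33."""
    return ((d5 & MASK16) * 33) & 0xFFFFFFFF


def shift_add33(d5: int) -> int:
    """move.w d5,d6 / lsl.w #5,d5 / add.w d6,d5: only the low word of d5 changes."""
    d6 = d5 & MASK16
    low = (((d5 & MASK16) << 5) + d6) & MASK16
    return (d5 & 0xFFFF0000) | low
```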

A source tarball is here: http://sphinx.mythic-beasts.com/~eupton/cleo.tgz

If you'd like a quick look at the game, I've also built it as a replacement A600 Kickstart ROM.

Release version is here: http://sphinx.mythic-beasts.com/~eupton/cleo.rom
Debug version is here: http://sphinx.mythic-beasts.com/~eupton/cleo_vis.rom

The debug version has some visualisation aids to help with optimisation. It prints the scanline at which the "worst" frame so far encountered completed both its CPU and blitter work, and shows a pair of stacked bar charts at the left of the screen.

The left-hand bar chart has colors for the various phases of CPU work:
  • cyan: erase dirty rectangles
  • magenta: update edges for scroll
  • yellow: run player and boomerang
  • red: run game objects
  • green: CPU idle
Remember, when we say "erase dirty rectangles" we actually mean "write copper list to erase dirty rectangles".

The right-hand bar chart simply displays blue for blitter busy and red for blitter idle.

Code is a bit of a mess at the moment, but should be fairly self-explanatory. As I say, feedback is very welcome.

Last edited by ebenupton; 07 April 2019 at 23:01.
 
07 April 2019, 03:19   #2
mc6809e
Registered User
 
Join Date: Jan 2012
Location: USA
Posts: 372
You seem really committed! Want a challenge? How about trying to make a copper-driven blitter queue that even crosses frame boundaries? In other words, it works even when VBLANK resets the copper address pointer?

It's been a while since I last looked at this but something like:

Code:
      DC.W COP1LCL, A1
      DC.W COP2LCL, B1
A1:
      DC.W $0001, $0000   ; wait for blitter finish - VBLANK takes us here
      DC.W $0501, $7F01   ; check for possible "danger zone"
      DC.W COPJMP2, $0000 ; early enough to load blitter registers
      DC.W $0701, $7F00   ; possible VBLANK coming - wait till past "danger zone"
B1:
      DC.W BLTCON0, xxxx
      DC.W BLTCON1, xxxx
...
      DC.W BLTSIZE, blitsize
      DC.W COP1LCL, A2
      DC.W COP2LCL, B2
A2:
      DC.W $0001, $0000   ; wait for blitter finish
      DC.W $0501, $7F01   ; same pattern as A1
      DC.W COPJMP2, $0000
      DC.W $0701, $7F00
B2:
      DC.W BLTCON0, xxxx

etc. etc.
The VPOS values 5 and 7 are there to wait out a possible VBLANK that might be coming (NTSC; PAL is different).

Several things can happen. The first is that VBLANK is several lines away so there's time to load blitter registers and get the blitter started. That's when VPOS is less than 5 (again NTSC).

The second possibility is that VPOS is larger than both 5 and 7 (but less than 255, obviously), which means no VBLANK risk.

The third is that VBLANK is imminent. If VBLANK happens, the copper restarts at the blitter wait that COP1LC points to (A1, A2, etc.).

Not sure I've got it right. VPOS values might need some fiddling.
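The following is a scanline-level Python model of what the A1 block does for each starting VPOS. It rests on three assumptions: a 263-line NTSC frame (lines 0-262), a copper that compares only the low 8 bits of VPOS, and no accounting for horizontal position or copper execution time. "restart" means the copper is still sitting in the WAIT when VBLANK reloads COP1LC.

```python
SKIP_LINE = 5   # DC.W $0501,$7F01 : SKIP if (VPOS & $FF) >= 5
WAIT_LINE = 7   # DC.W $0701,$7F00 : WAIT until (VPOS & $FF) >= 7


def a1_outcome(vpos: int, lines_per_frame: int = 263) -> str:
    """What the A1 block does when entered on line vpos (assumed NTSC frame)."""
    seen = vpos & 0xFF                  # the copper cannot see VPOS bit 8
    if seen < SKIP_LINE:
        return "load"                   # SKIP not taken: COPJMP2 straight to B1
    for line in range(vpos, lines_per_frame):
        if (line & 0xFF) >= WAIT_LINE:  # WAIT satisfied on this line
            return "load" if line == vpos else "wait_then_load"
    return "restart"                    # frame ends first: VBLANK reloads COP1LC
```

Lines 256-260 show up as 0-4 and fall into the "load" case, which corresponds to the "VBLANK is several lines away" situation described above.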

You also need the CPU to reprogram all the bitplane registers and such.

Still, with a generic blitter queue, you can start running a new blitter list and filling the frame buffer immediately. Use the CIAs to get the CPU timing right. Launch the new blitter list when the display reaches line $F5 (again, NTSC).

Anyone getting this to work right will be making a big contribution. All sorts of things could benefit, especially 3D.

Good luck!

Last edited by mc6809e; 08 April 2019 at 02:10. Reason: One wrong bit made a wait into a skip -- fixed
07 April 2019, 13:45   #3
ross
Defendit numerus
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Even if the idea of a Kickstart ROM is interesting, it is not a desirable practice...
So I've attached two ADF versions (both normal and debug) with a special bootblock that launches the ROM code.

Works on any machine with at least an ECS Agnus (needed for 1MB of chip RAM and to avoid a sprite bug in the _vis version).
Cheers.
Attached Files
File Type: zip cleo.zip (200.0 KB, 579 views)

Last edited by ross; 07 April 2019 at 14:19. Reason: ECS Agnus required...
07 April 2019, 15:15   #4
chb
Registered User
 
Join Date: Dec 2014
Location: germany
Posts: 439
Quote:
Originally Posted by mc6809e
How about trying to make a copper-driven blitter queue that even crosses frame boundaries? In other words, it works even when VBLANK resets the copper address pointer?
I was also thinking about such a thing at some point. I was toying with another approach that is probably less powerful and requires some CPU assistance, but may be easier to implement. It's totally untested and could be nonsense, but this may be a good place to share it:

ADKCON/ADKCONR is a register (written via ADKCON, read via ADKCONR) that can be written by the copper and read by the CPU, so it can be abused to exchange information between the two. During gameplay, audio is typically active but disk and serial are not, so the disk- and UART-related bits 8-14 are available, giving a 7-bit code.

After every write to BLTSIZE in your copper list, add a CMOVE to ADKCON that sets the index of the blit. During VBlank, check the value of ADKCONR and restart/rebuild the copper list from the next index. If it's the last index, meaning every blit was started in the previous frame, wait for the blitter to finish and go on to the next list.

There's a significant possibility, however, that a vblank occurs between the writes to BLTSIZE and ADKCON. For 50 blits per frame it would be approx. every 30s if I'm not mistaken. With common blits this simply repeats a blit that was already carried out, which does no harm, but if the blit changes its source there may be some corruption (e.g. in-place fill).
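A back-of-envelope Python version of that estimate, under stated assumptions. VBLANK is taken to land at a uniformly random point relative to the blits, and the gap between each BLTSIZE write and the following ADKCON write is "window" colour clocks. The frame length defaults to an assumed PAL figure of 313 lines of roughly 227 colour clocks. With a one-colour-clock window the result comes out close to 30s, and a wider window shortens the interval in proportion.

```python
def mean_seconds_between_hits(blits_per_frame: int, window: float,
                              clocks_per_frame: int = 313 * 227,
                              frames_per_second: float = 50.0) -> float:
    """Expected time between VBLANKs that fall inside a BLTSIZE->ADKCON gap.

    Hypothetical model: VBLANK position uniformly distributed relative to the
    blits, gaps of equal length that do not overlap."""
    p_per_frame = blits_per_frame * window / clocks_per_frame
    return 1.0 / (p_per_frame * frames_per_second)


print(round(mean_seconds_between_hits(50, 1.0), 1))  # prints 28.4
```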

A single write to ADKCON can only set bits or only clear them, so a linear index 0-127 is inconvenient: some steps would need two writes. It's better to use a sequence in which each number is derived from its predecessor only by setting bits or only by clearing bits. For example, [0,1,3,2,6,7,5,4,12,13,15,10,11,9,8,14] is such a sequence for numbers less than 16.
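A reflected binary Gray code is one sequence with this property for all 128 values. Consecutive entries differ in exactly one bit, so each step is a single ADKCON write. The Python sketch below assumes the index lives in ADKCON bits 8-14 as proposed, with bit 15 acting as SET/CLR. The helper names are hypothetical.

```python
ADKF_SETCLR = 0x8000  # ADKCON bit 15: 1 = set the written bits, 0 = clear them
INDEX_SHIFT = 8       # index kept in bits 8-14 (disk/UART bits, assumed unused)


def gray(n: int) -> int:
    """n-th entry of the reflected binary Gray code."""
    return n ^ (n >> 1)


def gray_to_position(g: int) -> int:
    """Inverse of gray(): position of a Gray-coded index in the blit list."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def adkcon_step(prev: int, nxt: int) -> int:
    """Value a copper MOVE to ADKCON would write to turn index prev into nxt."""
    set_bits = nxt & ~prev
    clr_bits = prev & ~nxt
    if set_bits and clr_bits:
        raise ValueError("needs both a set and a clear: two ADKCON writes")
    if set_bits:
        return ADKF_SETCLR | (set_bits << INDEX_SHIFT)
    return clr_bits << INDEX_SHIFT


def index_from_adkconr(adkconr: int) -> int:
    """Recover the 7-bit index from an ADKCONR read, ignoring audio bits 0-7."""
    return (adkconr >> INDEX_SHIFT) & 0x7F


sequence = [gray(i) for i in range(128)]
writes = [adkcon_step(a, b) for a, b in zip(sequence, sequence[1:])]
```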

As said, it's untested and I may have overlooked something significant.

Last edited by chb; 07 April 2019 at 15:24.
07 April 2019, 16:29   #5
ebenupton
 
Posts: n/a
Quote:
Originally Posted by ross
Even if the idea of a Kickstart ROM is interesting, it is not a desirable practice...
My aim was to be able to demo it in emulation without worrying about Kickstart copyright annoyance.

Quote:
Originally Posted by ross
So I've attached two ADF versions (both normal and debug) with a special bootblock that launches the ROM code.
Thanks!

Quote:
Originally Posted by ross
Works on any machine with at least an ECS Agnus (needed for 1MB of chip RAM and to avoid a sprite bug in the _vis version).
Cheers.
What was the bug? I've only been testing with A600 emulation so would have missed anything that affects OCS.
 
07 April 2019, 16:34   #6
drHirudo
Amiga user
 
Join Date: Nov 2008
Location: Sofia / Bulgaria
Posts: 455
Quote:
Originally Posted by ross
Even if the idea of a Kickstart ROM is interesting, it is not a desirable practice...
So I've attached two ADF versions (both normal and debug) with a special bootblock that launches the ROM code.

Works on any machine with at least an ECS Agnus (needed for 1MB of chip RAM and to avoid a sprite bug in the _vis version).
Cheers.
Why not just use ReloKick?
07 April 2019, 16:52   #7
ebenupton
 
Posts: n/a
Quote:
Originally Posted by mc6809e
You seem really committed! Want a challenge? How about trying to make a copper-driven blitter queue that even crosses frame boundaries? In other words, it works even when VBLANK resets the copper address pointer?
I've been thinking about this, because at some point I'd like to be able to do much busier scenes, updated at 25Hz while scrolling the frontbuffer on alternate frames (the old Team 17 50Hz trick). My current idea is quite simple:

Code:
A1:
      DC.W $0001, $0000 ;wait for blitter finish
      DC.W BLTCON0, xxxx
      DC.W BLTCON1, xxxx
...
      DC.W BLTSIZE, blitsize
; danger zone
      DC.W COP1LCL, A2
A2:
      DC.W $0001, $0000 ;wait for blitter finish
      DC.W BLTCON0, xxxx
      DC.W BLTCON1, xxxx
...
      DC.W BLTSIZE, blitsize
; danger zone
      DC.W COP1LCL, A3
A3:
      DC.W $0001, $0000 ;wait for blitter finish

etc. etc.
This relies on the fact that the blit operations typically used in 2d games (copy, fill, cookie cutter) are idempotent (i.e. we can repeat an operation and still get correct output). The "danger zone" is the window between the write to BLTSIZE and the write to COP1LCL. If VBLANK lands there, the copper restarts at the previous entry and we end up repeating the blit that just started.

Is there any reason this won't work? You mention 3d: perhaps there are XOR-type operations there that aren't idempotent?
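A toy Python model of the restart behaviour. It assumes VBLANK can land only in the danger zone after a BLTSIZE write, and treats each blit as an operation on one word of memory. A copy (D = A) gives the same result when repeated, and so does a cookie-cut (D = AB + (not A)C, with C read from the destination). An XOR blit does not.

```python
from typing import Callable, List, Set

Buffer = List[int]
Blit = Callable[[Buffer], None]


def copy_blit(dst: int, value: int) -> Blit:
    def op(buf: Buffer) -> None:
        buf[dst] = value  # D = A
    return op


def cookie_blit(dst: int, mask: int, src: int) -> Blit:
    def op(buf: Buffer) -> None:
        # D = AB + (not A)C, with C read from the destination itself
        buf[dst] = ((mask & src) | (~mask & buf[dst])) & 0xFFFF
    return op


def xor_blit(dst: int, value: int) -> Blit:
    def op(buf: Buffer) -> None:
        buf[dst] ^= value  # D = A xor C: a second run undoes the first
    return op


def run_queue(blits: List[Blit], size: int, vblank_hits: Set[int]) -> Buffer:
    """Run the copper queue; for each index in vblank_hits, VBLANK lands once in
    that entry's danger zone (after BLTSIZE, before the COP1LCL write)."""
    buf = [0] * size
    pending = set(vblank_hits)
    cop1lc = 0  # entry the copper restarts at after VBLANK
    pc = 0
    while pc < len(blits):
        blits[pc](buf)       # wait for blitter, load registers, write BLTSIZE
        if pc in pending:    # VBLANK before COP1LCL is updated
            pending.discard(pc)
            pc = cop1lc      # still points at this entry: the blit runs again
            continue
        cop1lc = pc + 1      # DC.W COP1LCL,A(next)
        pc = cop1lc
    return buf
```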

Also, I'm interested in your comment about the line on which VBLANK occurs. The manuals seem a bit vague about this: is it written down anywhere?
 
07 April 2019, 17:13   #8
ross
Defendit numerus
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by ebenupton
My aim was to be able to demo it in emulation without worrying about Kickstart copyright annoyance.
No copyright annoyance: the ADFs also work with the AROS ROM.
And it's usually not easy for users to manage a non-standard ROM (and your ROM needs special handling: on the A600, $f80000 is the base...).

Quote:
What was the bug? I've only been testing with A600 emulation so would have missed anything that affects OCS.
There's a hardware bug in OCS that prevents a sprite fetch when it is back-to-back with the first bitplane fetch.
In your case, with DDFSTRT=$30, the 2nd word of sprite 6 is missed.
So an ECS machine is required.
07 April 2019, 17:20   #9
ebenupton
 
Posts: n/a
Quote:
Originally Posted by ross
There's a hardware bug in OCS that prevents a sprite fetch when it is back-to-back with the first bitplane fetch.
In your case, with DDFSTRT=$30, the 2nd word of sprite 6 is missed.
So an ECS machine is required.
Interesting, thanks. I was an A600 owner, so I'd become used to only losing sprite 7 with DDFSTRT=$30.
 
07 April 2019, 17:52   #10
chb
Registered User
 
Join Date: Dec 2014
Location: germany
Posts: 439
Quote:
Originally Posted by ebenupton
Also, I'm interested in your comment about the line on which VBLANK occurs. The manuals seem a bit vague about this: is it written down anywhere?
According to Toni Wilen, vblank interrupt is at the beginning of line 0 (exception: line 1 for A1000):
http://eab.abime.net/showpost.php?p=...&postcount=192
07 April 2019, 18:26   #11
ross
Defendit numerus
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by drHirudo
Why not just use ReloKick?
Because ReloKick works with a standard ROM.
And in any case it's much, much simpler with an ADF.
07 April 2019, 18:53   #12
zzbylu
Saberman
 
Join Date: Dec 2016
Location: Kielce/Poland
Posts: 326
Small gameplay video: [YouTube embed]
07 April 2019, 19:14   #13
grond
Registered User
 
Join Date: Jun 2015
Location: Germany
Posts: 1,918
That looks pretty cool! I also like the graphics style. Can you do a copper shade for the background or is that too difficult with the copper-controlled blits?
07 April 2019, 19:29   #14
ebenupton
 
Posts: n/a
Quote:
Originally Posted by grond
That looks pretty cool! I also like the graphics style. Can you do a copper shade for the background or is that too difficult with the copper-controlled blits?
This is the drawback with this technique. You'd have to do copper bars with CIA interrupts. I still think it would pay off: might give it a try.
 
08 April 2019, 02:12   #15
mc6809e
Registered User
 
Join Date: Jan 2012
Location: USA
Posts: 372
Very nice! Looks awesome!
08 April 2019, 02:50   #16
mc6809e
Registered User
 
Join Date: Jan 2012
Location: USA
Posts: 372
Quote:
Originally Posted by ebenupton
I've been thinking about this, because at some point I'd like to be able to do much busier scenes, updated at 25Hz while scrolling the frontbuffer on alternate frames (the old Team 17 50Hz trick). My current idea is quite simple:

Code:
A1:
      DC.W $0001, $0000 ;wait for blitter finish
      DC.W BLTCON0, xxxx
      DC.W BLTCON1, xxxx
...
      DC.W BLTSIZE, blitsize
; danger zone
      DC.W COP1LCL, A2
A2:
      DC.W $0001, $0000 ;wait for blitter finish
      DC.W BLTCON0, xxxx
      DC.W BLTCON1, xxxx
...
      DC.W BLTSIZE, blitsize
; danger zone
      DC.W COP1LCL, A3
A3:
      DC.W $0001, $0000 ;wait for blitter finish

etc. etc.
This relies on the fact that the blit operations typically used in 2d games (copy, fill, cookie cutter) are idempotent (i.e. we can repeat an operation and still get correct output). The "danger zone" is if VBLANK occurs between the write to BLTSIZE and COP1LCL; if that happens we end up repeating the blit that just started.

Is there any reason this won't work? You mention 3d: perhaps there are XOR-type operations there that aren't idempotent?
Good question. Some of the better informed around here might be able to tell you. I think there are a few tricks typically done for efficiency that might fail.

Quote:
Also, I'm interested in your comment about the line on which VBLANK occurs. The manuals seem a bit vague about this: is it written down anywhere?
Either line 0 or line 1, depending on whether it's the A1000 or later.

I think what you're after, though, is information about the last line before VBLANK comes and the copper is restarted. The trouble is that the copper only sees 8 of the 9 bits needed to hold the full VPOS. That's why the "danger area" uses numbers like 5 and 7: the beam might really be on lines 5 through 7, OR it might only look that way because VPOS is really 256+5 = 261. If VPOS is really 261, that's one of the last lines before the copper restarts.

For the A1000 things are a little different. Check to see if you're at line 0. If you are, then wait for line 2.

The pattern of the copper list should still be the same, though.
08 April 2019, 03:01   #17
ReadOnlyCat
Code Kitten
 
Join Date: Aug 2015
Location: Montreal/Canadia
Age: 52
Posts: 1,178
Quote:
Originally Posted by ebenupton
This is the drawback with this technique. You'd have to do copper bars with CIA interrupts. I still think it would pay off: might give it a try.
I think it may be within the realm of possibilities to build a Copper list capable of managing both duties assuming some constraints are set to reduce the required complexity.

If the Blitter runs in "nasty mode", the durations of blits can be precomputed while constructing the blit list, and it becomes possible to interleave them with color changes. If we know that a given blit will take between min and max scanlines, then the Copper is free of blit duty for at least min scanlines, which can be used freely for color changes.

This means that if blits can be organized (or split?) to last for relatively constant durations, then the Copper-driven color changes can be scheduled for the intervals where it is safe to wait for the video beam rather than the Blitter.

Conversely, assuming that Copper-driven color changes occur only at fixed known intervals, the blit list could be sorted so that blits fit neatly in between.
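A Python sketch of that bookkeeping. It assumes each blit's duration is known as a (min, max) range of scanlines and ignores copper execution time. For each blit, it computes the scanline window in which the copper has certainly already started the blit and the blit cannot yet have finished. A color change whose line falls in that window can be placed right after that blit's BLTSIZE write without arriving late or delaying the next blit. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

Window = Tuple[int, int]


def free_windows(start: int, durations: List[Tuple[int, int]]) -> List[Window]:
    """Per blit: (first line it has surely started, last line it surely has
    not finished). A window with first > last is empty."""
    windows = []
    earliest = latest = start  # range of possible start lines of the current blit
    for shortest, longest in durations:
        windows.append((latest, earliest + shortest))
        earliest += shortest
        latest += longest
    return windows


def place_color_changes(lines: List[int], windows: List[Window]) -> List[Optional[int]]:
    """For each color-change line, the blit after whose BLTSIZE write the
    change can be queued safely, or None if no window covers it."""
    return [next((i for i, (first, last) in enumerate(windows)
                  if first <= line <= last), None)
            for line in lines]
```

As the uncertainty in start lines accumulates, windows shrink and eventually become empty. That is why splitting blits into ones of more constant duration helps.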

This may turn out to be slightly less efficient than a single purpose approach but since the cost of using CIAs + CPU for color changes is much higher than using the Copper, this might be worth it.
08 April 2019, 03:26   #18
ReadOnlyCat
Code Kitten
 
Join Date: Aug 2015
Location: Montreal/Canadia
Age: 52
Posts: 1,178
My previous post about color changes interleaving made me think that since nasty-mode Blitter timings are fairly computable/predictable it should be doable to organize the Copper list in a way which makes it relatively easy to handle and detect the VBLANK transition.

Assuming one can organize blits and/or predict their duration precisely enough, the Copper list could be constructed in a way that slightly ahead of the VBLANK transition it would insert a large blit that takes a known number of scanlines and is guaranteed to terminate well into the next frame. The Copper could then be instructed to simply jump into next-frame's list where it would wait for that blit to finish normally.

It seems feasible for blit-list construction routines to keep track of the current DMA budget while they queue up new blits and to insert a previously-put-aside large blit when they detect that the budget is almost exhausted.
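A Python sketch of such a builder, assuming exact per-blit durations in scanlines. The bridge is the put-aside large blit. It has to start before VBLANK and still be running a margin of lines into the next frame, so that the next frame's list can begin with an ordinary wait for blitter finish. Names and parameters are hypothetical.

```python
from typing import List, Tuple

Blit = Tuple[str, int]  # (label, duration in scanlines, assumed exact)


def plan_frame(queue: List[Blit], frame_lines: int, bridge: Blit,
               margin: int) -> Tuple[List[Blit], List[Blit]]:
    """Split queue into (this frame's list, leftover for the next frame).

    If anything is left over, the bridge blit is appended: it must start
    before VBLANK (line frame_lines) and still be running `margin` lines
    into the next frame."""
    earliest_bridge_start = frame_lines + margin - bridge[1]
    latest_bridge_start = frame_lines - 1
    used, this_frame = 0, []
    for i, blit in enumerate(queue):
        if used + blit[1] > latest_bridge_start:
            if used < earliest_bridge_start:
                raise ValueError("bridge would end too early: split a blit or use a longer bridge")
            return this_frame + [bridge], queue[i:]
        this_frame.append(blit)
        used += blit[1]
    return this_frame, []
```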

The advantage of such a technique is that it would lift the requirement to use only idempotent blits.
08 April 2019, 15:08   #19
ross
Defendit numerus
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Don't overcomplicate things: the Copper can handle quite complex flows by itself.

The key is to detach the blitter queue from the normal copper flow.
How to do this? Some hints:
- remember that Copper pointers are buffered
- use Copper1 only for normal jobs (like video splits and copper gradients)
- use Copper2 only for the blitter queue
- the BFD bit also works for the SKIP instruction
Simple as it is, the Copper is a real co-processor: you can control its flow and build subroutines.

Only at the last blit do you need the CPU's help (even a Copper IRQ suffices): check where you are, switch buffers and start a new queue.
Your only concern is a frame rate drop if you do not complete the queue within one frame.

Practical snippet:
Code:
	lea	copper,a0
	move.l	a0,$dff080
	lea	blitter_queue,a0
	move.l	a0,$dff084
        ....
	

copper
	dc.w	$0082,§1
	dc.l	$8001ff00
	dc.l	$ffff0001	;if (blitter busy)*
	dc.l	$00880000	; JMP to §1 (normal flux)
	dc.l	$008a0000	;else JSR to next blit on queue 

§1	dc.l	$01800fff
	dc.w	$0082,§2
	dc.l	$8101ff00
	dc.l	$ffff0001
	dc.l	$00880000
	dc.l	$008a0000

§2	dc.l	$01800888
	dc.w	$0082,§3
	....
	....

$138	dc.w	$0082,copper
	dc.l	$fffffffe 


blitter_queue
	dc.w	$0086,~1	;setup next subroutine
	dc.l	$00540003	;setup blitter
	dc.l	$00560000
	....
	dc.w	$0058,start0	;start blitter
	dc.l	$00880000	;RTS
	
~1	dc.w	$0086,~2
	dc.l	$00540003
	dc.l	$00568000
	....
	dc.w	$0058,start1
	dc.l	$00880000

~2	dc.w	$0086,~3
	....


*Why $FFFF? So I can recognize immediately a SKIP point on copper list :)
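A small Python decoder for the two-word copper instructions used above, plus a model of the WAIT/SKIP condition. The register table covers only the addresses that appear in this thread. The model assumes the Hardware Reference Manual rule that vertical bit V7 is always compared (only V6-V0 can be masked by VE), with VPOS seen modulo 256. If that reading is right, a SKIP whose first word is $FFFF only tests the blitter on lines $80-$FF, which is where this example uses it. Outside that range, a first word such as $0001 would make the test independent of beam position. This is worth confirming on real hardware.

```python
REGS = {0x040: "BLTCON0", 0x042: "BLTCON1", 0x054: "BLTDPTH", 0x056: "BLTDPTL",
        0x058: "BLTSIZE", 0x080: "COP1LCH", 0x082: "COP1LCL", 0x084: "COP2LCH",
        0x086: "COP2LCL", 0x088: "COPJMP1", 0x08A: "COPJMP2", 0x09E: "ADKCON",
        0x180: "COLOR00"}


def decode(w1: int, w2: int) -> dict:
    """Split a copper instruction into its fields."""
    if w1 & 1 == 0:                       # MOVE: bits 8-1 are the register address
        reg = w1 & 0x1FE
        return {"op": "MOVE", "reg": REGS.get(reg, hex(reg)), "data": w2}
    return {"op": "SKIP" if w2 & 1 else "WAIT",
            "vp": w1 >> 8, "hp": (w1 >> 1) & 0x7F,
            "bfd": w2 >> 15, "ve": (w2 >> 8) & 0x7F, "he": (w2 >> 1) & 0x7F}


def condition(w1: int, w2: int, beam_v: int, beam_hp: int, blitter_busy: bool) -> bool:
    """True when a WAIT would release / a SKIP would skip (assumed semantics)."""
    i = decode(w1, w2)
    vmask = 0x80 | i["ve"]                # V7 cannot be masked (assumption, see above)
    beam = ((beam_v & 0xFF & vmask) << 7) | (beam_hp & i["he"])
    target = ((i["vp"] & vmask) << 7) | (i["hp"] & i["he"])
    blitter_ok = i["bfd"] == 1 or not blitter_busy
    return beam >= target and blitter_ok
```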
Cheers

Last edited by ross; 08 April 2019 at 15:58. Reason: $dff0084 :D
08 April 2019, 15:19   #20
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,406
Quote:
Originally Posted by ebenupton
This is the drawback with this technique. You'd have to do copper bars with CIA interrupts. I still think it would pay off: might give it a try.
This sounds like it might be an excellent idea (doing Copper bars with CIA interrupts and blitting using just the Copper). You could even consider doing all the display setup per frame using the CPU.

This would essentially allow you to 'not care' about the Blitter in the main loop of the program, other than setting up the changes in what to blit per frame. And crucially, it would do that while still allowing most (all?) of the nifty 'Copper tricks' - albeit now via CPU so slower than normal due to interrupt overhead.