Blitter fill timing - Page 2

mc6809e · 15 April 2013, 23:40

Quote:

Originally Posted by leonard

I agree the blitter interrupt is not the fastest way of running a blitter queue on the Amiga. But I can't use the COPPER because I use it for some "sync to screen" stuff, so I guess I can't mix in blitter commands, because I might lose some sync points with the electron beam.

The CPU interrupt latency seems very long compared to the Atari ST, but if you confirm it's normal, then I have to take that into account.

Thanks!

The time between the end of a blit and the processing of the interrupt by the CPU should actually be slightly shorter (in terms of cycles) on the Amiga than on the ST, as the ST has the overhead of bus arbitration.

If you're doing 3D, my guess is that the latency you're seeing comes from interrupts arriving during the instruction just before a DIVS or MULS. The worst case is an interrupt during a MOVEM.L m->D0-D7/A0-A7 with a DIVS in the prefetch queue: well over 300 cycles on a 68000.

hannibal · 20 April 2013, 21:56

One trick about clearing: for a lot of 3D stuff, I (and probably many other people) did the clearing with the linedraw instead of clearing the full buffer, so it worked like this:
1. Draw the lines with XOR into an empty screen-size buffer (back buffer).
2. Blitter fill from the back buffer into a front buffer. The size of this blit is calculated from the overlap of the bounding box for the current frame's object and the bounding box for whatever was drawn on this front buffer last frame. I often recalculated the bounding box for each bitplane, as it sometimes saved a little extra.
3. Draw the lines with XOR again into the back buffer; this clears it.
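The three steps can be modelled in a few lines. This is a minimal Python sketch, not hannibal's code: it treats one bitplane as a list of row integers, reduces the XOR linedraw to setting single edge pixels, and uses a simplified right-to-left inclusive fill. Because the fill blit has to cover both the new object and whatever was left on the front buffer last frame, the sketch sizes the blit with the smallest box enclosing both bounding boxes. All helper names are hypothetical.

```python
# Toy model of the clear-by-redrawing trick for ONE bitplane.
# Assumptions (not from the post): each row is a Python int, bit 0 is the
# rightmost pixel, and an "edge" is a single (row, x) pixel that a blitter
# XOR linedraw would have set. All names are hypothetical.
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive, in bit positions


def xor_edges(buf: List[int], edges: List[Tuple[int, int]]) -> None:
    """XOR one pixel per (row, x); drawing the same edges twice cancels out."""
    for y, x in edges:
        buf[y] ^= 1 << x


def fill_row(row: int, width: int) -> int:
    """Simplified inclusive fill: scan right to left, toggling at each edge bit."""
    out, inside = 0, False
    for x in range(width):
        bit = (row >> x) & 1
        if bit:
            inside = not inside
        if inside or bit:
            out |= 1 << x
    return out


def union_box(a: Box, b: Box) -> Box:
    """Smallest box covering this frame's object and last frame's leftovers."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def render_frame(back: List[int], front: List[int],
                 edges: List[Tuple[int, int]], box: Box, width: int) -> None:
    x0, y0, x1, y1 = box
    mask = ((1 << (x1 - x0 + 1)) - 1) << x0
    xor_edges(back, edges)                    # 1. XOR lines into the empty back buffer
    for y in range(y0, y1 + 1):               # 2. fill back -> front, box area only
        front[y] = (front[y] & ~mask) | (fill_row(back[y], width) & mask)
    xor_edges(back, edges)                    # 3. XOR the same lines again: back is empty
```

Since no edges lie outside the box, filling from the row's right edge gives the same result as a blit that starts at the box's right edge; the box only decides how much of the front buffer gets overwritten (and thereby cleared).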

Also, about the 3D Demo II glenz: I cheated the heck out of it. If I remember correctly, I pre-rotated the vertices, and I may even have precalculated the front/back facing; I was just trying to see how many lines I could push through using the blitter linedraw. I did, however, have quite a lot of CPU cycles left over, so I added 2 small texture-mapped faces, if you watch carefully. I didn't understand back then why the CPU work didn't hurt the blitter time, but Toni's explanation makes perfect sense.

Also, big props to Paradroid for using bitplane pointers for the third bitplane. It works perfectly because his glenz (like most others) has exactly 5 colors (4+bg); in 3D Demo II I used 7 colors (6+bg), so it wouldn't have worked for me.

Also, I don't think I ever used blitter interrupts; I was a big fan of copper blits. If you want to use blitter interrupts, I think it makes sense to use them only for the big blits (fill and clear operations), not for every line etc.
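For readers who haven't built one: a "copper blit" is a copper list that waits for the blitter to finish and then writes the blitter registers itself, ending with BLTSIZE, which starts the blit. The Python sketch below only encodes such a list for a D-only clear; it is an illustration under assumptions, not code from the thread. It uses the common `$0001,$7FFE` WAIT (blitter-finished-disable bit clear, beam position 0,0) as the blitter wait, and BLTCON0 = $0100 (D enabled, minterm 0, so zeros are written). On the original chipset the copper also needs the danger bit (CDANG in COPCON) set before it may write blitter registers. The address and sizes are made up.

```python
# Sketch: encode a copper list that waits for the blitter and then starts a
# D-only clear. Register offsets are relative to $DFF000; the buffer address
# and blit size used below are made-up examples, helper names are hypothetical.
BLTCON0 = 0x040
BLTCON1 = 0x042
BLTDPTH = 0x054
BLTDPTL = 0x056
BLTSIZE = 0x058
BLTDMOD = 0x066


def cop_move(reg: int, value: int) -> list[int]:
    """MOVE: first word is the (even) register offset, second is the data."""
    return [reg & 0x1FE, value & 0xFFFF]


def cop_wait_blitter() -> list[int]:
    """WAIT for position 0,0 with the blitter-finished-disable bit clear,
    so in practice it only waits for the blitter to finish."""
    return [0x0001, 0x7FFE]


def copper_clear(addr: int, width_words: int, height: int, modulo: int) -> list[int]:
    words = cop_wait_blitter()
    words += cop_move(BLTCON0, 0x0100)        # use D only, minterm 0 -> clear
    words += cop_move(BLTCON1, 0x0000)
    words += cop_move(BLTDMOD, modulo)
    words += cop_move(BLTDPTH, addr >> 16)
    words += cop_move(BLTDPTL, addr & 0xFFFF)
    # BLTSIZE: height in bits 15-6, width in words in bits 5-0; writing it starts the blit
    words += cop_move(BLTSIZE, ((height & 0x3FF) << 6) | (width_words & 0x3F))
    return words
```

The point of the design is that the copper, not the CPU, waits for the previous blit, so the CPU never spends cycles polling the blitter.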

BTW, it's of course really hard to use copper blits if you have effects that run slower than 50fps. Also, if you have raster bars only in some parts of the screen (not on every single line), you can still use copper blits during some parts of the screen and interrupt blits during the rest.

Toni Wilen · 21 April 2013, 09:25

Quote:

Originally Posted by leonard

I agree the blitter interrupt is not the fastest way of running a blitter queue on the Amiga. But I can't use the COPPER because I use it for some "sync to screen" stuff, so I guess I can't mix in blitter commands, because I might lose some sync points with the electron beam.

Use the copper to trigger interrupts. The interrupt routine can then start multiple blits from the CPU "normally". (Of course this also gets too slow if you need lots of separate interrupts, but normally you only need one or two.)
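A sketch of the copper side of this, assuming the usual mechanism: a copper MOVE that writes the COPER bit (bit 4) of INTREQ together with the SET/CLR bit (bit 15), which raises a level 3 interrupt. Python is used only to show the word encoding; the helper names and the chosen line are hypothetical, and this simple WAIT only covers lines 0-255.

```python
# Sketch: copper list that raises a copper (COPER) interrupt at chosen lines.
# Offsets are relative to $DFF000; helper names are hypothetical.
INTREQ = 0x09C
INTF_SETCLR = 0x8000   # bit 15: set the bits written
INTF_COPER = 0x0010    # bit 4: copper interrupt (level 3)


def cop_wait(vpos: int, hpos: int = 0) -> list[int]:
    """WAIT for a beam position, comparing all position bits (lines 0-255 only)."""
    if not 0 <= vpos <= 0xFF:
        raise ValueError("this simple WAIT only handles lines 0-255")
    return [(vpos << 8) | (hpos & 0xFE) | 0x0001, 0xFFFE]


def cop_move(reg: int, value: int) -> list[int]:
    return [reg & 0x1FE, value & 0xFFFF]


def copper_irq_list(lines: list[int]) -> list[int]:
    words: list[int] = []
    for v in lines:
        words += cop_wait(v)
        words += cop_move(INTREQ, INTF_SETCLR | INTF_COPER)  # request the interrupt
    return words + [0xFFFF, 0xFFFE]                          # end of copper list
```

The level 3 handler would then acknowledge the interrupt by writing $0010 (without the SET/CLR bit) to INTREQ and start its blits from the CPU as usual.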

hooverphonique · 21 April 2013, 22:01

Quote:

Originally Posted by Paradroid

> But then you say that when the clipping arrives, you switch to a four-bitplane blitter routine.
At this point I've switched to a smaller display area, 192x192. At this size I can fill 4 bitplanes in just under half a frame, leaving the other half for clearing (which uses both the blitter and the CPU) and drawing the lines.

As far as I can make out, you use 5 colors + bg. Why did you use 4 bitplanes for that? (I presume the mid-screen color changes are just the copper changing the palette.)

Paradroid · 22 April 2013, 19:28

Quote:

Originally Posted by hooverphonique

As far as I can make out, you use 5 colors + bg. Why did you use 4 bitplanes for that? (I presume the mid-screen color changes are just the copper changing the palette.)

For a regular glenz object you only need 3 bitplanes: 1 to represent the overall shape of the object, then one each for the inner and outer surfaces, which only require half the triangles to be drawn into them.
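One way to see why three planes give exactly 4 colours + background (the "5 colors" hannibal mentions): with plane 0 as the silhouette and planes 1 and 2 each holding one of the two alternating triangle colours of the outer and inner surface, every pixel inside the object sits under exactly one outer and one inner triangle. The Python sketch assumes that plane assignment; it is a reading of the post, not Paradroid's code.

```python
# Which colour registers a 3-plane glenz can reach, assuming:
#   plane 0 = silhouette, plane 1 = marked outer triangles, plane 2 = marked inner ones.
from itertools import product


def colour_index(shape: bool, outer: bool, inner: bool) -> int:
    """Bitplane n contributes bit n of the colour register number."""
    return int(shape) | (int(outer) << 1) | (int(inner) << 2)


# Background: no plane set. Inside the silhouette, the outer and inner
# triangle under a pixel are each either marked (drawn) or not.
reachable = {colour_index(False, False, False)}
reachable |= {colour_index(True, o, i) for o, i in product((False, True), repeat=2)}
print(sorted(reachable))  # prints [0, 1, 3, 5, 7]
```

Registers 2, 4 and 6 are never used, because plane 1 or 2 is only ever set where plane 0 is set too.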

Once you start clipping the object, you can no longer use a single plane to represent the shape, as the inner and outer outlines will be different. That means you have to draw the entirety of the inner and outer surfaces to separate planes, which in my effect's case meant 2 planes each.

For example: [image not preserved]

Of course I could have drawn some holes into the planes represented on the right to get more colours into it, but then that would have caused me a major headache for the glenz that zooms in at the beginning which uses the copper/bitplane spans to define the convex shape of the entire object.

Astrofra · 29 April 2013, 10:32

This discussion is so insanely cool and full of the technical details I'm so fond of.

mc6809e · 13 May 2013, 22:36

Quote:

Originally Posted by Toni Wilen

BTW, the WinUAE "DMA" debugger can be used to check DMA channel usage ("v" command).

I just spent a good deal of the weekend playing with the v -3 option.

The visualization of DMA channel allocation is great! Thanks for this! Highly recommended.

Your comments about blitter clears and the A channel are especially interesting. The visualization shows that blitter clears are mostly a waste of time unless they're done during the vertical overscan areas.

I noticed some programs/demos that carefully time this to make the most of the available accesses.

Also interesting is the number of empty DMA slots in some programs, especially Atari ST conversions. Starglider, even though there's some blitter usage, really doesn't do much to overlap blits with computation, leaving the bus idle more than it needs to be. And Starglider II doesn't appear to use blitter area fill at all, which surprised me, though it does use sprites (unlike Starglider I).

Anyway, incredibly enlightening. Made my weekend!

hooverphonique · 14 May 2013, 22:04

Quote:

Originally Posted by mc6809e

Your comments about blitter clears and the A channel are especially interesting. The visualization shows that blitter clears are mostly a waste of time unless they're done during the vertical overscan areas.

I noticed some programs/demos that carefully time this to make the most of the available accesses.

You mean during the vertical blank period? Otherwise it must be lots of very short blits.

mc6809e · 14 May 2013, 22:32

Quote:

Originally Posted by hooverphonique

You mean during the vertical blank period? Otherwise it must be lots of very short blits.

Oh, I mean during those scanlines above the top of the visible display and below the bottom of the visible display.

At one time I had my Amiga hooked up to an old rough black and white CRT via the composite output. Anything above the top visible line or below the bottom visible line disappeared under the edges of the display, so for me, anything outside those 200 lines (NTSC) was overscan.

Toni Wilen · 15 May 2013, 21:08

Quote:

Your comments about blitter clears and the A channel are especially interesting. The visualization shows that blitter clears are mostly a waste of time unless they're done during the vertical overscan areas.

Why is it a waste during the bitplane area? For example, if you have 4-plane lores, blitter cycles fit perfectly between bitplane cycles (and the CPU can still do some work during horizontal blanking). Not optimal, but not much wasted either.

Quote:

Also interesting is the number of empty DMA slots in some programs, especially Atari ST conversions. Starglider, even though there's some blitter usage, really doesn't do much to overlap blits with computation, leaving the bus idle more than it needs to be

I assume those are long instructions (big shifts, MULS and DIVS). I think the blitter should always be active (with the nasty bit cleared if the blit has no idle cycles) during CPU calculations (for example 3D geometry) for optimum results.
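In register terms, the "nasty" bit is BLTPRI, bit 10 of DMACON ($DFF096), and DMACON writes use bit 15 to choose between setting and clearing the bits written. A tiny Python sketch of the resulting word values; the constant names follow the usual Amiga include-file style and are only illustrative.

```python
# DMACON word values for toggling the blitter "nasty" bit (BLTPRI).
# Constant names are illustrative; bit positions are the DMACON ones.
DMACON = 0x096          # write register, offset from $DFF000
DMAF_SETCLR = 0x8000    # bit 15: 1 = set the bits written, 0 = clear them
DMAF_BLITHOG = 0x0400   # bit 10: BLTPRI, the "blitter nasty" bit
DMAF_MASTER = 0x0200    # bit 9: DMAEN, master DMA enable
DMAF_BLITTER = 0x0040   # bit 6: BLTEN, blitter DMA enable


def dmacon_word(bits: int, set_bits: bool) -> int:
    """Build the value to write to DMACON to set or clear `bits`."""
    return (DMAF_SETCLR | bits) if set_bits else (bits & 0x7FFF)


nasty_on = dmacon_word(DMAF_BLITHOG, True)                   # $8400
nasty_off = dmacon_word(DMAF_BLITHOG, False)                 # $0400
blitter_dma_on = dmacon_word(DMAF_MASTER | DMAF_BLITTER, True)  # $8240
```

With the nasty bit clear, a blit with no idle cycles still leaves the CPU a bus slot now and then, so geometry code keeps running while the blitter works.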

This also explains why copper-started blits are optimal: chip bus usage is very high, the CPU and blitter can run at the same time, and cycles are never wasted on blitter waits.

It would be interesting to see how much time different programs waste on CPU blitter waits. The results may be quite unexpected...

mc6809e · 26 February 2014, 08:38

Not sure how I missed answering this...

Quote:

Originally Posted by Toni Wilen

Quote:

Originally Posted by mc6809e

Your comments about blitter clears and the A channel are especially interesting. The visualization shows that blitter clears are mostly a waste of time unless they're done during the vertical overscan areas.

Why is it a waste during the bitplane area? For example, if you have 4-plane lores, blitter cycles fit perfectly between bitplane cycles (and the CPU can still do some work during horizontal blanking). Not optimal, but not much wasted either.

Correct me if I'm wrong, but doesn't the idle blitter cycle during a clear require a slot that is not used by other DMA, except maybe the CPU? That's what I thought was "very interesting" when I read your earlier comment.

Maybe I just misunderstood, but the current emulator seems to support the idea.

Starglider shows this. It does nearly the worst possible thing to clear the buffer, BTW. It starts a buffer clear with the blitter at around vpos 5 and busy-waits, so that about halfway through the clear it reaches the first visible scanline and drops to a little faster than half speed (or quarter speed compared to MOVEMs plus blitter running in the vertical overscan/blanking area).

Toni Wilen · 26 February 2014, 10:56

Yeah, that was wrong; I'm not sure what I was thinking. Blitter cycles (idle or not) always require a free cycle. Blitter idle cycles are usable by the CPU.

Cycle diagram with 4 lores planes + D clear would be:

PDP-PDP-PDP-PDP- (P = bitplane, D = blitter D, - = blitter idle cycle, CPU can use it)

-> It is always a waste of free cycles if a program starts a D clear and then immediately starts waiting for the blitter.
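The diagrams in this thread can be reproduced with a toy slot allocator. This Python sketch is a deliberate simplification, not a model of real Agnus arbitration: bitplane slots are reserved first, every remaining slot goes to the next step of the blitter's D-only pattern (a write or an idle cycle), and refresh, disk, audio, sprite and copper slots are ignored. The starting phase of the pattern is picked to match the diagrams; in practice it depends on when the blit starts.

```python
# Toy chip-bus slot allocator: 'P' = bitplane, 'D' = blitter D write,
# '-' = blitter idle cycle (usable by the CPU). Not real Agnus arbitration.


def slot_pattern(n_slots: int, reserved: set[int], blit_pattern: str) -> str:
    """Reserved slots go to bitplane DMA; each free slot is consumed by the
    next step of the blitter's pattern, idle steps included."""
    out = []
    step = 0
    for slot in range(n_slots):
        if slot in reserved:
            out.append("P")
        else:
            out.append(blit_pattern[step % len(blit_pattern)])
            step += 1
    return "".join(out)


# Border line: no bitplane DMA, so the clear alternates idle and D.
border = slot_pattern(16, set(), "-D")
# 4 lores planes modelled as taking every other slot.
lores4 = slot_pattern(16, set(range(0, 16, 2)), "D-")
```

With 4 lores planes the clear gets half the D writes it gets in the border (4 instead of 8 per 16 slots), and a program that busy-waits meanwhile throws away the '-' slots the CPU could have used.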

Cyprian · 21 August 2014, 19:55

Toni, what about the cycle diagram for lines without active video DMA (top/bottom border)?
Will it be:
-D---D---D---D--
or
-D-D-D-D-
?
Thanks

Toni Wilen · 21 August 2014, 21:27

Quote:

Originally Posted by Cyprian

Toni, what about the cycle diagram for lines without active video DMA (top/bottom border)?
Will it be:
-D---D---D---D--
or
-D-D-D-D-
?
Thanks

It's the same as in the HRM. The HRM blitter diagrams show blitter DMA usage when there are no other active channels (and they are correct):

= -D-D-D-D..

mc6809e · 21 August 2014, 22:59

Not sure if it's obvious or not, Cyprian, but each of those DMA cycles is two CPU cycles long.

Since some of those blitter cycles are idle cycles during a clear, there are times, such as during overscan/blanking, when the CPU can run at full speed while the blitter also clears. Some have even combined the CPU's MOVEM instruction and the blitter to clear buffers up to twice as fast as with the CPU alone.

For example, the DMA sequence would look something like:

DwDwDwDwDwD
a a a a a a

where 'w' is a CPU write to memory, 'D' is the D channel of the blitter writing to memory, and 'a' is approximately when the CPU puts the address on the bus during the first two CPU cycles of a CPU memory access.
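The "up to twice as fast" figure is easy to check as slot arithmetic. In this Python sketch of a border line (no bitplane DMA, refresh ignored), the blitter's D-only clear writes in every other slot, and a 68000 access, at 4 CPU clocks, can use at most every other slot, so a MOVEM loop can at best fill the blitter's idle slots. It is an upper bound: instruction fetches and loop overhead are ignored, and the function name is hypothetical.

```python
# Upper-bound slot arithmetic for clearing on a border line.
# 'D' = blitter D write, 'w' = CPU write, '-' = unused slot.


def clear_slots(n_slots: int, blitter: bool, cpu: bool) -> tuple[str, int]:
    """Return the slot string and the number of words written in n_slots."""
    seq = []
    for slot in range(n_slots):
        if slot % 2 == 0:
            seq.append("D" if blitter else "-")  # blitter D write slots
        else:
            seq.append("w" if cpu else "-")      # blitter idle slots, usable by the CPU
    s = "".join(seq)
    return s, s.count("D") + s.count("w")


both = clear_slots(10, blitter=True, cpu=True)
blit_only = clear_slots(10, blitter=True, cpu=False)
cpu_only = clear_slots(10, blitter=False, cpu=True)
```

Either the blitter or the CPU alone writes one word per two slots; together they write one word per slot, which is where the factor of two comes from.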

I think one of the things that prevented programmers from getting the most out of the Amiga early on was an overemphasis on the "odd cycle/even cycle" description of the relation between the CPU/blitter/other DMA.

The CPU and blitter are very dynamic when it comes to accessing chipram.

A three plane display with blitter clear might look like this:

DwD2w3D1wDw2D3w1DwD2w3D1wDw2D3w1DwD2w3D1

In this case the blitter and CPU take turns using the DMA cycle opened up by the missing fourth plane. The odd/even model suggests that this is impossible, but it happens on real hardware.

Cyprian · 22 August 2014, 00:03

Quote:

Originally Posted by Toni Wilen

It's the same as in the HRM. The HRM blitter diagrams show blitter DMA usage when there are no other active channels (and they are correct):

= -D-D-D-D..

Thanks, Toni and mc6809e, for the explanation.
We know that in this case the D channel can write data to memory in every second memory slot. I'm just wondering why it can't do that during the bitplane area, like this:
PDPDPDPDPDPDPDPD

Is it caused by that?

Quote:

Originally Posted by Toni Wilen

Blitter cycles (idle or not) always require free cycle.

If so, what is behind that design? Why does it need idle cycles during the bitplane area?
Thanks

Toni Wilen · 22 August 2014, 08:10

Quote:

Originally Posted by Cyprian

If so, what is behind that design? Why does it need idle cycles during the bitplane area?

I think it is some shared Agnus resource that is needed by all DMA channels. Possibly the adder (used to increment/decrement DMA pointers by ±2 or by the modulo) or some other ALU-like element.

mc6809e · 22 August 2014, 09:47

Quote:

Originally Posted by Toni Wilen

I think it is some shared Agnus resource that is needed by all DMA channels. Possibly the adder (used to increment/decrement DMA pointers by ±2 or by the modulo) or some other ALU-like element.

What about the register buses? There is that BLTDDAT dummy register that's connected to writes.

Are there any known issues with blitter/copper DMA happening in the cycle before a disk write DMA cycle?

It would be interesting if during disk write DMA the previous cycle was an idle cycle, too.

Edit: well, during a disk read, I guess, since a read fills memory.

Photon · 23 August 2014, 01:27

If you mean the twist-scroller, it's filled not with the normal vector filling method but with a copy-and-xor-to-line-above blit.
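Photon doesn't give the blit setup, so this Python sketch models only the logic of a copy-and-XOR-to-line-above fill: working top to bottom, each row becomes itself XOR the already-processed row above it. A mark at the top and at the bottom of a span then fills the rows from the top mark down to, but not including, the bottom mark. The function name is hypothetical.

```python
# Logic of a vertical "XOR with the line above" fill, processed top to bottom.
# Each row is an int bitmask; this is a model of the result, not a blitter setup.


def vertical_xor_fill(rows: list[int]) -> list[int]:
    """Row y becomes row y XOR the (already filled) row y-1."""
    out = list(rows)
    for y in range(1, len(out)):
        out[y] ^= out[y - 1]
    return out


# Marks on rows 0 and 3 over the same columns fill rows 0-2.
filled = vertical_xor_fill([0b0110, 0, 0, 0b0110, 0])
```

Because the fill runs down the columns instead of along the rows, the shape's edges only need marks where a span starts and stops vertically, which suits shapes like a twist scroller built from horizontal slices.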

If you mean filled vectors, well, a fill blit takes exactly the same time as a copy blit ($9f0) of the same area. The formula for calculating the time is in the Hardware Reference Manual, or you could just do it and measure the time by setting and clearing the background color. Something like, IDK, 60-80 scanlines maybe (out of 312), if you do it during vertical blanking when there's no competing bitplane DMA running.

Copyblits (and therefore fill-blits) leave cycles free for the CPU, so you can do things while it fills (hint hint!)

Toni Wilen · 23 August 2014, 08:34

Quote:

Originally Posted by Photon

If you mean filled vectors, well a fill blit takes exactly the same time as a copy blit ($9f0) of the same area.

Not correct: there are 4 channel combinations with fill that add an extra idle cycle, including A to D copy with fill, so A to D with fill takes 3 cycles per word, not 2.

This explains why the CPU has free cycles. A normal copy will not give any cycles to the CPU (if the nasty bit is set).

The HRM diagram is only correct if there is no other DMA activity, no fill and no linedraw. (EDIT: Only the very first, rare HRM revision has the extra fill information!)
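Using Toni's figures (2 cycles per word for an A to D copy, 3 with fill), a rough blit time in scanlines is words × cycles per word ÷ slots per line. This Python sketch assumes about 227 DMA slots per PAL line and that all of them are free for the blitter, so it is optimistic: refresh and bitplane DMA are ignored. The function names are hypothetical.

```python
# Rough blit-time estimate in scanlines (optimistic: assumes every slot is free).
PAL_SLOTS_PER_LINE = 227  # approximate DMA slots per PAL scanline


def blit_slots(width_words: int, height: int, cycles_per_word: int) -> int:
    """Total blitter cycles (DMA slots) for a width_words x height blit."""
    return width_words * height * cycles_per_word


def blit_scanlines(width_words: int, height: int, cycles_per_word: int,
                   slots_per_line: int = PAL_SLOTS_PER_LINE) -> float:
    return blit_slots(width_words, height, cycles_per_word) / slots_per_line


copy_lines = blit_scanlines(20, 256, 2)  # one 320x256 plane, A to D copy
fill_lines = blit_scanlines(20, 256, 3)  # same plane, A to D with fill
```

For one 320x256 plane (20 words × 256 rows) this gives about 45.1 lines for the copy and about 67.7 lines with fill; with bitplane DMA competing, the real figures during the display area will be worse.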
