Blitter fill timing

leonard · 07 April 2013, 21:16

Hi,

I just saw a really cool demo from the "Revision 2013" party (http://pouet.net/prod.php?which=61182). It's an AMIGA 500 (OCS chipset) demo featuring some really nice effects.
In the end scroller, the author explains some of the tricks he uses, and I'm curious about one thing. In a glenz vector part, the author claims he has to use a bitplane trick because the blitter is not able to fill a three-bitplane screen at 50 Hz. I'm an ATARI-ST programmer. I coded Amiga stuff too (never released), but it was on an A1200, so I guess the timings are not the same.

Can someone tell me exactly what percentage of a VBL a complete 3D fill of a 320*256 screen takes, in one bitplane, on an A500 OCS? (I didn't find that on Google.)

Thanks in advance
Leonard / OXYGENE

Toni Wilen · 08 April 2013, 10:58

A1200 chipset DMA timing is exactly the same as long as FMODE=0. CPU speed can be much faster (instruction cache, faster instructions and 32-bit wide bus to chipmem).

320x256 single bitplane blitter fill takes about 70 scanlines if all DMA slots are free.

I'd estimate (too lazy to calculate anything) it is possible to fill 3 planes in one frame (with 3 bitplane display visible) but there would not be time left for anything else.

It will get much worse if overscan is used.
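As a rough sanity check on the "about 70 scanlines" figure, here is a minimal sketch in Python. It assumes the 3 blitter cycles per word for fill that is quoted later in this thread, and it assumes roughly 227 DMA slots per PAL scanline (that number is not from the thread). Refresh, audio, sprite and copper slots are ignored, so treat the result as a floor rather than an exact timing.

```python
# Rough estimate: scanlines needed to blitter-fill one bitplane with all DMA slots free.
FILL_CYCLES_PER_WORD = 3   # figure quoted later in this thread
SLOTS_PER_SCANLINE = 227   # assumption: DMA slots per PAL scanline (not from the thread)

def fill_scanlines(width_pixels, height_lines,
                   cycles_per_word=FILL_CYCLES_PER_WORD,
                   slots_per_line=SLOTS_PER_SCANLINE):
    """Return the number of scanlines a fill would take with no competing DMA."""
    words = ((width_pixels + 15) // 16) * height_lines
    return words * cycles_per_word / slots_per_line

one_plane = fill_scanlines(320, 256)
print(f"320x256, one bitplane: {one_plane:.1f} scanlines")
```

Under these assumptions the estimate comes out a little under 70 scanlines, which is in line with the figure above.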

leonard · 08 April 2013, 14:01

Thanks Toni.

So when we see classic "glenz vector" objects such as the famous 48-face glenz vector of the HardWeird demo, it's only possible because the glenz vector does not cover the complete screen (maybe it's 200*200 pixels?).
If it covered the whole screen, there would not be enough blitter time to draw 320*256 pixels.

Another question: imagine the CPU is filling some memory. Does it slow down the blitter a bit? Or does the CPU run on totally different cycles than the blitter? In other words, would it be possible to draw some polygons with the CPU for free while the blitter does all its work, driven by a pre-computed COPPER list?

leonard · 08 April 2013, 14:23

Oh BTW, there are 312 "time" scanlines per VBL in PAL, am I right? So three bitplanes * 70 scanlines = 210 scanlines to fill 3*320*256, so there are 102 free scanlines, right? (about 1/3).

diablothe2nd · 08 April 2013, 16:37

very impressive demo

i'm curious to know how they did it too

Paradroid · 09 April 2013, 07:38

Quote:

Originally Posted by leonard

Oh BTW, there are 312 "time" scanlines per VBL in PAL, am I right? So three bitplanes * 70 scanlines = 210 scanlines to fill 3*320*256, so there are 102 free scanlines, right? (about 1/3).

No, because the DMA is being used to fetch the bitplanes for the display too, so it ends up taking practically a whole frame to fill 3*320*256 and that's without any other clearing or polygon rendering.
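To put a rough number on this, here is a minimal sketch in Python that subtracts bitplane fetches from the slots available in a frame. The 227 DMA slots per PAL scanline is an assumption (not from this thread), the 3 cycles per word for fill is the figure Toni gives later in the thread, and the model ignores refresh, audio, sprites, the copper and any slot-alignment effects. It is a lower bound, not a measurement.

```python
# Lower-bound estimate of the share of a PAL frame spent filling N displayed bitplanes.
SLOTS_PER_LINE = 227        # assumption: DMA slots per PAL scanline (not from the thread)
LINES_PER_FRAME = 312       # PAL, as quoted above
DISPLAY_LINES = 256
WORDS_PER_LINE = 320 // 16  # 20 words per bitplane line
FILL_CYCLES_PER_WORD = 3    # figure given later in this thread

def frame_share(planes):
    """Fraction of the non-bitplane DMA slots in one frame used by the fill."""
    words = planes * WORDS_PER_LINE * DISPLAY_LINES
    fill_slots = words * FILL_CYCLES_PER_WORD
    bitplane_slots = words  # one fetch per displayed word
    free_slots = SLOTS_PER_LINE * LINES_PER_FRAME - bitplane_slots
    return fill_slots / free_slots

share = frame_share(3)
print(f"3 planes: at least {share:.0%} of the frame")
```

Even this optimistic model puts the 3-plane fill at well over 80% of the frame before any clearing or line drawing, which fits the "practically a whole frame" observation once the ignored DMA is added back.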

Quote:

Originally Posted by leonard

Another question: imagine the CPU is filling some memory. Does it slow down the blitter a bit? Or does the CPU run on totally different cycles than the blitter? In other words, would it be possible to draw some polygons with the CPU for free while the blitter does all its work, driven by a pre-computed COPPER list?

This is effectively what I do when the glenz first appears, although I'm only using the CPU to calc the edges rather than fill the polygon, which is achieved via the copper displaying different lines from a pre-drawn triangle.

Note that the blitter can slow down the CPU if the code is running from chip ram. If the blitter nasty bit is set it can even stop the CPU, a feature I rely upon in the plasmas near the beginning of the demo.

Paradroid · 09 April 2013, 08:10

ah, just realised Leonard is the same guy I was talking to about this via email

For the benefit of others, here's some of the relevant info I'd passed on...

> "blitter can't fill 3 bitplans at 50hz"
That was referring to overscan bitplanes. Redux uses a 352x272 display most of the time and the blitter wouldn't even be able to clear and fill 2 bitplanes at that resolution, even when using the cpu to help with the clear (well, it might just about do it, but not when drawing a lot of lines too). The area I'm filling with the blitter is clamped around the object, which is why I'm able to keep it at 50Hz.

> But then you say when clip arrive, you switch to a four bitplan blitter routine.
At this point I've switched to a smaller display area, 192x192. At this size I can fill 4 bitplanes in just under half a frame, leaving the other half for clearing (which uses both blitter and CPU) and drawing the lines.

> How much time does the blitter need to fill a one-bitplane, 320*256 pixel screen?
It's totally dependent on what else is active and using the DMA bus, such as the number of active bitplanes, audio, sprites, etc. If you stick to a 2 bitplane display, write a fast clear and don't draw too many lines you could stay in 50Hz at that size. I just about managed it in the demo deja-vu with a full screen glenz (it only needed 2 bitplanes because you couldn't see the outline of the object), but I just couldn't get it fast enough in overscan... Hmmm, that was a very long time ago, maybe I should try again ^_^

leonard · 09 April 2013, 10:17

Hi Paradroid

Yes I'm the same guy

Glad you are on this forum too! I love world records in demos (I have some on the ATARI ST) and I always thought 3d was "easy" on amiga. Now I see that it could be a world record to get a 320*256 glenz vector on a standard A500 OCS.

Thanks for all the explanations. I see now that even the mythic HardWeird 48-face glenz is quite small on the screen.

Paradroid · 09 April 2013, 10:41

Quote:

Originally Posted by leonard

Now I see that it could be a world record to get a 320*256 glenz vector on a standard A500 OCS.

I did that in 1992, so you'll need to go even bigger

Actually, IIRC I wasn't using the copper or interrupts for rendering the glenzes in Deja-vu, so it shouldn't be too hard to make them bigger as the CPU was probably just waiting for the blitter to finish half the time.

EDIT:
FYI, the record for OCS glenz faces is at least 192 (see Anarchy's 3D Demo II). Doing that full screen would be a nice challenge to take on

leonard · 09 April 2013, 12:42

Quote:

Originally Posted by Paradroid

I did that in 1992, so you'll need to go even bigger

Oh yes, but the glenz is only 2 bitplanes, as you said, right? (I mean, it works just because the shape is zoomed so that we don't see the borders.)

Quote:

Originally Posted by Paradroid

FYI, the record for OCS glenz faces is at least 192 (see Anarchy's 3D Demo II). Doing that full screen would be a nice challenge to take on

Oh, I have to look at this demo. As I said, I'm much more of an ATARI-ST demo specialist, so there are some gaps in my AMIGA demo culture.

BTW, could you tell me how much time it takes to CLEAR with the blitter compared to a FILL (under the same conditions of bitplanes, sound, copper, etc.)? Is a CLEAR twice as fast as a FILL, or something else?

Paradroid · 09 April 2013, 13:53

a clear would be more than double the speed of a fill, although even then I wouldn't usually use a pure blitter clear myself. Depending on how you draw the object you may not need a traditional clear at all. For example, it might be quicker to redraw the lines again to wipe the old ones. That would require the fill to do a copy to another buffer rather than writing the result back to itself...

Then again, maybe you might want to use a technique that doesn't need a fill at all. This is why I love programming the amiga, as with every effect, there are loads of ways to go about rendering 3D using the cpu, blitter, copper, interrupts, etc, for various tasks in various configurations and orders, so I suggest you just grab yourself a framework if you don't have one already and just experiment. If all you have is blitter memory bandwidth numbers you sure ain't going to be getting anywhere near the potential of the machine.

leonard · 09 April 2013, 14:09

Quote:

Originally Posted by Paradroid

Then again, maybe you might want to use a technique that doesn't need a fill at all. This is why I love programming the amiga, as with every effect, there are loads of ways to go about rendering 3D using the cpu, blitter, copper, interrupts, etc, for various tasks in various configurations and orders

Totally agree with you. Beating the glenz vector record requires a carefully fine-tuned balance between blitter, CPU and copper.

Quote:

Originally Posted by Paradroid

I suggest you just grab yourself a framework if you don't have one already and just experiment. If all you have is blitter memory bandwidth numbers you sure ain't going to be getting anywhere near the potential of the machine.

I have my own framework for making ATARI-ST demos, running on Windows (kernel, packer, track loader, etc). I did the same on my A1200 a long time ago, but I will convert it for Windows so I can test stuff myself.

I tried the built-in debugger of WinUAE. It's not really bad, but it's far from good for development. What debugger are you using when you develop Amiga stuff on the Windows platform?

Toni Wilen · 09 April 2013, 14:19

Technical info here because it is always interesting!

Fill is always at least 3 blitter cycles/word. Plain clear takes 2 blitter cycles.

Both have one idle cycle which is usable by the CPU and only by the CPU.

Both idle and non-idle blitter cycles require a DMA cycle that was not used by any higher-priority DMA channel. If this blitter cycle was not actually used by the blitter (i.e. it was a blitter idle cycle), it becomes available to the CPU.

This is a very important undocumented feature that should help to optimize bitplane/blitter/CPU usage even better.

btw, WinUAE "dma" debugger can be used to check DMA channel usage ("v" command)
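The per-word figures above can be turned into a small cost model. This is a minimal sketch in Python: `blit_cost` is a hypothetical helper name, and the model only counts blitter cycles and idle cycles per word as described in this post, ignoring any competing DMA.

```python
# Simple cost model using the per-word figures from this post.
CYCLES_PER_WORD = {"fill": 3, "clear": 2}  # total blitter cycles per word
IDLE_PER_WORD = 1                          # one idle cycle per word in both modes

def blit_cost(width_pixels, height_lines, mode):
    """Return (total blitter cycles, idle cycles left for the CPU) for one blit."""
    words = ((width_pixels + 15) // 16) * height_lines
    return words * CYCLES_PER_WORD[mode], words * IDLE_PER_WORD

for mode in ("fill", "clear"):
    total, idle = blit_cost(320, 256, mode)
    print(f"{mode}: {total} blitter cycles, {idle} idle cycles usable by the CPU")
```

In this model a 320x256 single-bitplane fill or clear each leave 5120 idle cycles that the CPU could pick up, provided no higher-priority DMA channel takes them first.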

Paradroid · 09 April 2013, 14:23

I just use the winuae debugger, plus a whole load of verification and unit testing code so I don't need to visit it too often. oh how I miss source level debugging, lol

I've got a snasm devkit here (same as what I was using when making the original RaD) that allows source level debugging of code running on the actual hardware, but I don't have a PC old enough to put it in - for some reason I've kept my 386 and 486 mobos, but not the memory chips or power supplies zzz -_-

leonard · 09 April 2013, 18:42

Quote:

Originally Posted by Toni Wilen

Technical info here because it is always interesting!

Fill is always at least 3 blitter cycles/word. Plain clear takes 2 blitter cycles.

Both have one idle cycle which is usable by the CPU and only by the CPU.

Both idle and non-idle blitter cycles require a DMA cycle that was not used by any higher-priority DMA channel. If this blitter cycle was not actually used by the blitter (i.e. it was a blitter idle cycle), it becomes available to the CPU.

This is a very important undocumented feature that should help to optimize bitplane/blitter/CPU usage even better.

btw, WinUAE "dma" debugger can be used to check DMA channel usage ("v" command)

Thanks for the additional technical details. Let's suppose the blitter is filling an area; you say it has 3 busy cycles and 1 free cycle per word. Does that mean the CPU gets 1/4 of the memory speed? (Even a tower of NOP instructions consumes memory bandwidth by reading opcodes from memory.) Does it mean that a simple tower of NOPs executes at 1/4 of its normal speed if the blitter is filling at the same time? Or did I miss something?

mc6809e · 09 April 2013, 20:52

Quote:

Originally Posted by Toni Wilen

Technical info here because it is always interesting!

Fill is always at least 3 blitter cycles/word. Plain clear takes 2 blitter cycles.

Both have one idle cycle which is usable by the CPU and only by the CPU.

Very interesting.

This suggests that triple buffering is worth trying when using area fill.

Buffer 1 -- display
Buffer 2 -- area fill poly
Buffer 3 -- CPU clear with MOVEMs
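The rotation of roles between the three buffers could look like the following minimal sketch in Python. The buffer names "A", "B", "C" and the helper functions are hypothetical, and line drawing before the fill is left out; only the per-frame role swap is shown.

```python
from collections import deque

# Roles in the order listed above: shown on screen, being filled, being cleared.
ROLES = ("display", "fill", "clear")

def assignments(buffers):
    """Map each role to the buffer currently holding it."""
    return dict(zip(ROLES, buffers))

def next_frame(buffers):
    """Rotate roles: the filled buffer is displayed, the cleared one gets filled,
    and the old display buffer is handed to the CPU for clearing."""
    buffers.rotate(-1)

buffers = deque(["A", "B", "C"])  # hypothetical buffer names
for frame in range(3):
    print(frame, assignments(buffers))
    next_frame(buffers)
```

After three frames every buffer has held every role once and the cycle repeats.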

Toni Wilen · 09 April 2013, 21:15

Quote:

Originally Posted by leonard

Thanks for the additional technical details. Let's suppose the blitter is filling an area; you say it has 3 busy cycles and 1 free cycle per word.

3 cycles was the total, 1 read, 1 write, 1 is idle. (It can become 4 cycles if 3 or 4 channels are enabled, note that some 3 channel combinations only use 3 cycles even in fill mode, "fill idle" cycle is included only in some combinations)

Quote:

Does that mean the CPU gets 1/4 of the memory speed? (Even a tower of NOP instructions consumes memory bandwidth by reading opcodes from memory.) Does it mean that a simple tower of NOPs executes at 1/4 of its normal speed if the blitter is filling at the same time? Or did I miss something?

Yes (everyone knows that the CPU gets slower when the blitter is active), but more important is that every blitter idle cycle is _always_ available for the CPU (afaik Agnus has some shared logic that the blitter needs during idle cycles). So always do something useful with the CPU instead of just polling the blitter finished bit if you have just started a blit with idle cycles (this seems to be the most common way to waste raster time). The blitter nasty bit also makes no difference if the blit has idle cycles.

leonard · 15 April 2013, 18:49

Hi Toni,

I'm working on a test version of doing 3D on an A500, OCS. I made some tests, showing timing using raster colors (oldskool). I just wonder how accurate WinUAE is? I mean, I did all my timing tests on WinUAE (I don't have an A500). I'm interested in the blitter interrupt (blitter is running, CPU too, and the blitter interrupt is used).
Do you think I can "trust" WinUAE in this configuration? (I set "cycle exact".)

Toni Wilen · 15 April 2013, 19:01

A500 cycle-exact "should" have perfect timing but I am 100% sure there are some CPU instructions that have wrong cycle usage. Chipset timing should be perfect.

I don't recommend blitter interrupts, at least if there are lots of small blits. 68000 exceptions (including interrupts) have long "startup", ~50 cycles or so and it does not even include saving/restoring registers and RTE.

Small blits get finished before the interrupt even starts
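To illustrate why small blits and interrupts mix badly, here is a minimal sketch in Python. It assumes an interrupt-driven queue where the blitter sits idle from the end of one blit until the handler starts the next one, uses the ~50 cycle startup figure from this post, the 3 cycles per word for fill from earlier in the thread, and assumes one DMA slot equals 2 CPU cycles on a 7.09 MHz 68000. The function name and the `handler_cycles` parameter are hypothetical.

```python
# Rough model of blitter utilisation in an interrupt-driven blit queue.
CPU_CYCLES_PER_SLOT = 2   # assumption: one DMA slot = 2 cycles of a 7.09 MHz 68000
FILL_SLOTS_PER_WORD = 3   # figure from earlier in this thread
IRQ_STARTUP_CYCLES = 50   # approximate 68000 exception startup, per this post

def blitter_utilisation(words, handler_cycles=0):
    """Fraction of time the blitter is busy when each blit is started from an interrupt."""
    blit_cycles = words * FILL_SLOTS_PER_WORD * CPU_CYCLES_PER_SLOT
    gap_cycles = IRQ_STARTUP_CYCLES + handler_cycles  # blitter idle until restarted
    return blit_cycles / (blit_cycles + gap_cycles)

for words in (8, 100, 1000):
    print(f"{words:5d} words: blitter busy {blitter_utilisation(words):.0%} of the time")
```

In this model an 8-word fill is shorter than the interrupt startup alone, so the blitter is idle more than half the time, while for blits of a thousand words the overhead is negligible.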

leonard · 15 April 2013, 22:15

Quote:

Originally Posted by Toni Wilen

I don't recommend blitter interrupts, at least if there are lots of small blits. 68000 exceptions (including interrupts) have long "startup", ~50 cycles or so and it does not even include saving/restoring registers and RTE.

I agree the blitter interrupt is not the fastest way of running a blitter queue on the Amiga. But I can't use the COPPER for that because I already use it for some "sync to screen" stuff. So I guess I can't mix in blitter commands, because I might lose some sync points with the electron beam.

The CPU interrupt seems very long compared to the ATARI-ST, but if you confirm it's normal, then I have to take that into account.

Thanks!
