English Amiga Board

leonard 07 April 2013 21:16

Blitter fill timing
 
Hi,

I just saw a really cool demo from the "Revision 2013" party ( http://pouet.net/prod.php?which=61182 ). It's an AMIGA 500 (OCS chipset) demo featuring some really nice effects.
In the end scroller, the author explains some of the tricks he uses, and I'm curious about one thing. In a glenz vector part, the author claims he has to use a bitplane trick because the blitter is not able to fill a three-bitplane screen at 50Hz. I'm an ATARI-ST programmer; I coded some Amiga stuff too (never released), but it was on an A1200, so I guess the timings are not the same.

Can someone tell me exactly what % of a VBL a complete 3D fill of a 320*256 screen takes, in one bitplane, on an A500 OCS? (I didn't find that on Google :( )

Thanks in advance
Leonard / OXYGENE

Toni Wilen 08 April 2013 10:58

A1200 chipset DMA timing is exactly the same as long as FMODE=0. The CPU can be much faster (instruction cache, faster instructions and a 32-bit wide bus to chipmem).

320x256 single bitplane blitter fill takes about 70 scanlines if all DMA slots are free.
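
As a rough check of that figure, here is a minimal sketch of the arithmetic. The constants are assumptions, not something stated in this post: 227 DMA slots per PAL scanline, 4 of them lost to memory refresh, and 3 blitter cycles per word in fill mode (the figure given further down the thread).

```python
# Back-of-envelope estimate of a one-bitplane blitter fill on a PAL OCS machine.
# All three constants are assumptions used for illustration only.
SLOTS_PER_LINE = 227       # assumed DMA slots (colour clocks) per scanline
REFRESH_SLOTS = 4          # assumed slots per scanline lost to memory refresh
FILL_CYCLES_PER_WORD = 3   # read + write + idle

def fill_scanlines(width_px, height_px, planes=1):
    """Scanlines of raster time needed to fill the area if every other slot is free."""
    words = (width_px // 16) * height_px * planes
    cycles = words * FILL_CYCLES_PER_WORD
    return cycles / (SLOTS_PER_LINE - REFRESH_SLOTS)

lines = fill_scanlines(320, 256)
print(f"{lines:.1f} scanlines")
```

Under these assumptions the result lands just under 70 scanlines, which agrees with the estimate above.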

I'd estimate (too lazy to calculate anything) it is possible to fill 3 planes in one frame (with 3 bitplane display visible) but there would not be time left for anything else.

It will get much worse if overscan is used.

leonard 08 April 2013 14:01

Thanks Toni.

So when we all see classic "glenz vector" objects such as the famous "glenz vector 48 faces" of the HardWeird demo, it's only possible because the glenz vector does not cover the complete screen (maybe it's 200*200 pixels?).
If it covered the whole screen, there would not be enough blitter time to draw 320*256 pixels.

Another question: imagine the CPU is filling some memory, does it slow down the blitter a bit? Or is the CPU on a totally different cycle than the blitter? In other words, would it be possible to draw some polygons with the CPU for free while the blitter does all its work, driven by a pre-computed COPPER list?

leonard 08 April 2013 14:23

Oh BTW, there are 312 "time" scanlines per VBL in PAL, am I right? So three bitplanes * 70 scanlines = 210 scanlines to fill 3*320*256, so there are 102 free scanlines, right? (about 1/3).

diablothe2nd 08 April 2013 16:37

very impressive demo :great i'm curious to know how they did it too :D

Paradroid 09 April 2013 07:38

Quote:

Originally Posted by leonard (Post 880280)
Oh BTW, there are 312 "time" scanlines per VBL in PAL, am I right? So three bitplanes * 70 scanlines = 210 scanlines to fill 3*320*256, so there are 102 free scanlines, right? (about 1/3).

No, because the DMA is being used to fetch the bitplanes for the display too, so it ends up taking practically a whole frame to fill 3*320*256 and that's without any other clearing or polygon rendering.
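
To put a rough number on that, here is a sketch that subtracts the display's own bitplane fetches from the frame before comparing. It assumes 312 lines per PAL frame, 227 DMA slots per line with 4 used for refresh, one bitplane fetch per 16 pixels per visible plane per display line, and 3 blitter cycles per filled word. Sprites, audio, copper and CPU accesses are ignored, so the real figure is worse.

```python
# Rough frame budget for filling 3 planes while a 3-plane 320x256 display is shown.
# All constants are assumptions for illustration, not measured values.
LINES_PER_FRAME = 312
SLOTS_PER_LINE = 227
REFRESH_SLOTS = 4
FILL_CYCLES_PER_WORD = 3

def free_slots_per_frame(display_w, display_h, display_planes):
    """DMA slots left in one frame after refresh and bitplane fetches."""
    total = LINES_PER_FRAME * (SLOTS_PER_LINE - REFRESH_SLOTS)
    bitplane_dma = (display_w // 16) * display_planes * display_h
    return total - bitplane_dma

def fill_cycles(width_px, height_px, planes):
    """Blitter cycles needed to fill the given area in fill mode."""
    return (width_px // 16) * height_px * planes * FILL_CYCLES_PER_WORD

budget = free_slots_per_frame(320, 256, 3)
needed = fill_cycles(320, 256, 3)
share = needed / budget
print(f"fill needs {needed} of {budget} free slots ({share:.0%} of the frame)")
```

With these assumptions the fill alone already needs about 85% of the slots that are left, before any clearing or line drawing.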


Quote:

Originally Posted by leonard (Post 880277)
Another question: imagine the CPU is filling some memory, does it slow down the blitter a bit? Or is the CPU on a totally different cycle than the blitter? In other words, would it be possible to draw some polygons with the CPU for free while the blitter does all its work, driven by a pre-computed COPPER list?

This is effectively what I do when the glenz first appears, although I'm only using the CPU to calc the edges rather than fill the polygon, which is achieved via the copper displaying different lines from a pre-drawn triangle.

Note that the blitter can slow down the CPU if the code is running from chip ram. If the blitter nasty bit is set it can even stop the CPU, a feature I rely upon in the plasmas near the beginning of the demo.

Paradroid 09 April 2013 08:10

Ah, just realised Leonard is the same guy I was talking to about this via email :)

For the benefit of others, here's some of the relevant info I'd passed on...

> "blitter can't fill 3 bitplans at 50hz"
That was referring to overscan bitplanes. Redux uses a 352x272 display most of the time and the blitter wouldn't even be able to clear and fill 2 bitplanes at that resolution, even when using the CPU to help with the clear (well, it might just about do it, but not when drawing a lot of lines too). The area I'm filling with the blitter is clamped around the object, which is why I'm able to keep it at 50Hz.

> But then you say that when the clip arrives, you switch to a four-bitplane blitter routine.
At this point I've switched to a smaller display area, 192x192. At this size I can fill 4 bitplanes in just under half a frame, leaving the other half for clearing (which uses both blitter and CPU) and drawing the lines.

> How much time does the blitter require to fill a one-bitplane, 320*256 pixel screen?
It's totally dependent on what else is active and using the DMA bus, such as the number of active bitplanes, audio, sprites, etc. If you stick to a 2 bitplane display, write a fast clear and don't draw too many lines, you could stay in 50Hz at that size. I just about managed it in the demo Deja-vu with a full screen glenz (it only needed 2 bitplanes because you couldn't see the outline of the object), but I just couldn't get it fast enough in overscan... Hmmm, that was a very long time ago, maybe I should try again ^_^
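
A rough sketch of how the two display sizes mentioned in this post compare. The constants are assumptions for illustration: 312 lines per PAL frame, 227 DMA slots per line minus 4 for refresh, 3 blitter cycles per word for a fill and 2 for a clear, and 3 visible planes in the overscan case (the post does not say how many were shown). Line drawing, sprites, audio and copper are ignored.

```python
# Compare blitter load against the free DMA slots of one PAL frame.
# Every constant here is an assumption, not a measured value.
LINES_PER_FRAME = 312
USABLE_SLOTS_PER_LINE = 227 - 4   # DMA slots per line minus refresh
FILL, CLEAR = 3, 2                # blitter cycles per word

def blit_cycles(width_px, height_px, planes, cycles_per_word):
    """Blitter cycles to process every word of the given area."""
    return (width_px // 16) * height_px * planes * cycles_per_word

def budget(display_w, display_h, display_planes):
    """Free DMA slots per frame after bitplane fetches for the display."""
    bitplane_dma = (display_w // 16) * display_h * display_planes
    return LINES_PER_FRAME * USABLE_SLOTS_PER_LINE - bitplane_dma

# 192x192 display, filling all 4 planes.
small = blit_cycles(192, 192, 4, FILL) / budget(192, 192, 4)

# 352x272 overscan, assumed 3 visible planes.
over = budget(352, 272, 3)
overscan_fill3 = blit_cycles(352, 272, 3, FILL) / over
overscan_clear_fill2 = (blit_cycles(352, 272, 2, FILL)
                        + blit_cycles(352, 272, 2, CLEAR)) / over

print(f"192x192, fill 4 planes:        {small:.2f} frames")
print(f"352x272, fill 3 planes:        {overscan_fill3:.2f} frames")
print(f"352x272, clear + fill 2 planes: {overscan_clear_fill2:.2f} frames")
```

Under these assumptions the 192x192 case comes out just under half a frame, while both overscan cases need more than a whole frame of blitter time, which matches the description above.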

leonard 09 April 2013 10:17

Hi Paradroid

Yes I'm the same guy :) Glad you are on this forum too! I love world records in demos (I got some on the ATARI ST :)) and I always thought 3D was "easy" on the Amiga. Now I see that it could be a world record to get a 320*256 glenz vector on a standard A500 OCS.

Thanks for all the explanations, I see now that even the mythic HardWeird glenz 48 faces is quite small on the screen.

Paradroid 09 April 2013 10:41

Quote:

Originally Posted by leonard (Post 880491)
Now I see that it could be a world record to get a 320*256 glenz vector on a standard A500 OCS.

I did that in 1992, so you'll need to go even bigger :cheese

Actually, IIRC I wasn't using the copper or interrupts for rendering the glenzes in Deja-vu, so it shouldn't be too hard to make them bigger as the CPU was probably just waiting for the blitter to finish half the time.

EDIT:
FYI, the record for OCS glenz faces is at least 192 (see Anarchy's 3D Demo II). Doing that full screen would be a nice challenge to take on :)

leonard 09 April 2013 12:42

Quote:

Originally Posted by Paradroid (Post 880493)
I did that in 1992, so you'll need to go even bigger :cheese

Oh yes, but the glenz is 2 bitplanes only, as you said, right? (I mean, it works just because the shape is zoomed so that we don't see the borders)

Quote:

Originally Posted by Paradroid (Post 880493)
FYI, the record for OCS glenz faces is at least 192 (see Anarchy's 3D Demo II). Doing that full screen would be a nice challenge to take on :)

Oh I have to look at this demo. As I said, I'm much more of an ATARI-ST demo specialist, so I have some gaps in my AMIGA demo culture :)

BTW could you tell me how much time it takes to CLEAR with the blitter, compared to a FILL (under the same conditions of bitplanes, sound, copper, etc.)? Is a CLEAR twice as fast as a FILL, or something else?

Paradroid 09 April 2013 13:53

a clear would be more than double the speed of a fill, although even then I wouldn't usually use a pure blitter clear myself. Depending on how you draw the object you may not need a traditional clear at all. For example, it might be quicker to redraw the lines again to wipe the old ones. That would require the fill to do a copy to another buffer rather than writing the result back to itself...

Then again, maybe you might want to use a technique that doesn't need a fill at all. This is why I love programming the amiga, as with every effect, there are loads of ways to go about rendering 3D using the cpu, blitter, copper, interrupts, etc, for various tasks in various configurations and orders, so I suggest you just grab yourself a framework if you don't have one already and just experiment. If all you have is blitter memory bandwidth numbers you sure ain't going to be getting anywhere near the potential of the machine.

leonard 09 April 2013 14:09

Quote:

Originally Posted by Paradroid (Post 880507)
Then again, maybe you might want to use a technique that doesn't need a fill at all. This is why I love programming the amiga, as with every effect, there are loads of ways to go about rendering 3D using the cpu, blitter, copper, interrupts, etc, for various tasks in various configurations and orders

Totally agree with you. Beating the glenz vector record requires a carefully fine-tuned balance between blitter, CPU and copper.

Quote:

Originally Posted by Paradroid (Post 880507)
I suggest you just grab yourself a framework if you don't have one already and just experiment. If all you have is blitter memory bandwidth numbers you sure ain't going to be getting anywhere near the potential of the machine.

I have my own framework for making ATARI-ST demos, running on Windows (kernel, packer, track loader, etc.). I did the same on my A1200 a long time ago, but I will convert it to Windows so I can test stuff myself.

I tried the built-in debugger of WinUAE. Not really bad, but far from good for development. What debugger do you use when you develop Amiga stuff on the Windows platform?

Toni Wilen 09 April 2013 14:19

Technical info here because it is always interesting!

Fill is always at least 3 blitter cycles/word. Plain clear takes 2 blitter cycles.

Both have one idle cycle which is usable by the CPU and only by the CPU.

Both idle and non-idle blitter cycles require a DMA cycle that was not used by any other higher priority DMA channel. If this blitter cycle was not actually used by the blitter (it was a blitter idle cycle), it becomes available for the CPU.

This is a very important undocumented feature that should help to optimize bitplane/blitter/CPU usage even better.

btw, WinUAE "dma" debugger can be used to check DMA channel usage ("v" command)

Paradroid 09 April 2013 14:23

I just use the winuae debugger, plus a whole load of verification and unit testing code so I don't need to visit it too often. oh how I miss source level debugging, lol

I've got a snasm devkit here (same as what I was using when making the original RaD) that allows source level debugging of code running on the actual hardware, but I don't have a PC old enough to put it in - for some reason I've kept my 386 and 486 mobos, but not the memory chips or power supplies zzz -_-

leonard 09 April 2013 18:42

Quote:

Originally Posted by Toni Wilen (Post 880510)
Technical info here because it is always interesting!

Fill is always at least 3 blitter cycles/word. Plain clear takes 2 blitter cycles.

Both have one idle cycle which is usable by the CPU and only by the CPU.

Both idle and non-idle blitter cycles require a DMA cycle that was not used by any other higher priority DMA channel. If this blitter cycle was not actually used by the blitter (it was a blitter idle cycle), it becomes available for the CPU.

This is a very important undocumented feature that should help to optimize bitplane/blitter/CPU usage even better.

btw, WinUAE "dma" debugger can be used to check DMA channel usage ("v" command)

Thanks for the additional technical details. Let's suppose the blitter is filling an area; you say it has 3 busy cycles and 1 free per word. Does it mean that you get 1/4 of the memory speed from the CPU side? (Even a tower of NOP instructions consumes memory bandwidth by reading opcodes from memory.) Does it mean that a single tower of NOPs executes 1/4 slower if the blitter is filling at the same time? Or did I miss something?

mc6809e 09 April 2013 20:52

Quote:

Originally Posted by Toni Wilen (Post 880510)
Technical info here because it is always interesting!

Fill is always at least 3 blitter cycles/word. Plain clear takes 2 blitter cycles.

Both have one idle cycle which is usable by the CPU and only by the CPU.

Very interesting.

This suggests that triple buffering is worth trying when using area fill.

Buffer 1 -- display
Buffer 2 -- area fill poly
Buffer 3 -- CPU clear with MOVEMs
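
A minimal sketch of how the three roles could rotate from frame to frame. The buffer names "A", "B", "C" and the helper `next_frame` are hypothetical; only the three roles come from the list above.

```python
from collections import deque

# The three roles listed above, in the order the buffers are assigned to them.
ROLES = ("display", "area fill", "CPU clear")

def next_frame(buffers):
    """Advance one frame and return the new role assignment.

    The buffer that was just filled becomes the displayed one, the freshly
    cleared buffer becomes the fill target, and the buffer that was on
    screen is handed to the CPU for clearing.
    """
    buffers.rotate(-1)
    return dict(zip(ROLES, buffers))

# Hypothetical buffer names; on real hardware these would be chip RAM addresses.
buffers = deque(["A", "B", "C"])
print(dict(zip(ROLES, buffers)))   # starting assignment
print(next_frame(buffers))         # assignment one frame later
```

The point of the rotation is that the CPU clear and the blitter fill always work on different buffers, so the CPU can use the blitter's idle cycles instead of waiting.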

Toni Wilen 09 April 2013 21:15

Quote:

Originally Posted by leonard (Post 880556)
Thanks for the additional technical details. Let's suppose the blitter is filling an area; you say it has 3 busy cycles and 1 free per word.

3 cycles was the total: 1 read, 1 write, 1 idle. (It can become 4 cycles if 3 or 4 channels are enabled. Note that some 3-channel combinations only use 3 cycles even in fill mode; the "fill idle" cycle is included only in some combinations.)

Quote:

Does it mean that you get 1/4 of the memory speed from the CPU side? (Even a tower of NOP instructions consumes memory bandwidth by reading opcodes from memory.) Does it mean that a single tower of NOPs executes 1/4 slower if the blitter is filling at the same time? Or did I miss something?
Yes (everyone knows that the CPU gets slower when the blitter is active), but more important is that every blitter idle cycle is _always_ available for the CPU (afaik Agnus has some shared logic that the blitter needs during idle cycles). So always do something useful with the CPU instead of just polling the blitter finished bit uselessly if you have just started a blit with idle cycles (this seemed to be the most common way to waste raster time). The blitter nasty bit also makes no difference if the blit has idle cycles.
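
A small sketch of that cycle accounting, assuming the simple patterns described in this thread: read, write, idle for a basic fill and write, idle for a plain clear. It is a best case: it assumes no higher priority DMA takes the slots and that the CPU wants the bus on every idle cycle. The function name is hypothetical.

```python
# Per-word blitter cycle patterns as described in the thread (assumed simplest cases).
FILL_PATTERN = ("read", "write", "idle")
CLEAR_PATTERN = ("write", "idle")

def blit_cycle_split(words, pattern):
    """Return (total blitter cycles, idle cycles the CPU could use)."""
    total = words * len(pattern)
    idle = words * pattern.count("idle")
    return total, idle

words = (320 // 16) * 256   # one 320x256 bitplane
for name, pattern in (("fill", FILL_PATTERN), ("clear", CLEAR_PATTERN)):
    total, idle = blit_cycle_split(words, pattern)
    print(f"{name}: {total} cycles, {idle} usable by the CPU ({idle / total:.0%})")
```

So in this model the CPU can get one cycle in three during a fill and one in two during a clear, rather than being stopped outright.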

leonard 15 April 2013 18:49

Hi Toni,

I'm working on a test version of doing 3D on an A500, OCS. I made some tests, showing timing using raster colors (oldskool :)). I just wonder how accurate WinUAE is? I mean, I did all my timing tests on WinUAE (I don't have an A500). I'm interested in the blitter interrupt (the blitter is running, the CPU too, and the blitter interrupt is used).
Do you think I can "trust" WinUAE in this configuration? (I set "cycle exact")

Toni Wilen 15 April 2013 19:01

A500 cycle-exact "should" have perfect timing but I am 100% sure there are some CPU instructions that have wrong cycle usage. Chipset timing should be perfect.

I don't recommend blitter interrupts, at least if there are lots of small blits. 68000 exceptions (including interrupts) have a long "startup", ~50 cycles or so, and that does not even include saving/restoring registers and RTE.

Small blits get finished before the interrupt even starts :)
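
To give a feel for what "small" means here, a rough sketch under stated assumptions: the ~50 cycles are CPU clocks, one DMA slot lasts 2 CPU clocks, the blitter gets every slot, and a word costs 3 blitter cycles for a fill or 2 for a clear. The function name is hypothetical.

```python
# How many words the blitter can finish while the 68000 is still entering
# the interrupt handler. All constants are assumptions for illustration.
CPU_CLOCKS_PER_DMA_SLOT = 2         # assumed ratio of CPU clock to DMA slot rate
INTERRUPT_STARTUP_CPU_CYCLES = 50   # the "~50 cycles or so" from the post

def words_done_before_handler(cycles_per_word,
                              startup=INTERRUPT_STARTUP_CPU_CYCLES):
    """Largest blit (in words) that completes during interrupt startup."""
    slots = startup // CPU_CLOCKS_PER_DMA_SLOT
    return slots // cycles_per_word

print("fill :", words_done_before_handler(3), "words")
print("clear:", words_done_before_handler(2), "words")
```

In this model a blit of roughly a dozen words or fewer is already over by the time the handler's first instruction runs, so queueing such blits through interrupts costs more than it saves.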

leonard 15 April 2013 22:15

Quote:

Originally Posted by Toni Wilen (Post 881901)
I don't recommend blitter interrupts, at least if there are lots of small blits. 68000 exceptions (including interrupts) have a long "startup", ~50 cycles or so, and that does not even include saving/restoring registers and RTE.

I agree the blitter interrupt is not the fastest way of doing a blitter queue on the Amiga. But I can't use the COPPER because I use it to do some "sync to screen" stuff. So I guess I can't mix in blitter commands, because I may lose some sync points with the electron beam.

The CPU interrupt seems very long compared to the ATARI-ST, but if you confirm it's normal, then I have to take that into account.

Thanks!


All times are GMT +2.