English Amiga Board


Old 07 April 2013, 21:16   #1
leonard
Registered User
 
Join Date: Apr 2013
Location: paris
Posts: 40
Blitter fill timing

Hi,

I just saw a really cool demo from the "Revision 2013" party ( http://pouet.net/prod.php?which=61182 ). It's an Amiga 500 (OCS chipset) demo featuring some really nice effects.
In the end scroller, the author explains some of the tricks he uses, and I'm curious about one thing. In a glenz vector part, the author claims he has to use a bitplane trick because the blitter is not able to fill a three-bitplane screen at 50 Hz. I'm an Atari ST programmer; I coded Amiga stuff too (never released), but it was on an A1200, so I guess the timings are not the same.

Can someone tell me exactly what % of a VBL a complete 3D fill of a 320*256 screen takes, in one bitplane, on an A500 OCS? (I didn't find that on Google.)

Thanks in advance
Leonard / OXYGENE
Old 08 April 2013, 10:58   #2
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 45
Posts: 23,676
A1200 chipset DMA timing is exactly the same as long as FMODE=0. The CPU can be much faster (instruction cache, faster instructions and a 32-bit wide bus to chipmem).

320x256 single bitplane blitter fill takes about 70 scanlines if all DMA slots are free.

I'd estimate (too lazy to calculate anything) it is possible to fill 3 planes in one frame (with 3 bitplane display visible) but there would not be time left for anything else.

It will get much worse if overscan is used.
Old 08 April 2013, 14:01   #3
leonard
Registered User
 
Join Date: Apr 2013
Location: paris
Posts: 40
Thanks Toni.

So when we see classic "glenz vector" objects such as the famous "glenz vector 48 faces" of the HardWeird demo, it's only possible because the glenz vector does not cover the complete screen (maybe it's 200*200 pixels?).
If it covered the whole screen, there would not be enough blitter time to draw 320*256 pixels.

Another question: imagine the CPU is filling some memory. Does that slow down the blitter a bit? Or is the CPU on totally different cycles than the blitter? In other words, would it be possible to draw some polygons with the CPU for free while the blitter does all its work, driven by a pre-computed copper list?
Old 08 April 2013, 14:23   #4
leonard
Registered User
 
Join Date: Apr 2013
Location: paris
Posts: 40
Oh BTW, there are 312 "time" scanlines per VBL in PAL, am I right? So three bitplanes * 70 scanlines = 210 scanlines to fill 3*320*256, so there are 102 free scanlines, right? (about 1/3).
Old 08 April 2013, 16:37   #5
diablothe2nd
Registered User

 
Join Date: Dec 2011
Location: Northampton, UK
Age: 37
Posts: 1,232
Very impressive demo. I'm curious to know how they did it too.
Old 09 April 2013, 07:38   #6
Paradroid
Rock Lobster
 
Join Date: Nov 2012
Location: Macclesfield
Age: 45
Posts: 39
Quote:
Originally Posted by leonard View Post
Oh BTW, there are 312 "time" scanlines per VBL in PAL, am I right? So three bitplanes * 70 scanlines = 210 scanlines to fill 3*320*256, so there are 102 free scanlines, right? (about 1/3).
No, because the DMA is being used to fetch the bitplanes for the display too, so it ends up taking practically a whole frame to fill 3*320*256 and that's without any other clearing or polygon rendering.


Quote:
Originally Posted by leonard View Post
Another question: imagine the CPU is filling some memory. Does that slow down the blitter a bit? Or is the CPU on totally different cycles than the blitter? In other words, would it be possible to draw some polygons with the CPU for free while the blitter does all its work, driven by a pre-computed copper list?
This is effectively what I do when the glenz first appears, although I'm only using the CPU to calc the edges rather than fill the polygon, which is achieved via the copper displaying different lines from a pre-drawn triangle.

Note that the blitter can slow down the CPU if the code is running from chip ram. If the blitter nasty bit is set it can even stop the CPU, a feature I rely upon in the plasmas near the beginning of the demo.

Last edited by Paradroid; 09 April 2013 at 07:51.
Old 09 April 2013, 08:10   #7
Paradroid
Rock Lobster
 
Join Date: Nov 2012
Location: Macclesfield
Age: 45
Posts: 39
Ah, just realised Leonard is the same guy I was talking to about this via email

For the benefit of others, here's some of the relevant info I'd passed on...

> "blitter can't fill 3 bitplanes at 50hz"
That was referring to overscan bitplanes. Redux uses a 352x272 display most of the time, and the blitter wouldn't even be able to clear and fill 2 bitplanes at that resolution, even when using the CPU to help with the clear (well, it might just about do it, but not when drawing a lot of lines too). The area I'm filling with the blitter is clamped around the object, which is why I'm able to keep it at 50Hz.

> But then you say when the clip arrives, you switch to a four-bitplane blitter routine.
At this point I've switched to a smaller display area, 192x192. At this size I can fill 4 bitplanes in just under half a frame, leaving the other half for clearing (which uses both blitter and CPU) and drawing the lines.

> How much time does the blitter require to fill a one-bitplane, 320*256 pixel screen?
It's totally dependent on what else is active and using the DMA bus, such as the number of active bitplanes, audio, sprites, etc. If you stick to a 2 bitplane display, write a fast clear and don't draw too many lines, you could stay in 50Hz at that size. I just about managed it in the demo Deja-vu with a full screen glenz (it only needed 2 bitplanes because you couldn't see the outline of the object), but I just couldn't get it fast enough in overscan... Hmmm, that was a very long time ago, maybe I should try again ^_^
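To make the dependence on display DMA concrete, here is a rough back-of-envelope sketch, not a measurement. It assumes 227 DMA slots per PAL scanline, 312 scanlines per frame, 4 refresh slots per line, 3 blitter cycles per filled word (the figure given later in this thread), one bitplane fetch slot per displayed word, and a display the same size as the filled area. Audio, sprite, disk and copper DMA are ignored, and the function name is made up for illustration.

```python
# Back-of-envelope DMA budget for a blitter fill on a PAL OCS machine.
# All constants are assumptions for illustration, not measured values.
SLOTS_PER_SCANLINE = 227      # assumed DMA slots (colour clocks) per line
SCANLINES_PER_FRAME = 312     # PAL, as quoted in the thread
REFRESH_SLOTS_PER_LINE = 4    # assumed memory refresh slots per line
FILL_CYCLES_PER_WORD = 3      # read + write + idle, per the thread


def fill_fraction_of_frame(width_px, height_px, planes):
    """Fraction of the free DMA slots in one frame eaten by the fill."""
    words_per_plane = (width_px // 16) * height_px
    fill_slots = words_per_plane * FILL_CYCLES_PER_WORD * planes
    # Bitplane display DMA: one fetch per displayed word per plane.
    display_slots = words_per_plane * planes
    total_slots = SLOTS_PER_SCANLINE * SCANLINES_PER_FRAME
    refresh_slots = REFRESH_SLOTS_PER_LINE * SCANLINES_PER_FRAME
    free_slots = total_slots - refresh_slots - display_slots
    return fill_slots / free_slots


if __name__ == "__main__":
    for w, h, planes in [(192, 192, 4), (320, 256, 3)]:
        share = fill_fraction_of_frame(w, h, planes)
        print(f"{w}x{h}, {planes} planes: {share:.0%} of the frame")
```

Under these assumptions the 192x192 four-plane fill comes out at about 46% of the frame, which is in line with "just under half a frame", and a 320x256 three-plane fill at about 85%, before any clearing or line drawing.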
Old 09 April 2013, 10:17   #8
leonard
Registered User
 
Join Date: Apr 2013
Location: paris
Posts: 40
Hi Paradroid

Yes, I'm the same guy. Glad you are on this forum too! I love world records in demos (I hold some on the Atari ST) and I always thought 3D was "easy" on the Amiga. Now I see that it could be a world record to get a 320*256 glenz vector on a standard A500 OCS.

Thanks for all the explanations. I see now that even the mythic HardWeird 48-face glenz is quite small on the screen.
Old 09 April 2013, 10:41   #9
Paradroid
Rock Lobster
 
Join Date: Nov 2012
Location: Macclesfield
Age: 45
Posts: 39
Quote:
Originally Posted by leonard View Post
Now I see that it could be a world record to get a 320*256 glenz vector on a standard A500 OCS.
I did that in 1992, so you'll need to go even bigger

Actually, IIRC I wasn't using the copper or interrupts for rendering the glenzes in Deja-vu, so it shouldn't be too hard to make them bigger, as the CPU was probably just waiting for the blitter to finish half the time.

EDIT:
FYI, the record for OCS glenz faces is at least 192 (see Anarchy's 3D Demo II). Doing that full screen would be a nice challenge to take on

Last edited by Paradroid; 09 April 2013 at 11:06.
Old 09 April 2013, 12:42   #10
leonard
Registered User
 
Join Date: Apr 2013
Location: paris
Posts: 40
Quote:
Originally Posted by Paradroid View Post
I did that in 1992, so you'll need to go even bigger
Oh yes, but the glenz is 2 bitplanes only, as you said, right? (I mean, it works just because the shape is zoomed so that we don't see the borders.)

Quote:
Originally Posted by Paradroid View Post
FYI, the record for OCS glenz faces is at least 192 (see Anarchy's 3D Demo II). Doing that full screen would be a nice challenge to take on
Oh, I have to look at this demo. As I said, I'm much more of an Atari ST demo specialist; there are some gaps in my Amiga demo culture.

BTW, could you tell me how much time it takes to CLEAR with the blitter, compared to a FILL (under the same conditions of bitplanes, sound, copper, etc.)? Is the CLEAR twice as fast as the FILL, or something else?
Old 09 April 2013, 13:53   #11
Paradroid
Rock Lobster
 
Join Date: Nov 2012
Location: Macclesfield
Age: 45
Posts: 39
A clear would be more than double the speed of a fill, although even then I wouldn't usually use a pure blitter clear myself. Depending on how you draw the object you may not need a traditional clear at all. For example, it might be quicker to redraw the lines again to wipe the old ones. That would require the fill to do a copy to another buffer rather than writing the result back to itself...

Then again, maybe you might want to use a technique that doesn't need a fill at all. This is why I love programming the Amiga: as with every effect, there are loads of ways to go about rendering 3D using the CPU, blitter, copper, interrupts, etc., for various tasks in various configurations and orders. So I suggest you just grab yourself a framework if you don't have one already and just experiment. If all you have is blitter memory bandwidth numbers you sure ain't going to be getting anywhere near the potential of the machine.
Old 09 April 2013, 14:09   #12
leonard
Registered User
 
Join Date: Apr 2013
Location: paris
Posts: 40
Quote:
Originally Posted by Paradroid View Post
Then again, maybe you might want to use a technique that doesn't need a fill at all. This is why I love programming the Amiga: as with every effect, there are loads of ways to go about rendering 3D using the CPU, blitter, copper, interrupts, etc., for various tasks in various configurations and orders
Totally agree with you. Beating the glenz vector record requires a carefully fine-tuned balance between blitter, CPU and copper.

Quote:
Originally Posted by Paradroid View Post
I suggest you just grab yourself a framework if you don't have one already and just experiment. If all you have is blitter memory bandwidth numbers you sure ain't going to be getting anywhere near the potential of the machine.
I have my own framework for making Atari ST demos, running on Windows (kernel, packer, track loader, etc.). I did the same on my A1200 a long time ago, but I will convert it for Windows so I can test stuff myself.

I tried the built-in debugger of WinUAE. It's not really bad, but it's far from good for development. What debugger are you using when you develop Amiga stuff on the Windows platform?
Old 09 April 2013, 14:19   #13
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 45
Posts: 23,676
Technical info here because it is always interesting!

Fill is always at least 3 blitter cycles/word. Plain clear takes 2 blitter cycles.

Both have one idle cycle which is usable by the CPU and only by the CPU.

Both idle and non-idle blitter cycles require a DMA cycle that was not used by any other higher priority DMA channel. If this blitter cycle was not actually used by the blitter (it was a blitter idle cycle), it becomes available for the CPU.

This is a very important undocumented feature that should help to optimize bitplane/blitter/CPU usage even better.

btw, the WinUAE "dma" debugger can be used to check DMA channel usage ("v" command)
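As a sanity check on the numbers above, the following sketch converts the per-word cycle counts into scanlines for one 320x256 bitplane. It assumes roughly 227 DMA slots per PAL scanline and that every slot is free for the blitter, which is the ideal case only; the helper name is made up for illustration.

```python
# Rough blitter timing model. The slot count per scanline is an assumption;
# the cycles-per-word figures are the ones quoted in this post.
SLOTS_PER_SCANLINE = 227  # assumed DMA slots (colour clocks) per PAL line


def blit_scanlines(width_px, height_px, cycles_per_word):
    """Scanlines needed for one bitplane if every DMA slot is free."""
    words = (width_px // 16) * height_px  # 16 pixels per 16-bit word
    slots = words * cycles_per_word       # idle cycles still occupy a slot
    return slots / SLOTS_PER_SCANLINE


if __name__ == "__main__":
    print(f"fill:  {blit_scanlines(320, 256, 3):.1f} scanlines")
    print(f"clear: {blit_scanlines(320, 256, 2):.1f} scanlines")
```

With these assumptions a fill lands just under 68 scanlines, close to the "about 70 scanlines" estimate earlier in the thread, and a plain clear at about 45.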
Old 09 April 2013, 14:23   #14
Paradroid
Rock Lobster
 
Join Date: Nov 2012
Location: Macclesfield
Age: 45
Posts: 39
I just use the WinUAE debugger, plus a whole load of verification and unit testing code so I don't need to visit it too often. Oh how I miss source level debugging, lol

I've got a snasm devkit here (same as what I was using when making the original RaD) that allows source level debugging of code running on the actual hardware, but I don't have a PC old enough to put it in - for some reason I've kept my 386 and 486 mobos, but not the memory chips or power supplies zzz -_-

Last edited by Paradroid; 09 April 2013 at 14:56. Reason: missing words :P
Old 09 April 2013, 18:42   #15
leonard
Registered User
 
Join Date: Apr 2013
Location: paris
Posts: 40
Quote:
Originally Posted by Toni Wilen View Post
Technical info here because it is always interesting!

Fill is always at least 3 blitter cycles/word. Plain clear takes 2 blitter cycles.

Both have one idle cycle which is usable by the CPU and only by the CPU.

Both idle and non-idle blitter cycles require a DMA cycle that was not used by any other higher priority DMA channel. If this blitter cycle was not actually used by the blitter (it was a blitter idle cycle), it becomes available for the CPU.

This is a very important undocumented feature that should help to optimize bitplane/blitter/CPU usage even better.

btw, the WinUAE "dma" debugger can be used to check DMA channel usage ("v" command)
Thanks for the additional technical details. Let's suppose the blitter is filling an area; you say it has 3 busy cycles and 1 free per word. Does that mean you get 1/4 of the memory speed on the CPU side? (Even a tower of NOP instructions consumes memory bandwidth by reading opcodes from memory.) Does it mean that a simple tower of NOPs executes 1/4 slower if the blitter is filling at the same time? Or did I miss something?
Old 09 April 2013, 20:52   #16
mc6809e
Registered User
 
Join Date: Jan 2012
Location: USA
Posts: 318
Quote:
Originally Posted by Toni Wilen View Post
Technical info here because it is always interesting!

Fill is always at least 3 blitter cycles/word. Plain clear takes 2 blitter cycles.

Both have one idle cycle which is usable by the CPU and only by the CPU.
Very interesting.

This suggests that triple buffering is worth trying when using area fill.

Buffer 1 -- display
Buffer 2 -- area fill poly
Buffer 3 -- CPU clear with MOVEMs
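The rotation implied by this scheme can be sketched as follows. This is only an illustration of the bookkeeping, with made-up buffer names, and it assumes the roles advance once per frame: the buffer that was just filled gets displayed, the one that was just cleared gets filled, and the one that was displayed gets cleared.

```python
from collections import deque

# Role of each position in the deque, in the order listed above.
ROLES = ("display", "area fill", "CPU clear")


def current_roles(buffers):
    """Map each role to the buffer currently assigned to it."""
    return dict(zip(ROLES, buffers))


def next_frame(buffers):
    """Advance one frame: filled -> display, cleared -> fill, shown -> clear."""
    buffers.rotate(-1)  # rotate left by one position
    return current_roles(buffers)


if __name__ == "__main__":
    buffers = deque(["buf1", "buf2", "buf3"])  # hypothetical buffer names
    print(0, current_roles(buffers))
    for frame in range(1, 4):
        print(frame, next_frame(buffers))
```

After three frames every buffer is back in its original role, so the same three chip memory areas are reused indefinitely.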
Old 09 April 2013, 21:15   #17
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 45
Posts: 23,676
Quote:
Originally Posted by leonard View Post
Thanks for the additional technical details. Let's suppose the blitter is filling an area; you say it has 3 busy cycles and 1 free per word.
3 cycles was the total, 1 read, 1 write, 1 is idle. (It can become 4 cycles if 3 or 4 channels are enabled, note that some 3 channel combinations only use 3 cycles even in fill mode, "fill idle" cycle is included only in some combinations)

Quote:
Does that mean you get 1/4 of the memory speed on the CPU side? (Even a tower of NOP instructions consumes memory bandwidth by reading opcodes from memory.) Does it mean that a simple tower of NOPs executes 1/4 slower if the blitter is filling at the same time? Or did I miss something?
Yes (everyone knows that the CPU gets slower when the blitter is active), but more important is that every blitter idle cycle is _always_ available for the CPU (afaik Agnus has some shared logic that the blitter needs during idle cycles). So always do something useful with the CPU instead of uselessly polling the blitter finished bit if you have just started a blit with idle cycles. (This seemed to be the most common way to waste raster time.) The blitter nasty bit also makes no difference if the blit has idle cycles.
Old 15 April 2013, 18:49   #18
leonard
Registered User
 
Join Date: Apr 2013
Location: paris
Posts: 40
Hi Toni,

I'm working on a test version of doing 3D on an A500, OCS. I made some tests, showing timing using raster colors (oldskool). I just wonder how accurate WinUAE is? I mean, I did all my timing tests on WinUAE (I don't have an A500). I'm interested in the blitter interrupt (the blitter is running, the CPU too, and the blitter interrupt is used).
Do you think I can "trust" WinUAE in this configuration? (I set "cycle exact".)
Old 15 April 2013, 19:01   #19
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 45
Posts: 23,676
A500 cycle-exact "should" have perfect timing but I am 100% sure there are some CPU instructions that have wrong cycle usage. Chipset timing should be perfect.

I don't recommend blitter interrupts, at least if there are lots of small blits. 68000 exceptions (including interrupts) have a long "startup", ~50 cycles or so, and that does not even include saving/restoring registers and RTE.

Small blits get finished before the interrupt even starts
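To put a rough number on "small", the sketch below compares the interrupt startup cost with the duration of a fill blit. It assumes two CPU clock cycles per DMA slot (a ~7.09 MHz 68000 against a ~3.55 MHz slot rate on PAL), the ~50 cycle startup quoted above, 3 blitter cycles per word, and no contention or blit setup overhead; the function names are made up for illustration.

```python
import math

# All constants are assumptions for a rough estimate, not measured values.
CPU_CYCLES_PER_SLOT = 2      # ~7.09 MHz CPU clock vs ~3.55 MHz DMA slot rate
INTERRUPT_ENTRY_CYCLES = 50  # the "~50 cycles or so" startup quoted above
FILL_CYCLES_PER_WORD = 3     # blitter cycles per filled word


def blit_cpu_cycles(words, cycles_per_word=FILL_CYCLES_PER_WORD):
    """Duration of a blit in CPU clock cycles, with every DMA slot free."""
    return words * cycles_per_word * CPU_CYCLES_PER_SLOT


def break_even_words(cycles_per_word=FILL_CYCLES_PER_WORD):
    """Smallest blit (in words) that lasts at least as long as interrupt entry."""
    per_word = cycles_per_word * CPU_CYCLES_PER_SLOT
    return math.ceil(INTERRUPT_ENTRY_CYCLES / per_word)


if __name__ == "__main__":
    n = break_even_words()
    print(f"break-even: {n} words ({blit_cpu_cycles(n)} CPU cycles)")
```

Under these assumptions a fill of fewer than 9 words is already done by the time the interrupt handler would start, and that is before counting register save/restore and RTE.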
Old 15 April 2013, 22:15   #20
leonard
Registered User
 
Join Date: Apr 2013
Location: paris
Posts: 40
Quote:
Originally Posted by Toni Wilen View Post
I don't recommend blitter interrupts, at least if there are lots of small blits. 68000 exceptions (including interrupts) have a long "startup", ~50 cycles or so, and that does not even include saving/restoring registers and RTE.
I agree the blitter interrupt is not the fastest way of running a blitter queue on the Amiga. But I can't use the copper for that because I use it to do some "sync to screen" stuff. So I guess I can't mix in blitter commands, because I may lose some sync points with the electron beam.

The CPU interrupt seems very long compared to the Atari ST, but if you confirm it's normal, then I have to take that into account.

Thanks!