English Amiga Board


15 October 2020, 10:57   #1
girv
Mostly Harmless

 
Join Date: Aug 2004
Location: Northern Ireland
Posts: 972
Blitter speed in "cookie cut words per frame" ?

I've been looking for a figure on blitter speed in terms of the number of cookie-cut words a base A500 can write per frame at 50fps, so I can set a frame budget in terms of the bobs the game uses (i.e. "max 8 enemies and 8 explosions" or whatever). I'm not sure how to calculate it.



The HRM says a cookie-cut blit takes 8 ticks (7MHz clock cycles) per word
= 886250 words per second on a PAL 7.09MHz A500
= 17725 words per 50Hz frame


Does that equate to 17725*2 = ~35KB of cookie-cut words written in 1/50s? Equivalent to ~280 16x16, 16-colour interleaved bobs?
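The arithmetic in the question, restated as a short Python sketch. It uses only the figures above (7.09MHz clock, 8 ticks per cookie-cut word, 50 frames per second) and, like the question, assumes no competing DMA and no extra shift word per line.

```python
# Back-of-envelope blitter throughput, restating the figures above.
# Assumes a 7.09 MHz clock, 8 ticks per cookie-cut word, 50 frames per second,
# no competing DMA at all, and no extra shift word per line.
CLOCK_HZ = 7_090_000
TICKS_PER_COOKIE_WORD = 8
FRAMES_PER_SECOND = 50

words_per_second = CLOCK_HZ // TICKS_PER_COOKIE_WORD      # 886250
words_per_frame = words_per_second // FRAMES_PER_SECOND   # 17725
bytes_per_frame = words_per_frame * 2                     # 35450 bytes, ~35KB

# A 16x16, 16-colour bob: 1 word wide x 16 lines x 4 bitplanes.
words_per_bob = 1 * 16 * 4                                # 64 words
bobs_per_frame = words_per_frame / words_per_bob          # ~277

print(words_per_second, words_per_frame, bytes_per_frame, round(bobs_per_frame))
```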

There are other, faster blitter ops going on like clears and copies, but I'm ignoring those for simplicity.



FWIW, I'm not doing anything like interrupt- or Copper-driven blitter queues. BLTHOG is set during the blitter wait busy loop. This is old code, quite a lot of it, that I don't have time to rewrite, so I'd just like to get an estimate for the number of bobs I can expect to be able to draw like this.


Any help?
15 October 2020, 12:02   #2
roondar
Registered User

 
Join Date: Jul 2015
Location: The Netherlands
Posts: 2,755
Ok, first things first. The 35KB per frame figure is both correct and incorrect.
It's correct that the Blitter can, in theory, do this much cookiecut blitting per frame.

However, this only holds if there is no CPU activity, no Copper activity, no bitplane activity (i.e. no visible screen), no audio, etc. In reality, the Amiga (OCS/ECS systems in particular) spends a significant portion of the cycles the Blitter would need to reach this speed on these other activities. As such, you'll usually never get even close to 35KB per frame of cookie-cut blitting on an A500.

Second, cookie-cut is only half the story: to blit you also need to restore the background in some fashion every frame. This means the actual cost of blitting a bob should include the cycles used to restore the background image to its original contents. This will cost at least 4 cycles* per word.

And last but not least, most bobs need to be shifted into place by the Blitter, so a cookie-cut blit must include one extra word per line for the shifted data to move into.

So, with that in mind: the cost for a 16x16 bob (with shift space & restore, 16 colours) is at least 1536 cycles*. The PAL Amiga has approximately 141800 cycles per frame* (not counting refresh or long/short frames), so the theoretical maximum number of 16x16x4 bobs drawn is then 141800/1536=92 (138 if we don't count the restore).
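The same calculation as a small Python sketch. The per-word costs (8 cycles for cookie-cut, 4 for the restore copy) and the ~141800 cycles per frame are the figures from this post; the function names are just for illustration, and widths are assumed to be multiples of 16.

```python
# Per-bob blitter cost in 7 MHz cycles, following the reasoning above.
# Assumed per-word costs: cookie-cut (ABCD) = 8 cycles, restore copy (A->D) = 4.
COOKIE_CYCLES_PER_WORD = 8
RESTORE_CYCLES_PER_WORD = 4
CYCLES_PER_FRAME = 141_800  # approximate PAL figure, ignores refresh and long/short frames

def bob_cost(width_px: int, height: int, planes: int, restore: bool = True) -> int:
    """Blitter cycles to draw one shifted bob, optionally including its restore.

    width_px is assumed to be a multiple of 16.
    """
    words_wide = width_px // 16 + 1          # +1 word so the shifted data has room
    words = words_wide * height * planes
    cost = words * COOKIE_CYCLES_PER_WORD
    if restore:
        cost += words * RESTORE_CYCLES_PER_WORD
    return cost

def max_bobs(width_px: int, height: int, planes: int, restore: bool = True) -> int:
    """Theoretical maximum per frame with no other DMA at all."""
    return CYCLES_PER_FRAME // bob_cost(width_px, height, planes, restore)

print(bob_cost(16, 16, 4), max_bobs(16, 16, 4), max_bobs(16, 16, 4, restore=False))
```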

But as said, the cost of bitplane activity is quite high and there are other DMA sources that steal cycles from the Blitter (including the CPU or Copper to set up the blits), leading to a more realistic figure in the ballpark of about 50-60 (75-90 if restore is not counted) such bobs if you do pretty much nothing but draw bobs. This assumes a screen size of 320x256 in 4 bitplanes. Any game/demo logic will further reduce the number.
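To get a feel for how much the display alone costs, here is a rough Python sketch that subtracts only bitplane and refresh DMA from the frame budget. The assumptions are mine, not from the post: a lowres 320x256 display in 4 bitplanes with no horizontal scroll (20 fetched words per plane per line), 4 refresh slots on each of 313 lines, and each DMA slot worth 2 cycles at 7MHz. Copper, sprites, audio, disk and CPU setup are ignored, so this is an upper bound, not a prediction.

```python
# Cycles left for the blitter after bitplane and refresh DMA (rough upper bound).
# Assumptions: lowres 320x256 x 4 planes, no hardware scroll, 313 lines with
# 4 refresh slots each, 1 DMA slot = 2 cycles at 7 MHz. Other DMA is ignored.
CYCLES_PER_FRAME = 141_800
CYCLES_PER_SLOT = 2
BOB_COST = 1536  # 16x16x4 bob with shift word: cookie-cut + restore

def bitplane_cycles(width_px=320, lines=256, planes=4):
    words_per_line = (width_px // 16) * planes   # one DMA slot per fetched word
    return words_per_line * lines * CYCLES_PER_SLOT

def refresh_cycles(lines_per_frame=313, slots_per_line=4):
    return lines_per_frame * slots_per_line * CYCLES_PER_SLOT

free_cycles = CYCLES_PER_FRAME - bitplane_cycles() - refresh_cycles()
print(free_cycles, free_cycles // BOB_COST)
```

Even this optimistic model drops from 92 to about 64 bobs; the remaining DMA sources and the CPU/Copper time for setting up each blit are what push the realistic figure lower still.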

Edit: one final note, it's almost always more efficient to draw wider objects as the cost for the extra word to shift is half of the total cost for a 16x16 bob, but only 1/3 of the total cost of a 32x32 bob, etc.
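The fraction of a shifted bob's words that goes to the extra shift word, as a tiny Python sketch (widths assumed to be multiples of 16):

```python
from fractions import Fraction

def shift_overhead(width_px: int) -> Fraction:
    """Share of each blitted line spent on the extra shift word."""
    words = width_px // 16          # words the image itself needs
    return Fraction(1, words + 1)   # one extra word out of words + 1

for w in (16, 32, 48, 64):
    print(w, shift_overhead(w))     # 1/2, 1/3, 1/4, 1/5
```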

*) 7MHz cycles

Last edited by roondar; 15 October 2020 at 13:16. Reason: Corrected some number
15 October 2020, 12:24   #3
girv
Mostly Harmless

 
Join Date: Aug 2004
Location: Northern Ireland
Posts: 972
Awesome, thanks! Some hard figures to work with there. I'd forgotten about shifting costing extra fetches too!

It's using a D=A blit to restore 16x16 background tiles. These are all unshifted and use a dirty flag so multiple bobs on one tile will see the tile redrawn just once. Though one 16x16 bob will probably cause 4 tiles to be redrawn to restore it, and many bobs are 16x24 so that's 6 tiles.
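A minimal Python sketch of that tile bookkeeping, assuming 16x16 tiles and a hypothetical `dirty` grid of booleans (the real code is 68000 assembly and its data layout isn't shown here). It shows why an unaligned 16x16 bob touches 4 tiles and a 16x24 bob up to 6, and how one flag per tile keeps a shared tile from being restored twice.

```python
# Which 16x16 background tiles does a bob overlap? Mark each one dirty once.
# Hypothetical layout: dirty[row][col] booleans for a 320x256 screen (20x16 tiles).
TILE = 16

def tiles_covered(x, y, w, h):
    """Return (col, row) of every tile overlapped by a w x h bob at (x, y)."""
    first_col, last_col = x // TILE, (x + w - 1) // TILE
    first_row, last_row = y // TILE, (y + h - 1) // TILE
    return [(c, r) for r in range(first_row, last_row + 1)
                   for c in range(first_col, last_col + 1)]

def mark_dirty(dirty, x, y, w, h):
    for c, r in tiles_covered(x, y, w, h):
        dirty[r][c] = True          # setting it twice costs nothing extra later

dirty = [[False] * 20 for _ in range(16)]
mark_dirty(dirty, 5, 3, 16, 16)     # unaligned 16x16 bob: 4 tiles
mark_dirty(dirty, 10, 12, 16, 24)   # overlapping 16x24 bob: 6 tiles, 4 shared
print(sum(row.count(True) for row in dirty))  # tiles to restore next frame
```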

I could probably clip the restore blits to just the damaged lines to mitigate the cost if it comes to that.

I may have to reduce the size of my explosions :/

Last edited by girv; 15 October 2020 at 13:31.
15 October 2020, 13:28   #4
roondar
Registered User

 
Join Date: Jul 2015
Location: The Netherlands
Posts: 2,755
Well, the two "standard" ways of Blitter restoring are either to use a small restore buffer per bob (you copy the rectangle the bob is about to occupy to a buffer before blitting, then restore the background from that buffer the next frame) or to use a 3rd buffer that contains the background but no bobs.

The advantage of both methods is that you only need one or two blits per object to restore the background instead of 4+, and the algorithm is simpler, which again speeds it up a bit. The drawback is that they use more memory, which your solution avoids completely.

My example calculation assumed a 3rd buffer with the background in it that is used to restore, as that has been the fastest way to do it in my experience (I did consider the tile based method, but the extra overhead for blitting 4+ tiles and the associated array accesses turned out to be higher than I thought so I've not used it since).
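For the blitter side alone, the two restore methods can be compared with a quick Python sketch. It assumes 4 cycles per A->D word (the restore figure used earlier in the thread), a 16x16 bob in 4 planes, and unshifted 1-word-wide tile blits. It deliberately ignores the CPU setup per blit and the tile bookkeeping, both of which add to the tile method's real cost, and the fact that overlapping bobs can share dirty tiles.

```python
# Blitter-only restore cost for one 16x16, 4-plane bob, in 7 MHz cycles.
RESTORE_CYCLES_PER_WORD = 4
PLANES = 4

def third_buffer_restore(width_px=16, height=16):
    """One A->D blit over the bob's rectangle, including the shift word."""
    words = (width_px // 16 + 1) * height * PLANES
    return words * RESTORE_CYCLES_PER_WORD

def tile_restore(tiles):
    """One unshifted 1-word-wide A->D blit per dirty 16x16 tile."""
    words_per_tile = 1 * 16 * PLANES
    return tiles * words_per_tile * RESTORE_CYCLES_PER_WORD

print(third_buffer_restore(), tile_restore(1), tile_restore(4))
```

An unaligned bob touching 4 tiles restores twice the area of its 2-word rectangle, so the tile method only catches up when bobs overlap and share tiles, or when its bookkeeping is very cheap.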
15 October 2020, 13:31   #5
girv
Mostly Harmless

 
Join Date: Aug 2004
Location: Northern Ireland
Posts: 972
Yeah, I thought the tile-based restore would be faster overall, but perhaps not. I've only got about 10KB of RAM free for any extra buffers though.
15 October 2020, 13:48   #6
roondar
Registered User

 
Join Date: Jul 2015
Location: The Netherlands
Posts: 2,755
It's going to be slower than the 3rd buffer approach. Not so sure about it being slower than the restore buffer approach. It would depend a bit on how much effort fetching the tiles from the array takes and how optimal the blitting routine is (this is me saying I never tried finding the difference).
15 October 2020, 13:59   #7
girv
Mostly Harmless

 
Join Date: Aug 2004
Location: Northern Ireland
Posts: 972
For a restore buffer approach I guess I'd have to save the backgrounds first before drawing any bobs, but the code has update+draw combined (so update can run in parallel with blits) so it would be a big change.

If I can find 60KB I could implement the 3rd buffer approach more easily. Something to consider. Thanks!
15 October 2020, 15:31   #8
girv
Mostly Harmless

 
Join Date: Aug 2004
Location: Northern Ireland
Posts: 972
For the 3rd buffer, I guess you stored just the start address, width (words) and height (lines) of each damaged block, and didn't attempt to merge any overlapping blocks into single blits?
15 October 2020, 15:38   #9
roondar
Registered User

 
Join Date: Jul 2015
Location: The Netherlands
Posts: 2,755
Indeed. What I tend to do is store the source & destination address, the Blitter size register value and the D modulo value (assuming I'm not missing a value here, but I think this is correct). One of those per bob.
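A Python sketch of one such restore record. The layout assumptions are hypothetical: an interleaved 320-pixel, 4-plane screen and a background buffer with the identical layout, so the A and D modulos are the same value. BLTSIZE follows the OCS/ECS encoding (height in bits 15-6, width in words in bits 5-0; 1024 lines or 64 words encode as 0).

```python
from dataclasses import dataclass

# Hypothetical layout: interleaved 320-px, 4-plane screen and background buffer.
ROW_BYTES = 40               # bytes per line of one bitplane (320 / 8)
PLANES = 4
STRIDE = ROW_BYTES * PLANES  # interleaved: bytes from one screen line to the next

@dataclass
class RestoreRecord:
    src: int      # address in the clean background buffer (A channel)
    dst: int      # address in the screen buffer (D channel)
    bltsize: int  # value to write to BLTSIZE, which also starts the blit
    modulo: int   # bytes skipped after each blitted line (A and D alike)

def make_record(bg_base, screen_base, x, y, width_px, height):
    words = width_px // 16 + 1              # extra word for the shift
    offset = y * STRIDE + (x // 16) * 2     # word-aligned top-left of the rectangle
    lines = height * PLANES                 # interleaved: every plane line is a blit line
    bltsize = ((lines & 0x3FF) << 6) | (words & 0x3F)
    return RestoreRecord(bg_base + offset, screen_base + offset,
                         bltsize, ROW_BYTES - words * 2)

rec = make_record(0x20000, 0x30000, x=37, y=10, width_px=16, height=16)
print(hex(rec.src), hex(rec.dst), hex(rec.bltsize), rec.modulo)
```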

Combining blocks could be interesting, but you do run the risk of making the rectangle you draw much bigger than the individual blocks, so it needs some logic to prevent that. As is, I've been thinking about using a more proper "dirty rectangles" approach which would allow combining blocks. But I'm not completely convinced the extra overhead will be covered by the potentially lower blitting cost.
15 October 2020, 15:49   #10
girv
Mostly Harmless

 
Join Date: Aug 2004
Location: Northern Ireland
Posts: 972
For a full rectangle merger you'd maybe want some spatial hashing to cut down on the number of intersection checks, then work out if the merged block would save cycles compared to the separate blits. With some fudge for the extra CPU waiting on the larger blit, perhaps. Probably too complex to throw at a 7MHz 68000.

Maybe there's a version where you can quickly compare start addresses or something to see if they are equal or immediately adjacent. That might happen often enough, I suppose.
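A Python sketch of that cheap check, under hypothetical assumptions: each restore rectangle is stored as (byte offset into an interleaved 320-px, 4-plane buffer, width in words, blit lines). After sorting, rectangles with the same height are merged only if they start at the same offset or the second starts exactly where the first ends on the same screen line. No general overlap handling is attempted; a widened blit would need its modulo recomputed.

```python
# Cheap merge pass: same start, or immediately side by side on the same line.
# Hypothetical representation: (offset, width_in_words, blit_lines).
ROW_BYTES = 40              # bytes per bitplane line (320 px)
STRIDE = ROW_BYTES * 4      # interleaved, 4 planes

def same_row(a, b):
    return a // STRIDE == b // STRIDE

def merge_adjacent(rects):
    merged = []
    for off, words, lines in sorted(rects):
        if merged:
            poff, pwords, plines = merged[-1]
            if plines == lines and off == poff:
                # Same start and height: the wider blit covers both.
                merged[-1] = (poff, max(pwords, words), lines)
                continue
            if (plines == lines and off == poff + pwords * 2
                    and same_row(off, poff)
                    and poff % STRIDE + (pwords + words) * 2 <= ROW_BYTES):
                # Starts exactly where the previous one ends: widen that blit.
                merged[-1] = (poff, pwords + words, lines)
                continue
        merged.append((off, words, lines))
    return merged

rects = [(0x644, 2, 64), (0x648, 2, 64), (0x644, 1, 64), (0x1000, 2, 64)]
print([(hex(o), w, l) for o, w, l in merge_adjacent(rects)])
```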
15 October 2020, 21:58   #11
alpine9000
Registered User

 
Join Date: Mar 2016
Location: Australia
Posts: 853
I use the tile based restore in Metro Siege and phx uses it in his recent games (he gave me the idea)

Main advantage is obviously no third buffer or restore buffers, which for me was a must as my artists are always screaming for more chip RAM. The performance is not radically different to other approaches. Compared with a third buffer, you no longer have to maintain the third buffer when scrolling. And compared with save/restore, you obviously skip the “save”. You get advantages when you have lots of overlapping action, but you can’t count on this, as you normally must also be able to handle the case where nothing overlaps.

You can also combine the rendering of off-screen tiles when scrolling by marking them as dirty.

So while I do encourage you to test out the alternatives, don’t discount your current method, as it is a proven and increasingly common technique as well.
15 October 2020, 23:05   #12
roondar
Registered User

 
Join Date: Jul 2015
Location: The Netherlands
Posts: 2,755
Hmm, I might have to retest it then. In my tests the 3rd buffer approach was always clearly the fastest option (in particular by a sizeable margin compared to the restore buffer option), even if I had a scrolling screen and needed to update the 3rd buffer tiles.

I always took it to be logical - unless you scroll very quickly, you never need to update more than at most a few tiles per frame vs potentially many extra blits for the tilemap based restore or restore buffer based restore. That said... I do agree that if the performance actually is similar to using 3 buffers then the tilemap restore certainly is the best overall option. Hard to beat that much extra chip memory being made available!

So, perhaps it's time to test again

Last edited by roondar; 15 October 2020 at 23:16.
15 October 2020, 23:22   #13
alpine9000
Registered User

 
Join Date: Mar 2016
Location: Australia
Posts: 853
Quote:
Originally Posted by roondar
Hmm, I might have to retest it then. In my tests the 3rd buffer approach was always clearly the fastest option (in particular by a sizeable margin compared to the restore buffer option), even if I had a scrolling screen and needed to update the 3rd buffer tiles.

I always took it to be logical - unless you scroll very quickly, you never need to update more than at most a few tiles per frame vs potentially many extra blits for the tilemap based restore or restore buffer based restore. That said... I do agree that if the performance actually is similar to using 3 buffers then the tilemap restore certainly the best overall option. Hard to beat that much extra chip memory being made available!

So, perhaps it's time to test again
In my tests it was slightly slower, but not by much. All blits being exactly the same size and configuration helps save on blitter setup for each tile. It's important that the code for managing dirty tiles is optimised, or else the tile method will be a lot slower. Finally, simple A=>D blits are pretty fast compared to cookie-cut ones, so even if the "restore" phase is a significant percentage slower than with a triple-buffer approach, the overall performance penalty is not massive and well worth the extra chip RAM if you're desperate for it.

edit: Metro Siege scrolls quite quickly in some scenarios, I think even up a full column of tiles in a single frame, so that also factored into my performance comparisons.
15 October 2020, 23:27   #14
roondar
Registered User

 
Join Date: Jul 2015
Location: The Netherlands
Posts: 2,755
Quote:
Originally Posted by alpine9000
In my tests it was slightly slower, but not that much. All blits being the exact same size and configuration helps save on blitter setup for each tile. It's important that the code for managing dirty tiles is optimised or else the tile method will be a lot slower. Finally, simple A=>D blits are pretty fast compared to cookie cutter, so even if the "restore" phase is a significant percentage slower than a triple buffer approach, the overall performance penalty is not massive and well worth the extra chip ram if you're desperate for it.
Yeah, AD blits are twice as fast as cookie cut blits, that's true.

Like I said, I think your (and PHX's) use of this technique is enough for me to revisit it at least one more time. It's a very nice idea if you can get it to perform because that 3rd buffer ain't cheap. As for my code, I had some real issues last time I tried it. In particular, back then I never found a fast way to get the determination of which tiles to blit (and how many tiles to blit for each bob) to be done, while still being generic enough.
Quote:
edit: Metro Siege scrolls quite quickly in some scenarios, I think even up a full column of tiles in a single frame, so that also factored into my performance comparisons.

That fast!? That's something like 16px/frame! Wow, colour me even more interested in the game
15 October 2020, 23:29   #15
Jobbo
Registered User

 
Join Date: Jun 2020
Location: Lexington, MA
Posts: 171
With the tile approach you could also specialize the restore for each tile. For example, if a tile is a single colour, you could replace the restore copy with a blitter clear (or fill). That would potentially save a little extra time. If you're not running interleaved, you could even customize the restore for each bitplane of a tile, so any all-zero or all-ones bitplanes are done with a clear or fill instead of a copy.
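A Python sketch of the per-plane classification, which would presumably be done once when the tileset is loaded. The tile format (a list of planes, each a list of 16 16-bit words) is a hypothetical stand-in. The D-only minterms used in the comments are the standard constant ones: $00 always writes zeros and $FF always writes ones.

```python
# Pick the cheapest restore operation per bitplane of a 16x16 tile.
# Only practical for non-interleaved screens, where each plane is its own blit.
# Hypothetical tile format: list of planes, each a list of 16 16-bit words.

def classify_planes(tile):
    ops = []
    for plane in tile:
        if all(word == 0x0000 for word in plane):
            ops.append("clear")   # D-only blit, minterm $00
        elif all(word == 0xFFFF for word in plane):
            ops.append("fill")    # D-only blit, minterm $FF
        else:
            ops.append("copy")    # normal A->D copy from the tile graphics
    return ops

tile = [[0x0000] * 16, [0xFFFF] * 16, [0x1234] * 16, [0x0000] * 15 + [0x0001]]
print(classify_planes(tile))
```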
15 October 2020, 23:39   #16
alpine9000
Registered User

 
Join Date: Mar 2016
Location: Australia
Posts: 853
Quote:
Originally Posted by roondar
Yeah, AD blits are twice as fast as cookie cut blits, that's true.

Like I said, I think your (and PHX's) use of this technique is enough for me to revisit it at least one more time. It's a very nice idea if you can get it to perform because that 3rd buffer ain't cheap. As for my code, I had some real issues last time I tried it. In particular, back then I never found a fast way to get the determination of which tiles to blit (and how many tiles to blit for each bob) to be done, while still being generic enough.


That fast!? That's something like 16px/frame! Wow, colour me even more interested in the game
I use a word for each column with each bit mapping to a tile. This is ok for a horizontal scrolling game up to 256 high. Means you can check if a column has any dirty bits by checking if the word is zero, and then process the individual bits only when required.
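A Python model of that column bitmask scheme: one 16-bit word per tile column, one bit per tile row (16 rows of 16 pixels = 256 lines). The helper names are hypothetical; on the Amiga this would be a word array walked in 68000 code, but the control flow is the same: a zero word skips the whole column, and only non-zero words are scanned bit by bit.

```python
# One 16-bit dirty word per tile column; bit n set = tile in row n is dirty.
COLUMNS = 20                      # 320 px wide screen / 16 px tiles

dirty_cols = [0] * COLUMNS

def mark(col, row):
    dirty_cols[col] |= 1 << row   # row 0..15

def collect_dirty():
    """Yield (col, row) for every dirty tile and clear the column flags."""
    for col in range(COLUMNS):
        bits = dirty_cols[col]
        if bits == 0:
            continue              # fast path: nothing to restore in this column
        row = 0
        while bits:
            if bits & 1:
                yield col, row
            bits >>= 1
            row += 1
        dirty_cols[col] = 0

mark(3, 0)
mark(3, 5)
mark(7, 15)
result = list(collect_dirty())
print(result)
```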
15 October 2020, 23:47   #17
roondar
Registered User

 
Join Date: Jul 2015
Location: The Netherlands
Posts: 2,755
That's quite clever, might try something like that
16 October 2020, 09:17   #18
sandruzzo
Registered User
 
Join Date: Feb 2011
Location: Italy/Rome
Posts: 1,940
It's very important to minimise the number of planes that need a full cookie-cut. If a bob uses fewer planes than the screen, you can save some cycles by just punching a hole into the unused planes instead of doing a full cookie-cut.
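A Python sketch of the saving, with assumed costs: a full cookie-cut (A, B, C and D channels) at 8 cycles per word, and a mask-only "hole" blit D = C AND NOT A (A, C and D channels) at 6 cycles per word. The trick only works if the bob's colours are chosen so its unused planes are 0 wherever the mask is set, because the hole blit clears exactly those bits.

```python
# Blitter cycles for a bob that only uses some of the screen's planes.
# Assumed per-word costs in 7 MHz cycles: cookie-cut (ABCD) = 8, hole (ACD) = 6.
COOKIE = 8
HOLE = 6

def bob_plane_cost(words_per_plane, screen_planes, bob_planes, use_holes=True):
    used = bob_planes * words_per_plane * COOKIE
    unused_planes = screen_planes - bob_planes
    per_word = HOLE if use_holes else COOKIE
    return used + unused_planes * words_per_plane * per_word

# 16x16 bob with shift word = 2 x 16 = 32 words per plane, on a 4-plane
# screen, with the bob drawn in planes 0 and 1 only.
print(bob_plane_cost(32, 4, 2, use_holes=False), bob_plane_cost(32, 4, 2))
```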
16 October 2020, 09:19   #19
sandruzzo
Registered User
 
Join Date: Feb 2011
Location: Italy/Rome
Posts: 1,940
With a 3rd buffer you can fold the "restore" into the cookie-cut, just by leaving some empty space around the bobs based on how fast they move.
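One way to read this tip, modelled at word level in Python (my interpretation, not confirmed in the thread): take the C channel of the usual cookie-cut, D = (A AND B) OR (NOT A AND C) (minterm $CA), from the clean 3rd buffer instead of the screen, and give each bob a transparent margin at least as wide as its per-frame movement. Inside the margin the mask is 0, so the blit writes clean background over last frame's trail. As an inference, this only holds while bobs don't overlap, since a later bob's margin would also erase an earlier bob underneath it.

```python
# Word-level model of cookie-cut: D = (A AND B) OR (NOT A AND C), minterm $CA.
MASK16 = 0xFFFF

def cookie_cut(a_mask, b_image, c_source):
    return ((a_mask & b_image) | (~a_mask & c_source)) & MASK16

old_screen = 0xFF00   # screen word still holding last frame's bob (the trail)
background = 0x0000   # same word in the clean 3rd buffer
mask = image = 0x00F0 # bob has moved; its new pixels are here

print(hex(cookie_cut(mask, image, old_screen)))  # C = screen: trail survives
print(hex(cookie_cut(mask, image, background)))  # C = 3rd buffer: trail erased
```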
16 October 2020, 12:16   #20
AnimaInCorpore
Registered User
 
Join Date: Nov 2012
Location: Willich/Germany
Posts: 182
Quote:
Originally Posted by roondar
But as said, the cost of bitplane activity is quite high and there are other DMA sources that steal cycles from the Blitter (including the CPU or Copper to set up the blits), leading a more realistic figure in the ballpark of about 50-60 (75-90 if restore is not counted) such bobs if you do pretty much nothing but draw bobs.
According to https://jsfiddle.net/p2d53qLn/2/, the theoretical limit (16 x 16 pixels @ 320 x 256 x 4 bpl) is:
Code:
Drawing speed (Bobs per PAL frame): 

63.81 (draw Bobs + restore background) 
47.86 (save background + draw Bobs + restore background)
All times are GMT +2.