English Amiga Board


Old 21 October 2016, 20:18   #1
Tigerskunk
Inviyya Dude!
 
Join Date: Sep 2016
Location: Amiga Island
Posts: 2,770
How to calculate possible blit times?

What I want to know is how many cycles I have available per frame to blit things to the screen, and how I can calculate how many of those cycles my current blits already consume.

Currently I am switching the background ($dff180) to different colors before and after my blitting operations to see how much blit time I have left.

I noticed that cookie-cutter blits using all three source DMA channels (A, B and C) take around twice as much blit time as plain A->D blits.

On my current project I get around 38 32x32 bobs (with 1 extra word for barrel shifting) in 3 planes onto the screen with A->D blits, whereas I only get 18 on screen with a cookie-cutter minterm.

I'd like to calculate if this is near what the blitter is capable of, or if I lose too many cycles through overhead somewhere before.

Thanks for any ideas and help on this...
Old 21 October 2016, 22:55   #2
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,502
The HRM blitter cycle diagram ("Typical Blitter Cycle Sequence") should help, but note that it does not explain when fill mode adds 1 extra cycle. Also note that blitter idle cycles ('-' in the diagram) still need a free cycle; they are not true idle cycles. The only difference compared to a normal blitter cycle is that the CPU can use blitter idle cycles (blitter nasty makes no difference, the CPU will still get the cycle).

For real-world results you also need to count the bitplane DMA, copper and other cycles that steal blitter cycles. It can get very difficult. The UAE DMA debugger can also help.
Old 22 October 2016, 21:26   #3
Tigerskunk
Inviyya Dude!
 
Join Date: Sep 2016
Location: Amiga Island
Posts: 2,770
Okay, thanks for the advice...
Old 24 October 2016, 16:11   #4
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,408
A small warning: this be a wall of text and numbers

I've been fascinated by the blitter and just how fast it is (or isn't) for a while now, and I've tinkered a bit with formulas to calculate how many bobs you can blit per frame. These formulas are based on the HRM and I'm happy to share them (see below).

However, do note that these formulas are not 100% accurate, as they a) do not count any CPU time spent whatsoever and b) do not take into account special cases where cycles must be idle. As an example, my tests on a basic A500 found that my blitter routines reached somewhere between 60% (first attempts) and 86-88% (after I optimised them) of the calculated figures.

Also note I only did two formulas: blitting a bob using the copy-draw-restore method and blitting a bob using the draw-restore method. The latter is much faster, but requires an additional buffer (basically a copy of the screen without any bobs, kept in memory at all times).

---
The first step is finding how many DMA cycles you have available.
For PAL Amigas, the basic formula to find this is:

DMA_cycles_left = (226*312)-display_dma_cycles-refresh_cycles-audio_cycles-sprite_cycles-copper_cycles

Where display_dma_cycles = (display_width/16)*display_height*number_of_planes
refresh_cycles = 312*4
audio_cycles = 312*audio_channels_playing
sprite_cycles = display_height*number_of_sprites*2
copper_cycles = (copper_moves*2)+(copper_waits*3)

Resulting formula for cycles left is then:
DMA_cycles_left = (226*312)-((display_width/16)*display_height*number_of_planes)-(312*4)-(312*audio_channels_playing)-(display_height*number_of_sprites*2)-((copper_moves*2)+(copper_waits*3))

Note that the audio and sprite times in this calculation are worst case results, assuming that each audio channel is played at maximum sample rate and each sprite is shown on each raster line.
Also note that the formula assumes you have one bitmap showing (the type doesn't matter, so it also works for HAM and dual playfield) and no splits. If you do have splits, you'll need to calculate the display DMA time for each one separately.

For horizontal scrolling, the display width used for calculating should be 16 pixels more than actually shown to cover for the extra word Denise grabs on every line to enable scrolling.

So for a basic 320*256 scrolling screen in 5 bitplanes, without sprites and with no audio or copper cycles, the result would be:
DMA_cycles_left = (226*312)-((336/16)*256*5)-(312*4)-(312*0)-(256*0*2)-((0*2)+(0*3))
DMA_cycles_left = 70512-26880-1248-0-0 = 42384
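
For anyone who wants to plug in their own numbers, here is a minimal Python sketch of the formula above. The function name, the parameter names and the `scrolling` flag are my own choices for illustration; the constants (226 slots per line, 312 lines, 4 refresh slots per line) are taken straight from the formula, and audio and sprites are still counted as worst case.

```python
def dma_cycles_left(display_width, display_height, planes,
                    audio_channels=0, sprites=0,
                    copper_moves=0, copper_waits=0,
                    scrolling=False):
    """Estimate the DMA cycles left per PAL frame (worst-case audio/sprites)."""
    lines_per_frame = 312
    slots_per_line = 226
    if scrolling:
        # Horizontal scrolling fetches one extra word (16 pixels) per line.
        display_width += 16
    display = (display_width // 16) * display_height * planes
    refresh = lines_per_frame * 4
    audio = lines_per_frame * audio_channels
    sprite = display_height * sprites * 2
    copper = copper_moves * 2 + copper_waits * 3
    total = lines_per_frame * slots_per_line
    return total - display - refresh - audio - sprite - copper


# The 320*256, 5 bitplane scrolling screen from the example above.
print(dma_cycles_left(320, 256, 5, scrolling=True))  # 42384
```

Like the formula itself, this only gives an upper bound; it knows nothing about CPU use or forced idle cycles.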

---
The next step is checking how many cycles your bobs take each. I did this for two cases. Note that restoring the background is counted as part of the cost for a bob.

Case one: copy the background area for the bob to a restore buffer, draw the bob, restore the bob from the restore buffer
Case two: draw the bob, restore the bob from the extra background buffer

Costs: copying/restoring takes 2 DMA cycles per word, drawing takes 4 DMA cycles per word.
So, case one costs 2+4+2=8 cycles per word and case two costs 4+2=6 cycles per word.

Because bobs are not always aligned to a 16-pixel boundary, I've added the additional word the blitter processes in these cases to the formulas.

The basic bob formulas then become:
Case one: 8*(width+16)/16*height*planes
Case two: 6*(width+16)/16*height*planes

For a 5 bitplane 32x32 bob, the results would be:
Case one: 8*3*32*5=3840 DMA cycles
Case two: 6*3*32*5=2880 DMA cycles

---
This means the blitter could draw and restore roughly 11 (case one) or 14 (case two) bobs of 32x32 pixels on the screen defined in step one, without accounting for CPU use (including any demo/game logic) and idle cycles. You can see these results as a kind of 'theoretical limit'. Actual performance will be lower due to the causes mentioned earlier.
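
The same arithmetic as a small Python sketch, in case you want to try other bob sizes. The names `bob_cost` and `BUDGET` are just illustrative; the 8 and 6 cycles per word are the case one and case two costs from above, and the budget is the 42384 cycles from step one.

```python
def bob_cost(width, height, planes, cycles_per_word):
    """DMA cycles for one bob, including the extra word for shifting."""
    words = -(-width // 16) + 1  # width rounded up to whole words, plus one
    return cycles_per_word * words * height * planes


BUDGET = 42384  # DMA cycles left on the example screen from step one

case_one = bob_cost(32, 32, 5, cycles_per_word=8)  # copy + draw + restore
case_two = bob_cost(32, 32, 5, cycles_per_word=6)  # draw + restore

print(case_one, BUDGET // case_one)  # 3840 11
print(case_two, BUDGET // case_two)  # 2880 14
```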

I personally find these formulas useful for getting a feel for what is and isn't possible and what I can roughly expect to be able to blit, not to exhaustively determine the exact number of pixels I can blit.
Hopefully they are of some use to you as well!
Old 24 October 2016, 19:29   #5
Tigerskunk
Inviyya Dude!
 
Join Date: Sep 2016
Location: Amiga Island
Posts: 2,770
@roondar:

Wow, thanks for that exhaustive read..

It's pretty close to what I am currently experiencing when drawing and restoring BOBs of that size.

Which is not thaaaaaat much, but can be worked with...
Old 25 October 2016, 10:24   #6
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,408
Well, there is one way to get more objects on screen cheaply: use sprites!

Amiga sprites are very, very quick to draw (somewhere between 10x and 15x as fast as blitting on a 5 bitplane screen when drawing the same number of pixels, depending on how the blit is done*).

They do have all sorts of limitations compared to bobs, but it is always worthwhile to see how many of the objects on screen can be done using sprites instead. Merely displaying the main character as a sprite effectively gives you a 'free' extra object on screen.

Now, admittedly, making good use of the sprite hardware can be quite tricky, but the results can be worthwhile.

I'd personally try to draw all small objects (say <=16x16) as sprites if possible, this has the advantage of alleviating the cost for setting up blits as well so it's kind of a 'double win'.

Traditionally, Amiga games tend to draw the main character as a sprite and do some or all of the bullets/projectiles as sprites as well. Another traditional use for sprites is scoreboards and status displays.
Some games use sprites as a background layer, which does use quite a bit of DMA time but can be quite impressive to look at.

*) Assuming 15 colour attached sprites, these figures double for 3 colour non-attached sprites.
Old 25 October 2016, 14:49   #7
Master484
Registered User
 
Join Date: Nov 2015
Location: Vaasa, Finland
Posts: 524
Thanks for making these calculations Roondar, now we have a scientific formula on Blitter's theoretical BOB drawing ability.

I just tested the "case one" drawing method with Blitz Basic; 32*32 BOBs in 5 bitplane 320*256 screen. The results were: 6 BOBS running at 50 fps, with minimal code that just moved them left and right.

And because the same test in Assembler could display 11 BOBS, this means that Blitz is about 50% slower than ASM, when it comes to blitting operations, and most likely all other operations too.

And when I reduced bitplanes to 4 I got 8 BOBS at 50fps, so two BOBs more, but still slower than ASM blitting in 5 bitplane mode.

It's good to finally have some "numbers on the table" about the blitting speed of ASM...for a long time I've been wondering the Blitz vs ASM speed difference on graphics drawing, and now I know; 50% slower sounds like the correct value.
Old 25 October 2016, 15:40   #8
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,408
Ah, I would like to point out here (which may not have been clear enough considering I made a big wall of text ) that the bob counts I gave didn't account for any CPU use. So these are theoretical maximum numbers assuming the Blitter gets every cycle there is left after display and other DMA channels, which is not actually possible in the 'real world'.

In real-world tests I managed around 87% of those numbers, or roughly 9 and 12 bobs respectively (the first case had some raster time left over, because there was enough time for about 9.5 bobs but the code only did full 32x32 bobs). I don't know if my blitting code was superfast, superslow or somewhere in between, but those were my results in ASM.
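
As a rough illustration of how that 87% figure maps onto the theoretical numbers, here is a small Python sketch. The function name and the `efficiency` parameter are made up for this example, and 0.87 is simply my measured figure, not a property of the hardware.

```python
def realistic_bobs(budget, cost_per_bob, efficiency=0.87):
    """Scale the theoretical bob count by a measured efficiency factor."""
    return budget / cost_per_bob * efficiency


BUDGET = 42384  # DMA cycles left on the 320*256*5 scrolling example screen

for name, cost in (("case one", 3840), ("case two", 2880)):
    estimate = realistic_bobs(BUDGET, cost)
    # int() drops the fractional bob, since only whole bobs get drawn.
    print(f"{name}: {estimate:.1f} -> {int(estimate)} full bobs")
```

With these inputs the estimates come out a little over 9.5 and 12.5 bobs, which is where the 9 and 12 full bobs come from.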



A side note: in general, if you feel that code is CPU bottlenecked, you can try to just blit fewer, but bigger objects - this will lower the relative overhead of the CPU.
Old 26 October 2016, 09:10   #9
sandruzzo
Registered User
 
Join Date: Feb 2011
Location: Italy/Rome
Posts: 2,281
To get more bobs on screen you can use the triple buffer technique, where you can skip one blit operation.

Normally you have to do this:

save screen
draw bob
restore screen

with triple buffer, we have

draw bob
restore screen

not bad!
Old 26 October 2016, 10:34   #10
NorthWay
Registered User
 
Join Date: May 2013
Location: Grimstad / Norway
Posts: 839
Quote:
Originally Posted by sandruzzo View Post
with triple buffer, we have

draw bob
restore screen
Well........... if you want to spend more cpu cycles you can do better than that.

If you keep track of bobs that don't overlap each other then you can do a restore as part of the cookie-cut by having the third buffer as source but your displayed buffer as destination.
You will get issues with dirty deltas, but having empty space around your source bob data and doing a slightly bigger blit (given that the extra space is enough to cover the delta) you can cover the full restore.

Then you need to calculate and do all restores that are missing, and then regular blits (those that overlap the ones you have already done).

I think there might be a number of dual-playfield games that do something very close to this - they just do A->D copy blits for the bobs unless they overlap (and the design probably explicitly makes them not overlap).
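
The "keep track of bobs that don't overlap" part could be sketched roughly like this in Python (on the Amiga it would of course be 68k code). Everything here is hypothetical: the `Rect` type, the function names, and the assumption that each rectangle is the enlarged blit area of a bob (image plus its empty border) in pixels. It only shows the bookkeeping, not the blits themselves.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: int
    y: int
    w: int
    h: int


def overlaps(a, b):
    """True if two axis-aligned rectangles share at least one pixel."""
    return (a.x < b.x + b.w and b.x < a.x + a.w and
            a.y < b.y + b.h and b.y < a.y + a.h)


def split_bobs(rects):
    """Split bob indices into (isolated, overlapping) lists."""
    isolated, overlapping = [], []
    for i, rect in enumerate(rects):
        if any(i != j and overlaps(rect, other)
               for j, other in enumerate(rects)):
            overlapping.append(i)
        else:
            isolated.append(i)
    return isolated, overlapping


# Enlarged blit areas for three bobs; the last two overlap each other.
bobs = [Rect(0, 0, 48, 36), Rect(100, 50, 48, 36), Rect(120, 60, 48, 36)]
print(split_bobs(bobs))  # ([0], [1, 2])
```

Bobs in the first list are candidates for the combined restore-and-draw blit, the rest would get the regular treatment. The pairwise check is quadratic in the number of bobs, which is part of the extra CPU cost mentioned above.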
Old 26 October 2016, 10:59   #11
AnimaInCorpore
Registered User
 
Join Date: Nov 2012
Location: Willich/Germany
Posts: 232
Quote:
Originally Posted by roondar View Post
I've been fascinated by the blitter and just how fast it is or isn't for a while now and tinkered a bit with making formula's to calculate how many bobs you can blit per frame. These formulas are based on the HRM and I'm happy to share them (see below)[...]
Thanks for the explanation!

I just made a small JSFiddle from your calculation where you can enter your own values to check the limits.
Old 26 October 2016, 13:45   #12
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,408
Quote:
Originally Posted by NorthWay View Post
Well........... if you want to spend more cpu cycles you can do better than that.

If you keep track of bobs that don't overlap each other then you can do a restore as part of the cookie-cut by having the third buffer as source but your displayed buffer as destination.
You will get issues with dirty deltas, but having empty space around your source bob data and doing a slightly bigger blit (given that the extra space is enough to cover the delta) you can cover the full restore.

Then you need to calculate and do all restores that are missing, and then regular blits (those that overlap the ones you have already done).

I think there might be a number of dual-playfield games that do something very close to this - they just do A->D copy blits for the bobs unless they overlap (and the design probably explicitly makes them not overlap).
I'm relatively certain that this trick should be possible for the first bob restored in any sequence (assuming there is enough 'delta space'), regardless of how and where bobs are located. Not tested it though.

If more bobs are desired then it does become harder, although setting up the bobs to draw correctly ahead of time (i.e. level design) could perhaps make it possible without much CPU overhead.

Quote:
Originally Posted by AnimaInCorpore
Thanks for the explanation!

I just made a small JSFiddle from your calculation where you can enter your own values to check the limits.
Great stuff, thanks

I do note one detail though: this piece of code will not give the correct value for Dual Playfield blitting, as that is a six bitplane screen while blitting for Dual Playfield mode is (usually) only done on three bitplanes.

But that is basically nitpicking

Last edited by roondar; 26 October 2016 at 13:52.
Old 27 October 2016, 08:25   #13
sandruzzo
Registered User
 
Join Date: Feb 2011
Location: Italy/Rome
Posts: 2,281
Quote:
Originally Posted by NorthWay View Post
Well........... if you want to spend more cpu cycles you can do better than that.

If you keep track of bobs that don't overlap each other then you can do a restore as part of the cookie-cut by having the third buffer as source but your displayed buffer as destination.
You will get issues with dirty deltas, but having empty space around your source bob data and doing a slightly bigger blit (given that the extra space is enough to cover the delta) you can cover the full restore.

Then you need to calculate and do all restores that are missing, and then regular blits (those that overlap the ones you have already done).

I think there might be a number of dual-playfield games that do something very close to this - they just do A->D copy blits for the bobs unless they overlap (and the design probably explicitly makes them not overlap).
Even Turrican uses this trick.
Old 28 October 2016, 10:50   #14
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,408
A small addition about this trick might be fun

When using the trick described by NorthWay, no restore step is needed (which lowers the blitting cost by 2 DMA cycles per word), but the bob needs to have some empty space around it. The size of this empty space is based on how far the bob gets to move in a frame.

I tried to think it through and got this (but do check if I'm correct, I have not tested this, merely tried figuring it out):

For both the X and Y axis, the bob image source and mask source should have empty space around them equal to twice the maximum distance a bob can move on that axis in one frame.

However, for the blit itself, technically only the distance moved is needed. For Y this can be done in a simple way, because a bob can be extended in steps of one line. For X this is more complicated, because only full words (=16 pixels) can be added. As such, it may be worthwhile for movement on the X axis to just add 2x the max movement speed at all times to simplify blitting.

This leads to the following result:
Blit cost = 4*(width+(max_x*2)+16)/16*(height+max_y)*planes
(where width+(max_x*2) is rounded up to the nearest multiple of 16)

So, for a 24x32 bob @ 5 bitplanes with 4 pixels of max movement, we get the following results.
Case three: 4*3*36*5=2160 DMA cycles

That would lead to a maximum of ~19 bobs per frame on a 320x256x5 screen, but as said before, some caveats do apply.

You may also have noted I reduced the X size of the bob. I did this because with a width that is an exact multiple of 16 pixels, the empty space always requires an extra word per line, and that costs so much that any gains made by blitting in this particular way tend to evaporate.
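
To make the width point concrete, here is a short Python sketch of the case three formula. The function and parameter names are mine, and like the formula itself this is untested against real hardware.

```python
def case_three_cost(width, height, planes, max_x, max_y):
    """DMA cycles for one bob drawn with the combined restore-and-draw blit."""
    padded = width + 2 * max_x            # empty space on both sides in X
    padded = -(-padded // 16) * 16        # round up to a multiple of 16
    words = (padded + 16) // 16           # plus the extra word for shifting
    return 4 * words * (height + max_y) * planes


print(case_three_cost(24, 32, 5, max_x=4, max_y=4))  # 2160
print(case_three_cost(32, 32, 5, max_x=4, max_y=4))  # 2880
```

The second line shows the problem: a full 32x32 bob needs 4 words per line instead of 3 and ends up at 2880 cycles, the same as case two, so nothing is gained.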
Old 28 October 2016, 12:41   #15
Tigerskunk
Inviyya Dude!
 
Join Date: Sep 2016
Location: Amiga Island
Posts: 2,770
Also, you shouldn't let the BOBs crash into foreground playfield graphics on a dual playfield setup.

It might be tight to manoeuvre them around then, with that large amount of empty space around them that gets blitted as well.

Last edited by Tigerskunk; 28 October 2016 at 12:57.
Old 09 January 2022, 15:08   #16
KONEY
OctaMED Music Composer
 
Join Date: Jan 2009
Location: Venice - Italy
Age: 49
Posts: 666
Hi! I'm sorry to resume such an old thread but I'm doing some research and keep crashing here...



So here's what happens: I'm blitting a large (but not enormous) quantity of pixels every frame and it runs smoothly. Doing the same while music is playing is less smooth, and from what I can see it gets even less smooth when many channels are playing at the same time with high-pitched sounds.



I was suspecting DMA sharing was the problem but checking on AHRM I've not found information about how audio playback can affect DMA access of other chips.


Then I made a test running the code with Nasty Blitter enabled: blits were smooth again but audio playback was slowed down so I assumed the problem was somehow about CPU cycles.

Then I'm reading this and I'm not sure anymore again... anyone care to add something?



Quote:
Originally Posted by roondar View Post
Note that the audio and sprite times in this calculation are worst case results, assuming that each audio channel is played at maximum sample rate and each sprite is shown on each raster line.
Old 09 January 2022, 15:18   #17
Galahad/FLT
Going nowhere
 
Join Date: Oct 2001
Location: United Kingdom
Age: 50
Posts: 8,986
If you were not running an interrupt before, it could be that your code was already close to going over a frame, and introducing the interrupt is what pushes it over.

Firstly, what playroutine are you using? If it's anything other than PHX's playroutine, it could be because of a CPU delay or a DMA wait that is too large.

Secondly, according to Toni Wilen, entering an interrupt and exiting it with RTE uses as much as 70 cycles. If your code is close to running over a frame, the processing of the interrupt could be enough to make it go over.

Also, where have you placed your blitter waits? Before the blit code you want to execute or after?
Old 09 January 2022, 15:22   #18
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,408
Quote:
Originally Posted by KONEY View Post
Hi! I'm sorry to resume such an old thread but I'm doing some research and keep crashing here...

So here's what happens: I'm blitting large (but not enormous) quantity of pixel every frame and they run smoothly. Doing the same while music is playing is less smooth and for what I see when many channels are playing at the same time and with high-pitched sounds it's even less smooth.
If your code is already close to the limit of what can be done in a frame, audio cycles can indeed cause you to overrun. Audio DMA uses more cycles when more channels and/or higher sample rates are being used, so this makes sense. Also note that the routines that play back music need some time as well, and this can vary from frame to frame.
Quote:
I was suspecting DMA sharing was the problem but checking on AHRM I've not found information about how audio playback can affect DMA access of other chips.
Audio DMA has priority over CPU and Blitter so if audio DMA needs cycles, the CPU/Blitter can't have them. Because of the way DMA works in the Amiga, the CPU is usually not affected much by this, but the Blitter definitely can be. Note that other DMA sources (bitplanes/sprites/disk/copper) can't actually compete with audio cycles to begin with as they happen at different times, so they are irrelevant.
Quote:
Then I made a test running the code with Nasty Blitter enabled: blits were smooth again but audio playback was slowed down so I assumed the problem was somehow about CPU cycles.
Quite possible. If you play a module, normally the mod player uses an interrupt to deal with any logic code that needs to be done for the player. This runs on the CPU and so can't run if the Blitter is in Nasty Mode and uses all cycles at that time. At that point, the mod player interrupt will be delayed to a point where it can run, which could result in audio slowdown.

Note that this does not happen normally. Usually, there's enough times where mod players can interrupt the CPU so if there's a shortage you normally see GFX slowdown first. However, if you have a *lot* of Blitter cycles, are running in Nasty Mode and very few CPU cycles in your main loop/interrupt then it is indeed possible for the mod player interrupt to be delayed so much it starts being noticeable.
Quote:
Then I'm reading this and I'm not sure anymore again... anyone care to add something?
My best guess is that you'll probably get a better result if you slightly reduce the amount of stuff you blit so that the mod player interrupt can actually trigger in time.
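
To get a feel for how much DMA time audio can take per frame, here is a rough Python sketch. It assumes a PAL machine (colour clock of 3546895 Hz, about 50 frames per second), that every audio DMA fetch reads one word holding two 8-bit samples, and that each channel gets at most one fetch per raster line. The function name and the period values are only examples.

```python
PAL_CLOCK_HZ = 3546895   # PAL colour clock, sample rate = clock / period
FRAMES_PER_SECOND = 50   # approximate PAL frame rate
LINES_PER_FRAME = 312    # at most one audio fetch per channel per line


def audio_dma_cycles_per_frame(periods):
    """Estimate audio DMA cycles per frame for the given channel periods."""
    total = 0.0
    for period in periods:
        samples_per_frame = PAL_CLOCK_HZ / period / FRAMES_PER_SECOND
        words_per_frame = samples_per_frame / 2  # two samples per word
        total += min(words_per_frame, LINES_PER_FRAME)
    return total


# Four channels at a mid-range pitch versus four at the lowest period (124)
# that audio DMA can keep up with.
print(round(audio_dma_cycles_per_frame([428] * 4)))
print(round(audio_dma_cycles_per_frame([124] * 4)))
```

Even the high-pitched case stays below the worst-case 312*4 = 1248 cycles from the formula earlier in the thread, which is only a few percent of a typical blit budget. That fits with the DMA cost only mattering when a frame is already nearly full.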
Old 09 January 2022, 17:00   #19
Rotareneg
Registered User
 
Join Date: Sep 2017
Location: Kansas, USA
Posts: 324
You can get a visual display of DMA activity in WinUAE as well: http://eab.abime.net/showpost.php?p=...7&postcount=67
Old 09 January 2022, 18:02   #20
KONEY
OctaMED Music Composer
 
Join Date: Jan 2009
Location: Venice - Italy
Age: 49
Posts: 666
Thanks, now I have a better picture! Actually I'm blitting 2x 320+234 bitplanes every frame. Not sure it can be considered big.



Quote:
Originally Posted by roondar View Post
My best guess is that you'll probably get a better result if you slightly reduce the amount of stuff blit so that the mod player interrupt can actually trigger in time.