21 October 2016, 20:18 | #1 |
Inviyya Dude!
Join Date: Sep 2016
Location: Amiga Island
Posts: 2,798
|
How to calculate possible blit times?
What I want to know is: how many cycles do I have available to blit something to the screen, and how do I calculate how much of that my running blits are already consuming?
Currently I am switching the background colour ($dff180) to different colours before and after my blitting operations to see how much blit time I have left. And I noticed that I use up around twice as much blit time when I do cookie-cutter blits with all 3 source DMA channels (A, B, C) involved as when I do A->D blits. On my current project, I get around 38 32x32 bobs (with 1 extra word for barrel shifting) in 3 planes onto the screen with A->D blits, whereas I get only 18 on the screen with a cookie-cutter minterm. I'd like to calculate whether this is near what the blitter is capable of, or whether I lose too many cycles through overhead somewhere before. Thanks for any ideas and help on this... |
21 October 2016, 22:55 | #2 |
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,575
|
The HRM blitter cycle diagram ("Typical Blitter Cycle Sequence") should help, but note that it does not explain when fill mode adds 1 extra cycle. Also note that blitter idle cycles ('-' in the diagram) need a free cycle; they are not true idle cycles. The only difference compared to a normal blitter cycle is that the CPU can use blitter idle cycles (blitter nasty makes no difference, the CPU will still get the cycle).
For real-world results you also need to count bitplane DMA, copper, etc. cycles that steal blitter cycles. It can get very difficult. The UAE DMA debugger can also help. |
22 October 2016, 21:26 | #3 |
Inviyya Dude!
Join Date: Sep 2016
Location: Amiga Island
Posts: 2,798
|
Okay, thanks for the advice...
|
24 October 2016, 16:11 | #4 |
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,438
|
A small warning: this be a wall of text and numbers
I've been fascinated by the blitter and just how fast it is or isn't for a while now, and have tinkered a bit with formulas to calculate how many bobs you can blit per frame. These formulas are based on the HRM and I'm happy to share them (see below).

However, do note that these formulas are not 100% accurate, as they a) do not count any CPU time spent whatsoever and b) do not take into account special cases where cycles must be idle. As an example, my tests on a basic A500 found that my blitter routines scored somewhere between 60% (first attempts) and 86-88% (after I optimised them) of the figures found.

Do note I only did two formulas: blitting a bob using the copy-draw-restore method and blitting a bob using the draw-restore method. The latter is much faster, but requires an additional buffer (basically a copy of the screen without any bobs, kept in memory at all times).

---

The first step is finding how many DMA cycles you have available. For PAL Amigas, the basic formula to find this is:

DMA_cycles_left = (226*312)-display_dma_cycles-refresh_cycles-audio_cycles-sprite_cycles-copper_cycles

Where:
display_dma_cycles = (display_width/16)*display_height*number_of_planes
refresh_cycles = 312*4
audio_cycles = 312*audio_channels_playing
sprite_cycles = display_height*number_of_sprites*2
copper_cycles = (copper_moves*2)+(copper_waits*3)

The resulting formula for cycles left is then:

DMA_cycles_left = (226*312)-((display_width/16)*display_height*number_of_planes)-(312*4)-(312*audio_channels_playing)-(display_height*number_of_sprites*2)-((copper_moves*2)+(copper_waits*3))

Note that the audio and sprite times in this calculation are worst-case results, assuming that each audio channel is played at maximum sample rate and each sprite is shown on each raster line. Also note that the formula assumes you have one bitmap showing (type doesn't matter, so it also works for HAM and DPL) and no splits. If you do have splits, you'll need to calculate the display DMA time for each one separately. For horizontal scrolling, the display width used for the calculation should be 16 pixels more than actually shown, to cover the extra word Denise grabs on every line to enable scrolling.

So for a basic 320*256 scrolling screen in 5 bitplanes, without sprites and with no audio or copper cycles, the results would be:

DMA_cycles_left = (226*312)-((336/16)*256*5)-(312*4)-(312*0)-(256*0*2)-((0*2)+(0*3))
DMA_cycles_left = 70512-26880-1248-0-0-0 = 42384

---

The next step is checking how many cycles your bobs take each. I did this for two cases. Note that restoring the background is counted as part of the cost for a bob.

Case one: copy the background area for the bob to a restore buffer, draw the bob, restore the bob from the restore buffer
Case two: draw the bob, restore the bob from the extra background buffer

Costs: copying/restoring takes 2 DMA cycles per word, drawing takes 4 DMA cycles per word. So, case one costs 2+4+2=8 cycles per word and case two costs 4+2=6 cycles per word. Because bobs are not always aligned to the nearest 16 pixels, I've added the additional word the blitter processes in these cases into the formulas. The basic bob formulas then become:

Case one: 8*(width+16)/16*height*planes
Case two: 6*(width+16)/16*height*planes

For a 5 bitplane 32x32 bob, the results would be:

Case one: 8*3*32*5=3840 DMA cycles
Case two: 6*3*32*5=2880 DMA cycles

---

Which means the blitter could draw/restore roughly 11 and 14 bobs sized 32x32 pixels respectively on the screen defined in step one, without accounting for CPU use (including any demo/game logic) and idle cycles. You can see these results as a kind of 'theoretical limit'. Actual performance will be lower due to the causes mentioned earlier.
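For anyone who wants to plug in their own numbers, the two steps above can be put into a few lines of Python. This is only a sketch of the formulas from this post: the function and parameter names are made up for illustration, and rounding the bob width up to whole words is an assumption (the worked example uses a width that is already a multiple of 16).

```python
PAL_LINES = 312        # raster lines per PAL frame, as used in the post
CYCLES_PER_LINE = 226  # DMA cycles per raster line, as used in the post


def dma_cycles_left(fetch_width, display_height, planes,
                    audio_channels=0, sprites=0,
                    copper_moves=0, copper_waits=0):
    """DMA cycles per frame left over after display, refresh, audio,
    sprite and copper DMA (worst case for audio and sprites)."""
    display = (fetch_width // 16) * display_height * planes
    refresh = PAL_LINES * 4
    audio = PAL_LINES * audio_channels
    sprite = display_height * sprites * 2
    copper = copper_moves * 2 + copper_waits * 3
    total = PAL_LINES * CYCLES_PER_LINE
    return total - display - refresh - audio - sprite - copper


def bob_cost(width, height, planes, cycles_per_word):
    """DMA cycles for one bob, including the extra word needed for shifting.
    cycles_per_word is 8 for case one and 6 for case two."""
    words_per_line = -(-width // 16) + 1  # width rounded up to words, plus one
    return cycles_per_word * words_per_line * height * planes


# 320x256 scrolling screen (336 pixels fetched), 5 bitplanes, nothing else.
budget = dma_cycles_left(336, 256, 5)
case_one = bob_cost(32, 32, 5, 8)
case_two = bob_cost(32, 32, 5, 6)

print(budget, case_one, case_two)               # 42384 3840 2880
print(budget // case_one, budget // case_two)   # 11 14
```

The integer division at the end gives the number of whole bobs that fit, which is where the 11 and 14 come from.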
I personally find these formulas useful for getting a feel for what is and isn't possible and what I can roughly expect to be able to blit, not for exhaustively determining the exact number of pixels I can blit. Hopefully they are of some use to you as well! |
24 October 2016, 19:29 | #5 |
Inviyya Dude!
Join Date: Sep 2016
Location: Amiga Island
Posts: 2,798
|
@roondar:
Wow, thanks for that exhaustive read. It's fairly similar to what I am currently experiencing when drawing and restoring BOBs of that size. Which is not thaaaaaat much, but can be worked with... |
25 October 2016, 10:24 | #6 | |
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,438
|
Quote:
Well, there is one way to get more objects on screen cheaply: use sprites! Amiga sprites are very, very quick to draw (somewhere between 10x and 15x as fast as blitting on a 5 bitplane screen when drawing the same number of pixels, depending on how the blit is done*). They do have all sorts of limitations compared to bobs, but it is always worthwhile to see how many of the objects on screen can be done using sprites instead. Merely displaying the main character as a sprite effectively gives you a 'free' extra object on screen.
Now, admittedly, making good use of sprite hardware can be quite tricky, but the results can be worthwhile. I'd personally try to draw all small objects (say <=16x16) as sprites if possible; this has the advantage of alleviating the cost of setting up blits as well, so it's kind of a 'double win'. Traditionally, Amiga games tend to draw the main character as a sprite and do some or all of the bullets/projectiles as sprites as well. Another traditional use for sprites is scoreboards and status displays. Some games use sprites as a background layer, which does use quite a bit of DMA time but can be quite impressive to look at.
*) Assuming 15 colour attached sprites; these figures double for 3 colour non-attached sprites. |
|
25 October 2016, 14:49 | #7 |
Registered User
Join Date: Nov 2015
Location: Vaasa, Finland
Posts: 525
|
Thanks for making these calculations Roondar, now we have a scientific formula on Blitter's theoretical BOB drawing ability.
I just tested the "case one" drawing method with Blitz Basic; 32*32 BOBs in 5 bitplane 320*256 screen. The results were: 6 BOBS running at 50 fps, with minimal code that just moved them left and right. And because the same test in Assembler could display 11 BOBS, this means that Blitz is about 50% slower than ASM, when it comes to blitting operations, and most likely all other operations too. And when I reduced bitplanes to 4 I got 8 BOBS at 50fps, so two BOBs more, but still slower than ASM blitting in 5 bitplane mode. It's good to finally have some "numbers on the table" about the blitting speed of ASM...for a long time I've been wondering the Blitz vs ASM speed difference on graphics drawing, and now I know; 50% slower sounds like the correct value. |
25 October 2016, 15:40 | #8 | |
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,438
|
Quote:
Real-world tests I did managed around 87% of the numbers, or roughly 9 and 12 bobs respectively (where the first case had some raster time left, because there was enough time for about 9.5 bobs but the code only did full 32x32 bobs). I don't know if my blitting code was superfast, superslow or somewhere in between, but those were my results in ASM. A side note: in general, if you feel that code is CPU bottlenecked, you can try to just blit fewer, but bigger objects. This will lower the relative overhead of the CPU. |
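To show where the 9 and 12 come from, here is the same arithmetic as a small Python sketch. The 0.87 factor is simply the measured figure quoted above applied to the theoretical numbers from the earlier post (42384 cycles available, 3840 and 2880 cycles per bob); it is not a constant of the hardware, and the variable names are made up for illustration.

```python
budget = 42384      # theoretical DMA cycles left per frame (320x256x5, scrolling)
efficiency = 0.87   # fraction of the theoretical figure reached in the real-world test

results = {}
for name, cost in (("case one", 3840), ("case two", 2880)):
    achievable = efficiency * budget / cost   # fractional number of bobs
    results[name] = (round(achievable, 1), int(achievable))
    print(name, results[name])
# case one (9.6, 9)
# case two (12.8, 12)
```

Only whole bobs count, so the fractional part is raster time that is left over.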
|
26 October 2016, 09:10 | #9 | |
Registered User
Join Date: Feb 2011
Location: Italy/Rome
Posts: 2,344
|
Quote:
Normally you have to do this: save screen, draw bob, restore screen. With a triple buffer, we have: draw bob, restore screen. Not bad! |
|
26 October 2016, 10:34 | #10 |
Registered User
Join Date: May 2013
Location: Grimstad / Norway
Posts: 854
|
Well........... if you want to spend more cpu cycles you can do better than that.
If you keep track of bobs that don't overlap each other then you can do a restore as part of the cookie-cut by having the third buffer as source but your displayed buffer as destination. You will get issues with dirty deltas, but having empty space around your source bob data and doing a slightly bigger blit (given that the extra space is enough to cover the delta) you can cover the full restore. Then you need to calculate and do all restores that are missing, and then regular blits (those that overlap the ones you have already done). I think there might be a number of dual-playfield games that do something very close to this - they just do A->D copy blits for the bobs unless they overlap (and the design probably explicitly makes them not overlap). |
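One possible way to do the "keep track of bobs that don't overlap" bookkeeping is a plain rectangle test. The Python sketch below is only an illustration under assumptions: the `Rect` type, the function names and the example coordinates are made up, each rectangle is assumed to already include the empty space around the bob, and it simply treats any bob that touches another one as "overlapping", which is more conservative than strictly necessary.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: int  # left edge in pixels
    y: int  # top edge in pixels
    w: int  # width including the empty margin
    h: int  # height including the empty margin


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share at least one pixel."""
    return (a.x < b.x + b.w and b.x < a.x + a.w and
            a.y < b.y + b.h and b.y < a.y + a.h)


def split_bobs(rects):
    """Return (isolated, overlapping) lists of indices into rects.
    Isolated bobs are candidates for the combined restore + cookie-cut blit;
    overlapping ones need a separate restore and a regular blit."""
    isolated, overlapping = [], []
    for i, a in enumerate(rects):
        hit = any(overlaps(a, b) for j, b in enumerate(rects) if j != i)
        (overlapping if hit else isolated).append(i)
    return isolated, overlapping


bobs = [Rect(0, 0, 48, 36), Rect(40, 10, 48, 36), Rect(200, 100, 48, 36)]
isolated, overlapping = split_bobs(bobs)
print(isolated, overlapping)  # [2] [0, 1]
```

The test is O(n^2) in the number of bobs, which is the extra CPU cost mentioned at the start of this post; on real hardware it would of course be done in ASM rather than like this.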
26 October 2016, 10:59 | #11 | |
Registered User
Join Date: Nov 2012
Location: Willich/Germany
Posts: 235
|
Quote:
I just made a small JSFiddle from your calculation where you can enter your own values to check the limits. |
|
26 October 2016, 13:45 | #12 | ||
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,438
|
Quote:
If more bobs are desired then it does become harder, although the correct setup of bobs to draw ahead of time (i.e. level design) could perhaps make it possible without much CPU overhead. Quote:
I do note one detail though: this piece of code will not give the correct value for Dual Playfield blitting, as that is a six bitplane screen while blitting for Dual Playfield mode is (usually) only done on three bitplanes. But that is basically nitpicking. Last edited by roondar; 26 October 2016 at 13:52. |
||
27 October 2016, 08:25 | #13 | |
Registered User
Join Date: Feb 2011
Location: Italy/Rome
Posts: 2,344
|
Quote:
|
|
28 October 2016, 10:50 | #14 |
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,438
|
A small addition about this trick might be fun
When using the trick described by Northway, no restore step is needed (which lowers blitting cost by 2 DMA cycles per word), but the bob needs to have some empty space around it. The size of this empty space is based on how far the bob gets to move in a frame. I tried to think it through and got this (but do check if I'm correct; I have not tested this, merely tried figuring it out):

For both the X and Y axis, the bob image source & mask source should have empty space around them equal to twice the maximum distance a bob can move on that axis in one frame. However, for the blit itself technically only the distance moved is needed. For Y this can be done in a simple way, because a bob can be extended in steps of one line. For X this is more complicated, because only full words (=16 pixels) can be added. As such, it may be worthwhile for movement on the X axis to just add 2x the max movement speed at all times to simplify blitting.

This leads to the following result:

Blit cost = 4*(width+(max_x*2)+16)/16*(height+max_y)*planes
(where width+(max_x*2) is rounded up to the nearest multiple of 16)

So, for a 24x32 bob @ 5 bitplanes with 4 pixels of max movement, we get the following result:

Case three: 4*3*36*5=2160 DMA cycles

That would lead to a maximum of ~19 bobs per frame on a 320x256x5 screen, but as said before, some caveats do apply. You may also have noted I reduced the X size of the bob. I did this because a width that is an exact multiple of 16 pixels always ends up costing so much more, due to the extra word the empty space requires, that any gains made by blitting in this particular way tend to evaporate. |
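The case three formula is easy to try out with other sizes. Below is a small Python sketch of it; the function name is made up for illustration, the 42384 cycle budget and the 2880 cycle case two cost are taken from the earlier post in this thread, and (like the formula itself) it has not been checked on real hardware.

```python
def case_three_cost(width, height, planes, max_x, max_y):
    """DMA cycles for one bob blitted with the restore folded into the
    cookie-cut (4 cycles per word), including the empty margin."""
    padded = -(-(width + 2 * max_x) // 16) * 16  # round up to a multiple of 16
    words_per_line = (padded + 16) // 16         # plus the extra word for shifting
    return 4 * words_per_line * (height + max_y) * planes


budget = 42384  # DMA cycles left per frame on the 320x256x5 scrolling screen

narrow = case_three_cost(24, 32, 5, 4, 4)
wide = case_three_cost(32, 32, 5, 4, 4)

print(narrow, budget // narrow)  # 2160 19
print(wide, budget // wide)      # 2880 14
```

The second line shows the effect described above: a 32 pixel wide bob needs a fourth word per line, which brings the cost to 2880 cycles, the same as case two, so the gain is gone.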
28 October 2016, 12:41 | #15 |
Inviyya Dude!
Join Date: Sep 2016
Location: Amiga Island
Posts: 2,798
|
Also, you shouldn't let the BOBs crash into foreground playfield graphics on a dual playfield setup.
It might be tight to manoeuvre them around, then, with that large amount of empty space around them that gets blitted as well. Last edited by Tigerskunk; 28 October 2016 at 12:57. |
09 January 2022, 15:08 | #16 |
OctaMED Music Composer
Join Date: Jan 2009
Location: Venice - Italy
Age: 50
Posts: 672
|
Hi! I'm sorry to resume such an old thread but I'm doing some research and keep crashing here...
So here's what happens: I'm blitting a large (but not enormous) quantity of pixels every frame and it runs smoothly. Doing the same while music is playing is less smooth, and from what I see it's even less smooth when many channels are playing at the same time with high-pitched sounds. I suspected DMA sharing was the problem, but checking the AHRM I've not found information about how audio playback can affect the DMA access of other chips. Then I made a test running the code with Nasty Blitter enabled: blits were smooth again but audio playback was slowed down, so I assumed the problem was somehow about CPU cycles. Now I'm reading this and I'm not sure anymore... anyone care to add something? |
09 January 2022, 15:18 | #17 |
Going nowhere
Join Date: Oct 2001
Location: United Kingdom
Age: 50
Posts: 9,020
|
If you were not running an interrupt before, it could be that your code was already close to running over a frame, and the interrupt you've now introduced is what makes it fall over.
Firstly, what playroutine are you using? If it's anything other than PHX's playroutine, it could be because of a CPU delay or a DMA wait that is too large. Secondly, starting an interrupt, exiting and RTE uses as much as 70 cycles according to Toni Wilen; if your code is close to running over a frame, the processing of the interrupt could be enough to make it go over. Also, where have you placed your blitter waits? Before the blit code you want to execute, or after? |
09 January 2022, 15:22 | #18 | ||||
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,438
|
Quote:
Quote:
Quote:
Note that this does not happen normally. Usually, there are enough moments where mod players can interrupt the CPU, so if there's a shortage you normally see GFX slowdown first. However, if you use a *lot* of Blitter cycles, are running in Nasty Mode and have very few CPU cycles in your main loop/interrupt, then it is indeed possible for the mod player interrupt to be delayed so much that it starts being noticeable. Quote:
|
||||
09 January 2022, 17:00 | #19 |
Registered User
Join Date: Sep 2017
Location: Kansas, USA
Posts: 329
|
You can get a visual display of DMA activity in WinUAE as well: http://eab.abime.net/showpost.php?p=...7&postcount=67
|
09 January 2022, 18:02 | #20 |
OctaMED Music Composer
Join Date: Jan 2009
Location: Venice - Italy
Age: 50
Posts: 672
|
Thanks, now I have a better picture! Actually I'm blitting 2x 320+234 bitplanes every frame. Not sure it can be considered big.
|