English Amiga Board


31 October 2019, 12:57   #201
sandruzzo
Registered User
 
Join Date: Feb 2011
Location: Italy/Rome
Posts: 1,642
Try to use guard-band clipping as much as possible to avoid Blitter clipping.
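Guard-band clipping usually means drawing into a buffer with invisible margins around the visible area. Polygons that only poke slightly off-screen can then be drawn and filled without edge clipping, and only polygons that leave the guard band altogether need real clipping. Below is a minimal sketch of the classification step. It assumes a hypothetical 320x200 view with a 32-pixel margin; `VIEW_W`, `VIEW_H`, `GUARD` and `classify_polygon` are illustrative names, not from the thread.

```c
#include <stdint.h>

/* Hypothetical layout: visible area plus an invisible guard band on every side.
   The back buffer would be (VIEW_W + 2*GUARD) x (VIEW_H + 2*GUARD) pixels. */
#define VIEW_W 320
#define VIEW_H 200
#define GUARD   32

typedef struct { int16_t x, y; } Point;

typedef enum {
    POLY_REJECT,      /* entirely off-screen: skip it */
    POLY_NO_CLIP,     /* inside the guard band: draw without clipping */
    POLY_NEEDS_CLIP   /* crosses the guard band edge: fall back to clipping */
} PolyClass;

/* Classify a polygon (count >= 1) by its bounding box, in view coordinates
   where (0,0) is the top-left visible pixel. */
PolyClass classify_polygon(const Point *pts, int count)
{
    int16_t minx = pts[0].x, maxx = pts[0].x;
    int16_t miny = pts[0].y, maxy = pts[0].y;

    for (int i = 1; i < count; i++) {
        if (pts[i].x < minx) minx = pts[i].x;
        if (pts[i].x > maxx) maxx = pts[i].x;
        if (pts[i].y < miny) miny = pts[i].y;
        if (pts[i].y > maxy) maxy = pts[i].y;
    }

    /* Completely outside the visible area: nothing to draw. */
    if (maxx < 0 || minx >= VIEW_W || maxy < 0 || miny >= VIEW_H)
        return POLY_REJECT;

    /* Fits inside the visible area plus guard band: the margins absorb overdraw. */
    if (minx >= -GUARD && maxx < VIEW_W + GUARD &&
        miny >= -GUARD && maxy < VIEW_H + GUARD)
        return POLY_NO_CLIP;

    return POLY_NEEDS_CLIP;
}
```

Polygons classed as `POLY_NO_CLIP` would be drawn with `GUARD` added to each coordinate, and only the central window would be displayed (on the Amiga, the bitplane modulos can skip the horizontal margins). The cost is extra memory, plus extra clearing work if the margins get cleared too.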
31 October 2019, 12:58   #202
roondar
Registered User

 
Join Date: Jul 2015
Location: The Netherlands
Posts: 1,378
As I understand it, you're clearing the screen with the Blitter, correct?

In that case, there are a couple of optimisations you can make because of how a Blitter clear works (they're mutually exclusive, so pick the one that works for you). A Blitter clear only uses half of the DMA cycles, leaving the other half free. Here are a few ways to exploit this:
  • Run the Blitter Clear starting after the last line of your bitplane display has occurred. This will allow the CPU to run full speed and the Blitter clear to also run full speed.
  • Run the Blitter Clear during the bitplane display. This will cause the CPU to be almost fully locked out during that time, but does make use of Blitter clear interleaving with display DMA, leaving more time outside of the bitplane display.
  • Start the clear after the last line of your bitplane display has occurred and clear part of the screen with the CPU while the Blitter runs. This is the fastest way to clear on the A500, but it may be slower overall than letting the CPU do other calculations while the Blitter clears.

All of these assume you don't spend the time the Blitter is busy just waiting for it to finish.

Which of these is fastest I don't know, but I do know that using the CPU and Blitter together to clear is a big improvement in performance; in theory it may even get close to double the speed of using just one or the other.
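The "close to double the speed" figure comes from splitting the buffer so that both units finish at the same moment. Below is a sketch of that arithmetic. It assumes the Blitter and the CPU clear at fixed, independent rates. On a real A500 they share the chip bus, so this only holds approximately, through the free slots the Blitter clear leaves. `balance_clear` and its parameters are hypothetical names.

```c
/* Ideal split of a clear between Blitter and CPU running concurrently.
   Rates are in bytes per frame (any consistent unit works). */
typedef struct {
    double blitter_fraction;  /* share of the buffer given to the Blitter */
    double frames;            /* time to clear everything, in frames */
} ClearSplit;

ClearSplit balance_clear(double bytes, double blitter_rate, double cpu_rate)
{
    ClearSplit s;
    /* Both units should finish together:
       (f * bytes) / blitter_rate == ((1 - f) * bytes) / cpu_rate
       which solves to f = blitter_rate / (blitter_rate + cpu_rate). */
    s.blitter_fraction = blitter_rate / (blitter_rate + cpu_rate);
    /* Combined throughput is simply the sum of the two rates. */
    s.frames = bytes / (blitter_rate + cpu_rate);
    return s;
}
```

With equal rates, each unit takes half the buffer and the clear finishes in half the time either would need alone. If the CPU manages only half the Blitter's rate, the Blitter should take two thirds of the buffer.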
31 October 2019, 13:18   #203
Steril707
Tigerskunk!

 
Join Date: Sep 2016
Location: Amiga Island
Posts: 1,080
Still wondering what kind of voodoo Jez San did back in the day for Starglider 2.
31 October 2019, 13:34   #204
deimos
Registered User

 
Join Date: Jul 2018
Location: Londonish / UK
Posts: 489
Quote:
Originally Posted by roondar
As I understand it, you're clearing the screen with the Blitter, correct?

In that case, there's a couple of optimisations you can do because of how Blitter clear works. (mutually exclusive, so pick one that works for you). Blitter clear only uses half of the DMA cycles, leaving the other half free. So, here's a few ways to exploit this:
  • Run the Blitter Clear starting after the last line of your bitplane display has occurred. This will allow the CPU to run full speed and the Blitter clear to also run full speed.
  • Run the Blitter Clear during the bitplane display. This will cause the CPU to be almost fully locked out during that time, but does make use of Blitter clear interleaving with display DMA, leaving more time outside of the bitplane display.
  • Start the clear after the last line of your bitplane display has occurred and clear part of the screen with the CPU while the Blitter runs. This is the fastest way to clear on the A500, but may be slower than interleaving with CPU calculating instead.

All of these assume you don't actually spend the time the Blitter is busy waiting on it being done.

Which of these is fastest I don't know, but I do know that using the CPU+Blitter to clear is a big improvement in peformance, it theoretically may even get close to being double the speed of using just one or the other.
Correct, in all cases the screen clear is still using the blitter. Currently it still uses separate blits for each bitplane to fill in SKY_BLUE; that's left over from before I moved to interleaved bitmaps. I can fiddle with my palette to make SKY_BLUE colour 15 and do a single blit to clear the screen.
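With an interleaved bitmap, the rows of all planes are stored back to back. A fill to colour 0, or to the highest colour index (all plane bits equal), can therefore be a single D-only blit with zero modulo that treats the buffer as (height x depth) rows of the full width. Below is a minimal sketch of the register values. It assumes a 320x200, 4-bitplane buffer; the struct and function names are hypothetical, and `DEST_BIT` is the same value as `DEST` in hardware/blit.h. On OCS, BLTSIZE allows at most 1024 rows and 64 words, and 200 x 4 = 800 rows fits.

```c
#include <stdint.h>

#define DEST_BIT 0x0100u   /* BLTCON0 "use D channel" bit (DEST in hardware/blit.h) */

/* Hypothetical screen geometry: interleaved, 320x200, 4 bitplanes. */
#define SCREEN_WIDTH_IN_WORDS 20   /* 320 pixels / 16 */
#define DISPLAY_HEIGHT        200
#define DISPLAY_DEPTH         4

typedef struct {
    uint16_t bltcon0, bltcon1, bltdmod, bltsize;
} ClearRegs;

/* Registers for one D-only blit that sets every plane bit to 'set'
   (1 = colour 15 on 4 planes, 0 = colour 0).
   Returns 0 if the blit is too big for a single OCS BLTSIZE. */
int single_clear_regs(int set, ClearRegs *out)
{
    unsigned rows = DISPLAY_HEIGHT * DISPLAY_DEPTH;   /* all plane rows back to back */

    if (rows > 1024 || SCREEN_WIDTH_IN_WORDS > 64)
        return 0;

    out->bltcon0 = (uint16_t)(DEST_BIT | (set ? 0xFFu : 0x00u)); /* minterm: D = 1 or D = 0 */
    out->bltcon1 = 0;
    out->bltdmod = 0;                                  /* rows are contiguous */
    /* Height in bits 15..6, width in words in bits 5..0;
       1024 rows and 64 words are encoded as 0. */
    out->bltsize = (uint16_t)(((rows & 0x3FFu) << 6) | (SCREEN_WIDTH_IN_WORDS & 0x3Fu));
    return 1;
}
```

On the real machine these values would go into `custom->bltcon0` and the other registers, with BLTDPT pointing at the start of the back buffer. The write to BLTSIZE starts the blit.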

Clearing some of the screen with the CPU is probably too hard if I'm using a queue. Your second option of "Run the Blitter Clear during the bitplane display" would seem to be the natural choice: if it's the first blit the copper triggers, it will happen right after the display starts, and if I'm using the CPU for drawing everything else I still have to wait for vertical blank to switch buffers and start drawing.

Edit: of course, if I can clear the screen while I'm building my copper list, instead of adding the clear to the list, that would help.

The remaining question I have is, what is the correct way to clear with the blitter to ensure it doesn't use more DMA cycles than it needs. I think what I have is correct, if I remove the loop to make it a single blit, that is:

Code:
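void FillPrimaryDisplay(const UWORD colour) {
    F_START

    /* Byte pointer so the per-plane step below is well-defined C
       (arithmetic on APTR, i.e. void *, is only a GCC extension). */
    UBYTE *bltdpt = (UBYTE *)backBuffer;

    /* One D-only blit per plane: minterm 0xff writes all 1s, 0x00 all 0s. */
    for (UWORD i = 0; i < PRIMARY_DISPLAY_DEPTH; i++) {
        WaitBlit();

        custom->bltcon0 = DEST | ((colour & (1 << i)) ? 0xff : 0x00);
        custom->bltcon1 = 0;
        custom->bltdpt = bltdpt;
        /* Interleaved bitmap: skip the other planes' rows after each row. */
        custom->bltdmod = SCREEN_WIDTH_IN_BYTES * (PRIMARY_DISPLAY_DEPTH - 1);

        /* Writing BLTSIZE (height << 6 | width in words) starts the blit. */
        custom->bltsize = PRIMARY_DISPLAY_HEIGHT << 6 | SCREEN_WIDTH_IN_WORDS;

        bltdpt += SCREEN_WIDTH_IN_BYTES;   /* first row of the next plane */
    }

    F_STOP
}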
How long does it take for the blitter to clear a screen, anyway?

Last edited by deimos; 31 October 2019 at 13:50.
31 October 2019, 13:58   #205
chb
Registered User

 
Join Date: Dec 2014
Location: germany
Posts: 198
Quote:
Originally Posted by roondar
  • Run the Blitter Clear during the bitplane display. This will cause the CPU to be almost fully locked out during that time, but does make use of Blitter clear interleaving with display DMA, leaving more time outside of the bitplane display.
I don't think that's a good option if you rely on the CPU for draw + fill and on the blitter only for clearing. Yes, you have more free dma slots outside the display area, but the CPU can only take half of them, with nothing else making use of the rest (except for some audio DMA). Better use option 1 in that case.

EDIT: The Blitter uses every second DMA slot in clear mode, so it can clear roughly 70 KB per frame. A 320x200 window with 4 bitplanes (32,000 bytes) therefore needs about half a frame with blitter-nasty on (display DMA doesn't affect that number with 4 bitplanes or fewer). It needs more if it runs concurrently with the CPU in nice mode.

EDIT2: IMHO the most efficient way for a CPU filler is to run the blitter clear in non-nasty mode during bitplane DMA and try to do all the multiplication/division-heavy code at the same time. Not that easy to do, of course. This contradicts what I wrote before; I have to think about it again.

Last edited by chb; 31 October 2019 at 14:28.
31 October 2019, 14:13   #206
roondar
Registered User

 
Join Date: Jul 2015
Location: The Netherlands
Posts: 1,378
Quote:
Originally Posted by deimos
Correct, in all cases the screen clear is still using the blitter. Currently it is still using separate blits for each bitplane to fill in SKY_BLUE - that's left over from before I moved to interleaved bitmaps. I can fiddle with my palette to make SKY_BLUE colour 15, and do I single blit to clear the screen.
Good to know
Quote:
Clearing some of the screen with the CPU is probably too hard if I'm using a queue. Your second option of "Run the Blitter Clear during the bitplane display" would seem to be the natural choice as if it's the first blit the copper triggers it will happen right after the display starts, and if I'm using the CPU for drawing everything else I still have to wait for vertical blank to switch buffers and start drawing.
Do note what chb correctly pointed out. This method is only really useful if you do other stuff with the Blitter when there's no bitplane DMA.
Quote:
Edit, of course, it I can clear the screen while I'm building my copper list, instead of adding it to it, that would help.
That would work, yes
Quote:
The remaining question I have is, what is the correct way to clear with the blitter to ensure it doesn't use more DMA cycles than it needs. I think what I have is correct, if I remove the loop to make it a single blit, that is:

Code:
void FillPrimaryDisplay(const UWORD colour) {
    F_START

    APTR bltdpt = backBuffer;

    for (UWORD i = 0; i < PRIMARY_DISPLAY_DEPTH; i++) {
        WaitBlit();

        custom->bltcon0 = DEST | (colour & 1 << i ? 0xff : 0x00);
        custom->bltcon1 = 0;
        custom->bltdpt = bltdpt;
        custom->bltdmod = SCREEN_WIDTH_IN_BYTES * (PRIMARY_DISPLAY_DEPTH - 1);

        custom->bltsize = PRIMARY_DISPLAY_HEIGHT << 6 | SCREEN_WIDTH_IN_WORDS;

        bltdpt += SCREEN_WIDTH_IN_BYTES;
    }

    F_STOP
}
If I understand this correctly, this code waits on the Blitter between clear blits for each bitplane. That is not efficient. What you want to do is have the big clear start and then do something (anything) else while it is busy.

Should this indeed be what you're doing, there are three ways you can improve it (IMHO, anyway). The first is to run the clear through interrupts: they're big blits, so the gain in usable cycles should far outweigh the overhead of running the handler. The second is to make the function clear only one plane and call it at certain points of your main loop while the loop does everything else it can (basically interleaving the calls manually). The third is to clear part of the screen with the CPU instead of waiting on the blitter.
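The second suggestion (one plane per call, interleaved with the main loop) boils down to a small non-blocking state machine. Below is a host-testable sketch with hypothetical names. The caller passes in whether the Blitter is busy (on the Amiga this might come from the BBUSY bit of DMACONR) and starts the returned plane's blit itself, for example with the loop body of `FillPrimaryDisplay`.

```c
#include <stdbool.h>

typedef struct {
    int next_plane;   /* next bitplane whose clear has not been started */
    int depth;        /* number of bitplanes in the back buffer */
} PlaneClearer;

void clearer_begin(PlaneClearer *c, int depth)
{
    c->next_plane = 0;
    c->depth = depth;
}

/* Never waits: returns the plane whose clear blit should be started now,
   or -1 if the Blitter is busy or every plane has already been started. */
int clearer_poll(PlaneClearer *c, bool blitter_busy)
{
    if (blitter_busy || c->next_plane >= c->depth)
        return -1;
    return c->next_plane++;
}

/* True once every plane's blit has been started (the last may still be running). */
bool clearer_all_started(const PlaneClearer *c)
{
    return c->next_plane >= c->depth;
}
```

In the main loop this might look like `int p = clearer_poll(&c, blitter_busy()); if (p >= 0) start_plane_clear(p);`, where both functions are hypothetical. Because `clearer_all_started` only means the last blit has been started, the code still has to wait for the Blitter before swapping buffers. The interrupt-driven variant would call `clearer_poll(&c, false)` from the blitter-finished interrupt instead.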

A way to do this would be to start the Blitter clearing one half of the bitplane, clear the other half with the CPU, and then wait on the Blitter once the CPU is done. Of course, the exact split needs to be checked; perhaps you can only clear 1/3 with the CPU. This needs experimentation.
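Below is a sketch of that split clear. It assumes the Blitter is given the first `split` rows (a hypothetical helper would start that blit) while the CPU clears the remaining rows with 32-bit stores. The names are hypothetical. The percentage is the knob that needs the experimentation mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* Rows handed to the Blitter; the CPU takes the rest. Rounding down to a
   whole row keeps the blit rectangular. */
size_t clear_split_rows(size_t rows, unsigned blitter_percent)
{
    return rows * blitter_percent / 100;
}

/* CPU part: clear 'longs' 32-bit words, four stores per loop iteration
   (a crude unroll), then any leftovers. */
void cpu_clear_longs(uint32_t *p, size_t longs, uint32_t value)
{
    while (longs >= 4) {
        p[0] = value; p[1] = value; p[2] = value; p[3] = value;
        p += 4;
        longs -= 4;
    }
    while (longs--)
        *p++ = value;
}
```

The sequence would be: start the Blitter on rows `0..split-1`, call `cpu_clear_longs` on the remaining rows, then wait for the Blitter. On a 68000, a hand-written MOVEM.L loop would be considerably faster than this C loop, which only shows the structure.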

Just some ideas, hope they help!
Quote:
How long does it take for the blitter to clear a screen, anyway?
It clears at 4 CPU cycles per word cleared, giving a theoretical maximum clear rate of about 3.5 MB/sec. That translates to a theoretical maximum of about 70,000 bytes per frame (which you probably won't actually achieve, considering there's other stuff happening as well).
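The arithmetic behind those figures, and chb's "about half a frame" for a 320x200 screen, can be written out as below. The sketch assumes the 7.09379 MHz CPU clock of a PAL A500 and rounds the frame rate to 50 Hz, as the posts do. The function names are hypothetical.

```c
/* Back-of-envelope Blitter clear throughput: one 2-byte word cleared
   every 4 CPU clocks (every other DMA slot), as stated in the post. */
#define PAL_CPU_HZ      7093790UL   /* 68000 clock on a PAL A500 */
#define FRAMES_PER_SEC  50UL        /* PAL, rounded */

unsigned long clear_bytes_per_second(unsigned long cpu_hz)
{
    return cpu_hz / 2;              /* 2 bytes per 4 clocks = cpu_hz / 2 bytes/s */
}

unsigned long clear_bytes_per_frame(unsigned long cpu_hz, unsigned long fps)
{
    return clear_bytes_per_second(cpu_hz) / fps;
}

/* Size in bytes of a planar bitmap. */
unsigned long screen_bytes(unsigned w, unsigned h, unsigned depth)
{
    return (unsigned long)w / 8 * h * depth;
}
```

This gives 3,546,895 bytes per second and 70,937 bytes per frame, so a 32,000-byte, 4-bitplane 320x200 screen takes about 0.45 of a frame at the theoretical maximum. Any contention from the CPU, copper or other DMA pushes that up.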
Quote:
Originally Posted by chb
I don't think that's a good option if you rely on the CPU for draw + fill and on the blitter only for clearing. Yes, you have more free dma slots outside the display area, but the CPU can only take half of them, with nothing else making use of the rest (except for some audio DMA). Better use option 1 in that case.
Yup, I fully agree. Either clear using both or clear using the Blitter when no other DMA occurs are the preferred options.
31 October 2019, 14:16   #207
deimos
Registered User

 
Join Date: Jul 2018
Location: Londonish / UK
Posts: 489
Quote:
Originally Posted by chb
I don't think that's a good option if you rely on the CPU for draw + fill and on the blitter only for clearing. Yes, you have more free dma slots outside the display area, but the CPU can only take half of them, with nothing else making use of the rest (except for some audio DMA). Better use option 1 in that case.

EDIT: Blitter uses every second DMA slot in clear mode, so it can clear roughly 70kb per frame; so for a 320*200 window it needs about half a frame with blitter nasty on (display dma doesn't affect that number if <= 4 bpl). More if it runs concurrently in nice mode with CPU.
I (we?) need to decide whether I'm going to ditch the blitter queue and go CPU only. Optimisations like this impact each differently, and I won't be able to keep two versions going.
31 October 2019, 14:20   #208
roondar
Registered User

 
Join Date: Jul 2015
Location: The Netherlands
Posts: 1,378
Regardless of which version you end up choosing, using the Blitter to clear the screen (when done in an optimal way as previously discussed) is always preferable to just using the CPU.

That said, it seems to me your CPU routines for drawing are currently much faster than the Blitter ones. I'm not so sure you'll manage to bridge that gap in a reasonable timeframe so perhaps picking the CPU for polygon drawing is a good start. You can always decide to try a Blitter version in a later project.

Last edited by roondar; 31 October 2019 at 14:28.
31 October 2019, 14:37   #209
deimos
Registered User

 
Join Date: Jul 2018
Location: Londonish / UK
Posts: 489
Quote:
Originally Posted by roondar
Regardless of which version you end up choosing, using the Blitter to clear the screen (when done in an optimal way as previously discussed) is always preferable to just using the CPU.

That said, it seems to me your CPU routines for drawing are currently much faster than the Blitter ones. I'm not so sure you'll manage to bridge that gap in a reasonable timeframe so perhaps picking the CPU for polygon drawing is a good start. You can always decide to try a Blitter version in a later project.
Right then, unless someone makes a compelling argument otherwise by the time I've finished this coffee, we're going CPU for all polygons, blitter for clearing only (filling the screen in sky blue, possibly drawing the ground, if I can make that work).
31 October 2019, 14:41   #210
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 44
Posts: 23,352
Quote:
Originally Posted by roondar
  • Run the Blitter Clear during the bitplane display. This will cause the CPU to be almost fully locked out during that time, but does make use of Blitter clear interleaving with display DMA, leaving more time outside of the bitplane display.
CPU won't be almost locked out in this situation.

CPU can always use any idle blitter cycle. Blitter idle cycles require a free cycle (not used by bitplane, copper, etc.), but the blitter won't actually "use" it, leaving the bus free for the CPU.
31 October 2019, 14:43   #211
chb
Registered User

 
Join Date: Dec 2014
Location: germany
Posts: 198
Quote:
Originally Posted by deimos
I (we?) need to decide whether I'm going to ditch the blitter queue and go CPU only. Optimisations like this impact each differently, and I won't be able to keep two versions going.
Just my personal opinion: Using the blitter to fill polygons looks like a nice idea, but in practice it's an inferior method for a game with complex and overlapping objects, as it utilizes memory bandwidth very ineffectively. I wrote something about it in another thread of yours some time ago.
31 October 2019, 14:48   #212
chb
Registered User

 
Join Date: Dec 2014
Location: germany
Posts: 198
Quote:
Originally Posted by Toni Wilen
CPU won't be almost locked out in this situation.

CPU can always use any idle blitter cycle. Blitter idle cycles require free cycle (not used by bitplane, copper etc) but blitter also won't "use" it, freeing the bus for the CPU.
Ah right, there was something about those idle cycles. So during display DMA (4 bpl), the blitter in D-only fill mode uses only every 4th DMA cycle, meaning every 4th DMA cycle (a blitter idle cycle) is also available to the CPU?
31 October 2019, 14:51   #213
deimos
Registered User

 
Join Date: Jul 2018
Location: Londonish / UK
Posts: 489
Quote:
Originally Posted by chb
Just my personal opinion: Using the blitter to fill polygons looks like a nice idea, but in practice it's an inferior method for a game with complex and overlapping objects, as it utilizes memory bandwidth very ineffectively. I wrote something about it in another thread of yours some time ago.
Yes, but now that's not just your personal opinion; it's something we have evidence for. And the journey hasn't been a waste. Even if I've learnt 5 wrong ways to do something, that's 5 ways of doing something I didn't know before.
31 October 2019, 14:55   #214
roondar
Registered User

 
Join Date: Jul 2015
Location: The Netherlands
Posts: 1,378
Quote:
Originally Posted by Toni Wilen
CPU won't be almost locked out in this situation.

CPU can always use any idle blitter cycle. Blitter idle cycles require free cycle (not used by bitplane, copper etc) but blitter also won't "use" it, freeing the bus for the CPU.
Perhaps that was worded too strongly. Of course the CPU can use any idle cycles.

What I was trying to get across is that if you're interleaving a Blitter clear with bitplane DMA, then the Blitter idle cycles will normally be used by the bitplane DMA (for a 4 bitplane screen, this means effectively using all non-hblank cycles for either Blitter or bitplane DMA).

If you're running in nasty mode, that leaves you with very few CPU cycles per scanline (only during H-BLANK). Otherwise it is a lot better, as the normal 1-in-4 rule plays out, but you're still losing plenty of CPU cycles at that stage.

Last edited by roondar; 31 October 2019 at 15:04.
31 October 2019, 15:06   #215
chb
Registered User

 
Join Date: Dec 2014
Location: germany
Posts: 198
Quote:
Originally Posted by roondar
What I was trying to get across is that if you're interleaving a Blitter clear with bitplane DMA, then the Blitter idle cycles will normally be used by the bitplane DMA (for a 4 bitplane screen, this means effectively using all non-hblank cycles for either Blitter or bitplane DMA).
AFAIK the catch is that blitter idle cycles cannot occur when other DMA is going on. So on a 4 bpl display the idle cycles cannot overlap with the bpl DMA cycles (but they can overlap with CPU memory cycles):

Code:
B: Blitter  I: Blitter idle  D: Display  C: CPU  -: free slot
only display dma: 
D - D - D - D -
only display dma + blitter
D B D I D B D I
disp + blit + cpu:
D B D C D B D C
31 October 2019, 15:08   #216
chb
Registered User

 
Join Date: Dec 2014
Location: germany
Posts: 198
Quote:
Originally Posted by deimos
Yes, but now that's not just your personal opinion, it's something that we now have evidence for. And the journey hasn't been a waste. Even if I've learnt 5 wrong ways to do something, that's 5 ways of doing something I didn't know before.
It totally wasn't meant as a "told you so!!1!!". I was just too lazy to type all this again. I really admire your effort, and as you said, it was just theorizing on my part, so now you are giving us some evidence.
31 October 2019, 15:10   #217
roondar
Registered User

 
Join Date: Jul 2015
Location: The Netherlands
Posts: 1,378
That doesn't seem right to me. I've looked at Elite 2: Frontier in the past using the WinUAE visual DMA debugger, and it seems to interleave its Blitter clear 'perfectly' with the display, without extra idle cycles added.

I.e. it seemed to do DBDBDB, not DBDIDBDI.

Edit: I'm not discounting the possibility that I'm very wrong here, especially after what Toni wrote earlier. It's just that I'm nearly 100% certain I saw it work the way I described when examining Elite Frontier with the DMA debugger. If it does work as chb/Toni seem to suggest (which is very probable, given it's Toni who wrote WinUAE!), that would make Blitter idle cycles absolutely terrible for Blitter throughput during bitplane DMA: clearing would effectively drop to 50% speed during bitplane DMA. In that case it'd be even more advisable to run clear blits outside of the bitplane DMA period, or to mix clear blits with CPU clearing.
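The throughput difference between the two patterns can be put into numbers with a toy slot model. It is not cycle-exact: it assumes display DMA takes every other slot and ignores copper, refresh, sprite and audio slots. Which pattern the real hardware produces is exactly the open question in this thread. `model_clear` is a hypothetical name.

```c
#include <stddef.h>

/* Toy model of a D-only clear during bitplane DMA. Even slots belong to the
   display ('D'); the Blitter alternates a write cycle ('B') and an idle cycle ('I').
   idle_needs_free_slot = 1: the idle cycle must wait for a slot nobody else
     uses, giving D B D I ...
   idle_needs_free_slot = 0: the idle cycle can overlap a display slot,
     giving D B D B ...
   Writes the pattern to out (nslots chars plus NUL) and returns the number
   of words the Blitter wrote. */
size_t model_clear(int idle_needs_free_slot, size_t nslots, char *out)
{
    size_t words = 0;
    int want_write = 1;   /* Blitter's next cycle: 1 = write, 0 = idle */

    for (size_t slot = 0; slot < nslots; slot++) {
        if (slot % 2 == 0) {                 /* display DMA slot */
            out[slot] = 'D';
            if (!want_write && !idle_needs_free_slot)
                want_write = 1;              /* idle cycle absorbed here */
            continue;
        }
        if (want_write) {                    /* free slot: Blitter writes */
            out[slot] = 'B';
            words++;
            want_write = 0;
        } else {                             /* free slot spent idling */
            out[slot] = 'I';
            want_write = 1;
        }
    }
    out[nslots] = '\0';
    return words;
}
```

Over 8 slots, the first setting yields `DBDIDBDI` with 2 words written, and the second yields `DBDBDBDB` with 4. That is the 50% drop during bitplane DMA.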

Last edited by roondar; 31 October 2019 at 16:00.
31 October 2019, 15:11   #218
sandruzzo
Registered User
 
Join Date: Feb 2011
Location: Italy/Rome
Posts: 1,642
What about taking into account the fact that on a faster CPU, or with a little bit of fast RAM, CPU + blitter is faster?
31 October 2019, 15:26   #219
deimos
Registered User

 
Join Date: Jul 2018
Location: Londonish / UK
Posts: 489
Quote:
Originally Posted by chb
It totally wasn't meant as a "told you so!!1!!".
I know that. Please take my reply as a thank you to everyone who has been patient with me.
31 October 2019, 16:13   #220
chb
Registered User

 
Join Date: Dec 2014
Location: germany
Posts: 198
Quote:
Originally Posted by roondar
That doesn't seem right to me. I've looked at Elite 2: Frontier in the past using the WinUAE visual DMA debugger and it seems to interleave it's Blitter clear 'perfectly' with the display and not with extra idle cycles added.

I.E. it seemed to do: DBDBDB, not: DBDIDBDI.

Edit: I'm not discounting the possibility I'm very wrong here, especially after what Toni wrote earlier. It's just that I'm nearly 100% certain I saw it work like that I wrote when examining Elite Frontier with the DMA debugger. If it does work as chb/Toni seem to suggest (which is very probable given it's Toni who wrote it!), that would make Blitter idle cycles absolutely terrible for Blitter throughput during bitplane DMA. Clearing would effectively drop to 50% speed during bitplane DMA as a result. In that case it'd be even more advisable to use clear blits outside of the bitplane DMA period or mixing clear blits with CPU clearing.
I did not test it, but from what Toni said in this thread and also here, I assumed blitter idle cycles cannot overlap with display dma. Hmm. But there's this changelog for WinUAE 2.8.0 beta:
Quote:
Blitter's last idle cycle before final D write does not require free bus cycle. Exception: it needs to be free if it is extra cycle added in fill mode. (Lets Rave / Profecy, CrazyWorld / Dioxide)
That suggests your observation is actually true, if "final" does not mean "last write to D of the whole blit". I think we have to wait for Toni.