English Amiga Board


08 March 2019, 10:39   #1
bloodline
Registered User
Join Date: Jan 2017
Location: London, UK
Posts: 433
Blitter Busy Flag

Something in the HRM has me confused...

Code:
   About the blitter done flag.
   ----------------------------
   If a blit has just been started but has been locked out of memory
   access because of, for instance, display fetches, this bit may not
   yet be set.  The processor, on the other hand, may be running
   completely uninhibited out of Fast memory or its internal cache, so
   it will continue to have memory cycles.

   The solution is to read a chip memory or hardware register address with
   the processor before testing the bit.  This can easily be done with the
   sequence:

        btst.b  #DMAB_BLTDONE-8,DMACONR(a1)
        btst.b  #DMAB_BLTDONE-8,DMACONR(a1)

When a blit is in progress the Blitter Busy flag is set. This gives the CPU a nice way to check if a blit is in progress.
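A note on the HRM snippet: `btst` on a memory operand only tests a single byte, which is why the bit number is written as `DMAB_BLTDONE-8`. The small C model below is host-side code, not Amiga code. It assumes `DMAB_BLTDONE` = 14, as in the NDK's `hardware/dmabits.h`. It shows that testing bit 14 of the word and testing bit 6 of the big-endian high byte give the same answer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 14 of DMACONR is the "blitter busy" flag
 * (BBUSY in the HRM, DMAB_BLTDONE in the NDK includes). */
#define DMAB_BLTDONE 14
#define DMAF_BLTDONE (1u << DMAB_BLTDONE)   /* 0x4000 */

/* What a 16-bit read of DMACONR would test. */
static bool blitter_busy_word(uint16_t dmaconr)
{
    return (dmaconr & DMAF_BLTDONE) != 0;
}

/* What "btst #DMAB_BLTDONE-8,DMACONR(a1)" tests: btst on memory works on a
 * single byte, and on the big-endian 68000 the byte at the (even) DMACONR
 * address is the register's high byte, so bit 14 becomes bit 6 there. */
static bool blitter_busy_byte(uint16_t dmaconr)
{
    uint8_t high = (uint8_t)(dmaconr >> 8);
    return ((high >> (DMAB_BLTDONE - 8)) & 1u) != 0;
}
```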

But the Blitter has priority over the CPU when accessing the chip bus. So if the CPU tries to read the DMACONR register while the Blitter is busy, the CPU will just wait because the Blitter is using the bus...

It seems to me that the CPU can never read the Blitter busy flag as true!?

Have I misunderstood? Can the chip registers be written/read by the CPU even when Chip RAM is in use by another DMA device!?
08 March 2019, 10:59   #2
phx
Natteravn
Join Date: Nov 2009
Location: Herford / Germany
Posts: 2,537
IIRC it was mostly a problem with the A1000 blitter, which delayed the setting of the BLTDONE flag. This was fixed in later Agnus revisions.
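The HRM's double `btst` can be sketched in C as a wait loop that performs one discarded read first. This is a host-side model built on stated assumptions. `read_reg_fn`, `FakeAgnus` and `fake_read` are hypothetical stand-ins for a real read of DMACONR. The fake register reports the busy flag late on the first read, which is the behaviour the HRM describes.

```c
#include <assert.h>
#include <stdint.h>

#define DMAF_BLTDONE 0x4000u   /* blitter busy bit in DMACONR */

/* Hypothetical callback standing in for a read of DMACONR ($DFF002). */
typedef uint16_t (*read_reg_fn)(void *ctx);

/* Wait for the blitter as the HRM suggests: one extra read first, so a busy
 * bit that an early A1000 Agnus has not set yet cannot make us fall straight
 * through. Returns the number of register reads performed. */
static unsigned wait_blit(read_reg_fn read_dmaconr, void *ctx)
{
    assert(read_dmaconr);
    unsigned reads = 1;
    (void)read_dmaconr(ctx);                 /* dummy access, result ignored */
    for (;;) {
        reads++;
        if ((read_dmaconr(ctx) & DMAF_BLTDONE) == 0)
            return reads;
    }
}

/* Toy register model: the first read returns "not busy" (flag set late),
 * then the flag reads busy for busy_reads reads, then clears. */
typedef struct {
    unsigned calls;
    unsigned busy_reads;
} FakeAgnus;

static uint16_t fake_read(void *ctx)
{
    FakeAgnus *a = ctx;
    unsigned n = a->calls++;
    if (n == 0)
        return 0;                            /* flag not visible yet */
    return (n <= a->busy_reads) ? DMAF_BLTDONE : 0;
}
```

With `FakeAgnus a = {0, 3}`, a single read would see "not busy" and return immediately. `wait_blit` only returns after its 5th read, once the modelled blit is actually done.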
08 March 2019, 10:59   #3
roondar
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,436
Quote:
Originally Posted by bloodline
It seems to me that the CPU can never read the Blitter busy flag as true!?

Have I misunderstood? Can the chipregs be written/read by the CPU even when the chipram is in use by another DMA device!?

There are two ways of operating the Blitter. Normal and "nasty" (or BLTHOG).

If the Blitter is operating in normal mode, it will periodically stop itself for a cycle to give the CPU a cycle to do work. To be precise: once the CPU has been refused the bus for three consecutive cycles, the Blitter leaves the next cycle free, so the CPU gets one cycle in every four. In this scenario, the CPU can indeed read the BLTDONE flag even when the Blitter is running.

If the Blitter is operating in "nasty" (or BLTHOG) mode, it doesn't give any cycles to the CPU. So you'd think the CPU can't read the BLTDONE flag. And for the most common blits this is true (given a 68000).
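The arbitration rule above can be expressed as a simplified C model. The model assumes the CPU wants the bus on every cycle, and it ignores display and all other DMA. With BLTPRI (BLITHOG) clear, once the CPU has been refused for three cycles in a row, it gets the next one. `arbitrate` is a hypothetical helper that records the resulting slot pattern.

```c
#include <assert.h>
#include <stdbool.h>

/* Fill out with the bus slot pattern ('B' = Blitter, 'C' = CPU) for a blit
 * needing blitter_cycles cycles, and return the number of slots used.
 * out must have room for about blitter_cycles * 4 / 3 + 2 chars. */
static unsigned arbitrate(unsigned blitter_cycles, bool hog, char *out)
{
    assert(out);
    unsigned slots = 0, cpu_refused = 0;
    while (blitter_cycles > 0) {
        if (!hog && cpu_refused == 3) {
            out[slots++] = 'C';          /* CPU finally gets the bus */
            cpu_refused = 0;
        } else {
            out[slots++] = 'B';          /* Blitter takes the slot */
            blitter_cycles--;
            cpu_refused++;
        }
    }
    out[slots] = '\0';
    return slots;
}
```

For an 8-cycle blit, normal mode gives `BBBCBBBCBB` (10 slots). With BLITHOG set, the same blit gives `BBBBBBBB` (8 slots). Real blits also contain idle Blitter slots, which this model ignores. The cycle diagrams in the HRM show where those idle slots fall.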

However, for some other blits (notably running the Blitter in clear mode, line mode and most blits other than A-D and A-B-C-D ones), the Blitter doesn't actually use all cycles on the bus. In this case, the CPU can read the BLTDONE flag as well, even if the Blitter is in "nasty" mode.

If the CPU is running from Fast memory or is faster than the standard 68000, it can also, in some situations, read the BLTDONE flag while the Blitter is running, regardless of the blit type (though it can't always read it, only at certain points during the blit).

Note:
It's important to realise that any CPU running from Fast memory, or any CPU faster than the base 68000, can thus get to the Blitter in the middle of a blit even if it is running in "nasty" mode. As such it's required to wait on the Blitter, even if you run in "nasty" mode. Otherwise your code can end up crashing the system if run on a faster CPU or with more memory.

Edit:
I want to add here that phx is correct: the double test is needed for a bug in the Agnus chip of early A1000s.

Last edited by roondar; 08 March 2019 at 11:05.
08 March 2019, 11:00   #4
bloodline
Registered User
Join Date: Jan 2017
Location: London, UK
Posts: 433
Quote:
Originally Posted by phx
IIRC it was mostly a problem with the A1000 blitter, which delayed the setting of the BLTDONE flag. This was fixed in later Agnus revisions.
No, not the delay in setting the flag; my point is that the CPU could never read the flag as true!
08 March 2019, 11:04   #5
bloodline
Registered User
Join Date: Jan 2017
Location: London, UK
Posts: 433
Quote:
Originally Posted by roondar
However, for some other blits (notably running the Blitter in clear mode, line mode and most non AD/ABCD blits), the Blitter doesn't actually use all cycles on the bus. In this case, the CPU can read the BLTDONE flag as well, even if the Blitter is in "nasty" mode.

Ok, this makes sense! I was under the false assumption that the Blitter would just take every bus cycle available.

Cheers for the explanation.

It still seems that polling this flag will stall the CPU (if it has Fast RAM), and is thus a bad idea.
08 March 2019, 11:09   #6
roondar
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,436
Quote:
Originally Posted by bloodline
It still seems that polling this flag will stall the CPU (if it has fastram), and thus a bad idea.
If you want safe blits, there are four common ways to do it:
  1. Wait on the Blitter using a polling system like above
  2. Use the Blitter in nasty mode and run the code from Chip memory, which saves you running a polling loop during the entire blit, but still makes the CPU wait (and you still need to have a Blitter wait loop to prevent faster CPUs from outrunning the Blitter).
  3. Use Blitter interrupts, which don't keep the CPU waiting but add more overhead to blits
  4. Use the Copper to set up & run blits
Any of these will do, but if you want to use the CPU, the only safe ways (assuming you want compatibility beyond 68000/Chip-RAM-only systems) are to poll* or to use interrupts.

What I do (and I guess others as well), is to attempt to write my code so the CPU has as much work to do between blits as possible. That way I can run the polling loop at the last possible moment (meaning directly prior to setting the Blitter registers for the next blit).

*) or poll + nasty, which is kinda the same thing.
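The "poll at the last possible moment" approach can be illustrated with a toy timeline in C. Everything here is hypothetical: `Timeline`, `cpu_work` and `start_blit` are illustrative names only. The model also assumes a blit takes a fixed time no matter what the CPU does. That only holds when the CPU works out of Fast RAM or cache, or when the blit is not slowed by CPU chip accesses at all.

```c
#include <assert.h>
#include <stdint.h>

/* Toy timeline in bus cycles; all names are hypothetical. */
typedef struct {
    uint32_t now;           /* current time */
    uint32_t blit_done_at;  /* when the running blit finishes */
    uint32_t spun;          /* cycles wasted in the polling loop */
} Timeline;

static void cpu_work(Timeline *t, uint32_t cycles)
{
    t->now += cycles;
}

/* Poll only when we are about to touch the Blitter registers again. */
static void wait_blit(Timeline *t)
{
    if (t->now < t->blit_done_at) {
        t->spun += t->blit_done_at - t->now;
        t->now = t->blit_done_at;
    }
}

static void start_blit(Timeline *t, uint32_t duration)
{
    wait_blit(t);                    /* last possible moment */
    t->blit_done_at = t->now + duration;
}
```

Starting two 1000-cycle blits back to back wastes 1000 cycles in the wait. Doing 1200 cycles of CPU work between them wastes none.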
08 March 2019, 13:05   #7
dmacon
Registered User
Join Date: Nov 2018
Location: Germany
Posts: 42
Quote:
Originally Posted by roondar
Use the Copper to set up & run blits

And with that, only in BLTHOG mode, because otherwise CPU access cycles to Chip-Ram would cause non-deterministic execution times of the copperlist, leading to display corruption when the mandatory initialization of the bitplane pointers overlaps with the actual display phase.

Using the copper would actually have some nice advantages, if you design your graphics pipeline so that all single blits finish within a single frame update, possibly with just a single buffer. Using a double-buffered copperlist, the CPU does not need to waste precious cycles constantly polling flags or serving interrupt requests after each blit.
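This sketch shows what "the Copper sets up and runs blits" looks like as data. It is a list of MOVE instructions (register offset, value) that ends with the write to BLTSIZE, which starts the blit. The code is a host-side C helper that produces Copper list words. The function names are hypothetical; the register offsets are the standard custom chip offsets from the HRM. The sketch assumes the Copper danger bit (CDANG in COPCON) is set, because on OCS the Copper may only write the blitter registers ($040-$07E) with that bit set. In a real list, each such block would be preceded by a WAIT for the Blitter to finish.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Custom chip register offsets from $DFF000. */
enum {
    BLTCON0 = 0x040, BLTCON1 = 0x042, BLTAFWM = 0x044, BLTALWM = 0x046,
    BLTAPTH = 0x050, BLTAPTL = 0x052, BLTDPTH = 0x054, BLTDPTL = 0x056,
    BLTSIZE = 0x058, BLTAMOD = 0x064, BLTDMOD = 0x066
};

/* A Copper MOVE is two words: register offset, then the value. */
static size_t cop_move(uint16_t *cl, size_t i, uint16_t reg, uint16_t val)
{
    cl[i++] = reg;
    cl[i++] = val;
    return i;
}

/* A chip pointer is written as two MOVEs: high word, then low word. */
static size_t cop_move_ptr(uint16_t *cl, size_t i, uint16_t regh, uint32_t addr)
{
    i = cop_move(cl, i, regh, (uint16_t)(addr >> 16));
    return cop_move(cl, i, (uint16_t)(regh + 2), (uint16_t)addr);
}

/* Emit a plain A->D copy; the final write to BLTSIZE starts the blit. */
static size_t cop_blit_copy(uint16_t *cl, size_t i, uint32_t src, uint32_t dst,
                            uint16_t amod, uint16_t dmod,
                            uint16_t height, uint16_t width_words)
{
    assert((src & 1u) == 0 && (dst & 1u) == 0);   /* word-aligned chip addresses */
    i = cop_move(cl, i, BLTCON0, 0x09F0);   /* USEA|USED, minterm D = A */
    i = cop_move(cl, i, BLTCON1, 0x0000);
    i = cop_move(cl, i, BLTAFWM, 0xFFFF);
    i = cop_move(cl, i, BLTALWM, 0xFFFF);
    i = cop_move(cl, i, BLTAMOD, amod);
    i = cop_move(cl, i, BLTDMOD, dmod);
    i = cop_move_ptr(cl, i, BLTAPTH, src);
    i = cop_move_ptr(cl, i, BLTDPTH, dst);
    /* BLTSIZE: height in bits 15-6, width in words in bits 5-0 */
    return cop_move(cl, i, BLTSIZE,
                    (uint16_t)(((height & 0x3FFu) << 6) | (width_words & 0x3Fu)));
}
```

For a 16-line, 2-word-wide copy, this emits 11 MOVEs (22 words).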

Last edited by dmacon; 08 March 2019 at 13:22.
08 March 2019, 13:25   #8
roondar
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,436
Quote:
Originally Posted by dmacon
And with that, only in BLTHOG mode, because otherwise, CPU access cycles to Chip-Ram would cause non-determistic execution times of the copperlist, leading to display corruption when the mandatory initialization of the bitplane pointers overlaps with the actual display phase.
I think I understand what you mean, but I want to clarify to be sure.

So what you mean is not that Copper DMA is interrupted by the CPU (because that never happens), but that using the CPU during blits can make the Copper's wait on the Blitter finish later than expected, so the following Copper instructions don't execute at the time you expect.

Is that correct?

Because if that is what you mean I agree this is something to watch out for. I don't think you really need to use BLTHOG mode for it to be fixed though, you can probably (in most cases anyway) also get away with deferring the Copper wait for the last blit before the display setup until after the display setup is done*.

Obviously, BLTHOG is the easier option here.
Quote:
Using the copper would actually have some nice advantages, if you design your graphics pipeline in a way that all single blits will finish within a single frame update. Using a double buffered copperlist, the CPU needs not to waste precious cycles by constantly polling flags or serving interrupt requests after each blit.
This is true, though if you use the Copper then you have to set up the Copper blits as well. This essentially means that the CPU writes the register values into the Copper list and the Copper then writes the registers. This would be additional overhead compared to the CPU setting registers directly.

I'm pretty convinced it'll still be faster than a pure CPU based approach (as you save either the polling loop or the interrupt overhead), but writing these updates is obviously not free either.

*) if this is not enough to make sure no timing issues occur there are still other options. Such as not waiting at all with the Copper and instead starting blits at specific raster locations (based on the worst case scenario time the blits take).
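Picking raster positions from the worst-case blit time is simple arithmetic. The sketch below assumes you already know, or have measured, two things: how many bus cycles a blit needs, and how many bus slots per line the Blitter can actually obtain with your display setup. Both inputs and the function names are hypothetical. Note that a plain Copper WAIT can only compare vertical positions 0-255 directly, so lines beyond that need the usual extra wait past line 255.

```c
#include <assert.h>
#include <stdint.h>

/* Worst-case blit length in whole scanlines (rounded up).
 * blit_cycles:         bus cycles the blit needs (from the HRM cycle tables)
 * free_slots_per_line: bus slots per line the Blitter can really get after
 *                      display, sprite, audio, disk and Copper DMA (and the
 *                      CPU, if BLTPRI is clear); an estimate you must supply */
static uint32_t blit_lines_worst_case(uint32_t blit_cycles,
                                      uint32_t free_slots_per_line)
{
    assert(free_slots_per_line > 0);
    return (blit_cycles + free_slots_per_line - 1) / free_slots_per_line;
}

/* Line at which a raster WAIT may start the next blit, plus a safety margin. */
static uint32_t next_blit_line(uint32_t start_line, uint32_t blit_cycles,
                               uint32_t free_slots_per_line, uint32_t margin)
{
    return start_line
         + blit_lines_worst_case(blit_cycles, free_slots_per_line)
         + margin;
}
```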

Last edited by roondar; 08 March 2019 at 13:38. Reason: grammar
08 March 2019, 13:36   #9
ross
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,488
Quote:
Originally Posted by dmacon
And with that, only in BLTHOG mode, because otherwise, CPU access cycles to Chip-Ram would cause non-determistic execution times of the copperlist, leading to display corruption when the mandatory initialization of the bitplane pointers overlaps with the actual display phase.
This sentence does not make much sense to me, but it may be that I did not understand it.
How can "CPU cycles to Chip-Ram" cause non-deterministic execution of the copperlist?
If anything is deterministic, it is the Copper's behavior.

The only difference the BLTHOG setting makes is that you may have to wait longer than expected with the Copper for the Blitter finish bit, and in that case some fundamental initializations (bitplane pointers, screen splits, sprites) may not happen in time and create screen glitches.
But, as usual, just build the copper list appropriately.

Maybe you meant this.
08 March 2019, 13:42   #10
arcanist
Registered User
Join Date: Dec 2017
Location: Austin, TX
Age: 41
Posts: 412
Quote:
Originally Posted by roondar
This is true, though if you use the Copper then you have to set up the Copper blits as well. This essentially means that the CPU writes the register values into the Copper list and the Copper then writes the registers. This would be additional overhead compared to the CPU setting registers directly.
One advantage a copperlist has here is that the CPU only needs to update the values which have changed. E.g. I used this for blitting a score to the screen and only updated the source address each frame.

When working with the blitter directly from the CPU all of the relevant blitter registers have to be written.
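Here is the "only update what changed" idea in C. The MOVE that writes BLTAPTH is located once, when the list is built, and each frame only the two pointer words are patched. `find_move` and `patch_source` are hypothetical helpers. The sketch assumes the BLTAPTL MOVE immediately follows the BLTAPTH one in the list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* BLTAPTH/BLTAPTL hold the high and low words of the Blitter A pointer. */
#define BLTAPTH 0x050u
#define BLTAPTL 0x052u

/* Scan a Copper list fragment in 2-word instruction steps and return the
 * index of the value word of the MOVE to reg, or (size_t)-1 if absent.
 * (A WAIT's first word has bit 0 set, so it never matches a register.) */
static size_t find_move(const uint16_t *cl, size_t nwords, uint16_t reg)
{
    for (size_t i = 0; i + 1 < nwords; i += 2)
        if (cl[i] == reg)
            return i + 1;
    return (size_t)-1;
}

/* Per frame: only the two value words of the source pointer change. */
static void patch_source(uint16_t *cl, size_t apth_value_idx, uint32_t src)
{
    assert(apth_value_idx != (size_t)-1);
    assert(cl[apth_value_idx + 1] == BLTAPTL);        /* layout assumption */
    cl[apth_value_idx]     = (uint16_t)(src >> 16);   /* BLTAPTH value */
    cl[apth_value_idx + 2] = (uint16_t)src;           /* BLTAPTL value */
}
```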
08 March 2019, 13:58   #11
roondar
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,436
Quote:
Originally Posted by arcanist
One advantage a copperlist has here is that the CPU only needs to update the values which have changed. e.g. I used this for blitting a score to the screen and only update the source address each frame.

When working with the blitter directly from the CPU all of the relevant blitter registers have to be written.
This is true, though the Copper will still have to write all blitter registers regardless, so purely in terms of registers written you're not going to see a massive advantage*.

But in terms of overall overhead you'll definitely see advantages like this. Being able to skip some/most/all of the calculations etc. for a blit after the first time you do it is a fairly big advantage, after all.

Which is why I do believe it'll be faster to use the Copper even though you're actually doing more CPU+Copper 'Blitter register content' writes in total.

*) though it's slightly easier to 'cheat' by selectively setting registers between blits in the same frame when using the Copper vs the CPU.
08 March 2019, 14:01   #12
dmacon
Registered User
Join Date: Nov 2018
Location: Germany
Posts: 42
Quote:
Originally Posted by ross
This sentence does not make much sense to me, but it may be that I did not understand it
How "CPU cycles to Chip-Ram" can cause non-deterministic execution for the copperlist?
If there is a deterministic thing it is Copper behavior.
Yes, but not the copper waiting for the blitter to finish blits which are delayed by CPU accesses to Chip-Ram.

Quote:
But, as usual, just build the copper list appropriately
But then you always need to assume the worst case for blitter timing when it is interrupted by the CPU (with BLTHOG not set, the CPU is granted a cycle after every third blitter cycle whenever it is trying to access Chip-Ram / registers).

When BLTHOG is set, you could even closely interleave Copper initiated blit cycles with other synchronized modifications to the display output.

I'm thinking about simulating a sprite engine with the copper in a way that the copperlist becomes the sprite attribute table, with a constant 50Hz update refresh, using a single buffer.
08 March 2019, 14:07   #13
roondar
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,436
Quote:
Originally Posted by dmacon
When BLTHOG is set, you could even closely interleave Copper initiated blit cycles with other synchronized modifications to the display output.

I'm thinking about simulating a sprite engine with the copper in a way that the copperlist becomes the sprite attribute table, with a constant 50Hz update refresh, and using a single buffer.
For extra fun, I've considered a Copper Blitter 'engine' with flexible 'blithogging' (i.e. turn it on and off several times a frame or at specific times).

That would allow for really large blits without interfering with Protracker playback (I've not tested how sensitive Protracker playback is to latency, but I'd guess it might be audible if a half-a-frame blit delays the PT interrupt by 100+ raster lines), or give the CPU some room for other stuff to do.
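A Copper list with flexible "blithogging" mostly comes down to MOVEs to DMACON at chosen lines. A write with bit 15 (SET/CLR) set turns BLTPRI on; a write without it turns BLTPRI off. The C sketch below emits one such window. `DMAF_BLITHOG` = $0400 and `DMAF_SETCLR` = $8000 follow the NDK's `hardware/dmabits.h`. The helper names are hypothetical, and the line-only WAIT ($xx01,$FF00) is limited to lines 0-255. DMACON ($096) is at or above $080, so the Copper can write it without the danger bit.

```c
#include <assert.h>
#include <stdint.h>

#define DMACON       0x096u   /* DMA control register offset from $DFF000 */
#define DMAF_SETCLR  0x8000u  /* bit 15: 1 = set the listed bits, 0 = clear */
#define DMAF_BLITHOG 0x0400u  /* BLTPRI, "blitter nasty" */

/* WAIT for the start of a line, ignoring the horizontal position and the
 * Blitter (BFD=1). Only lines 0-255 can be reached this simply. */
static uint16_t *cop_wait_line(uint16_t *cl, unsigned line)
{
    assert(line < 256);
    *cl++ = (uint16_t)((line << 8) | 0x0001);
    *cl++ = 0xFF00;              /* BFD=1, VE=$7F, HE=0 */
    return cl;
}

static uint16_t *cop_move(uint16_t *cl, uint16_t reg, uint16_t val)
{
    *cl++ = reg;
    *cl++ = val;
    return cl;
}

/* Blitter nasty between two lines and "nice" elsewhere, so the CPU (and
 * e.g. a music interrupt) gets bus cycles outside the window. */
static uint16_t *cop_hog_window(uint16_t *cl, unsigned from, unsigned to)
{
    cl = cop_wait_line(cl, from);
    cl = cop_move(cl, DMACON, DMAF_SETCLR | DMAF_BLITHOG);
    cl = cop_wait_line(cl, to);
    return cop_move(cl, DMACON, DMAF_BLITHOG);
}
```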
08 March 2019, 14:55   #14
ross
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,488
Quote:
Originally Posted by dmacon
Yes, but not the copper waiting for the blitter to finish blits which are delayed by CPU accesses to Chip-Ram.
Where did you get that idea?
The BFD bit simply adds a further condition to the video position for the execution of the next instruction. The blitter finished bit (which triggers the BFD condition in a Copper wait) has nothing to do with CPU accesses to chip RAM.

Or I still do not understand what you mean.
08 March 2019, 15:06   #15
dmacon
Registered User
Join Date: Nov 2018
Location: Germany
Posts: 42
Quote:
Originally Posted by ross
Where did you get that idea?
The BFD bit simply adds a further condition to the video position for the execution of the next instruction.
Exactly. So you modify the raster comparison part of the wait instruction to always yield a true condition (easy, just wait for a position which has already passed), and clear the BFD bit. That way, the copper wait timing is determined solely by the timing of the blitter operation.
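The wait described above can be built from the Copper WAIT encoding. In this host-side C sketch, `cop_wait` is a hypothetical helper. Clearing BFD makes the Copper also wait for the Blitter. With a target position of 0,0 and all mask bits cleared, the position part of the comparison is always satisfied, so only the Blitter condition remains. As a check, the same helper reproduces the familiar $2C07,$FFFE line wait with BFD set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Encode a Copper WAIT.
 * First word:  VP (bits 15-8), HP (bits 7-1), bit 0 = 1.
 * Second word: BFD (bit 15), VE mask (bits 14-8), HE mask (bits 7-1), bit 0 = 0.
 * hp and he are given in register format (even values). */
static void cop_wait(uint16_t out[2], unsigned vp, unsigned hp,
                     unsigned ve, unsigned he, bool ignore_blitter)
{
    assert(vp < 256 && hp < 256 && ve < 128 && he < 256);
    out[0] = (uint16_t)((vp << 8) | (hp & 0xFEu) | 0x0001u);
    out[1] = (uint16_t)((ignore_blitter ? 0x8000u : 0u) | (ve << 8) | (he & 0xFEu));
}
```

`cop_wait(w, 0, 0, 0, 0, false)` yields $0001,$0000, a wait that depends only on the Blitter finishing.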

Quote:
The blitter finished bit (which triggers the BFD condition in a Copper wait) has nothing to do with CPU accesses to chip RAM.
If BLTHOG is not set, then yes.

Because then, the blitter timing also depends on CPU accesses to Chip-Ram. This means the copper could be delayed so that display-update register writes end up during the active display period, causing glitches in the output.

This delay accumulates if you chain multiple blit set-ups using the copper in this way.
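The cumulative effect can be put into numbers using the same simplified arbitration rule as before. The model assumes the CPU is always contending, no other DMA is active, and the CPU gets one slot after every three Blitter slots when BLTPRI is clear. `blit_slots` and `chain_finish` are hypothetical names. `setup_slots` is an assumed fixed cost for the Copper WAIT and MOVEs before each blit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Slots a blit occupies if the CPU contends for the bus on every cycle. */
static uint32_t blit_slots(uint32_t blitter_cycles, bool hog)
{
    if (hog || blitter_cycles == 0)
        return blitter_cycles;
    return blitter_cycles + (blitter_cycles - 1) / 3;
}

/* Finish time of the last of n chained blits, each started by the Copper
 * as soon as the previous one is done (WAIT with BFD clear, then MOVEs). */
static uint32_t chain_finish(const uint32_t *cycles, unsigned n,
                             uint32_t setup_slots, bool hog)
{
    assert(cycles || n == 0);
    uint32_t t = 0;
    for (unsigned i = 0; i < n; i++)
        t += setup_slots + blit_slots(cycles[i], hog);
    return t;
}
```

Three 400-cycle blits with 20 set-up slots each finish at slot 1260 with BLTPRI set. In the worst case without it, they finish at slot 1659, so every later Copper instruction slips by 399 slots.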

Quote:
Or I still do not understand what you mean
Maybe you will now.

I know that at least the famous Hard-Wired demo uses the copper in this manner during the "Glenz Vector" part. All blits which draw the multi-faced polygon figure are initiated by the copper using the wait instruction in the way I described above.

Last edited by dmacon; 08 March 2019 at 15:26.
08 March 2019, 15:19   #16
dmacon
Registered User
Join Date: Nov 2018
Location: Germany
Posts: 42
Quote:
Originally Posted by roondar
That would allow for really large blits without interfering with Protracker playback (I've not tested how sensitive Protracker playback is to latency, but I'd guess it might be audible if a half-a-frame blit delays the PT interrupt by 100+ raster lines), or give the CPU some room for other stuff to do.

Some people might even like the slight unevenness in update rate of a few ms, somewhat mimicking real human behavior. Can a human being achieve playback timing with a <10ms error margin???
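For scale, a PAL scanline lasts 64 microseconds, so the delays discussed here convert directly to time. The trivial C helpers below do the conversion; their names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* PAL line rate is 15.625 kHz, i.e. 64 microseconds per scanline. */
#define PAL_LINE_US 64u

static uint32_t lines_to_us(uint32_t lines)
{
    return lines * PAL_LINE_US;
}

/* Whole PAL scanlines that fit into a delay budget given in microseconds. */
static uint32_t max_lines_within_us(uint32_t budget_us)
{
    return budget_us / PAL_LINE_US;
}
```

100 lines is 6.4 ms. A 10 ms margin corresponds to about 156 lines, which is roughly half of a ~20 ms PAL frame.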
08 March 2019, 15:43   #17
ross
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,488
@dmacon: I give up.
We are probably using different words for the same concepts.

Cheers!
08 March 2019, 15:47   #18
roondar
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,436
Quote:
Originally Posted by dmacon
Some people might even like the slight unevenness in update rate of a few ms, somewhat mimicking real human behavior. Can a human being achieve playback timing with a <10ms error margin???
Well, I haven't tested any of this, but Protracker plays samples. If the start (or stop, for that matter) of a sample is off by enough to be audible, I'm not so sure the effect would be as graceful as a human musician being off by 10ms or so.

I'd guess that you could potentially get either clicks or pops, or audible repetition of parts of the sample that is playing until it changes into either silence or the next sample.

This is on top of any other timing-related distortions such as notes starting slightly too late (which probably wouldn't be very noticeable and would be similar to what human musicians probably end up doing).

It's not that I want to be negative, more that my experiences with slight errors in digital playback are that it usually doesn't sound nice
08 March 2019, 16:21   #19
dmacon
Registered User
Join Date: Nov 2018
Location: Germany
Posts: 42
Quote:
Originally Posted by ross
@dmacon: I give up.
We are probably using different words for the same concepts.
One more try.
Let's go back to the following statement:

Quote:
The blitter finished bit (which triggers the BFD condition in a Copper wait) has nothing to do with CPU accesses to chip RAM.
This is not true, because if you disable BLTHOG, the time at which the BFD condition becomes true depends on how often the CPU accesses Chip-Ram during the blit.
08 March 2019, 16:36   #20
ross
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,488
Quote:
Originally Posted by dmacon
One more try.
You beat me to it.
I was about to write that I understood what you meant.

Basically, I had taken what you wrote literally, namely that the blitter finished bit was [directly] influenced by CPU accesses to chip mem. What you meant is that the blitter operations take longer (if BLTPRI is not set) [indirectly, because of CPU accesses], so BBUSY is cleared later and Copper operations can be delayed cumulatively.

Right?

EDIT: when you read quickly while busy with something else, you don't always reflect much on what is written.

In my code I have always optimized my use of the BLTPRI bit. In addition, I have often exploited the different bus behavior the Blitter gives you when you use different DMA channels for the same operation (see the blitter cycle diagrams in the HRM).

Setting BLTPRI=1 is certainly a good choice if you drive the blitter with the copper, but even taking 'the worst case' scenario with BLTPRI=0 is not so bad.
It depends on what useful work you have to do with the CPU.

Last edited by ross; 08 March 2019 at 17:24. Reason: [] and typo