Blitter Busy Flag

bloodline · 08 March 2019, 10:39

Something in the HRM has me confused...

Code:

About the blitter done flag.
   ----------------------------
   If a blit has just been started but has been locked out of memory
   access because of, for instance, display fetches, this bit may not
   yet be set.  The processor, on the other hand, may be running
   completely uninhibited out of Fast memory or its internal cache, so
   it will continue to have memory cycles.

The solution is to read a chip memory or hardware register address with
the processor before testing the bit.  This can easily be done with the
sequence:

        btst.b  #DMAB_BLTDONE-8,DMACONR(a1)
        btst.b  #DMAB_BLTDONE-8,DMACONR(a1)

When a blit is in progress the Blitter Busy flag is set. This gives the CPU a nice way to check if a blit is in progress.

But the Blitter has priority over the CPU when accessing the 24bit bus. So if the CPU tries to read the DMACONR register and the Blitter is busy, the CPU will just wait because the Blitter is using the bus...

It seems to me that the CPU can never read the Blitter busy flag as true!?

Have I misunderstood? Can the chipregs be written/read by the CPU even when the chipram is in use by another DMA device!?

phx · 08 March 2019, 10:59

IIRC it was mostly a problem with the A1000 blitter, which delayed the setting of the BLTDONE flag. This was fixed in later Agnus revisions.

roondar · 08 March 2019, 10:59

Quote:

Originally Posted by bloodline

Something in the HRM has me confused...

Code:

About the blitter done flag.
   ----------------------------
   If a blit has just been started but has been locked out of memory
   access because of, for instance, display fetches, this bit may not
   yet be set.  The processor, on the other hand, may be running
   completely uninhibited out of Fast memory or its internal cache, so
   it will continue to have memory cycles.

The solution is to read a chip memory or hardware register address with
the processor before testing the bit.  This can easily be done with the
sequence:

        btst.b  #DMAB_BLTDONE-8,DMACONR(a1)
        btst.b  #DMAB_BLTDONE-8,DMACONR(a1)

When a blit is in progress the Blitter Busy flag is set. This gives the CPU a nice way to check if a blit is in progress.

But the Blitter has priority over the CPU when accessing the 24bit bus. So if the CPU tries to read the DMACONR register and the Blitter is busy, the CPU will just wait because the Blitter is using the bus...

It seems to me that the CPU can never read the Blitter busy flag as true!?

Have I misunderstood? Can the chipregs be written/read by the CPU even when the chipram is in use by another DMA device!?

There are two ways of operating the Blitter. Normal and "nasty" (or BLTHOG).

If the Blitter is operating in normal mode, it will periodically stop itself for a cycle to give the CPU a cycle to do work (to be precise: this happens once every four cycles). In this scenario, the CPU can indeed read the BLTDONE flag even when the Blitter is running.

If the Blitter is operating in "nasty" (or BLTHOG) mode, it doesn't give any cycles to the CPU. So you'd think the CPU can't read the BLTDONE flag. And for the most common blits this is true (given a 68000).

However, for some other blits (notably running the Blitter in clear mode, line mode and most non AD/ABCD blits), the Blitter doesn't actually use all cycles on the bus. In this case, the CPU can read the BLTDONE flag as well, even if the Blitter is in "nasty" mode.

If the CPU is running from fastmemory or the CPU is faster than the standard 68000, it can also in some situations read the BLTDONE flag when the Blitter is running, regardless of the blit type (though it can't always read it, just at certain points during the blit).

Note:
It's important to realise that any CPU running from fastmemory, or any CPU faster than the base 68000, can thus interrupt the Blitter mid work even if it is running in "nasty" mode. As such it's required to wait on the Blitter, even if you run in "nasty" mode. Otherwise your code can end up crashing the system if run on a faster CPU or with more memory.

Edit:
I want to add here that PHX is correct, the double test is needed for a bug in early A1000's Agnus chip.
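For reference, here is the wait loop from the HRM quote as a minimal C sketch rather than assembly. The helper names are hypothetical and not from the thread; on real hardware the pointer passed in would be the DMACONR address ($dff002), and BBUSY (DMAB_BLTDONE) is bit 14 of that register.

```c
#include <stdint.h>

/* BBUSY (DMAB_BLTDONE in the system headers) is bit 14 of DMACONR. */
#define DMACONR_BBUSY 0x4000u

/* Nonzero while the value read from DMACONR says a blit is running. */
static int blit_busy(uint16_t dmaconr_value)
{
    return (dmaconr_value & DMACONR_BBUSY) != 0;
}

/* Hypothetical helper. On real hardware pass (volatile uint16_t *)0xdff002. */
static void wait_blit(volatile const uint16_t *dmaconr)
{
    (void)*dmaconr;                 /* dummy read, mirrors the first btst */
    while (blit_busy(*dmaconr)) {
        /* spin until the Blitter reports done */
    }
}
```

The dummy read is only there to mirror the double test; on later Agnus revisions the loop alone should be enough, going by what PHX wrote.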

bloodline · 08 March 2019, 11:00

Quote:

Originally Posted by phx

IIRC it was mostly a problem with the A1000 blitter, which delayed the setting of the BLTDONE flag. This was fixed in later Agnus revisions.

No, I don't mean the delay in setting the flag. My point is that the CPU can never read the flag as true!

bloodline · 08 March 2019, 11:04

Quote:

Originally Posted by roondar

There are two ways of operating the Blitter. Normal and "nasty" (or BLTHOG).

If the Blitter is operating in normal mode, it will periodically stop itself for a cycle to give the CPU a cycle to do work (to be precise: this happens once every four cycles). In this scenario, the CPU can indeed read the BLTDONE flag even when the Blitter is running.

If the Blitter is operating in "nasty" (or BLTHOG) mode, it doesn't give any cycles to the CPU. So you'd think the CPU can't read the BLTDONE flag. And for the most common blits this is true (given a 68000).

However, for some other blits (notably running the Blitter in clear mode, line mode and most non AD/ABCD blits), the Blitter doesn't actually use all cycles on the bus. In this case, the CPU can read the BLTDONE flag as well, even if the Blitter is in "nasty" mode.

If the CPU is running from fastmemory or the CPU is faster than the standard 68000, it can also read the BLTDONE flag when the Blitter is running, regardless of the blit type (though it can't always read it, just at certain points during the blit).

Note:
It's important to realise that any CPU running from fastmemory, or any CPU faster than the base 68000, can thus interrupt the Blitter mid work even if it is running in "nasty" mode. As such it's required to wait on the Blitter, even if you run in "nasty" mode. Otherwise your code can end up crashing the system if run on a faster CPU or with more memory.

Ok, this makes sense! I was under the false assumption that the Blitter would just take every bus cycle available.

Cheers for the explanation.

It still seems that polling this flag will stall the CPU (if it has fastram), and is thus a bad idea.

roondar · 08 March 2019, 11:09

Quote:

Originally Posted by bloodline

It still seems that polling this flag will stall the CPU (if it has fastram), and is thus a bad idea.

If you want safe blits, there are four common ways to do it:

1. Wait on the Blitter using a polling loop like the one above.
2. Use the Blitter in nasty mode and run the code from chipmemory, which saves you running a polling loop during the entire blit, but still makes the CPU wait (and you still need a Blitter wait loop to prevent faster CPUs from outrunning the Blitter).
3. Use Blitter interrupts, which don't keep the CPU waiting but add more overhead to blits.
4. Use the Copper to set up & run blits.

Any of these will do, but if you want to use the CPU the only safe ways (assuming you want compatibility outside of 68000/chipmem only systems) are to poll* or to use interrupts.

What I do (and I guess others as well), is to attempt to write my code so the CPU has as much work to do between blits as possible. That way I can run the polling loop at the last possible moment (meaning directly prior to setting the Blitter registers for the next blit).

*) or poll + nasty, which is kinda the same thing.
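For the nasty-mode option in the list above, a small sketch of the values involved. BLTPRI is bit 10 of DMACON and bit 15 of the written word selects set or clear. The function name is hypothetical; the returned word is what you would write to DMACON ($dff096).

```c
#include <stdint.h>

#define DMACON_SETCLR 0x8000u  /* bit 15: 1 = set the listed bits, 0 = clear them */
#define DMACON_BLTPRI 0x0400u  /* bit 10: Blitter priority ("nasty" / BLTHOG) */

/* Word to write to DMACON to switch nasty mode on or off. */
static uint16_t dmacon_nasty_word(int enable)
{
    return enable ? (uint16_t)(DMACON_SETCLR | DMACON_BLTPRI)
                  : (uint16_t)DMACON_BLTPRI;
}
```

Switching nasty mode on does not replace the wait loop; it still has to run before the next set of Blitter register writes.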

dmacon · 08 March 2019, 13:05

Quote:

Originally Posted by roondar

Use the Copper to set up & run blits

And with that, only in BLTHOG mode, because otherwise, CPU access cycles to Chip-Ram would cause non-deterministic execution times of the copperlist, leading to display corruption when the mandatory initialization of the bitplane pointers overlaps with the actual display phase.

Using the copper would actually have some nice advantages, if you design your graphics pipeline so that every blit finishes within a single frame update, possibly with just a single buffer. With a double-buffered copperlist, the CPU need not waste precious cycles constantly polling flags or serving interrupt requests after each blit.

roondar · 08 March 2019, 13:25

Quote:

Originally Posted by dmacon

And with that, only in BLTHOG mode, because otherwise, CPU access cycles to Chip-Ram would cause non-deterministic execution times of the copperlist, leading to display corruption when the mandatory initialization of the bitplane pointers overlaps with the actual display phase.

I think I understand what you mean, but I want to clarify to be sure.

So what you mean is not that Copper DMA is interrupted by the CPU (because that never happens), but that using the CPU during blits can lead to the Copper wait on the Blitter causing problems by not executing at the time you expect.

Is that correct?

Because if that is what you mean I agree this is something to watch out for. I don't think you really need to use BLTHOG mode for it to be fixed though, you can probably (in most cases anyway) also get away with deferring the Copper wait for the last blit before the display setup until after the display setup is done*.

Obviously, BLTHOG is the easier option here.

Quote:

Using the copper would actually have some nice advantages, if you design your graphics pipeline so that every blit finishes within a single frame update. With a double-buffered copperlist, the CPU need not waste precious cycles constantly polling flags or serving interrupt requests after each blit.

This is true, though if you use the Copper then you have to set up the Copper blits as well. This essentially means that the CPU writes the register values into the Copper list and the Copper then writes the registers. This would be additional overhead compared to the CPU setting registers directly.

I'm pretty convinced it'll still be faster than a pure CPU based approach (as you save either the polling loop or the interrupt overhead), but writing these updates is obviously not free either.

*) if this is not enough to make sure no timing issues occur there are still other options. Such as not waiting at all with the Copper and instead starting blits at specific raster locations (based on the worst case scenario time the blits take).
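To make the "CPU writes the values into the Copper list, Copper writes the registers" part concrete, here is a rough C sketch of a plain A to D copy blit expressed as Copper instructions. The function names and the choice of blit are hypothetical, only the register offsets are the documented ones. It assumes the CDANG bit in COPCON has been set, since otherwise the Copper is not allowed to write the Blitter registers.

```c
#include <stdint.h>

/* Blitter register offsets from $dff000. */
enum {
    BLTCON0 = 0x040, BLTCON1 = 0x042, BLTAFWM = 0x044, BLTALWM = 0x046,
    BLTAPTH = 0x050, BLTAPTL = 0x052, BLTDPTH = 0x054, BLTDPTL = 0x056,
    BLTSIZE = 0x058, BLTAMOD = 0x064, BLTDMOD = 0x066
};

/* Append one Copper MOVE (register offset, value); returns the next free slot. */
static uint16_t *cop_move(uint16_t *cl, uint16_t reg, uint16_t value)
{
    *cl++ = reg;
    *cl++ = value;
    return cl;
}

/* Hypothetical A->D copy blit. src/dst are chip RAM addresses,
   size is the ready-made BLTSIZE word. Emits 24 words. */
static uint16_t *cop_blit_copy(uint16_t *cl, uint32_t src, uint32_t dst,
                               uint16_t size)
{
    *cl++ = 0x0001;                       /* WAIT, position compare masked off... */
    *cl++ = 0x0000;                       /* ...BFD clear: wait for the Blitter only */
    cl = cop_move(cl, BLTCON0, 0x09f0);   /* use A and D, minterm D = A */
    cl = cop_move(cl, BLTCON1, 0x0000);
    cl = cop_move(cl, BLTAFWM, 0xffff);   /* no first/last word masking */
    cl = cop_move(cl, BLTALWM, 0xffff);
    cl = cop_move(cl, BLTAPTH, (uint16_t)(src >> 16));
    cl = cop_move(cl, BLTAPTL, (uint16_t)(src & 0xffffu));
    cl = cop_move(cl, BLTDPTH, (uint16_t)(dst >> 16));
    cl = cop_move(cl, BLTDPTL, (uint16_t)(dst & 0xffffu));
    cl = cop_move(cl, BLTAMOD, 0x0000);
    cl = cop_move(cl, BLTDMOD, 0x0000);
    cl = cop_move(cl, BLTSIZE, size);     /* writing BLTSIZE starts the blit */
    return cl;
}
```

That is 24 words of list for one blit, which is the extra overhead mentioned above: the CPU fills them in once, the Copper then replays them every frame.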

ross · 08 March 2019, 13:36

Quote:

Originally Posted by dmacon

And with that, only in BLTHOG mode, because otherwise, CPU access cycles to Chip-Ram would cause non-deterministic execution times of the copperlist, leading to display corruption when the mandatory initialization of the bitplane pointers overlaps with the actual display phase.

This sentence does not make much sense to me, but it may be that I did not understand it.

How can "CPU cycles to Chip-Ram" cause non-deterministic execution of the copperlist?
If there is one deterministic thing, it is Copper behavior.

The only difference with BLTHOG active is that you may have to wait longer than expected with the Copper for the Blitter finished bit, in which case some fundamental initializations (bitplane pointers, screen splits, sprites) may not happen in time, causing screen glitches.
But, as usual, just build the copper list appropriately.

Maybe you meant this.

arcanist · 08 March 2019, 13:42

Quote:

Originally Posted by roondar

This is true, though if you use the Copper then you have to set up the Copper blits as well. This essentially means that the CPU writes the register values into the Copper list and the Copper then writes the registers. This would be additional overhead compared to the CPU setting registers directly.

One advantage a copperlist has here is that the CPU only needs to update the values which have changed. e.g. I used this for blitting a score to the screen and only update the source address each frame.

When working with the blitter directly from the CPU all of the relevant blitter registers have to be written.
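A sketch of what that per-frame update can look like, assuming the list contains the usual pair of MOVEs for a pointer register (high word first, then low word). The function name is hypothetical.

```c
#include <stdint.h>

/* Patch the data words of a pointer MOVE pair in a Copper list.
   pth_move points at the register word of the xxxPTH MOVE; the
   matching xxxPTL MOVE is assumed to follow it directly. */
static void cop_patch_pointer(uint16_t *pth_move, uint32_t chip_addr)
{
    pth_move[1] = (uint16_t)(chip_addr >> 16);      /* data of the PTH move */
    pth_move[3] = (uint16_t)(chip_addr & 0xffffu);  /* data of the PTL move */
}
```

Two word writes per frame, and everything else in the blit setup stays as it was.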

roondar · 08 March 2019, 13:58

Quote:

Originally Posted by arcanist

One advantage a copperlist has here is that the CPU only needs to update the values which have changed. e.g. I used this for blitting a score to the screen and only update the source address each frame.

When working with the blitter directly from the CPU all of the relevant blitter registers have to be written.

This is true, though the Copper will still have to write all blitter registers regardless so purely in terms of registers written you're not going to see a massive advantage*.

But in terms of overall overhead you'll definitely see advantages like this. Being able to skip some/most/all of the calculations etc for a blit after the first time you do a blit is a fairly big advantage after all

Which is why I do believe it'll be faster to use the Copper even though you're actually doing more CPU+Copper 'Blitter register content' writes in total.

*) though it's slightly easier to 'cheat' by selectively setting registers between blits in the same frame when using the Copper vs the CPU.

dmacon · 08 March 2019, 14:01

Quote:

Originally Posted by ross

This sentence does not make much sense to me, but it may be that I did not understand it

How can "CPU cycles to Chip-Ram" cause non-deterministic execution of the copperlist?
If there is a deterministic thing it is Copper behavior.

Yes, but not the copper waiting for the blitter to finish blits which are delayed by CPU accesses to Chip-Ram.

Quote:

But, as usual, just build the copper list appropriately

But then you need to always take the worst case assumption for blitter timing interrupted by CPU (each 3rd blitter cycle is granted to the CPU when BLTHOG is not set, and CPU attempts to access Chip-Ram / registers).

When BLTHOG is set, you could even closely interleave Copper initiated blit cycles with other synchronized modifications to the display output.

I'm thinking about simulating a sprite engine with the copper in a way that the copperlist becomes the sprite attribute table, with a constant 50Hz update refresh, using a single buffer.
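As a rough worked estimate of that worst case: assume the Blitter hands one bus slot to the CPU after every run of three of its own slots (the figure quoted in this thread, so treat it as an assumption) and that the CPU takes every slot it is offered. The function name is hypothetical, and Copper or display DMA is not modelled.

```c
#include <stdint.h>

/* Bus slots a blit occupies in the worst case, given the number of
   slots the Blitter itself needs. One slot is yielded to the CPU
   after each run of three Blitter slots (assumption, see above). */
static uint32_t worst_case_slots(uint32_t blitter_slots)
{
    if (blitter_slots == 0)
        return 0;
    /* No yield is needed after the final run, hence the minus one. */
    return blitter_slots + (blitter_slots - 1) / 3;
}
```

So a blit needing 6 slots stretches to 7 and one needing 7 stretches to 9, roughly a third longer for large blits. That is the margin a Copper list would have to allow for when BLTHOG is off.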

roondar · 08 March 2019, 14:07

Quote:

Originally Posted by dmacon

Yes, but not the copper waiting for the blitter to finish blits which are delayed by CPU accesses to Chip-Ram.

But then you need to always take the worst case assumption for blitter timing interrupted by CPU (each 3rd blitter cycle is granted to the CPU when BLTHOG is not set, and CPU attempts to access Chip-Ram / registers).

When BLTHOG is set, you could even closely interleave Copper initiated blit cycles with other synchronized modifications to the display output.

I'm thinking about simulating a sprite engine with the copper in a way that the copperlist becomes the sprite attribute table, with a constant 50Hz update refresh, and using a single buffer.

For extra fun, I've considered a Copper Blitter 'engine' with flexible 'blithogging' (i.e. turn it on and off several times a frame or at specific times).

That would allow for really large blits without interfering with Protracker playback (I've not tested how sensitive Protracker playback is to latency, but I'd guess it might be audible if a half-a-frame blit delays the PT interrupt by 100+ raster lines), or give the CPU some room for other stuff to do.
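A sketch of how such switching could be laid out in a Copper list: wait for a raster line, then write DMACON to set or clear BLTPRI. The function name is hypothetical, and only lines 0..255 are handled here; anything below that on screen needs the usual extra wait, which this leaves out.

```c
#include <stdint.h>

#define DMACON_REG 0x096u  /* DMACON write register, offset from $dff000 */

/* Emit "wait for raster line, then switch BLTPRI" (4 words). */
static uint16_t *cop_nasty_at(uint16_t *cl, uint8_t line, int enable)
{
    *cl++ = (uint16_t)((line << 8) | 0x0001u);  /* WAIT for the start of the line */
    *cl++ = 0xfffe;                             /* plain WAIT, Blitter state ignored */
    *cl++ = DMACON_REG;                         /* MOVE to DMACON */
    *cl++ = enable ? 0x8400 : 0x0400;           /* set or clear BLTPRI */
    return cl;
}
```

Calling it once with enable set and once more with enable cleared for a later line gives a window of nasty mode inside the frame.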

ross · 08 March 2019, 14:55

Quote:

Originally Posted by dmacon

Yes, but not the copper waiting for the blitter to finish blits which are delayed by CPU accesses to Chip-Ram.

Where did you get that idea?
The BFD bit simply adds a further condition to the video position for the execution of the next instruction. The setting of the blitter finished bit (which triggers the BFD condition in a Copper wait) has nothing to do with CPU accesses to chip RAM.

Or I still do not understand what you mean

dmacon · 08 March 2019, 15:06

Quote:

Originally Posted by ross

Where did you get that idea?
The BFD bit simply adds a further condition to the video position for the execution of the next instruction.

Exactly. So you modify the raster comparison part of the wait instruction to always yield a true condition (easy, just wait for a position which has already passed), and clear the BFD bit. That way, the copper wait timing is determined solely by the timing of the blitter operation.
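The encoding of such a wait, as a small C sketch. The function name is hypothetical. Bit 15 of the second word is BFD: set means the Blitter is ignored, clear means the Copper also waits for the blit to finish. Position 0,0 has always been passed, so with BFD clear only the Blitter holds the Copper up.

```c
#include <stdint.h>

/* Encode a Copper WAIT into two words.
   vp/hp: beam position to wait for (bit 0 of hp is dropped).
   wait_blitter: nonzero clears BFD, so the Copper also waits for the blit. */
static void cop_wait(uint16_t out[2], uint8_t vp, uint8_t hp, int wait_blitter)
{
    out[0] = (uint16_t)((vp << 8) | (hp & 0xfeu) | 0x0001u);
    out[1] = (uint16_t)(wait_blitter ? 0x7ffeu : 0xfffeu);
}
```

cop_wait(w, 0, 0, 1) gives $0001,$7ffe, which is the "position already passed, BFD clear" case described above.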

Quote:

The setting of the blitter finished bit (which triggers the BFD condition in a Copper wait) has nothing to do with CPU accesses to chip RAM.

If BLTHOG is not set, then yes.

Because then, the blitter timing also depends on CPU accesses to chip-ram. This means the copper could be delayed so that display-update register writes end up in the active display period, causing glitches in the output.

The delay accumulates if you chain multiple blit set-ups with the copper in this way.

Quote:

Or I still do not understand what you mean

Maybe you will now.

I know that at least the famous Hard-Wired demo uses the copper in this manner during the "Glenz Vector" part. All blits which draw the multi-faced polygon figure are initiated by the copper using the wait instruction in the way I described above.

dmacon · 08 March 2019, 15:19

Quote:

Originally Posted by roondar

That would allow for really large blits without interfering with Protracker playback (I've not tested how sensitive Protracker playback is to latency, but I'd guess it might be audible if a half-a-frame blit delays the PT interrupt by 100+ raster lines), or give the CPU some room for other stuff to do.

Some people might even like the slight unevenness in update rate by a few ms, somewhat mimicking real human behavior. Can a human being achieve a playback timing with <10ms error margin???

ross · 08 March 2019, 15:43

@dmacon: I'll give up

We are probably using different words for the same concepts.

Cheers!

roondar · 08 March 2019, 15:47

Quote:

Originally Posted by dmacon

Some people might even like the slight unevenness in update rate by a few ms, somewhat mimicking real human behavior. Can a human being achieve a playback timing with <10ms error margin???

Well, I haven't tested any of this, but Protracker plays samples. If starting (or stopping for that matter) of a sample is off by enough for it to be audible, I'm not so sure the effect would be as graceful as a human musician being off by 10ms or so.

I'd guess that you could potentially get either clicks or pops, or audible repetition of parts of the sample playing until it changes into either silence or the next sample.

This on top of any other timing related distortions such as notes starting slightly too late (which probably wouldn't be very notable and would be similar to what human musicians probably end up doing).

It's not that I want to be negative, more that my experiences with slight errors in digital playback are that it usually doesn't sound nice

dmacon · 08 March 2019, 16:21

Quote:

Originally Posted by ross

@dmacon: I'll give up

We are probably using different words for the same concepts.

One more try.
Let's go back to the following statement:

Quote:

The setting of the blitter finished bit (which triggers the BFD condition in a Copper wait) has nothing to do with CPU accesses to chip RAM.

This is not true, because if you disable BLTHOG, the time at which the BFD condition becomes true depends on how often the CPU accesses chip-ram during the blit.

ross · 08 March 2019, 16:36

Quote:

Originally Posted by dmacon

One more try.

You anticipated me.
I was about to write that I understood what you meant.

In practice I had taken what you wrote literally, namely that the blitter finished bit was [directly] influenced by CPU accesses to chip mem, whereas you meant that the blit itself takes longer (if BLTPRI is not set) [indirectly, because of CPU accesses], so BBUSY is cleared later and Copper operations can be delayed cumulatively.

Right?

EDIT: when one reads quickly while busy with something else, one tends not to think too hard about what is written.

In my code I have always optimized the use of the BLTPRI bit, and I have often exploited the different behavior the Blitter gives you when using different DMA channels for the same operation (see the blitter cycle diagram in the HRM).

Setting BLTPRI=1 is certainly a good choice if you drive the blitter with the copper, but even the worst-case scenario with BLTPRI=0 is not so bad.
It depends on what useful work you have for the CPU.
