Copper blitwait safety

Paradroid · 16 November 2012, 19:10

My first post, hello! :-)

I remember that because of some ye olde hardware we need to test the BBUSY bit of DMACONR twice to make sure the blitter isn't doing anything before updating its registers, but I was wondering if the copper also had to test the status twice or was it just a problem with the way the CPU had been hooked up?

EDIT: asking because I can recover some much needed chipram if I don't need to wait twice.

Galahad/FLT · 16 November 2012, 21:00

Quote:

Originally Posted by Paradroid

My first post, hello! :-)

I remember that because of some ye olde hardware we need to test the BBUSY bit of DMACONR twice to make sure the blitter isn't doing anything before updating its registers, but I was wondering if the copper also had to test the status twice or was it just a problem with the way the CPU had been hooked up?

EDIT: asking because I can recover some much needed chipram if I don't need to wait twice.

I was under the impression it was a bug in Kickstart 1.2 machines, after that it was fixed.

Toni Wilen · 16 November 2012, 21:18

It is an A1000 Agnus hardware bug. It was fixed in "fat" Agnus and later versions.

The workaround does not require a DMACONR access: any chip RAM or custom register access before the "real" DMACONR access works (same bus, same timing).

Bug can only trigger if code runs in real fast ram (or CPU code cache if available) and there is heavy enough Agnus DMA activity that prevents blitter to get any free DMA slots between BLTSIZE write and DMACONR read.

Anyway, I don't think copper's blitter wait is affected.
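
A minimal sketch of a CPU-side blitter wait that follows the workaround described above. It assumes the custom chip base $dff000 is loaded into a6 and that DMACONR is defined as offset $002; the label names are made up for illustration. The first access is the dummy one and its result is ignored, only the looped test is trusted.

Code:

DMACONR		equ	$002		; assumed offset from the custom chip base

BlitWait:
		lea	$dff000,a6	; custom chip base (assumption)
		tst.w	DMACONR(a6)	; dummy custom register access, result ignored
.wait		btst	#6,DMACONR(a6)	; bit 6 of the high byte = bit 14 (BBUSY) of DMACONR
		bne.b	.wait		; loop while the blitter is still busy
		rts

Since any chip bus access counts as the dummy access, the tst.w could in principle be replaced by some other chip RAM or custom register access the program needs to make anyway.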

Paradroid · 17 November 2012, 08:33

Thanks for the replies

I knew the A1000 had the problem, but wasn't sure if it applied to the earliest A500s too. My own 1.2 machine was fine, but information was hard to come by back then, so I always played it safe. So all A500s had a Fat Agnus from the very beginning?

Quote:

Originally Posted by Toni Wilen

I don't think copper's blitter wait is affected.

Now that I know what the problem was, I agree.

That's an extra 512bytes of chip ram I can claw back, weeeeee!
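
For reference, a single copper blitter wait could look something like the sketch below. It assumes the standard copper WAIT encoding, in which bit 15 of the second word is the blitter-finished-disable bit: leaving it clear makes the copper hold until the blitter is idle. The label and the BLTSIZE value are placeholders, and the copper danger bit in COPCON has to be set before the copper is allowed to write blitter registers.

Code:

CopperBlit:
		dc.w	$0001,$0000	; WAIT: all position compare bits masked off, BFD clear = wait for blitter only
		dc.w	$0058,$0401	; MOVE to BLTSIZE ($058): placeholder size, 16 lines x 1 word
		dc.w	$ffff,$fffe	; end of copper list

Each WAIT costs 4 bytes of chip RAM, so dropping a second wait in front of every copper-driven blit adds up quickly in a long list.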

mark_k · 17 November 2012, 11:20

I believe the early German-designed A2000s (which had 512KB RAM on-board, the other 512KB in a CPU slot expansion card) also used the original DIP Agnus, so they will have the problem/bug too.

TheDarkCoder · 22 November 2012, 12:45

Quote:

Originally Posted by mark_k

I believe the early German-designed A2000s (which had 512KB RAM on-board, the other 512KB in a CPU slot expansion card) also used the original DIP Agnus, so they will have the problem/bug too.

I have such a machine. Could someone help me write some example code that shows the bug?

Toni Wilen · 23 November 2012, 21:43

If the bug works as documented (blit busy bit is set when blitter does first DMA access), it should be easy to trigger if first cycle in blitter cycle diagram is idle (=any channel combination without A enabled).

Blit with no channels active may be also good test case. No dma activity at all but it still takes 2 cycles * width * height to complete.

Test code could be something like this:

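lea $dff058,a0   ; a0 = BLTSIZE
lea $dff002,a1   ; a1 = DMACONR

move.w d0,(a0)   ; start the blit (d0 = blit size)
move.w (a1),d1   ; read immediately after the BLTSIZE write
move.w (a1),d2
move.w (a1),d3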

d1 should not have blit busy set but d2 and d3 should.

TheDarkCoder · 24 November 2012, 16:48

Quote:

Originally Posted by Toni Wilen

If the bug works as documented (blit busy bit is set when blitter does first DMA access), it should be easy to trigger if first cycle in blitter cycle diagram is idle (=any channel combination without A enabled).

Blit with no channels active may be also good test case. No dma activity at all but it still takes 2 cycles * width * height to complete.

Test code could be something like this:

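lea $dff058,a0   ; a0 = BLTSIZE
lea $dff002,a1   ; a1 = DMACONR

move.w d0,(a0)   ; start the blit (d0 = blit size)
move.w (a1),d1   ; read immediately after the BLTSIZE write
move.w (a1),d2
move.w (a1),d3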

d1 should not have blit busy set but d2 and d3 should.

I will try.
BTW: Is there some known program which, due to the bug, does not work properly on Amigas with a buggy Agnus?

mark_k · 24 November 2012, 20:42

Quote:

Originally Posted by Toni Wilen

Bug can only trigger if code runs in real fast ram (or CPU code cache if available) and there is heavy enough Agnus DMA activity that prevents blitter to get any free DMA slots between BLTSIZE write and DMACONR read.

Could the 68000 instruction prefetch maybe allow the bug to show up with your move.w d0,(a0) / move.w (a1),d1 test code, even when running from chip RAM with 68000 CPU??

Quote:

Originally Posted by TheDarkCoder

BTW: Is there some known program which, due to the bug, does not work properly on Amigas with a buggy Agnus?

Probably many games don't bother to do it properly. In the docs for various WHDLoad installers there are mentions of fixing blitter waits. I'm not sure how many of them are just inserting blit waits which were not in the original code, or whether the installer devs also fix blit waits which don't check the bit twice.

I seem to remember the blitter wait code in Carrier Command not checking it twice. But that code was always running from chip RAM so maybe the issue wouldn't show up in practice. The same would apply to most games which tend to always run from chip RAM.

I guess you could try to SetFunction() WaitBlit() on your system to remove the double-check and see if you notice any problems with OS-legal programs? Problems would be more likely to show up if you have an accelerator card with fast RAM.
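
A rough sketch of the SetFunction() experiment suggested above. GfxBase and OldWaitBlit are hypothetical variables in your own program, and SingleWaitBlit is a made-up replacement routine that deliberately leaves out the dummy first test. It assumes graphics.library is already open and uses the usual vector offsets (-420 for exec's SetFunction, -228 for WaitBlit).

Code:

_LVOSetFunction	equ	-420		; exec.library vector offset
_LVOWaitBlit	equ	-228		; graphics.library vector offset

PatchWaitBlit:
		move.l	4.w,a6			; ExecBase
		move.l	GfxBase,a1		; library to patch (hypothetical variable)
		move.l	#_LVOWaitBlit,a0	; vector offset of the function to replace
		move.l	#SingleWaitBlit,d0	; new entry point
		jsr	_LVOSetFunction(a6)	; returns the old entry point in d0
		move.l	d0,OldWaitBlit		; keep it so the patch can be undone later
		rts

SingleWaitBlit:
.loop		btst	#6,$dff002		; BBUSY in DMACONR, no dummy read first
		bne.b	.loop
		rts

OldWaitBlit:	ds.l	1			; storage for the original vector
GfxBase:	ds.l	1			; filled in by OpenLibrary() elsewhere

A patch like this only affects callers that go through the library vector, so it should be undone with a second SetFunction() call using the saved pointer before the program exits.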

Toni Wilen · 25 November 2012, 09:38

Quote:

Originally Posted by mark_k

Could the 68000 instruction prefetch maybe allow the bug to show up with your move.w d0,(a0) / move.w (a1),d1 test code, even when running from chip RAM with 68000 CPU??

It is possible. Also, all blits have 2 idle cycles at startup.

Quote:

I seem to remember the blitter wait code in Carrier Command not checking it twice. But that code was always running from chip RAM so maybe the issue wouldn't show up in practice. The same would apply to most games which tend to always run from chip RAM.

I agree, and most programs have some code between the write to BLTSIZE and the inline blitter wait.

Quote:

I guess you could try to SetFunction() WaitBlit() on your system to remove the double-check and see if you notice any problems with OS-legal programs? Problems would be more likely to show up if you have an accelerator card with fast RAM.

The jsr WaitBlit(a6) instruction is slow enough to prevent the bug from triggering, unless you have an accelerator and real fast RAM. (68000: 2 word writes to the stack + 2 prefetches before normal execution continues)

IMHO a bigger problem with the DIP Agnus is vblank triggering one scanline too late (line 1, not 0).
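
For anyone following along, the OS-legal call discussed above looks roughly like this. GfxBase is a hypothetical variable holding the base returned by OpenLibrary("graphics.library"), and -228 is the usual WaitBlit vector offset.

Code:

_LVOWaitBlit	equ	-228			; graphics.library vector offset

		move.l	GfxBase,a6		; library base (hypothetical variable)
		jsr	_LVOWaitBlit(a6)	; returns once the OS considers the blitter idle

OS-friendly code would normally bracket its own blitter use with OwnBlitter() and DisownBlitter() as well.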

mark_k · 25 November 2012, 17:51

If anyone's interested, I looked at the WaitBlit() code in several Kickstart versions.

KickStart v27.6 WaitBlit() looks like this. The graphics.library routines call InternalWaitBlit directly.

Code:

InternalWaitBlit:
		NOP
1$		BTST	#DMAB_BLTDONE-8,(_custom+dmaconr).L
		BNE.B	1$
		RTS
...
WaitBlit:	MOVE.L	A6,-(SP)
		JSR	(InternalWaitBlit).L
		LEA	(4,SP),SP
		RTS

Apart from the initial NOP, there doesn't seem to be anything to work around the Agnus bug.

The WaitBlit() code in Kickstart 1.0 was changed, so obviously someone at Commodore-Amiga was aware of the problem back then. The Kickstart 1.0 WaitBlit() code looks like this. Again, functions in graphics.library call InternalWaitBlit directly, whereas user programs calling (_LVOWaitBlit,A6) jump to WaitBlit.

Code:

WaitBlit:	MOVE.L	A6,-(SP)
		JSR	(InternalWaitBlit).L
		LEA	(4,SP),SP
		RTS
...
InternalWaitBlit:
		BTST	#DMAB_BLTDONE-8,(_custom+dmaconr).L
		BNE.B	1$
		RTS

1$		NOP
		NOP
		BTST.B	#DMAB_BLTDONE-8,(_custom+dmaconr).L
		BNE.B	1$
		RTS

For Kickstart 1.1 WaitBlit was changed again. The unnecessary stub was removed. graphics.library routines still call it directly instead of going via the vector in GfxBase.

Code:

WaitBlit:	BTST	#DMAB_BLTDONE-8,(_custom+dmaconr).L
		BTST	#DMAB_BLTDONE-8,(_custom+dmaconr).L
		BNE.B	1$
		RTS

1$		NOP
		NOP
		BTST	#DMAB_BLTDONE-8,(_custom+dmaconr).L
		BNE.B	1$
		RTS

Note the extra initial/dummy test of dmaconr. The NOPs are used to reduce the impact of CPU accesses to the custom chip bus. Does repeatedly checking dmaconr actually slow the blit down in some cases?

WaitBlit in Kickstart 1.2/1.3 is identical to that in 1.1. I didn't look at WaitBlit in Kickstart 2.0 to 3.0, but the version in the 3.1 (v40.70) ROM looks like this:

Code:

WaitBlit:	TST.B	(_custom+dmaconr).L
		BTST	#DMAB_BLTDONE-8,(_custom+dmaconr).L
		BNE.B	1$
		RTS

1$		TST.B	(_ciaa).L
		TST.B	(_ciaa).L
		BTST	#DMAB_BLTDONE-8,(_custom+dmaconr).L
		BNE.B	1$

		TST.B	(_custom+dmaconr).L
		RTS

Graphics.library routines still call WaitBlit() directly instead of going via the GfxBase vector in v40.70.

The initial BTST #DMAB_BLTDONE-8,(_custom+dmaconr).L has been replaced by TST.B (_custom+dmaconr).L. That's probably just to save two bytes and execute slightly faster. Also instead of NOPs in the main loop there are two TST.B (_ciaa).L instructions. Probably the two NOPs could execute too fast on some CPUs? Reading (_ciaa).L was probably done in order to stay off the custom chip bus.

I wonder if any of the various changes to WaitBlit() over the years work around any other Agnus bugs?

Edit: The final TST.B (_custom+dmaconr).L in the v40.70 WaitBlit() is probably to work around the bug mentioned in the WaitBlit autodoc:

Quote:

Because of a different bug in Agnus (currently all revisions thru ECS) this code may return too soon when the blitter has, in fact, not stopped the blit yet, even though blitter busy has been cleared.

Toni Wilen · 25 November 2012, 19:20

Quote:

Originally Posted by mark_k

Note the extra initial/dummy test of dmaconr. The NOPs are used to reduce the impact of CPU accesses to the custom chip bus. Does repeatedly checking dmaconr actually slow the blit down in some cases?

Yes, custom chipset registers are shared with the chip RAM bus. The blitter will give one cycle to the CPU if the blitter nasty bit is not set and the CPU has been waiting >=3 cycles for chip bus access.

Toni Wilen · 26 November 2012, 16:07

Quote:

Originally Posted by mark_k

Edit: The final TST.B (_custom+dmaconr).L in the v40.70 WaitBlit() is probably to work around the bug mentioned in the WaitBlit autodoc:

This may be related to blitter behavior I have noticed: Any blitter register (including BLTSIZE!) can be modified before final D write without any side-effects and it seems blit busy bit gets cleared (and also blitter interrupt flag set) before final D write. I think some demo required it or timing would not be correct.

For example normal D=A copy cycle diagram: A-ADADAD-d (Blitter has 2 stage pipeline, sources are always read twice before first write).

Blitter busy bit is cleared and interrupt triggered after 'D' write has finished, before blit is completely finished. ('d' write done)

I don't know if it is fixed in AGA. Testing it requires a logic analyzer, and I haven't bothered with AGA SMD chips and circuit boards.

Mrs Beanbag · 30 November 2012, 15:20

I always set blitter nasty while I'm waiting, there's no point clogging up the bus by polling the blitter status while it's running, it will only slow it down. The next attempt to read DMAConR will force the CPU to wait for the blitter anyway (although I still do a loop, for safety, but in theory it should always exit straight away).

Not that this helps you if you're using the Copper to do blits...

phx · 07 December 2012, 12:53

Quote:

Originally Posted by Mrs Beanbag

I always set blitter nasty while I'm waiting, there's no point clogging up the bus by polling the blitter status while it's running, it will only slow it down. The next attempt to read DMAConR will force the CPU to wait for the blitter anyway (although I still do a loop, for safety, but in theory it should always exit straight away).

So theoretically any access to chip RAM or custom chip registers should block, until the Blitter is done? Even when the code is running in Fast RAM?
Can somebody confirm that?

Toni Wilen · 07 December 2012, 21:00

Quote:

Originally Posted by phx

So theoretically any access to chip RAM or custom chip registers should block, until the Blitter is done? Even when the code is running in Fast RAM?
Can somebody confirm that?

Yes, in theory but it isn't safe in practice.

Even if blit uses all available DMA slots (for example A=D copy) and blitter nasty set, there are always 3 idle cycles at the start of blit and before final write there is another idle cycle.

phx · 07 December 2012, 23:43

Thanks. Then I need the test for the Blitter-Done flag.

Does a WAITBLIT macro like this make sense?

Code:

        macro   WAITBLIT
        move.w  #$8400,DMACON(a6)   ; set BLTPRI (blitter nasty)
.\@     btst    #6,DMACONR(a6)      ; BBUSY = bit 14 of DMACONR
        bne.b   .\@
        move.w  #$0400,DMACON(a6)   ; clear BLTPRI again
        endm

I'm getting this warning from UAE:

Code:

warning: program is doing blitpri hacks.

mark_k · 10 December 2012, 17:14

Quote:

Originally Posted by mark_k

I believe the early German-designed A2000s (which had 512KB RAM on-board, the other 512KB in a CPU slot expansion card) also used the original DIP Agnus, so they will have the problem/bug too.

And also, the A2000 $C00000 RAM is true fast RAM, so the CPU won't be halted from running code there when the DMAF_BLITHOG bit is set.

TheDarkCoder · 11 December 2012, 12:00

Quote:

Originally Posted by mark_k

And also, the A2000 $C00000 RAM is true fast RAM, so the CPU won't be halted from running code there when the DMAF_BLITHOG bit is set.

I have only done preliminary tests, but so far I have been unable to reproduce the bug.
Maybe only the NTSC Agnuses, which were produced earlier, have the bug?
My A2000A has a PAL Agnus.

mark_k · 11 December 2012, 12:38

That wouldn't be an Agnus bug. The CPU should never be halted when the DMAF_BLITHOG bit is set and a blit is in progress, if it's running code out of true fast RAM (e.g. Zorro II fast RAM or $C00000 RAM on an A2000).

The CPU would be halted in that case if it's running code from $C00000 slow RAM, as with an A500/A501 or B2000, because Agnus treats CPU accesses to slow RAM the same as chip RAM.
