English Amiga Board


Old 16 November 2012, 19:10   #1
Paradroid
Rock Lobster
 
Join Date: Nov 2012
Location: Macclesfield
Age: 49
Posts: 40
Copper blitwait safety

My first post, hello! :-)

I remember that because of some ye olde hardware we need to test the BBUSY bit of DMACONR twice to make sure the blitter isn't doing anything before updating its registers, but I was wondering if the copper also had to test the status twice or was it just a problem with the way the CPU had been hooked up?

EDIT: asking because I can recover some much needed chipram if I don't need to wait twice.

Last edited by Paradroid; 16 November 2012 at 19:11. Reason: more info
Old 16 November 2012, 21:00   #2
Galahad/FLT
Going nowhere
 
Join Date: Oct 2001
Location: United Kingdom
Age: 50
Posts: 8,997
Quote:
Originally Posted by Paradroid View Post
My first post, hello! :-)

I remember that because of some ye olde hardware we need to test the BBUSY bit of DMACONR twice to make sure the blitter isn't doing anything before updating its registers, but I was wondering if the copper also had to test the status twice or was it just a problem with the way the CPU had been hooked up?

EDIT: asking because I can recover some much needed chipram if I don't need to wait twice.
I was under the impression it was a bug in Kickstart 1.2 machines, after that it was fixed.
Old 16 November 2012, 21:18   #3
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,517
It is an A1000 Agnus hardware bug, fixed in "fat" Agnus and later versions.

The workaround does not require a DMACONR access as such: any chip RAM or custom register access before the "real" DMACONR access works (same bus, same timing).

The bug can only trigger if the code runs in real fast RAM (or the CPU code cache, if available) and there is heavy enough Agnus DMA activity to prevent the blitter from getting any free DMA slots between the BLTSIZE write and the DMACONR read.

Anyway, I don't think copper's blitter wait is affected.
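
For reference, a minimal sketch of a CPU-side wait that follows the workaround described above: one throwaway access to a custom register, then the real BBUSY test. The DMACONR offset is the standard one from the hardware includes; the label name and the assumption that a6 already holds $dff000 are only for illustration, not taken from any particular program.

```asm
DMACONR equ     $002            ; offset from the custom chip base $dff000

; Assumes a6 = $dff000.
WaitBlitSafe:
        tst.w   DMACONR(a6)     ; dummy access for old Agnus, result is ignored
.busy   btst    #6,DMACONR(a6)  ; bit 6 of the high byte = BBUSY (bit 14 of DMACONR)
        bne.b   .busy           ; loop while the blitter is busy
        rts
```

On machines without the bug the extra read costs one chip bus access per call and is otherwise harmless.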
Old 17 November 2012, 08:33   #4
Paradroid
Rock Lobster
 
Join Date: Nov 2012
Location: Macclesfield
Age: 49
Posts: 40
Thanks for the replies

I knew the A1000 had the problem, but wasn't sure if it applied to the earliest A500s too. My own 1.2 machine was fine, but information was hard to come by back then, so I always played it safe. So all A500s had a fat Agnus from the very beginning?

Quote:
Originally Posted by Toni Wilen View Post
I don't think copper's blitter wait is affected.
Now that I know what the problem was, I agree.

That's an extra 512 bytes of chip RAM I can claw back, weeeeee!
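
As a rough illustration of where a saving like this comes from: a copper blitter wait is a WAIT instruction with the BFD bit cleared, and every copper instruction takes 4 bytes, so 512 bytes would correspond to 128 duplicated waits (128 * 4 = 512) if that is where all of it comes from. The fragment below is only a sketch. The BLTSIZE value is a placeholder, and it assumes (as suggested in this thread, not verified on hardware) that a single wait is enough for the copper.

```asm
; Copper list fragment. Each dc.w pair is one 4-byte copper instruction.
; Writing blitter registers from the copper needs the CDANG bit set in COPCON.
        dc.w    $0001,$7ffe     ; WAIT for beam position 0,0 (always reached) with BFD=0,
                                ; so in effect it only waits for the blitter to finish
;       dc.w    $0001,$7ffe     ; the duplicated "just in case" wait: the 4 bytes reclaimed
        dc.w    $0058,$0401     ; MOVE to BLTSIZE ($058): placeholder, 16 lines x 1 word
```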
Old 17 November 2012, 11:20   #5
mark_k
Registered User
 
Join Date: Aug 2004
Location:
Posts: 3,343
I believe the early German-designed A2000s (which had 512KB RAM on-board, the other 512KB in a CPU slot expansion card) also used the original DIP Agnus, so they will have the problem/bug too.
Old 22 November 2012, 12:45   #6
TheDarkCoder
Registered User
 
Join Date: Dec 2007
Location: Dark Kingdom
Posts: 213
Quote:
Originally Posted by mark_k View Post
I believe the early German-designed A2000s (which had 512KB RAM on-board, the other 512KB in a CPU slot expansion card) also used the original DIP Agnus, so they will have the problem/bug too.
I have such a machine. Could someone help me write some example code that shows the bug?
Old 23 November 2012, 21:43   #7
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,517
If the bug works as documented (the blit busy bit is set when the blitter does its first DMA access), it should be easy to trigger if the first cycle in the blitter cycle diagram is idle (= any channel combination without A enabled).

A blit with no channels active may also be a good test case. There is no DMA activity at all, but it still takes 2 cycles * width * height to complete.

Test code could be something like this:

a0 = bltsize
a1 = dmaconr

move.w d0,(a0)
move.w (a1),d1
move.w (a1),d2
move.w (a1),d3

If the bug is present, d1 should not have blit busy set, but d2 and d3 should.
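
A slightly fuller sketch of the same test, using the no-channel blit mentioned above. Everything beyond the four moves is an assumption for illustration: the label name, the blit size, and the preconditions that the code owns the blitter, that blitter DMA is enabled, and that it runs from real fast RAM. With a 32-word by 64-line blit the blitter should stay busy for roughly 2 * 32 * 64 = 4096 cycles, which is plenty of time for three reads.

```asm
DMACONR equ     $002
BLTCON0 equ     $040
BLTCON1 equ     $042
BLTSIZE equ     $058

; Returns three consecutive DMACONR samples in d1, d2, d3.
; Assumes the blitter is owned by this code and blitter DMA is on.
BlitBusyTest:
        lea     $dff000,a6
        lea     BLTSIZE(a6),a0          ; a0 = bltsize
        lea     DMACONR(a6),a1          ; a1 = dmaconr
.idle   btst    #6,(a1)                 ; let any earlier blit finish first
        bne.b   .idle
        move.w  #$0000,BLTCON0(a6)      ; no DMA channels enabled
        move.w  #$0000,BLTCON1(a6)
        move.w  #64*64+32,d0            ; height 64 in bits 15-6, width 32 words in bits 5-0
        move.w  d0,(a0)                 ; start the blit
        move.w  (a1),d1                 ; first read right after the BLTSIZE write
        move.w  (a1),d2
        move.w  (a1),d3
        rts
```

Afterwards BBUSY is bit 14 of each sample, so something like btst #14,d1 can be used to check which of the three reads saw the blitter as busy.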
Old 24 November 2012, 16:48   #8
TheDarkCoder
Registered User
 
Join Date: Dec 2007
Location: Dark Kingdom
Posts: 213
Quote:
Originally Posted by Toni Wilen View Post
If the bug works as documented (the blit busy bit is set when the blitter does its first DMA access), it should be easy to trigger if the first cycle in the blitter cycle diagram is idle (= any channel combination without A enabled).

A blit with no channels active may also be a good test case. There is no DMA activity at all, but it still takes 2 cycles * width * height to complete.

Test code could be something like this:

a0 = bltsize
a1 = dmaconr

move.w d0,(a0)
move.w (a1),d1
move.w (a1),d2
move.w (a1),d3

If the bug is present, d1 should not have blit busy set, but d2 and d3 should.
I will try.
BTW: is there any known program which, due to the bug, does not work properly on Amigas with a buggy Agnus?
Old 24 November 2012, 20:42   #9
mark_k
Registered User
 
Join Date: Aug 2004
Location:
Posts: 3,343
Quote:
Originally Posted by Toni Wilen View Post
The bug can only trigger if the code runs in real fast RAM (or the CPU code cache, if available) and there is heavy enough Agnus DMA activity to prevent the blitter from getting any free DMA slots between the BLTSIZE write and the DMACONR read.
Could the 68000 instruction prefetch maybe allow the bug to show up with your move.w d0,(a0) / move.w (a1),d1 test code, even when running from chip RAM with 68000 CPU??

Quote:
Originally Posted by TheDarkCoder View Post
BTW: is there any known program which, due to the bug, does not work properly on Amigas with a buggy Agnus?
Probably many games don't bother to do it properly. In the docs for various WHDLoad installers there are mentions of fixing blitter waits. I'm not sure how many of them are just inserting blit waits which were not in the original code, or whether the installer devs also fix blit waits which don't check the bit twice.

I seem to remember the blitter wait code in Carrier Command not checking it twice. But that code was always running from chip RAM so maybe the issue wouldn't show up in practice. The same would apply to most games which tend to always run from chip RAM.

I guess you could try to SetFunction() WaitBlit() on your system to remove the double-check and see if you notice any problems with OS-legal programs? Problems would be more likely to show up if you have an accelerator card with fast RAM.
Old 25 November 2012, 09:38   #10
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,517
Quote:
Originally Posted by mark_k View Post
Could the 68000 instruction prefetch maybe allow the bug to show up with your move.w d0,(a0) / move.w (a1),d1 test code, even when running from chip RAM with 68000 CPU??
It is possible. Also, all blits have 2 idle cycles at startup.

Quote:
I seem to remember the blitter wait code in Carrier Command not checking it twice. But that code was always running from chip RAM so maybe the issue wouldn't show up in practice. The same would apply to most games which tend to always run from chip RAM.
I agree and most programs have some code between write to bltsize and inline blitter wait.

Quote:
I guess you could try to SetFunction() WaitBlit() on your system to remove the double-check and see if you notice any problems with OS-legal programs? Problems would be more likely to show up if you have an accelerator card with fast RAM.
The jsr WaitBlit(a6) instruction is slow enough to prevent the bug from triggering, unless you have an accelerator and real fast RAM (68000: 2 word writes to the stack + 2 prefetches before normal execution continues).

IMHO a bigger problem with the DIP Agnus is vblank triggering one scanline too late (line 1, not 0).
Old 25 November 2012, 17:51   #11
mark_k
Registered User
 
Join Date: Aug 2004
Location:
Posts: 3,343
If anyone's interested, I looked at the WaitBlit() code in several Kickstart versions.

KickStart v27.6 WaitBlit() looks like this. The graphics.library routines call InternalWaitBlit directly.
Code:
InternalWaitBlit:
		NOP
1$		BTST	#DMAB_BLTDONE-8,(_custom+dmaconr).L
		BNE.B	1$
		RTS
...
WaitBlit:	MOVE.L	A6,-(SP)
		JSR	(InternalWaitBlit).L
		LEA	(4,SP),SP
		RTS
Apart from the initial NOP, there doesn't seem to be anything to work around the Agnus bug.

The WaitBlit() code in Kickstart 1.0 was changed, so obviously someone at Commodore-Amiga was aware of the problem back then. The Kickstart 1.0 WaitBlit() code looks like this. Again, functions in graphics.library call InternalWaitBlit directly, whereas user programs calling (_LVOWaitBlit,A6) jump to WaitBlit.
Code:
WaitBlit:	MOVE.L	A6,-(SP)
		JSR	(InternalWaitBlit).L
		LEA	(4,SP),SP
		RTS
...
InternalWaitBlit:
		BTST	#DMAB_BLTDONE-8,(_custom+dmaconr).L
		BNE.B	1$
		RTS

1$		NOP
		NOP
		BTST.B	#DMAB_BLTDONE-8,(_custom+dmaconr).L
		BNE.B	1$
		RTS
For Kickstart 1.1 WaitBlit was changed again. The unnecessary stub was removed. graphics.library routines still call it directly instead of going via the vector in GfxBase.
Code:
WaitBlit:	BTST	#DMAB_BLTDONE-8,(_custom+dmaconr).L
		BTST	#DMAB_BLTDONE-8,(_custom+dmaconr).L
		BNE.B	1$
		RTS

1$		NOP
		NOP
		BTST	#DMAB_BLTDONE-8,(_custom+dmaconr).L
		BNE.B	1$
		RTS
Note the extra initial/dummy test of dmaconr. The NOPs are used to reduce the impact of CPU accesses to the custom chip bus. Does repeatedly checking dmaconr actually slow the blit down in some cases?

WaitBlit in Kickstart 1.2/1.3 is identical to that in 1.1. I didn't look at WaitBlit in Kickstart 2.0 to 3.0, but the version in the 3.1 (v40.70) ROM looks like this:
Code:
WaitBlit:	TST.B	(_custom+dmaconr).L
		BTST	#DMAB_BLTDONE-8,(_custom+dmaconr).L
		BNE.B	1$
		RTS

1$		TST.B	(_ciaa).L
		TST.B	(_ciaa).L
		BTST	#DMAB_BLTDONE-8,(_custom+dmaconr).L
		BNE.B	1$

		TST.B	(_custom+dmaconr).L
		RTS
Graphics.library routines still call WaitBlit() directly instead of going via the GfxBase vector in v40.70.

The initial BTST #DMAB_BLTDONE-8,(_custom+dmaconr).L has been replaced by TST.B (_custom+dmaconr).L. That's probably just to save two bytes and execute slightly faster. Also instead of NOPs in the main loop there are two TST.B (_ciaa).L instructions. Probably the two NOPs could execute too fast on some CPUs? Reading (_ciaa).L was probably done in order to stay off the custom chip bus.

I wonder whether any of the various changes to WaitBlit() over the years work around any other Agnus bugs?

Edit: The final TST.B (_custom+dmaconr).L in the v40.70 WaitBlit() is probably to work around the bug mentioned in the WaitBlit autodoc:
Quote:
Because of a different bug in Agnus (currently all revisions thru ECS) this code may return too soon when the blitter has, in fact, not stopped the blit yet, even though blitter busy has been cleared.

Last edited by mark_k; 25 November 2012 at 20:20.
Old 25 November 2012, 19:20   #12
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,517
Quote:
Originally Posted by mark_k View Post
Note the extra initial/dummy test of dmaconr. The NOPs are used to reduce the impact of CPU accesses to the custom chip bus. Does repeatedly checking dmaconr actually slow the blit down in some cases?
Yes, the custom chipset registers share the chip RAM bus. The blitter will give one cycle to the CPU if the blitter nasty bit is not set and the CPU has been waiting >= 3 cycles for chip bus access.
Old 26 November 2012, 16:07   #13
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,517
Quote:
Originally Posted by mark_k View Post
Edit: The final TST.B (_custom+dmaconr).L in the v40.70 WaitBlit() is probably to work around the bug mentioned in the WaitBlit autodoc:
This may be related to blitter behavior I have noticed: any blitter register (including BLTSIZE!) can be modified before the final D write without any side effects, and it seems the blit busy bit gets cleared (and the blitter interrupt flag set) before the final D write. I think some demo required it or the timing would not be correct.

For example, the cycle diagram of a normal D=A copy: A-ADADAD-d (the blitter has a 2-stage pipeline; sources are always read twice before the first write).

The blitter busy bit is cleared and the interrupt triggered after the last 'D' write has finished, before the blit is completely finished (that is, before the 'd' write is done).

I don't know if it is fixed in AGA. Testing it requires a logic analyzer, and I haven't bothered with AGA SMD chips and circuit boards.
Old 30 November 2012, 15:20   #14
Mrs Beanbag
Glastonbridge Software
 
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,243
I always set blitter nasty while I'm waiting, there's no point clogging up the bus by polling the blitter status while it's running, it will only slow it down. The next attempt to read DMAConR will force the CPU to wait for the blitter anyway (although I still do a loop, for safety, but in theory it should always exit straight away).

Not that this helps you if you're using the Copper to do blits...
Old 07 December 2012, 12:53   #15
phx
Natteravn
 
Join Date: Nov 2009
Location: Herford / Germany
Posts: 2,500
Quote:
Originally Posted by Mrs Beanbag View Post
I always set blitter nasty while I'm waiting, there's no point clogging up the bus by polling the blitter status while it's running, it will only slow it down. The next attempt to read DMAConR will force the CPU to wait for the blitter anyway (although I still do a loop, for safety, but in theory it should always exit straight away).
So theoretically any access to chip RAM or custom chip registers should block, until the Blitter is done? Even when the code is running in Fast RAM?
Can somebody confirm that?
Old 07 December 2012, 21:00   #16
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,517
Quote:
Originally Posted by phx View Post
So theoretically any access to chip RAM or custom chip registers should block, until the Blitter is done? Even when the code is running in Fast RAM?
Can somebody confirm that?
Yes, in theory but it isn't safe in practice.

Even if the blit uses all available DMA slots (for example an A=D copy) and blitter nasty is set, there are always 3 idle cycles at the start of the blit, and there is another idle cycle before the final write.
Old 07 December 2012, 23:43   #17
phx
Natteravn
 
Join Date: Nov 2009
Location: Herford / Germany
Posts: 2,500
Thanks. Then I need the test for the Blitter-Done flag.

Does a WAITBLIT macro like this make sense?
Code:
        macro   WAITBLIT
        move.w  #$8400,DMACON(a6)
.\@     btst    #6,DMACONR(a6)
        bne.b   .\@  
        move.w  #$0400,DMACON(a6)
        endm
I'm getting this warning from UAE:
Code:
warning: program is doing blitpri hacks.
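
For context, a sketch of how the macro above might be used around a blit. The register offsets are the standard hardware ones; the blit size and the elided setup are placeholders. One hedged observation: if the description earlier in the thread holds (any custom register access before the real DMACONR read is enough on the old Agnus), then the DMACON write at the top of the macro would already serve as that access, but I have not verified this on hardware.

```asm
DMACONR equ     $002
BLTSIZE equ     $058
DMACON  equ     $096

        lea     $dff000,a6
        WAITBLIT                        ; previous blit must be done before touching blitter registers
        ; ... set up BLTCON0/BLTCON1, pointers and modulos here ...
        move.w  #16*64+1,BLTSIZE(a6)    ; placeholder size: 16 lines x 1 word, starts the blit
```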
Old 10 December 2012, 17:14   #18
mark_k
Registered User
 
Join Date: Aug 2004
Location:
Posts: 3,343
Quote:
Originally Posted by mark_k View Post
I believe the early German-designed A2000s (which had 512KB RAM on-board, the other 512KB in a CPU slot expansion card) also used the original DIP Agnus, so they will have the problem/bug too.
And also, the A2000 $C00000 RAM is true fast RAM, so the CPU won't be halted from running code there when the DMAF_BLITHOG bit is set.
Old 11 December 2012, 12:00   #19
TheDarkCoder
Registered User
 
Join Date: Dec 2007
Location: Dark Kingdom
Posts: 213
Quote:
Originally Posted by mark_k View Post
And also, the A2000 $C00000 RAM is true fast RAM, so the CPU won't be halted from running code there when the DMAF_BLITHOG bit is set.
I have only done preliminary tests, but so far I have been unable to reproduce the bug.
Maybe only the NTSC Agnuses, which were produced earlier, have the bug?
My A2000A has a PAL Agnus.
Old 11 December 2012, 12:38   #20
mark_k
Registered User
 
Join Date: Aug 2004
Location:
Posts: 3,343
That wouldn't be an Agnus bug. The CPU should never be halted when the DMAF_BLITHOG bit is set and a blit is in progress, if it's running code out of true fast RAM (e.g. Zorro II fast RAM or $C00000 RAM on an A2000).

The CPU would be halted in that case if it's running code from $C00000 slow RAM, as with an A500/A501 or B2000, because Agnus treats CPU accesses to slow RAM the same as chip RAM.