English Amiga Board


Old 27 November 2018, 00:27   #1
roondar
Blitter interrupt woes. Was: A cautionary tale: trying ... with interrupts

Edit: during the discussion in the thread it's become clear I have not actually fixed the issue. So please read the other posts as well to be clear on what the issue is (my first reply is post 8, which goes into a lot more detail). I suppose that's what you get for coding till late and then writing a post on EAB while tired but overly happy because you think you fixed something difficult.

Sometimes you run into a bug that takes ages to fix and is worthwhile to share, if only to try to help others to not make your really hard to track down mistake. Or is that my mistake in being too smart for my own good? Anyway...

I've been working on a project that contains a Blitter interrupt handler on and off for quite some time now. And all was well, I queued and it blit. Until recently, when my previously apparently rock-solid code started to crash at completely unpredictable times. After three days of banging my head against a wall trying to find the problem, I managed to fix it just now.

My sin? I had tried to get around the double intreq acknowledge for A4000s by instead using a different custom chip access, as follows:
Code:
...
move.w    #$40,intreq(a6)      ; Acknowledge interrupt
move.w    (a0)+,bltsize(a6)    ; Start blitter (forces A4000 compatibility)
<pop stack here>
rte
And this worked fine for many months on my A500 (and WinUAE, where I do my developing)!

Now, very observant readers might note that I've just created a race condition. A very, very rare race condition (considering this runs on an A500), but a race condition indeed. Not me though, I didn't notice.

Well, three days ago I added some completely unrelated code and was forced to update the number of registers on the stack for the Blitter handler. And boom went the program. At random intervals. I've tried fixing nearly every part of the new code, but alas, it was the old code that was to blame.

The moral here is to not try to over-optimise, nor to try and be too smart around interrupts. For reference, here is the now-working and much more mundane version of the above bit (which should not create race conditions).
Code:
...
move.w    (a0)+,bltsize(a6)    ; Start blitter
move.w    #$40,intreq(a6)      ; Acknowledge interrupt
move.w    #$40,intreq(a6)      ; (twice for A4000)
<pop stack here>
rte
Hope this gave some enjoyment or at least some useful knowledge

Last edited by roondar; 27 November 2018 at 12:10. Reason: Updated post to reflect my OP being well, wrong - I did not fix it :S
Old 27 November 2018, 00:42   #2
arcanist
Ouch!

My first stab at a vblank sync routine waited for vpos == 0x12C then vpos != 0x12C. This worked fine except very rarely the display would stutter for quite a few frames. It took me a while to figure out that an almost perfectly frame-synced timer interrupt would drift over time to coincide precisely with that line, so CPU wouldn't see == 0x12C until it drifted further. Then it finally clicked why the examples I'd looked at were testing for >= and then <.
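The ">= and then <" pattern described above can be sketched roughly as follows. This is only an illustration, assuming a PAL display and OCS-style line numbering (V8..V0); the routine name WaitLine12C is hypothetical and does not come from the thread.

```asm
; Hedged sketch: WaitLine12C is a hypothetical name, not code from this thread.
vposr	equ	$004			; VPOSR/VHPOSR read as one longword

WaitLine12C
	lea	$dff000,a6
.wait_ge
	move.l	vposr(a6),d0
	and.l	#$0001ff00,d0		; keep V8..V0 (beam line)
	cmp.l	#$00012c00,d0		; compare with line $12C
	bcs.s	.wait_ge		; line < $12C: keep waiting
.wait_lt
	move.l	vposr(a6),d0
	and.l	#$0001ff00,d0
	cmp.l	#$00012c00,d0
	bcc.s	.wait_lt		; line >= $12C: wait for the wrap
	rts
```

Because both loops test a range rather than one exact line, an interrupt that happens to cover line $12C only delays the first loop's exit; it cannot make the routine miss the frame.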
Old 27 November 2018, 00:46   #3
roondar
It's always the 'almost always works' code that is hard to figure out
Old 27 November 2018, 02:32   #4
phx
To be honest, I'm probably too tired to see the race condition.

Acknowledging the Blitter interrupt will allow new Blitter interrupts to be registered. But there can't be any new interrupts, because you didn't even start the Blitter. That happens with the following instruction.

Also your level 3 interrupt cannot interrupt itself, as long as RTE isn't executed and the interrupt level in SR is lowered.

The old solution with two INTREQ-writes after BLTSIZE even looks more dangerous to me, because a very fast blit could be finished before you cleared its IRQ flag for the second time.
Old 27 November 2018, 07:09   #5
hth313
I am not used to the blitter, so I do not understand if there is something strange going on.

But, I agree with @phx, the solution looks more dangerous to me. If it gets another higher priority interrupt between starting the blitter and acknowledging the interrupt, it could miss the blitter ready interrupt.

Also why do you post increment
(a0)+
when the next thing that happens is to restore
a0
from the stack?
Old 27 November 2018, 09:36   #6
ross
Quote:
Originally Posted by phx View Post
To be honest, I'm probably too tired to see the race condition.
The same here, but maybe because I still have not got a nice strong espresso.
EDIT: no effect from coffee

Quote:
The old solution with two INTREQ-writes after BLTSIZE even looks more dangerous to me, because a very fast blit could be finished before you cleared its IRQ flag for the second time.
And not only this: a higher-level IRQ can occur immediately after the Blitter is started and disrupt the rest of the chain
(the Blitter, meanwhile, could finish before the INTREQ cleanup, and a loop iteration would be lost).
EDIT: lol, before the coffee I had not even seen that hth313 had written the same

This can be a solution:
Code:
move.w    #$4000,intena(a6)    ; Disable
move.w    (a0),bltsize(a6)     ; Start blitter
move.w    #$40,intreq(a6)      ; Acknowledge interrupt
move.w    #$c000,intena(a6)    ; re-Enable and double bus access
<pop stack here>
rte
Anyway, this reasoning only holds if BLTPRI=0 (and for a decent-sized blit); otherwise what phx wrote could happen even in this situation
(but from what I remember you do not set it to 1 if you can avoid it).

EDIT2:
It would be interesting to check if for a single word blit (with BLTPRI=0, considering the prefetch of the blitter and the relative delayed destination write in memory) the BLIT INTREQ bit is set before or after the CPU cleaning in a slow machine (like on A500).

Last edited by ross; 27 November 2018 at 11:15. Reason: better explained, maybe..
Old 27 November 2018, 11:10   #7
meynaf
I can see a potential race condition, but in the new code, not in the old one

The code acknowledges interrupts after setting up the blitter.
If then another, higher priority interrupt, happens between these two instructions (actually three !) and takes a significant amount of time, the blitter will have finished before the interrupt gets acknowledged, missing an interrupt.

I personally would advocate doing the interrupt ack at the start of the interrupt routine, not at the end. If the A4000 problem only requires a second bus access after the ack (I don't know this issue so I'm just guessing), then setting up the blitter registers should easily do the trick.
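A minimal sketch of that ordering, under stated assumptions: BltIHNext and next_blit are hypothetical names, and next_blit must be set by the caller to point at a prepared BLTSIZE word before the interrupt fires (the zero below is only a placeholder). The other blitter registers are assumed to be loaded already.

```asm
; Hedged sketch: BltIHNext and next_blit are hypothetical names.
intreq	equ	$09c
bltsize	equ	$058

BltIHNext
	movem.l	a0/a6,-(sp)
	lea	$dff000,a6
	move.w	#$0040,intreq(a6)	; acknowledge BLIT before anything else
	move.l	next_blit(pc),a0	; pointer to a prepared BLTSIZE word
	move.w	(a0),bltsize(a6)	; start the blit; also a second chip bus write
	movem.l	(sp)+,a0/a6
	rte

next_blit	dc.l	0		; placeholder: caller fills this in
```

With the acknowledge placed before the BLTSIZE write, a blit that finishes while a higher-priority interrupt is running leaves its request bit set, since nothing clears INTREQ after the blit has been started.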
Old 27 November 2018, 11:46   #8
roondar
After reading the comments, I'm seriously beginning to wonder if I actually got this right. I'm also 100% sure that 0:46 isn't the best time to write exuberant posts about how I fixed something (which as it turns out, I may not have actually fixed). As such I have edited the first post a bit to clarify the problem is probably not fixed and encourage reading further

So let's move back a step and discuss...

First off, the easy bit
Quote:
Originally Posted by hth313 View Post
Also why do you post increment
(a0)+
when the next thing that happens is to restore
a0
from the stack?
You are correct, the post increment is not needed. It's only there because I copied that line from all the other registers I set (i.e. there's a bunch of move.w (a0)+ and move.l (a0)+ instructions directly above the bit I showed). I'll clear it out of the source when I redo this part.


---
Now about the part where I thought I had a race condition, but now I'm not sure what I have

First a bit of background: my code runs a number of different interrupts: there's a VBL handler, a Blitter handler as well as interrupts caused by the Pro Tracker player I'm using (which does both level 2 CIA and level 6 CIA interrupts, plus audio interrupts IIRC).

What does not go wrong: the code does not seem to 'miss' any blitter finished interrupts, although phx, hth313, ross and meynaf are correct that this could indeed happen if a higher-level interrupt triggers. This was something I had not yet considered and is indeed a potential race condition here - this will be fixed. However, it's not missing interrupts that was the problem as far as I can see.

What goes wrong: at some point during execution the Blitter interrupt handler is called to handle the next blit, even though the Blitter is still running. This causes memory corruption and crashes because the Blitter gets fed new values while it's running. Not good!

I know this to be true because during my tests I added a 'panic mode' that would halt the machine with a red background if the blitter busy flag was still on when the Blitter queue handler was called. To my great surprise, this actually did happen on rare occasions. This is the bug I've been trying to fix. Some code might help.

The code that branches into the blitter interrupt handler looks something like this (I cut out unwanted elements)
Code:
lev3handler
	movem.l	d0-d7/a0-a6,-(sp)
	lea.l	$dff000,a6

	btst	#6,intreq+1(a6)		; Check for Blitter finished interrupt
	bne	BltIHQueue
<deal with other level 3 interrupts>
The blitter queue handler looks something like this (again, I removed parts).
Code:
BltIHQueue
	; Test if blitter still runs
	btst	#6,dmaconr(a6)
	beq	.cnt
	
.err
	move.w	#$f00,$180(a6)	; Blitter still running! Surely that can't be!?
	bra	.err
	
.cnt
	; Fetch next pointer from queue
	lea.l	blq_ptr(pc),a4
	move.l	(a4),a5
	move.l	(a5)+,a0	; A0 points to current element
	move.l	a5,(a4)		; Step forwards
	
	move.w	(a0)+,d0	; Fetch which handler to use
	jmp	.jmp_table(pc,d0.w)
	
.jmp_table
	... more entries ...
	bra	BltIHBob
	... more entries ...
	
	... and other code ...

BltIHBob
	move.l	(a0)+,bltcon0(a6)
	move.l	(a0)+,bltamod(a6)
	move.l	(a0)+,bltcmod(a6)
	move.l	(a0)+,bltapt(a6)
	move.l	(a0)+,bltbpt(a6)
	move.l	(a0),bltcpt(a6)
	move.l	(a0)+,bltdpt(a6)
	move.w	(a0),bltsize(a6)
	move.w	#$40,intreq(a6)      ; Acknowledge interrupt
	move.w	#$40,intreq(a6)      ; (twice for A4000)
	movem.l	(sp)+,d0-d7/a0-a6
	rte
This is the current code, I've not added the suggestions from the thread yet (apart from changing the last (a0)+ to (a0)). And yes, this code does trigger the 'impossible' test at the beginning.

Triggering the queue is currently done as follows in the VBL interrupt: disable interrupts, verify that the queue and blitter are both done, update queue pointer to the newly created queue, set blitter finished interrupt flag in INTREQ and enable interrupts again. Note here that the current program does not go over a frame and the code is 'raster synced' so it's guaranteed that the initial trigger can only happen after the blitter queue is done (this is also visible in the WinUAE Debugger, the blits are the last thing happening during a frame).
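The trigger sequence described in that paragraph might look roughly like the sketch below. It is not the actual code: StartBlitQueue and new_queue are hypothetical names, blq_done stands in for the 'queue finished' state, and only blq_ptr appears in the handler listing above. It expects a6 = $dff000.

```asm
; Hedged sketch: StartBlitQueue and new_queue are hypothetical names.
dmaconr	equ	$002
intena	equ	$09a
intreq	equ	$09c

StartBlitQueue				; expects a6 = $dff000
	move.w	#$4000,intena(a6)	; clear INTEN: master interrupt disable
	lea	blq_done(pc),a0
	tst.w	(a0)			; previous queue finished?
	beq.s	.exit			; no: leave it alone
	btst	#6,dmaconr(a6)		; BBUSY (bit 14 of DMACONR)
	bne.s	.exit			; Blitter still busy: do not start
	clr.w	(a0)			; mark the queue as running
	lea	blq_ptr(pc),a0
	lea	new_queue(pc),a1
	move.l	a1,(a0)			; point the handler at the new queue
	move.w	#$8040,intreq(a6)	; SET the BLIT request to kick the handler
.exit
	move.w	#$c000,intena(a6)	; set INTEN: interrupts back on
	rts

blq_done	dc.w	1		; non-zero = queue finished
blq_ptr		dc.l	0
new_queue	dc.l	0		; list of pointers to queue elements
```

Note that the software-set request bit is indistinguishable from a real blitter-finished interrupt, so any path that sets it while a blit is in flight would produce exactly the 'handler called while busy' symptom.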

So in essence my problem here is: how to make this work without the handler triggering before a blit is done, and how to make it work even if another interrupt triggers during the 'end phase'.

I'm guessing part of ross's suggestion will be part of this. But I just can't explain how the Blitter can ever still be running when the Blitter finished interrupt is triggered

Quote:
Originally Posted by meynaf View Post
I personally would advocate doing the interrupt ack at the start of the interrupt routine, not at the end. If the A4000 problem only requires a second bus access after the ack (I don't know this issue so I'm just guessing), then setting up the blitter registers should easily do the trick.
This is excellent advice and I will make use of it in the next version. AFAIK the a4k issue is indeed sidestepped by writing to the bus a second time. However, as I wrote above, this should only fix interrupts being missed and that does not seem to be the reason it crashes.

Last edited by roondar; 27 November 2018 at 12:08.
Old 27 November 2018, 12:45   #9
phx
Quote:
Originally Posted by roondar View Post
Triggering the queue is currently done as follows in the VBL interrupt: disable interrupts, verify that the queue and blitter are both done, update queue pointer to the newly created queue, set blitter finished interrupt flag in INTREQ and enable interrupts again.
As I understand it, the Blitter-IRQ flag is set manually here to get the Blitter Queue started? And you are doing it only once, or every time you enqueue a new Blitter job? This could be a problem.

Quote:
Note here that the current program does not go over a frame and the code is 'raster synced' so it's guaranteed that the initial trigger can only happen after the blitter queue is done
Not sure if I understand that. The frame-rendering code starts with VERTB? And how do you make sure the Blitter Queue was completely processed, when adding the first Blit to the queue?

Quote:
But I just can't explain how the Blitter can ever still be running when the Blitter finished interrupt is triggered
As you wrote you are debugging with UAE, can't you just set a memory watch-point on the BLTSIZE register? So you can be sure nothing else triggered the Blitter between your Blitter interrupts.

Last edited by phx; 27 November 2018 at 12:48. Reason: Blitter -> Blitter Queue
Old 27 November 2018, 13:08   #10
roondar
Quote:
Originally Posted by phx View Post
As I understand the Blitter-IRQ flag is set manually here to get the Blitter Queue started? And you are doing it only once, or everytime you enqueue a new Blitter job? This could be a problem.
Currently, once per frame, after I've enqueued all jobs. I've looked at this several times to make sure this is how I do it because you are right, it could be a problem.
Quote:
Not sure if I understand that. The frame-rendering code starts with VERTB? And how do you make sure the Blitter Queue was completely processed, when adding the first Blit to the queue?
The way I do this is by having the last item in the queue point to a 'queue done handler' instead of a blit operation, so the blitter queue handler gets triggered as usual when the last blit is finished and this 'done' handler then updates a flag in memory (blq_done) and acknowledges the interrupt.

If and only if the blq_done flag is set do I consider the queue ended. Naturally, starting the queue resets the flag before triggering the new queue. (other than that and unrelated to the accuracy of my code, the current code + blits finish way before the new VBL occurs so I know it's done when the next VBL comes along)
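A rough sketch of such a 'queue done' entry, assuming it is reached through the same jump table as the blit handlers (so a6 = $dff000 and the registers were saved by lev3handler). BltIHDone is a hypothetical name; blq_done is the flag mentioned above, shown here as a word.

```asm
; Hedged sketch: BltIHDone is a hypothetical name for the 'done' handler.
intreq	equ	$09c

BltIHDone				; a6 = $dff000, registers saved by lev3handler
	lea	blq_done(pc),a0
	move.w	#1,(a0)			; flag: queue fully processed
	move.w	#$0040,intreq(a6)	; acknowledge BLIT
	move.w	#$0040,intreq(a6)	; (twice for A4000)
	movem.l	(sp)+,d0-d7/a0-a6	; matches the movem in lev3handler
	rte

blq_done	dc.w	0		; non-zero = queue finished
```

This entry starts no blit, so no further Blitter interrupt should follow it until the queue is triggered again by software.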
Quote:
As you wrote you are debugging with UAE, can't you just set a memory watch-point on the BLTSIZE register? So you can be sure nothing else triggered the Blitter between your Blitter interrupts.
That could be a solution, but the only thing that should be able to trigger during the blitter interrupts is the ProTracker player and I'm pretty certain it doesn't use the Blitter (though you'd be more certain as I use the one from Solid Gold). Obviously, I could be wrong and it is worth a look. At this point I'll try anything!

The only issue I foresee with this solution is that the issue is so rare. It can take a lot of frames before it goes wrong and if it goes wrong it doesn't usually reoccur immediately. I might still try it, but it might require me pressing the 'g' button several thousands of times before I find the problem

At any rate, thanks for the answers so far. I've already learned a few things that will be useful regardless of this issue.
Old 27 November 2018, 13:20   #11
meynaf
Are you 100% sure you properly kill the OS in the background ?
Keeping the workbench running does trigger blitter operations...
Old 27 November 2018, 13:28   #12
roondar
Quote:
Originally Posted by meynaf View Post
Are you 100% sure you properly kill the OS in the background ?
Keeping the workbench running does trigger blitter operations...
Workbench really shouldn't be running, the program uses a custom bootblock and a trackloader. By the time the bootblock is run everything else really should be off (part of the bootblock disables all interrupts).

I'm pretty certain I screw up something myself. Just haven't found what yet.

One of the things I'll try tonight is moving the intreq acknowledge to the start of the Blitter handler instead as you suggested. I still feel that shouldn't be why I get random Blitter interrupts while the Blitter is still running, but I'll try. I'll also go over the queue start/setup code with a fine toothed comb (again) just to make 100% sure it really cannot trigger when it shouldn't.
Old 27 November 2018, 13:29   #13
grond
I thought doing the IRQ acknowledge twice in a row was more of a hardware thing than software. If my understanding is correct, it's not about slowing down the next bus access but about holding the correct value longer. The value to be read into the hardware register will be on the bus for a longer time and thus the IRQ acknowledged reliably. I suspect that if you put the second acknowledge instruction back in it will work again.
Old 27 November 2018, 14:14   #14
roondar
As far as I know the double access is only needed to slow down the CPU on certain A4000 configurations because it otherwise outruns the motherboard. Perhaps this can also happen on other Amigas, but that's what I've always understood.

Then again, I could be wrong.
Old 27 November 2018, 14:34   #15
grond
Quote:
Originally Posted by roondar View Post
As far as I know the double access is only needed to slow down the CPU on certain A4000 configurations because it otherwise outruns the motherboard. Perhaps this can also happen on other Amigas, but that's what I've always understood.
I think it also happened on the Vampire and probably also on 060s. If it is really just the CPU being too fast for the IRQ source to reset in time, then the problem should not occur if you acknowledge the IRQ at the beginning of the handler, then handle the IRQ and finally do the RTE, provided the handling takes longer than a second hardware write. If, on the other hand, the problem is related to the acknowledging signals settling and becoming stable before doing the RTE, only doing the write twice in a row would overcome the problem.
Old 27 November 2018, 14:45   #16
Leffmann
My understanding of the whole IRQ acknowledge problem is this:

1. the interrupt handler executes, and its code and the exception stack frame are placed in the caches
2. the instruction to acknowledge the interrupt is issued, and the write-operation is given to the bus sequencer
3. the CPU doesn't need to wait for the sequencer, and continues to execute the rest of the handler from the caches
4. the write-operation has still not finished, and Paula (or whoever is handling it) has not updated the state of the interrupt lines, so the signal has not propagated through the system and to the CPU
5. the CPU pops the cached exception frame and exits the handler, and the interrupt triggers immediately again

I don't think it's an A4000 thing, it can possibly happen on any Amiga with a 68030 with caches enabled, but more likely to happen on fast 68040 and 68060 systems with large caches.

But I think Toni will have the final verdict on this.
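For reference, the 'acknowledge twice' workaround discussed in this thread can be wrapped in a macro like the sketch below. ACKINT and BlitDoneIRQ are hypothetical names, the macro syntax assumes a Devpac/vasm-style assembler, and whether a second write is sufficient on every accelerator is exactly the open question here.

```asm
; Hedged sketch: ACKINT and BlitDoneIRQ are hypothetical names.
intreq	equ	$09c

ACKINT	macro				; \1 = INTREQ bit mask to clear
	move.w	#\1,intreq(a6)		; acknowledge
	move.w	#\1,intreq(a6)		; repeat: second chip bus write before RTE
	endm

BlitDoneIRQ
	move.l	a6,-(sp)
	lea	$dff000,a6
	ACKINT	$0040			; clear the BLIT request
	move.l	(sp)+,a6
	rte
```

The idea is simply that the second write cannot be issued until the first has gone out on the chip bus, which gives the cleared request more time to reach the CPU before the RTE lowers the interrupt mask.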
Old 27 November 2018, 15:37   #17
meynaf
Quote:
Originally Posted by Leffmann View Post
My understanding of the whole IRQ acknowledge problem is this:

1. the interrupt handler executes, and its code and the exception stack frame are placed in the caches
2. the instruction to acknowledge the interrupt is issued, and the write-operation is given to the bus sequencer
3. the CPU doesn't need to wait for the sequencer, and continues to execute the rest of the handler from the caches
4. the write-operation has still not finished, and Paula (or whoever is handling it) has not updated the state of the interrupt lines, so the signal has not propagated through the system and to the CPU
5. the CPU pops the cached exception frame and exits the handler, and the interrupt triggers immediately again

I don't think it's an A4000 thing, it can possibly happen on any Amiga with a 68030 with caches enabled, but more likely to happen on fast 68040 and 68060 systems with large caches.

But I think Toni will have the final verdict on this.
That explanation seems quite likely. But it won't happen on 68030 because any instruction accessing memory, even in cache, will wait for the last write to finish.
Old 27 November 2018, 15:55   #18
ross
Quote:
Originally Posted by Leffmann View Post
My understanding of the whole IRQ acknowledge problem is this:

1. the interrupt handler executes, and its code and the exception stack frame are placed in the caches
2. the instruction to acknowledge the interrupt is issued, and the write-operation is given to the bus sequencer
3. the CPU doesn't need to wait for the sequencer, and continues to execute the rest of the handler from the caches
4. the write-operation has still not finished, and Paula (or whoever is handling it) has not updated the state of the interrupt lines, so the signal has not propagated through the system and to the CPU
5. the CPU pops the cached exception frame and exits the handler, and the interrupt triggers immediately again

I don't think it's an A4000 thing, it can possibly happen on any Amiga with a 68030 with caches enabled, but more likely to happen on fast 68040 and 68060 systems with large caches.

But I think Toni will have the final verdict on this.
This.

So acknowledging the IRQ at the start might not suffice (e.g. for a routine that skips everything based on a flag and exits).
Old 27 November 2018, 16:43   #19
grond
But then the "move.w (a0)+,bltsize(a6) ; Start blitter (forces A4000 compatibility)" in the first (unstable) example should be fine?
Old 27 November 2018, 17:38   #20
ross
Quote:
Originally Posted by grond View Post
But then the "move.w (a0)+,bltsize(a6) ; Start blitter (forces A4000 compatibility)" in the first (unstable) example should be fine?
Yes.
There could be a 'blitter storm' because of tiny blits, slow CPU response or a hogged bus, but only before the queue is emptied, and certainly a 'blitter overlap' like the one roondar experienced cannot happen.
So probably something is wrong elsewhere and I'm very curious about it.