26 November 2018, 23:27 | #1 |
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,421
|
Blitter interrupt woes. Was: A cautionary tale: trying ... with interrupts
Edit: during the discussion in the thread it's become clear I have not actually fixed the issue. So please read the other posts as well to be clear on what the issue is (my first reply is post 8, which goes into a lot more detail). I suppose that's what you get for coding till late and then writing a post on EAB while tired but overly happy because you think you fixed something difficult
Sometimes you run into a bug that takes ages to fix and is worthwhile to share, if only to try to help others not make your really hard to track down mistake. Or is that my mistake in being too smart for my own good? Anyway...
I've been working on a project that contains a Blitter interrupt handler on and off for quite some time now. And all was well, I queued and it blit. Until recently, when my previously apparently rock-solid code started to crash at completely unpredictable times. After three days of banging my head against a wall trying to find the problem, I managed to fix it just now.
My sin? I had tried to get around the double intreq acknowledge for A4000s by instead using a different custom chip access, as follows:
Code:
    ...
    move.w  #$40,intreq(a6)     ; Acknowledge interrupt
    move.w  (a0)+,bltsize(a6)   ; Start blitter (forces A4000 compatibility)
    <pop stack here>
    rte
Now, very observant readers might note that I've just created a race-condition. A very, very rare race-condition (considering this runs on an A500), but a race-condition indeed. Not me though, I didn't notice.
Well, three days ago I added some completely unrelated code and was forced to update the number of registers on the stack for the Blitter handler. And boom went the program. At random intervals. I've tried fixing nearly every part of the new code, but alas, it was the old code that was to blame.
The moral here is to not try to over-optimise, nor to try and be too smart around interrupts. For reference, here is the now working and much more mundane version of the above bit (which should not create race conditions).
Code:
    ...
    move.w  (a0)+,bltsize(a6)   ; Start blitter
    move.w  #$40,intreq(a6)     ; Acknowledge interrupt
    move.w  #$40,intreq(a6)     ; (twice for A4000)
    <pop stack here>
    rte
Last edited by roondar; 27 November 2018 at 11:10. Reason: Updated post to reflect my OP being well, wrong - I did not fix it :S |
26 November 2018, 23:42 | #2 |
Registered User
Join Date: Dec 2017
Location: Austin, TX
Age: 41
Posts: 409
|
Ouch!
My first stab at a vblank sync routine waited for vpos == 0x12C then vpos != 0x12C. This worked fine except very rarely the display would stutter for quite a few frames. It took me a while to figure out that an almost perfectly frame-synced timer interrupt would drift over time to coincide precisely with that line, so CPU wouldn't see == 0x12C until it drifted further. Then it finally clicked why the examples I'd looked at were testing for >= and then <. |
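A minimal sketch of the more robust range-based wait described in the post above, under a few assumptions: a6 already holds the custom chip base $dff000, the target is line $12c as in the post, and the label names are made up for illustration. It reads the "testing for >= and then <" remark as two spin loops, which is only one possible interpretation of what those examples did.

```
VPOSR   equ     $004                    ; offset from $dff000

; In: a6 = $dff000 (assumed to be set up by the caller)
; Trashes d0
WaitLine12c:
.past   move.l  VPOSR(a6),d0            ; VPOSR and VHPOSR read as one long
        and.l   #$0001ff00,d0           ; keep the vertical position bits V8-V0
        cmp.l   #$00012c00,d0           ; line $12c, shifted into the same position
        bhs.s   .past                   ; still at or past the line from the last frame
.before move.l  VPOSR(a6),d0
        and.l   #$0001ff00,d0
        cmp.l   #$00012c00,d0
        blo.s   .before                 ; beam has not reached the line yet
        rts
```

Because both loops test a range rather than one exact line, an interrupt that happens to cover all of line $12c only delays the exit from the second loop; it cannot make the routine miss the frame the way the equality test did.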
26 November 2018, 23:46 | #3 |
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,421
|
It's always the 'almost always works' code that is hard to figure out
|
27 November 2018, 01:32 | #4 |
Natteravn
Join Date: Nov 2009
Location: Herford / Germany
Posts: 2,502
|
To be honest, I'm probably too tired to see the race condition.
Acknowledging the Blitter interrupt will allow new Blitter interrupts to be registered. But there can't be any new interrupts, because you didn't even start the Blitter. That happens with the following instruction. Also your level 3 interrupt cannot interrupt itself, as long as RTE isn't executed and the interrupt level in SR is lowered. The old solution with two INTREQ-writes after BLTSIZE even looks more dangerous to me, because a very fast blit could be finished before you cleared its IRQ flag for the second time. |
27 November 2018, 06:09 | #5 |
Registered User
Join Date: May 2018
Location: Delta, Canada
Posts: 192
|
I am not used to the blitter, so I do not understand if there is something strange going on.
But, I agree with @phx, the solution looks more dangerous to me. If it gets another higher priority interrupt between starting the blitter and acknowledging the interrupt, it could miss the blitter ready interrupt. Also why do you post increment (a0)+ when the next thing that happens is to restore a0 from the stack? |
27 November 2018, 08:36 | #6 | |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,476
|
The same here, but maybe because I still have not got a nice strong espresso.
EDIT: no effect from coffee Quote:
(the blitter, meanwhile, could finish before the INTREQ cleanup and lose a loop).
EDIT: lol, before the coffee I had not even seen that hth313 had written the same.
This can be a solution:
Code:
    move.w  #$4000,intena(a6)   ; Disable
    move.w  (a0),bltsize(a6)    ; Start blitter
    move.w  #$40,intreq(a6)     ; Acknowledge interrupt
    move.w  #$c000,intena(a6)   ; re-Enable and double bus access
    <pop stack here>
    rte
(but from what I remember you do not set it at 1 if you can).
EDIT2: It would be interesting to check whether, for a single word blit (with BLTPRI=0, considering the prefetch of the blitter and the relative delayed destination write in memory), the BLIT INTREQ bit is set before or after the CPU clears it on a slow machine (like an A500).
Last edited by ross; 27 November 2018 at 10:15. Reason: better explained, maybe.. |
|
27 November 2018, 10:10 | #7 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,327
|
I can see a potential race condition, but in the new code, not in the old one
The code acknowledges interrupts after setting up the blitter. If another, higher priority interrupt then happens between these two instructions (actually three!) and takes a significant amount of time, the blitter will have finished before the interrupt gets acknowledged, and an interrupt is missed. I personally would advocate doing the interrupt ack at the start of the interrupt routine, not at the end. If the a4k problem only requires a second bus access after the ack (i don't know this issue so i'm just guessing), then setting up the blitter registers should easily do the trick. |
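The ordering suggested above could look roughly like the following sketch. It is not roondar's handler: the labels, the single pending BLTSIZE word and the reduced register save are all hypothetical, and a real handler would load the other blitter registers as well before writing BLTSIZE.

```
CUSTOM  equ     $dff000
INTREQ  equ     $09c
BLTSIZE equ     $058

BlitIrqSketch:
        movem.l a0/a6,-(sp)
        lea     CUSTOM,a6
        move.w  #$0040,INTREQ(a6)       ; acknowledge BLIT before anything else
        lea     NextBltSize(pc),a0      ; hypothetical: one prepared BLTSIZE value
        move.w  (a0),BLTSIZE(a6)        ; further custom chip write, starts the blit
        movem.l (sp)+,a0/a6
        rte

NextBltSize:
        dc.w    (16<<6)|1               ; 16 lines by 1 word, illustrative only
```

With the acknowledge first, a blit that finishes while a higher level interrupt is being serviced sets the BLIT bit after it was cleared, so the request is still pending when the CPU drops back to level 3. Whether the later register writes are enough to cover the fast-CPU acknowledge issue is the open question discussed in the posts that follow.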
27 November 2018, 10:46 | #8 | ||
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,421
|
After reading the comments, I'm seriously beginning to wonder if I actually got this right. I'm also 100% sure that 0:46 isn't the best time to write exuberant posts about how I fixed something (which as it turns out, I may not have actually fixed). As such I have edited the first post a bit to clarify the problem is probably not fixed and encourage reading further
So let's move back a step and discuss... First off, the easy bit Quote:
---
Now about the part where I thought I had a race condition, but now I'm not sure what I have.
First a bit of background: my code runs a number of different interrupts: there's a VBL handler, a Blitter handler as well as interrupts caused by the Pro Tracker player I'm using (which does both level 2 CIA and level 6 CIA interrupts, plus audio interrupts IIRC).
What does not go wrong: the code does not seem to 'miss' any blitter finished interrupts, although phx, hth313, ross and meynaf are correct that this could indeed happen if a higher level interrupt triggers. This was something I had not yet considered and is indeed a potential race condition here - this will be fixed. However, it's not missing interrupts that was the problem as far as I can see.
What goes wrong: at some point during execution the Blitter interrupt handler is called to handle the next blit, even though the Blitter is still running. This causes memory corruption and crashes because the blitter will get fed new values while it's running. Not good! I know this to be true because during my tests I added a 'panic mode' that would halt the machine with a red background if the blitter busy flag was still on when the Blitter queue handler was called. To my great surprise, this actually did happen on rare occasions. This is the bug I've been trying to fix.
Some code might help. The code that branches into the blitter interrupt handler looks something like this (I cut out unwanted elements)
Code:
    lev3handler
        movem.l d0-d7/a0-a6,-(sp)
        lea.l   $dff000,a6
        btst    #6,intreq+1(a6)     ; Check for Blitter finished interrupt
        bne     BltIHQueue
        <deal with other level 3 interrupts>
Code:
    BltIHQueue
        ; Test if blitter still runs
        btst    #6,dmaconr(a6)
        beq     .cnt
    .err
        move.w  #$f00,$180(a6)      ; Blitter still running! Surely that can't be!?
        bra     .err
    .cnt
        ; Fetch next pointer from queue
        lea.l   blq_ptr(pc),a4
        move.l  (a4),a5
        move.l  (a5)+,a0            ; A0 points to current element
        move.l  a5,(a4)             ; Step forwards
        move.w  (a0)+,d0            ; Fetch which handler to use
        jmp     .jmp_table(pc,d0.w)
    .jmp_table
        ... more entries ...
        bra     BltIHBob
        ... more entries ...
    ... and other code ...
    BltIHBob
        move.l  (a0)+,bltcon0(a6)
        move.l  (a0)+,bltamod(a6)
        move.l  (a0)+,bltcmod(a6)
        move.l  (a0)+,bltapt(a6)
        move.l  (a0)+,bltbpt(a6)
        move.l  (a0),bltcpt(a6)
        move.l  (a0)+,bltdpt(a6)
        move.w  (a0),bltsize(a6)
        move.w  #$40,intreq(a6)     ; Acknowledge interrupt
        move.w  #$40,intreq(a6)     ; (twice for A4000)
        movem.l (sp)+,d0-d7/a0-a6
        rte
Triggering the queue is currently done as follows in the VBL interrupt: disable interrupts, verify that the queue and blitter are both done, update queue pointer to the newly created queue, set blitter finished interrupt flag in INTREQ and enable interrupts again. Note here that the current program does not go over a frame and the code is 'raster synced' so it's guaranteed that the initial trigger can only happen after the blitter queue is done (this is also visible in the WinUAE Debugger, the blits are the last thing happening during a frame).
So in essence my problem here is: how to make this work without the handler triggering before a blit is done, and how to make it work even if another interrupt triggers during the 'end phase'. I'm guessing part of ross's suggestion will be part of this. But I just can't explain how the Blitter can ever still be running when the Blitter finished interrupt is triggered.
Quote:
Last edited by roondar; 27 November 2018 at 11:08. |
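The queue start sequence described in words in the post above (disable, verify, swap pointer, set the BLIT request, enable) might be sketched as follows. This is a guess at the shape, not the actual code: blq_ptr and blq_done are names from the thread, but their layout here, the routine name and the register usage (a0 = new queue, a6 = $dff000, d0/a1 as scratch) are assumptions.

```
INTENA  equ     $09a
INTREQ  equ     $09c
DMACONR equ     $002

; In: a6 = $dff000, a0 = start of the newly built queue (hypothetical)
; Trashes d0/a1
StartQueueSketch:
        move.w  #$4000,INTENA(a6)       ; clear INTEN: all interrupts off
        move.b  blq_done(pc),d0         ; previous queue finished?
        beq.s   .skip                   ; no: leave everything untouched
        btst    #6,DMACONR(a6)          ; BBUSY (bit 14 of DMACONR) still set?
        bne.s   .skip
        lea     blq_ptr(pc),a1
        move.l  a0,(a1)                 ; point the handler at the new queue
        lea     blq_done(pc),a1
        clr.b   (a1)                    ; queue is in progress again
        move.w  #$8040,INTREQ(a6)       ; SET + BLIT: raise the request in software
.skip   move.w  #$c000,INTENA(a6)       ; set INTEN again
        rts

blq_ptr:
        dc.l    0                       ; pointer into the list of queue entries
blq_done:
        dc.b    1                       ; non-zero = queue has ended
        even
```

In this sketch the software-set BLIT bit is only serviced once INTEN is set again and the CPU is back below level 3, so when called from the VBL handler the first queue entry would start after that handler returns.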
||
27 November 2018, 11:45 | #9 | |||
Natteravn
Join Date: Nov 2009
Location: Herford / Germany
Posts: 2,502
|
Quote:
Quote:
Quote:
Last edited by phx; 27 November 2018 at 11:48. Reason: Blitter -> Blitter Queue |
|||
27 November 2018, 12:08 | #10 | |||
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,421
|
Quote:
Quote:
If and only if the blq_done flag is set do I consider the queue ended. Naturally, starting the queue resets the flag before triggering the new queue. (other than that and unrelated to the accuracy of my code, the current code + blits finish way before the new VBL occurs so I know it's done when the next VBL comes along) Quote:
The only issue I foresee with this solution is that the issue is so rare. It can take a lot of frames before it goes wrong and if it goes wrong it doesn't usually reoccur immediately. I might still try it, but it might require me pressing the 'g' button several thousands of times before I find the problem. At any rate, thanks for the answers so far. I've already learned a few things that will be useful regardless of this issue. |
|||
27 November 2018, 12:20 | #11 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,327
|
Are you 100% sure you properly kill the OS in the background ?
Keeping the workbench running does trigger blitter operations... |
27 November 2018, 12:28 | #12 | |
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,421
|
Quote:
I'm pretty certain I screw up something myself. Just haven't found what yet. One of the things I'll try tonight is moving the intreq acknowledge to the start of the Blitter handler instead as you suggested. I still feel that shouldn't be why I get random Blitter interrupts while the Blitter is still running, but I'll try. I'll also go over the queue start/setup code with a fine toothed comb (again) just to make 100% sure it really cannot trigger when it shouldn't. |
|
27 November 2018, 12:29 | #13 |
Registered User
Join Date: Jun 2015
Location: Germany
Posts: 1,919
|
I thought doing the IRQ acknowledge twice in a row was more of a hardware thing than software. If my understanding is correct, it's not about slowing down the next bus access but about holding the correct value longer. The value to be read into the hardware register will be on the bus for a longer time and thus the IRQ acknowledged reliably. I suspect that if you put the second acknowledge instruction back in it will work again.
|
27 November 2018, 13:14 | #14 |
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,421
|
As far as I know the double access is only needed to slow down the CPU on certain A4000 configurations because it otherwise outruns the motherboard. Perhaps this can also happen on other Amigas, but that's what I've always understood.
Then again, I could be wrong. |
27 November 2018, 13:34 | #15 |
Registered User
Join Date: Jun 2015
Location: Germany
Posts: 1,919
|
I think it also happened on the Vampire and probably also on 060s. If it is really just the CPU being too fast for the IRQ source to reset in time, then the problem should not occur if you acknowledge the IRQ at the beginning of the handler, then handle the IRQ and finally do the RTE, provided the handling takes longer than a second hardware write. If, on the other hand, the problem is related to the acknowledging signals settling and becoming stable before doing the RTE, only doing the write twice in a row would overcome the problem.
|
27 November 2018, 13:45 | #16 |
Join Date: Jul 2008
Location: Sweden
Posts: 2,269
|
My understanding of the whole IRQ acknowledge problem is this:
1. the interrupt handler executes, and its code and the exception stack frame are placed in the caches
2. the instruction to acknowledge the interrupt is issued, and the write-operation is given to the bus sequencer
3. the CPU doesn't need to wait for the sequencer, and continues to execute the rest of the handler from the caches
4. the write-operation has still not finished, and Paula (or whoever is handling it) has not updated the state of the interrupt lines, so the signal has not propagated through the system and to the CPU
5. the CPU pops the cached exception frame and exits the handler, and the interrupt triggers immediately again
I don't think it's an A4000 thing, it can possibly happen on any Amiga with a 68030 with caches enabled, but more likely to happen on fast 68040 and 68060 systems with large caches. But I think Toni will have the final verdict on this. |
27 November 2018, 14:37 | #17 | |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,327
|
Quote:
|
|
27 November 2018, 14:55 | #18 | |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,476
|
Quote:
So acknowledging the IRQ at the start might not suffice (think of a routine that skips everything because of a flag and exits). |
|
27 November 2018, 15:43 | #19 |
Registered User
Join Date: Jun 2015
Location: Germany
Posts: 1,919
|
But then the "move.w (a0)+,bltsize(a6) ; Start blitter (forces A4000 compatibility)" in the first (unstable) example should be fine?
|
27 November 2018, 16:38 | #20 | |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,476
|
Quote:
There could be a 'blitter storm' because of tiny blits or slow CPU response or bus hogged, but only before the queue is emptied, and certainly a 'blitter overlap' can not happen like experienced by roondar. So probably something is wrong elsewhere and i'm very curious about it. |
|