Blitter engine working on Interrupt Request

mcgeezer · 22 September 2018, 13:12

Hi All,

I'm looking at trying to get the most out of my sprite engine and wondered if any of the experienced coders here could help with explaining how a blitter/bob engine works with interrupt requests?

The aim is to not have the CPU waiting for the blitter all the time. I do quite a lot of large blits in my project, and I want the CPU to be getting on with other things in the background if possible.

Cheers,
Geezer

ross · 22 September 2018, 14:53

Hi geezer, just a few hints, because the solutions can vary and are sometimes not very productive.

In practice, what you want to avoid is the blitter wait code.

You can simply compute/create/fill a growing queue containing the values to be written to the blitter's registers, and write an IRQ handler that checks the BLIT bit during IRQ3 (level 3 is shared between copper, VBL and blitter).
Of course you must also have set the same bit in INTENA to allow an interrupt when the blitter has finished its job.

In the IRQ code, if the IRQ source is confirmed, read the values from the blitter queue (filled by the normal main code routine) and write all the hardware registers, BLTSIZE last of course. The blitter starts as usual. Then you can purge the head of the queue and set the pointer to the next values.

It's better to acknowledge (move.w #$40,INTREQ) before the BLTSIZE write, because if you have BLTPRI set and only chip/slow RAM, the blitter operation can end before the INTREQ write, and the acknowledge would then swallow the new blitter interrupt.
Then you're good to RTE.
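A host-side C sketch of the queue-plus-handler scheme just described might look like this. `MockCustom`, `BlitJob`, `blit_enqueue` and `blit_irq` are hypothetical names, the mock stands in for the custom chip registers so the write order can be checked off-target, and only an A-to-D subset of the blitter registers is queued for brevity. `INTF_BLIT` ($0040) is the BLIT bit (bit 6) of INTREQ/INTENA. On real hardware `blit_irq` would be reached from the level 3 autovector ($6C), and the "idle? then start directly" test in `blit_enqueue` would need protecting against that interrupt (for example by masking the BLIT bit in INTENA around it).

```c
#include <stdbool.h>
#include <stdint.h>

#define INTF_SETCLR 0x8000u  /* INTREQ/INTENA bit 15: 1 = set bits, 0 = clear bits */
#define INTF_BLIT   0x0040u  /* bit 6: blitter finished */

/* Hypothetical stand-in for the custom chip registers used here. */
typedef struct {
    uint16_t intreqr;                  /* pending interrupt bits */
    uint16_t bltcon0, bltcon1, bltsize;
    uint32_t bltapt, bltdpt;
    unsigned seq, ack_seq, size_seq;   /* test aid: order of writes */
    unsigned blits_started;
} MockCustom;

static void write_intreq(MockCustom *c, uint16_t v) {
    if (v & INTF_SETCLR) c->intreqr |= (uint16_t)(v & 0x7fffu);
    else                 c->intreqr &= (uint16_t)~v;
    c->ack_seq = ++c->seq;
}

static void write_bltsize(MockCustom *c, uint16_t v) {
    c->bltsize = v;                    /* on hardware this starts the blit */
    c->size_seq = ++c->seq;
    c->blits_started++;
}

/* One queued blit (A->D subset of the registers only). */
typedef struct { uint16_t con0, con1; uint32_t apt, dpt; uint16_t size; } BlitJob;

#define QSIZE 16u                      /* power of two, so index wrap is safe */
typedef struct { BlitJob job[QSIZE]; unsigned head, tail; bool busy; } BlitQueue;

static void start_job(MockCustom *c, const BlitJob *j) {
    c->bltcon0 = j->con0;
    c->bltcon1 = j->con1;
    c->bltapt  = j->apt;
    c->bltdpt  = j->dpt;
    write_bltsize(c, j->size);         /* BLTSIZE always last */
}

/* Main code: append a job; start it at once if the blitter is idle. */
bool blit_enqueue(BlitQueue *q, MockCustom *c, BlitJob j) {
    if (q->tail - q->head == QSIZE) return false;   /* queue full */
    q->job[q->tail % QSIZE] = j;
    q->tail++;
    if (!q->busy) {
        q->busy = true;
        start_job(c, &q->job[q->head % QSIZE]);
        q->head++;
    }
    return true;
}

/* Body of the level 3 handler. */
void blit_irq(BlitQueue *q, MockCustom *c) {
    if (!(c->intreqr & INTF_BLIT)) return;   /* copper or VBL, not the blitter */
    write_intreq(c, INTF_BLIT);              /* acknowledge BEFORE writing BLTSIZE */
    if (q->head == q->tail) {                /* nothing left: go idle */
        q->busy = false;
        return;
    }
    start_job(c, &q->job[q->head % QSIZE]);
    q->head++;
}
```

The main loop only ever calls `blit_enqueue`; each "blitter finished" interrupt pops the next job, so the CPU never spins on BBUSY while the queue is non-empty.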

But there is a fundamental point that often steers people away from this method:
IRQ handling latency. Apart from the cycles needed to enter the routine, there is also the saving of registers and the various checks done in the IRQ.

But there is a Holy Grail: use the copper's ability to wait for blitter completion (a WAIT with the BFD bit clear) and use the very same copper to set up the blitter registers.
In practice this is very seldom implemented, because it is really hard to set up the copper list (with the various video-synchronized effects) and enqueue blitter commands at the same time.
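To make the copper approach concrete, here is a small C sketch that appends one blit to a copper list: a WAIT whose BFD bit is clear (so the copper also waits for the blitter to finish), followed by MOVEs to the blitter registers with BLTSIZE last. The encodings follow the standard copper formats (MOVE: register offset, then data; WAIT: VP/HP with bit 0 set, then BFD/VE/HE with bit 0 clear). `CopList` and `cop_blit_copy` are hypothetical helpers, the blit is a plain A-to-D copy, and the sketch assumes COPCON's CDANG bit is set, since otherwise the copper may not write the blitter registers.

```c
#include <stddef.h>
#include <stdint.h>

/* Custom register offsets (from $dff000) used by this blit. */
enum {
    BLTCON0 = 0x040, BLTCON1 = 0x042, BLTAFWM = 0x044, BLTALWM = 0x046,
    BLTAPTH = 0x050, BLTAPTL = 0x052, BLTDPTH = 0x054, BLTDPTL = 0x056,
    BLTSIZE = 0x058, BLTAMOD = 0x064, BLTDMOD = 0x066
};

typedef struct { uint16_t *w; size_t n, cap; } CopList;   /* n, cap in words */

static void cop_emit(CopList *cl, uint16_t first, uint16_t second) {
    cl->w[cl->n++] = first;
    cl->w[cl->n++] = second;
}

/* MOVE: first word = register offset (bit 0 clear), second = data. */
static void cop_move(CopList *cl, uint16_t reg, uint16_t val) {
    cop_emit(cl, (uint16_t)(reg & 0x01feu), val);
}

/* WAIT at VP=0,HP=0 with every compare-enable bit 0 and BFD=0: the beam
   test passes immediately, so the copper only waits for "blitter done". */
static void cop_wait_blit(CopList *cl) {
    cop_emit(cl, 0x0001, 0x0000);
}

/* Append "wait for blitter, then copy A->D". Returns -1 if no room. */
int cop_blit_copy(CopList *cl, uint32_t src, uint32_t dst,
                  uint16_t amod, uint16_t dmod, uint16_t bltsize) {
    if (cl->cap - cl->n < 24) return -1;     /* 1 WAIT + 11 MOVEs */
    cop_wait_blit(cl);
    cop_move(cl, BLTCON0, 0x09f0);           /* USEA|USED, minterm D = A */
    cop_move(cl, BLTCON1, 0x0000);
    cop_move(cl, BLTAFWM, 0xffff);
    cop_move(cl, BLTALWM, 0xffff);
    cop_move(cl, BLTAPTH, (uint16_t)(src >> 16));
    cop_move(cl, BLTAPTL, (uint16_t)src);
    cop_move(cl, BLTDPTH, (uint16_t)(dst >> 16));
    cop_move(cl, BLTDPTL, (uint16_t)dst);
    cop_move(cl, BLTAMOD, amod);
    cop_move(cl, BLTDMOD, dmod);
    cop_move(cl, BLTSIZE, bltsize);          /* last: starts the blit */
    return 0;
}
```

For example, `(16 << 6) | 2` as `bltsize` describes a 16-line, 2-word-wide blit (height in bits 15-6, width in words in bits 5-0). Chaining several of these calls gives the copper a queue of blits that needs no CPU involvement at all, which is exactly what makes it hard to interleave with timing-critical display effects.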

Good job!

Photon · 22 September 2018, 15:17

Quote:

Originally Posted by mcgeezer

Hi All,

I'm looking at trying to get the most out of my sprite engine and wondered if any of the experienced coders here could help with explaining how a blitter/bob engine works with interrupt requests?

The aim is to not have the CPU waiting for the blitter all the time, I do quite a lot of large blits in my project and I want the CPU to be getting on with other things in the background if possible.

Cheers,
Geezer

First, check that BLTPRI is on. (It must be on in order to fit as many blits as possible in a frame.)

And if it's on, chances are you are not waiting for the Blitter. I.e. the CPU doesn't get control back until the bob has been blitted anyway.

You can test this by making a blitwait that sets the background color before waiting and resets it after. If the color slivers are only 1px high and not wider than say 1/6th of a scanline, what you're seeing is the execution time of just 1 loop of the blitwait: it's already finished and you're waiting for nothing.

You can also count the repetitions of the blitwait loop and reset the counter each frame.
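Photon's counting test could look like this as a C sketch. `blit_wait_counted` polls the BBUSY bit (bit 14 of DMACONR) and returns how many polls still found the blitter busy, and `BlitStats` accumulates that per frame. The leading dummy read mirrors the customary `tst.w` on DMACONR before the `btst` loop (commonly explained as a workaround for early Agnus chips). The read goes through a function pointer so the logic can be exercised off-target; on a real machine it would simply read $dff002, and `fake_dmaconr` is a hypothetical test stub.

```c
#include <stdint.h>

#define DMAF_BLTDONE 0x4000u   /* DMACONR bit 14: BBUSY, set while blitting */

typedef uint16_t (*ReadReg)(void *ctx);

/* Blitter wait that also reports how many polls saw BBUSY set. */
unsigned blit_wait_counted(ReadReg read_dmaconr, void *ctx) {
    unsigned spins = 0;
    (void)read_dmaconr(ctx);                 /* customary dummy read */
    while (read_dmaconr(ctx) & DMAF_BLTDONE)
        spins++;
    return spins;
}

/* Per-frame accumulation, reset once per frame (e.g. in the VBL). */
typedef struct { unsigned frame_spins, worst_frame; } BlitStats;

void stats_add(BlitStats *s, unsigned spins) { s->frame_spins += spins; }

unsigned stats_end_frame(BlitStats *s) {
    unsigned total = s->frame_spins;
    if (total > s->worst_frame) s->worst_frame = total;
    s->frame_spins = 0;
    return total;
}

/* Hypothetical stub: reports busy for the first busy_polls reads. */
typedef struct { unsigned polls, busy_polls; } FakeBlitter;

uint16_t fake_dmaconr(void *ctx) {
    FakeBlitter *f = ctx;
    return (f->polls++ < f->busy_polls) ? DMAF_BLTDONE : 0;
}
```

If the per-frame total stays at or near zero, the blitter had already finished every time, and the waits are costing only their own loop overhead.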

ross · 22 September 2018, 15:29

Yes, the BLTPRI bit can completely change your coding style and flow.
It's the same difference as between thinking single-task and multitask (not always true, since many blitter modes don't use all the bus cycles, but you can take it as a general rule).

mcgeezer · 22 September 2018, 16:10

Quote:

Originally Posted by Photon

First, check that BLTPRI is on. (It must be on in order to fit as many blits as possible in a frame.)

And if it's on, chances are you are not waiting for the Blitter. I.e. the CPU doesn't get control back until the bob has been blitted anyway.

You can test this by making a blitwait that sets the background color before waiting and resets it after. If the color slivers are only 1px high and not wider than say 1/6th of a scanline, what you're seeing is the execution time of just 1 loop of the blitwait: it's already finished and you're waiting for nothing.

You can also count the repetitions of the blitwait loop and reset the counter each frame.

So by this rationale I don't have to wait for the blitter if the Blit nasty is on?

Edit: Oddly, if I set BLTPRI on, my large blits take longer to complete.

phx · 22 September 2018, 16:44

Quote:

Originally Posted by mcgeezer

So by this rationale I don't have to wait for the blitter if the Blit nasty is on?

Only if your code is running in Chip/Slow RAM.

Quote:

Edit: Oddly, if I set the BLTPRI on my large blits are taking longer to complete.

How did you measure the completion? I doubt that the BBUSY flag becomes inactive sooner when you don't set BLTPRI.

ross · 22 September 2018, 17:01

Quote:

So by this rationale I don't have to wait for the blitter if the Blit nasty is on?

Quote:

Originally Posted by phx

Only if your code is running in Chip/Slow RAM.

There are fast processors (with an instruction cache) that can end up writing to the blitter registers before BBUSY has even been set by the blitter logic, so beware.
A blitter wait, or something with the same effect, should always be used, even with BLTPRI set.

Well, if you are optimizing something to death for the A500, omitting the wait can be acceptable.

[EDIT: and then the poor WHDLoad coder needs to patch your code for people with fast machines.]

ross · 22 September 2018, 17:09

Quote:

Originally Posted by mcgeezer

Edit: Oddly, if I set the BLTPRI on my large blits are taking longer to complete.

This is technically impossible.

Are you sure you're not reading the timing while the blitter is still running?
Remember that with BLTPRI=0 the processor's cycles are interleaved with the blitter's on the internal bus.

mcgeezer · 22 September 2018, 17:22

I've done a little video showing what is going on.

The background bitplanes are done in 3 blits; I set the colour to red, green or blue prior to blitting each plane, respectively.

The first run is without the Blitter priority set and then with.

As you can see, without it I don't see any CPU clocks for the first blit.

I have cycle-exact set in WinUAE too, so I'm not sure what is going on.


Toni Wilen · 22 September 2018, 17:23

You still need to wait for the blit even when running in chip RAM with blitter nasty on, because not all channel/mode combinations use all cycles (for example D only, most fill modes, line draw).

Also, due to CPU prefetch and blitter pipelining, there are 3-4 cycles available before the blitter "really starts" after writing to BLTSIZE, which allows the CPU to execute the following instruction, at least partially.

ross · 22 September 2018, 17:46

Quote:

Originally Posted by mcgeezer

I've done a little video showing what is going on.

The background bitplanes are done in 3 blits; I set the colour to red, green or blue prior to blitting each plane, respectively.

The key is where the blitter wait is placed.

If you do the wait before the blit (which is indeed how it must be done), then the color change (the yellow) happens practically immediately.

If you do the blitter wait after the blit (the right way to take a timing), then you get the later color change, which shows the actual duration of the blit
(with BLTPRI=1 the internal bus is hogged by the blitter, so you cannot write to the color register, and the effect is the same as a blitter wait done after the blit).

As said before, the blitter is still active after the third blit.

ross · 22 September 2018, 18:02

Quote:

Originally Posted by Toni Wilen

Also due to CPU prefetch and blitter pipelining, there is 3-4 cycles available before blitter "really starts"

Toni, so can code like this, with BLTPRI=1, avoid a blitter wait entirely?

Code:

    ; a6 = custom chip base ($dff000)
    tst.w     (a6)
    tst.w     (a6)
    tst.w     (a6)

    move.w    #something,blitter_reg(a6)

Photon · 22 September 2018, 18:04

Yes, if BLTPRI is off, control will return to the CPU after a few cycles (and will then execute the instructions that measure the blit time prematurely).

Blitwaits must always precede a blit. For measurement purposes (or other result-use purposes such as collision detection), you should temporarily add a Blitwait after, as well.

LeCaravage · 29 September 2018, 17:54

Just wondering, is it necessary to turn BLTPRI on? I mean, if the blitter uses all 3 sources it leaves very few cycles free for the CPU. The advantage of an IRQ-driven blitter is to simulate "multitasking" between blit operations and other calculations on the CPU.
The answer would obviously be yes if the target were an Amiga 1200 with fast RAM or an accelerator, but I'm not sure (I mean, I don't know the answer) for a stock A1200.

roondar · 29 September 2018, 22:45

Quote:

Originally Posted by LeCaravage

Just wondering, is it necessary to set the blitpri on ? I mean, if the blitter uses all 3 sources it will let very few cycles free for the cpu . The advantage of irq blitter is to simulate a "multitask" between blit operation and other calculations from the cpu.
The question will be obviously yes if the target was an Amiga 1200 with fastram or accelerator but not sure (I mean I don't know the answer) with a stock A1200.

It all depends. If your program can do useful work during the blit, it's actually almost always faster to run with BLTPRI off, even on a 68000-based machine and definitely on a 68020+ machine.

Now make no mistake, your blit will take longer if you do this (it'll suffer a 25% or so penalty). However, your overall system performance will likely go up, more so if your useful work includes expensive instructions such as multiplications/divisions (on 68000 shifts/rotates as well).

However, if you can't (easily) do useful work during a blit and would instead spend a large part of the blit waiting for it to finish, then keeping BLTPRI on is always the better option.

What I personally do is have my blitwait macro set BLTPRI to ON at the start, wait for the blit to finish in the usual way (because a fast processor might outrun this switch) and then, after waiting, switch it back off. I only call the blitwait macro immediately before setting the actual blitter registers. This way, in my experiments anyway, performance tends to be highest.
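roondar's scheme (BLTPRI on, an ordinary BBUSY wait, BLTPRI off again) might be sketched in C as below. DMACON writes use the SET/CLR convention (bit 15 set: the 1-bits are set; bit 15 clear: they are cleared), so `$8400` turns BLTPRI on and `$0400` turns it off. `Dmacon` and its functions are a hypothetical software model of DMACON ($dff096) and DMACONR ($dff002), with a countdown standing in for the running blit, so the sequence can be checked off-target.

```c
#include <stdint.h>

#define DMAF_SETCLR  0x8000u
#define DMAF_BLITHOG 0x0400u   /* BLTPRI, "blitter nasty" */
#define DMAF_BLTDONE 0x4000u   /* BBUSY in DMACONR */

/* Hypothetical model of DMACON/DMACONR with a blit in progress. */
typedef struct {
    uint16_t dmacon;           /* current DMA control bits */
    unsigned busy_reads;       /* reads left that still report BBUSY */
    uint16_t log[4];           /* test aid: DMACON writes in order */
    unsigned nlog;
} Dmacon;

static void dmacon_write(Dmacon *d, uint16_t v) {
    if (v & DMAF_SETCLR) d->dmacon |= (uint16_t)(v & 0x7fffu);
    else                 d->dmacon &= (uint16_t)~v;
    if (d->nlog < 4) d->log[d->nlog++] = v;
}

static uint16_t dmaconr_read(Dmacon *d) {
    uint16_t busy = d->busy_reads ? DMAF_BLTDONE : 0;
    if (d->busy_reads) d->busy_reads--;
    return (uint16_t)(d->dmacon | busy);
}

/* Call only right before writing the next blit's registers. */
void blit_wait_hog(Dmacon *d) {
    dmacon_write(d, DMAF_SETCLR | DMAF_BLITHOG);   /* BLTPRI on */
    (void)dmaconr_read(d);                         /* customary dummy read */
    while (dmaconr_read(d) & DMAF_BLTDONE)         /* normal wait: a cached CPU */
        ;                                          /* may outrun the BLTPRI switch */
    dmacon_write(d, DMAF_BLITHOG);                 /* BLTPRI off again */
}
```

The intent, as described in the post, is that the CPU gives the blitter the whole bus only while it has nothing better to do (inside the wait), and gets its interleaved cycles back during the next blit.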

ross · 30 September 2018, 10:05

Quote:

Originally Posted by roondar

What I personally do is have my blitwait macro set BLTPRI to ON at the start, wait for the blit to finish regularly (because a fast processor might outrun this switch) and then after waiting switch it back off. Then, I only call the blitwait macro immediately prior to setting the actual blitter registers. This way - in my experiments anyway - performance tends to be highest.

It could be interesting to combine this with http://eab.abime.net/showpost.php?p=...5&postcount=12
Considering that disabling BLTPRI at the end of the macro is itself an internal bus access, you can remove a tst.w and possibly get even more performance.
[only for blits that use all the cycles]

roondar · 01 October 2018, 00:03

Quote:

Originally Posted by ross

Can be interesting to couple with http://eab.abime.net/showpost.php?p=...5&postcount=12
Considering that disabling BLTPRI at end of macro is an internal bus access you can remove a tst.w and eventually have even more performance.
[only for blit that use all the cycles]

That would indeed be interesting. I decided to test this macro as a replacement idea:

Code:

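BlitWait    MACRO                          ; \1 = register holding the custom base
            move.w    #$8400,dmacon(\1)    ; SET + BLTPRI: blitter nasty on
            tst.w     dmaconr(\1)          ; chip-bus access, delayed while the blit hogs the bus
            move.w    #$0400,dmacon(\1)    ; clear BLTPRI: blitter nasty off again
            ENDM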

My tests in WinUAE show that this actually seems to work fine (though only for all-cycles-used blits, as expected), even when I set the system to be a 68040 'fastest possible' with a bunch of fast memory. Didn't test it on real systems though, but it may be a good alternative for the standard way of blitter waiting that wastes quite a bit of time on a basic A500.

ross · 15 January 2019, 19:17

I'm reopening this thread because I have an interesting case study, and maybe Toni could provide further explanation.

In a message he wrote:

Quote:

Originally Posted by Toni Wilen

You still need to wait for blit even when running in chip ram with blitter nasty because not all channel mode combinations use all cycles. (for example D only, most fill modes, line draw)

Also due to CPU prefetch and blitter pipelining, there is 3-4 cycles available before blitter "really starts" after writing to BLTSIZE which allows CPU to execute following instruction, at least partially.

So my thinking was:

Quote:

Originally Posted by ross

Toni, so code like this with BLTPRI=1 can totally avoid a blitter-wait?:

(a6=custombase)
tst.w (a6)
tst.w (a6)
tst.w (a6)

move.w #something,blitter_reg(a6)

My reasoning was: the first two tst.w cover the 4 cycles before the blitter starts, and the third covers the gap before the second read of source data.
Obviously this only applies to channel combinations that use all the available cycles (for example ABCD, ABD, ACD; I don't consider the others, since they would leave me free cycles anyway, even with BLTPRI set).
Graphic example:
start blitter ->

- - A0 B0 C0 - A1 B1 C1 D0 A2 B2 C2 D1 D2

with tst.w (as X):
start blitter ->

X X A0 B0 C0 X A1 B1 C1 D0 A2 B2 C2 D1 D2

So no blitter wait is needed.
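ross's diagram can be reproduced with a tiny C model of his reasoning: with BLTPRI=1 the CPU gets the chip bus only in the free slots ("-"), each pending CPU access takes the next free slot, and an access that finds none left lands after the blit. This models the argument as stated in the post, not confirmed hardware behaviour (the thread goes on to question exactly this), and it assumes a CPU fast enough, running from cache, to always be ready. `fill_free_slots` is a hypothetical helper.

```c
#include <stdio.h>
#include <string.h>

/* Marks the first `accesses` free slots ("-") of a space-separated slot
   pattern as CPU accesses ("X"), writing the result to `out`. Returns the
   slot where one more CPU access would land: the next free slot, or the
   pattern length if the blit leaves none. Returns -1 on overflow. */
int fill_free_slots(const char *pattern, int accesses, char *out, size_t outsz) {
    char buf[256];
    int slot = 0, next = -1;
    size_t used = 0;
    if (outsz == 0 || strlen(pattern) >= sizeof buf) return -1;
    strcpy(buf, pattern);
    out[0] = '\0';
    for (char *tok = strtok(buf, " "); tok != NULL; tok = strtok(NULL, " "), slot++) {
        const char *shown = tok;
        if (strcmp(tok, "-") == 0) {
            if (accesses > 0) {          /* CPU takes this free slot */
                shown = "X";
                accesses--;
            } else if (next < 0) {
                next = slot;             /* first slot still free */
            }
        }
        int n = snprintf(out + used, outsz - used, "%s%s", used ? " " : "", shown);
        if (n < 0 || (size_t)n >= outsz - used) return -1;
        used += (size_t)n;
    }
    return next < 0 ? slot : next;
}
```

Fed the ABCD pattern above with 3 accesses, it yields the "X X A0 B0 C0 X A1 ..." line and reports that a 4th access lands after the last slot. With only 2 accesses, the next one would land in slot 5, the gap before A1, which is the gap the third `tst.w` is meant to cover.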

Then roondar wrote:

Quote:

Originally Posted by roondar

That would indeed be interesting. I decided to test this macro as a replacement idea:

Code:

BlitWait    MACRO
            move.w    #$8400,dmacon(\1)
            tst.w     dmaconr(\1)
            move.w    #$0400,dmacon(\1)
            ENDM

My tests in WinUAE show that this actually seems to work fine (though only for all-cycles-used blits, as expected), even when I set the system to be a 68040 'fastest possible' with a bunch of fast memory. Didn't test it on real systems though, but it may be a good alternative for the standard way of blitter waiting that wastes quite a bit of time on a basic A500.

At first I was puzzled, because it lacks an access to cover all the free cycles, but I thought I had perhaps misunderstood what Toni meant by <<there are 3-4 cycles available before the blitter "really starts">>.

Then today, a surprise. I have a very tight blitter routine with channels A, B and D active and this blitter wait:

Code:

    move.w    #$8400,DMACON(A6)
    tst.w     (a6)
    move.w    #$0400,DMACON(A6)
    move.l    a0,BLTBPTH(a6)

this routine fails every now and then! (I get garbage in the destination because channel B points to the wrong data.)

If I use:

Code:

    move.w    #$8400,DMACON(A6)
    tst.w     (a6)
    tst.w     (a6)
    move.w    #$0400,DMACON(A6)
    move.l    a0,BLTBPTH(a6)

it never fails, so maybe my first reasoning was right.

I'm curious to understand why that kind of blitter wait always worked for roondar instead.
Is there some particular situation (his or mine) that gives different results?

I'm on the latest WinUAE 4.1.0 with a 68030 at custom x16 frequency (though the frequency isn't significant: for instructions that need few cycles, the only bottleneck is the internal bus access), all CPU caches active, cycle-exact options active, and neither "Wait for blitter" nor "Immediate blitter" (so maximum real-machine compatibility for an expanded emulated machine).

roondar · 15 January 2019, 20:27

The fault is entirely mine, I worded my reply poorly.

To be exact, I tested only a few combinations (AD and ABCD). These use all available cycles even at the start, and they work. It obviously wasn't clear that I only meant this particular use case and hadn't tested all options.

ross · 15 January 2019, 20:44

Hi roondar, that wasn't meant as a criticism of you.

I just want to understand the best way to do this particular blithog/blitwait "combo".
This:

Code:

    move.w    #$8400,DMACON(A6)
    tst.w     (a6)
    tst.w     (a6)
    move.w    #$0400,DMACON(A6)

seems to ALWAYS work (and speeds up the result when blitter usage is intensive!), but I would like to be sure.
