English Amiga Board


Old 22 September 2018, 14:12   #1
mcgeezer
Registered User

 
Join Date: Oct 2017
Location: Sunderland, England
Posts: 857
Blitter engine working on Interrupt Request

Hi All,

I'm looking at trying to get the most out of my sprite engine and wondered if any of the experienced coders here could help with explaining how a blitter/bob engine works with interrupt requests?

The aim is to not have the CPU waiting for the blitter all the time, I do quite a lot of large blits in my project and I want the CPU to be getting on with other things in the background if possible.

Cheers,
Geezer
Old 22 September 2018, 15:53   #2
ross
Sum, ergo Cogito

Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 48
Posts: 1,457
Hi geezer, just a few hints, because the possible solutions vary and are sometimes not that productive.

In practice, what you want to avoid is the blitter wait code.

You can simply build a growing queue containing the values to be written into the blitter's registers, and write an IRQ handler that checks the BLIT bit during IRQ3 (level 3 is shared between copper, VBL and blitter).
Of course, you must have set the same bit in INTENA so that an interrupt is raised when the blitter finishes its job.

In the IRQ code, once the IRQ source is confirmed, read the values from the blitter queue (filled by the normal main code) and write all the hardware registers, with BLTSIZE last, of course. The blitter starts as usual, so you can remove the entry at the head of the queue and move the pointer to the next set of values.

It is better to do the acknowledge (move.w #$40,INTREQ) before the BLTSIZE write, because with BLTPRI set and only chip/slow RAM the blit can finish before the INTREQ write is executed, and the acknowledge would then clear the new request.
Then you're good to RTE.
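
A minimal sketch of such a level 3 handler could look like the listing below. The 34-byte queue entry layout and the labels QueueHead, QueueTail and BlitterRunning are assumptions made up for this illustration, not a fixed convention, and the VERTB/COPER handling that a real handler needs is left out.

Code:
CUSTOM      equ     $dff000
INTENAR     equ     $01c
INTREQR     equ     $01e
BLTCON0     equ     $040
BLTSIZE     equ     $058
BLTCMOD     equ     $060
INTREQ      equ     $09c

; Assumed queue entry, 34 bytes:
;    0  BLTCMOD, BLTBMOD, BLTAMOD, BLTDMOD              (4 words)
;    8  image of $040-$057: BLTCON0/1, BLTAFWM/ALWM,
;       BLTCPT, BLTBPT, BLTAPT, BLTDPT                  (6 longwords)
;   32  BLTSIZE                                         (1 word)

Level3Irq:  movem.l d0/a0-a1/a6,-(sp)
            lea     CUSTOM,a6
            move.w  INTREQR(a6),d0
            and.w   INTENAR(a6),d0          ; keep only the enabled sources
            btst    #6,d0                   ; BLIT request pending?
            beq.s   .exit                   ; no: VERTB/COPER must be handled and acknowledged elsewhere
            move.w  #$0040,INTREQ(a6)       ; acknowledge BEFORE writing BLTSIZE
            move.l  QueueHead,a0
            cmpa.l  QueueTail,a0            ; queue empty?
            bne.s   .next
            clr.w   BlitterRunning          ; yes: main code has to start the next blit itself
            bra.s   .exit
.next:      lea     BLTCMOD(a6),a1
            move.l  (a0)+,(a1)+             ; BLTCMOD, BLTBMOD
            move.l  (a0)+,(a1)+             ; BLTAMOD, BLTDMOD
            lea     BLTCON0(a6),a1
            moveq   #6-1,d0
.copy:      move.l  (a0)+,(a1)+             ; BLTCON0 ... BLTDPT
            dbf     d0,.copy
            move.w  (a0)+,BLTSIZE(a6)       ; last write starts the blit
            move.l  a0,QueueHead            ; pop the entry
.exit:      movem.l (sp)+,d0/a0-a1/a6
            rte

QueueHead:      dc.l    0                   ; next entry to start
QueueTail:      dc.l    0                   ; first free slot, advanced by the main code
BlitterRunning: dc.w    0                   ; non-zero while the IRQ chain is active

In this sketch the main code appends entries at QueueTail. When BlitterRunning is zero it has to start the first blit itself (after a normal blitter wait) and set the flag, ideally with the BLIT interrupt disabled around that check so the handler cannot change the flag in between.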

But there is a fundamental drawback that often steers people away from this method:
IRQ latency. Apart from the cycles needed to enter the routine, there is also the saving of registers and the various checks performed during the IRQ.

But there is a Holy Grail: use the copper's ability to wait for the blitter to finish (the BFD bit of the WAIT instruction) and use the very same copper to set up the blitter registers.
In practice this is very rarely implemented, because it is really hard to set up the copper list (with the various video-synchronised effects) and enqueue blitter commands at the same time.
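
As a rough illustration of the copper approach, here is a sketch assuming a plain A-to-D copy of 16 lines by one word, with source and destination pointers patched into the list by the CPU beforehand. The labels CopBlit, CopSrc and CopDst are hypothetical. A WAIT with the BFD bit cleared holds the copper until the blitter is idle, and the MOVEs that follow load the blitter. The copper is only allowed to write the blitter registers once the CDANG bit in COPCON is set.

Code:
COPCON      equ     $02e
BLTCON0     equ     $040
BLTCON1     equ     $042
BLTAFWM     equ     $044
BLTALWM     equ     $046
BLTAPTH     equ     $050
BLTAPTL     equ     $052
BLTDPTH     equ     $054
BLTDPTL     equ     $056
BLTSIZE     equ     $058
BLTAMOD     equ     $064
BLTDMOD     equ     $066

; CPU side, once during setup (a6 = $dff000)
            move.w  #$0002,COPCON(a6)       ; CDANG: allow copper writes to $040-$07e

; Copper list fragment (must live in chip RAM)
CopBlit:    dc.w    $0001,$7ffe             ; WAIT with BFD=0: hold until the blitter has finished
            dc.w    BLTCON0,$09f0           ; use A and D, minterm D=A
            dc.w    BLTCON1,$0000
            dc.w    BLTAFWM,$ffff
            dc.w    BLTALWM,$ffff
            dc.w    BLTAMOD,$0000
            dc.w    BLTDMOD,$0000
CopSrc:     dc.w    BLTAPTH,0,BLTAPTL,0     ; patched by the CPU
CopDst:     dc.w    BLTDPTH,0,BLTDPTL,0     ; patched by the CPU
            dc.w    BLTSIZE,16*64+1         ; 16 lines x 1 word, this write starts the blit
            dc.w    $ffff,$fffe             ; end of copper list

Each further blit would repeat the WAIT plus the MOVEs that change, which is where the difficulty of mixing this with beam-synchronised copper effects comes from.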

Good job!

Last edited by ross; 22 September 2018 at 16:02. Reason: some grammar..., be patient with my english :(
Old 22 September 2018, 16:17   #3
Photon
Moderator
Join Date: Nov 2004
Location: Hult / Sweden
Posts: 4,596
Quote:
Originally Posted by mcgeezer View Post
Hi All,

I'm looking at trying to get the most out of my sprite engine and wondered if any of the experienced coders here could help with explaining how a blitter/bob engine works with interrupt requests?

The aim is to not have the CPU waiting for the blitter all the time, I do quite a lot of large blits in my project and I want the CPU to be getting on with other things in the background if possible.

Cheers,
Geezer
First, check that BLTPRI is on. (It must be on in order to fit as many blits as possible in a frame.)

And if it's on, chances are you are not waiting for the Blitter. I.e. the CPU doesn't get control back until the bob has been blitted anyway.

You can test this by making a blitwait that sets the background color before waiting and resets it after. If the color slivers are only 1px high and not wider than say 1/6th of a scanline, what you're seeing is the execution time of just 1 loop of the blitwait: it's already finished and you're waiting for nothing.

You can also count the repetitions of the blitwait loop and reset the counter each frame.
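
A small sketch of both tests combined, assuming a6 = $dff000 and a hypothetical word variable WaitLoops that the main loop clears once per frame:

Code:
DMACONR     equ     $002
COLOR00     equ     $180

TimedBlitWait:
            move.w  #$0f00,COLOR00(a6)      ; background red while we wait
            tst.w   DMACONR(a6)             ; commonly recommended dummy read before polling BBUSY
.wait:      addq.w  #1,WaitLoops            ; count the loop repetitions
            btst    #6,DMACONR(a6)          ; BBUSY = bit 14 of DMACONR (bit 6 of the high byte)
            bne.s   .wait
            move.w  #$0000,COLOR00(a6)      ; background back to black
            rts

WaitLoops:  dc.w    0                       ; hypothetical counter, reset each frame

If WaitLoops only ever grows by one per call, the blit had already finished every time and the wait cost almost nothing.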
Old 22 September 2018, 16:29   #4
ross
Sum, ergo Cogito

Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 48
Posts: 1,457
Yes, the BLTPRI bit can completely change your coding style and program flow.
It is like the difference between thinking single-tasking and thinking multitasking (this is not always true, because many blitter modes do not use all the bus cycles, but you can treat it as a general rule).
Old 22 September 2018, 17:10   #5
mcgeezer
Registered User

 
Join Date: Oct 2017
Location: Sunderland, England
Posts: 857
Quote:
Originally Posted by Photon View Post
First, check that BLTPRI is on. (It must be on in order to fit as many blits as possible in a frame.)

And if it's on, chances are you are not waiting for the Blitter. I.e. the CPU doesn't get control back until the bob has been blitted anyway.

You can test this by making a blitwait that sets the background color before waiting and resets it after. If the color slivers are only 1px high and not wider than say 1/6th of a scanline, what you're seeing is the execution time of just 1 loop of the blitwait: it's already finished and you're waiting for nothing.

You can also count the repetitions of the blitwait loop and reset the counter each frame.
So by this rationale I don't have to wait for the blitter if the Blit nasty is on?

Edit: Oddly, if I set the BLTPRI on my large blits are taking longer to complete.
Old 22 September 2018, 17:44   #6
phx
Natteravn

Join Date: Nov 2009
Location: Herford / Germany
Posts: 1,251
Quote:
Originally Posted by mcgeezer View Post
So by this rationale I don't have to wait for the blitter if the Blit nasty is on?
Only if your code is running in Chip/Slow RAM.

Quote:
Edit: Oddly, if I set the BLTPRI on my large blits are taking longer to complete.
How did you measure the completion? I doubt that the BBUSY flag becomes inactive sooner when you don't set BLTPRI.
Old 22 September 2018, 18:01   #7
ross
Sum, ergo Cogito

Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 48
Posts: 1,457
Quote:
So by this rationale I don't have to wait for the blitter if the Blit nasty is on?
Quote:
Originally Posted by phx View Post
Only if your code is running in Chip/Slow RAM.
There are fast processors (with an instruction cache) that can end up writing to the blitter registers before BBUSY has been set by the blitter logic, so beware.
A blitter wait, or something that has the same effect, should always be used, even with BLTPRI set.

Well, if you are optimizing something to death for the A500, omitting the wait can be acceptable
[EDIT: and then the poor WHDLoad coder needs to patch your code for the people with fast machines..]

Last edited by ross; 22 September 2018 at 18:14.
Old 22 September 2018, 18:09   #8
ross
Sum, ergo Cogito

Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 48
Posts: 1,457
Quote:
Originally Posted by mcgeezer View Post
Edit: Oddly, if I set the BLTPRI on my large blits are taking longer to complete.
This is technically impossible.

Are you sure you're not taking the timing while the blitter is still running?
Remember that with BLTPRI=0 the processor's cycles are interleaved with the blitter's on the internal bus.
Old 22 September 2018, 18:22   #9
mcgeezer
Registered User

 
Join Date: Oct 2017
Location: Sunderland, England
Posts: 857
I've done a little video showing what is going on.

The background bitplanes are done in 3 blits; I set the colour to red, green or blue prior to blitting each plane respectively.

The first run is without the Blitter priority set and then with.

As you can see without it I don't see any CPU clocks for the first blit.

I have cycle exact set on WinUAE too so I'm not sure what is going on.

[YouTube video]
Old 22 September 2018, 18:23   #10
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 43
Posts: 22,354
You still need to wait for blit even when running in chip ram with blitter nasty because not all channel mode combinations use all cycles. (for example D only, most fill modes, line draw)

Also due to CPU prefetch and blitter pipelining, there is 3-4 cycles available before blitter "really starts" after writing to BLTSIZE which allows CPU to execute following instruction, at least partially.
Old 22 September 2018, 18:46   #11
ross
Sum, ergo Cogito

Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 48
Posts: 1,457
Quote:
Originally Posted by mcgeezer View Post
I've done a little video showing what is going on.

The background bitplanes are done in 3 blits; I set the colour to red, green or blue prior to blitting each plane respectively.
The key is where the blitter wait is placed.

If you do the wait before the blit (which is indeed how it must be done), then you change the colour (the yellow) practically immediately.

If you do the blitter wait after the blit (the right way to take a timing), then you get the latter effect, showing the actual duration of the blit
(with BLTPRI=1 the internal bus is hogged by the blitter, so you cannot write to the colour register, and the effect is the same as a blitter wait done after the blit).

As said before, the blitter is still active after the third blit.

Last edited by ross; 22 September 2018 at 19:10.
Old 22 September 2018, 19:02   #12
ross
Sum, ergo Cogito

Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 48
Posts: 1,457
Quote:
Originally Posted by Toni Wilen View Post
Also due to CPU prefetch and blitter pipelining, there is 3-4 cycles available before blitter "really starts"
Toni, so code like this with BLTPRI=1 can totally avoid a blitter-wait?:

(a6=custombase)
tst.w (a6)
tst.w (a6)
tst.w (a6)

move.w #something,blitter_reg(a6)
Old 22 September 2018, 19:04   #13
Photon
Moderator
Photon's Avatar
 
Join Date: Nov 2004
Location: Hult / Sweden
Posts: 4,596
Yes, if BLTPRI is off, control will return to the CPU after a few cycles (and it will then execute the instructions that measure the blit time prematurely).

Blitwaits must always precede a blit. For measurement purposes (or other result-use purposes such as collision detection), you should temporarily add a Blitwait after, as well.
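
A sketch of that pattern, assuming a simple A-to-D copy with a6 = $dff000, a0 = source, a1 = destination and d0 = BLTSIZE value. The MEASURE switch and the routine names are hypothetical; the second wait and the colour changes are only assembled in the timing build.

Code:
MEASURE     equ     1                       ; hypothetical switch: 1 = timing build, 0 = release

DMACONR     equ     $002
BLTCON0     equ     $040
BLTAFWM     equ     $044
BLTAPTH     equ     $050
BLTDPTH     equ     $054
BLTSIZE     equ     $058
BLTAMOD     equ     $064
COLOR00     equ     $180

CopyPlane:  bsr.s   WaitBlit                ; always wait BEFORE touching blitter registers
            move.l  #$09f00000,BLTCON0(a6)  ; BLTCON0=$09f0 (D=A), BLTCON1=0
            move.l  #$ffffffff,BLTAFWM(a6)  ; first and last word masks
            move.l  #0,BLTAMOD(a6)          ; BLTAMOD and BLTDMOD = 0
            move.l  a0,BLTAPTH(a6)
            move.l  a1,BLTDPTH(a6)
            IFNE    MEASURE
            move.w  #$0f00,COLOR00(a6)      ; red for the duration of the blit
            ENDC
            move.w  d0,BLTSIZE(a6)          ; starts the blit
            IFNE    MEASURE
            bsr.s   WaitBlit                ; timing build only: wait AFTER as well
            move.w  #$0000,COLOR00(a6)      ; black again once the blit is really done
            ENDC
            rts

WaitBlit:   tst.w   DMACONR(a6)             ; dummy read before polling
.wait:      btst    #6,DMACONR(a6)          ; BBUSY
            bne.s   .wait
            rts

With MEASURE set to 0 the routine returns right after the BLTSIZE write and the CPU is free until the next wait.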
Old 29 September 2018, 18:54   #14
LeCaravage
Registered User

Join Date: May 2017
Location: AmigaLand
Posts: 172
Just wondering, is it necessary to turn BLTPRI on? I mean, if the blitter uses all 3 sources, it leaves very few cycles free for the CPU. The advantage of an IRQ-driven blitter is to simulate "multitasking" between blit operations and other calculations on the CPU.
The answer would obviously be yes if the target were an Amiga 1200 with fast RAM or an accelerator, but I'm not sure (I mean, I don't know the answer) for a stock A1200.
Old 29 September 2018, 23:45   #15
roondar
Registered User

 
Join Date: Jul 2015
Location: The Netherlands
Posts: 752
Quote:
Originally Posted by LeCaravage View Post
Just wondering, is it necessary to turn BLTPRI on? I mean, if the blitter uses all 3 sources, it leaves very few cycles free for the CPU. The advantage of an IRQ-driven blitter is to simulate "multitasking" between blit operations and other calculations on the CPU.
The answer would obviously be yes if the target were an Amiga 1200 with fast RAM or an accelerator, but I'm not sure (I mean, I don't know the answer) for a stock A1200.
It all depends. If your program can do useful work during the blit, it's actually almost always faster to run with BLTPRI off, even on a 68000-based machine and definitely on a 68020+ machine.

Now make no mistake, your blit will take longer if you do this (it'll suffer a 25% or so penalty). However, your overall system performance will likely go up, more so if your useful work includes expensive instructions such as multiplications/divisions (on 68000 shifts/rotates as well).

However, if you can't or can't easily do useful work during a blit and instead need to spend a large percentage of the blit waiting on it to finish, then keeping BLTPRI on is always the better option.

What I personally do is have my blitwait macro set BLTPRI to ON at the start, wait for the blit to finish in the regular way (because a fast processor might outrun this switch) and then, after waiting, switch it back off. I only call the blitwait macro immediately prior to setting the actual blitter registers. This way, in my experiments anyway, performance tends to be highest.
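
Read literally, that description would give a macro roughly like the one below. This is a reconstruction from the text, not the actual source; the register names are written as plain equates and \1 is assumed to be an address register holding $dff000.

Code:
dmaconr     equ     $002
dmacon      equ     $096

BlitWait    MACRO                           ; \1 = address register holding $dff000
            move.w  #$8400,dmacon(\1)       ; SET + BLTPRI: blitter nasty on
            tst.w   dmaconr(\1)             ; dummy read before polling
.bw\@:      btst    #6,dmaconr(\1)          ; BBUSY (bit 14 of DMACONR)
            bne.s   .bw\@                   ; regular wait loop
            move.w  #$0400,dmacon(\1)       ; BLTPRI off again
            ENDM

It would be used as "BlitWait a6" immediately before the first blitter register write of each blit.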
Old 30 September 2018, 11:05   #16
ross
Sum, ergo Cogito

Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 48
Posts: 1,457
Quote:
Originally Posted by roondar View Post
What I personally do is have my blitwait macro set BLTPRI to ON at the start, wait for the blit to finish in the regular way (because a fast processor might outrun this switch) and then, after waiting, switch it back off. I only call the blitwait macro immediately prior to setting the actual blitter registers. This way, in my experiments anyway, performance tends to be highest.
It could be interesting to combine this with http://eab.abime.net/showpost.php?p=...5&postcount=12
Considering that disabling BLTPRI at the end of the macro is itself an internal bus access, you can remove one tst.w and possibly gain even more performance.
[only for blits that use all the cycles]
Old 01 October 2018, 01:03   #17
roondar
Registered User

 
Join Date: Jul 2015
Location: The Netherlands
Posts: 752
Quote:
Originally Posted by ross View Post
It could be interesting to combine this with http://eab.abime.net/showpost.php?p=...5&postcount=12
Considering that disabling BLTPRI at the end of the macro is itself an internal bus access, you can remove one tst.w and possibly gain even more performance.
[only for blits that use all the cycles]
That would indeed be interesting. I decided to test this macro as a replacement idea:

Code:
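BlitWait    MACRO
            move.w    #$8400,dmacon(\1)     ; SET + BLTPRI: blitter nasty on
            tst.w     dmaconr(\1)           ; chip-bus read, held off while the blitter hogs the bus
            move.w    #$0400,dmacon(\1)     ; BLTPRI off again
            ENDM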
My tests in WinUAE show that this actually seems to work fine (though only for all-cycles-used blits, as expected), even when I set the system to be a 68040 'fastest possible' with a bunch of fast memory. Didn't test it on real systems though, but it may be a good alternative for the standard way of blitter waiting that wastes quite a bit of time on a basic A500.
Old 15 January 2019, 20:17   #18
ross
Sum, ergo Cogito

Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 48
Posts: 1,457
I'm reopening this thread because I have an interesting case study, and maybe Toni can provide further explanation.

In an earlier message he wrote:
Quote:
Originally Posted by Toni Wilen View Post
You still need to wait for blit even when running in chip ram with blitter nasty because not all channel mode combinations use all cycles. (for example D only, most fill modes, line draw)

Also due to CPU prefetch and blitter pipelining, there is 3-4 cycles available before blitter "really starts" after writing to BLTSIZE which allows CPU to execute following instruction, at least partially.
So my thinking was:
Quote:
Originally Posted by ross View Post
Toni, so code like this with BLTPRI=1 can totally avoid a blitter-wait?:

(a6=custombase)
tst.w (a6)
tst.w (a6)
tst.w (a6)

move.w #something,blitter_reg(a6)
My reasoning was: the first two tst.w cover the 4 cycles before the blitter starts, and the third covers the hole before the second read of source data.
Obviously this applies only to blitter sequences that use all the available cycles (for example ABCD, ABD, ACD; I don't consider the others, since even with BLTPRI they would leave me with free cycles anyway).
Graphic example:
start blitter ->
- - A0 B0 C0 - A1 B1 C1 D0 A2 B2 C2 D1 D2

with tst.w (as X):
start blitter ->
X X A0 B0 C0 X A1 B1 C1 D0 A2 B2 C2 D1 D2

So no blitter wait is needed.

Then roondar wrote:
Quote:
Originally Posted by roondar View Post
That would indeed be interesting. I decided to test this macro as a replacement idea:

Code:
BlitWait    MACRO
            move.w    #$8400,dmacon(\1)
            tst.w     dmaconr(\1)
            move.w    #$0400,dmacon(\1)
            ENDM
My tests in WinUAE show that this actually seems to work fine (though only for all-cycles-used blits, as expected), even when I set the system to be a 68040 'fastest possible' with a bunch of fast memory. Didn't test it on real systems though, but it may be a good alternative for the standard way of blitter waiting that wastes quite a bit of time on a basic A500.
At first I was puzzled, because it lacks one access needed to cover all the free cycles, but I thought I had perhaps misunderstood what Toni meant by <<there is 3-4 cycles available before blitter "really starts">>.

Then today, a surprise. I have a very tight blitter routine with the A, B and D channels active and with this blitter wait:
Code:
    move.w    #$8400,DMACON(A6)
    tst.w     (a6)
    move.w    #$0400,DMACON(A6)
    move.l    a0,BLTBPTH(a6)
this routine can fail every now and then! (I get garbage in the destination because the B channel points to the wrong data..)


If I use:
Code:
    move.w    #$8400,DMACON(A6)
    tst.w     (a6)
    tst.w     (a6)
    move.w    #$0400,DMACON(A6)
    move.l    a0,BLTBPTH(a6)
it never fails, so maybe my first line of thinking was right.

I'm curious to understand why that kind of blitter wait always worked for roondar.
Is there any particular situation (his or mine) that gives different results?

I'm on the latest WinUAE 4.1.0, 030 with custom x16 frequency (but this frequency is not significant: the bottleneck for instructions that need few cycles is only the internal bus access), all CPU caches active, CE options active, no "Wait for blitter" or "Immediate blitter" (so maximum real-machine compatibility for an expanded emulated machine).
Old 15 January 2019, 21:27   #19
roondar
Registered User

 
Join Date: Jul 2015
Location: The Netherlands
Posts: 752
The fault is entirely mine, I worded my reply poorly.

I tested only a few combinations (AD and ABCD, to be exact). These use all available cycles even at the start, and they work. It obviously wasn't clear that I only meant this particular use case and hadn't tested all options.
Old 15 January 2019, 21:44   #20
ross
Sum, ergo Cogito

Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 48
Posts: 1,457
Hi roondar, that wasn't meant as criticism of you
I just want to understand what the best way is for this particular blithog/blitwait "combo".
This:
Code:
    move.w    #$8400,DMACON(A6)
    tst.w     (a6)
    tst.w     (a6)
    move.w    #$0400,DMACON(A6)
seems to ALWAYS work (and speeds up the result when the blitter is used intensively!), but I would like to be sure.