English Amiga Board

A500 BlitHog behavior details? (https://eab.abime.net/showthread.php?t=111733)

Jobbo 27 August 2022 22:52

A500 BlitHog behavior details?
 
To keep this on track I'll start by saying that I'm only interested in standard A500/68000/OCS/no-fast-ram behavior, with the OS killed and the hardware accessed directly.

So, some recent posts have mentioned how the blitter in hog mode can lock out the cpu, which could of course lead to problems.

However, most of what I've read leaves me with the impression that we should treat blithog as if it'll always block the cpu, and that it should therefore be avoided at all costs or used only in very selective circumstances.

That isn't what I've experienced!

I have test code running 6 low-res bitplanes and I'm using the blitter for clears, fills and line draws. At no time is the blitter completely blocking the cpu.

I should say I'm using WinUAE and assuming it's accurate in cycle-exact mode!

I feel like some posts are therefore misleading. But I don't fully understand the details well enough myself. I've checked over the HRM and it doesn't seem to have the kind of detail I'm after; it's fairly vague about blithog.

I get that it's possible to lock out the cpu with enough dma activity in high-res, or with extreme copper/blitter/bitplane activity, but there doesn't seem to be enough of it in the cases I'm running.

It also seems to me that a bunch of dma slots are available during horizontal blanking, when the bitplanes aren't active. So, even with the blitter and copper going full steam, there'll be some free slots for the cpu on each line.

Anyway, as I admit, I don't know all the details, but it seems as if most people don't either. And some are then saying to avoid blithog more than necessary.

Can anyone break down exactly how things work and why in my cases the cpu can make progress?

It'd also be good to understand exactly what circumstances are required to truly and completely block the cpu.

It seems quite possible there's something I'm missing but I did run my code with blithog off and the blits are definitely a little slower. So I assume I'm using it correctly?

Speculation and guesswork aren't going to be helpful, so I'd really like to hear from anyone who really does know the details.

a/b 27 August 2022 23:29

A500 (68000) + chip mem (+ slow mem) + nasty on = cpu gets no access to bus until blitter is done.
For example: 5 bpls, nasty on, fx is doing 80 blits all started with a cpu without a single wait, no problems (both WinUAE and real hardware).

ross 27 August 2022 23:48

It all depends on the sequence of cycles of the blitter (which depends on the activated channels).
There are many combinations (even to do the same thing) that leave idle cycles that the CPU can safely use:
http://amiga.nvg.org/amiga/reference.../node0127.html

So Blithog isn't really a big deal for the kind of use you have to make of it.
Furthermore, if the blits you need to perform are not large and you know exactly where in the frame they occur, even a temporary block of the IRQs (or rather a time shift for execution) is not so dramatic.

Use it with caution, but use it.
But it is not always the best solution, especially if in the meantime you want to use the CPU for 'complex' calculations.
Many coders activate it just before a blitter-wait, to maximize the speed of the Blitter when the CPU has nothing to do but... wait.
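For reference, BLITHOG is the BLTPRI ("blitter nasty") bit in DMACON, which is written using a SET/CLR scheme. The Python sketch below is only a calculator for the register words and assumes the standard DMACON/DMACONR layout from the Hardware Reference Manual. The helper `hog_around_wait` is hypothetical: it illustrates the "nasty on only around a blitter-wait" pattern described above and is not anyone's actual routine.

```python
# DMACON ($dff096, write) / DMACONR ($dff002, read) bit assignments, per the HRM.
SETCLR = 1 << 15  # write: 1 = set the listed bits, 0 = clear them
BBUSY = 1 << 14   # read-only (DMACONR): blitter busy
BLTPRI = 1 << 10  # "blitter nasty" / BLITHOG
DMAEN = 1 << 9    # master DMA enable
BLTEN = 1 << 6    # blitter DMA enable


def dmacon_write(bits, enable):
    """Word to write to DMACON to set (enable=True) or clear the given bits."""
    return (SETCLR | bits) if enable else bits


def hog_around_wait():
    """Hypothetical sequence: nasty on only while the CPU would idle anyway."""
    return [
        ("write DMACON", dmacon_write(BLTPRI, True)),   # nasty on
        ("poll DMACONR until BBUSY is clear", None),    # the blitter-wait itself
        ("write DMACON", dmacon_write(BLTPRI, False)),  # nasty off again
    ]


for step, value in hog_around_wait():
    print(step, "" if value is None else f"${value:04x}")
```

With a6 = $dff000, the two writes correspond to `move.w #$8400,$96(a6)` and `move.w #$0400,$96(a6)`.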

Jobbo 27 August 2022 23:53

Quote:

Originally Posted by a/b (Post 1561671)
A500 (68000) + chip mem (+ slow mem) + nasty on = cpu gets no access to bus until blitter is done.
For example: 5 bpls, nasty on, fx is doing 80 blits all started with a cpu without a single wait, no problems (both WinUAE and real hardware).

That's not my experience at all!

Edit: Maybe this is true iff the Blit is using all DMA channels?

Jobbo 28 August 2022 00:01

Quote:

Originally Posted by ross (Post 1561676)
It all depends on the sequence of cycles of the blitter (which depends on the activated channels).
There are many combinations (even to do the same thing) that le....

Thanks Ross.

I did read that page of the HRM, and that's roughly what I understood, but it's hard to tell exactly which case you're in at any given time.

For my code I'm not CPU bound but I do want to run work in parallel. I am Blitter bound so I turn on Blithog and it speeds things up a little while slowing the CPU work slightly.

So, I'm left feeling that I'm not missing anything, it's just that other people are posting inaccurate information for some reason.

ross 28 August 2022 00:05

Quote:

Originally Posted by Jobbo (Post 1561678)
That's not my experience at all!

a/b was lucky here :p

It can only work by chance: because of the type of blitter operation, the registers modified, or the line position where the blit was started, or because... a/b knows what he's doing (;)) and accesses memory elsewhere; but a similar situation may well exist.
It's surely better not to tempt fate... or to hope that no one uses accelerators :D

EDIT:
Let me elaborate a little on this case.

The blitter-wait is needed because modifying the blitter registers while an operation is in progress can seriously affect the result (corruption and possible crashes).
Neglecting the bug present in the first Agnus, a single wait is sufficient to avoid any problem (with the CPU, not with the Copper!).
But what actually happens if you don't wait, and you have Blithog active and a blitter channel combination that uses all the available bus cycles?
Where can the danger come from? From the initial cycles (after the BLTSIZE write), before the Blitter has started its read/write memory accesses at full speed.
You can see this in the link I posted (beware: there are also free cycles at the end of the blit, before the last word is written; this is due to internal pipelining).
There are also a couple of idle cycles at the start that are not shown in the table; you can see them with a logic analyzer.

But a 68k at 7 MHz is usually too slow to do damage in those few cycles (though if you try, it is possible :)).
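To make that window concrete, the Python model below takes the HRM's example cycle sequence for a 3-word A→D blit (`A0 - A1 D0 A2 D1 - D2`, where `-` is a slot the blitter leaves free) and lists the slots a CPU could still win with BLITHOG on. The `extra_start_idle` parameter stands for the start-up cycles mentioned above that are missing from the table. The post only says "a couple", so any value passed for it is an assumption. The function name is hypothetical.

```python
def cpu_usable_slots(sequence, extra_start_idle=0):
    """Indices of bus slots the blitter leaves idle (and which the CPU could
    therefore use even with BLITHOG on), for an HRM-style cycle sequence.
    Simplification: ignores whether a 68000 bus cycle actually lines up
    with a given free slot."""
    slots = ["-"] * extra_start_idle + sequence.split()
    return [i for i, slot in enumerate(slots) if slot == "-"]


AD_3_WORDS = "A0 - A1 D0 A2 D1 - D2"  # HRM example sequence for channels A and D

print(cpu_usable_slots(AD_3_WORDS))                      # only the documented slots
print(cpu_usable_slots(AD_3_WORDS, extra_start_idle=2))  # assuming "a couple" of start-up slots
```

The free slots cluster at the start and just before the final D write. In the steady state between them, A→D leaves nothing for the CPU.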

ross 28 August 2022 00:15

Quote:

Originally Posted by Jobbo (Post 1561680)
I am Blitter bound so I turn on Blithog and it speeds things up a little while slowing the CPU work slightly.

Probably you have a combination with idle cycles.
That combination gives the situation you described.

Quote:

..it's just that other people are posting inaccurate information for some reason.

shhh, do not kick the hornet's nest!

;)

a/b 28 August 2022 04:20

Uhhh, ok. My blits in that particular case are 3-4 channels. Now if we are talking blit clear and similar affairs (only 1 channel used), yeah then there will be lots of unavoidable idle cycles. I simply didn't think about that case earlier, but it's common knowledge so that's that. In that particular case you will have to use blit waits.
Other than that, I wrote one more fx that used nasty, but in the end I converted it to copper blits because that saved me some cycles. Blits were using 2-3 channels and the height was well over 1024, so the code did 2 cpu writes in a row to $58 (BLTSIZE), no problem.
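BLTSIZE packs the height into bits 15-6 and the width in words into bits 5-0. On OCS a field value of 0 means the maximum (1024 rows, 64 words). The hypothetical Python helper below shows how a one-word-wide blit taller than 1024 rows splits into consecutive BLTSIZE writes, like the two back-to-back writes to $58 described above. It assumes each chunk simply continues from the channel pointers left by the previous one.

```python
def bltsize(height, width_words):
    """Encode an OCS BLTSIZE word: height in bits 15-6, width in bits 5-0.
    A field value of 0 encodes the maximum (1024 rows / 64 words)."""
    if not (1 <= height <= 1024 and 1 <= width_words <= 64):
        raise ValueError("OCS BLTSIZE limits: 1-1024 rows, 1-64 words")
    return ((height % 1024) << 6) | (width_words % 64)


def split_tall_blit(height, width_words):
    """Hypothetical helper: BLTSIZE words for a blit taller than 1024 rows,
    written back to back so each chunk carries on from the previous pointers."""
    words = []
    while height > 0:
        chunk = min(height, 1024)
        words.append(bltsize(chunk, width_words))
        height -= chunk
    return words


print([f"${w:04x}" for w in split_tall_blit(1500, 1)])
```

Per the thread, this only works without a blitter-wait because BLITHOG plus a combination with no idle slots stalls the CPU's second write until the first blit has finished. With an idle-cycle combination, the second write could land mid-blit.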

Now, if you are having a different experience, what kind of blits are you doing? Ctrl0, size, what happens after a write to size?

I also remember seeing stuff in demos over the years, when I was poking around to see how the not-so-obvious fx work. Here are a couple of examples from a demo I've seen on real hardware a bunch of times back in the 90's (emulators make it so easy nowadays ><). 2-3 active bitplanes, not much pressure on the cpu...

This is a bunch of 8x8 bobs: 2-3 channels, blits are 1024 words, writes a different ctrl0 right afterwards, no problem.
Code:

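        move.l        #$39260002,$40(a6)    ; BLTCON0/BLTCON1
...
        move.w        #$0001,$58(a6)        ; BLTSIZE: 1024 rows x 1 word, starts the blit
        move.l        #$0de49000,$40(a6)    ; new BLTCON0/BLTCON1 straight afterwards
...
        move.w        #$0001,$58(a6)
        move.l        #$0de49000,$40(a6)
...
        move.w        #$0001,$58(a6)
        move.l        #$0bfe0000,$40(a6)
...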

Some kind of SMC (self-modifying code) dot plotter: the blitter does the x/y to bset conversion. 2 channels, blits are 256+ words, writes a different ctrl0 or a different channel ptr right afterwards, no problem.
Code:

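        move.l        #$69f00000,$40(a6)    ; BLTCON0/BLTCON1
...
        move.w        #$c001,$58(a6)        ; BLTSIZE: 768 rows x 1 word
        move.l        #$39f00002,$40(a6)
...
        move.w        #$4001,$58(a6)        ; BLTSIZE: 256 rows x 1 word
        move.l        #$39260002,$40(a6)
...
        move.w        #$5541,$58(a6)        ; BLTSIZE: 341 rows x 1 word
        move.l        #$000483d0,$50(a6)    ; BLTAPT (source A pointer)
...
        move.w        #$5541,$58(a6)
        move.l        #$000483d0,$4c(a6)    ; BLTBPT (source B pointer)
...
        move.w        #$aac1,$58(a6)        ; BLTSIZE: 683 rows x 1 word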

With blit sizes of 256+ words there's plenty of time for at least one instruction to get through, and if that did happen the blits would be corrupt, but everything works fine.
Just my practical experience, but curious why it doesn't work in your case. Would rather be safe than lucky...
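To cross-check these examples against roondar's channel table later in the thread, the BLTCON0 half of each `move.l #...,$40(a6)` can be decoded. Bits 15-12 are the A shift, bits 11-8 enable channels A, B, C and D (USEA..USED), and bits 7-0 are the minterm. The Python decoder below is a small illustrative sketch, and the values are taken from the code above.

```python
def decode_bltcon0(value):
    """Split a BLTCON0 word into A shift, enabled channels and minterm."""
    channels = "".join(
        name
        for bit, name in ((11, "A"), (10, "B"), (9, "C"), (8, "D"))
        if value & (1 << bit)
    )
    return {"ash": value >> 12, "channels": channels, "minterm": value & 0xFF}


# Upper word of each long written to $40(a6) is BLTCON0; the low word is BLTCON1.
for long_value in (0x39260002, 0x0DE49000, 0x0BFE0000, 0x69F00000, 0x39F00002):
    con0 = long_value >> 16
    print(f"${con0:04x}", decode_bltcon0(con0))
```

Every combination that appears here (AD, ABD, ACD) pairs A with D. The table marks each of those as saturating the bus in the steady state.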

Jobbo 28 August 2022 05:50

My blit in one example is a fill of 192x192x4 pixels which will use channels A and D.

I have blithog enabled and see that...

- Even when displaying 6 bitplanes there is still plenty of CPU activity during the blit.
- If I mess with my fill and turn on the C channel then the CPU still does not get locked out.
- But if I instead turn on the B channel then it does get locked out even in the vblank.

So, I guess my take-away is that it's not as clear-cut as some seem to suggest, and it's good to check. There are more times when you can run the blitter at full speed and still make CPU progress than some of the conventional wisdom would lead you to believe.

Block copies for tiles, for example, seem like a good case where you might as well turn on blithog. Otherwise you could be throwing away blitter performance, depending on what the CPU is up to.

Edit: I got C and B mixed up in my tests so swap them around in the above.

Thomas Richter 28 August 2022 08:37

Certainly, that's what I was saying before - The blitter will in many cases not saturate the chip memory bus, depending on which channels are enabled. The consequence of this is that a fast CPU can outrun the blitter because with careful programming the CPU can utilize the full bandwidth, not just a part of it as the blitter.

ross 28 August 2022 09:27

Quote:

Originally Posted by a/b (Post 1561695)
Uhhh, ok. My blits in that particular case are 3-4
move.w #$0001,$58(a6)
move.l #$0de49000,$40(a6)

This is exactly what I was talking about.
Notice that the source data is explicitly in memory, and shifted too (so it needs to be fetched from chip/slow memory).
This saturates the bus between the BLTSIZE write and the blitter's *real* start, so a blitter-wait can be avoided.

Quote:

Originally Posted by Jobbo (Post 1561697)
My blit in one example is a fill of 192x192x4 pixels which will use channels A and D...
..

All of these cases come down to the active channels and the blitter cycle diagram.
There are special cases for fill mode:
https://eab.abime.net/showpost.php?p=939593&postcount=1
WinUAE is fully accurate about it.


Quote:

Originally Posted by Thomas Richter (Post 1561701)
The consequence of this is that a fast CPU can outrun the blitter because with careful programming the CPU can utilize the full bandwidth, not just a part of it as the blitter.

You keep reiterating this concept, which is simply wrong.
The CPU can never saturate the internal/chipmem bus (no matter how fast it is). The Blitter can (with the right channels enabled).
The problem with skipping the blitter-wait on fast processors is for the reasons I explained.

ross 28 August 2022 09:41

@a/b: ah! Another important thing! Changing the blitter registers while it is running does not always lead to corruption: some registers can be written without 'big' damage, and the new value is simply used from that moment on.
Provided, of course, that there are idle cycles that can be used during the run, or that Blithog is disabled.
But you really really REALLY need to know what you're doing.
I have done it myself...

malko 28 August 2022 09:52

Quote:

Originally Posted by ross (Post 1561706)
Quote:

Originally Posted by thomas richter (Post 1561701)
[...] The consequence of this is that a fast cpu can outrun the blitter because with careful programming the cpu can utilize the full bandwidth, not just a part of it as the blitter.

You keep reiterating this concept which is simply wrong.
Cpu can never saturate internal/chipmem bus (no matter how fast it is). The blitter can (enabling adequate channels).
The problem with non blitter-wait and fast processors is for the reasons I explained.

@Thomas: how about providing as simple a code sample as possible to demonstrate
what you are saying?

ross 28 August 2022 10:14

Quote:

Originally Posted by malko (Post 1561708)
@Thomas: how about providing as simple a code sample as possible to demonstrate
what you are saying?

The example code would be trivial.

Since the internal bus runs at 3.5 MHz and is 16 bits wide, it should be able to read (or write) at 7MB/sec. Unfortunately, the maximum it can reach with *any* CPU is 3.5MB/sec.
For 32-bit buses (like on the A3000 or AGA) it should be able to reach around 14MB/sec, but unfortunately it will not exceed 7MB/sec.
Equally simple is the test to check whether the blitter saturates the bus: a simple A→D copy of data, with Blithog active, easily reaches 3.5MB/sec (therefore 7MB/sec, read+write).
Unfortunately, as the blitter is 16-bit, it runs at the same speed on 32-bit buses.

This is also easily visible in memory-clear tests, where the maximum speed is achieved by interleaving Blitter and CPU: the CPU uses all the cycles it can (half of the possible ones) and the Blitter the remaining ones (half of the possible ones, because a D-only blit leaves idle cycles).
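These figures follow directly from the slot rate. The Python arithmetic below assumes a PAL machine with one chip-bus DMA slot per colour clock (3.546895 MHz) and 2 bytes per slot. It also assumes that a 68000 can use at most every other slot, and that an A→D copy needs two slots (one read, one write) per word. It ignores bitplane, refresh, disk, audio and sprite DMA, so these are ceilings, not measurements.

```python
PAL_SLOTS_PER_S = 3_546_895  # assumed: one chip-bus slot per PAL colour clock
BYTES_PER_SLOT = 2           # 16-bit chip bus


def mb_per_s(slots_per_s):
    """Convert a slot rate into MB/s (1 MB = 10**6 bytes)."""
    return slots_per_s * BYTES_PER_SLOT / 1e6


bus_total = mb_per_s(PAL_SLOTS_PER_S)      # everything the bus can move
cpu_max = mb_per_s(PAL_SLOTS_PER_S / 2)    # 68000: at most every other slot
ad_copy = mb_per_s(PAL_SLOTS_PER_S / 2)    # A->D: 2 slots per word copied

# Interleaved clear: a D-only blit writes in every other slot ("D -"),
# leaving the other half for the CPU, so together they can fill the bus.
clear_blitter = mb_per_s(PAL_SLOTS_PER_S / 2)
clear_cpu = mb_per_s(PAL_SLOTS_PER_S / 2)

print(f"bus {bus_total:.2f} MB/s, CPU max {cpu_max:.2f} MB/s, "
      f"A->D copy {ad_copy:.2f} MB/s, "
      f"interleaved clear {clear_blitter + clear_cpu:.2f} MB/s")
```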

roondar 28 August 2022 10:42

Since there's been some discussion about the cycle use of the Blitter and whether or not it can saturate the bus, I think it's useful to look at the HRM and draw our conclusions from there. There's a table (found here online: http://amigadev.elowar.com/read/ADCD.../node0127.html) on page 182 of my version of the HRM. It outlines most but not all situations. Essentially, it covers the 16 normal cases possible when selecting active channels, but excludes fill mode (and, if I read it correctly, it also doesn't cover line mode).

Ross has already pointed out a post by Toni which talks about the extra variants, which is useful to read.

Taking these diagrams, we can construct a simple table showing the number of idle cycles the blit will have during the main part of the blit. That is to say, the first and last words output by any blit can have more idle cycles than the rest of the blit. Considering that most blits run for far more than 3 words*, these special cases are usually only relevant to the question of Blitter waits, not so much to whether or not the Blitter saturates the bus. Hence, these first and last words can be ignored for this question.

Here then is such a table, which notes the number of idle cycles per word of 'output':
Code:

Channels active  Idle cycles  Saturates bus?
none  (0)        n/a          n/a
   D  (1)        1 per word   no
  C   (1)        1 per word   no
  CD  (2)        1 per word   no
 B    (1)        2 per word   no
 B D  (2)        1 per word   no
 BC   (2)        1 per word   no
 BCD  (3)        1 per word   no
A     (1)        1 per word   no
A  D  (2)        none         yes
A C   (2)        none         yes
A CD  (3)        none         yes
AB    (2)        1 per word   no
AB D  (3)        none         yes
ABC   (3)        none         yes
ABCD  (4)        none         yes

This table shows there are many channel combinations that do not saturate the bus. However, as can also be seen in the table above, excluding the D-only blit, for every number of sources (1, 2 or 3) used together with the D channel there's at least one combination that does saturate the bus. Likewise, among blits that use 2 or more of channels A, B and C but no D channel, there's at least one combination that saturates the bus.
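The same table can be regenerated from the steady-state slot pattern of each channel combination. The patterns are taken from the middle section of the HRM cycle diagrams, with `-` marking a slot the blitter leaves idle. Like the table, this Python sketch ignores first/last-word effects and fill/line mode, and the pattern strings are a transcription rather than anything measured.

```python
# Steady-state bus slots per output word, per active-channel combination.
STEADY_STATE = {
    "D": "D-", "C": "C-", "CD": "CD-", "B": "B--", "BD": "BD-",
    "BC": "BC-", "BCD": "BCD-", "A": "A-", "AD": "AD", "AC": "AC",
    "ACD": "ACD", "AB": "AB-", "ABD": "ABD", "ABC": "ABC", "ABCD": "ABCD",
}


def summarize(channels):
    """Return (slots per word, idle slots per word, saturates bus?)."""
    pattern = STEADY_STATE[channels]
    idle = pattern.count("-")
    return len(pattern), idle, idle == 0


for combo in sorted(STEADY_STATE, key=lambda c: (len(c), c)):
    slots, idle, saturates = summarize(combo)
    print(f"{combo:5} {slots} slots/word, {idle} idle, saturates: {saturates}")
```

The slots-per-word column is the extra information this adds. For example, ABCD needs 4 slots per word and AD only 2, even though both leave nothing idle.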

Now, fill and line mode are special cases. If I understand the fill mode correctly, this adds one extra idle cycle per word output on top of what the channel combination used would normally have and so will never saturate the bus. Line mode I'm not so certain about, but IIRC that also has one idle cycle involved.

In conclusion: it's technically true that quite a few channel combinations don't saturate the bus, and with the special operations (fill/line) you can never saturate it. However, assuming no fill or line mode, for every combination that can't saturate the bus there seems to be at least one other combination that achieves the same result and does saturate it. In my view, then, the Blitter is perfectly capable of saturating the Chip Memory bus in most (but not all) real-world scenarios. The idea that it can't is simply not correct.

*) Example: copying a simple 16x16 tile in 4 bitplanes involves 64 words to be blit, of which only 2 are the special cases - the rest aren't. This is a very small blit, and it already has over 96% of the words in the non-special case situation.
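The footnote's percentage is simply the share of words outside the special first/last-word cases. The trivial Python check below assumes exactly two special words per blit, as stated in the footnote.

```python
def steady_state_share(total_words, special_words=2):
    """Fraction of a blit's words that run at the steady-state cycle pattern."""
    return (total_words - special_words) / total_words


tile_words = 16 * 4  # 16x16 tile, 4 bitplanes: 16 rows x 4 planes x 1 word each
print(f"{steady_state_share(tile_words):.2%}")
```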

ross 28 August 2022 11:06

Good summary post roondar!

It would be nice to have also a complete table of fill and line cases.
They are all data that I probably have scattered in my file mess, too complicated to collect :)

Thomas Richter 28 August 2022 11:33

Quote:

Originally Posted by ross (Post 1561706)
CPU can never saturate internal/chipmem bus (no matter how fast it is). The Blitter can (enabling adequate channels).

Apparently, not. See the OP.

chb 28 August 2022 11:37

Quote:

Originally Posted by ross (Post 1561719)
Good summary post roondar!

I agree, very handy table, thanks roondar.

Quote:

Originally Posted by ross (Post 1561719)
It would be nice to have also a complete table of fill and line cases.
They are all data that I probably have scattered in my file mess, too complicated to collect :)

AFAIK (from that slightly error-riddled HRM errata and the thread linked above), fill mode adds one idle cycle for all cases that include D. Line mode seems to have one idle cycle per output word, plus one cycle where the blitter does nothing but which is still not available to the CPU (one in four memory access slots is an idle cycle).

EDIT: For fill mode, the errata covers only the cases A->D, B->D, AB->D and D only, but does not mention ABC->D, AC->D or C->D

roondar 28 August 2022 11:44

Quote:

Originally Posted by ross (Post 1561719)
Good summary post roondar!

It would be nice to have also a complete table of fill and line cases.
They are all data that I probably have scattered in my file mess, too complicated to collect

I wasn't entirely sure about the number of idle cycles for fills & lines (I just know these modes have idle cycles, not how many in which scenario), so I left them out.
Quote:

Originally Posted by Thomas Richter (Post 1561723)
Apparently, not. See the OP.

This is misleading, the Blitter definitely can saturate the Chip Memory bus. See the HRM and the table I provided based on it. The CPU definitely can't.
The OP is using fill mode, which is a special case where the Blitter indeed can't, but this is not indicative of the general case.

ross 28 August 2022 11:53

Quote:

Originally Posted by Thomas Richter (Post 1561723)
Apparently, not. See the OP.

Everything has been explained in the posts that followed.
The OP uses an operation (fill) which, combined with the active channels (AD), leaves idle cycles unused by the blitter, even with Blithog active.
So all this has nothing to do with which processor is used or at what speed it accesses the bus: there will always be idle/CPU-free cycles, and if they are used to write to the blitter registers or to the memory cells involved in the operation, they will cause corruption.
The OP also said that activating the C channel makes Blithog lock the CPU out of the bus: this would happen with any processor, at any speed, attempting access!
But it would be counterproductive to activate a channel for nothing, so better use those free cycles for something else or use a blitter-wait.


All times are GMT +2.