Use of MOVE16 - Page 3

Karlos · 03 January 2024, 11:44

Quote:

Originally Posted by Thomas Richter

Except that the application using MOVE16 is not fully in control in these contexts. A MOVE16 from CPU-local memory to CPU-local memory is safe provided that both source and destination are fully mapped and not in an invalid page, or that the page descriptor of the invalid page does not map to an address that matches an address in the cache. So, how exactly can one easily test this condition?

So for the purposes of this discussion, this would be based on prior knowledge, e.g. an operation within an application, rather than having to detect anything.

Don_Adan · 03 January 2024, 11:58

Perhaps it can be related to Amiga turbo boards too:

This is a turbocard for the Amiga A3000 and A4000(T) with up to 128MB of SDRAM and a 68060 CPU running at 50/75/100MHz, or even a 68040 at 25MHz.
The RAM is in the Amiga CPU-RAM space and is therefore autodetected.
The RAM can handle move16 bursts and DMA from the Amiga.

Anyway, I haven't heard of FastATA problems with the move16 instruction, except with some faulty 68040s in the first version of FastATA.

Thomas Richter · 03 January 2024, 12:14

Quote:

Originally Posted by Don_Adan

Perhaps it can be related to Amiga turbo boards too:

*Sigh* Again, there is an erratum in some turbo boards that can cause issues with MOVE16. However, even if you have a board that is not affected, there are other CPU-related issues with MOVE16 that are unrelated to the board. But look, everything has already been said in this thread.

Don_Adan · 03 January 2024, 13:23

Perhaps you're right. It seems that Apollo turbo boards handle move16 better than Blizzard turbo boards:

http://www.elbox.com/tests/fastata_speed_pl.html

Karlos · 03 January 2024, 13:37

My motivation for move16 in the old days was to speed up pixel conversions between fast ram and vram on my BVision. That was connected over minipci and may or may not have had issues.

The strategy was to allocate a few cache-aligned rows on the stack as a workspace, use move16 to copy from the source bitmap to the workspace, manipulate the pixels there, then ship them off to VRAM using move16 again. The hope was that this would avoid thrashing the cache with all the source pixels, so that only the workspace needed to be hot. This was back in the 3.1/3.5 CGX4 days, so there are probably much better ways of doing it now.
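A portable C sketch of that staging idea: a few rows at a time are copied into a 16-byte-aligned stack workspace in whole 16-byte lines, converted there, and copied out. MOVE16 itself is a 68040/060 instruction that portable C cannot express, so a `memcpy` of one 16-byte line stands in for it; the names `copy_lines`, `convert`, `invert_pixels` and the 4 x 64-byte workspace size are hypothetical, not taken from the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* On the 68040/060 a cache line, and the unit MOVE16 moves, is 16 bytes. */
#define LINE 16u
/* Hypothetical workspace size: four 64-byte rows, a multiple of LINE. */
#define WORK_BYTES (4u * 64u)

static int is_line_aligned(const void *p)
{
    return ((uintptr_t)p % LINE) == 0;
}

/* Copy nbytes as whole 16-byte lines. Each memcpy of one line stands in
   for a single MOVE16 (Ax)+,(Ay)+ on real hardware, so the same
   preconditions are asserted: aligned pointers, whole lines only. */
static void copy_lines(void *dst, const void *src, size_t nbytes)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    assert(is_line_aligned(dst) && is_line_aligned(src));
    assert(nbytes % LINE == 0);
    for (size_t i = 0; i < nbytes; i += LINE)
        memcpy(d + i, s + i, LINE);
}

/* Hypothetical pixel conversion, done only inside the workspace. */
static void invert_pixels(unsigned char *buf, size_t nbytes)
{
    for (size_t i = 0; i < nbytes; i++)
        buf[i] = (unsigned char)~buf[i];
}

/* Stream source -> aligned stack workspace -> destination, a few rows at
   a time; the intent is that only the workspace needs to stay hot. */
static void convert(unsigned char *dst, const unsigned char *src, size_t nbytes)
{
    _Alignas(16) unsigned char work[WORK_BYTES];
    assert(nbytes % LINE == 0);
    for (size_t off = 0; off < nbytes; off += WORK_BYTES) {
        size_t chunk = nbytes - off < WORK_BYTES ? nbytes - off : WORK_BYTES;
        copy_lines(work, src + off, chunk);
        invert_pixels(work, chunk);
        copy_lines(dst + off, work, chunk);
    }
}
```

Note that the workspace lives on the stack, as in the post; on a real system any code path that can touch it concurrently would need the same protection discussed in the next paragraph.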

On the whole, this worked, but I do remember glitches in some of the conversions that would affect a short span of pixels at random. In hindsight though, this could have been my stack workspace being insufficiently protected during a context switch (I was relying on lock bitmap tags, but I wasn't using forbid/disable).

No.3 · 03 January 2024, 20:34

Hmmm...

I try to summarize:

there are 11 kinds of people/opinions:

01: don't use Move16, it may work, but very likely it will cause problems

10: Move16 works; only special combinations of expansion cards, memory types, and CPU revisions may cause problems

11: don't know; tested some Move16 cases themselves and these worked

Quote:

Originally Posted by Karlos

Are there any well-defined contexts in which move16 is safe, e.g. local (for an accelerator) fast memory?

I think this is the crucial question (and I do not understand Thomas's answer to it).

From the 68040 errata:

Quote:

4.) If a MOVE16 instruction has both source & destination addresses hitting in the same copyback mode cache line (effectively a cache line push), the source is dirty in the cache line, and the access is write-protected, then the dirty cached data may be lost.

5.) MOVE16 (Ax)+,(Ay)+ where Ax=Ay is functionally the same as MOVE16 (Ax),(Ay)+. The address register only gets incremented once and the line is copied over itself instead of copied into the next line.

12.) (MC68040 & MC68LC040 only) MOVE16 write accesses to a memory page marked invalid may improperly invalidate a dirty cache line. To avoid this case set the physical address field in all invalid MMU descriptors to a physical page which is NEVER mapped in the system. A MOVE16 write fault will never find a matching line in the cache to (incorrectly) invalidate.
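The workaround in erratum 12 can be illustrated with a small C sketch that builds page descriptors. It assumes the 68040 layout with 4 KiB pages (physical page address in bits 31..12, page descriptor type in bits 1..0, where 00 means invalid); the 68040 can also use 8 KiB pages, which this ignores. `NEVER_MAPPED_PAGE` is a placeholder: which physical page is genuinely never decoded is system-specific, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 68040 page descriptor layout with 4 KiB pages. */
#define PAGE_ADDR_MASK 0xFFFFF000u /* bits 31..12: physical page address */
#define PDT_MASK       0x00000003u /* bits 1..0: page descriptor type    */
#define PDT_INVALID    0x0u        /* 00 = invalid, any access faults    */
#define PDT_RESIDENT   0x1u        /* 01 (or 11) = resident              */

/* Placeholder only: a real system must choose a physical page that it
   knows nothing ever decodes, so it can never be in the data cache. */
#define NEVER_MAPPED_PAGE 0xFFFFF000u

static int is_invalid(uint32_t desc)
{
    return (desc & PDT_MASK) == PDT_INVALID;
}

/* Naive invalidation: clear the PDT bits but leave the old physical
   address in place. That address may still be live in the data cache,
   which is the situation erratum 12 warns about. */
static uint32_t invalidate_naive(uint32_t desc)
{
    return desc & ~PDT_MASK;
}

/* Workaround from erratum 12: an invalid descriptor whose physical
   address field points at a page that is never mapped, so a faulting
   MOVE16 write cannot match (and wrongly invalidate) a cached line. */
static uint32_t invalidate_safe(uint32_t never_mapped_page)
{
    assert((never_mapped_page & ~PAGE_ADDR_MASK) == 0);
    return never_mapped_page | PDT_INVALID;
}
```

Both descriptors fault identically on access; they differ only in which physical address the faulting MOVE16 write could be compared against. As the thread points out, an application cannot enforce this: it depends on whoever builds the MMU tables.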

5.) is no problem

4.) and 12.) =

68060 errata (Rev 4.0 - 10/18/96):

Move16 is not mentioned?

Karlos · 03 January 2024, 23:44 (last edited 04 January 2024, 00:07)

I think the crux of Thor's argument is that whether or not move16 is safe in some circumstances is not enough, because the application calling it isn't necessarily in control of all the factors that can cause a failure. Secondly, it's not just whether or not the chip has problems with either the source or destination, but also the behaviour of legacy Amiga buses and glue logic back to the CPU under the conditions of a move16 transfer. I had some issues with it back in my pixel-bashing days, but as I say, in hindsight it could have been down to improperly protecting my cache-aligned area of stack. I'd have to find the old sources, which likely isn't happening.

So there's an intersection of a fairly wide number of issues, certain combinations of which are not possible to properly anticipate for any generic memory copy routine.

Point 12 from the errata sounds bad. This is what I think Thor was describing when he said it can cause problems at a distance. Notice it says it may improperly invalidate a dirty cache line. That's pretty open-ended, depending on how you read it. Can it invalidate cache lines that have nothing to do with your expected move?

AestheticDebris · 04 January 2024, 00:55

Quote:

Originally Posted by Karlos

Notice it says it may improperly invalidate a dirty cache line. That's pretty open-ended, depending on how you read it. Can it invalidate cache lines that have nothing to do with your expected move?

Yes, I believe so, which seems to me a pretty good reason to avoid it, unless you're doing something where speed is critical and crashing the whole system isn't a big deal.

Thomas Richter · 04 January 2024, 01:00

That pretty much summarizes it. On Motorola's errata sheet, 12 is the clincher, because an application does not have full control over the side conditions needed to avoid the issue, and yes, it can invalidate cache lines that are in no "obvious" relation to the move (i.e. neither the source nor the destination).

Leaving this aside, there is a second issue that Motorola of course cannot document in the errata sheet, because it's an erratum at the turbo-board-to-Zorro bridge level, and there it is not under the full control of the application either, because not all boards are affected.

The issue is that MOVE16 always bursts, regardless of whether caches are on or off. Normally, from the 68040 onwards, bursts only happen with caches on, but MOVE16 is an exception. Unfortunately, Zorro does not document bursts, and the logic on the board somehow has to prevent them, but it seems that this does not always work correctly on some boards.

So, there are at least two issues, not one.

SpeedGeek · 04 January 2024, 15:31

Quote:

Originally Posted by Karlos

I think the crux of Thor's argument is that whether or not move16 is safe in some circumstances is not enough, because the application calling it isn't necessarily in control of all the factors that can cause a failure. Secondly, it's not just whether or not the chip has problems with either the source or destination, but also the behaviour of legacy Amiga buses and glue logic back to the CPU under the conditions of a move16 transfer. I had some issues with it back in my pixel-bashing days, but as I say, in hindsight it could have been down to improperly protecting my cache-aligned area of stack. I'd have to find the old sources, which likely isn't happening.

So there's an intersection of a fairly wide number of issues, certain combinations of which are not possible to properly anticipate for any generic memory copy routine.

Point 12 from the errata sounds bad. This is what I think Thor was describing when he said it can cause problems at a distance. Notice it says it may improperly invalidate a dirty cache line. That's pretty open-ended, depending on how you read it. Can it invalidate cache lines that have nothing to do with your expected move?

What he failed to explain was that it was not necessary to mark any address space as invalid with the MMU. This was an optional choice Mike Sinz made with the Commodore 68040.library and the 3rd party CPU library developers basically inherited this choice.

Note: Doobrey's DummyCDstrap option for 1MB Kickstart ROMs will correct this for the $E00000 address space.

He also failed to mention that Move.l and MoveM will also cause an MMU fault when accessing invalid pages. The real difference for Move16 is that it can also cause a loss of data by invalidating a cache line.

So in one case your system crashes with the possibility of recovery, and in the other case without it. The moral here is that it's really better to avoid the crash in the first place and let the system achieve its maximum performance potential.

You can play it safe and avoid Move16 completely or you can accept a little risk and avoid a 50-60% performance penalty on large block moves in Copyback memory. So what is your choice?

Karlos · 04 January 2024, 16:07

Quote:

You can play it safe and avoid Move16 completely or you can accept a little risk and avoid a 50-60% performance penalty on large block moves in Copyback memory. So what is your choice?

Well, if you are designing a system component of any kind, especially something as critical as a commonly used library or a driver, you avoid it. Simple.

Knowingly introducing a bug that may randomly crash someone's system and risk data loss in the process is a terrible thing to do.

I'd still consider it for an application with locally rolled routines that can benefit, but whether to use them or a slower but safer fallback should be something the end user can decide.
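A minimal C sketch of that arrangement, assuming a hypothetical per-application setting `user_enabled_move16` that defaults to off: a routine is chosen per call, and the line-based path is only taken when the user opted in and the arguments meet MOVE16's alignment rules. `copy_move16_style` is a portable stand-in that copies 16-byte lines; on a real 68040/060 it would be a hand-written MOVE16 loop. All names here are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void (*copy_fn)(void *dst, const void *src, size_t n);

/* Slower but safer fallback: an ordinary copy. */
static void copy_safe(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Stand-in for a hand-rolled MOVE16 loop. On a 68040/060 this would be
   assembly; here it copies 16-byte lines so the sketch runs anywhere. */
static void copy_move16_style(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i += 16)
        memcpy(d + i, s + i, 16);
}

/* Hypothetical user preference, e.g. from a settings screen.
   Off by default: the fast path is strictly opt-in. */
static int user_enabled_move16 = 0;

/* Pick a routine per call. The line-based path is only used when the
   user opted in and the arguments meet MOVE16's constraints: both
   addresses 16-byte aligned and the length a whole number of lines. */
static copy_fn choose_copy(const void *dst, const void *src, size_t n)
{
    if (user_enabled_move16 &&
        (uintptr_t)dst % 16 == 0 && (uintptr_t)src % 16 == 0 &&
        n % 16 == 0)
        return copy_move16_style;
    return copy_safe;
}
```

The checks only cover MOVE16's alignment rules; they cannot detect the erratum 12 or turbo-board conditions discussed in this thread, which is why the switch is left to the user rather than decided by the program.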

Thomas Richter · 04 January 2024, 17:04

Quote:

Originally Posted by SpeedGeek

He also failed to mention that Move.l and MoveM will also cause an MMU fault when accessing invalid pages. The real difference for Move16 is that it can also cause a loss of data by invalidating a cache line.

Having a page mapped as invalid does not mean that something is wrong with the system, or that the system crashes. Invalid pages are actually quite common on systems with virtual memory. In such cases, the operating system intervenes, swaps in the memory, provides the page, and the CPU continues; this is what exception handling is good for, and this is what the mmulib provides.

That is also possible on the Amiga, though it is probably less popular for virtual memory. I used the technique for a couple of graphics board drivers, for example for the RetinaZ2 card, which only supports fragmented memory. There, with the use of the MMU and invalid pages, you can actually provide a linear frame buffer that is transparent to the application using it, which is a nice thing to have. Whenever an access hits a page whose fragment is not currently visible, an exception is raised; the exception handler intervenes, reconfigures the memory fragments of the NCR chip, and continues. Speed-wise, the result is quite OK: you do not notice the exceptions and have the illusion of a linear frame buffer.
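The RetinaZ2 technique can be modelled in plain C. This sketch simulates it rather than touching a real MMU: only one window of video memory is reachable at a time, an access to a linear offset outside the current window plays the role of the page fault, and the "handler" switches the visible fragment before the access completes. The 64 KiB window and 1 MiB of video memory are placeholder figures, not the RetinaZ2's actual layout, and `fb_read`, `fb_write` and `fault_handler` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WINDOW_BYTES (64u * 1024u)   /* placeholder size of the visible window */
#define VRAM_BYTES   (1024u * 1024u) /* placeholder total video memory */

/* Simulated card: all of VRAM, but only one window is reachable at a time. */
static uint8_t vram[VRAM_BYTES];
static size_t visible_fragment = 0;  /* which window the chip currently shows */
static unsigned fault_count = 0;

/* Plays the role of the MMU exception handler: the page was marked
   invalid, so re-program the chip to show the needed fragment. */
static void fault_handler(size_t fragment)
{
    visible_fragment = fragment;
    fault_count++;
}

/* Translate a linear frame buffer offset into the window, "faulting"
   first if the fragment holding it is not currently visible. */
static uint8_t *fb_access(size_t linear)
{
    size_t fragment = linear / WINDOW_BYTES;
    assert(linear < VRAM_BYTES);
    if (fragment != visible_fragment)
        fault_handler(fragment);
    /* Real hardware exposes only the window; here we index the backing
       array at the base of whichever fragment is now visible. */
    return &vram[visible_fragment * WINDOW_BYTES + linear % WINDOW_BYTES];
}

static void fb_write(size_t linear, uint8_t value) { *fb_access(linear) = value; }
static uint8_t fb_read(size_t linear) { return *fb_access(linear); }
```

In this model a fault only occurs when an access crosses into a different window, so mostly sequential drawing triggers few of them, which fits the observation that the exceptions go unnoticed.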

Quote:

Originally Posted by SpeedGeek

You can play it safe and avoid Move16 completely or you can accept a little risk and avoid a 50-60% performance penalty on large block moves in Copyback memory. So what is your choice?

"A little risk" is not a design paradigm for system programming. Either you know your algorithm works, or you don't. Nothing is more "irritating" to users than a system that works "mostly", except when, for example, you want to save something important to disk.

Karlos · 04 January 2024, 17:08

Quote:

Originally Posted by Karlos

I'd still consider it for an application with locally rolled routines that can benefit, but whether to use them or a slower but safer fallback should be something the end user can decide.

A case in point: the RTG-mode RAM-to-VRAM transfer for TKG.

SpeedGeek · 04 January 2024, 18:16

Quote:

Originally Posted by Karlos

Well, if you are designing a system component of any kind, especially something as critical as a commonly used library or a driver, you avoid it. Simple.

Knowingly introducing a bug that may randomly crash someone's system and risk data loss in the process is a terrible thing to do.

I'd still consider it for an application with locally rolled routines that can benefit, but whether to use them or a slower but safer fallback should be something the end user can decide.

Using Move16 properly is not "Knowingly Introducing" a bug, rather it's the improper use of Move16, or the MMU, or certain "Broken by Design" accelerator cards which introduce the bug.

Thomas Richter · 04 January 2024, 18:31

So you say that using "invalid pages" is "improper use of the MMU"? I guess any system programmer would disagree heavily, because that is one of the central functions of the MMU. Also, nobody is going to "unbreak" existing turbo boards. We are in Amiga-land and have to live with what we have. MOVE16 is in the same category as TAS or CAS: Zorro does not support RMW (read-modify-write) instructions either, and thus such instructions should not be used.

No.3 · 04 January 2024, 18:37

Quote:

Originally Posted by Thomas Richter

So, there are at least two issues, not one.

And what about the 060? Move16 is not mentioned in the 68060 errata?

SpeedGeek · 04 January 2024, 21:01

Quote:

Originally Posted by Thomas Richter

So you say that using "invalid pages" is "improper use of the MMU"? I guess any system programmer would disagree heavily, because that is one of the central functions of the MMU. Also, nobody is going to "unbreak" existing turbo boards. We are in Amiga-land and have to live with what we have. MOVE16 is in the same category as TAS or CAS: Zorro does not support RMW (read-modify-write) instructions either, and thus such instructions should not be used.

What system programmers should agree on is that, based on the Motorola errata, the combination of Move16 and MMU invalid pages introduces a potential compatibility problem. Now, exactly how to proceed to resolve the problem is what they probably won't ever agree on. On the matter of the "unbreaking" of turbo boards, the few of these boards which actually exist are not worth fixing. It is better to spend your money on a product which is newer and correctly designed.

Karlos · 04 January 2024, 21:06

Quote:

Originally Posted by SpeedGeek

What system programmers should agree on is that, based on the Motorola errata, the combination of Move16 and MMU invalid pages introduces a potential compatibility problem. Now, exactly how to proceed to resolve the problem is what they probably won't ever agree on.

System programmers, at least any worth the title, will just avoid it if they know it's a problem. Playing roulette with instructions that are potentially crash-inducing is best left up to application developers.

Thomas Richter · 04 January 2024, 21:46

Quote:

Originally Posted by SpeedGeek

On the matter of the "Unbreaking" of turbo boards, the few of these boards which actually exist are not worth fixing. It is better to spend your money on a product which is newer and correctly designed.

Right. On a retro system. Right. Instead of simply avoiding software that uses unstable instructions. Right... Which 060-based turbo board would you recommend for my A2000, exactly?

Bruce Abbott · 05 January 2024, 01:00

Quote:

Originally Posted by Karlos

Playing roulette with instructions that are potentially crash-inducing is best left up to application developers.

It should be treated like any other hardware issue, either with the OS hiding it from the application so it doesn't have to worry about it, or at least telling the app what it can safely do. An application should be designed to be as inclusive as possible, without requiring the user to make decisions about what the hardware is (maybe) capable of.

We are a small community now, and developing code that only works on specific hardware just fractures it further. It's bad enough that many of us can't run some interesting applications because we don't have the right OS or CPU, but limiting an application to particular revisions of certain hardware combinations is far worse, unless your aim is to deliberately limit its audience, which is just mean.

Quote:

Originally Posted by Thomas Richter

Right. On a retro system. Right. Instead of simply avoiding software that uses unstable instructions. Right...

Yes. The Amiga is a retro platform, so we should stick to the retro rules.

Rule #1: It shall be done as was prescribed back in the day. Commodore said MOVE16 was a no-no, so it should remain so. Developers of new hardware shouldn't have to worry about supporting it.

Rule #2: rules were made to be broken. Gould couldn't stop us doing whatever we liked with our hardware, and that still applies today!

As we are painfully aware, #2 caused no end of trouble. All those coders thinking they were so hot, when they were really just lazy. Real programmers figure out how to get stuff done within the rules. This is even more important today with the increasing variety of hardware combinations people have.
