Use of MOVE16 - Page 2

thebajaguy · 05 February 2023, 05:31

Late to the topic, but I thought it important to add some personal testing evidence along with my own experience from the GVP Tech side.

Any accelerator with onboard memory that is designed to support burst memory access by the 68040 or the 68060 - meaning a synchronous bus access where a 2-1-1-1, 3-1-1-1 or possibly 4-1-1-1 clocked access/response might happen - is never a concern and should always work. Marginal memory components might contribute to problems on a given card, and the hardware burst-inhibit option (if available) should resolve a specific issue by translating the bus access into a 2-2-2-2 or 3-3-3-3 (or whatever the timing is) stack of four standard 32-bit accesses. This burst-inhibit is what most accelerator-motherboard interface glue does for the A3000/A4000, apart from the few cards designed to burst-access the RAMSEY 32-bit memory, and then only if RAMSEY is in a burst-capable access mode. That mode is moot on the A4000 (DRAM type only) and quite rare on the A3000, since machines with SCRAM both installed and functional are equally rare. Most negate the mode due to earlier RAMSEY-04 bugs and the need for a matching RAMSEY-07/SDMAC-04, of which the latter fixed version is unobtainable.

Those timing numbers offered are only symbolic, and not intended to match what actual DRAM controllers, or modern SRAM on these cards, might translate into at their given clock. I merely point out the benefit of a real hardware burst, vs what happens when the hardware doesn't support it (or is in a negate setting).
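To make the symbolic timings above concrete, here is a minimal C sketch that adds up the clocks for one 16-byte line (four longword transfers). The function names are made up for illustration, and the figures are the symbolic ones from the post, not measurements from any real card.

```c
#include <assert.h>
#include <stdio.h>

/* Clocks for one 16-byte line: one leading transfer plus three that follow.
   "first" and "following" are the symbolic per-transfer clock counts. */
static unsigned line_clocks(unsigned first, unsigned following)
{
    assert(first >= 1 && following >= 1);
    return first + 3u * following;
}

/* Print burst versus burst-inhibited totals for the timings in the post. */
static void print_line_timings(void)
{
    printf("2-1-1-1 burst:     %u clocks\n", line_clocks(2, 1));
    printf("2-2-2-2 inhibited: %u clocks\n", line_clocks(2, 2));
    printf("3-1-1-1 burst:     %u clocks\n", line_clocks(3, 1));
    printf("3-3-3-3 inhibited: %u clocks\n", line_clocks(3, 3));
}
```

With these symbolic values a real burst moves the line in 5 or 6 clocks, while the inhibited form needs 8 or 12. That difference is the benefit being pointed out, before any slower target bus is involved.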

What is now a well-tested issue concerns quad-longword burst accesses that are translated natively down to 4x 32-bit accesses, but pushed against the 16-bit memory buses on the A2000, or the similar A3000/A4000 Buster space (with A2091 RAM, GVP HC8 DPRC memory, GVP HC2 standard DRAM memory, and other generic 16-bit Z2 FastRAM used as targets). Such an access causes each 32-bit longword to again be heavily wait-stated (for 7MHz) and halved into two 16-bit data transfers. It has been well proven on all of these 040/060 accelerators to have a >high potential< to eventually hang the system bus - REGARDLESS OF AMIGA SYSTEM or CPU CARD. It was found on the GVP TekMagic/T-Rex A2000/A4000 cards (both the 040 and 060 versions), the earlier GVP G-Force 040 cards for the A2000 and A3000, the Commodore A3640 and even a modernized 3660 (with or without the -2 wait state GALs), along with several P5 accelerator cards (I have an unmodified BPPC/060, and a client's A2060), and Thomas has other similar pieces for both the A2000 and the A3000/A4000. The solution to this was to negate the CPU caching on this very slow memory bus area, preventing the access-stacking. The CPU then backs off the 4-longword converted burst from the cache, and does slower 32-bit longword transactions with additional access gaps, created by the natural code retrieval and execution process. It no longer slams the accelerator-motherboard glue logic as heavily, which then lowers the rate at which the even slower target bus is hit.

This is where the more recent (in the last 2+ years) adjustments to MuLibs and 16-bit RAM cache setting changes came from. We took the time to prove it on multiple popular pieces of hardware and platforms. I used libraries from MuLibs, Ralph Babel, P5, and C=, and used the MuSetCacheMode tool to alter the MMU/Cache settings on the target Z2 memory spaces after a default setup.

I have also found times when I can get the BigRAM+ Z3 memory cards, and the zz9900 256M FastRAM, to hang against the faster 68060 cards - also with the cache/burst setting on. Their typical benchmark speeds with the cache on max out @ 2-3 times the Z2 memory speeds, but they do vary oddly at times, and sometimes hang the system like the 16-bit boards. Stability is excellent and benchmarks never vary when I turn off the data cache on them.

Speedgeek always raises the issue of benchmarking memory/bus performance values with the cache on. This is a real use-case situation where the cache being turned off has real-world and useful implications.

In all the tests, Copyback or WriteThrough mode didn't matter.

Translating this bus-access behavior to MOVE16: using this instruction multiple times against these slower memory bus targets ignores the cache setting (above) that helps prevent the slower bus hang issue. It again drives the same problem of multiple stacked 4x 32-bit longword hits against a much slower 7MHz 16-bit bus.

On the accelerator card memory, to and from that target, it should be safe. On any other Amiga bus, conservative care should be taken. PERIOD.
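One conservative way to read the rule above is as a whitelist: use MOVE16 only when both the source and the destination block lie entirely in memory known to sit on the accelerator's own bus, and fall back to ordinary moves otherwise. The C sketch below shows only that range check. The struct, the function names and the example address range are all hypothetical; real ranges would have to come from the board's documentation or configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A memory range assumed to be accelerator-local. HYPOTHETICAL values only. */
struct local_range { uint32_t base; uint32_t size; };

static const struct local_range example_ranges[] = {
    { 0x08000000u, 0x01000000u }  /* made-up 16 MB block for illustration */
};

/* True when [addr, addr+len) fits completely inside the range. */
static bool range_contains(const struct local_range *r, uint32_t addr, uint32_t len)
{
    return addr >= r->base && len <= r->size && addr - r->base <= r->size - len;
}

/* True only when both source and destination blocks are accelerator-local. */
static bool move16_allowed(const struct local_range *ranges, size_t count,
                           uint32_t src, uint32_t dst, uint32_t len)
{
    bool src_ok = false, dst_ok = false;
    assert(ranges != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (range_contains(&ranges[i], src, len)) src_ok = true;
        if (range_contains(&ranges[i], dst, len)) dst_ok = true;
    }
    return src_ok && dst_ok;
}
```

A copy routine would call move16_allowed() once per request and take the plain longword path whenever it returns false, for example when either address falls in Zorro II space.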

Quoting Michael Sinz on the specific topic of Move16:

"On the Amiga, MOVE16 is not supported 100%. "

Source: http://www.sinz.org/Michael.Sinz/Enf...nforcer.c.html

That should be enough reason to write software conservatively, and only implement the more potent performance options where it will be 100% successful.

SpeedGeek · 05 February 2023, 18:00

Hmm, I thought this was a GVP-specific hardware issue? But I guess I will have to do more extensive Move16 testing on my A3660 with Zorro2 bus Fast RAM. I previously did some testing and found no problems.

Of course, Move16 is not supported with Enforcer because it relies on the MMU for debugging. Using an invalid address with Move16 could cause an MMU fault which Enforcer can not recover from. But exceptional cases are no good reason to sacrifice the performance benefits of Move16 for the standard cases.

P.S.
Enforcer could be updated to better handle MMU faults, so it's not fair to put all of the blame on Move16.

Karlos · 05 February 2023, 23:48

I've definitely encountered issues with move16 on my BlizzPPC 68040. I wrote a bunch of conversion routines for on-the-fly RTG pixel format conversion. Most of these worked by transferring successive lines from the source to an aligned buffer on the stack, which is then locally manipulated and then moved as lines to the eventual destination. So, as a simple example, converting 32-bit ARGB to RGB555 would read a pair of cache lines for every one written to the destination. The intent was to keep the caches as clear as possible, so that only the local working area would be cached.

What I observed was occasional glitches in the output that could not be reproduced when replacing the move16 operations with the equivalent set of four move.l operations. The glitches were not observed on an Apollo 68040 with mediator/voodoo.

I did once read that some 68040 had dodgy move16 bugs. Maybe this was one of them. I expected better from Phase5 though, lol

Thomas Richter · 06 February 2023, 05:51

Quote:

Originally Posted by SpeedGeek

Hmm, I thought this was a GVP-specific hardware issue?

No, at least my B2060 is also affected. I do not have a full list of problematic boards, but at least for the known boards, P5Init and/or GVPInit disable caching in the Z2 area.

Quote:

Originally Posted by SpeedGeek

Of course, Move16 is not supported with Enforcer because it relies on the MMU for debugging.

That is unrelated. Enforcer is spelled MuForce today, and the exception handler in the mmu.library does support Move16 access faults. *However*, and that is just another issue, all versions of the 68040 and 68060 have an erratum concerning Move16, namely that they take the MSBs of an invalid page descriptor as physical tag for a cache line, and invalidate the affected cache line no matter whether the actual descriptor type is invalid or not. Thus, should the invalid descriptor contain by pure chance the physical address of another existing cache line in its MSBs, Move16 will invalidate the cache, i.e. modified cache contents will not be written back, and will be lost. Thus, Move16 can cause silent data corruption. Now, while MuForce itself places zeros in the MSBs of the invalid descriptors and thus redirects the access to the (invalid anyhow) zero page, a program is free to put anything it wants there, for example the sector number of where a swapped out page lies, and may thus cause real havoc. Note that MuForce is only one user of the mmu.lib service, and not "exclusively" subscribed to it. Other existing users are MuEVD/Shapeshifter video and the Retina P96 driver.
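The erratum as described in this post can be modelled in a few lines of C. This is only a sketch of the description above, not of the silicon. It assumes the 68040 page descriptor layout in which bits 1-0 hold the descriptor type (00 meaning invalid) and the upper bits hold the physical page address. The function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PDT_MASK    0x3u  /* page descriptor type field, bits 1-0 */
#define PDT_INVALID 0x0u  /* type 00: invalid descriptor */
#define LINE_MASK   0xFu  /* a cache line is 16 bytes */

static bool descriptor_is_invalid(uint32_t desc)
{
    return (desc & PDT_MASK) == PDT_INVALID;
}

/* Physical line address formed when the descriptor's upper bits are taken
   as the tag. page_shift is 12 for 4K pages, 13 for 8K pages. */
static uint32_t tag_line_address(uint32_t desc, uint32_t logical, unsigned page_shift)
{
    assert(page_shift == 12 || page_shift == 13);
    uint32_t offset_mask = (UINT32_C(1) << page_shift) - 1u;
    uint32_t phys = (desc & ~offset_mask) | (logical & offset_mask);
    return phys & ~(uint32_t)LINE_MASK;
}

/* True when a MOVE16 through this invalid descriptor would, under the model
   above, hit the cache line that holds cached_phys. */
static bool move16_may_drop_line(uint32_t desc, uint32_t logical,
                                 unsigned page_shift, uint32_t cached_phys)
{
    if (!descriptor_is_invalid(desc))
        return false;
    return tag_line_address(desc, logical, page_shift)
           == (cached_phys & ~(uint32_t)LINE_MASK);
}
```

In this model a descriptor of all zeros, as MuForce is said to use, can only ever point into the zero page, while an invalid descriptor that stores other data in its upper bits (such as a swap sector number) can land on any physical line.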

Quote:

Originally Posted by SpeedGeek

Enforcer could be updated to better handle MMU faults, so it's not fair to put all of the blame on Move16.

The blame is to be put on Motorola for selling CPUs with known issues, in particular related to Move16, and on board vendors for not taking "irregular" bursts into account. Probably the extra logic was not worth the trouble, who knows. The hardware is not well-equipped to handle Move16 gracefully as it is really an exceptional case. Move16 into local 32-bit memory close to the CPU works, provided you can ensure that you never move from or into an invalid page. But those are many "ifs".

SpeedGeek · 06 February 2023, 15:50

Quote:

Originally Posted by Karlos

I've definitely encountered issues with move16 on my BlizzPPC 68040. I wrote a bunch of conversion routines for on-the-fly RTG pixel format conversion. Most of these worked by transferring successive lines from the source to an aligned buffer on the stack, which is then locally manipulated and then moved as lines to the eventual destination. So, as a simple example, converting 32-bit ARGB to RGB555 would read a pair of cache lines for every one written to the destination. The intent was to keep the caches as clear as possible, so that only the local working area would be cached.

What I observed was occasional glitches in the output that could not be reproduced when replacing the move16 operations with the equivalent set of four move.l operations. The glitches were not observed on an Apollo 68040 with mediator/voodoo.

I did once read that some 68040 had dodgy move16 bugs. Maybe this was one of them. I expected better from Phase5 though, lol

I've looked for the errata on early mask set 68040s and not had much luck finding it. All I found was the product change notice which states which errata are corrected with the MC qualified mask sets:

https://www.nxp.com/docs/en/errata/MC68040DE_D.txt

But I think the OEM evaluation XC mask set users will always have the "Heat" errata as their primary motivation to upgrade.

SpeedGeek · 06 February 2023, 16:07

Quote:

Originally Posted by Thomas Richter

No, at least my B2060 is also affected. I do not have a full list of problematic boards, but at least for the known boards, P5Init and/or GVPInit disable caching in the Z2 area. That is unrelated. Enforcer is spelled MuForce today, and the exception handler in the mmu.library does support Move16 access faults. *However*, and that is just another issue, all versions of the 68040 and 68060 have an erratum concerning Move16, namely that they take the MSBs of an invalid page descriptor as physical tag for a cache line, and invalidate the affected cache line no matter whether the actual descriptor type is invalid or not. Thus, should the invalid descriptor contain by pure chance the physical address of another existing cache line in its MSBs, Move16 will invalidate the cache, i.e. modified cache contents will not be written back, and will be lost. Thus, Move16 can cause silent data corruption. Now, while MuForce itself places zeros in the MSBs of the invalid descriptors and thus redirects the access to the (invalid anyhow) zero page, a program is free to put anything it wants there, for example the sector number of where a swapped out page lies, and may thus cause real havoc. Note that MuForce is only one user of the mmu.lib service, and not "exclusively" subscribed to it. Other existing users are MuEVD/Shapeshifter video and the Retina P96 driver.

The blame is to be put on Motorola for selling CPUs with known issues, in particular related to Move16, and on board vendors for not taking "irregular" bursts into account. Probably the extra logic was not worth the trouble, who knows. The hardware is not well-equipped to handle Move16 gracefully as it is really an exceptional case. Move16 into local 32-bit memory close to the CPU works, provided you can ensure that you never move from or into an invalid page. But those are many "ifs".

Motorola specified workarounds with the errata (and user manuals) for both the 68040 and 68060. Motorola did not tell users to stop using Move16.

If there are vendor specific hardware bugs (which can actually be confirmed) then that's only a reason to stop using Move16 for that particular accelerator card. As of right now, the so called "Instability Problem" or "Occasional Lockups" is hard to reproduce, and could be caused by numerous other faults.

I hoped someone with a logic analyzer and data logging software could provide some conclusive evidence on the matter. Unfortunately, that has not yet happened.

Thomas Richter · 06 February 2023, 17:03

Quote:

Originally Posted by SpeedGeek

Motorola specified workarounds with the errata (and user manuals) for both the 68040 and 68060. Motorola did not tell users to stop using Move16.

The "workaround" is to ensure that the MSBs of invalid descriptors do not correspond to a valid page. If you call that a "workaround" - how or if that can be ensured is nothing Motorola can help you with. The other workaround is simply not to use the instruction. That seems quite simple to implement.

Quote:

Originally Posted by SpeedGeek

If there are vendor specific hardware bugs (which can actually be confirmed) then that's only a reason to stop using Move16 for that particular accelerator card.

Do you have a list of how all known accelerator boards behave in this respect? Because I don't. I know some affected boards, and on those boards I know the workaround for *regular accesses* is enabled. This requires a program author to check the board configuration against a database to see whether a specific instruction is stable in particular configurations. While I can address such peculiarities at system level by vendor-specific kludges (aka "GVPInit", "P5Init" and "ACAInit"), requiring something similar at application level that may or may not use MOVE16 depending on configuration seems to be asking a bit much. Do you really expect every user to test whether the board is affected or not?

Quote:

Originally Posted by SpeedGeek

As of right now, the so called "Instability Problem" or "Occasional Lockups" is hard to reproduce, and could be caused by numerous other faults.

Ah, but does that "other fault" really matter if I have a perfectly working workaround, namely simply to avoid the bursting to begin with?

Quote:

Originally Posted by SpeedGeek

I hoped someone with a logic analyzer and data logging software could provide some conclusive evidence on the matter. Unfortunately, that has not yet happened.

That would give additional clues about what exactly goes wrong and is as such of course interesting, but it does little to help avoid the issue. There is already a workaround that does its job nicely, after all. Don't burst over Zorro. What really upsets me is your approach "oh well, if it doesn't work, let's crash the user". I consider this an unacceptable approach for system software. The system /must/ work robustly with the default setup, no matter what. If you want a fast system, the Amiga is not the right choice anyhow, and there are better approaches to speeding up the system without compromising system stability.

Don_Adan · 06 February 2023, 22:37

Perhaps MOVE16 on the Amiga is most often used by the FastATA driver. If I remember correctly, after the first FastATA was produced, some older 68040s turned out to be faulty and had problems with the FastATA driver. Later the FastATA driver was updated and has several copy routines, with and without the MOVE16 instruction. I don't know whether the FastATA driver autodetects a faulty 68040 or whether this is configurable by the user.

SpeedGeek · 07 February 2023, 14:29

Quote:

Originally Posted by Thomas Richter

The "workaround" is to ensure that the MSBs of invalid descriptors do not correspond to a valid page. If you call that a "workaround" - how or if that can be ensured is nothing Motorola can help you with. The other workaround is simply not to use the instruction. That seems quite simple to implement. Do you have a list of how all known accelerator boards behave in this respect? Because I don't. I know some affected boards, and on those boards I know the workaround for *regular accesses* is enabled. This requires a program author to check the board configuration against a database to see whether a specific instruction is stable in particular configurations. While I can address such peculiarities at system level by vendor-specific kludges (aka "GVPInit", "P5Init" and "ACAInit"), requiring something similar at application level that may or may not use MOVE16 depending on configuration seems to be asking a bit much. Do you really expect every user to test whether the board is affected or not? Ah, but does that "other fault" really matter if I have a perfectly working workaround, namely simply to avoid the bursting to begin with?

That would give additional clues about what exactly goes wrong and is as such of course interesting, but it does little to help avoid the issue. There is already a workaround that does its job nicely, after all. Don't burst over Zorro. What really upsets me is your approach "oh well, if it doesn't work, let's crash the user". I consider this an unacceptable approach for system software. The system /must/ work robustly with the default setup, no matter what. If you want a fast system, the Amiga is not the right choice anyhow, and there are better approaches to speeding up the system without compromising system stability.

No, I don't have a list of all the accelerator Boards which MIGHT be affected. I only know that my system (and many others) are not affected. I also know that those claiming to be affected don't have conclusive proof (from the Board manufacturer) or some valid hardware logic test results which make sense.

Thebajaguy (just like you) initially failed to realize Burst cycles are not possible on the Zorro2 bus. The TBI logic signal forces the attempted Burst cycles to be performed as 4 sequential (but separate) longword transfers. This results in 8 sequential (but separate) word transfers on the Zorro2 bus. He now somehow concludes that this longer cycle time is causing problems. He again failed to realize that the Zorro2 bus does not have a maximum cycle time specification.

A movem.l (Ax)+,d0-d3 would do exactly the same thing on the Zorro2 bus. In fact, the common SAS C full register stack pushes and pulls would move a much larger number of longwords (2x words) on the Zorro2 bus. Also, the original exec CopyMem() and CopyMemQuick() functions will movem.l more longwords (2x words) on the Zorro2 bus.
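The cycle counting in this argument can be sketched in C: one 16-byte line is four longword transfers, and each longword becomes two 16-bit cycles on the Zorro2 bus, giving eight word cycles per line regardless of which instruction produced them. The function names below are illustrative only, and the sketch says nothing about timing or wait states.

```c
#include <assert.h>
#include <stdint.h>

enum { LINE_LONGWORDS = 4, WORDS_PER_LONGWORD = 2 };

/* Number of 16-bit bus cycles needed for a given number of longwords. */
static unsigned zorro2_word_cycles(unsigned longwords)
{
    return longwords * WORDS_PER_LONGWORD;
}

/* Fill out[] with the address of every 16-bit cycle for one 16-byte line
   and return how many cycles were generated. */
static unsigned line_to_word_cycles(uint32_t line_addr, uint32_t out[8])
{
    unsigned n = 0;
    assert((line_addr & 0xFu) == 0);  /* a line starts on a 16-byte boundary */
    for (unsigned lw = 0; lw < LINE_LONGWORDS; lw++)
        for (unsigned w = 0; w < WORDS_PER_LONGWORD; w++)
            out[n++] = line_addr + lw * 4u + w * 2u;
    return n;
}
```

The count is the same for a MOVE16 line and a four-register movem.l. What the sketch cannot show is the spacing between those cycles, which is the point the two sides of this thread disagree on.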

Again, this is not a very convincing argument or explanation of the so-called "Instability" and "Occasional Lockup" problems. What would be convincing is a logic analyzer result showing a bug in the dynamic bus sizing logic.

As far as the crashing the system issue, if 3% of Amiga users have to do some extra work to avoid crashing their system that's fine by me. What is not fine by me, is telling 97% of Amiga users you must accept this huge performance loss so the 3% can save some extra work.

SpeedGeek · 07 February 2023, 14:44

Quote:

Originally Posted by Don_Adan

Perhaps MOVE16 on the Amiga is most often used by the FastATA driver. If I remember correctly, after the first FastATA was produced, some older 68040s turned out to be faulty and had problems with the FastATA driver. Later the FastATA driver was updated and has several copy routines, with and without the MOVE16 instruction. I don't know whether the FastATA driver autodetects a faulty 68040 or whether this is configurable by the user.

There is no easy* way to "Autodetect" a faulty 68040. So it would likely be configurable by the user. But I am curious as to why the FastATA developers believed Move16 would offer them any significant performance advantage? The FastATA registers would be managed as Zorro3 I/O registers which are non-cache-able.

*It may be possible to do a Move16 errata test to determine if the 68040 was faulty, but the reliability of such tests is not always guaranteed.

SpeedGeek · 06 March 2023, 00:57

Okay, I now have some Move16 test results from my A3000 A3660 system with A2091 Zorro2 Fast RAM (See the images below).

- The CMP image shows the A2091 memory at the highest priority.
- The MMU image shows the A2091 memory in Copyback mode.
- The Move16 image is the updated benchmark tool previously posted on this thread.
- The LHA image shows a test of a large LHA file. LHA extensively calls Copymem() which was patched with CMQ&B040. So now, we can see if Move16 is causing any data corruption problems.

After extensive testing (approx. 1 hour), I find no instability, no lockups, no crashing and no data corruption problems.

No.3 · 02 January 2024, 22:37

You, SpeedGeek, directed me to this thread

Quote:

Originally Posted by SpeedGeek

There is already a MOVE16 discussion thread here:

https://eab.abime.net/showthread.php?t=102820

What is the point in repeating what's already been posted on that thread?

I read it and still do not know if I can or should not use Move16 for my own CopyMem.

[SPOILER]
my use case would be Move16 (A0)+,(A1)+ and A0 and A1 are guaranteed to be longword aligned, and I could live with it if it were Fast-Mem only.
[/SPOILER]
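On the alignment part of this use case: as far as the Motorola documentation describes it, MOVE16 transfers whole 16-byte lines aligned to 16-byte boundaries, so longword alignment alone would not be enough for a line-based CopyMem. The C sketch below only plans a copy as head bytes, whole lines and tail bytes. The struct and function names are hypothetical, and the question of whether the target memory is safe for MOVE16 at all is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* How a copy splits up: bytes moved normally before the first aligned line,
   whole 16-byte lines, and bytes moved normally after the last line. */
struct copy_plan { uint32_t head; uint32_t lines; uint32_t tail; };

static struct copy_plan plan_line_copy(uint32_t src, uint32_t dst, uint32_t len)
{
    struct copy_plan p = { len, 0, 0 };  /* default: everything by normal moves */

    /* Source and destination must share the same offset within a line,
       otherwise they can never both be 16-byte aligned at the same time. */
    if (((src ^ dst) & 0xFu) != 0)
        return p;

    uint32_t head = (16u - (dst & 0xFu)) & 0xFu;  /* bytes up to next boundary */
    if (head > len)
        head = len;

    p.head  = head;
    p.lines = (len - head) / 16u;
    p.tail  = (len - head) % 16u;
    assert(p.head + p.lines * 16u + p.tail == len);
    return p;
}
```

Only the "lines" portion would be a candidate for MOVE16 (A0)+,(A1)+. The head and tail, and any copy where the two pointers have different offsets within a line, would stay on ordinary moves.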

SpeedGeek · 02 January 2024, 23:13

Quote:

Originally Posted by No.3

You, SpeedGeek, directed me to this thread

I read it and still do not know if I can or should not use Move16 for my own CopyMem.

[SPOILER]
my use case would be Move16 (A0)+,(A1)+ and A0 and A1 are guaranteed to be longword aligned, and I could live with it if it were Fast-Mem only.
[/SPOILER]

Did you not find the updated Move16 benchmark tool posted on this thread? You can also duplicate my test with CMQ&B040 and LHA files. Then assuming your system runs stable, you can proceed to develop your own Move16 code.

Galahad/FLT · 02 January 2024, 23:38

Quote:

Originally Posted by SpeedGeek

There is no easy* way to "Autodetect" a faulty 68040. So it would likely be configurable by the user. But I am curious as to why the FastATA developers believed Move16 would offer them any significant performance advantage? The FastATA registers would be managed as Zorro3 I/O registers which are non-cache-able.

*It may be possible to do a Move16 errata test to determine if the 68040 was faulty, but the reliability of such tests is not always guaranteed.

I would think autodetecting a faulty 040 would be easy.

Simply running an extensive test that repeats the MOVE16 over and over again, and then checking that the moved data is as expected, should surely be enough?

Obviously a faulty 040 will have a discrepancy in the copied data, and then you would default to the 4 move.l instead.
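The test proposed here can be sketched in C as a repeat-and-compare loop that picks a copy routine. Portable C cannot express MOVE16 itself, so the candidate routine is a stand-in passed by function pointer, and a deliberately broken routine is included only to exercise the fallback path. All names are hypothetical. Note that a test of this shape only checks the copied bytes. It cannot observe other cache lines being dropped or the bus hanging, which are the failure modes raised elsewhere in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum { LINE_BYTES = 16 };
typedef void (*line_copy_fn)(void *dst, const void *src);

/* Reference copy: four longwords, standing in for four move.l. */
static void copy_line_longwords(void *dst, const void *src)
{
    uint32_t tmp[4];
    memcpy(tmp, src, sizeof tmp);
    memcpy(dst, tmp, sizeof tmp);
}

/* Deliberately broken stand-in that flips one bit, for demonstration only. */
static void copy_line_faulty(void *dst, const void *src)
{
    memcpy(dst, src, LINE_BYTES);
    ((unsigned char *)dst)[5] ^= 0x01;
}

/* Copy a changing pattern "rounds" times and compare the result each time. */
static bool stress_copy_matches(line_copy_fn fn, unsigned rounds)
{
    unsigned char src[LINE_BYTES], dst[LINE_BYTES];
    for (unsigned r = 0; r < rounds; r++) {
        for (unsigned i = 0; i < LINE_BYTES; i++)
            src[i] = (unsigned char)(r * 31u + i * 7u);
        memset(dst, 0, sizeof dst);
        fn(dst, src);
        if (memcmp(dst, src, LINE_BYTES) != 0)
            return false;
    }
    return true;
}

/* Use the candidate only if it survives the stress test. */
static line_copy_fn select_copy(line_copy_fn candidate, line_copy_fn fallback,
                                unsigned rounds)
{
    return stress_copy_matches(candidate, rounds) ? candidate : fallback;
}
```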

Thomas Richter · 03 January 2024, 00:43

Quote:

Originally Posted by Galahad/FLT

I would think autodetecting a faulty 040 would be easy.

Depending on what you expect. Easy in the sense of "is there a 68040 in the system". Then yes, it's faulty, because it goes through all revisions. Some issues have not been fixed at all, and the issue is not as obvious as "run the instruction and observe the wrong result". Please find details in the 68040 errata sheet. The issue, if you had followed the entire thread, is not the 68040 as such, but the overall integration into the system, in particular how the bridge from the accelerator board to Zorro behaves.

Quote:

Originally Posted by Galahad/FLT

Obviously a faulty 040 will have a discrepancy in the copied data, and then you would default to the 4 move.l instead.

That is exactly *not* how it works. The trouble is not corruption of the copied data, but corruption of *some other data* that is still in the cache, but that has been invalidated instead of having written it back, and the issue is "the system hangs", not that the data is corrupt.

Karlos · 03 January 2024, 00:55

Are there any well-defined contexts in which move16 is safe, e.g. local (for an accelerator) fast memory?

Galahad/FLT · 03 January 2024, 00:56

Quote:

Originally Posted by Thomas Richter

Depending on what you expect. Easy in the sense of "is there a 68040 in the system". Then yes, it's faulty, because it goes through all revisions. Some issues have not been fixed at all, and the issue is not as obvious as "run the instruction and observe the wrong result". Please find details in the 68040 errata sheet. The issue, if you had followed the entire thread, is not the 68040 as such, but the overall integration into the system, in particular how the bridge from the accelerator board to Zorro behaves. That is exactly *not* how it works. The trouble is not corruption of the copied data, but corruption of *some other data* that is still in the cache, but that has been invalidated instead of having written it back, and the issue is "the system hangs", not that the data is corrupt.

Right, but you're missing the point.

If the 040 doesn't act randomly, then even if it's the data in the cache, it can still be tested over and over.

And it doesn't matter which 040 revision is present, so long as the data you're testing is correct over an extended period that would usually cause the problem, then you can determine which move you use.

If your contention is that there is no pattern or logic to what data is compromised, then what the fuck were Motorola doing releasing it like that?

And you need to make up your mind, either the 040 is the cause of the problem or it isn't, which is it?

I can do sarcastic pedantry as well, but I take it less from others, maybe you should use the advent of 2024 to be less so.

It's not endearing!

Don_Adan · 03 January 2024, 02:42

Quote:

Originally Posted by SpeedGeek

There is no easy* way to "Autodetect" a faulty 68040. So it would likely be configurable by the user. But I am curious as to why the FastATA developers believed Move16 would offer them any significant performance advantage? The FastATA registers would be managed as Zorro3 I/O registers which are non-cache-able.

*It may be possible to do a Move16 errata test to determine if the 68040 was faulty but the reliability of such tests are not always guaranteed.

I think you can disassemble the FastATA scsi.device to see how it works; I suspect it uses autodetection of a faulty 68040. I disassembled the FastATA driver many years ago to see whether it contained anything useful for the standard scsi.device, but it did not. From memory, FastATA has no problem using Move16 on 68040/68060 turbo boards.

AestheticDebris · 03 January 2024, 08:56

Quote:

Originally Posted by Galahad/FLT

And you need to make up your mind, either the 040 is the cause of the problem or it isn't, which is it?

It's not the 68040, per se, but the combination of the 68040 with the Amiga chipset (there are 68040 bugs which can make things worse, but even "fixed" ones may not play well with the Amiga). The problem with detection is that the specifics of how the Amiga memory architecture works means it's difficult to be sure which ranges of addresses might be affected, so even if you can construct such a test you can't really be sure unless you systematically try it everywhere.

Personally I'd say there's enough anecdotal evidence it isn't entirely reliable and Commodore used to advise against it, so probably not worth the hassle. YMMV.

Thomas Richter · 03 January 2024, 11:40

Quote:

Originally Posted by Karlos

Are there any well-defined contexts in which move16 is safe, e.g. local (for an accelerator) fast memory?

Except that the application using MOVE16 is not fully in control of these contexts. A MOVE16 from CPU-local memory to CPU-local memory is safe provided both source and destination are fully mapped and not in an invalid page, or if the page descriptor of the invalid page does not map to an address that matches an address in the cache. So, how exactly can one easily test this condition?

Again, there is *more than one* issue with MOVE16. There are a couple of errata in the 68040 and 68060, and they are specified, and there are *more errata* in some combinations of CPU turbo boards with bursting and Zorro transfer that *also* affect MOVE16.

But really, everything has been said on this in the thread.

05 February 2023, 05:31	#21
thebajaguy Registered User Join Date: Mar 2017 Location: Rhode Island / United States Posts: 203	Late to the topic, but I thought it important to add some personal testing evidence along with my own experience from the GVP Tech side. Any accelerator - with onboard memory - which is designed to support a burst memory access by the 68040 or the 68060 - meaning a synchronous bus access where a 2-1-1-1, 3-1-1-1 or possibly 4-1-1-1 clocked access/response might happen - is never a concern and should always work. Marginal memory components might contribute to problems on a given card, and the hardware burst-inhibit option (if available) should resolve a specific issue by translating the bus access into a standard 2-2-2-2 or 3-3-3-3 or whatever timing of stacked 4x, but standard, 32-bit access. This burst-inhibit is what most accelerator-motherboard interface glue does for the A3000/A4000 - sans those few which might be designed for it, and might try to burst-access to the RAMSEY 32-bit memory, and then only if RAMSEY is in a burst-capable access mode. That is a mode which is moot on the A4000 (w/DRAM type only), and is quite rare on the A3000 as those with both SCRAM installed and being functional is equally rare - most negate the mode due to earlier RAMSEY-04 bugs, and a need for a matching RAMSEY-07/SDMAC-04, of which the latter fixed version is unobtainable. Those timing numbers offered are only symbolic, and not intended to match what actual DRAM controllers, or modern SRAM on these cards, might translate into at their given clock. I merely point out the benefit of a real hardware burst, vs what happens when the hardware doesn't support it (or is in a negate setting). 
What is now a well tested issue - found on GVP TekMagic/T-Rex A2000/A4000 both the 040/060 versions of the cards, the earlier GVP G-Force 040 cards for the A2000 and A3000, the Commodore A3640 and even a modernized 3660 (with or without the -2 wait state GALs), along with several P5 accelerator cards (I have an unmodified BPPC/060, and a client's A2060), and Thomas has other similar pieces for both the A2000 and the A3000/A4000 - is that quad-longword burst access, translated natively down to 4x 32-bit accesses, but pushed against the 16-bit memory busses on the A2000 or similar in the A3000/A4000 Buster space (w/A2091 RAM, GVP HC8 DPRC memory, GVP HC2 standard DRAM memory, and other generic 16-bit FastRAM Z2 memory used as targets) - an access which causes each 32-bit longword to again be heavily wait-stated (for 7MHz) and halved into double 16-bit data transfers, has been well proven on all of these 040/060 accelerators to have a >high potential< to eventually hang the system bus - REGARDLESS OF AMIGA SYSTEM or CPU CARD. The solution to this was to negate the CPU caching on this very slow memory bus area, preventing the access-stacking. The CPU then backs off the 4-longword converted burst from the cache, and does slower 32-bit longword transactions with additional access gaps - created by the natural code retrieval and execution process. It no longer slams the accelerator-motherboard glue logic as heavily, which then lowers the rate which the even slower target bus is hit. This is where the more recent (in the last 2+ years) adjustments to MuLibs and 16-bit RAM cache setting changes came from. We took the time to prove it on multiple popular pieces of hardware and platforms. I used libraries from MuLibs, Ralph Babel, P5, and C=, and used the MuSetCacheMode tool to alter the MMU/Cache settings on the target Z2 memory spaces after a default setup. 
I have also found times when I can get the BigRAM+ Z3 memory cards, and the ZZ9000 256M FastRAM, to hang against the faster 68060 cards - also with the cache/burst setting on. Their typical benchmark speeds with the cache on max out at 2-3 times the Z2 memory speeds, but they do vary oddly at times, and sometimes hang the system like the 16-bit boards. Stability is excellent and benchmarks never vary when I turn off the data cache on them.

Speedgeek always raises the issue of benchmarking memory/bus performance values with the cache on. This is a real use-case situation where the cache being turned off has real-world and useful implications. In all the tests, Copyback or WriteThrough mode didn't matter.

Translating this bus-access behavior into the use of MOVE16 - and using this instruction multiple times against these slower memory bus targets - ignores the cache setting (above) that helps prevent the slower bus hang issue. It again drives the same problem of multiple stacked 4x 32-bit longword hits against a much slower 7MHz 16-bit bus. On the accelerator card memory, to and from that target, it should be safe. On any other Amiga bus, conservative care should be taken. PERIOD.

Quoting Michael Sinz on the specific topic of Move16: "On the Amiga, MOVE16 is not supported 100%."
Source: http://www.sinz.org/Michael.Sinz/Enf...nforcer.c.html

That should be enough reason to write software conservatively, and only implement the more potent performance options where they will be 100% successful.

Last edited by thebajaguy; 05 February 2023 at 06:11.
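One way to read the "conservative care" advice above is as a gate in front of any MOVE16 copy path. The C sketch below is only an illustration of that policy: `struct mem_region`, `in_region` and `move16_allowed` are hypothetical names, and how a program would learn the accelerator's local RAM window is assumed, not taken from the thread. The alignment checks reflect that MOVE16 transfers whole 16-byte lines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the accelerator's onboard RAM window. */
struct mem_region {
    uintptr_t base;
    size_t    size;
};

/* True if [addr, addr + len) lies entirely inside the region. */
bool in_region(const struct mem_region *r, uintptr_t addr, size_t len)
{
    return addr >= r->base && len <= r->size &&
           addr - r->base <= r->size - len;
}

/* Allow the MOVE16 path only when both ends of the copy are in
 * accelerator-local memory and everything is 16-byte aligned.
 * Any other case should fall back to an ordinary longword copy. */
bool move16_allowed(const struct mem_region *local,
                    uintptr_t src, uintptr_t dst, size_t len)
{
    assert(local != NULL);
    if (len == 0 || (len % 16) != 0)
        return false;
    if ((src % 16) != 0 || (dst % 16) != 0)
        return false;
    return in_region(local, src, len) && in_region(local, dst, len);
}
```

A caller would take the MOVE16 routine only when `move16_allowed` returns true and use a plain copy otherwise, so that Zorro II or other slow-bus targets never see the stacked line transfers described above.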

05 February 2023, 18:00	#22
SpeedGeek Moderator Join Date: Dec 2010 Location: Wisconsin USA Age: 60 Posts: 841	Hmm, I thought this was a GVP-specific hardware issue? But I guess I will have to do more extensive Move16 testing on my A3660 with Zorro2 bus Fast RAM. I previously did some testing and found no problems. Of course, Move16 is not supported with Enforcer because it relies on the MMU for debugging. Using an invalid address with Move16 could cause an MMU fault which Enforcer cannot recover from. But exceptional cases are no good reason to sacrifice the performance benefits of Move16 for the standard cases. P.S. Enforcer could be updated to better handle MMU faults, so it's not fair to put all of the blame on Move16. Last edited by SpeedGeek; 05 February 2023 at 18:13.

06 March 2023, 00:57	#31
SpeedGeek Moderator Join Date: Dec 2010 Location: Wisconsin USA Age: 60 Posts: 841	Okay, I now have some Move16 test results from my A3000 A3660 system with A2091 Zorro2 Fast RAM (see the images below).
- The CMP image shows the A2091 memory at the highest priority.
- The MMU image shows the A2091 memory in Copyback mode.
- The Move16 image is the updated benchmark tool previously posted on this thread.
- The LHA image shows a test of a large LHA file. LHA extensively calls CopyMem(), which was patched with CMQ&B040. So now, we can see if Move16 is causing any data corruption problems.
After extensive testing (approx. 1 hour), I find no instability, no lockups, no crashing and no data corruption problems.
[Attached thumbnails: CMP, MMU, Move16 and LHA images]

05 February 2023, 23:48	#23
Karlos Alien Bleed Join Date: Aug 2022 Location: UK Posts: 4,242	I've definitely encountered issues with move16 on my BlizzPPC 68040. I wrote a bunch of conversion routines for on-the-fly RTG pixel format conversion. Most of these worked by transferring successive lines from the source to an aligned buffer on the stack, which is then locally manipulated and then moved as lines to the eventual destination. So, as a simple example, converting 32-bit ARGB to RGB555 would read a pair of cache lines for every one written to the destination. The intent was to keep the caches as clear as possible; only the local working area would be cached. What I observed was occasional glitches in the output that could not be reproduced when replacing the move16 operations with the equivalent set of four move.l operations. The glitches were not observed on an Apollo 68040 with mediator/voodoo. I did once read that some 68040s had dodgy move16 bugs. Maybe this was one of them. I expected better from Phase5 though, lol
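The "pair of cache lines read for every one written" arithmetic in the post above can be sketched in portable C. This is not Karlos's routine: the names `argb_to_rgb555` and `convert_line` are hypothetical, the top bit of the RGB555 word is assumed to be left clear, and the line transfers themselves (move16 or four move.l) are represented by ordinary array accesses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One 16-byte destination line holds 8 RGB555 pixels (8 x 2 bytes).
 * The 8 source ARGB pixels occupy 32 bytes, i.e. two 16-byte lines. */
enum { PIXELS_PER_DST_LINE = 8 };

/* Pack 8:8:8 colour into 5:5:5, discarding alpha (assumed layout:
 * bit 15 clear, red in bits 14-10, green in 9-5, blue in 4-0). */
uint16_t argb_to_rgb555(uint32_t argb)
{
    uint32_t r = (argb >> 16) & 0xFFu;
    uint32_t g = (argb >> 8) & 0xFFu;
    uint32_t b = argb & 0xFFu;
    return (uint16_t)(((r >> 3) << 10) | ((g >> 3) << 5) | (b >> 3));
}

/* Convert two source lines' worth of pixels into one destination line.
 * In the scheme described in the post, src would be the aligned local
 * working buffer and dst would be written out as a single line. */
void convert_line(const uint32_t src[PIXELS_PER_DST_LINE],
                  uint16_t dst[PIXELS_PER_DST_LINE])
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < PIXELS_PER_DST_LINE; i++)
        dst[i] = argb_to_rgb555(src[i]);
}
```

Because the conversion step is independent of how the lines are moved, swapping the line transfer between move16 and four move.l operations, as described in the post, leaves this part unchanged, which makes it a convenient place to compare outputs when hunting for glitches.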

06 February 2023, 22:37	#28
Don_Adan Registered User Join Date: Jan 2008 Location: Warsaw/Poland Age: 55 Posts: 2,006	Perhaps MOVE16 on the Amiga is most often used by the FastATA driver. If I remember correctly, after the first FastATA was produced, some older 68040s turned out to be faulty and had problems with the FastATA driver. Later the FastATA driver was updated, and it now has several copy routines, with and without the MOVE16 instruction. I don't know whether the FastATA driver autodetects a faulty 68040 or whether this is configurable by the user.

03 January 2024, 00:55	#36
Karlos Alien Bleed Join Date: Aug 2022 Location: UK Posts: 4,242	Are there any well-defined contexts in which move16 is safe, e.g. local (for an accelerator) fast memory?
