05 February 2023, 05:31 | #21 |
Registered User
Join Date: Mar 2017
Location: Rhode Island / United States
Posts: 203
|
Late to the topic, but I thought it important to add some personal testing evidence along with my own experience from the GVP Tech side.
Any accelerator with onboard memory that is designed to support burst memory access by the 68040 or the 68060 (meaning a synchronous bus access where a 2-1-1-1, 3-1-1-1 or possibly 4-1-1-1 clocked access/response might happen) is never a concern and should always work. Marginal memory components might contribute to problems on a given card, and the hardware burst-inhibit option (if available) should resolve a specific issue by translating the bus access into a standard 2-2-2-2, 3-3-3-3 or similar timing: four stacked, but standard, 32-bit accesses.
This burst-inhibit is what most accelerator-motherboard interface glue does on the A3000/A4000. The exceptions are the few cards designed to burst-access the RAMSEY 32-bit memory, and then only if RAMSEY is in a burst-capable access mode. That mode is moot on the A4000 (DRAM only) and quite rare on the A3000, since systems with SCRAM both installed and functional are equally rare. Most disable the mode because of earlier RAMSEY-04 bugs and the need for a matching RAMSEY-07/SDMAC-04, of which the latter fixed version is unobtainable.
The timing numbers above are only symbolic, and are not intended to match what actual DRAM controllers, or modern SRAM on these cards, might translate into at their given clock. I merely point out the benefit of a real hardware burst versus what happens when the hardware doesn't support it (or has it disabled).
What is now a well-tested issue involves quad-longword burst access, translated natively down to 4x 32-bit accesses, but pushed against the 16-bit memory buses on the A2000 or the similar Buster space in the A3000/A4000 (with A2091 RAM, GVP HC8 DPRC memory, GVP HC2 standard DRAM memory, and other generic 16-bit Z2 FastRAM used as targets). It was found on the GVP TekMagic/T-Rex A2000/A4000 cards in both the 040 and 060 versions, the earlier GVP G-Force 040 cards for the A2000 and A3000, the Commodore A3640 and even a modernized 3660 (with or without the -2 wait state GALs), along with several P5 accelerator cards (I have an unmodified BPPC/060, and a client's A2060), and Thomas has other similar pieces for both the A2000 and the A3000/A4000.
In such an access, each 32-bit longword is again heavily wait-stated (for 7MHz) and halved into two 16-bit data transfers. This has been well proven on all of these 040/060 accelerators to have a >high potential< to eventually hang the system bus - REGARDLESS OF AMIGA SYSTEM or CPU CARD.
The solution was to disable CPU caching on this very slow memory bus area, preventing the access-stacking. The CPU then backs off the 4-longword converted burst from the cache, and does slower 32-bit longword transactions with additional access gaps, created by the natural code retrieval and execution process. It no longer slams the accelerator-motherboard glue logic as heavily, which lowers the rate at which the even slower target bus is hit.
This is where the more recent (in the last 2+ years) adjustments to MuLibs and the 16-bit RAM cache setting changes came from. We took the time to prove it on multiple popular pieces of hardware and platforms. I used libraries from MuLibs, Ralph Babel, P5, and C=, and used the MuSetCacheMode tool to alter the MMU/cache settings on the target Z2 memory spaces after a default setup.
I have also found times when I can get the BigRAM+ Z3 memory cards, and the ZZ9000 256M FastRAM, to hang against the faster 68060 cards, also with the cache/burst setting on. Their typical benchmark speeds with the cache on max out at 2-3 times the Z2 memory speeds, but they do vary oddly at times, and sometimes the system hangs as it does with the 16-bit boards. Stability is excellent and benchmarks never vary when I turn off the data cache on them. SpeedGeek always raises the issue of benchmarking memory/bus performance values with the cache on. This is a real use case where turning the cache off has real-world and useful implications. In all the tests, Copyback or WriteThrough mode didn't matter.
Translating this bus-access behavior to MOVE16: using this instruction multiple times against these slower memory bus targets ignores the cache setting (above) that helps prevent the slower-bus hang issue. It again drives the same problem of multiple stacked 4x 32-bit longword hits against a much slower 7MHz 16-bit bus. On the accelerator card memory, to and from that target, it should be safe. On any other Amiga bus, conservative care should be taken. PERIOD.
Quoting Michael Sinz on the specific topic of Move16: "On the Amiga, MOVE16 is not supported 100%."
Source: http://www.sinz.org/Michael.Sinz/Enf...nforcer.c.html
That should be enough reason to write software conservatively, and only implement the more potent performance options where they will be 100% successful.
Last edited by thebajaguy; 05 February 2023 at 06:11. |
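The conservative rule above (MOVE16 only to and from accelerator-local memory) can be sketched as a small routine selector. This is only an illustration: the address range, the names `LOCAL_FAST_RANGES` and `pick_copy_routine`, and the idea of a static range table are assumptions, not something any of the libraries mentioned here is known to do.

```python
# Hypothetical address range for illustration only; a real system would
# have to find out where the accelerator's own RAM actually lives.
LOCAL_FAST_RANGES = [(0x08000000, 0x10000000)]  # assumed accelerator-local RAM

LINE = 16  # MOVE16 transfers one 16-byte line


def in_local_fast(addr, length):
    """True if [addr, addr + length) lies entirely inside an assumed local range."""
    return any(lo <= addr and addr + length <= hi for lo, hi in LOCAL_FAST_RANGES)


def pick_copy_routine(src, dst, length):
    """Return 'move16' only for line-aligned copies within local memory."""
    aligned = src % LINE == 0 and dst % LINE == 0 and length % LINE == 0
    if length > 0 and aligned and in_local_fast(src, length) and in_local_fast(dst, length):
        return "move16"
    # Anything touching another bus (e.g. Zorro II space) takes the slow path.
    return "move.l"
```

With these assumed ranges, a copy from `0x00200000` (Zorro II space) falls back to `move.l`, while a line-aligned copy inside the local range would be allowed to use `move16`.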
05 February 2023, 18:00 | #22 |
Moderator
Join Date: Dec 2010
Location: Wisconsin USA
Age: 60
Posts: 841
|
Hmm, I thought this was a GVP-specific hardware issue? But I guess I will have to do more extensive Move16 testing on my A3660 with Zorro2 bus Fast RAM. I previously did some testing and found no problems.
Of course, Move16 is not supported with Enforcer because it relies on the MMU for debugging. Using an invalid address with Move16 could cause an MMU fault which Enforcer can not recover from. But exceptional cases are no good reason to sacrifice the performance benefits of Move16 for the standard cases. P.S. Enforcer could be updated to better handle MMU faults, so it's not fair to put all of the blame on Move16. Last edited by SpeedGeek; 05 February 2023 at 18:13. |
05 February 2023, 23:48 | #23 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,242
|
I've definitely encountered issues with move16 on my BlizzPPC 68040. I wrote a bunch of conversion routines for on-the-fly RTG pixel format conversion. Most of these worked by transferring successive lines from the source to an aligned buffer on the stack, which is then manipulated locally and moved as lines to the eventual destination. So, as a simple example, converting 32-bit ARGB to RGB555 would read a pair of cache lines for every one written to the destination. The intent was to keep the caches as clear as possible; only the local working area would be cached.
What I observed was occasional glitches in the output that could not be reproduced when replacing the move16 operations with the equivalent set of four move.l operations. The glitches were not observed on an Apollo 68040 with Mediator/Voodoo. I did once read that some 68040s had dodgy move16 bugs. Maybe this was one of them. I expected better from Phase5 though, lol |
06 February 2023, 05:51 | #24 | |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,248
|
No, at least my B2060 is also affected. I do not have a full list of problematic boards, but at least for the known boards, P5Init and/or GVPInit disable caching in the Z2 area.
Quote:
|
|
06 February 2023, 15:50 | #25 | |
Moderator
Join Date: Dec 2010
Location: Wisconsin USA
Age: 60
Posts: 841
|
Quote:
https://www.nxp.com/docs/en/errata/MC68040DE_D.txt But I think the OEM evaluation XC mask set users will always have the "Heat" errata as their primary motivation to upgrade. |
|
06 February 2023, 16:07 | #26 | |
Moderator
Join Date: Dec 2010
Location: Wisconsin USA
Age: 60
Posts: 841
|
Quote:
If there are vendor specific hardware bugs (which can actually be confirmed) then that's only a reason to stop using Move16 for that particular accelerator card. As of right now, the so called "Instability Problem" or "Occasional Lockups" is hard to reproduce, and could be caused by numerous other faults. I hoped someone with a logic analyzer and data logging software could provide some conclusive evidence on the matter. Unfortunately, that has not yet happened. |
|
06 February 2023, 17:03 | #27 | |||
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,248
|
Quote:
Quote:
Quote:
|
|||
06 February 2023, 22:37 | #28 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 2,006
|
Perhaps MOVE16 on the Amiga is most often used by the FastATA driver. If I remember correctly, after the first FastATA was produced, some older 68040s turned out to be faulty and had problems with the FastATA driver. The driver was later updated and now has several copy routines, with and without the MOVE16 instruction. I don't know whether the FastATA driver autodetects a faulty 68040 or whether this is configurable by the user.
|
07 February 2023, 14:29 | #29 | |
Moderator
Join Date: Dec 2010
Location: Wisconsin USA
Age: 60
Posts: 841
|
Quote:
Thebajaguy (just like you) initially failed to realize that Burst cycles are not possible on the Zorro2 bus. The TBI logic signal forces the attempted Burst cycles to be performed as 4 sequential (but separate) longword transfers. This results in 8 sequential (but separate) word transfers on the Zorro2 bus. He now somehow concludes that this longer cycle time is causing problems.
He again failed to realize that the Zorro2 bus does not have a maximum cycle time specification, and that a movem.l (Ax)+,d0-d3 would do exactly the same thing on the Zorro2 bus. In fact, the common SAS C full register stack pushes and pulls would move a much larger number of longwords (2x words) on the Zorro2 bus. Also, the original exec CopyMem() and CopyMemQuick() functions will movem.l more longwords (2x words) on the Zorro2 bus.
Again, this is not a very convincing argument or explanation of the so-called "Instability" and "Occasional Lockup" problems. What would be convincing is a logic analyzer result showing a bug in the dynamic bus sizing logic.
As for the system crashing issue, if 3% of Amiga users have to do some extra work to avoid crashing their system, that's fine by me. What is not fine by me is telling 97% of Amiga users that they must accept this huge performance loss so the 3% can save some extra work.
Last edited by SpeedGeek; 08 February 2023 at 12:33. |
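The transfer counts in this argument are simple arithmetic, sketched below. The function and constant names are made up for illustration. The sketch only counts transfers; it says nothing about timing or how tightly the accesses are stacked, which is the part the posters disagree on.

```python
LINE_BYTES = 16        # one 68040/68060 cache line, as moved by MOVE16
LONGWORD_BYTES = 4
ZORRO2_BUS_BYTES = 2   # Zorro II data bus is 16 bits wide


def zorro2_transfers(total_bytes):
    """Return (longword_cycles, word_transfers) for a burst-inhibited access."""
    longwords = total_bytes // LONGWORD_BYTES
    words = total_bytes // ZORRO2_BUS_BYTES
    return longwords, words


move16_line = zorro2_transfers(LINE_BYTES)           # one MOVE16 line
movem_4regs = zorro2_transfers(4 * LONGWORD_BYTES)   # movem.l (Ax)+,d0-d3
```

Both cases come out as 4 longword cycles split into 8 word transfers, which is the equivalence the post relies on.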
|
07 February 2023, 14:44 | #30 | |
Moderator
Join Date: Dec 2010
Location: Wisconsin USA
Age: 60
Posts: 841
|
Quote:
*It may be possible to do a Move16 errata test to determine if the 68040 is faulty, but the reliability of such tests is not always guaranteed. Last edited by SpeedGeek; 16 March 2023 at 15:30. |
|
06 March 2023, 00:57 | #31 |
Moderator
Join Date: Dec 2010
Location: Wisconsin USA
Age: 60
Posts: 841
|
Okay, I now have some Move16 test results from my A3000 A3660 system with A2091 Zorro2 Fast RAM (See the images below).
- The CMP image shows the A2091 memory at the highest priority.
- The MMU image shows the A2091 memory in Copyback mode.
- The Move16 image is the updated benchmark tool previously posted on this thread.
- The LHA image shows a test of a large LHA file. LHA extensively calls CopyMem(), which was patched with CMQ&B040. So now we can see if Move16 is causing any data corruption problems.
After extensive testing (approx. 1 hour), I find no instability, no lockups, no crashing and no data corruption problems. |
02 January 2024, 22:37 | #32 | |
Registered User
Join Date: Sep 2022
Location: Switzerland
Posts: 119
|
You, SpeedGeek, directed me to this thread
Quote:
[SPOILER] My use case would be Move16 (A0)+,(A1)+, where A0 and A1 are guaranteed to be longword aligned, and I could live with it if it were Fast-Mem only. [/SPOILER] |
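One caveat for this use case: as far as the Motorola documentation describes it, MOVE16 moves 16-byte lines aligned to 16-byte boundaries, so longword alignment alone would not be enough. A minimal sketch of how a copy could be split around that constraint is shown below; `split_for_move16` is a hypothetical helper, not an existing API.

```python
LINE = 16  # MOVE16 line size in bytes


def is_line_aligned(addr):
    """True if addr sits on a 16-byte boundary."""
    return (addr & (LINE - 1)) == 0


def split_for_move16(src, dst, length):
    """Split a copy into (head, body, tail) byte counts.

    'body' is the part that could use 16-byte line moves; 'head' and 'tail'
    need ordinary moves. If src and dst do not share the same offset within
    a line, no part of the copy qualifies and everything lands in 'head'.
    """
    if (src - dst) % LINE != 0:
        return length, 0, 0
    head = min((-src) % LINE, length)        # bytes up to the next boundary
    body = (length - head) // LINE * LINE    # whole lines only
    tail = length - head - body
    return head, body, tail
```

For example, two pointers that are both at offset 4 within a line need a 12-byte head copied by ordinary moves before any line move could start.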
|
02 January 2024, 23:13 | #33 | |
Moderator
Join Date: Dec 2010
Location: Wisconsin USA
Age: 60
Posts: 841
|
Quote:
Last edited by SpeedGeek; 02 January 2024 at 23:55. |
|
02 January 2024, 23:38 | #34 | |
Going nowhere
Join Date: Oct 2001
Location: United Kingdom
Age: 50
Posts: 9,014
|
Quote:
Simply running an extensive test that repeats the MOVE16 over and over again and then checking that the moved data is as expected should surely be enough? Obviously a faulty 040 will have a discrepancy in the copied data, and then you would default to the 4 move.l instead. |
|
03 January 2024, 00:43 | #35 |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,248
|
That depends on what you expect. It is easy in the sense of "is there a 68040 in the system". Then yes, it's faulty, because the problem goes through all revisions. Some issues have not been fixed at all, and the issue is not as obvious as "run the instruction and observe the wrong result". Please find details in the 68040 errata sheet.
The issue, if you had followed the entire thread, is not the 68040 as such either, but the overall integration into the system, in particular how the bridge from the accelerator board to Zorro behaves.
That is exactly *not* how it works. The trouble is not corruption of the copied data, but corruption of *some other data* that was still in the cache and was invalidated instead of being written back, and the issue is "the system hangs", not that the data is corrupt.
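Why a "copy and compare" test can pass while something else breaks can be shown with a deliberately simplified toy. It is not a model of any specific 68040 erratum or bridge design; the class, its methods and the `buggy_invalidate` fault are all invented for illustration.

```python
class ToyCopybackCache:
    """Toy model: one dict as RAM, one dict of dirty lines not yet written back."""

    def __init__(self):
        self.ram = {}
        self.dirty = {}

    def write(self, addr, value):
        self.dirty[addr] = value            # copyback: RAM is updated later

    def read(self, addr):
        return self.dirty.get(addr, self.ram.get(addr))

    def flush(self):
        self.ram.update(self.dirty)         # correct behaviour: write back
        self.dirty.clear()

    def buggy_invalidate(self):
        self.dirty.clear()                  # hypothetical fault: dirty data is lost

    def line_copy(self, src, dst):
        self.ram[dst] = self.ram.get(src)   # modelled as bypassing the cache


cache = ToyCopybackCache()
cache.ram[0x100] = "payload"
cache.write(0x900, "unrelated")             # dirty, still only in the cache
cache.line_copy(0x100, 0x200)
cache.buggy_invalidate()                    # assumed side effect of the fault

copy_ok = cache.read(0x200) == cache.read(0x100)
unrelated_ok = cache.read(0x900) == "unrelated"
```

In this toy, the copied data compares fine while the unrelated dirty value is gone, so a test that only checks the copy would report success.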
|
03 January 2024, 00:55 | #36 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,242
|
Are there any well-defined contexts in which move16 is safe, e.g. local (for an accelerator) fast memory?
|
03 January 2024, 00:56 | #37 | |
Going nowhere
Join Date: Oct 2001
Location: United Kingdom
Age: 50
Posts: 9,014
|
Quote:
If the 040 doesn't act randomly, then even if it's the data in the cache, it can still be tested over and over. And it doesn't matter which 040 revision is present: so long as the data you're testing is correct over an extended period that would usually cause the problem, you can determine which move you use. If your contention is that there is no pattern or logic to what data is compromised, then what the fuck were Motorola doing releasing it like that? And you need to make up your mind: either the 040 is the cause of the problem or it isn't, which is it? I can do sarcastic pedantry as well, but I take it less from others; maybe you should use the advent of 2024 to be less so. It's not endearing! |
|
03 January 2024, 02:42 | #38 | |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 2,006
|
Quote:
|
|
03 January 2024, 08:56 | #39 | |
Registered User
Join Date: May 2023
Location: Norwich
Posts: 415
|
Quote:
Personally I'd say there's enough anecdotal evidence it isn't entirely reliable and Commodore used to advise against it, so probably not worth the hassle. YMMV. |
|
03 January 2024, 11:40 | #40 | |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,248
|
Quote:
Again, there is *more than one* issue with MOVE16. There are a couple of errata in the 68040 and 68060, and they are specified, and there are *more errata* in some combinations of CPU turbo boards with bursting and Zorro transfer that *also* affect MOVE16. But really, everything has been said on this in the thread. |
|