07 June 2020, 15:40 | #1 | |
Moderator
Join Date: Dec 2010
Location: Wisconsin USA
Age: 60
Posts: 839
|
Quote:
Now, since the 040 and 060 also optionally support burst they can and do function (with some hardware adaptations) on any classic Amiga 68K bus system. Hence, MOVE16 has absolutely no compatibility problem with the Amiga 68K bus. Thus, the principal concerns with MOVE16 "Safe Usage" are 1) the 16-byte source and destination alignment requirement and 2) the Motorola/Freescale documented errata conditions. But this should be a comparatively simple problem to solve for any software developer making such extraordinary efforts as described above.
*****************
Mod note: Some posts moved to a new thread from: http://eab.abime.net/showthread.php?t=102568
Last edited by lilalurl; 22 June 2020 at 21:39. |
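For reference, the alignment requirement in point 1 can be checked before a MOVE16 path is chosen. A minimal sketch in C; the helper names `is_line_aligned` and `move16_line_count` are made up for illustration and do not come from any Amiga SDK:

```c
#include <stddef.h>
#include <stdint.h>

/* MOVE16 moves one 16-byte cache line per instruction. */
#define LINE_SIZE 16u

/* Nonzero if p sits on a 16-byte boundary. */
static int is_line_aligned(const void *p)
{
    return ((uintptr_t)p & (LINE_SIZE - 1u)) == 0;
}

/* Hypothetical helper: number of whole lines a MOVE16 loop could copy,
 * or 0 if either pointer is misaligned and a plain copy should be used.
 * Any tail of len % 16 bytes is left for an ordinary copy. */
static size_t move16_line_count(const void *src, const void *dst, size_t len)
{
    if (!is_line_aligned(src) || !is_line_aligned(dst))
        return 0;
    return len / LINE_SIZE;
}
```

A caller would run the MOVE16 loop for the returned number of lines and finish the remainder with MOVE.L or byte moves. The errata conditions in point 2 are not covered by this check.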
|
09 June 2020, 00:59 | #2 |
Registered User
Join Date: Sep 2006
Location: New Sandusky
Posts: 942
|
I've never had any problem using MOVE16, ever. It either operates quickly on cached writes or stalls the CPU while the writes happen on uncached memory and at least saves instruction loads on word-length operations while it's doing it.
|
15 June 2020, 08:36 | #3 |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,216
|
No, and no again. Once again: MOVE16 always bursts, even on non-cacheable data, even on data that goes over the Zorro bus. This means there needs to be logic on the board that keeps bursts from going through Zorro. That may or may not be the case; you cannot in general assume that the hardware fixes a software configuration issue for you.
|
15 June 2020, 17:38 | #4 | |
Moderator
Join Date: Dec 2010
Location: Wisconsin USA
Age: 60
Posts: 839
|
Quote:
ERROR (M68K PRM): "Line transfers are performed using burst reads and writes, which begin with the long word pointed to by the effective address of the source and destination, respectively. An address register used in the post increment addressing mode is incremented by 16 after the transfer."
CORRECTION: "Line transfers are optionally performed using burst reads and writes, which begin with the long word pointed to by the effective address of the source and destination, respectively. An address register used in the post increment addressing mode is incremented by 16 after the transfer."
REFERENCE (M68040 User's Manual): "5.4.6 Transfer Burst Inhibit (TBI) This input signal indicates to the processor that the accessed device cannot support burst mode accesses and that the requested line transfer should be divided into individual longword transfers. Asserting TBI with TA terminates the first data transfer of a line access, which causes the processor to terminate the burst and access the remaining data for the line as three successive long-word transfers. During alternate bus master accesses, the M68040 samples the TBI to detect completion of each bus transfer."
BTW, my trusty Commodore A3640 has never run a burst cycle since burst is permanently disabled on this card.
Last edited by SpeedGeek; 07 March 2023 at 18:00. Reason: typo correction |
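To picture what /TBI changes: the same four longwords of the line are transferred either way, as one burst or as four separate longword cycles. A rough sketch in C of the access order. The wrap-around within the 16-byte line is an assumption on my part (the quoted text only says the transfer begins with the addressed longword), and `line_access_order` is a made-up name:

```c
#include <stdint.h>

/* Fill out[0..3] with the longword addresses touched by one line
 * transfer that starts at effective address ea.
 * ASSUMPTION: after the first longword the address wraps around
 * inside the same 16-byte line. */
static void line_access_order(uint32_t ea, uint32_t out[4])
{
    uint32_t base = ea & ~(uint32_t)15u;  /* start of the cache line */
    int i;

    for (i = 0; i < 4; i++) {
        /* step by 4 bytes, keep only the longword offset within the line */
        out[i] = base | ((ea + 4u * (uint32_t)i) & 12u);
    }
}
```

With a line-aligned address, which is what MOVE16 wants anyway, this is simply base+0, +4, +8, +12.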
|
15 June 2020, 18:30 | #5 | |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,216
|
Quote:
|
|
15 June 2020, 20:24 | #6 | |
Moderator
Join Date: Dec 2010
Location: Wisconsin USA
Age: 60
Posts: 839
|
Quote:
Please specify examples rather than claiming such "Rare" Boards exist. Any 040 or 060 Board with missing /TBI logic is faulty, defective and non-functional. Such a board will fail in early exec when the instruction cache is enabled. The instruction cache also optionally uses burst for line transfers but the Kickstart ROM controllers Gary, Fat Gary and Gayle do not support burst. |
|
15 June 2020, 20:39 | #7 | |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,216
|
Quote:
Kickstart startup disables caching in the first 16 MB, thus for all Zorro-II devices, the Kickstart itself, and chip mem, and enables caching for the 32-bit RAM on board. Combine that with a custom 68040.library that keeps caching disabled in the Zorro area, and the board boots and works fine. Expansion RAM in the Zorro-II region would remain non-cacheable, and slow, but it is slow anyhow, and the board would use the 32-bit RAM on board outside the 24-bit area. Not that I haven't mentioned this before. MOVE16 is *not* a recommended option, same as TAS, CAS and CAS2. It *may* appear to work in many circumstances, but not reliably on all boards in all circumstances. |
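The address split described here can be written down as a simple range check. A sketch in C, assuming the 24-bit space ends at 0x01000000 (the first 16 MB, as stated above); the function names are invented for illustration. Note that it only tells you a buffer lies outside the first 16 MB, not that the memory is actually CPU-local or burst-capable:

```c
#include <stdint.h>

/* End of the 24-bit address space (first 16 MB). */
#define SPACE_24BIT_END 0x01000000u

/* Nonzero if addr falls into the first 16 MB
 * (chip mem, Zorro-II, Kickstart ROM). */
static int in_24bit_space(uint32_t addr)
{
    return addr < SPACE_24BIT_END;
}

/* Hypothetical helper: nonzero if the whole buffer lies above the
 * 24-bit space. Empty buffers and buffers wrapping past 4 GB are
 * rejected. */
static int buffer_above_24bit(uint32_t start, uint32_t len)
{
    if (len == 0)
        return 0;
    if (start + (len - 1u) < start)  /* address wrapped around */
        return 0;
    return !in_24bit_space(start);
}
```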
|
16 June 2020, 00:29 | #8 |
Registered User
Join Date: Sep 2006
Location: New Sandusky
Posts: 942
|
Sounds like there'd be a market in fixing these boards to add burst inhibit. Just some GALs piggybacked on that assert the line when they snoop an address matching chip and Zorro space.
|
19 June 2020, 16:55 | #9 | |
Moderator
Join Date: Dec 2010
Location: Wisconsin USA
Age: 60
Posts: 839
|
Quote:
Assuming such a Board actually ever did exist, it was probably a prototype or developer Board with only a small number ever produced. exec only disables the cache temporarily (probably for polling the hardware) but the cache is enabled by boot time, so this Board would need a special exec.library in addition to the special 68040.library to keep the cache disabled. Now, for you to go on telling the 99.9% of 040 and 060 users who can use MOVE16 quite reliably that it is "Not Recommended" shows just how irrational you really are. You went from "Incompatible with the Bus" to an absurdly exceptional case to attempt to justify your claims. Very sad indeed... |
|
19 June 2020, 18:24 | #10 | |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,216
|
Quote:
|
|
20 June 2020, 15:18 | #11 | |
Moderator
Join Date: Dec 2010
Location: Wisconsin USA
Age: 60
Posts: 839
|
Quote:
- 68020+ dependent Software
- 68K FPU dependent Software
- 68K MMU dependent Software
- 68000 dependent Software
- 7 MHz dependent Software
- Chip RAM size dependent Software
- Fast RAM size dependent Software
- ECS chip set dependent Software
- AGA chip set dependent Software
- Amiga OS version dependent Software
- 3rd party Hardware dependent Software (Including "Broken by Design" Software workarounds)
etc., etc. and so on... |
|
20 June 2020, 21:11 | #12 | ||||
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,216
|
Quote:
None of that can be applied easily to MOVE16 since you cannot so easily determine whether the instruction is robust. Both are bad programming practise and certainly discouraged. To abstract from 68K idiosyncrasies, CBM already added GetCC() to exec. To abstract from the 7 MHz clock, we have the timer.device, or - failing that - custom chip or CIA accesses with guaranteed timing. Quote:
Quote:
For that, we have version checks and version information in libraries and devices. Otherwise, see "resource exhaustion". There is quite a difference between a program that aborts with an error requester if the necessary library version is not available and one that crashes due to a hardware conflict you cannot easily test for. Quote:
The problem with MOVE16 is that it is not so easy to detect error situations (though you may). The "best effort" approach would be to include an internal dispatcher in the software that selects between two implementations (with and without MOVE16), and to include an upfront test of whether MOVE16 works reliably. I am not sure whether the latter can be made to work reliably. I am not even clear whether there is much of a practical advantage for MOVE16 in realistic scenarios. |
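The dispatcher idea could look roughly like this. A sketch in C, not a tested Amiga implementation: `copy_move16_stub` merely stands in for a real MOVE16 assembly loop, all names are hypothetical, and, as said above, it is unclear whether a self-test like this can ever be trusted to catch a board that mishandles bursts:

```c
#include <stddef.h>
#include <string.h>

typedef void (*copy_fn)(void *dst, const void *src, size_t len);

/* Conservative path: ordinary copy. */
static void copy_plain(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
}

/* Placeholder for the MOVE16 path; real code would be an asm loop. */
static void copy_move16_stub(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
}

/* Upfront test: copy a known pattern and compare the result.
 * A pass does NOT prove the routine is safe on every address range. */
static int copy_selftest(copy_fn fn)
{
    static unsigned char src[256], dst[256];
    size_t i;

    for (i = 0; i < sizeof src; i++) {
        src[i] = (unsigned char)(i * 7u + 1u);
        dst[i] = 0;
    }
    fn(dst, src, sizeof src);
    return memcmp(dst, src, sizeof src) == 0;
}

/* Pick the implementation once at startup. */
static copy_fn select_copy(int user_allows_move16)
{
    if (user_allows_move16 && copy_selftest(copy_move16_stub))
        return copy_move16_stub;
    return copy_plain;
}
```

The program would call `select_copy()` once and route all block copies through the returned pointer, so the user setting and the test result are evaluated in one place only.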
||||
21 June 2020, 14:30 | #13 | |||
Moderator
Join Date: Dec 2010
Location: Wisconsin USA
Age: 60
Posts: 839
|
Quote:
Since MOVE16 always uses the cache "WriteThrough" policy, its performance is never affected by the worst case of the Copyback mode. If you want to see this "Worst Case" performance, try using the AIBB memory test on any 040+ system with the data cache enabled and then disabled. Quote:
Quote:
|
|||
21 June 2020, 15:34 | #14 | ||
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,216
|
Quote:
Quote:
In my experience, there is nothing to be gained from MOVE16 except potential instabilities. The only case where you might gain something is if you move from CPU-local (32-bit) memory into itself, thus essentially CopyMemQuick(). The best approach here is, however, not to move data around in the first place, i.e. use a better algorithm. If the data cannot be deposited in the right target buffer, for example due to DMA restrictions or specialized hardware (graphics boards), then the advantage of MOVE16 diminishes anyhow, as you run into the issue of how to reach "this other data" over a Zorro II/III bus which does not support bursting. |
||
22 June 2020, 15:06 | #15 | |||
Moderator
Join Date: Dec 2010
Location: Wisconsin USA
Age: 60
Posts: 839
|
Quote:
Quote:
BTW, burst support only affects memory transfer rates but does not affect the specific performance benefits of the MOVE16 instruction itself. Quote:
I just explained two ways in which MOVE16 outperforms other instructions. I even explained an easy way for you to observe the worst case performance of the Copyback cache. So, if you still stubbornly refuse to get it, then any further discussion is pointless. I agree this is off topic but a moderator would have to create the new thread and move the posts. So you should probably PM a moderator for assistance. |
|||
22 June 2020, 18:41 | #16 | ||||
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,216
|
Quote:
Quote:
I really wish there were a generic way to supply a CPU library, and I am trying hard to make the best possible approximation of such a thing, but reality tells another story, much to my own displeasure. I wish it were different, but it isn't. Quote:
The only situation where it would make a difference speedwise is if you move from CPU-local 32-bit memory to itself. Unfortunately, the whole architecture of the Amiga makes it impossible to detect whether a particular piece of memory satisfies this requirement, and hence whether MOVE16 is beneficial. Worse, in most situations where you *would* want to move memory around, you want to move it from CPU-local memory over Zorro II/III or the chip mem bus, and this is exactly the situation where MOVE16 has no benefits because the bus is saturated anyhow, and where it has risks, so you would want to avoid using it. Thus, there is rarely any practical gain, except in benchmarks, just risks that cannot be controlled properly because the system does not provide any means to control them. Quote:
Just that my practical experiments showed that it does not. So why should I bother taking a risk if the net benefit is nil when I would need it? |
||||
23 June 2020, 10:07 | #17 |
Registered User
Join Date: Sep 2006
Location: New Sandusky
Posts: 942
|
My 2 cents:
Whenever I had to duplicate a chunk of memory for some reason (e.g. create a backup buffer of "original data" when I'm going to modify the original) as long as the data was bigger than the data cache, MOVE16 was always a win. It was also a (very small) win if I had to copy an image buffer into display memory. The two cards I own are an A3640 and a CSPPC/060. I never had any problems. C= and Phase 5 represent the #1 and #2 040/060 cards out on the market, probably the large majority. If a tool works well for the vast majority of what's out there, why not use it? The only thing that would make me think twice would be if any of the #3 cards (Apollo?) had a problem. It's a tool that definitely works well for what it's intended to do. If it causes problems on some very early 040 cards, well, then they can run a MOVE16-clean version of whatever is created. It would be interesting if a test program could be created to see if there's a problem on any of the very early cards from RCS, PPI, GVP, etc. I know that PPI included a jumper on the Mercury that sets the burst inhibit on all off-card addresses. |
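The "bigger than the data cache" rule of thumb can be put into a tiny helper. A sketch in C: the cache sizes are the documented 4 KB (68040) and 8 KB (68060) data caches, while the threshold itself is only this post's heuristic and `prefer_move16` is a made-up name:

```c
#include <stddef.h>

/* Data cache sizes in bytes. */
enum {
    DCACHE_68040 = 4096,
    DCACHE_68060 = 8192
};

/* Heuristic from the post above: MOVE16 pays off once the block
 * no longer fits into the data cache. Not a measured cutoff. */
static int prefer_move16(size_t len, size_t dcache_bytes)
{
    return len > dcache_bytes;
}
```

Alignment and the bus the memory sits on would still have to be checked separately; this only covers the size side of the decision.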
23 June 2020, 10:49 | #18 | |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,216
|
Quote:
Frankly, over the last 30 years of Amiga programming, I have had more than enough problems with incompetent third party patches creating problems on *my* side as a developer and causing failures of my software, and I would prefer it if authors were a bit more conservative with programs they provide to the public. In case of failure, it is really troublesome and annoying to hunt down somebody else's bugs. |
|
24 June 2020, 18:24 | #19 |
Moderator
Join Date: Dec 2010
Location: Wisconsin USA
Age: 60
Posts: 839
|
Aminet Move16 Benchmark update (SpeedGeek 2020)
The old Move16v2 Benchmark from Aminet was updated as follows:
- Removed all NOPs from MOVE16 loop code (The author was probably using either a really old version of 68040.library or an early XC68040 mask set CPU if he believed these NOPs were actually needed).
- Applied PatchFor020 (Yes, the SAS Crap compiler still generates standard 68000 maths even for 020+ compiled code)
Here are some benchmark results from my A3000 + A3640 @ 40 MHz system:
**FAST TO FAST** MoveMem = 1.5800 secs MoveMem16 = 1.0400 secs
**Fast to CHIP** MoveMem = 2.7600 secs MoveMem16 = 2.0800 secs
**CHIP to Fast** MoveMem = 3.2800 secs MoveMem16 = 2.1000 secs
**CHIP to CHIP** MoveMem = 2.9200 secs MoveMem16 = 2.8800 secs
NOTES: This Benchmark uses a large block size which means MOVE16 will always be faster than MOVE.L. Someone who is skilled with the SAS Crap compiler could recompile it with a smaller block size (e.g. < Data cache size) to demonstrate when MOVE.L may be faster.
Enjoy!
Last edited by SpeedGeek; 02 July 2020 at 13:08. |
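To make the numbers easier to compare, the ratio MoveMem / MoveMem16 can be computed per case. A small sketch in C that uses only the timings posted above; the struct and function names are invented for illustration:

```c
#include <stddef.h>
#include <stdio.h>

struct bench_row {
    const char *label;
    double move_mem;    /* seconds, MoveMem   */
    double move_mem16;  /* seconds, MoveMem16 */
};

/* Timings copied from the post above (A3000 + A3640 @ 40 MHz). */
static const struct bench_row rows[] = {
    { "FAST to FAST", 1.58, 1.04 },
    { "FAST to CHIP", 2.76, 2.08 },
    { "CHIP to FAST", 3.28, 2.10 },
    { "CHIP to CHIP", 2.92, 2.88 },
};

/* Speedup factor of the MOVE16 routine over the plain routine. */
static double speedup(const struct bench_row *r)
{
    return r->move_mem / r->move_mem16;
}

/* Print one line per case. */
static void print_speedups(void)
{
    size_t i;

    for (i = 0; i < sizeof rows / sizeof rows[0]; i++)
        printf("%-12s %.2fx\n", rows[i].label, speedup(&rows[i]));
}
```

On these figures the gain is about 1.5x for Fast to Fast and CHIP to Fast, about 1.3x for Fast to CHIP, and close to nothing for CHIP to CHIP.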
25 June 2020, 13:11 | #20 | |
Registered User
Join Date: Sep 2006
Location: New Sandusky
Posts: 942
|
Quote:
Look, I get where you're coming from. There may well be some very early 68040 cards that do not set burst inhibit properly and instead rely entirely on cache control to avoid burst. This is a broken design, but whatever, they exist. Having released code simply fail, with no explanation and no idea where the problem is coming from, would be very aggravating. But by that same argument we should never use EHB mode because EHB didn't exist on the early NTSC A1000s. My argument is that the number of cards without burst inhibit is very small, and that's no reason to avoid faster code just because some can't use it. Instead just leave an optional setting for people who have these cards to use slower code. It's not the responsibility of the coder to always nerf his code because there's a small number of systems with a broken design. |
|