Old 25 January 2015, 18:19   #1
SpeedGeek
Moderator
 
Copymem Quick & Big Released!

CopyMem Quick & Big v1.7
Parts of patch install code by Dirk Busse 1999
Enhanced patch code by SpeedGeek 2021

INTRODUCTION:
CMQ&B is a bigger and faster CopyMem + CopyMemQuick patch.
The main goal is to give the fastest possible results with Testit
from COPMQR28. To obtain these fast results, CMQ&B must carry
the redundant and bloated code needed to handle many
"Worst Case" copies.

FEATURES:
- Installs one of the fastest CMQ patches for 68020+ Amigas
- New JMP copy code speeds up small copies
- Safely exits if the patch is already installed (i.e. a good patch
program should really avoid patching itself)
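
The JMP copy idea from the feature list can be illustrated in C. This is only a sketch of the general technique (jumping into an unrolled run of longword moves so that exactly the needed number execute, with no loop counter), not the patch's actual 68k code. The names small_copy, COPY_LONG and SMALL_COPY_MAX are hypothetical, and the limit is shortened to 32 bytes for brevity, whereas the v1.7 history entry gives 108 bytes for the real patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical limit for this sketch; shortened to keep the listing brief. */
#define SMALL_COPY_MAX 32u

/* Copy one longword (4 bytes) and advance both pointers. */
#define COPY_LONG(d, s) do { memcpy((d), (s), 4); (d) += 4; (s) += 4; } while (0)

/* Returns 1 if the copy was handled here, 0 if the caller must use
   another path (for example a big loop). */
static int small_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (n > SMALL_COPY_MAX)
        return 0;

    /* Enter the unrolled sequence at the point where exactly n / 4
       longword copies remain; every case falls through to the next. */
    switch (n / 4) {
    case 8: COPY_LONG(d, s); /* fall through */
    case 7: COPY_LONG(d, s); /* fall through */
    case 6: COPY_LONG(d, s); /* fall through */
    case 5: COPY_LONG(d, s); /* fall through */
    case 4: COPY_LONG(d, s); /* fall through */
    case 3: COPY_LONG(d, s); /* fall through */
    case 2: COPY_LONG(d, s); /* fall through */
    case 1: COPY_LONG(d, s); /* fall through */
    default: break;
    }

    /* Copy the remaining 0..3 bytes the same way. */
    switch (n % 4) {
    case 3: *d++ = *s++; /* fall through */
    case 2: *d++ = *s++; /* fall through */
    case 1: *d++ = *s++; /* fall through */
    default: break;
    }
    return 1;
}
```

A caller would try small_copy first and fall back to its large-copy path whenever it returns 0.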

REQUIREMENTS:
- Amiga with 68020+

NOTES:
CMQ&B is an extension of CMQ&S. It has some extra code to
handle many small and misaligned copies. There are trade-offs in
supporting these "Worst Case" copies. Specifically, the Best Case
performance has been reduced and the size of the patch has
increased to 320 bytes.

HISTORY:
v1.6 first release
v1.7 Updated Big loop code with faster instructions. Increased
Big loop copy size to 112 bytes. Replaced Small loop copy code
with new JMP copy code for <= 108 bytes.

******************************************************
CopyMem Quick & Big040 v2.3
Parts of patch install code by Dirk Busse 1999
Enhanced patch code by SpeedGeek 2021

INTRODUCTION:
CMQ&B040 is a bigger and faster CopyMem + CopyMemQuick patch.
The main goal is to give the fastest possible results with Testit
from COPMQR28. To obtain these fast results, CMQ&B040 must carry
the redundant and bloated code needed to handle many
"Worst Case" copies.

FEATURES:
- Automatically installs one of the fastest CMQ patches for 040+
- The Move16 address is restricted only for performance reasons
(See Notes)
- New smart buffer copy code handles MOVE16 alignment
restrictions
- User-selectable Block Size options (1024-8192 bytes) allow "Tuning"
the MoveL vs. Move16 performance of your system. Since v2.1 the
default Block Size is 4096
- Safely exits if the patch is already installed (i.e. a good patch
program should really avoid patching itself)
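
MOVE16 moves whole 16-byte lines, so both addresses have to sit on a 16-byte boundary before it can be used. The C sketch below shows one way a copy routine might classify a request by alignment. It is an illustration only: the enum, the function names, and the three-way split are assumptions, not the patch's actual logic.

```c
#include <stdint.h>

#define LINE_BYTES 16u  /* size of one MOVE16 transfer */

/* Hypothetical categories for this sketch. */
enum copy_path {
    PATH_LINE_DIRECT,      /* both addresses already line aligned */
    PATH_LINE_AFTER_HEAD,  /* same misalignment: copy a short head first */
    PATH_BUFFERED          /* different misalignment: needs an aligned buffer */
};

/* Decide which path a copy could take from the low four address bits. */
static enum copy_path classify(uintptr_t src, uintptr_t dst)
{
    unsigned s = (unsigned)(src & (LINE_BYTES - 1));
    unsigned d = (unsigned)(dst & (LINE_BYTES - 1));

    if (s == 0 && d == 0)
        return PATH_LINE_DIRECT;
    if (s == d)
        return PATH_LINE_AFTER_HEAD;
    return PATH_BUFFERED;
}

/* Number of bytes to copy normally before addr reaches a line boundary. */
static unsigned head_bytes(uintptr_t addr)
{
    return (unsigned)((LINE_BYTES - (addr & (LINE_BYTES - 1))) & (LINE_BYTES - 1));
}
```

In the buffered case, an intermediate 16-byte-aligned buffer lets one side of each transfer use line moves even though source and destination can never be aligned at the same time.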

REQUIREMENTS:
- Amiga with 68040+
- Move16 is only used for copies that reach the (minimum) Block Size
of the version you installed (larger copies always qualify).

NOTES:
CMQ&B040 is an extension of CMQ&S. It has some extra code to
handle many small and misaligned copies. There are trade-offs in
supporting these "Worst Case" copies. Specifically, the Best Case
performance has been reduced and the size of the patch has
increased to 540 bytes. Since v2.1, stack usage is 84 bytes per
misaligned large block copy.

Move16 does not cause a burst access problem with Chip RAM since
it simply is not possible to access Chip RAM in this way. Burst
operation is controlled in Hardware (See Transfer Burst Inhibit
operation in the 040 manual). The Smart buffer copy loop is
address restricted (for performance reasons only) when the
destination address is in Chip RAM.

Block size "Tuning" options are application-specific. If you want the
fastest copy results for Fast RAM, use the Block size = Data cache
size option. If you want better multitasking performance, use the
Block size = 1/2 Data cache size option. If a particular software
application targets non-cacheable memory (e.g. Chip RAM or Graphics
Board RAM), the Block size = Smallest option may be faster for that
particular case.
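
The tuning advice above can be written down as a small C sketch. The names below are hypothetical, and the archive ships separate Block Size versions rather than a runtime selector, so this only restates the rule of thumb. The data cache size is passed in by the caller (4096 bytes on a 68040, 8192 bytes on a 68060).

```c
#include <stddef.h>

/* Block Size range offered by the patch. */
#define BLOCK_SIZE_MIN 1024u
#define BLOCK_SIZE_MAX 8192u

/* Hypothetical names for the three tuning goals described in the notes. */
enum tuning_goal {
    GOAL_FAST_RAM_SPEED,       /* fastest copies in Fast RAM */
    GOAL_MULTITASKING,         /* leave half the data cache to other tasks */
    GOAL_NONCACHEABLE_TARGET   /* Chip RAM or graphics board RAM targets */
};

/* Suggest a Block Size for a given data cache size and goal. */
static unsigned pick_block_size(unsigned dcache_bytes, enum tuning_goal goal)
{
    unsigned size;

    switch (goal) {
    case GOAL_FAST_RAM_SPEED: size = dcache_bytes;     break;
    case GOAL_MULTITASKING:   size = dcache_bytes / 2; break;
    default:                  size = BLOCK_SIZE_MIN;   break;
    }
    if (size < BLOCK_SIZE_MIN) size = BLOCK_SIZE_MIN;
    if (size > BLOCK_SIZE_MAX) size = BLOCK_SIZE_MAX;
    return size;
}

/* Assumption: copies at or above the Block Size take the Move16 path,
   smaller ones stay on the MoveL path. */
static int uses_move16_path(size_t len, unsigned block_size)
{
    return len >= block_size;
}
```

For a 68040, pick_block_size(4096, GOAL_MULTITASKING) suggests the 2048-byte version.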

HISTORY:
v1.7 first release
v1.8 minor change
- removed obsolete Copymemquick source address compare code
v1.9 New smart buffer copy code provides a BIG SPEED UP
since the MOVE16 alignment restrictions are well handled!
v2.0 Fixed a seldom occurring but serious bug with internal Smart
buffer usage.
- Nested call large block copies (WHEN MISALIGNED!) could corrupt
each other's data when sharing the same buffer. This fix uses a
stack-based buffer solution which results in a private buffer for each call.
v2.1 Many changes
- Fixed a rarely occurring stack size bug when the stack was word
aligned and offset by one word from a 16 byte aligned address.
- Added code to test for the Move16 address bug and safely exit upon
detection
- Added code to restrict Smart buffer copy usage when the
destination address is in Chip RAM.
- Added code to change the default Block size
v2.2 minor change
- Removed "Move16 Bug" detection code. This was a blunder due to
Ax = Ay meaning the same registers rather than the same addresses.
v2.3 minor change
- Changed address register longword math to word math for the
Smart buffer copy loop. This is a small optimization but we always
want the fastest possible results
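
The stack-based buffer from v2.0 and the alignment corner case fixed in v2.1 both come down to simple address arithmetic. The C sketch below shows how a 16-byte-aligned buffer might be carved out of a downward-growing stack and how the bytes consumed vary with the incoming stack pointer. The 64-byte buffer size and both function names are assumptions for illustration and are not taken from the patch.

```c
#include <stdint.h>

#define LINE_BYTES 16u
#define BUF_BYTES  64u  /* hypothetical buffer size for this sketch */

/* Address of a line-aligned buffer placed below the stack pointer sp. */
static uintptr_t reserve_aligned(uintptr_t sp)
{
    /* Step down by the buffer size, then round down to a 16-byte line. */
    return (sp - BUF_BYTES) & ~(uintptr_t)(LINE_BYTES - 1);
}

/* Stack bytes consumed: the buffer plus whatever padding alignment costs. */
static unsigned stack_bytes_used(uintptr_t sp)
{
    return (unsigned)(sp - reserve_aligned(sp));
}
```

With a word-aligned stack the padding ranges from 0 to 14 bytes, so a routine that budgets stack space has to allow for the worst case instead of assuming the buffer size alone.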

*************************************************************
CopyMem Quick & Big040 SAFER v2.3
Parts of patch install code by Dirk Busse 1999
Enhanced patch code by SpeedGeek 2024

INTRODUCTION:
CMQ&B040_SAFER is a special version of CMQ&B040 which is
intended to be somewhat safer than the standard version. However, it
should never be considered 100% safe. More specifically, it should
provide the ability to crash without a loss of data as described in several
of the Motorola Move16 errata cases.

This version has some extra code to test if the source and
destination addresses are equal. This is a user program bug, but
it's still safer to avoid using Move16 in this particular case.
There is also code to test these specified destination addresses:

- $E00000 EXT ROM space (512 KB)
- $F80000 STD ROM space (512 KB)

The EXT ROM space is marked MMU invalid for 512 KB Kickstart ROM
systems by most 68040 and 68060 libraries. While this should not
be the case for 1 MB ROM systems this address space may be MMU
write protected by ROM remapping tools. The STD ROM address
space may also be MMU write protected by ROM remapping tools.

If I understand the Motorola documentation correctly, there should
be no need to test the source address for Move16 since the MMU
invalid address space doesn't have any valid data to become cached
and invalidated.
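
The checks described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the patch's 68k code: the function and macro names are hypothetical, and only the start of the destination is tested here, since the text does not say whether the end of the copy is also checked.

```c
#include <stdint.h>

#define EXT_ROM_BASE 0x00E00000u  /* EXT ROM space */
#define STD_ROM_BASE 0x00F80000u  /* STD ROM space */
#define ROM_SPAN     0x00080000u  /* 512 KB each */

/* True if addr lies in [base, base + ROM_SPAN). The unsigned subtraction
   wraps for addr < base, which makes the comparison fail as intended. */
static int in_rom_span(uint32_t addr, uint32_t base)
{
    return (uint32_t)(addr - base) < ROM_SPAN;
}

/* Returns 1 if the line-move path may be used, 0 if the copy should
   fall back to ordinary moves. */
static int move16_allowed(uint32_t src, uint32_t dst)
{
    if (src == dst)
        return 0;  /* user program bug, but safer to avoid Move16 */
    if (in_rom_span(dst, EXT_ROM_BASE) || in_rom_span(dst, STD_ROM_BASE))
        return 0;  /* destination may be MMU invalid or write protected */
    return 1;      /* the source address is deliberately not tested */
}
```

Because the source is not tested, a copy out of ROM into Fast RAM still qualifies for the fast path, while any copy aimed at either ROM window does not.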

UNSAFE USAGE:
This version does NOT attempt to be safe with any possible
reported hardware bugs such as:

- The early mask set 68040 (e.g. "XC" variant CPUs)
- Broken or defective 68040/060 accelerators and turbo boards

This version is neither safe nor recommended for use with the
mmu.library (AKA MMUlib by ThoR).

NOTES: Small block copy performance is not affected by the extra
Move16 safety code. But of course, large block copy performance
will be reduced. Testit results will not be provided with this
special version.

HISTORY:

v2.3 First Safer version
- Added code to test for equal source and destination addresses
and avoid using Move16 for this specific case.
- Added code to test specified destination addresses and avoid
using Move16 for those cases.

*************************************************************
Attached Files
File Type: lha CMQ&B17.LHA (1.1 KB, 219 views)
File Type: lha CMQ&B040_23.LHA (2.3 KB, 171 views)
File Type: lha CMQ&B040_SAFER23.LHA (1.7 KB, 29 views)

Last edited by SpeedGeek; 07 January 2024 at 17:57.