English Amiga Board

Copymem Quick & Big Released! (https://eab.abime.net/showthread.php?t=76777)

SpeedGeek 25 January 2015 18:19

Copymem Quick & Big Released!
 
3 Attachment(s)
CopyMem Quick & Big v1.7
Parts of patch install code by Dirk Busse 1999
Enhanced patch code by SpeedGeek 2021

INTRODUCTION:
CMQ&B is a big, fast CopyMem + CopyMemQuick patch.
The main goal is to give the fastest possible results with Testit
from COPMQR28. To obtain these fast results, CMQ&B must include
the redundant and bloated code needed to handle many "Worst Case"
copies.

FEATURES:
- Installs one of the fastest CMQ patches for 68020+ Amigas
- New JMP copy code speeds up small copies
- Safely exits if the patch is already installed (a good patch
program should really avoid patching itself)

REQUIREMENTS:
- Amiga with 68020+

NOTES:
CMQ&B is an extension of CMQ&S. It has some extra code to
handle many small and misaligned copies. There are trade-offs in
supporting these "Worst Case" copies: specifically, the Best Case
performance has been reduced and the size of the patch has
increased to 320 bytes.

HISTORY:
v1.6 first release
v1.7 Updated Big loop code with faster instructions. Increased
Big loop copy size to 112 bytes. Replaced Small loop copy code
with new JMP copy code for <= 108 bytes.
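The "JMP copy" idea in v1.7 can be illustrated with a rough C analogue: compute an entry point into a fully unrolled run of longword moves and fall through from there, so a small copy pays no loop overhead. This is only a sketch under assumptions: the real patch is 68k assembly whose exact layout is not shown in this thread, the names jump_copy and MOVE_L are hypothetical, and the sketch stops at 8 longwords (32 bytes) instead of 108 bytes to stay short.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch limit: 8 longwords (32 bytes). CMQ&B 1.7 reportedly
   uses its JMP copy code for up to 108 bytes. */
#define JUMP_COPY_MAX_LONGS 8

/* Copy the 4-byte unit with index i; memcpy keeps this alignment-safe. */
#define MOVE_L(i) memcpy(dst + 4 * (i), src + 4 * (i), 4)

void jump_copy(const unsigned char *src, unsigned char *dst, size_t len)
{
    size_t longs = len / 4;

    if (longs > JUMP_COPY_MAX_LONGS) {
        memcpy(dst, src, len);   /* larger copies would take a loop path */
        return;
    }

    /* Copy the 0..3 trailing bytes first. */
    for (size_t i = longs * 4; i < len; i++)
        dst[i] = src[i];

    /* Enter the unrolled sequence at the case matching 'longs'. Each case
       falls through, much like a JMP into a run of MOVE.L instructions. */
    switch (longs) {
    case 8: MOVE_L(7); /* fall through */
    case 7: MOVE_L(6); /* fall through */
    case 6: MOVE_L(5); /* fall through */
    case 5: MOVE_L(4); /* fall through */
    case 4: MOVE_L(3); /* fall through */
    case 3: MOVE_L(2); /* fall through */
    case 2: MOVE_L(1); /* fall through */
    case 1: MOVE_L(0); /* fall through */
    default: break;
    }
}
```

A C compiler will usually turn the fall-through switch into a jump table, which is roughly the same trick as a computed JMP into a block of MOVE.L instructions.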

******************************************************
CopyMem Quick & Big040 v2.3
Parts of patch install code by Dirk Busse 1999
Enhanced patch code by SpeedGeek 2021

INTRODUCTION:
CMQ&B040 is a big, fast CopyMem + CopyMemQuick patch.
The main goal is to give the fastest possible results with Testit
from COPMQR28. To obtain these fast results, CMQ&B040 must include
the redundant and bloated code needed to handle many "Worst Case"
copies.

FEATURES:
- Automatically installs one of the fastest CMQ patches for 040+
- The Move16 address is restricted only for performance reasons
(see Notes)
- New smart buffer copy code handles MOVE16 alignment
restrictions
- User-selectable Block Size options from 1024 to 8192 bytes allow
"tuning" the MoveL vs. Move16 performance of your system. Since v2.1
the default Block Size is 4096 bytes
- Safely exits if the patch is already installed (a good patch
program should really avoid patching itself)

REQUIREMENTS:
- Amiga with 68040+
- Move16 is only used for copies of at least the (minimum) Block Size
of the version you installed (larger sizes always qualify).

NOTES:
CMQ&B040 is an extension of CMQ&S. It has some extra code to
handle many small and misaligned copies. There are trade-offs in
supporting these "Worst Case" copies: specifically, the Best Case
performance has been reduced and the size of the patch has
increased to 540 bytes. Since v2.1, stack usage is 84 bytes per
misaligned large block copy.

Move16 does not cause a burst access problem with Chip RAM, since
it is simply not possible to access Chip RAM in this way. Burst
operation is controlled in hardware (see Transfer Burst Inhibit
operation in the 040 manual). The Smart buffer copy loop is
address restricted (for performance reasons only) when the
destination address is in Chip RAM.
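The alignment reasoning can be sketched as a path selector. MOVE16 moves whole 16-byte lines and needs both addresses 16-byte aligned, so a direct MOVE16 copy only works when source and destination sit at the same offset within a line; otherwise the data has to be staged through an aligned buffer. This C sketch assumes that reading of the notes: choose_path, the enum names and the dst_in_chip flag are hypothetical, and the real patch may order these tests differently.

```c
#include <stddef.h>
#include <stdint.h>

enum copy_path { PATH_MOVEL, PATH_MOVE16, PATH_SMART_BUFFER };

/* Hypothetical path selection. 'dst_in_chip' would come from something
   like exec.library's TypeOfMem(); here the caller simply passes a flag. */
enum copy_path choose_path(uintptr_t src, uintptr_t dst, size_t len,
                           size_t block_size, int dst_in_chip)
{
    if (len < block_size)
        return PATH_MOVEL;          /* below the installed Block Size */
    if (((src ^ dst) & 15u) == 0)
        return PATH_MOVE16;         /* same offset within a 16-byte line:
                                       copy a short head, then MOVE16 */
    if (dst_in_chip)
        return PATH_MOVEL;          /* smart buffer restricted for Chip RAM */
    return PATH_SMART_BUFFER;       /* relative misalignment: stage via buffer */
}
```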

Block Size "tuning" options are application specific. If you want the
fastest copy results for Fast RAM, use the Block Size = Data cache
size option. If you want better multitasking performance, use the
Block Size = 1/2 Data cache size option. If a particular software
application targets non-cacheable memory (e.g. Chip RAM or Graphics
Board RAM), the Block Size = Smallest option may be faster for that
particular case.
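The tuning advice maps onto concrete sizes once the data cache size is known: the 68040 has a 4 KB data cache and the 68060 has 8 KB. The C sketch below turns the three options into numbers. pick_block_size and the enum are hypothetical names, and the notes suggest the choice is really made by which Block Size version you install rather than at run time.

```c
#include <stddef.h>

enum tuning { TUNE_FASTEST_FAST_RAM, TUNE_MULTITASKING, TUNE_NONCACHEABLE };

/* Hypothetical mapping of the Notes' advice to a Block Size in bytes.
   Data cache sizes: 4 KB on the 68040, 8 KB on the 68060. */
size_t pick_block_size(int cpu, enum tuning goal)
{
    size_t dcache = (cpu == 68060) ? 8192 : 4096;

    switch (goal) {
    case TUNE_FASTEST_FAST_RAM: return dcache;      /* = Data cache size */
    case TUNE_MULTITASKING:     return dcache / 2;  /* = 1/2 Data cache size */
    case TUNE_NONCACHEABLE:     return 1024;        /* = Smallest option */
    }
    return 4096;  /* default Block Size since v2.1 */
}
```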

HISTORY:
v1.7 first release
v1.8 minor change
- removed obsolete Copymemquick source address compare code
v1.9 New smart buffer copy code provides a BIG SPEED UP
since the MOVE16 alignment restrictions are well handled!
v2.0 Fixed a seldom occurring but serious bug with internal Smart
buffer usage.
- Nested call large block copies (WHEN MISALIGNED!) could corrupt
each other's data when sharing the same buffer. This fix uses a stack
based buffer solution which results in a private buffer for each call.
v2.1 Many changes
- Fixed a rarely occurring stack size bug when the stack was word
aligned and offset by one word from a 16 byte aligned address.
- Added code to test for the Move16 address bug and safely exit upon
detection
- Added code to restrict Smart buffer copy usage when the
destination address is in Chip RAM.
- Added code to change the default Block size
v2.2 minor change
- Removed "Move16 Bug" detection code. This was a blunder due to
Ax = Ay meaning the same registers rather than the same addresses.
v2.3 minor change
- Changed address register longword math to word math for the
Smart buffer copy loop. This is a small optimization, but we always
want the fastest possible results.
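The v2.0 fix (a stack-based buffer instead of one shared buffer) can be illustrated in C. Each call declares its own aligned buffer in its stack frame, so a nested call cannot overwrite data that an outer call has staged but not yet written out. This is a hypothetical sketch: staged_copy and SMART_BUF_SIZE are made up, memcpy stands in for the MOVE.L and MOVE16 loops, and the 64-byte size is arbitrary rather than derived from the 84 bytes of stack usage mentioned in the notes.

```c
#include <stddef.h>
#include <string.h>

#define SMART_BUF_SIZE 64   /* hypothetical size, a multiple of 16 */

/* Stages a copy through a 16-byte aligned buffer in this call's own stack
   frame. A nested or concurrent call gets its own buffer, unlike a single
   static buffer shared by every caller. */
void staged_copy(const unsigned char *src, unsigned char *dst, size_t len)
{
    _Alignas(16) unsigned char buf[SMART_BUF_SIZE];

    while (len > 0) {
        size_t n = len < SMART_BUF_SIZE ? len : SMART_BUF_SIZE;
        memcpy(buf, src, n);   /* stage 1: stand-in for MOVE.L reads */
        memcpy(dst, buf, n);   /* stage 2: stand-in for aligned MOVE16 writes */
        src += n;
        dst += n;
        len -= n;
    }
}
```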

*************************************************************
CopyMem Quick & Big040 SAFER v2.3
Parts of patch install code by Dirk Busse 1999
Enhanced patch code by SpeedGeek 2024

INTRODUCTION:
CMQ&B040_SAFER is a special version of CMQ&B040 which is
intended to be somewhat safer than the standard version. However, it
should never be considered 100% safe. More specifically, it should
provide the ability to crash without a loss of data as described in several
of the Motorola Move16 errata cases.

This version has some extra code to test if the source and
destination addresses are equal. This is a user program bug, but
it's still safer to avoid using Move16 in this particular case.
There is also code to test these specified destination addresses:

- $E00000 EXT ROM space (512 KB)
- $F80000 STD ROM space (512 KB)

The EXT ROM space is marked MMU invalid for 512 KB Kickstart ROM
systems by most 68040 and 68060 libraries. While this should not
be the case for 1 MB ROM systems, this address space may be MMU
write protected by ROM remapping tools. The STD ROM address
space may also be MMU write protected by ROM remapping tools.
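The SAFER checks described above amount to a small gate in front of the MOVE16 path. The C sketch below assumes that a copy is refused if its destination range touches either ROM window at all; whether the real patch tests only the start address or the whole range is not stated, and move16_allowed and overlaps are hypothetical names. The source address is deliberately not tested, following the author's reasoning about MMU invalid source space.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXT_ROM_BASE 0x00E00000u   /* EXT ROM space */
#define STD_ROM_BASE 0x00F80000u   /* STD ROM space */
#define ROM_SPAN     0x00080000u   /* 512 KB each */

/* True if [start, start+len) intersects [base, base+span). 64-bit math
   avoids wrap-around near the top of the 32-bit address space. */
static bool overlaps(uint32_t start, uint32_t len, uint32_t base, uint32_t span)
{
    uint64_t s = start, e = (uint64_t)start + len;
    return s < (uint64_t)base + span && (uint64_t)base < e;
}

/* Hypothetical SAFER-style gate: may MOVE16 be used for this copy? */
bool move16_allowed(uint32_t src, uint32_t dst, uint32_t len)
{
    if (src == dst)
        return false;   /* a caller bug, but avoid MOVE16 anyway */
    if (overlaps(dst, len, EXT_ROM_BASE, ROM_SPAN) ||
        overlaps(dst, len, STD_ROM_BASE, ROM_SPAN))
        return false;   /* may be MMU invalid or write protected */
    return true;        /* source address intentionally not checked */
}
```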

If I understand the Motorola documentation correctly, there should
be no need to test the source address for Move16 since the MMU
invalid address space doesn't have any valid data to become cached
and invalidated.

UNSAFE USAGE:
This version does NOT attempt to be safe with any possible
reported hardware bugs such as:

- The early mask set 68040 (e.g. "XC" variant CPUs)
- Broken or defective 68040/060 accelerators and turbo boards

This version is neither safe nor recommended for use with the
mmu.library (AKA MMUlib by ThoR).

NOTES:
Small block copy performance is not affected by the extra
Move16 safety code, but large block copy performance will of
course be reduced. Testit results will not be provided for this
special version.

HISTORY:

v2.3 First Safer version
- Added code to test for equal source and destination addresses
and avoid using Move16 for this specific case.
- Added code to test specified destination addresses and avoid
using Move16 for those cases.

*************************************************************

SpeedGeek 25 January 2015 18:24

Some Testit results for CMQ&B 1.7:

Code:

This test will compare the old CopyMem/CopyMemQuick routines with
the new ones you have installed.  A great variety of tests will be
run, and this might take some time, especially if your system has a
slow processor.

Initiating test (please be patient...)

Copying 65536 bytes 282 times (long -> long offset)
Old CopyMem        :  1.23 secs
New CopyMem        :  1.30 secs (+ 4.9%)
Old CopyMemQuick:  1.24 secs
New CopyMemQuick:  1.25 secs (+ 0.0%)

Copying 65536 bytes 73 times (long -> long+1 offset)
Old CopyMem        :  0.33 secs
New CopyMem        :  0.71 secs (+115.1%)

Copying 65536 bytes 206 times (long -> even offset)
Old CopyMem        :  0.96 secs
New CopyMem        :  1.31 secs (+36.4%)

Copying 65536 bytes 73 times (long -> even+1 offset)
Old CopyMem        :  0.34 secs
New CopyMem        :  0.70 secs (+102.9%)

Copying 65536 bytes 73 times (long+1 -> long offset)
Old CopyMem        :  0.36 secs
New CopyMem        :  0.71 secs (+97.2%)

Copying 65536 bytes 191 times (long+1 -> long+1 offset)
Old CopyMem        :  0.83 secs
New CopyMem        :  0.88 secs (+ 6.0%)

Copying 65536 bytes 73 times (long+1 -> even offset)
Old CopyMem        :  0.33 secs
New CopyMem        :  0.71 secs (+115.1%)

Copying 65536 bytes 250 times (long+1 -> even+1 offset)
Old CopyMem        :  1.18 secs
New CopyMem        :  1.26 secs (+ 6.8%)

Copying 65536 bytes 250 times (even -> long offset)
Old CopyMem        :  1.30 secs
New CopyMem        :  1.28 secs (- 0.8%)

Copying 65536 bytes 73 times (even -> long+1 offset)
Old CopyMem        :  0.38 secs
New CopyMem        :  0.71 secs (+86.8%)

Copying 65536 bytes 191 times (even -> even offset)
Old CopyMem        :  0.83 secs
New CopyMem        :  0.89 secs (+ 7.2%)

Copying 65536 bytes 73 times (even -> even+1 offset)
Old CopyMem        :  0.33 secs
New CopyMem        :  0.71 secs (+115.1%)

Copying 65536 bytes 73 times (even+1 -> long offset)
Old CopyMem        :  0.38 secs
New CopyMem        :  0.70 secs (+81.6%)

Copying 65536 bytes 206 times (even+1 -> long+1 offset)
Old CopyMem        :  1.06 secs
New CopyMem        :  1.29 secs (+21.7%)

Copying 65536 bytes 73 times (even+1 -> even offset)
Old CopyMem        :  0.38 secs
New CopyMem        :  0.71 secs (+86.8%)

Copying 65536 bytes 282 times (even+1 -> even+1 offset)
Old CopyMem        :  1.23 secs
New CopyMem        :  1.30 secs (+ 4.9%)

Copying 1024 bytes 16950 times (long -> long offset)
Old CopyMem        :  1.20 secs
New CopyMem        :  1.29 secs (+ 7.5%)
Old CopyMemQuick:  1.15 secs
New CopyMemQuick:  1.28 secs (+11.3%)

Copying 1024 bytes 4700 times (long -> long+1 offset)
Old CopyMem        :  0.36 secs
New CopyMem        :  0.75 secs (+105.5%)

Copying 1024 bytes 12000 times (even -> even offset)
Old CopyMem        :  0.86 secs
New CopyMem        :  0.93 secs (+ 7.0%)

Copying 128 bytes 98000 times (long -> long offset)
Old CopyMem        :  0.98 secs
New CopyMem        :  1.01 secs (+ 3.1%)
Old CopyMemQuick:  0.78 secs
New CopyMemQuick:  0.91 secs (+16.7%)

Copying 128 bytes 77500 times (even -> even offset)
Old CopyMem        :  0.80 secs
New CopyMem        :  0.90 secs (+11.2%)

Copying 19 bytes 294000 times (long -> long offset)
Old CopyMem        :  0.40 secs
New CopyMem        :  0.86 secs (+115.0%)

Copying 18 bytes 311000 times (long -> long offset)
Old CopyMem        :  0.41 secs
New CopyMem        :  0.71 secs (+73.2%)

Copying 17 bytes 331500 times (long -> long offset)
Old CopyMem        :  0.43 secs
New CopyMem        :  0.81 secs (+88.4%)

Copying 16 bytes 478000 times (long -> long offset)
Old CopyMem        :  0.56 secs
New CopyMem        :  1.03 secs (+82.1%)
Old CopyMemQuick:  0.35 secs
New CopyMemQuick:  0.53 secs (+51.4%)

Copying 8 bytes 530000 times (long -> long offset)
Old CopyMem        :  0.43 secs
New CopyMem        :  0.90 secs (+107.0%)
Old CopyMemQuick:  0.20 secs
New CopyMemQuick:  0.35 secs (+75.0%)

Copying 4 bytes 715000 times (long -> long offset)
Old CopyMem        :  0.43 secs
New CopyMem        :  0.58 secs (+34.9%)
Old CopyMemQuick:  0.11 secs
New CopyMemQuick:  0.30 secs (+163.6%)

Copying 1 bytes 1095000 times (long -> long offset)
Old CopyMem        :  0.61 secs
New CopyMem        :  0.13 secs (-78.7%)

Total timing:
-------------
Old routines        :  22.88 secs
New routines        :  29.83 secs
Total slowdown        :  30.37 %

Some Testit results for CMQ&B040 2.1:
Code:

This test will compare the old CopyMem/CopyMemQuick routines with
the new ones you have installed.  A great variety of tests will be
run, and this might take some time, especially if your system has a
slow processor.

Initiating test (please be patient...)

Copying 65536 bytes 282 times (long -> long offset)
Old CopyMem        :  1.23 secs
New CopyMem        :  1.30 secs (+ 4.9%)
Old CopyMemQuick:  1.24 secs
New CopyMemQuick:  1.25 secs (+ 0.0%)

Copying 65536 bytes 73 times (long -> long+1 offset)
Old CopyMem        :  0.33 secs
New CopyMem        :  0.71 secs (+115.1%)

Copying 65536 bytes 206 times (long -> even offset)
Old CopyMem        :  0.96 secs
New CopyMem        :  1.31 secs (+36.4%)

Copying 65536 bytes 73 times (long -> even+1 offset)
Old CopyMem        :  0.34 secs
New CopyMem        :  0.70 secs (+102.9%)

Copying 65536 bytes 73 times (long+1 -> long offset)
Old CopyMem        :  0.36 secs
New CopyMem        :  0.71 secs (+97.2%)

Copying 65536 bytes 191 times (long+1 -> long+1 offset)
Old CopyMem        :  0.83 secs
New CopyMem        :  0.88 secs (+ 6.0%)

Copying 65536 bytes 73 times (long+1 -> even offset)
Old CopyMem        :  0.33 secs
New CopyMem        :  0.71 secs (+115.1%)

Copying 65536 bytes 250 times (long+1 -> even+1 offset)
Old CopyMem        :  1.18 secs
New CopyMem        :  1.26 secs (+ 6.8%)

Copying 65536 bytes 250 times (even -> long offset)
Old CopyMem        :  1.30 secs
New CopyMem        :  1.28 secs (- 0.8%)

Copying 65536 bytes 73 times (even -> long+1 offset)
Old CopyMem        :  0.38 secs
New CopyMem        :  0.71 secs (+86.8%)

Copying 65536 bytes 191 times (even -> even offset)
Old CopyMem        :  0.83 secs
New CopyMem        :  0.89 secs (+ 7.2%)

Copying 65536 bytes 73 times (even -> even+1 offset)
Old CopyMem        :  0.33 secs
New CopyMem        :  0.71 secs (+115.1%)

Copying 65536 bytes 73 times (even+1 -> long offset)
Old CopyMem        :  0.38 secs
New CopyMem        :  0.70 secs (+81.6%)

Copying 65536 bytes 206 times (even+1 -> long+1 offset)
Old CopyMem        :  1.06 secs
New CopyMem        :  1.29 secs (+21.7%)

Copying 65536 bytes 73 times (even+1 -> even offset)
Old CopyMem        :  0.38 secs
New CopyMem        :  0.71 secs (+86.8%)

Copying 65536 bytes 282 times (even+1 -> even+1 offset)
Old CopyMem        :  1.23 secs
New CopyMem        :  1.30 secs (+ 4.9%)

Copying 1024 bytes 16950 times (long -> long offset)
Old CopyMem        :  1.20 secs
New CopyMem        :  1.29 secs (+ 7.5%)
Old CopyMemQuick:  1.15 secs
New CopyMemQuick:  1.28 secs (+11.3%)

Copying 1024 bytes 4700 times (long -> long+1 offset)
Old CopyMem        :  0.36 secs
New CopyMem        :  0.75 secs (+105.5%)

Copying 1024 bytes 12000 times (even -> even offset)
Old CopyMem        :  0.86 secs
New CopyMem        :  0.93 secs (+ 7.0%)

Copying 128 bytes 98000 times (long -> long offset)
Old CopyMem        :  0.98 secs
New CopyMem        :  1.01 secs (+ 3.1%)
Old CopyMemQuick:  0.78 secs
New CopyMemQuick:  0.91 secs (+16.7%)

Copying 128 bytes 77500 times (even -> even offset)
Old CopyMem        :  0.80 secs
New CopyMem        :  0.90 secs (+11.2%)

Copying 19 bytes 294000 times (long -> long offset)
Old CopyMem        :  0.40 secs
New CopyMem        :  0.86 secs (+115.0%)

Copying 18 bytes 311000 times (long -> long offset)
Old CopyMem        :  0.41 secs
New CopyMem        :  0.71 secs (+73.2%)

Copying 17 bytes 331500 times (long -> long offset)
Old CopyMem        :  0.43 secs
New CopyMem        :  0.81 secs (+88.4%)

Copying 16 bytes 478000 times (long -> long offset)
Old CopyMem        :  0.56 secs
New CopyMem        :  1.03 secs (+82.1%)
Old CopyMemQuick:  0.35 secs
New CopyMemQuick:  0.53 secs (+51.4%)

Copying 8 bytes 530000 times (long -> long offset)
Old CopyMem        :  0.43 secs
New CopyMem        :  0.90 secs (+107.0%)
Old CopyMemQuick:  0.20 secs
New CopyMemQuick:  0.35 secs (+75.0%)

Copying 4 bytes 715000 times (long -> long offset)
Old CopyMem        :  0.43 secs
New CopyMem        :  0.58 secs (+34.9%)
Old CopyMemQuick:  0.11 secs
New CopyMemQuick:  0.30 secs (+163.6%)

Copying 1 bytes 1095000 times (long -> long offset)
Old CopyMem        :  0.61 secs
New CopyMem        :  0.13 secs (-78.7%)

Total timing:
-------------
Old routines        :  48.48 secs
New routines        :  77.88 secs
Total slowdown        :  60.64 %


HanSolo 26 January 2015 15:39

Thanks for the good patch. What is your next project?

SpeedGeek 27 January 2015 12:32

** NEWS UPDATE **

CMQ&B040 v1.8 released

v1.8 minor change
- removed obsolete Copymemquick source address compare code

@HanSolo
When there's nothing more to do on this project maybe some scsi.device stuff...

nogginthenog 14 October 2016 21:54

Hey SpeedGeek,

Where can I find version 1.8? Doesn't seem to be on Aminet.

arti 15 October 2016 12:16

Shouldn't New CopyMem have shorter times?

SpeedGeek 15 October 2016 16:27

Quote:

Originally Posted by nogginthenog (Post 1116598)
Hey SpeedGeek,

Where can I find version 1.8? Doesn't seem to be on Aminet.

That's because it's here on EAB (in post #1). ;)

Quote:

Originally Posted by arti (Post 1116709)
Shouldn't New CopyMem have shorter times?

Testit always compares the old Copymem against Copymemquicker 2.8. So old Copymem = CMQ&B and new Copymem = Copymemquicker 2.8. :rolleyes
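Given this explanation, the Testit percentages are easier to read with the formula spelled out. The printed totals are consistent with (new - old) / old * 100; for example, the CMQ&B040 totals give (77.88 - 48.48) / 48.48 * 100 = 60.64. The helper name testit_percent is hypothetical and is not part of Testit.

```c
/* Testit-style relative change in percent. With CMQ&B installed, "old"
   is CMQ&B and "new" is CopyMemQuicker 2.8, so a positive value means
   CopyMemQuicker took longer, i.e. the installed patch was faster. */
double testit_percent(double old_secs, double new_secs)
{
    return (new_secs - old_secs) / old_secs * 100.0;
}
```

Recomputing a single line from the rounded times shown can differ slightly from the printed percentage (1.23 and 1.30 secs give about 5.7%, while Testit prints +4.9%), presumably because Testit works from unrounded timings.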

nogginthenog 15 October 2016 17:01

Quote:

Originally Posted by SpeedGeek (Post 1116760)
That's because it's here on EAB (in post #1). ;)

Ah, I see my problem. I opened CMQ&B.LHA which says version 1.6 in the readme.

Thanks, I'll give it a try. :great

trixster 16 October 2016 10:47

How does CMQ&B compare in speed to matthey's CM060?

http://aminet.net/package/util/boot/CopyMem

SpeedGeek 04 July 2020 13:58

** 2ND NEWS UPDATE **

CMQ&B040 1.9 released!

v1.9 New smart buffer copy code provides a BIG SPEED UP
since the MOVE16 alignment restrictions are well handled!
(See new Testit results).

SpeedGeek 11 July 2020 03:10

** 3RD NEWS UPDATE **

CMQ&B040 2.0 released!

v2.0 Fixed a seldom occurring but serious bug with internal Smart
buffer usage.
- Nested call large block copies (WHEN MISALIGNED!) could corrupt
each other's data when sharing the same buffer. This fix uses a stack
based buffer solution which results in a private buffer for each call.

rabidgerry 15 December 2020 21:38

Quote:

Originally Posted by SpeedGeek (Post 1413161)
** 3RD NEWS UPDATE **

CMQ&B040 2.0 released!

v2.0 Fixed a seldom occurring but serious bug with internal Smart
buffer usage.
- Nested call large block copies (WHEN MISALIGNED!) could corrupt
each other's data when sharing the same buffer. This fix uses a stack
based buffer solution which results in a private buffer for each call.

Tried using this but maybe I don't have it installed right. How will I know if CopyMem is working on my machine or not?

I have CopyMem060 on mine.

SpeedGeek 15 December 2020 22:42

Quote:

Originally Posted by rabidgerry (Post 1446512)
Tried using this but may be I don't have it installed right. How will I know if CopyMem is working on my machine or not?

I have CopyMem060 on mine.

If the patch fails to install, the return code is 20. So you could make an IF FAIL script. You could also just download Testit from COPMQR28 (Aminet) and determine your own results. ;)

BTW, the so-called "060 Optimized" CMQ patches really don't offer much of a performance difference from the 040 CMQ patches.

rabidgerry 15 December 2020 23:52

Well, I'm not getting a fail code. It just boots up and then that's it. Also the version I got was from 09 and off Aminet, so I defo have an old version I think. No matter, I just followed the guide that said stick it in wherever and then invoke it in your startup after SetPatch somewhere, and make sure you type the command Run beforehand. But I dunno what it's doing or what performance enhancement I'm getting.

You have to pardon my ignorance BTW

Well actually you don't but please do :laughing

****edit**** ok so I've just realised your patch is a different thing entirely. Perhaps I should scrap the copymem then and install yours! I downloaded to try!

SpeedGeek 13 February 2021 15:16

** 4TH NEWS UPDATE **

CMQ&B 1.7 released!

v1.7 Updated Big loop code with faster instructions. Increased
Big loop copy size to 112 bytes. Replaced Small loop copy code
with new JMP copy code for <= 108 bytes (See new testit results for 1.7).

SpeedGeek 26 April 2021 14:10

** 5TH NEWS UPDATE **

CMQ&B040 2.1 released!

v2.1 Many changes
- Fixed a rarely occurring stack size bug when the stack was word
aligned and offset by one word from a 16 byte aligned address.
- Added code to test for the Move16 address bug and safely exit upon
detection
- Added code to restrict Smart buffer copy usage when the
destination address is in Chip RAM.
- Added code to change the default Block size

SpeedGeek 12 May 2021 13:54

** 6TH NEWS UPDATE **

CMQ&B040 2.2 released!

v2.2 minor change
- Removed "Move16 Bug" detection code. This was a blunder due to
Ax = Ay meaning the same registers rather than the same addresses.

SpeedGeek 07 June 2021 21:26

** 7TH NEWS UPDATE **

CMQ&B040 2.3 released!

v2.3 minor change
- Changed address register longword math to word math for the
Smart buffer copy loop. This is a small optimization but we always
want the fastest possible results.

koobo 29 November 2023 10:32

Did/does anyone ever notice a real improvement from these CopyMem-improvement patches? Or maybe measure how many calls and what kind of parameters would be generated when using the OS for some ordinary tasks?

There were a lot of these patches, I also did one back in the day and was happy with myself. Whether it made any difference, that's another matter.

derSammler 29 November 2023 10:37

As it says in the description:
Quote:

The main goal is to give the fastest possible results with Testit from COPMQR28.
It's obviously for people who like to brag about benchmark results. (no negative notion intended)

I doubt you will find much software, if any, that will be much faster with that patch compared to other similar patches.


All times are GMT +2.