25 January 2015, 18:19 | #1 |
Moderator
Join Date: Dec 2010
Location: Wisconsin USA
Age: 60
Posts: 846
|
Copymem Quick & Big Released!
CopyMem Quick & Big v1.7
Parts of patch install code by Dirk Busse 1999
Enhanced patch code by SpeedGeek 2021

INTRODUCTION:
CMQ&B is a big and faster Copymem + Copymemquick patch. The main goal is to give the fastest possible results with Testit from COPMQR28. In order to obtain these fast results CMQ&B must have the redundant and bloated code needed to handle many "Worst Case" copies.

FEATURES:
- Installs one of the fastest CMQ patches for 68020+ Amigas
- New JMP copy code speeds up small copies
- Safely exits if the patch is already installed (i.e. a good patch program should really avoid patching itself)

REQUIREMENTS:
- Amiga with 68020+

NOTES:
CMQ&B is an extension of CMQ&S. It has some extra code to handle many small and misaligned copies. There are trade-offs in supporting these "Worst Case" copies. Specifically, the Best Case performance has been reduced and the size of the patch has increased to 320 bytes.

HISTORY:
v1.6 first release
v1.7 Updated Big loop code with faster instructions. Increased Big loop copy size to 112 bytes. Replaced Small loop copy code with new JMP copy code for <= 108 bytes.

******************************************************

CopyMem Quick & Big040 v2.3
Parts of patch install code by Dirk Busse 1999
Enhanced patch code by SpeedGeek 2021

INTRODUCTION:
CMQ&B040 is a big and faster Copymem + Copymemquick patch. The main goal is to give the fastest possible results with Testit from COPMQR28. In order to obtain these fast results CMQ&B040 must have the redundant and bloated code needed to handle many "Worst Case" copies.

FEATURES:
- Automatically installs one of the fastest CMQ patches for 040+
- The Move16 address is restricted only for performance reasons (See Notes)
- New smart buffer copy code handles MOVE16 alignment restrictions
- User selected 1024-8192 byte Block Size options allow "Tuning" the MoveL vs. Move16 performance of your system. Since v2.1 the default Block size is 4096
- Safely exits if the patch is already installed (i.e. a good patch program should really avoid patching itself)

REQUIREMENTS:
- Amiga with 68040+
- Move16 is only enabled for the (minimum) Block Size version you installed (larger sizes always qualify).

NOTES:
CMQ&B040 is an extension of CMQ&S. It has some extra code to handle many small and misaligned copies. There are trade-offs in supporting these "Worst Case" copies. Specifically, the Best Case performance has been reduced and the size of the patch has increased to 540 bytes. Since v2.1 stack usage is now 84 bytes per misaligned large block copy.

Move16 does not cause a burst access problem with Chip RAM since it simply is not possible to access Chip RAM in this way. Burst operation is controlled in Hardware (See Transfer Burst Inhibit operation in the 040 manual). The Smart buffer copy loop is address restricted (for performance reasons only) when the destination address is in Chip RAM.

Block size "Tuning" options are application specific. If you want the fastest copy results for Fast RAM use the Block size = Data cache size option. If you want better multitasking performance use the Block size = 1/2 Data cache size option. If a particular Software application targets non-cacheable memory (e.g. Chip RAM or Graphics Board RAM) the Block size = Smallest option may be faster for that particular case.

HISTORY:
v1.7 first release
v1.8 minor change
- removed obsolete Copymemquick source address compare code
v1.9 New smart buffer copy code provides a BIG SPEED UP since the MOVE16 alignment restrictions are well handled!
v2.0 Fixed a seldom occurring but serious bug with internal Smart buffer usage.
- Nested call large block copies (WHEN MISALIGNED!) could corrupt each other's data when sharing the same buffer. This fix uses a stack based buffer solution which results in a private buffer for each call.
v2.1 Many changes
- Fixed a rarely occurring stack size bug when the stack was word aligned and offset by one word from a 16 byte aligned address.
- Added code to test for the Move16 address bug and safely exit upon detection
- Added code to restrict Smart buffer copy usage when the destination address is in Chip RAM.
- Added code to change the default Block size
v2.2 minor change
- Removed "Move16 Bug" detection code. This was a blunder due to Ax = Ay meaning the same registers rather than the same addresses.
v2.3 minor change
- Changed address register longword math to word math for the Smart buffer copy loop. This is a small optimization but we always want the fastest possible results

*************************************************************

CopyMem Quick & Big040 SAFER v2.3
Parts of patch install code by Dirk Busse 1999
Enhanced patch code by SpeedGeek 2024

INTRODUCTION:
CMQ&B040_SAFER is a special version of CMQ&B040 which is intended to be somewhat safer than the standard version. However, it should not ever be considered 100% safe. More specifically, it should provide the ability to crash without a loss of data as described in several of the Motorola Move16 errata cases.

This version has some extra code to test if the source and destination addresses are equal. This is a user program bug, but it's still safer to avoid using Move16 in this particular case. There is also code to test these specified destination addresses:
- $E00000 EXT ROM space (512 KB)
- $F80000 STD ROM space (512 KB)

The EXT ROM space is marked MMU invalid for 512 KB Kickstart ROM systems by most 68040 and 68060 libraries. While this should not be the case for 1 MB ROM systems this address space may be MMU write protected by ROM remapping tools. The STD ROM address space may also be MMU write protected by ROM remapping tools.

If I understand the Motorola documentation correctly, there should be no need to test the source address for Move16 since the MMU invalid address space doesn't have any valid data to become cached and invalidated.

UNSAFE USAGE:
This version does NOT attempt to be safe with any possible reported hardware bugs such as:
- The early mask set 68040 (e.g. "XC" variant CPUs)
- Broken or defective 68040/060 accelerators and turbo boards

This version is neither safe nor recommended for use with the mmu.library (AKA MMUlib by ThoR).

NOTES:
Small block copy performance is not affected by the extra Move16 safety code. But of course, large block copy performance will be reduced. Testit results will not be provided with this special version.

HISTORY:
v2.3 First Safer version
- Added code to test for equal source and destination addresses and avoid using Move16 for this specific case.
- Added code to test specified destination addresses and avoid using Move16 for those cases.

*************************************************************

Last edited by SpeedGeek; 07 January 2024 at 17:57. |
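The readme describes when Move16 is used in words only. The rules it states (copy length at least the installed Block size, plus the extra SAFER checks on equal addresses and the two ROM windows) can be sketched roughly as below. This is only an illustration in Python, not the patch's 68k code: `may_use_move16`, `in_rom_window` and `ROM_WINDOWS` are made-up names, and the real patch also deals with alignment and the Chip RAM restriction, which are left out here.

```python
# Illustrative sketch only; names are hypothetical and not taken from the patch source.

# Destination windows the SAFER readme says are tested (base address, size in bytes).
ROM_WINDOWS = (
    (0xE00000, 512 * 1024),  # EXT ROM space
    (0xF80000, 512 * 1024),  # STD ROM space
)


def in_rom_window(addr):
    """True if addr falls inside one of the two 512 KB ROM windows."""
    return any(base <= addr < base + size for base, size in ROM_WINDOWS)


def may_use_move16(src, dst, length, block_size=4096, safer=False):
    """Decide whether a copy would qualify for the Move16 path.

    Assumption: a copy qualifies when it is at least as large as the
    installed (minimum) Block size; 4096 is the default since v2.1.
    """
    if length < block_size:
        return False  # smaller copies stay on the MoveL path
    if safer:
        if src == dst:
            return False  # user program bug, SAFER avoids Move16 here
        if in_rom_window(dst):
            return False  # destination may be MMU invalid or write protected
    return True
```

With these assumptions a 4096 byte copy between two Fast RAM addresses qualifies, while the same copy aimed at $F80000 does not once `safer=True` is set.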
25 January 2015, 18:24 | #2 |
Moderator
Join Date: Dec 2010
Location: Wisconsin USA
Age: 60
Posts: 846
|
Some Testit results for CMQ&B 1.7:
Code:
This test will compare the old CopyMem/CopyMemQuick routines with the new ones you have installed. A great variety of tests will be run, and this might take some time, especially if your system has a slow processor.

Initiating test (please be patient...)

Copying 65536 bytes 282 times (long -> long offset)
Old CopyMem     : 1.23 secs
New CopyMem     : 1.30 secs (+ 4.9%)
Old CopyMemQuick: 1.24 secs
New CopyMemQuick: 1.25 secs (+ 0.0%)

Copying 65536 bytes 73 times (long -> long+1 offset)
Old CopyMem     : 0.33 secs
New CopyMem     : 0.71 secs (+115.1%)

Copying 65536 bytes 206 times (long -> even offset)
Old CopyMem     : 0.96 secs
New CopyMem     : 1.31 secs (+36.4%)

Copying 65536 bytes 73 times (long -> even+1 offset)
Old CopyMem     : 0.34 secs
New CopyMem     : 0.70 secs (+102.9%)

Copying 65536 bytes 73 times (long+1 -> long offset)
Old CopyMem     : 0.36 secs
New CopyMem     : 0.71 secs (+97.2%)

Copying 65536 bytes 191 times (long+1 -> long+1 offset)
Old CopyMem     : 0.83 secs
New CopyMem     : 0.88 secs (+ 6.0%)

Copying 65536 bytes 73 times (long+1 -> even offset)
Old CopyMem     : 0.33 secs
New CopyMem     : 0.71 secs (+115.1%)

Copying 65536 bytes 250 times (long+1 -> even+1 offset)
Old CopyMem     : 1.18 secs
New CopyMem     : 1.26 secs (+ 6.8%)

Copying 65536 bytes 250 times (even -> long offset)
Old CopyMem     : 1.30 secs
New CopyMem     : 1.28 secs (- 0.8%)

Copying 65536 bytes 73 times (even -> long+1 offset)
Old CopyMem     : 0.38 secs
New CopyMem     : 0.71 secs (+86.8%)

Copying 65536 bytes 191 times (even -> even offset)
Old CopyMem     : 0.83 secs
New CopyMem     : 0.89 secs (+ 7.2%)

Copying 65536 bytes 73 times (even -> even+1 offset)
Old CopyMem     : 0.33 secs
New CopyMem     : 0.71 secs (+115.1%)

Copying 65536 bytes 73 times (even+1 -> long offset)
Old CopyMem     : 0.38 secs
New CopyMem     : 0.70 secs (+81.6%)

Copying 65536 bytes 206 times (even+1 -> long+1 offset)
Old CopyMem     : 1.06 secs
New CopyMem     : 1.29 secs (+21.7%)

Copying 65536 bytes 73 times (even+1 -> even offset)
Old CopyMem     : 0.38 secs
New CopyMem     : 0.71 secs (+86.8%)

Copying 65536 bytes 282 times (even+1 -> even+1 offset)
Old CopyMem     : 1.23 secs
New CopyMem     : 1.30 secs (+ 4.9%)

Copying 1024 bytes 16950 times (long -> long offset)
Old CopyMem     : 1.20 secs
New CopyMem     : 1.29 secs (+ 7.5%)
Old CopyMemQuick: 1.15 secs
New CopyMemQuick: 1.28 secs (+11.3%)

Copying 1024 bytes 4700 times (long -> long+1 offset)
Old CopyMem     : 0.36 secs
New CopyMem     : 0.75 secs (+105.5%)

Copying 1024 bytes 12000 times (even -> even offset)
Old CopyMem     : 0.86 secs
New CopyMem     : 0.93 secs (+ 7.0%)

Copying 128 bytes 98000 times (long -> long offset)
Old CopyMem     : 0.98 secs
New CopyMem     : 1.01 secs (+ 3.1%)
Old CopyMemQuick: 0.78 secs
New CopyMemQuick: 0.91 secs (+16.7%)

Copying 128 bytes 77500 times (even -> even offset)
Old CopyMem     : 0.80 secs
New CopyMem     : 0.90 secs (+11.2%)

Copying 19 bytes 294000 times (long -> long offset)
Old CopyMem     : 0.40 secs
New CopyMem     : 0.86 secs (+115.0%)

Copying 18 bytes 311000 times (long -> long offset)
Old CopyMem     : 0.41 secs
New CopyMem     : 0.71 secs (+73.2%)

Copying 17 bytes 331500 times (long -> long offset)
Old CopyMem     : 0.43 secs
New CopyMem     : 0.81 secs (+88.4%)

Copying 16 bytes 478000 times (long -> long offset)
Old CopyMem     : 0.56 secs
New CopyMem     : 1.03 secs (+82.1%)
Old CopyMemQuick: 0.35 secs
New CopyMemQuick: 0.53 secs (+51.4%)

Copying 8 bytes 530000 times (long -> long offset)
Old CopyMem     : 0.43 secs
New CopyMem     : 0.90 secs (+107.0%)
Old CopyMemQuick: 0.20 secs
New CopyMemQuick: 0.35 secs (+75.0%)

Copying 4 bytes 715000 times (long -> long offset)
Old CopyMem     : 0.43 secs
New CopyMem     : 0.58 secs (+34.9%)
Old CopyMemQuick: 0.11 secs
New CopyMemQuick: 0.30 secs (+163.6%)

Copying 1 bytes 1095000 times (long -> long offset)
Old CopyMem     : 0.61 secs
New CopyMem     : 0.13 secs (-78.7%)

Total timing:
-------------
Old routines   : 22.88 secs
New routines   : 29.83 secs
Total slowdown : 30.37 %

Code:
This test will compare the old CopyMem/CopyMemQuick routines with the new ones you have installed. A great variety of tests will be run, and this might take some time, especially if your system has a slow processor.

Initiating test (please be patient...)

Copying 65536 bytes 282 times (long -> long offset)
Old CopyMem     : 1.23 secs
New CopyMem     : 1.30 secs (+ 4.9%)
Old CopyMemQuick: 1.24 secs
New CopyMemQuick: 1.25 secs (+ 0.0%)

Copying 65536 bytes 73 times (long -> long+1 offset)
Old CopyMem     : 0.33 secs
New CopyMem     : 0.71 secs (+115.1%)

Copying 65536 bytes 206 times (long -> even offset)
Old CopyMem     : 0.96 secs
New CopyMem     : 1.31 secs (+36.4%)

Copying 65536 bytes 73 times (long -> even+1 offset)
Old CopyMem     : 0.34 secs
New CopyMem     : 0.70 secs (+102.9%)

Copying 65536 bytes 73 times (long+1 -> long offset)
Old CopyMem     : 0.36 secs
New CopyMem     : 0.71 secs (+97.2%)

Copying 65536 bytes 191 times (long+1 -> long+1 offset)
Old CopyMem     : 0.83 secs
New CopyMem     : 0.88 secs (+ 6.0%)

Copying 65536 bytes 73 times (long+1 -> even offset)
Old CopyMem     : 0.33 secs
New CopyMem     : 0.71 secs (+115.1%)

Copying 65536 bytes 250 times (long+1 -> even+1 offset)
Old CopyMem     : 1.18 secs
New CopyMem     : 1.26 secs (+ 6.8%)

Copying 65536 bytes 250 times (even -> long offset)
Old CopyMem     : 1.30 secs
New CopyMem     : 1.28 secs (- 0.8%)

Copying 65536 bytes 73 times (even -> long+1 offset)
Old CopyMem     : 0.38 secs
New CopyMem     : 0.71 secs (+86.8%)

Copying 65536 bytes 191 times (even -> even offset)
Old CopyMem     : 0.83 secs
New CopyMem     : 0.89 secs (+ 7.2%)

Copying 65536 bytes 73 times (even -> even+1 offset)
Old CopyMem     : 0.33 secs
New CopyMem     : 0.71 secs (+115.1%)

Copying 65536 bytes 73 times (even+1 -> long offset)
Old CopyMem     : 0.38 secs
New CopyMem     : 0.70 secs (+81.6%)

Copying 65536 bytes 206 times (even+1 -> long+1 offset)
Old CopyMem     : 1.06 secs
New CopyMem     : 1.29 secs (+21.7%)

Copying 65536 bytes 73 times (even+1 -> even offset)
Old CopyMem     : 0.38 secs
New CopyMem     : 0.71 secs (+86.8%)

Copying 65536 bytes 282 times (even+1 -> even+1 offset)
Old CopyMem     : 1.23 secs
New CopyMem     : 1.30 secs (+ 4.9%)

Copying 1024 bytes 16950 times (long -> long offset)
Old CopyMem     : 1.20 secs
New CopyMem     : 1.29 secs (+ 7.5%)
Old CopyMemQuick: 1.15 secs
New CopyMemQuick: 1.28 secs (+11.3%)

Copying 1024 bytes 4700 times (long -> long+1 offset)
Old CopyMem     : 0.36 secs
New CopyMem     : 0.75 secs (+105.5%)

Copying 1024 bytes 12000 times (even -> even offset)
Old CopyMem     : 0.86 secs
New CopyMem     : 0.93 secs (+ 7.0%)

Copying 128 bytes 98000 times (long -> long offset)
Old CopyMem     : 0.98 secs
New CopyMem     : 1.01 secs (+ 3.1%)
Old CopyMemQuick: 0.78 secs
New CopyMemQuick: 0.91 secs (+16.7%)

Copying 128 bytes 77500 times (even -> even offset)
Old CopyMem     : 0.80 secs
New CopyMem     : 0.90 secs (+11.2%)

Copying 19 bytes 294000 times (long -> long offset)
Old CopyMem     : 0.40 secs
New CopyMem     : 0.86 secs (+115.0%)

Copying 18 bytes 311000 times (long -> long offset)
Old CopyMem     : 0.41 secs
New CopyMem     : 0.71 secs (+73.2%)

Copying 17 bytes 331500 times (long -> long offset)
Old CopyMem     : 0.43 secs
New CopyMem     : 0.81 secs (+88.4%)

Copying 16 bytes 478000 times (long -> long offset)
Old CopyMem     : 0.56 secs
New CopyMem     : 1.03 secs (+82.1%)
Old CopyMemQuick: 0.35 secs
New CopyMemQuick: 0.53 secs (+51.4%)

Copying 8 bytes 530000 times (long -> long offset)
Old CopyMem     : 0.43 secs
New CopyMem     : 0.90 secs (+107.0%)
Old CopyMemQuick: 0.20 secs
New CopyMemQuick: 0.35 secs (+75.0%)

Copying 4 bytes 715000 times (long -> long offset)
Old CopyMem     : 0.43 secs
New CopyMem     : 0.58 secs (+34.9%)
Old CopyMemQuick: 0.11 secs
New CopyMemQuick: 0.30 secs (+163.6%)

Copying 1 bytes 1095000 times (long -> long offset)
Old CopyMem     : 0.61 secs
New CopyMem     : 0.13 secs (-78.7%)

Total timing:
-------------
Old routines   : 48.48 secs
New routines   : 77.88 secs
Total slowdown : 60.64 %

Last edited by SpeedGeek; 26 April 2021 at 14:14. |
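For anyone reading the totals: the "Total slowdown" figure looks like the relative difference between the two total times. A quick check under that assumption, in Python (`percent_change` is just an illustrative helper, not something Testit provides):

```python
def percent_change(old_secs, new_secs):
    """Relative change of the 'New' time versus the 'Old' time, in percent.

    Positive means the 'New' routines took longer than the 'Old' ones.
    """
    return (new_secs - old_secs) / old_secs * 100.0


# Totals as printed in the two listings above.
first_total = percent_change(22.88, 29.83)
second_total = percent_change(48.48, 77.88)

print(f"first listing : {first_total:.2f} %")
print(f"second listing: {second_total:.2f} %")
```

Both results land within a few hundredths of the printed 30.37 % and 60.64 %. The per-test percentages do not always match a recomputation from the rounded seconds (1.23 vs 1.30 secs is printed as + 4.9%, for example), presumably because Testit works from finer timings than the two decimals it prints.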
26 January 2015, 15:39 | #3 |
Registered User
Join Date: Aug 2014
Location: Gdynia/Poland
Posts: 162
|
Thanks for the good patch. What is your next project?
|
27 January 2015, 12:32 | #4 |
Moderator
Join Date: Dec 2010
Location: Wisconsin USA
Age: 60
Posts: 846
|
** NEWS UPDATE **
CMQ&B040 v1.8 released
v1.8 minor change - removed obsolete Copymemquick source address compare code
@HanSolo When there's nothing more to do on this project maybe some scsi.device stuff... |
14 October 2016, 21:54 | #5 |
Amigan
Join Date: Feb 2012
Location: London
Posts: 1,317
|
Hey SpeedGeek,
Where can I find version 1.8? Doesn't seem to be on Aminet. |
15 October 2016, 12:16 | #6 |
Registered User
Join Date: Jul 2008
Location: Poland
Posts: 665
|
Shouldn't New CopyMem have shorter times?
|
15 October 2016, 16:27 | #7 | |
Moderator
Join Date: Dec 2010
Location: Wisconsin USA
Age: 60
Posts: 846
|
Quote:
Testit always compares the old Copymem against Copymemquicker 2.8. So old Copymem = CMQ&B and new Copymem = Copymemquicker 2.8. Last edited by SpeedGeek; 15 October 2016 at 17:01. |
|
15 October 2016, 17:01 | #8 |
Amigan
Join Date: Feb 2012
Location: London
Posts: 1,317
|
|
16 October 2016, 10:47 | #9 |
Guru Meditating
Join Date: Jun 2014
Location: England
Posts: 2,356
|
|
04 July 2020, 13:58 | #10 |
Moderator
Join Date: Dec 2010
Location: Wisconsin USA
Age: 60
Posts: 846
|
** 2ND NEWS UPDATE **
CMQ&B040 1.9 released!
v1.9 New smart buffer copy code provides a BIG SPEED UP since the MOVE16 alignment restrictions are well handled! (See new Testit results).
Last edited by SpeedGeek; 06 July 2020 at 14:56. |
11 July 2020, 03:10 | #11 |
Moderator
Join Date: Dec 2010
Location: Wisconsin USA
Age: 60
Posts: 846
|
** 3RD NEWS UPDATE **
CMQ&B040 2.0 released!
v2.0 Fixed a seldom occurring but serious bug with internal Smart buffer usage.
- Nested call large block copies (WHEN MISALIGNED!) could corrupt each other's data when sharing the same buffer. This fix uses a stack based buffer solution which results in a private buffer for each call. |
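The kind of bug described for v2.0 (two copies in flight sharing one staging buffer) can be shown with a toy model. This is a Python illustration of the general problem only, not the patch's code: `copy_shared`, `copy_private` and the `nested` callback are invented for the example, and the callback simply stands in for a second copy call that arrives while the first one still has data staged.

```python
# Toy model of the hazard; all names here are hypothetical.

SHARED_BUF = bytearray(16)  # one staging buffer used by every call


def copy_shared(src, dst, nested=None):
    """Stage the data in the global buffer, then write it out."""
    n = len(src)
    SHARED_BUF[:n] = src
    if nested is not None:
        nested()  # a second copy runs while our data is still staged
    dst[:n] = SHARED_BUF[:n]


def copy_private(src, dst, nested=None):
    """Same idea, but each call gets its own buffer (like a stack frame)."""
    n = len(src)
    buf = bytearray(src)  # private to this call
    if nested is not None:
        nested()
    dst[:n] = buf


def demo(copy):
    """Run an outer copy of 'AAAA' that is interrupted by a copy of 'BBBB'."""
    outer_dst = bytearray(4)
    inner_dst = bytearray(4)
    copy(b"AAAA", outer_dst, nested=lambda: copy(b"BBBB", inner_dst))
    return bytes(outer_dst), bytes(inner_dst)
```

In this model `demo(copy_shared)` leaves the outer destination holding the inner call's data, while `demo(copy_private)` keeps the two copies apart, which is the effect the stack based buffer is described as having.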
15 December 2020, 21:38 | #12 | |
Registered User
Join Date: Nov 2018
Location: Belfast
Posts: 1,542
|
Quote:
I have CopyMem060 on mine. |
|
15 December 2020, 22:42 | #13 | |
Moderator
Join Date: Dec 2010
Location: Wisconsin USA
Age: 60
Posts: 846
|
Quote:
BTW, the so-called "060 Optimized" CMQ patches really don't offer much of a performance difference from the 040 CMQ patches. Last edited by SpeedGeek; 15 December 2020 at 23:20. |
|
15 December 2020, 23:52 | #14 |
Registered User
Join Date: Nov 2018
Location: Belfast
Posts: 1,542
|
Well, I'm not getting a fail code. It just boots up and then that's it. Also, the version I got was from 09 and off Aminet, so I defo have an old version I think. No matter, I just followed the guide that said stick it in wherever and then invoke it in your startup after setpatch somewhere and make sure you type the command Run beforehand. But I dunno what it's doing or what performance enhancement I'm getting.
You have to pardon my ignorance BTW. Well, actually you don't, but please do.
****edit**** OK, so I've just realised your patch is a different thing entirely. Perhaps I should scrap the copymem then and install yours! I downloaded it to try!
Last edited by rabidgerry; 15 December 2020 at 23:57. |
13 February 2021, 15:16 | #15 |
Moderator
Join Date: Dec 2010
Location: Wisconsin USA
Age: 60
Posts: 846
|
** 4TH NEWS UPDATE **
CMQ&B 1.7 released! v1.7 Updated Big loop code with faster instructions. Increased Big loop copy size to 112 bytes. Replaced Small loop copy code with new JMP copy code for <= 108 bytes (See new testit results for 1.7). |
26 April 2021, 14:10 | #16 |
Moderator
Join Date: Dec 2010
Location: Wisconsin USA
Age: 60
Posts: 846
|
** 5TH NEWS UPDATE **
CMQ&B040 2.1 released!
v2.1 Many changes
- Fixed a rarely occurring stack size bug when the stack was word aligned and offset by one word from a 16 byte aligned address.
- Added code to test for the Move16 address bug and safely exit upon detection
- Added code to restrict Smart buffer copy usage when the destination address is in Chip RAM.
- Added code to change the default Block size |
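The stack size fix in v2.1 is about a stack pointer that is word aligned but not 16 byte aligned. The patch source isn't shown in this thread, so the following is only a generic sketch of the arithmetic involved when a 16 byte aligned buffer is carved out below a downward growing stack pointer. `reserve_aligned` is a hypothetical helper and the 64 byte buffer size in the example is an arbitrary choice, not a figure from the patch.

```python
ALIGN = 16  # Move16 works on 16 byte lines


def reserve_aligned(sp, size):
    """Place a 16 byte aligned buffer of `size` bytes below stack pointer `sp`.

    Returns (buffer_base, bytes_used), where bytes_used is how far the
    stack has to move down, including any padding lost to alignment.
    """
    base = (sp - size) & ~(ALIGN - 1)  # round the start address down
    return base, sp - base


def worst_case_usage(size):
    """Largest stack usage over every word aligned stack pointer offset."""
    return max(reserve_aligned(0x1000 + off, size)[1] for off in range(0, ALIGN, 2))
```

For a 64 byte buffer this gives 64 bytes used when the stack pointer is already 16 byte aligned, 66 bytes when it sits one word above an aligned address, and 78 bytes in the worst word aligned case, so a fixed reservation has to allow for the padding as well as the buffer itself.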
12 May 2021, 13:54 | #17 |
Moderator
Join Date: Dec 2010
Location: Wisconsin USA
Age: 60
Posts: 846
|
** 6TH NEWS UPDATE **
CMQ&B040 2.2 released! v2.2 minor change - Removed "Move16 Bug" detection code. This was a blunder due to Ax = Ay meaning the same registers rather than the same addresses. |
07 June 2021, 21:26 | #18 |
Moderator
Join Date: Dec 2010
Location: Wisconsin USA
Age: 60
Posts: 846
|
** 7TH NEWS UPDATE **
CMQ&B040 2.3 released! v2.3 minor change - Changed address register longword math to word math for the Smart buffer copy loop. This is a small optimization but we always want the fastest possible results |
29 November 2023, 10:32 | #19 |
Registered User
Join Date: Sep 2019
Location: Finland
Posts: 373
|
Did/does anyone ever notice a real improvement from these CopyMem-improvement patches? Or maybe measure how many calls and what kind of parameters would be generated when using the OS for some ordinary tasks?
There were a lot of these patches, I also did one back in the day and was happy with myself. Whether it made any difference, that's another matter. |
29 November 2023, 10:37 | #20 | |
Senior Member
Join Date: Jun 2001
Location: Germany
Posts: 1,667
|
As it says in the description:
Quote:
I doubt you'll find much software, if any, that will be much faster with that patch compared to other similar patches. |
|