English Amiga Board Amiga Lore


Old 04 October 2017, 16:30   #1
SpeedGeek
Registered User
Join Date: Dec 2010
Location: Wisconsin USA
Age: 54
Posts: 342
FastCache040+ Released!

FastCache040+ 1.4 ©SpeedGeek 2017

INTRODUCTION:
FastCache040+ is a patch to replace the CachePreDMA() and
CachePostDMA() functions of most 68040/060 libraries. While
the old functions are adequate, they are far from optimal.
They contain roughly 2x to 3x as much code as the new ones
provided with this patch (see CODE SIZE COMPARISONS)!

Also, the new functions implement a much more efficient method
of managing the Copyback cache for DMA. While every system
will have some CPU performance loss under DMA conditions, the
new functions keep this performance loss to a bare minimum.

FEATURES:
- Replaces CachePreDMA() and CachePostDMA() with smaller
and more efficient code
- Replaces complex MMU code with simple and fast DTTR code
- Temporarily changes Copyback mode to Write Through for DMA
(but only when required!). See MEMF_24BIT change for v1.3.
- Never flushes the ATC!
- Never flushes the DC for Chip RAM DMA!
- Uses 68040/060 library detection code
- Will not patch itself
- 100% Assembler code

CODE SIZE COMPARISONS:
- FastCache040+ 1.4 (NewFunc 140 bytes)
- 68060.library 46.7 (OldFunc 304 bytes)
- 68040.library 44.2 (OldFunc 414 bytes)

REQUIREMENTS:
- Amiga with 68040 or 68060 CPU and MMU
- 68040.library or 68060.library

WARNING:
Do NOT use this patch with GigaMEM, VMM or any similar
virtual memory software! Do NOT use this patch with any
code which uses the MMU to write protect or remap modified
data structures!

NOTES:
Remapping a mirror image of the Kickstart ROM with the MMU
is OK! The new functions still have one thing in common with
the old functions. They do NOT translate virtual addresses
as specified in the Amiga RKRM! For more info on the old
functions see the Enforcer.guide by Michael Sinz.

HISTORY:
v1.0 - First release
v1.1 - Fixed a bug which prevented the patch from installing
- Added code to use OldCachePreDMA for MEMF_24BIT
transfers (I don't know why errors occurred here)
v1.2 - Added code to use OldCachePostDMA for MEMF_24BIT
transfers (So MMU Pages can be restored to original)
v1.3 - Added code to change MEMF_24BIT transfers to NoCache.
This eliminated all OldFunc calls. MEMF_24BIT
transfers may have some CPU performance loss but the
NewFunc code performance benefits should still justify
this.
v1.4 - Removed MEMF_24BIT code from PreDMA/PostDMA for the
case of 16 byte aligned transfers. This will allow
some MEMF_24BIT transfers to be cache enabled!
Code:
CachePostDMA:	
	MOVE.L  A0,D1
	ANDI.L  #$FFE00000,D1   ;Chip RAM
	BEQ.B	lbC00002A
	BTST	#3,D0		;ReadFromRam	
	BNE.B	lbC00002A
	MOVE.L  A0,D1
	OR.L    (A1),D1
	ANDI.B  #15,D1		;16 byte aligned
	BEQ.B	lbC00002A
lbC00001A	
	MOVE.L	A5,-(SP)			
	LEA	(lbC00004E,PC),A5
	JSR	(-$1E,A6)	;Call Supervisor
	MOVE.L	(SP)+,A5

lbC00002A
	RTS

lbC00004E	
        CINVA   NC       	;Support 060, 040 not sure?
	MOVEQ   #0,D1    
	MOVEC   D1,DTT0		;Disable DTT0
	RTE  

CachePreDMA:
	MOVEM.L	A0/A5,-(SP)		
	MOVE.L  A0,D1
	ANDI.L  #$FFE00000,D1   ;Chip RAM
	BEQ.B   lbC000068
	ANDI.B	#$A,D0		;Continue or ReadFromRam
	BNE.B	lbC000060
	MOVE.L  A0,D1
	OR.L    (A1),D1
	ANDI.B  #15,D1		;16 byte aligned
	BEQ.B	lbC000060
lbC000054		
	LEA	(lbC000074,PC),A5
	BRA.B   lbC000064
lbC000060
	LEA	(lbC000084,PC),A5
lbC000064	 
	JSR	(-$1E,A6)	;Call Supervisor
lbC000068
	MOVEM.L	(SP)+,A0/A5
	MOVE.L  A0,D0
	RTS

lbC000074
 	MOVE.L  A0,D1
	ANDI.L  #$FF000000,D1   ;MEMF_24BIT
	BNE.B	lbC000080
	ORI.B   #$40,D1		;NoCache mode + Serialized
lbC000080	       	
	ORI.W   #$8000,D1 	;Cache WT mode + User FC	
	MOVEC 	D1,DTT0		;Enable DTT0
	 
lbC000084
	MOVEC   CACR,D2
	BTST    #31,D2          ;Data cache enabled
	BEQ.B   lbC000090
	CPUSHA 	DC		;Flush dirty cache lines 
lbC000090
	RTE
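
For readers who find C easier to follow than 68k assembler, the branch structure of the two NewFunc entry points above can be modelled roughly as below. This is only a sketch of the decision logic, not the patch itself: the enum and function names are made up for illustration, the real code does the DTT0/CPUSHA work in supervisor mode via Supervisor(), and the flush is additionally skipped when the CACR shows the data cache is disabled. The DMA_* flag values are the ones from exec/execbase.h.

```c
#include <stdint.h>

/* DMA flag bits, same values as in exec/execbase.h */
#define DMA_Continue    (1u << 1)
#define DMA_NoModify    (1u << 2)
#define DMA_ReadFromRAM (1u << 3)

/* Hypothetical names for the paths through the NewFunc code */
enum pre_action  { PRE_NOTHING, PRE_FLUSH_ONLY, PRE_DTT0_AND_FLUSH };
enum post_action { POST_NOTHING, POST_DISABLE_DTT0 };

/* ANDI.L #$FFE00000,D1 : zero means the address is in the first 2MB (Chip RAM) */
static int in_low_2mb(uint32_t addr)
{
    return (addr & 0xFFE00000u) == 0;
}

/* MOVE.L A0,D1 / OR.L (A1),D1 / ANDI.B #15,D1 : address and length both multiples of 16 */
static int line_aligned(uint32_t addr, uint32_t len)
{
    return ((addr | len) & 15u) == 0;
}

enum pre_action pre_dma_action(uint32_t addr, uint32_t len, uint32_t flags)
{
    if (in_low_2mb(addr))
        return PRE_NOTHING;                 /* BEQ.B lbC000068 */
    if (flags & (DMA_Continue | DMA_ReadFromRAM))
        return PRE_FLUSH_ONLY;              /* lbC000060 -> lbC000084 */
    if (line_aligned(addr, len))
        return PRE_FLUSH_ONLY;              /* lbC000060 -> lbC000084 */
    return PRE_DTT0_AND_FLUSH;              /* lbC000074, falls into lbC000084 */
}

enum post_action post_dma_action(uint32_t addr, uint32_t len, uint32_t flags)
{
    if (in_low_2mb(addr))
        return POST_NOTHING;
    if (flags & DMA_ReadFromRAM)
        return POST_NOTHING;
    if (line_aligned(addr, len))
        return POST_NOTHING;
    return POST_DISABLE_DTT0;               /* lbC00004E: CINVA NC, DTT0 = 0 */
}
```

In this model an unaligned DMA write into Fast RAM (say address $07000004 with flags 0) takes the DTT0 path, while the same transfer with DMA_ReadFromRAM set only flushes the data cache.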
Attached Files
File Type: lha FASTCACHE040+12.LHA (2.2 KB, 20 views)
File Type: lha FASTCACHE040+14.LHA (2.4 KB, 5 views)

Last edited by SpeedGeek; 15 October 2017 at 21:06.
Old 04 October 2017, 17:57   #2
a/b
Registered User

 
Join Date: Jun 2016
Location: europe
Posts: 56
A couple of suggestions for CachePreDMA (2 bytes shorter and one less branch):
Code:
CachePreDMA:
    MOVEM.L    A0/A5,-(SP)        
    MOVE.L  A0,D1
    ANDI.L  #$FFE00000,D1   ;Chip RAM
    BEQ.B   lbC000068
 LEA    (lbC000084,PC),A5 ; moved from below
    ANDI.B    #$A,D0        ;Continue or ReadFromRam
    BNE.B    lbC000060
    MOVE.L    A0,D1
    OR.L    (A1),D1
    ANDI.B    #15,D1        ;16 Byte aligned
    BEQ.B    lbC000060
    LEA    (lbC000074,PC),A5
; or alternatively (faster on 040, don't know about 060; maybe add.w works better overall):
; LEA    (lbC000074-lbC000084,A5),A5

;;    BRA.B   lbC000064
lbC000060
;;    LEA    (lbC000084,PC),A5
;;lbC000064         
    JSR    (-$1E,A6)    ;Call Supervisor
lbC000068
Old 05 October 2017, 17:20   #3
SpeedGeek
Quote:
Originally Posted by a/b
A couple of suggestions for CachePreDMA (2 bytes shorter and one less branch):
Thanks for the suggestion, but saving 2 bytes of code does not result in faster execution in this case. BRA.B is faster than LEA on both the 040 and the 060 (and on the 060 it's faster still thanks to branch prediction).

If you want to make a patch with your suggestion, that's OK with me. This patch gets most of its performance benefit from more efficient cache management, so small changes in code size or execution speed won't make much of a difference anyway.

Last edited by SpeedGeek; 05 October 2017 at 18:21.
Old 05 October 2017, 18:01   #4
kgc210
Registered User

 
Join Date: Jun 2016
Location: Stoke-On-Trent, England
Posts: 142
SpeedGeek

What sort of real life situations would benefit from this patch?
Or does it speed up all uses of the CPU?
Old 05 October 2017, 18:36   #5
SpeedGeek
Quote:
Originally Posted by kgc210
SpeedGeek

What sort of real life situations would benefit from this patch?
Or does it speed up all uses of the CPU?
Any situation where a DMA controller transfers data to Fast RAM. It also helps with Chip RAM when the driver doesn't handle the fact that Chip RAM is non-cacheable memory (because it expects the old functions to handle that for it).

There are a few benchmark programs (e.g. RSCP, DiskSpeed) which test "CPU Availability" for SCSI DMA transfers. Unfortunately, they pre-date the 68040 CPU and really don't provide any reliable results here.

Last edited by SpeedGeek; 05 October 2017 at 18:50.
Old 05 October 2017, 18:53   #6
kgc210
Ahh ok thanks for the explanation.
Do you think it would help on the Warpengine and A4000T or are their drivers optimised anyway?
Old 06 October 2017, 03:17   #7
SpeedGeek
** NEWS UPDATE **

Sorry, there was a bug in v1.0 with the patch install code.

v1.1 - Fixed a bug which prevented the patch from installing
- Added code to use OldCachePreDMA for MEMF_24BIT
transfers (I don't know why errors occurred here)

Last edited by SpeedGeek; 06 October 2017 at 08:20.
Old 06 October 2017, 15:11   #8
SpeedGeek
** 2ND NEWS UPDATE **

v1.2 released (updated patch size info)
- Added code to use OldCachePostDMA for MEMF_24BIT
transfers (So MMU Pages can be restored to original)
Old 06 October 2017, 15:15   #9
SpeedGeek
OK, I believe I have found a solution to the MEMF_24BIT transfer
error problem without OldPre/OldPost calls. Unfortunately, the cache mode would have to be changed to NoCache.

This would make the NewFunc code a little smaller but could reduce CPU performance a little for MEMF_24BIT transfers.

So it's a trade off situation... will give it some more thought!

Last edited by SpeedGeek; 06 October 2017 at 17:35.
Old 10 October 2017, 22:09   #10
SpeedGeek
** 3RD NEWS UPDATE **

v1.3 Released!
- Added code to change MEMF_24BIT transfers to NoCache.
This eliminated all OldFunc calls. MEMF_24BIT transfers may have
some CPU performance loss but the NewFunc code performance
benefits should still justify this.

NOTES: v1.2 will still be available for download for users if they
believe using OldFunc calls is still justified. The v1.2 NewFuncSrc
for lbC00004E should read as follows:
CINVA NC ;Support 060, 040 not sure?

EDIT:
v1.4 Released!
- Removed MEMF_24BIT code from PreDMA/PostDMA for the
case of 16 byte aligned transfers. This will allow
some MEMF_24BIT transfers to be cache enabled!
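
Why 16 bytes? The 68040 and 68060 data caches work in 16 byte lines, so a buffer whose start address and length are both multiples of 16 never shares a cache line with unrelated data, and such transfers can stay cached. A small C sketch of that idea follows. It is an illustration only: partial_lines() is a hypothetical helper and is not part of the patch.

```c
#include <stdint.h>

#define CACHE_LINE 16u  /* 68040/68060 cache line size in bytes */

/* Same test as MOVE.L A0,D1 / OR.L (A1),D1 / ANDI.B #15,D1 in the NewFunc code */
int dma_is_line_aligned(uint32_t addr, uint32_t len)
{
    return ((addr | len) & (CACHE_LINE - 1u)) == 0;
}

/* Hypothetical helper: how many cache lines are only partly covered by
 * the buffer [addr, addr+len). The result is 0, 1 or 2. */
unsigned partial_lines(uint32_t addr, uint32_t len)
{
    uint32_t mask = CACHE_LINE - 1u;
    uint32_t end = addr + len;
    unsigned n = 0;

    if (len == 0)
        return 0;
    if (addr & mask)
        n++;                      /* first line is shared with data in front */
    if (end & mask) {
        /* last line is shared with data behind; avoid counting the
         * same line twice when the whole buffer sits inside one line */
        if ((addr & ~mask) != (end & ~mask) || !(addr & mask))
            n++;
    }
    return n;
}
```

A buffer passes the alignment test exactly when no line is partly covered, which is the case where the DTT0 switch to Write Through or NoCache can be skipped.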

EDIT2:
The v1.4 NewFuncSrc for lbC000080 should read as follows:
ORI.W #$8000,D1 ;Cache WT mode + User FC
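
To make the bit values in lbC000074/lbC000080 easier to read, here is a hedged C sketch of how the DTT0 value is put together. The constant and function names are invented for this example. The field positions (address base in bits 31-24, address mask in bits 23-16, E in bit 15, cache mode in bits 6-5) follow the 68040 transparent translation register layout, and the mode names are the ones used in the source comments.

```c
#include <stdint.h>

/* Illustrative names for the transparent translation register bits used */
#define TTR_ENABLE          0x8000u  /* E bit: register is active            */
#define TTR_CM_WRITETHROUGH 0x0000u  /* CM = 00: cacheable, Write Through    */
#define TTR_CM_NOCACHE_SER  0x0040u  /* CM = 10: NoCache, serialized         */

/* Model of lbC000074/lbC000080: build the DTT0 value for a DMA buffer */
uint32_t dtt0_for_dma(uint32_t addr)
{
    /* Address base = top byte of the buffer address, address mask = 0,
     * so the register matches the one 16MB block that holds the buffer. */
    uint32_t value = addr & 0xFF000000u;

    /* Top byte zero means the buffer is in the first 16MB (MEMF_24BIT):
     * v1.3 and later use NoCache there instead of Write Through. */
    if (value == 0)
        value |= TTR_CM_NOCACHE_SER;
    else
        value |= TTR_CM_WRITETHROUGH;

    /* S field (bits 14-13) stays 00, so only user mode accesses match. */
    return value | TTR_ENABLE;
}
```

For example, a buffer at $07C01234 gives $07008000 (Write Through for the block starting at $07000000), and a Zorro2 buffer at $00200000 gives $00008040 (NoCache for the first 16MB).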

Last edited by SpeedGeek; 13 October 2017 at 17:29.
Old 14 October 2017, 14:35   #11
SpeedGeek
Ok guys, now it's your turn to post your compatibility results!

Please provide the vendor and version of your 68040.library or 68060.library, along with the accelerator card type and vendor. Thank you!
Old 14 October 2017, 17:08   #12
daxb
Registered User
 
Join Date: Oct 2009
Location: Germany
Posts: 1,725
How can we test compatibility? Is there a benchmark tool or similar to see the benefits?
Old 14 October 2017, 18:03   #13
SpeedGeek
Quote:
Originally Posted by daxb
How can we test compatibility? Is there a benchmark tool or similar to see the benefits?
EDIT: See post #16 for benchmark info.

I have already tested 68040.library 44.2 (H&P) with an A3640 and 68060.library 46.7 (Phase5) with an A3660. However, these libraries may configure themselves differently on other systems. Also, there are 3rd party libraries (e.g. GVP, Apollo, etc.) which should be tested as well.

Last edited by SpeedGeek; 16 October 2017 at 06:04.
Old 14 October 2017, 19:11   #14
daxb
Quote:
Originally Posted by SpeedGeek
I have already tested 68040.library 44.2 (H&P)...
How?
Old 15 October 2017, 13:06   #15
SpeedGeek
Quote:
Originally Posted by daxb
How?
Simply install the patch, use your system normally and look for any DMA transfer errors.

I went a little further than that. I made an LHA loop script which extracts 12MB of archives to the RAM disk. I installed a 2MB Zorro2 memory board for MEMF_24BIT testing. I have another script which changes the priority of the Zorro2 memory so the archive files extract there first. I loaded programs which open a screen in Chip RAM to test Chip RAM DMA, but loading icons on the Workbench screen does the same thing unless you are using RTG.
Old 15 October 2017, 18:52   #16
SpeedGeek
** 4TH NEWS UPDATE **

There was another stupid version bug in v1.4 which has now been fixed (it was just a fully functional v1.4 reporting itself as v1.3).

I now have a simple benchmark tool called "CacheDMAmips" (see attached image). I will probably release it when I am satisfied with the compatibility results.
Attached image: FASTCACHE040+.PNG (10.7 KB)

Last edited by SpeedGeek; 16 October 2017 at 05:52.