Old 04 October 2017, 16:30   #1
FastCache040+ Released!

FastCache040+ 2.2 ©SpeedGeek 2018

FastCache040+ is a patch that replaces the CachePreDMA() and
CachePostDMA() functions of most 68040/060 libraries. While
the old functions are adequate, they are far from optimal.
They contain roughly twice as much code as the new ones
provided with this patch!

Also, the new functions implement a much more efficient method
of managing the Copyback cache for DMA. While every system
will have some CPU performance loss under DMA conditions, the
new functions keep this performance loss to a bare minimum.

- Replaces CachePreDMA() and CachePostDMA() with smaller
  and more efficient code
- Replaces complex MMU code with simple and fast DTTR code
- Temporarily changes Copyback mode to Write Through for DMA
  (but only when required!). See MEMF_24BIT change for v1.3.
- Never flushes the ATC!
- Never flushes the DC for Chip RAM DMA!
- Uses 68040/060 library detection code
- Will not patch itself
- 100% Assembler code
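
For anyone wondering what the "simple and fast DTTR code" actually programs, here is a rough C sketch that builds and decodes the two register values seen in the assembly listing ($0000C040 and $00FFC000). The field layout is the 68040 transparent translation register format as I understand it from the Motorola manual. The helper names ttr_make() and ttr_matches() are made up for illustration and are not part of the patch.

```c
#include <stdint.h>
#include <stdbool.h>

/* 68040 transparent translation register fields (assumed layout). */
#define TTR_ENABLE      0x00008000u  /* E: register is active */
#define TTR_S_IGNORE    0x00004000u  /* S = 1x: ignore FC2 (user and supervisor) */
#define TTR_CM_WT       0x00000000u  /* CM = 00: cacheable, write-through */
#define TTR_CM_COPYBACK 0x00000020u  /* CM = 01: cacheable, copyback */
#define TTR_CM_NC_SER   0x00000040u  /* CM = 10: non-cacheable, serialized */

/* Hypothetical helper: build a TTR value from a base and mask
 * (both compared against address bits 31..24) and a cache mode. */
static uint32_t ttr_make(uint8_t base, uint8_t mask, uint32_t cache_mode)
{
    return ((uint32_t)base << 24) | ((uint32_t)mask << 16)
         | TTR_ENABLE | TTR_S_IGNORE | cache_mode;
}

/* Hypothetical helper: true if an enabled TTR matches the address.
 * Bits set in the mask are "don't care" bits. */
static bool ttr_matches(uint32_t ttr, uint32_t addr)
{
    uint8_t base = (uint8_t)(ttr >> 24);
    uint8_t mask = (uint8_t)(ttr >> 16);
    uint8_t top  = (uint8_t)(addr >> 24);

    if (!(ttr & TTR_ENABLE))
        return false;
    return ((uint8_t)(top ^ base) & (uint8_t)~mask) == 0;
}
```

With these helpers, ttr_make(0x00, 0x00, TTR_CM_NC_SER) gives $0000C040, which covers only the first 16MB, and ttr_make(0x00, 0xFF, TTR_CM_WT) gives $00FFC000, which covers the whole address space in Write Through mode.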

- FastCache040+ 2.1 (NewFunc 174 bytes)
- 68060.library 46.7 (OldFunc 304 bytes)
- 68040.library 44.2 (OldFunc 414 bytes)

- Amiga with 68040 or 68060 CPU and MMU
- 68040.library or 68060.library

Do NOT use this patch with GigaMEM, VMM or any similar
virtual memory software! Do NOT use this patch with any
code which uses the MMU to write protect or remap modified
data structures!

Remapping a mirror image of the Kickstart ROM with the MMU
is OK! The new functions still have one thing in common with
the old functions. They do NOT translate virtual addresses
as specified in the Amiga RKRM! For more info on the old
functions see the by Michael Sinz.

FastCache040+ v1.7 has been removed. Phase5 68060.library
users should use FixMapP5 before using this patch.

v1.0 - First release
v1.1 - Fixed a bug which prevented the patch from installing
- Added code to use OldCachePreDMA for MEMF_24BIT
transfers (I don't know why errors occurred here)
v1.2 - Added code to use OldCachePostDMA for MEMF_24BIT
transfers (So MMU Pages can be restored to original)
v1.3 - Added code to change MEMF_24BIT transfers to NoCache.
This eliminated all OldFunc calls. MEMF_24BIT
transfers may have some CPU performance loss but the
NewFunc code performance benefits should still justify it.
v1.4 - Removed MEMF_24BIT code from PreDMA/PostDMA for the
case of 16 byte aligned transfers. This will allow
some MEMF_24BIT transfers to be cache enabled!
v1.5 - Found an occasional Recoverable Alert bug which could
possibly result in a crash but only on 060 systems!
The simple fix was to move "CINVA NC" in PostDMA to the
end of the code.
- Removed the "+" character from the executable name due
to an unknown "Feature" of the Amiga Shell causing script
execution and version command problems.
v1.6 - Added code to PostDMA to Flush the cache conditionally
(if the Store buffer and cache are enabled). Added NOPs
to sync the pipelines before RTE (CINVA is now obsolete)
v1.6P5 - Removed code to allow PostDMA cache Flush for the case
of 16 byte aligned transfers. Added code to skip PostDMA cache
Flush for the case of cache disabled MEMF_24BIT transfers.
v1.7 - Removed all v1.6P5 PostDMA cache flush code so users can run at full speed!
v1.8 - Reworked the code to eliminate a serious (but seldom
noticed) data transfer corruption bug for the case of multiple
DMA drivers in the same system. Special Thanks to
Ralph Babel for his excellent knowledge on this topic.
v1.9 - Fixed "D2 Register Not Preserved" coding bug in PreDMA.
Most DMA drivers don't seem to need it preserved but
Thanks to Cosmos for reporting it anyway. Moved PostDMA
Nest count code to user section of code. This eliminates
any calls to Supervisor when the count is more than 1.
v2.0 - Added code to enable only one DTTR when the Nest count
is one. Most systems have only one DMA driver and only need to
have 16MB of address space managed for this case.
Removed 1.9BR version which was over-rated due to most DMA
drivers operating at higher priority than typical user tasks.
v2.1 - Reworked the code to fix a problem with Snoopy 2.0
(Aminet). Sorry, this version no longer supports 16 byte aligned
cache enabled MEMF_24BIT transfers. NOTE: The original P5
library functions have problems with Snoopy too. I suppose
FastCache040+ 2.0 should remain available for the non-snoopers.
v2.2 - The Snoopy fix broke MEMF_24BIT transfers. So another
bug fix was required. Let's hope it's the last.
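
The Nest count mentioned from v1.8 onwards can be pictured as a plain reference count. The C model below is only a sketch of that idea, not a translation of the patch: PreDMA bumps the count and switches the cache override on, and PostDMA switches it off only when the last outstanding transfer has finished. The struct, the function names and the underflow guard are all assumptions made for the illustration.

```c
#include <stdbool.h>

/* Hypothetical model: one shared counter of DMA transfers in flight. */
typedef struct {
    unsigned nest;        /* PreDMA calls not yet matched by a PostDMA */
    bool     dttr_active; /* stands in for "the DTTRs are enabled" */
} DmaCacheState;

/* PreDMA side: every new transfer keeps the override switched on. */
static void model_pre_dma(DmaCacheState *s)
{
    s->nest++;
    s->dttr_active = true;
}

/* PostDMA side: only the last transfer to finish switches it off. */
static void model_post_dma(DmaCacheState *s)
{
    if (s->nest == 0)
        return;                 /* guard added for this model only */
    if (--s->nest == 0)
        s->dttr_active = false; /* count reached zero: restore Copyback */
}
```

With two drivers in flight, the first PostDMA leaves the override active and the second one clears it, so one driver finishing early cannot restore Copyback mode while another is still transferring.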
	MOVE.L  A0,D1
	ANDI.L  #$FFE00000,D1   ;Chip RAM
	BEQ.B	lbC00002A
	BTST	#3,D0		;ReadFromRam	
	BNE.B	lbC00002A
	LEA	Nest(PC),A1
	SUBQ.W  #1,(A1)
	BNE.B	lbC00002A		
	MOVE.L	A5,-(SP)			
	LEA	(lbC00004E,PC),A5
	JSR	(-$1E,A6)	;Call Supervisor
	MOVE.L	(SP)+,A5


	MOVEQ   #0,D1
	MOVEC	D1,DTT1		;Disable DTT1	    
	MOVEC   D1,DTT0		;Disable DTT0               

	MOVEM.L	A0/A5,-(SP)		
	MOVE.L  A0,D1
	ANDI.L  #$FFE00000,D1   ;Chip RAM
	BEQ.B   lbC000068
	ANDI.B	#$A,D0		;Continue or ReadFromRam
	BNE.B	lbC000060		
	LEA	(lbC000074,PC),A5
	BRA.B   lbC000064
	LEA	(lbC000084,PC),A5
	JSR	(-$1E,A6)	;Call Supervisor
	MOVEM.L	(SP)+,A0/A5
	MOVE.L  A0,D0

	LEA	Nest(PC),A1
	TST.W	(A1)
	BEQ.B   lbC000078
	MOVE.L  #$0000C040,D1	;NoCache mode + Serialized      		
	MOVEC	D1,DTT0		;Enable DTT0
	MOVE.L  A0,D1
	ANDI.L  #$FF000000,D1   ;MEMF_24BIT
	BEQ.B	lbC000082
	MOVE.L  #$00FFC000,D1 	;Cache WT mode + ignore FC
	MOVEC 	D1,DTT1		;Enable DTT1	
	BRA.B   lbC000082
	MOVE.L  A0,D1
	ANDI.L  #$FF000000,D1   ;MEMF_24BIT
	BNE.B   lbC000080
	ORI.B	#$40,D1 	;NoCache mode + Serialized
	ORI.W   #$C000,D1	;Cache WT mode + ignore FC
	ADDQ.W  #1,(A1)	
	BTST    #31,D1          ;Data cache enabled
	BEQ.B   lbC000090
	CPUSHA 	DC		;Flush dirty cache lines 
Nest:	DC.W	0
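
For readers who don't speak 68k, the three mask tests used in the listing can be written out in C as below. This is a sketch of the tests only. The flag values are the DMA_ bits from exec/execbase.h as I remember them, and the function names are invented for the example.

```c
#include <stdint.h>
#include <stdbool.h>

/* CachePreDMA()/CachePostDMA() flag bits (assumed from exec/execbase.h). */
#define DMA_Continue    (1u << 1)
#define DMA_NoModify    (1u << 2)
#define DMA_ReadFromRAM (1u << 3)

/* Mirrors ANDI.L #$FFE00000,D1 / BEQ: address lies in the first 2 MB. */
static bool is_chip_ram(uint32_t addr)
{
    return (addr & 0xFFE00000u) == 0;
}

/* Mirrors ANDI.L #$FF000000,D1: address lies in the first 16 MB. */
static bool is_24bit(uint32_t addr)
{
    return (addr & 0xFF000000u) == 0;
}

/* Mirrors ANDI.B #$A,D0: Continue or ReadFromRAM is set in the flags. */
static bool is_continue_or_read(uint32_t flags)
{
    return (flags & (DMA_Continue | DMA_ReadFromRAM)) != 0;
}
```

Note that $A in the listing is simply DMA_Continue and DMA_ReadFromRAM ORed together, and that BTST #3,D0 tests the DMA_ReadFromRAM bit alone.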
Attached Files
File Type: lha CACHEDMABENCH11.LHA (2.2 KB, 81 views)
File Type: lha FIXMAPP5_14.LHA (3.4 KB, 59 views)
File Type: lha FASTCACHE040+20.LHA (3.2 KB, 41 views)
File Type: lha FASTCACHE040+22.LHA (3.3 KB, 37 views)

Last edited by SpeedGeek; 24 December 2018 at 14:38.