FastCache040+ 2.4 ©SpeedGeek 2020
INTRODUCTION:
FastCache040+ is a patch to replace the CachePreDMA() and
CachePostDMA() functions of most 68040/060 libraries. While
the old functions are adequate, they are far from optimal.
The old functions contain roughly 1.6 to 2.2 times as much
code as the new ones provided with this patch!
Also, the new functions implement a much more efficient method
of managing the Copyback cache for DMA. While every system
loses some CPU performance during DMA, the new functions
keep this loss to a bare minimum.
FEATURES:
- Replaces CachePreDMA() and CachePostDMA() with smaller
and more efficient code
- Replaces complex MMU code with simple and fast DTTR (Data
Transparent Translation Register) code
- Temporarily changes Copyback mode to Write Through for DMA
(but only when required!).
- Never flushes the ATC (Address Translation Cache)!
- Never flushes the data cache (DC) for Chip RAM DMA!
- Uses 68040/060 library detection code
- Will not patch itself
- 100% Assembler code
CODE SIZE COMPARISONS:
- FastCache040+ 2.4 (NewFunc 190 bytes)
- 68060.library 46.7 (OldFunc 304 bytes)
- 68040.library 44.2 (OldFunc 414 bytes)
REQUIREMENTS:
- Amiga with 68040 or 68060 CPU and MMU
- 68040.library or 68060.library
WARNING:
Do NOT use this patch with GigaMEM, VMM or any similar
virtual memory software! Do NOT use this patch with any
code which uses the MMU to write protect or remap modified
data structures!
NOTES:
Remapping a mirror image of the Kickstart ROM with the MMU
is OK! The new functions still have one thing in common with
the old functions: they do NOT translate virtual addresses to
physical addresses as the Amiga RKRM specifies! For more info
on the old functions, see the Enforcer.guide by Michael Sinz.
UPDATE:
FastCache040+ v1.7 has been removed. Phase5 68060.library
users can optionally use FixMapP5.
HISTORY:
(Pre 2.0 history deleted)
v2.0 - Added code to enable only one DTTR when the Nest count
is one, i.e. when only one DMA transfer is in progress. Most
systems have only one DMA driver, so only 16MB of address
space needs to be managed in this case.
Removed the 1.9BR version, which was overrated because most DMA
drivers run at a higher priority than typical user tasks.
v2.1 - Reworked the code to fix a problem with Snoopy 2.0
(Aminet). Sorry, this version no longer supports 16-byte aligned,
cache-enabled MEMF_24BIT transfers. NOTE: The original P5
library functions have problems with Snoopy too.
v2.2 - The Snoopy fix broke MEMF_24BIT transfers, so another
bug fix was required. Let's hope it's the last.
v2.3 - The 16 byte alignment code is back and now avoids the
change of cache mode for this specific case. Removed the
DMA_Continue case from PreDMA since its expected results are
the same as the non-Continue case. The cache disable test
code was removed to save the overhead of this very
uncommon case.
v2.4 - Reworked PostDMA code to fix Nested call cache flush bugs.
We really don't want to forget about systems with multiple
DMA drivers do we?
CODE:
CachePostDMA:
        MOVE.L  A0,D1
        ANDI.L  #$FFE00000,D1           ;Chip RAM (below 2MB)?
        BEQ.B   lbC00002A               ;Yes: nothing to do
        BTST    #3,D0                   ;DMA_ReadFromRAM?
        BNE.B   lbC00002A               ;Yes: nothing to do
        MOVE.L  A5,-(SP)
        MOVE.L  A0,D1
        OR.L    (A1),D1                 ;Address OR length
        ANDI.B  #15,D1                  ;16 byte aligned?
        BEQ.B   lbC000020               ;Yes: flush only, Nest untouched
        LEA     Nest(PC),A1
        SUBQ.W  #1,(A1)                 ;One transfer less in progress
        BEQ.B   lbC000024               ;Last one: disable DTTRs too
lbC000020
        LEA     (lbC000050,PC),A5
        BRA.B   lbC000028
lbC000024
        LEA     (lbC00004E,PC),A5
lbC000028
        JSR     (-$1E,A6)               ;Call Supervisor
        MOVE.L  (SP)+,A5
lbC00002A
        RTS
lbC00004E
        MOVEQ   #0,D1
        MOVEC   D1,DTT1                 ;Disable DTT1
        MOVEC   D1,DTT0                 ;Disable DTT0
lbC000050
        CPUSHA  DC                      ;Push + invalidate DC
        RTE
CachePreDMA:
        MOVEM.L A0/A5,-(SP)
        MOVE.L  A0,D1
        ANDI.L  #$FFE00000,D1           ;Chip RAM (below 2MB)?
        BEQ.B   lbC000068               ;Yes: nothing to do
        BTST    #3,D0                   ;DMA_ReadFromRAM?
        BNE.B   lbC000068               ;Yes: nothing to do
        MOVE.L  A0,D1
        OR.L    (A1),D1                 ;Address OR length
        ANDI.B  #15,D1                  ;16 byte aligned?
        BEQ.B   lbC000060               ;Yes: flush only
        LEA     (lbC000074,PC),A5
        BRA.B   lbC000064
lbC000060
        LEA     (lbC000084,PC),A5
lbC000064
        JSR     (-$1E,A6)               ;Call Supervisor
lbC000068
        MOVEM.L (SP)+,A0/A5
        MOVE.L  A0,D0                   ;Return address untranslated
        RTS
lbC000074
        LEA     Nest(PC),A1
        TST.W   (A1)                    ;First transfer in progress?
        BEQ.B   lbC000078               ;Yes: one DTTR is enough
        MOVE.L  #$0000C040,D1           ;$00xxxxxx: NoCache + Serialized
        MOVEC   D1,DTT0                 ;Enable DTT0
        MOVE.L  A0,D1
        ANDI.L  #$FF000000,D1           ;MEMF_24BIT address?
        BEQ.B   lbC000082
        MOVE.L  #$00FFC000,D1           ;All memory: Cache WT + ignore FC
        MOVEC   D1,DTT1                 ;Enable DTT1
        BRA.B   lbC000082
lbC000078
        MOVE.L  A0,D1
        ANDI.L  #$FF000000,D1           ;MEMF_24BIT address?
        BNE.B   lbC000080
        ORI.B   #$40,D1                 ;NoCache mode + Serialized
lbC000080
        ORI.W   #$C000,D1               ;Enable + ignore FC (WT unless NoCache)
        MOVEC   D1,DTT0                 ;Covers the buffer's 16MB block
lbC000082
        ADDQ.W  #1,(A1)                 ;One more transfer in progress
lbC000084
        CPUSHA  DC                      ;Flush dirty cache lines
        RTE
Nest:   DC.W    0